What an AI “agent” actually does for a working researcher — and where it still fails

The shift is two words: access and autonomy. Everything useful and everything dangerous comes from there...

Jun 10, 2026

Every AI vendor now sells “agents.” The word has been stretched so far that it means almost nothing. So let me describe what actually changed on my screen this year, in plain terms, and then be honest about where it still falls over.

I am a CEO, a pathologist and a researcher, not a programmer. I have no stake in any of these companies. What follows is one user’s report.

A chatbot and an agent are not the same thing

For two years, I used AI the way most people do. I opened a chat window, pasted in some text or asked a question, and copied the answer back out. Useful, but it was a conversation. Every step was mine. The model never touched anything I owned. It was almost like Google with a little extra.

An agent is different in one concrete way: it has access to some of my material, and it takes steps on its own. Instead of my pasting a transcript into a window, I point it at the folder where my notes live, and it reads them. Instead of writing an email in the chat for me to copy, it can open a draft in my mailbox. Instead of one answer per prompt, it does five or six things in a row (search, read, write, check) and comes back when it is done.

That is the whole shift. Not “smarter answers,” but access plus a sequence of actions it takes by itself. Everything good and everything dangerous about agents comes from those two words: access and autonomy.

What it is good at

I will give a few tasks from my own week, because one concrete example is worth more than three claims.

Meeting notes. I record a research or trial meeting, the agent reads the transcript, and it produces a structured report: who said what, the decisions, the to-do list. It corrects the obvious transcription errors. Some names came out wrong every single time, and it now fixes them without being told, because it knows the entire team, each person's role, and the correct spelling of every name. A job that used to eat an hour after every meeting is now a five-minute review of something already written.

Finding literature with the sources attached. When I want to know what the evidence says on a narrow question, the agent searches, reads the papers, and writes a short synthesis with the references listed. The key point is the references. I do not want a confident summary; I want to be able to click through to the original and check it myself. More on that below, because this is also where it fails.

Keeping my own notes in order. I keep everything in one large set of files I have built over the years. The agent can scan it all, find the note that contradicts another note, and propose links between related pieces. It can even build complete trees or mind maps from links it found on its own. Dull, slow work for a human; quick for a machine that does not get bored.

Perhaps most futuristically, it can read my calendar, my email, my to-do list and my small notes, and turn them into a daily briefing. It can even add a selection of news headlines, picked specifically for the topics that interest me, because by now it knows me. This starts to look like a fully functional robot, but I will cover that subject in a later post.

None of this is revolutionary; it is all available today. It is administrative weight lifted off a working week. That is exactly why it is worth taking seriously.

Where it still fails

The same two properties — access and autonomy — are the problem.

It likes to invent things confidently. I have had an agent hand me a literature summary with a reference that looked perfect: plausible authors, plausible journal, plausible year. The paper did not exist. This is not rare, and it is not a bug you can switch off. A model that writes fluent text will, some of the time, write a fluent citation for a paper nobody wrote. If you are a researcher, this is the single most important thing to understand before you trust any output. Fluency is not accuracy.

It is wrong without flinching. A junior colleague who is unsure tells you they are unsure. The model gives an incorrect answer in exactly the same confident tone as a correct one. There is no tremor in the voice. You only catch the error if you already know enough to catch it, which is an uncomfortable thing to depend on.

It misses the context a person would have. It does not know that one investigator is touchy about a particular topic, or that a result is too preliminary to put in writing. It optimizes for the task you gave it, not for the judgment around the task.

Autonomy cuts both ways. The same feature that lets it draft an email for me is the feature that, unsupervised, could send one. The capability that saves time is the capability that does damage if you are not watching. So be very careful about every permission you give your agent.

The real work becomes checking

Here is the part that nobody selling agents wants to dwell on. When the machine does the typing, verification becomes your job. The bottleneck moves from producing text to confirming it.

So I work with a few hard rules, and I would suggest that any colleague start with the same ones:

Nothing leaves under my name without my approval. No email sent, no message posted, no document submitted, until I have read it. The agent drafts; I decide.
No patient-identifying data, ever. This is not negotiable, and no convenience is worth crossing it.
Every factual claim goes back to a source I can open. If it cannot give me the reference, I treat the claim as unverified. If it gives me one, I check that the reference is real.

You can tell your agent that these are your hard rules, and “he” will listen.

These rules sound restrictive. They are not. They are what makes the thing usable at all. An assistant you cannot check is not an assistant; it is a liability with good manners.

What it costs

To be transparent, because this is the AI-tools corner of this Substack: the assistant subscriptions that make this kind of work possible at a meaningful scale start at roughly twenty dollars a month at the entry level, with realistic tiers at one hundred to two hundred dollars. I pay for more than one, partly for the work and partly because comparing them is how I learn what each is good for. Whether that is worth it depends entirely on how much of your week is the administrative weight I described. For me, it clears the cost easily; for a colleague whose week looks different, it might not.

Bottom line for a colleague still on the sidelines

If you have only ever used a chat window, an agent is worth trying, with three conditions. Give it the dull, high-volume work first — notes, sorting, first-pass literature. Keep it away from anything sensitive until you trust your own checking habits. And never relax the rule: verify before you act (or before you trust).

The promise is not that it thinks for you. It is that it clears the desk so you have time to think. That is a smaller promise than the marketing makes, and a more honest one.

I run a setup like this over my own notes, calendar, and mail every day. How it is actually wired together is a longer story, and one I will tell another time.

Disclosure: I pay for several AI assistant subscriptions out of my own pocket and have no financial relationship with any of the companies mentioned. These are my own observations from daily use, not advice.

The Long Look
