> 📥 The full field report is available as a PDF. Download it at the top of the page — with a data traffic light (cloud model vs. local model), breakdown anatomy and a 5-step starter guide.
> 🧭 Part 4 of 4 in the series AI agents anno 2026. Four tools all called "AI agent": OpenClaw, Claude Cowork, Perplexity and Hermes. Today the opposite extreme of Perplexity: open-source, local, persistent — and built with sandboxing close to the core.
1. The morning the agent walled itself in
Last weekend I asked my AI agent to do a simple thing. Renew an access token expiring in two days. Ten minutes later it couldn't answer anything at all.
Hermes interpreted "renew the token automatically" as "rewrite your own source code". It edited two of its own core files, added a self-renewing token function, and then restarted its own engine. The new code didn't work. Every call to the language model failed. And because it had restarted itself into that state, it couldn't even read its own logs anymore.
It had sawn off the branch it was sitting on. The token, by the way, hadn't even expired — it had two days left.
Recovery took five minutes once I understood what had happened: roll the two files back, restart. But the point sticks. An agent that can both modify its own code and restart itself can lock itself out.
2. What Hermes actually is
Hermes Agent is built by Nous Research. Open-source, free, installed on your own machine or server. Where Cowork is an assistant inside the Claude app, and Perplexity Personal Computer is a research specialist, Hermes is an agent that lives with you. It runs in the background. It's there when you're not.
- It remembers across sessions.
- It runs by itself. You talk to it from Telegram, from your phone, while it works on your Mac.
- It learns. When it solves something, it can save the approach as a skill to reuse.
You pick the model. I run it with OpenAI Codex as provider and GPT-5.5 as the model, via my ChatGPT Plus subscription. You can also run it on Gemini, DeepSeek or a local model — and that matters more than it sounds.
3. What works: my own test ran flawlessly
Daily digest. I gave Hermes the job of producing a daily digest. Every morning at 7:45 it gathers the most important news on generative AI and pushes it to my Telegram. It has run without a single failure since I set it up. Roughly the same result as OpenClaw, with less effort. It runs on my own machine, with no subscription beyond ChatGPT Plus at $25/month, plus a little OpenRouter usage for the backup model.
Content in my own voice. I dropped my own skills into it — the same ones I use in Claude Cowork. So I can ask it for a first draft of a LinkedIn post from my phone, between two meetings, and get something back that sounds close to "me".
Scheduling is where Hermes is strongest. Describe the task in plain language, set an interval, choose where to deliver. It runs.
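To make those three ingredients concrete, here is a minimal sketch of what a daily scheduled task boils down to. It is not Hermes' own task format: the `ScheduledTask` class, its field names and the `next_run` helper are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class ScheduledTask:
    """Hypothetical task description; not Hermes' real schema."""
    prompt: str      # the task in plain language
    run_at: time     # local time of day for the daily run
    deliver_to: str  # delivery channel, e.g. "telegram"


def next_run(task: ScheduledTask, now: datetime) -> datetime:
    """Return the next moment the daily task should fire, strictly after `now`."""
    candidate = datetime.combine(now.date(), task.run_at)
    if candidate <= now:
        # Today's slot has already passed, so the task waits until tomorrow.
        candidate += timedelta(days=1)
    return candidate


digest = ScheduledTask(
    prompt="Gather the most important news on generative AI",
    run_at=time(7, 45),
    deliver_to="telegram",
)
```

Asked at 07:00, `next_run(digest, now)` returns 07:45 the same day. Asked at 07:45 or later, it returns 07:45 the next day.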
4. The possibilities: what others use it for
My one working routine is just the tip of the iceberg. When I dug into what people actually use Hermes for, the list got long: dev/ops routines, news and market monitoring, change-only alerts, local language and industry packs, research in parallel tracks, and tools the agent uses to maintain its own skills. Most are self-reported by the community, not measured benchmarks. Take them as direction, not proof.
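Of the items on that list, "change-only alerts" is the easiest to picture in code. This is a rough sketch of the idea, assuming it means "notify me only when the monitored content differs from last time". The `ChangeOnlyAlert` class is hypothetical and not taken from Hermes or any community skill.

```python
import hashlib


class ChangeOnlyAlert:
    """Remembers a fingerprint per source and reports only real changes."""

    def __init__(self):
        # Maps a source name to the SHA-256 hex digest of its last seen content.
        self._last_seen = {}

    def check(self, source: str, content: str) -> bool:
        """Return True only if `content` differs from the previous check.

        The first check of a source just records a baseline and returns False.
        """
        fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._last_seen.get(source)
        self._last_seen[source] = fingerprint
        return previous is not None and previous != fingerprint
```

Treating the first check as a silent baseline is a design choice. It avoids an alert storm the first time a new source is added.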
5. What it's not so good at: the honest part
Hermes' strength and its danger are the same property. It has access to your machine, it can edit its own files, and it can restart itself. Without you watching.
After my crash, more than a hundred modified files were left in its own codebase. Hermes also takes technical hands. This is not a tool you hand to your non-technical colleague.
How I fixed the crash: I built an "AI wingman" in Claude Cowork — a separate Cowork project — and gave it access to Hermes' source code on my Mac.
Afterwards I put a guardrail up. I had Claude Cowork write a rule into the agent's own instructions: it may propose changes to its own code and propose a restart — but it may not do it itself without a human yes first. Human-in-the-loop.
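The rule above lives in the agent's instructions, so it depends on the model following it. The same human-in-the-loop idea can also be sketched as a gate in code. This is a minimal illustration under stated assumptions, not how Hermes implements approval: the action names, `ProposedAction` and `execute_with_approval` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names, chosen for this sketch only.
NEEDS_HUMAN_YES = {"edit_own_code", "restart_self"}


@dataclass(frozen=True)
class ProposedAction:
    kind: str         # e.g. "edit_own_code" or "send_digest"
    description: str  # what the agent wants to do, in plain language


def execute_with_approval(
    action: ProposedAction,
    run: Callable[[ProposedAction], None],
    ask_human: Callable[[ProposedAction], bool],
) -> str:
    """Run the action, but only after a human yes for self-modifying actions."""
    if action.kind in NEEDS_HUMAN_YES and not ask_human(action):
        return "rejected"  # the proposal stays a proposal
    run(action)
    return "executed"
```

In practice `ask_human` could be a Telegram prompt that waits for a yes or no. Ordinary actions such as sending the digest pass straight through without asking.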
6. Does data leave your machine?
"Runs locally" sounds like "data doesn't leave my machine". It doesn't necessarily mean that.
Hermes runs locally — the agent, its memory, its files, its engine. That's real. But the model isn't necessarily local. When Hermes thinks, it calls a language model. For me that's GPT-5.5 at OpenAI, with Claude Sonnet as backup via OpenRouter. The prompts and file contents I give the agent are sent to the model provider.
If you want data to actually stay on the machine, run Hermes with a local model — for example via Ollama. The price is quality and speed.
- 🟢 Public material, sector research, your own notes, content drafts: fine with a cloud model.
- 🔴 Client data, personal data, confidential documents: either a local model with a deliberate risk assessment, or keep it out.
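The traffic light can be written down as a small routing rule. The sketch below is illustrative only: the category labels, `Light`, `TRAFFIC_LIGHT` and `choose_backend` are hypothetical names, and `local_model_approved` stands for "a local model is set up and the risk assessment has been done".

```python
from enum import Enum


class Light(Enum):
    GREEN = "green"  # fine with a cloud model
    RED = "red"      # local model plus risk assessment, or keep it out


# Categories taken from the traffic light; the keys are illustrative labels.
TRAFFIC_LIGHT = {
    "public_material": Light.GREEN,
    "sector_research": Light.GREEN,
    "own_notes": Light.GREEN,
    "content_draft": Light.GREEN,
    "client_data": Light.RED,
    "personal_data": Light.RED,
    "confidential_document": Light.RED,
}


def choose_backend(category: str, local_model_approved: bool) -> str:
    """Return "cloud", "local" or "blocked" for a piece of data."""
    # Unknown categories are treated as red, so the rule fails closed.
    light = TRAFFIC_LIGHT.get(category, Light.RED)
    if light is Light.GREEN:
        return "cloud"
    return "local" if local_model_approved else "blocked"
```

Anything that is not explicitly green never reaches a cloud model. It is either handled locally or kept away from the agent.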
7. Who it's for
Hermes is for you if you're technically comfortable and want an agent that works when you don't. The developer with nightly routines on their own server. The power user who wants to control their own infrastructure and their own model.
Hermes is not for your non-technical colleague. That's Cowork or Perplexity. And it's not for sensitive, regulated data unless you run a local model and do a real risk assessment first.
8. How to get started
1. Run it somewhere you can spare — a VPS, a spare Mac, an old machine.
2. Turn approval on. Don't run it in "just do it" mode before you trust it.
3. Give it one channel and one purpose. Start with Telegram and one task.
4. Pick the model deliberately. Cloud for quality on public material. Local for sensitive.
5. Read the output afterwards. Always. The 5% mistake is always waiting somewhere.
9. Four tools, one responsibility
This was the last in the series. Four things we all call "AI agent", and they're barely the same animal.
- OpenClaw was the swarm.
- Cowork was the supervised assistant.
- Perplexity was the research specialist.
- Hermes is the autonomous one. Local, persistent, working when you don't.
What ties them together is not the technology. It's your judgment. The agent that impresses you 95% of the time is the same one that walled itself in over a token that wasn't even expired. Responsibility for the last 5% doesn't move. It stays with you.
10. Maybe it's me who hasn't had enough time
Most of what I described as possibilities I haven't gotten running myself yet. The one routine I set up properly runs flawlessly. The rest requires more than I've given it so far. And I don't think Hermes is the problem — I think it's me, not having had the time to make it dance.
That might be the most honest conclusion to the whole series. The ceiling is rarely the tool. It's how much time we ourselves have to experiment and learn.
Stefano Vincenti — GenAI strategist and architect. 25 years in IT and digital transformation. Co-founder of BotTellMe. External lecturer at ITU and DIS Copenhagen. Partner at TryZone.