AI agents in practice: Field report from my own agent swarm

Part 1 of a four-part series on AI agents anno 2026. What works and what doesn't — based on four agents running on an old gaming PC that deliver three news stories to my Telegram every morning at 7:30. No demos. No hype.

> 📥 The full field report is available as a PDF. Download it at the top — with tables, demo-vs-production overview and next steps. (PDF in Danish.)

> 🧭 The series AI agents anno 2026 covers four tools that are all called AI agents but solve very different problems: OpenClaw, Claude Cowork, Perplexity and Hermes. We start with OpenClaw, because my entire agent swarm runs on it.

Every morning at 7:30, three news stories land in my Telegram. I didn't write them. I didn't order them. Four AI agents on an old gaming PC researched, fact-checked and finished writing them while I slept.

It sounds like a demo. It isn't. It has been running every single morning for months without me touching anything.

A huge amount is being said about AI agents right now. Half is hype: agents take over everything before Christmas. The other half is fear: agents go rogue and hack your systems. Both are too easy.

Here's what I have actually seen. From my own agent swarm. From two days on stage with project leaders building their first agents. And from the companies I advise day to day.

My own setup

Four agents. An orchestrator, a researcher, a validator and a copywriter. An old gaming PC. A cron job. Output to Telegram every morning.

Agent	Role	What it does
StefOpenClawAI	Orchestrator	Runs the swarm. Delegates and aggregates. Runs the final check before anything is delivered.
Researcher	Researcher	Scans the web for the most important AI news in the last 24 hours. Must find everything.
Validator	Skeptic	Devil's advocate. Throws out the thin and the misleading.
Copywriter	Copywriter	Writes up the three best stories in exactly the style I want. Delivers to Telegram.

It all runs on OpenClaw, triggered by a cron job every morning. The underlying models are ordinary and cheap: OpenAI Codex at a fixed price, included in my $20/month Plus subscription. GPT-5.5 where it needs muscle, mini models where it doesn't.

StefOpenClawAI builds the subagents itself. What I learned: get good at explaining what you want, meaning the goal. Let the agents work out the how themselves. They are best at writing the system prompts for the subagents they need.
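The plumbing around the swarm (a cron trigger in, a Telegram message out) can be sketched in a few lines of Python. This is a minimal sketch, not the actual swarm code and not OpenClaw's API: the `Story` type, the function names and the script path in the cron line are hypothetical placeholders. The only real external interface used is the Telegram Bot API `sendMessage` method, and `30 7 * * *` is the standard cron expression for 7:30 every day.

```python
# Hypothetical crontab entry (the script path is a placeholder):
#   30 7 * * * /usr/bin/python3 /home/me/swarm/run_morning.py
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Story:
    """One finished news story (illustrative shape, not OpenClaw's)."""
    title: str
    summary: str
    source_url: str


def format_digest(stories: list[Story]) -> str:
    """Render the stories as one numbered plain-text message."""
    blocks = [
        f"{i}. {s.title}\n{s.summary}\n{s.source_url}"
        for i, s in enumerate(stories, start=1)
    ]
    return "\n\n".join(blocks)


def build_telegram_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Telegram Bot API sendMessage call."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deliver(token: str, chat_id: str, stories: list[Story]) -> int:
    """Send the digest and return the HTTP status code."""
    req = build_telegram_request(token, chat_id, format_digest(stories))
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

Keeping request building separate from sending means the message format can be checked without touching the network. Only `deliver` needs a real bot token and chat ID.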

What works is boring

An agent network that just does the same thing, properly, every single morning, is harder than it sounds. And that's exactly where the value sits.

1. Role separation

Each agent has one job. The researcher must find everything. The validator must doubt everything. The copywriter must write well. When I initially let one agent do it all, the output wasn't good enough. With clear boundaries, output improved dramatically.

2. Orchestrator as quality gate

StefOpenClawAI doesn't just delegate and assemble. It reviews. The final check catches the weak story and keeps the standard consistent from day to day.

3. A built-in skeptic

The agent people most often skip is the validator. It feels like wasted time — it doesn't produce anything itself. But it's precisely the one that decides whether you can trust the output or have to check it yourself every morning.

The most important quality signal: no surprises. Stability is harder than variation. And it's stability that creates value in production.
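The three points above can be sketched as separate functions with one job each. This is an illustrative sketch under loud assumptions: in the real swarm each role is an LLM agent exercising judgment, while here the validator and the final check are deterministic stand-ins. The `Candidate` fields, the two-source threshold and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    sources: int   # independent sources found (assumed signal)
    score: float   # researcher's importance estimate, 0..1 (assumed)


def validate(candidates: list[Candidate], min_sources: int = 2) -> list[Candidate]:
    """Skeptic: throw out anything that is thinly sourced."""
    return [c for c in candidates if c.sources >= min_sources]


def pick_top(candidates: list[Candidate], n: int = 3) -> list[Candidate]:
    """Keep the n highest-scoring candidates, best first."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:n]


def quality_gate(stories: list[Candidate], required: int = 3) -> tuple[bool, str]:
    """Orchestrator's final check: a complete, duplicate-free set or nothing."""
    if len(stories) != required:
        return False, f"expected {required} stories, got {len(stories)}"
    titles = [s.title for s in stories]
    if len(set(titles)) != len(titles):
        return False, "duplicate titles"
    return True, "ok"


def run(candidates: list[Candidate]) -> list[Candidate]:
    """Validate, select, then gate. Fail loudly instead of delivering a weak set."""
    stories = pick_top(validate(candidates))
    ok, reason = quality_gate(stories)
    if not ok:
        raise RuntimeError(f"quality gate failed: {reason}")
    return stories
```

The design choice worth noting is that the gate raises instead of delivering a partial result. A missing digest is noticed at once, while a weak one erodes trust quietly.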

What doesn't work yet

Five rough edges that don't get written about honestly enough:

The autonomous agent that handles it all. Lives in demos and keynotes, not in production. What works is the opposite: narrow, well-defined tasks with a clear start and end.
Security. I deliberately run my swarm on a separate gaming PC. An estimated 20% of plugins in OpenClaw's ClawHub library are malicious. McKinsey's own internal AI platform Lilli was hacked by an AI agent.
Governance at scale. Free agent frameworks have no real audit trails, no role management that holds when many people use the same system, no compliance framing worth mentioning. For solo on your own hardware: fine. For a large organization rolling out to a team: wait.
Agents without a human in the loop. My validator catches a lot. Not everything. Judgment is exactly what the machine is worst at.
The agent as a project. "We're building an agent" gets treated as a product decision with a steering committee and deadline. An agent is a way to get one specific task done. Treat it like a large software project and you've made it too heavy before it even starts.

The core of how I work: The agent does the work. The human owns the judgment. A quick check at the end is necessary. Constant monitoring is a sign that the task wasn't cut sharply enough.

Demo vs. production

Demo	Production
Impresses once in front of an audience.	Runs every morning for months without failing.
Measured on wow.	Measured on no surprises.
The API key works.	The API key suddenly expires on a Thursday evening.
Framework is freshly installed.	You have to upgrade because of security holes.
You clap and move on.	You spend an evening or a weekend troubleshooting.

What saved me was one simple thing. I set up Claude in a separate project as a technical sparring partner and fed it the documentation so it became an OpenClaw expert. Pairing the agent that builds with an AI that knows the framework cold was the real accelerator.
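One way to make failures like the expired API key in the table above surface before the morning run rather than after it is a small preflight step. The sketch below is an assumption about how that could look, not part of my setup or of OpenClaw. The environment variable names are hypothetical, and checking that a variable is set does not prove the key is still valid. A real check would make one cheap authenticated call per service.

```python
import os
from typing import Callable, Mapping

Check = Callable[[], None]  # a check raises an exception when it fails


def require_env(var: str, env: Mapping[str, str] = os.environ) -> Check:
    """Return a check that fails when an environment variable is missing or empty."""
    def check() -> None:
        if not env.get(var):
            raise RuntimeError(f"{var} is not set")
    return check


def preflight(checks: dict[str, Check]) -> list[str]:
    """Run every check and collect failure messages instead of stopping at the first."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
    return failures
```

If `preflight` returns anything, the run can stop and send the failure list to the same Telegram chat, so a broken key shows up as a clear message instead of a silent morning.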

Start with the task, not with an agent

Solo or small business: build now. A working agent network on your own separate machine costs almost nothing compared to another SaaS subscription. You learn more in a weekend than ten webinars give you. Keep it on separate hardware, be strict about permissions, don't connect it to anything you can't afford to expose.

Large organization: experiment, don't roll out yet. Leave the immature frameworks alone when it comes to team rollouts. But don't wait to learn. Find one process you repeat every week with clear input and clear output. That is your first agent.

Example: Royal Unibrew. Not an AI startup, but a brewing group with roots going back to 1856, running five AI agents in daily production. Each with a name, a face and a role. When the agents got names and faces, internal usage rose fourfold. That's behavioral design, not magic.

Capability gap

The bottleneck sits with the humans. Can your people cut a task so cleanly that an agent can solve it?

Clear input — which source, which document, which message, when.
Clear output — format, length, tone, where it lands.
Clear limits — what the agent must not do and where it should stop and ask.
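The three-part description above can be written down as a tiny data structure that refuses to pass as complete while a part is missing. This is a sketch only: the `TaskSpec` name and its methods are hypothetical, and the limits in the usage note are illustrative, not taken from my actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    name: str
    input: str    # which source, document or message, and when
    output: str   # format, length, tone, where it lands
    limits: list[str] = field(default_factory=list)  # must-not-do and stop-and-ask rules

    def missing(self) -> list[str]:
        """Return the names of the parts that are still empty."""
        gaps = [k for k in ("input", "output") if not getattr(self, k).strip()]
        if not self.limits:
            gaps.append("limits")
        return gaps

    def to_brief(self) -> str:
        """Render the spec as the three lines an agent (or a colleague) gets."""
        return "\n".join([
            f"Input: {self.input}",
            f"Output: {self.output}",
            f"Limits: {'; '.join(self.limits)}",
        ])


morning_news = TaskSpec(
    name="morning news",
    input="The most important AI news on the web from the last 24 hours",
    output="Three stories in my style, delivered to Telegram at 7:30",
    limits=[
        "Never include a story the validator rejected",   # illustrative limit
        "Stop and ask if fewer than three stories hold up",  # illustrative limit
    ],
)
```

Calling `morning_news.missing()` returns an empty list, while a spec with no limits reports `["limits"]`. If a task cannot be filled in this way, it is probably not yet cut cleanly enough for an agent.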

Next steps

1. Today: Pick one task you do every week. Write it down with input, output and limits in three lines.

2. This week: Build on your own separate hardware. Solo: grab OpenClaw, run it on an old PC. Large org: build the first experiment as proof of concept, not rollout.

3. In parallel: Find a technical sparring partner. Set up Claude as an OpenClaw expert in a separate project. Switch over to it every time you get stuck.

Next in the series: Claude Cowork. The opposite approach to OpenClaw — no server, no isolated environment, a non-technical colleague up and running in minutes.

Stefano Vincenti — GenAI strategist and architect. Co-founder of BotTellMe. External lecturer at ITU and DIS Copenhagen. Partner at TryZone. Subscribe to the newsletter and get the next three issues directly.