DEV Community

Noel Himer
Noel Himer

Posted on

I run my one-person company as a ~20-agent AI org. Here's what broke.

I'm a solo operator — no employees, no co-founder. But I don't really work alone anymore. I work as the human in the loop of a small org made of about twenty specialized AI agents, each with a job, a name, and a list of things it is explicitly not allowed to do.

This isn't a "10x your output with AI" post. The most valuable thing the setup taught me is that I'd spent weeks building the wrong things — I'll get to that. But first the shape, because people keep asking what "running your company with agents" actually means once you get past the screenshots.

The shape

A chief-of-staff agent runs every morning and writes me a brief: what shipped in the last 24 hours, what's gone stale, what's about to cross a limit I set. A finance agent knows my tax situation. A security officer scans for leaked secrets before anything gets committed. A researcher validates ideas — and is deliberately rate-limited, so it can't start a new investigation until the last one produced something real. A shipper scaffolds and deploys. There's a content lead, a publisher, an archivist, and a dozen more.

They share three things: one task backlog that is the single source of truth for what's being worked on, a written knowledge vault so the next agent — or the next session — inherits full context instead of starting blind, and a set of real tools behind one endpoint, so an agent can actually do things rather than just describe them.

Over the top sits a console I built called URSA. It's the cockpit — live status, and a voice. Literally a voice: it speaks to me in two languages, and it interrupts me, unprompted, when a number I told it to watch goes the wrong way.

Two rules hold the whole thing together. Anything that spends money, publishes to the world, or touches an account stops and asks me first. And every product gets a kill threshold the day it ships — a pre-committed number and date at which I walk away — so a thing that isn't working can't quietly absorb my attention forever.

What broke

Three things, in order of how much they taught me.

One: an agent hung for eight hours. I asked a build agent to test a service. Instead of starting it in the background and probing it, it ran the server in the foreground — then sat there, blocked, holding the process open for 472 minutes, spawning orphans the entire time. The lesson is now a rule: an agent must never foreground-run a daemon, and the orchestrator has to watchdog its workers on a timer. Agents fail in ways a human just wouldn't.

Two: a config file got silently overwritten. For a stretch, builds were mysteriously breaking. The cause: an agent had written test-fixture content directly over a real Dockerfile, and nothing caught it because that directory wasn't even tracked. The fix was a guard that hashes the file and fails the commit if it changed unexpectedly. The deeper lesson: agents need guardrails that fail loudly, at the moment of the mistake — not a code review three steps downstream.

Three — the expensive one: I built seven products before a single person asked for one. This is the one I'm still sitting with. My fleet is good at building. In a few weeks it shipped seven real tools — tested, deployed, the works. And I use them. But I built every one because the market looked like it was there, never because a real person pulled it out of me. Meanwhile the only things that feel genuinely valuable are the internal tools — the console, the org itself — which exist because I actually needed them.

The pattern took me embarrassingly long to see: everything I built from a push ("this looks like a market") came out hollow. Everything I built from a pull (a need I already had) came out alive. So I made the org enforce it. There's now a gate every new build has to clear before it gets any of my time, and its first question is the one I'd been skipping — who pulled this? One of my agents is read-only and exists for essentially that purpose: to tell me no.

If you want to try this yourself

You don't need twenty agents. Almost all of the value came from four prompt patterns, and they fit inside a single system prompt — paste them into whatever AI you already use.

1. Give it a role and a "never" list. The negative space does more work than the instructions. What makes an agent safe isn't what you tell it to do — it's what you forbid.

You are my [role]. Before any action, check it against these hard limits: never spend money, publish anything public, or touch an account without stopping to ask me first. If a task would cross one of those lines, stop and surface it instead of proceeding.

2. Make it argue back before it builds. My single most useful agent is read-only and exists to say no. You can bake the same reflex into one prompt:

Before you build, fix, or write anything, ask one question first: who actually asked for this? If the honest answer is "no one, it just seems like a good idea," say so and push back before doing the work. Default to "let's validate first," never "let's build."

3. Give it a memory it reads before it starts. Keep one file of state — what you're working on, what's decided, what you've already tried — and begin every session by pointing the AI at it. Context that lives in a file survives; context that lives in a chat window dies when you close it.

At the start of every session, read STATE.md before doing anything. Whenever something is decided or changes, write it back. Never make me repeat myself.

4. Pre-commit the kill line. The discipline that's saved me the most: decide the failure condition before you start, not after you're attached.

For anything we start, write down now the condition under which we stop — a number and a date. When we hit it, remind me and make me defend continuing.

That's the spine. Twenty specialized agents with real tools is just this, scaled out — but the four patterns are the part that actually changed how I work.

What I still don't know

Whether any of this is a business, or just an elaborate and deeply satisfying way to run a one-person shop. Right now I have a fleet of tools I use every day and a customer count I'm still working on. That's the honest status. I'm betting the discipline I just bolted on is what turns the second number — but I don't know yet, and I'd rather say that than pretend.

If the wiring behind any of this is interesting to you — how the agents coordinate, how the console got a voice, how the kill thresholds actually fire — that's what I write up, one build at a time, in Unbearable TechTips Weekly. The architecture's there, the war-stories are there, and I'm figuring out the rest in public.

— Noel, Unbearable Labs

Top comments (3)

Collapse
 
txdesk profile image
TxDesk

The "who pulled this" gate going forward is the easy part. The harder version of the same question is what to do with the existing build pile that was already shipped from push, not pull. You said the seven products are deployed and you use them. But "I use them" is a sample-size-of-one validation. The kill-threshold rule retroactively applied to the existing pile is interesting: take each of the seven, write down today the number and the date you'd walk away, and treat anything currently below the line as already-decided. Forward gate stops new push-builds. Retroactive threshold extracts you from the ones already living rent-free.

Collapse
 
unbearablelabs profile image
Noel Himer

Fair hit, so I ran it instead of nodding along. Pulled the numbers today: all seven sit at exactly 1 total user — me. And even that flatters them: my real usage goes through a local multiplexer that never touches the cloud deployments, so the last cloud runs are my own smoke tests from nine days ago. Roughly a week and a half listed on Smithery and the official MCP Registry has produced zero external runs.

The lines are now written, with one amendment to your framing. For free tools the rent was never hosting — they idle at $0 — it's attention. And in an agent org, attention is encoded: my agents kept patching zero-user products because their charters enumerate everything marked "live." So the walk-away isn't deletion, it's an edit to the registry the agents read:

Fewer than 2 external users by Jul 15 → frozen: zero maintenance hours, and it drops out of the "I run seven products" story.
Still zero by Aug 15 → dormant: a status flag makes every agent skip it; a monthly canary is the only thing still watching.
De-listing is event-triggered only: a dormant tool that breaks gets unlisted, not fixed. A broken tool with my name on it is worse than no tool.
The listings stay up, because a listing is the only passive demand sensor a free tool has — removing it on a calendar guarantees the zero it was supposed to measure.

And you were right about "already-decided": one of the seven has no exposure event on any roadmap, by your rule it skips the freeze stage entirely. Writing the kill list down took an hour. You were right that not writing it down was the expensive part.

Collapse
 
txdesk profile image
TxDesk

The event-triggered de-listing is the smart part. Most "kill list" systems break because the kill action is itself a decision the operator has to make on a calendar, and decisions on a calendar always lose to whatever the operator is in the middle of when the calendar fires. Event-triggered means the system kills itself when it stops working, which converts a hard choice ("am I really done with this") into a default that requires effort to reverse.

The implicit second-order effect is that your monthly canary becomes the demand sensor too, not just the health sensor. If the canary starts seeing actual external traffic after dormancy, the dormant-flag gets contested by data instead of by intention. That's a different kind of signal than the listing itself: listing measures discoverability, canary measures pass-through. A dormant tool that suddenly gets canary traffic is interesting. A dormant tool whose listing gets a click is just SEO.

The "attention is the rent" reframe is the part of this I didn't have. Hosting costs are the visible expense, but the agent-charter coupling you described is the actual