AI Agent Governance: My 6-Month Field Notes

By Jim Liu · 9 min read

How indie developers govern AI agents in production — cost controls, tool permissions, output verification. Six months, 9 sites, real failures included.

TL;DR

  • AI agent governance means the rules, limits, and review layers you put on agents running in production — not just what you tell them, but what you let them do
  • I've run 8 autonomous agents across 9 websites for about 6 months; governance failures cost me two social accounts and roughly $200 in wasted API spend
  • The four things that actually matter: tool boundaries, rate limits, output verification, and rollback procedures
  • For indie developers, a 15-line config file and one human review checkpoint beats any enterprise compliance framework

Who I Am and Why I'm Writing This

I'm Jim Liu, a Sydney-based indie developer. I build and run 9 AI-powered websites — an AI tools hub, a Hong Kong finance site, a crypto airdrop tracker, a few gaming properties, and others. Most of them are partly automated: blog posts drafted by agents, backlinks submitted via browser scripts, SEO data collected by scheduled Python jobs.

I've been running AI agents in production since mid-2025. Not research, not demos — actual agents that take external actions, touch live databases, submit forms, and post content to the internet.

My stack: Claude Code agents via the Anthropic API, DrissionPage-based browser automation connected to Chrome's debugging port, a Playwright-based harness for multi-tab browser tasks, and custom Python orchestrators gluing it all together. Everything runs on VPS instances and my Windows dev box.

AI agent governance, for me, isn't about policy documents or audit trails. It's about not losing accounts, not deploying slop, and not watching $200 disappear into a retry loop overnight.

The Five Governance Failures That Taught Me Everything

I'll be specific, because vague lessons don't stick.

The Quora permanent ban. My forum posting agent had no rate limit on links. It was given a list of questions and told to answer each one with a relevant link to my site: 45 answers with links in 14 days. Quora's spam classifier apparently treats that as a hard threshold. Account gone, no appeal. The fix: explicit link-ratio tracking (≤30%), a hard cap of one answer per day, and a 14-day link-free warmup period for any new account. Those rules are written into the agent's config now; it took losing the account to write them down.
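A minimal sketch of that guard in Python. The thresholds are the ones above, but every name here (RULES, may_post_link, the counter arguments) is my illustration, not any platform's API:

```python
# Hypothetical per-account posting guard. The thresholds mirror the
# rules above; the field names are illustrative.
from datetime import datetime, timedelta

RULES = {
    "link_ratio_max": 0.30,   # <=30% of answers may contain a link
    "answers_per_day": 1,     # hard daily cap
    "warmup_days": 14,        # no links during a new account's warmup
}

def may_post_link(account_created: datetime, total_answers: int,
                  answers_with_links: int, answers_today: int) -> bool:
    """Return True only if a linked answer stays inside every rule."""
    if datetime.now() - account_created < timedelta(days=RULES["warmup_days"]):
        return False                        # still in link-free warmup
    if answers_today >= RULES["answers_per_day"]:
        return False                        # daily cap already reached
    projected = (answers_with_links + 1) / (total_answers + 1)
    return projected <= RULES["link_ratio_max"]
```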

The port 9222 conflict. DrissionPage and Playwright both talk to Chrome's remote debugging port. When both ran at the same time — one submitting a form, one checking a different site — they interfered with each other: commands landed in the wrong tabs, and submissions were recorded as successful when they'd actually fired at the wrong URL. I didn't notice for a while because the logs looked fine. Fix: a hard rule enforced in the project config. One browser automation task at a time, always. I wrote "NEVER run two browser tasks simultaneously" directly into the project's CLAUDE.md so the rule would survive across sessions.
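The enforcement doesn't have to be fancy. A portable lockfile around the debugging port is enough — a sketch, assuming every automation script wraps its browser work in this; the lock path and timeout are arbitrary choices of mine:

```python
# One-browser-at-a-time guard: a lockfile that any script must hold
# before touching port 9222. Works on both Linux VPS and Windows.
import os
import tempfile
import time
from contextlib import contextmanager

LOCKFILE = os.path.join(tempfile.gettempdir(), "chrome-9222.lock")

@contextmanager
def browser_lock(timeout: float = 300.0):
    """Block until no other script owns the port, then hold it."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation atomic: it fails if the file exists.
            fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("another browser task holds port 9222")
            time.sleep(2)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(LOCKFILE)

# Usage:
# with browser_lock():
#     ...run the single DrissionPage or Playwright task...
```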

The $180 API loop. A Claude orchestration agent hit an unexpected error type from a downstream tool and entered a retry loop. It kept calling the same tool, getting the same error, burning roughly $0.30 per cycle. I caught it after 600 calls, about 6 hours later. Fix: a max_iterations parameter on every agent loop, exponential backoff after the first failure, and a circuit breaker that stops execution after 3 consecutive identical errors. The circuit breaker is now non-negotiable for any agent that calls an external API.
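Here's roughly what that guard looks like as a loop wrapper. The function and parameter names are mine, not from any framework; the defaults mirror the rules above:

```python
# Loop guard: hard iteration cap, exponential backoff, and a circuit
# breaker that trips on consecutive identical errors.
import time

def run_with_breaker(call_tool, max_iterations=25, max_identical_errors=3):
    last_error, identical = None, 0
    for attempt in range(max_iterations):
        try:
            return call_tool()
        except Exception as exc:
            sig = f"{type(exc).__name__}: {exc}"
            identical = identical + 1 if sig == last_error else 1
            last_error = sig
            if identical >= max_identical_errors:
                # Same error three times in a row: stop, don't retry.
                raise RuntimeError(f"circuit breaker: {identical}x '{sig}'")
            time.sleep(min(2 ** attempt, 60))  # backoff, capped at 60s
    raise RuntimeError(f"gave up after {max_iterations} iterations")
```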

The slop deployment. Early on, one of my blog drafting agents had write access to the production database. It published 13 articles before I noticed — technically coherent, structurally identical, zero first-person experience in any of them. Google's March 2026 core update later confirmed this was exactly the category of content it targeted. Fix: mandatory human review between "draft complete" and "INSERT to DB." The agent writes a file to disk. I open it, read it, and approve it. Takes me 10–15 minutes per article. The automation step saves me 2–3 hours of research and first-draft writing. The review step ensures the output is actually publishable.
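A minimal version of that gate: the agent's write path ends at the filesystem, and the database insert only ever runs by hand, after review. The directory names and the insert_post callable are illustrative, not my real schema:

```python
# Draft -> human review -> publish. The agent writes only to drafts/;
# I move approved files to approved/ and run publish_approved() myself.
import pathlib

DRAFTS, APPROVED = pathlib.Path("drafts"), pathlib.Path("approved")

def agent_save_draft(slug: str, markdown: str) -> pathlib.Path:
    """The agent's last step: write the draft to disk and stop."""
    DRAFTS.mkdir(exist_ok=True)
    path = DRAFTS / f"{slug}.md"
    path.write_text(markdown, encoding="utf-8")
    return path

def publish_approved(insert_post):
    """Run manually after review; insert_post is my DB insert function."""
    for path in sorted(APPROVED.glob("*.md")):
        insert_post(slug=path.stem, body=path.read_text(encoding="utf-8"))
        path.rename(path.with_name(path.name + ".published"))
```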

The Reddit shadowban. A posting script was too regular: the same delay between posts, the same approximate post length, the same link-to-text ratio. It got shadowbanned in two subreddits within a week. Fix: randomized delay (7 minutes, plus or minus 3), length variance in the post content, and a maximum of 2 posts per session before the script exits.
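The fix compresses to a few lines. A sketch with my numbers baked in; submit stands for whatever posting function the script actually uses:

```python
# Anti-regularity knobs: jittered delay and a hard session cap.
import random
import time

POSTS_PER_SESSION = 2

def post_session(posts, submit):
    for post in posts[:POSTS_PER_SESSION]:   # never more than 2 per run
        submit(post)
        time.sleep(random.uniform(4 * 60, 10 * 60))  # 7 min +/- 3
```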

Five incidents, five rules now in my config. None of them came from reading governance frameworks; all of them came from the failures themselves.

My Four-Pillar AI Agent Governance Framework

This is what I actually run. It's not theoretical.

Here's what each pillar actually protects against:

Pillar | Without it | With it
--- | --- | ---
Tool boundaries | Agent takes unintended shortcuts through your system | Constrained to the paths you designed
Rate limits | Runaway loops, banned accounts, unexpected API spend | Predictable resource consumption per session
Output verification | Slop in production, submissions to wrong targets | Review layer catches issues before they go public
Rollback procedures | Hours of manual cleanup after something goes wrong | 5-minute fix via pre-built reversal path

Pillar 1: Tool boundaries

Every agent in my stack has an explicit list of what it's allowed to touch. Browser agents can fill forms and click submit buttons — they don't have access to my configuration files or non-target APIs. Writing agents can insert to the blog database — they can't modify existing posts, run git commands, or call external services. These constraints live in code, not just in the system prompt.

The reason it matters: a well-intentioned agent will take the most direct path to its goal. If you don't explicitly fence off paths you don't want it to take, it will eventually take one of them.
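One way to keep those constraints in code is a dispatcher that only resolves tools on an agent's allowlist — a sketch; the agent and tool names here are placeholders, not my real registry:

```python
# Tool boundary enforced at dispatch time, not in the system prompt.
# An agent physically cannot call a tool that isn't on its allowlist.
ALLOWED_TOOLS = {
    "writer":  {"insert_post", "read_style_guide"},
    "browser": {"fill_form", "click_submit"},
}

def dispatch(agent: str, tool: str, registry: dict, **kwargs):
    if tool not in ALLOWED_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool}")
    return registry[tool](**kwargs)
```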

Pillar 2: Rate limits on every external action

Every category of external action has a count ceiling. Forum posts: 1 per day. Backlink submissions: 10 platforms per session. IndexNow: 20 URLs per batch. API calls: max_iterations on every loop. These specific numbers came from failures and from platform documentation, not from guessing.

The broader principle: any action that touches something outside my own systems gets a rate limit. Not because I expect the agent to abuse it, but because I expect to make configuration mistakes that could cause it to.
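A sketch of what a per-category ceiling looks like in practice. The numbers are the ones quoted above; note that this counter is per-process, so a per-day cap like the forum one also needs the persistent action log described later:

```python
# Per-category ceilings on external actions, checked before every call.
from collections import Counter

CEILINGS = {"forum_post": 1, "backlink_submit": 10, "indexnow_url": 20}
_counts = Counter()

def check_ceiling(category: str):
    """Raise before the action fires if its session ceiling is hit."""
    _counts[category] += 1
    if _counts[category] > CEILINGS[category]:
        raise RuntimeError(f"rate ceiling hit for {category}")
```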

Pillar 3: Output verification before irreversible actions

My current pattern: agent produces output → verification step → human or automated review → action taken. For blog posts: the draft exists as a markdown file before anything touches the database. For form submissions: a dry-run mode shows what would be submitted without actually submitting. For API calls: inputs and outputs are logged before any external call fires.

This pillar catches "the agent is doing the right thing for the wrong reason" — which is harder to detect than outright errors, and usually more expensive.
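Dry-run mode is the simplest of the three to retrofit: same code path, but the irreversible call is swapped for a log line. A sketch — dry_run would come from the agent's config file, and submit stands for the real submission call:

```python
# Dry-run gating: identical flow, but nothing leaves the machine
# unless dry_run is explicitly switched off in the config.
def submit_form(submit, payload: dict, url: str, dry_run: bool = True):
    print(f"[{'DRY-RUN' if dry_run else 'LIVE'}] POST {url} {payload}")
    if dry_run:
        return None          # inspect the log; nothing was sent
    return submit(url, payload)
```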

Pillar 4: Rollback procedures for everything

When something goes wrong, how fast can I undo it?

For blog posts: a single SQL UPDATE sets published=false. For backlink submissions: I keep an SSOT (single source of truth) of every submitted domain with status, so I know exactly what was sent and to whom. For code changes: conventional commits on every agent-triggered change, so I can git-revert any bad change within seconds.

The test I use: pick any agent action from the last 7 days. Can I fully reverse it in under 5 minutes? If yes, I have adequate rollback for that category. If no, I need either a better logging system or a more conservative approach to that action type.
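Two of those reversal paths fit in a few lines each — a sketch assuming a sqlite3-style DB handle and a git repo; the table and column names come from my description above, not a standard schema:

```python
# Pre-built reversal paths: unpublish a post, revert an agent commit.
import subprocess

def unpublish_post(db, post_id: int):
    """The single-UPDATE rollback for a bad article."""
    db.execute("UPDATE posts SET published = 0 WHERE id = ?", (post_id,))
    db.commit()

def revert_commit(sha: str):
    """Undo one agent-triggered change; conventional commits make
    finding the right sha a one-line git log search."""
    subprocess.run(["git", "revert", "--no-edit", sha], check=True)
```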

Tools That Actually Help

For the kind of AI agent governance I'm doing — indie developer, 9 sites, no dedicated ops team — here's what's made the most difference:

A flat config file per agent (JSON or YAML): max_calls, rate_limits, allowed_tools, dry_run_mode. Read at runtime, so changing behavior means editing a file, not redeploying code — which matters when something goes wrong at 2am.
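Something like this, hypothetically — the keys mirror the list above, the filename is made up, and the defaults mean a missing key never crashes an agent at runtime:

```python
# Load one agent's flat config at startup; file values override defaults.
import json
import pathlib

DEFAULTS = {
    "max_calls": 25,
    "rate_limits": {"forum_post": 1, "backlink_submit": 10},
    "allowed_tools": ["fill_form", "click_submit"],
    "dry_run_mode": True,
}

def load_agent_config(path: str = "agents/backlink-agent.json") -> dict:
    cfg = DEFAULTS | json.loads(pathlib.Path(path).read_text())
    return cfg   # edit the file and rerun: no redeploy needed
```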

An append-only action log (action-log.jsonl): every agent action with timestamp, target, type, and outcome. Feeds into a keyword dedup check so agents don't repeat work that's already been done within an evaluation window. Feeds into the circuit breaker so the retry logic has historical context.
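Appending a record is one function. The field set matches the description above; one JSON object per line keeps the file grep- and jq-friendly:

```python
# Append-only action log: one JSON object per line, never rewritten.
import json
import time

def log_action(target: str, action_type: str, outcome: str,
               path: str = "action-log.jsonl"):
    record = {"ts": time.time(), "target": target,
              "type": action_type, "outcome": outcome}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```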

A human checkpoint before irreversible external actions: I have one in every pipeline that touches the public internet. Not for every step — just for the final publish, submit, or post. It takes 30 seconds. It catches the problems I didn't think to test for.

A dead-man's switch on every API loop: max_iterations, hard-capped, logged to stderr if it triggers, execution stops and I get notified. No silent infinite loops.

I don't use LangSmith, enterprise tracing platforms, or formal audit infrastructure. They're the right tools at a different scale. A flat JSONL file and a grep command get me what I need.

FAQ

What's the difference between AI agent governance and prompt engineering?

Prompt engineering shapes what the agent says. Governance shapes what the agent is allowed to do and how much of it. You can have a perfectly written system prompt and still end up with a runaway API loop if you haven't set a max_iterations cap. They operate at different layers of the stack.

Do indie developers actually need a governance framework?

If you're running agents that only affect local files you control, probably not. If you're running agents that take external actions — posting, publishing, submitting, sending — then yes, even a minimal one. Specifically: explicit rate limits on external actions, and at least one human review checkpoint before anything irreversible fires.

How do I know if my current agents have adequate governance?

Pick your most automated agent and ask: if it ran unattended for 24 hours starting right now, what's the worst realistic outcome? If the answer involves lost accounts, duplicate content deployed publicly, or unexpected spend above ~$20, that gap is your governance roadmap.

Written by Jim Liu

Full-stack developer in Sydney. Hands-on AI tool reviews since 2022.
