
Earned Autonomy: Trust-Gated FSMs for Bounded Agents

On building autonomous systems where capability is a consequence of demonstrated competence.

BlackAgent starts constrained and earns broader access through a trust score that rises with successful task completion and drops with failures. Combined with contract-governed execution and a deterministic state machine for safety decisions, this architecture produced agents that are measurably more persistent, more efficient, and more predictable.

// findings

The default architecture for autonomous AI agents is full access at boot. The agent receives a system prompt, a set of tools, and a mandate to accomplish tasks. Safety is handled by instructing the model to be careful, which is equivalent to asking a probabilistic system to be deterministic. It works most of the time. When it does not, failures are unbounded.

BlackAgent's architecture is built on the premise that capability should be earned. The system starts with a trust score of zero and a limited set of tools. Completing tasks successfully raises the score. Failures lower it. Critical mistakes drop it sharply. The score maps to four tiers (untrusted, limited, trusted, and elevated), each unlocking progressively more capable tools. At the lowest tier, the agent can read files and search the web. At the highest, it can modify its own source code and manage system services.

This is enforced by host-side code. The trust score is maintained outside the language model's context, checked before every tool invocation, and persisted across restarts. The model has to demonstrate competence through measurable outcomes.
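The host-side gate can be sketched roughly as follows. The article gives the tier names but not the numeric thresholds, score deltas, or per-tier tool lists, so all of those values are illustrative assumptions, not BlackAgent's actual configuration:

```python
from enum import Enum

class Tier(Enum):
    UNTRUSTED = 0
    LIMITED = 1
    TRUSTED = 2
    ELEVATED = 3

# Assumed cutoffs on a 0-100 scale, checked highest-first.
THRESHOLDS = [(75, Tier.ELEVATED), (50, Tier.TRUSTED),
              (25, Tier.LIMITED), (0, Tier.UNTRUSTED)]

# Assumed per-tier unlocks; each tier also keeps everything below it.
TOOLS_BY_TIER = {
    Tier.UNTRUSTED: {"read_file", "web_search"},
    Tier.LIMITED: {"write_file"},
    Tier.TRUSTED: {"run_shell"},
    Tier.ELEVATED: {"edit_own_source", "manage_services"},
}

class TrustLedger:
    """Host-side trust score: lives outside the model's context."""

    def __init__(self, score: int = 0):
        self.score = score

    def record(self, outcome: str) -> None:
        # Assumed deltas: small reward, larger penalty, sharp critical drop.
        deltas = {"success": 5, "failure": -10, "critical": -40}
        self.score = max(0, min(100, self.score + deltas[outcome]))

    def tier(self) -> Tier:
        return next(t for floor, t in THRESHOLDS if self.score >= floor)

    def allowed_tools(self) -> set:
        return set().union(
            *(TOOLS_BY_TIER[t] for t in Tier if t.value <= self.tier().value))

    def check(self, tool: str) -> bool:
        # The host calls this before every tool invocation.
        return tool in self.allowed_tools()
```

Because the check runs in the runtime rather than the prompt, the model cannot talk its way into a tool it has not earned; persistence across restarts would be a matter of serializing the single integer score.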

The second structural constraint is the Iron Kernel: a deterministic finite state machine that handles every safety-critical decision. When to stop, how much budget to spend, what requires human approval: these are state transitions governed by explicit rules. The model handles reasoning, planning, and conversation within the boundaries the FSM defines. This separation means safety guarantees are independent of the model's judgment, which is the only kind of guarantee that survives adversarial inputs and edge cases.
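A minimal FSM in this spirit might look like the sketch below. The state names, transition rules, and limits are assumptions for illustration; the article names the Iron Kernel but does not describe its internal states:

```python
# Assumed states; the real kernel's state set is not described in the article.
RUNNING, AWAITING_APPROVAL, HALTED = "running", "awaiting_approval", "halted"

class IronKernel:
    """Deterministic safety FSM: transitions depend only on explicit
    rules and counters, never on model output."""

    def __init__(self, turn_limit: int, budget_cents: int):
        self.state = RUNNING
        self.turns = 0
        self.spent = 0
        self.turn_limit = turn_limit
        self.budget_cents = budget_cents

    def on_turn(self, cost_cents: int, needs_approval: bool) -> str:
        if self.state != RUNNING:
            return self.state
        self.turns += 1
        self.spent += cost_cents
        if self.turns >= self.turn_limit or self.spent >= self.budget_cents:
            self.state = HALTED  # hard stop; the model cannot override it
        elif needs_approval:
            self.state = AWAITING_APPROVAL
        return self.state

    def approve(self) -> str:
        # Only a human approval transitions the kernel back to RUNNING.
        if self.state == AWAITING_APPROVAL:
            self.state = RUNNING
        return self.state
```

The point of the design is visible in the code: every transition is a comparison against a counter or an explicit flag, so the safety envelope holds regardless of what the model generates.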

The third constraint is contract-governed execution. Before non-trivial work begins, the system generates a task contract that defines scope, allowed tools, turn limits, timebox, budget ceiling, escalation rules, and approval behavior. These boundaries are enforced by the runtime. When a task exceeds its contract, it stops. The agent can request a contract extension, but the default is bounded failure.
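A contract like this could be sketched as a plain data structure plus a runtime check. The field names follow the article's list (scope, allowed tools, turn limit, timebox, budget ceiling); the concrete types and the enforcement shape are assumptions:

```python
import time
from dataclasses import dataclass, field

@dataclass
class TaskContract:
    scope: str
    allowed_tools: set
    turn_limit: int
    timebox_s: float
    budget_ceiling_cents: int
    started_at: float = field(default_factory=time.monotonic)

class ContractBreach(Exception):
    """Raised by the runtime when a task exceeds its contract."""

def enforce(contract: TaskContract, *, turn: int,
            spent_cents: int, tool: str) -> None:
    # Runtime-side check before each tool call; a breach stops the task
    # with a precise, diagnosable reason rather than silent drift.
    if tool not in contract.allowed_tools:
        raise ContractBreach(f"tool {tool!r} outside contract scope")
    if turn > contract.turn_limit:
        raise ContractBreach("turn limit exceeded")
    if spent_cents > contract.budget_ceiling_cents:
        raise ContractBreach("budget ceiling exceeded")
    if time.monotonic() - contract.started_at > contract.timebox_s:
        raise ContractBreach("timebox exceeded")
```

The exception message is exactly the "precise reason for escalation" the article describes: the runtime can surface it to a human, or attach it to a contract-extension request, instead of letting the task thrash.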

What emerged from these constraints surprised us. The trust system created a feedback loop: because completing tasks raises trust, and higher trust unlocks better tools, the agent developed a measurable bias toward finishing what it starts. Abandoning a task costs trust. Completing it earns trust. Over time, the agent became noticeably more persistent when it hit obstacles; the incentive structure rewarded follow-through.

The efficiency gains compounded across sessions. An agent that has seen a type of task before remembers how it solved it through its memory system. An agent that is penalized for wasting turns learns to plan before acting. An agent that operates under a strict context budget learns to keep its reasoning focused. None of these efficiencies were programmed as rules. They emerged from the constraints.

Contract enforcement changed how failures look in production. Before contracts, a hard task could spiral into unbounded turns, tool calls, or context drift. With explicit limits, failures became bounded and diagnosable. The agent either finished inside the contract or surfaced a precise reason for escalation. The result was less hidden thrashing and more predictable autonomy.

This pattern generalizes beyond agent architectures. SudoPrompt applies the same principle to prompt generation: it forces a clarification loop that narrows the input space before generation begins. The same model, given the same task, produces dramatically different results depending on whether it received a vague sentence or a constrained specification. In both systems, the strongest gains came from narrowing decision space.
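The clarification-loop idea reduces to a simple invariant: generation is blocked until the specification is complete. The sketch below is a structural illustration only; the field names and the `ask` callback are hypothetical stand-ins, not SudoPrompt's actual interface:

```python
# Hypothetical required fields; a real system would derive these per task.
REQUIRED = ("goal", "audience", "format", "constraints")

def clarify(spec: dict, ask) -> dict:
    """Loop until every required field is filled; `ask` stands in for
    the dialogue that elicits an answer from the user."""
    for key in REQUIRED:
        while not spec.get(key):
            spec[key] = ask(key)
    return spec  # only now is the spec handed to generation
```

The narrowing happens before the model runs: by the time generation starts, the input space has collapsed from "a vague sentence" to a filled-in specification.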

The broader observation is that constraints are the mechanism through which capability becomes reliable. A trust-gated agent can only do what it has earned the right to do, which means its failures are proportional to its demonstrated competence. The ceiling stays the same. The floor rises.

autonomous-agents · finite-state-machines · trust-systems · constraints · blackagent · sudoprompt