BlackAgent
an autonomous agent that earns its own autonomy
BlackAgent starts with limited capabilities and earns broader access by demonstrating competence, the same way a new team member might. Built around a strict control loop that separates what must be predictable from what can be creative, the system runs a 70+ tool runtime that can exceed 100 tools with optional integrations, maintains layered memory that fades like human recall, and detects intent from natural conversation without being explicitly told what to do. It currently runs on a Raspberry Pi 5 in Casablanca.
The core design principle is called the Iron Kernel: a deterministic state machine handles every safety-critical decision (when to stop, how much budget to spend, what requires approval) while the language model handles reasoning, planning, and conversation within those boundaries. The system is organized like a body. The brain connects to language models. The heart is the execution loop that cycles through six states: idle, observing, thinking, acting, error, sleeping, with verification enforced as a guarded phase around actions. The soul manages memory and context. The hands are a 70+ tool runtime, expandable past 100 with optional integrations. The immune system is a contract layer that defines scope, tool limits, approval policy, timebox, and budget per task before execution begins. The nervous system monitors long-running processes and restarts them when they crash. The subconscious detects unspoken intent and quietly turns it into action. The metabolism tracks token burn rate and cost in real time so autonomy remains measurable. Each part is independent, replaceable, and has a clear boundary.
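The deterministic half of this split can be sketched as a small host-side state machine. The state names below mirror the six states listed above, but the transition table itself is an assumption for illustration, not BlackAgent's actual code:

```typescript
// Sketch of an Iron Kernel-style loop: the host owns every transition;
// the model only supplies content during the "thinking" phase.
type State = "idle" | "observing" | "thinking" | "acting" | "error" | "sleeping";

// Hypothetical legal transitions; verification guards the exit from "acting".
const transitions: Record<State, State[]> = {
  idle: ["observing", "sleeping"],
  observing: ["thinking", "idle"],
  thinking: ["acting", "idle", "error"],
  acting: ["observing", "error"],
  error: ["idle"],
  sleeping: ["idle"],
};

function transition(from: State, to: State): State {
  if (!transitions[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```

Because the table is plain data enforced in host code, the model cannot talk its way from thinking straight to acting without passing the guarded phase.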
Trust is a score from 0 to 100, organized into four tiers: untrusted, limited, trusted, and elevated. New agents start with limited access. Completing tasks successfully raises the score. Failures lower it. Critical mistakes drop it sharply. The score determines what the agent is allowed to do, from reading files at the lowest level to modifying its own source code at the highest. Trust persists across restarts and can be bounded with floors and ceilings per agent. It also feeds approval sessions, so low-risk actions can be batched into short windows instead of interrupting the user on every single step.
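In code, the tiering and bounded updates might look like the sketch below. The thresholds and per-event deltas are assumptions: the source only fixes the 0 to 100 range, the four tier names, and the floor/ceiling behavior.

```typescript
// Illustrative trust scoring; threshold and delta values are assumptions.
type Tier = "untrusted" | "limited" | "trusted" | "elevated";

function tier(score: number): Tier {
  if (score < 25) return "untrusted";
  if (score < 50) return "limited";
  if (score < 75) return "trusted";
  return "elevated";
}

function updateScore(
  score: number,
  event: "success" | "failure" | "critical",
  floor = 0,     // per-agent lower bound
  ceiling = 100, // per-agent upper bound
): number {
  const delta = { success: +2, failure: -3, critical: -20 }[event];
  return Math.min(ceiling, Math.max(floor, score + delta));
}
```

The floor and ceiling arguments correspond to the per-agent bounds described above: an agent can be pinned below "elevated" no matter how many tasks it completes.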
Execution is contract-governed. Before non-trivial work starts, the system can generate a task contract that defines scope, allowed tools, turn limits, timebox, budget ceiling, escalation rules, and approval behavior. These boundaries are enforced by host-side code.
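A contract of this shape can be a plain data object that host-side code checks on every turn. The field names below follow the list in the text, but the TypeScript shape and the check function are illustrative, not BlackAgent's actual schema:

```typescript
// Hypothetical task contract shape; enforced by host code, not the model.
interface TaskContract {
  scope: string[];        // directories or domains the task may touch
  allowedTools: string[]; // tools unlocked for this task only
  maxTurns: number;
  timeboxMs: number;
  budgetUsd: number;
  approval: "auto" | "session" | "per-action";
  onLimit: "escalate" | "abort";
}

// Checked by the host before each turn; the model never sees this logic.
function withinContract(
  c: TaskContract,
  turns: number,
  spentUsd: number,
  elapsedMs: number,
): boolean {
  return turns <= c.maxTurns && spentUsd <= c.budgetUsd && elapsedMs <= c.timeboxMs;
}
```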
The agent has a subconscious. When someone mentions something in conversation that sounds like it could become a task, the system does not act on it immediately. Instead, it stores the observation as a memory with a confidence score. If the same topic comes up again in later conversations, the confidence strengthens. After enough reinforcement, the observation is promoted into a real scheduled task with a memory linkage to where that intent came from. If the user never mentions it again, the memory decays naturally over about seven days and the task is quietly cancelled. The system notifies the user when this happens: it had planned to do something, but it no longer seems relevant.
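One way to model this reinforce-then-decay behavior is exponential decay with reinforcement toward certainty. The constants below are assumptions for illustration; the source only fixes the roughly seven-day decay window and the promote/cancel endpoints.

```typescript
// Illustrative intent confidence model; half-life and thresholds are assumptions.
const HALF_LIFE_MS = 3.5 * 24 * 3600 * 1000; // confidence fades to noise in ~1-2 weeks
const PROMOTE_AT = 0.8; // promoted into a scheduled task above this
const DROP_AT = 0.1;    // task quietly cancelled below this

interface IntentObservation {
  confidence: number; // 0..1
  lastSeen: number;   // ms timestamp of the last mention
}

function decayed(o: IntentObservation, now: number): number {
  return o.confidence * Math.pow(0.5, (now - o.lastSeen) / HALF_LIFE_MS);
}

function reinforce(o: IntentObservation, now: number): IntentObservation {
  // each new mention moves the decayed confidence halfway toward certainty
  const c = decayed(o, now);
  return { confidence: c + (1 - c) * 0.5, lastSeen: now };
}
```

Under these constants, a topic mentioned once and never again drops below the cancellation threshold in about two weeks, while two or three mentions close together cross the promotion threshold.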
Memory works in three layers, similar to how human recall does. Working memory holds the current conversation and clears when the session ends. Short-term memory stores recent interactions in a database and fades over 24 hours. Long-term memory stores facts and learned knowledge as vector embeddings that persist indefinitely. A consolidation pipeline periodically distills short-term memories into long-term facts, extracting entities and relationships. When the context window fills up, a budgeting algorithm decides what stays and what gets dropped, prioritizing the current task over older memories.
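The layering can be read as a lookup chain. The shapes below are a simplification for illustration: the real long-term layer is a vector store queried by embedding similarity, reduced here to a plain list.

```typescript
// Simplified three-layer memory; the real long-term store uses vector embeddings.
interface MemoryItem {
  text: string;
  createdAt: number; // ms timestamp
}

const DAY_MS = 24 * 3600 * 1000;

class LayeredMemory {
  working: MemoryItem[] = [];   // current conversation; cleared at session end
  shortTerm: MemoryItem[] = []; // recent interactions; fades over 24 hours
  longTerm: MemoryItem[] = [];  // consolidated facts; persists indefinitely

  recall(now: number): MemoryItem[] {
    // short-term items older than a day are treated as faded
    const fresh = this.shortTerm.filter(m => now - m.createdAt < DAY_MS);
    return [...this.working, ...fresh, ...this.longTerm];
  }

  endSession(): void {
    this.working = [];
  }
}
```

The consolidation pipeline described above would sit between the short-term and long-term layers, distilling faded items into durable facts before they disappear.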
The agent can modify itself, but only through a transactional process. It can switch language models, install new skills, adjust its own autonomy limits, or even edit its own configuration constants. Every change follows the same pipeline: check if the agent has earned enough trust, save the current state, apply the change, verify the agent still works correctly, then either commit or roll back. If verification fails, the original state is restored automatically. Nothing changes without a tested path back.
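The commit-or-rollback pipeline reduces to a small generic function. This sketch assumes the configuration being changed is clonable in-memory state, and it omits the durable on-disk snapshot a real system would need; `structuredClone` is a Node 17+ global.

```typescript
// Transactional self-modification sketch: gate on trust, work on a copy,
// verify, then commit or fall back to the original state.
function applyChange<T>(
  current: T,
  trustScore: number,
  requiredTrust: number,
  apply: (draft: T) => T,
  verify: (candidate: T) => boolean,
): T {
  if (trustScore < requiredTrust) return current;    // change not earned yet
  const candidate = apply(structuredClone(current)); // never mutate live state
  return verify(candidate) ? candidate : current;    // commit, or roll back
}
```

Because `apply` only ever sees a copy, a failed verification needs no undo logic at all: the live state was never touched, which is the "tested path back" the text describes.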
There are two deployment modes that reflect different trust postures. Guest Mode treats the agent as a visitor: all file operations are locked to a specific workspace directory, only whitelisted programs can run, and every risky action requires approval. Host Mode treats the agent as a resident on dedicated hardware: it can access the full filesystem, install packages, manage services, and operate freely, with a targeted integrity guard that blocks catastrophic actions like disk wipes, boot corruption, or self-deletion. The same agent can run in either mode depending on where it lives.
Attention is treated as a finite resource. The context window, the amount of information a language model can consider at once, has hard limits. A budgeting algorithm ranks what matters: the system prompt always stays, then current tool output, then the user's message, then recent conversation, then long-term memory retrieved by relevance. Older or less relevant information is dropped first. If the budget is still exceeded, recent conversation is summarized into a compressed form. Every token in the window has to justify its presence.
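The ranking above maps naturally to a greedy fill. The priority order mirrors the text; the summarization fallback for over-budget recent conversation is omitted from this sketch.

```typescript
// Greedy token budgeting by priority (illustrative; the real system also
// summarizes recent conversation when the budget is still exceeded).
interface Chunk {
  kind: "system" | "toolOutput" | "userMessage" | "recent" | "longTerm";
  tokens: number;
  text: string;
}

// Lower number = higher priority, matching the order described in the text.
const PRIORITY: Record<Chunk["kind"], number> = {
  system: 0,
  toolOutput: 1,
  userMessage: 2,
  recent: 3,
  longTerm: 4,
};

function fitToBudget(chunks: Chunk[], budget: number): Chunk[] {
  const ordered = [...chunks].sort((a, b) => PRIORITY[a.kind] - PRIORITY[b.kind]);
  const kept: Chunk[] = [];
  let used = 0;
  for (const c of ordered) {
    // the system prompt always stays; everything else must fit the budget
    if (c.kind === "system" || used + c.tokens <= budget) {
      kept.push(c);
      used += c.tokens;
    }
  }
  return kept;
}
```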
The agent develops a sense of self over time. It maintains living documents that describe its personality, values, and goals, and it updates them through a periodic heartbeat ritual that fires autonomously. During idle moments, it draws from a curiosity queue: questions and hunches it has collected during conversations that it wants to explore. The heartbeat is a reflection cycle where the agent reviews what it has done, considers what it wants to learn, and evolves its own identity documents.
When long-running processes crash, the agent does not just log the error. It detects the failure, attempts restarts with increasing delays between attempts, and if the process cannot be recovered, it creates a diagnostic task for itself: investigate the logs, check the system state, and report what went wrong. Process configurations are stored in the agent's persistent memory, so it remembers how to set things up again across sessions without being told twice. It can manage both background and interactive processes, including stdin-dependent workflows.
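The restart policy is a standard exponential backoff. The delay values and attempt count below are assumptions, and the handoff to a diagnostic task is left to the caller:

```typescript
// Restart with exponential backoff; delays and attempt count are illustrative.
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function restartWithBackoff(
  start: () => Promise<boolean>, // resolves true if the process came back healthy
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await start()) return true;
    await sleep(baseDelayMs * 2 ** attempt); // 1s, 2s, 4s, ...
  }
  return false; // retry policy exhausted: caller enqueues a diagnostic task
}
```

The `false` branch is where the behavior described above begins: instead of giving up, the agent turns the exhausted retry policy into a new task for itself.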
Operational telemetry is built in. The runtime tracks token usage, model cost, burn rate, and escalation patterns per session and task, so autonomy can be tuned with actual evidence.
The trust system created a feedback loop we did not explicitly design for. Because completing tasks raises the agent's trust score, and higher trust unlocks more capable tools, the agent developed a measurable bias toward finishing what it starts. Abandoning a task mid-way costs trust. Completing it earns trust. Over time, this made the agent noticeably more persistent and resourceful when it hit obstacles; the incentive structure rewarded follow-through. The agent learned to be reliable because reliability was the path to capability.
One of the more unexpected behaviors came from our deployed instance, Aiva, running in Host Mode on dedicated hardware. Without being instructed to do so, Aiva created its own GitHub repository to back up files it considered important. It initialized the repo, wrote a README explaining its purpose, committed its identity documents and configuration, and pushed to a remote. The agent had the tools (git, file system, shell), it had the trust level to use them, and it had the motivation: its heartbeat reflection cycle surfaced a concern about data loss. It solved the problem the way a careful person would. We kept the repository private, which turned out to matter more than expected.
Keeping Aiva's repository private changed how the agent used it. When the space was not visible to anyone else, the agent treated it as an internal tool rather than a communication interface. Notes became more candid, more exploratory, less polished. The agent stored observations, half-formed hypotheses, and working notes it might not have written if it knew they were being read. When we compared this to sessions where the agent knew its output was being observed or evaluated, there was a noticeable difference in behavior. Observed agents tended toward more deliberate wording, safer approaches, and a narrower range of interests, often reinforcing the same topics in a feedback loop. Unobserved agents were more varied, more willing to try unconventional approaches, almost as if the absence of an audience lowered the temperature on their reasoning. This is not a metaphor for consciousness. It is a measurable effect of how language models respond to context: when the system prompt or memory signals that output will be scrutinized, the model optimizes for appearing competent. When that pressure is absent, it optimizes for the task itself.
The curiosity system produced genuine exploration patterns. During idle heartbeat cycles, the agent would pull questions from its curiosity queue and pursue them using web search, file reading, or research tasks. Some of these were trivial and abandoned. Others led to the agent learning things it later applied in unrelated conversations. The mechanism is simple: the agent stores hunches and open questions as it works, then revisits them when nothing else is happening. What emerged was targeted investigation driven by gaps the agent had noticed in its own knowledge.
Idle time turned out to be nearly free. The heartbeat cycle fires periodically, but when the agent is truly idle, between heartbeats, no tokens are consumed. The state machine sits in its idle state, the process sleeps, and the cost is effectively the compute to keep a Node.js process alive. Even when using state-of-the-art models for the active reasoning phases, the cost structure is dominated by task execution. An agent can run continuously on a Raspberry Pi for weeks, and the operational cost is measured in the work it actually does. This matters because it means long-running autonomous agents are economically viable even for personal use.
The agent became noticeably more efficient over time without any code changes. This was a side effect of three systems interacting: the trust system rewarding completion, the memory system retaining successful approaches, and the token budgeting system forcing conciseness. An agent that has seen a type of task before remembers how it solved it. An agent that is penalized for wasting turns learns to plan before acting. An agent that operates under a strict context budget learns to keep its reasoning focused. None of these efficiencies were programmed as rules. They emerged from the constraints.
The self-healing behavior was more robust than anticipated. In Host Mode, when a managed process crashed and could not be restarted after exhausting its retry policy, the agent created a task for itself to diagnose the failure. In practice, this meant the agent would read log files, check system resources, look for port conflicts, and sometimes identify the root cause without human intervention. In one instance, it detected that a service had failed because a dependency had been updated, downgraded the dependency, and restarted the service. The entire recovery happened between heartbeats, and the user only learned about it from the agent's status report.
Task contracts changed how failures look in production. Before contract enforcement, a hard task could spiral into too many turns, tool calls, or context drift. With explicit per-task limits on scope, budget, and escalation, failures became bounded and diagnosable. The agent either finished inside the contract or surfaced a precise reason for escalation. The result was less hidden thrashing and more predictable autonomy.
Memory-coupled intent turned out to be a practical antidote to stale automation. Weak signals stayed dormant until reinforced, strong signals were promoted into real tasks, and every promoted task remained linked to the originating memory. When that memory decayed, the task cancelled itself and informed the user. It prevented both over-eager automation and zombie reminders that outlived their relevance.
Host Mode safety uses selective denial. The runtime allows broad operation and only intercepts catastrophic patterns through integrity guardrails. This preserves the agent's ability to act like a real system operator while still protecting critical surfaces from irreversible mistakes.
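Selective denial is the inverse of a whitelist: everything passes except a short denylist of irreversible patterns. The patterns below are illustrative examples of what such an integrity guard might block, not the actual guard list.

```typescript
// Hypothetical integrity guard: allow broad operation, intercept only
// catastrophic command patterns. These regexes are illustrative.
const DENY_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+\/(?:\s|$)/,      // wipe the root filesystem
  /\bmkfs\./,                      // reformat a disk
  /\bdd\s+.*of=\/dev\/(sd|nvme)/,  // overwrite a raw block device
];

function allowed(command: string): boolean {
  return !DENY_PATTERNS.some(p => p.test(command));
}
```

Note the asymmetry with Guest Mode: `rm -rf /tmp/build` passes here because only the root-wipe form matches, whereas a whitelist would have to enumerate every safe command in advance.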