SudoPrompt

a translator between what you mean and what the model needs to hear

AI is a gold in, gold out machine. If you feed it vague input, it does not fail gracefully. It amplifies the vagueness and returns something confidently mediocre. Most people experience this as the model being bad. It is not. The model is doing exactly what it was asked to do. The problem is that most people do not know how to ask. SudoPrompt is a pre-production layer that sits between human intent and model execution. Instead of letting the user fire off a half-formed idea and hope for the best, it forces a clarification loop, called the Sudo Check, that interrogates the request, surfaces missing constraints, and only then generates. It does not guess what you meant. It asks until it knows. Three modules handle different domains: Visual Studio for image and video prompts, Agent Forge for system prompts and agent architectures, and Docs Builder for PRDs, implementation plans, and technical documentation.

// architecture

The system runs on Next.js with server-side streaming through the Vercel AI SDK. When a user submits input, a router selects the appropriate module based on the active skill, and each module is backed by a specialized system prompt template that defines a persona, a workflow, output constraints, and a security footer. The core loop works like this: the model receives the input along with its template. If the input lacks specificity, the model outputs a structured JSON response with sudoCheck set to true and an array of contextual questions. The frontend parses this from the stream using a balanced brace extraction algorithm, a state machine that counts opening and closing braces while respecting string escaping, because greedy regex fails on nested JSON. The questions appear as a modal with clickable options and text fields. When the user answers, those answers are serialized and appended directly to the system prompt as additional context, with an explicit instruction not to ask again. The model then generates with full constraints. Session persistence uses event-driven callbacks on the onFinish handler rather than useEffect watchers, which eliminates the race conditions that cause data loss when a user closes a tab mid-generation. Supabase handles storage with row-level security isolating each user's sessions.
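The balanced brace extraction described above can be sketched as a small state machine. This is an illustrative reconstruction, not the production code: it scans a (possibly partial) stream buffer for the first complete top-level JSON object, tracking string and escape state so that braces inside string values are ignored.

```typescript
// Scan a stream buffer for the first balanced top-level JSON object.
// Returns the object's source text, or null if it is not yet complete.
function extractJson(buffer: string): string | null {
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) {
        escaped = false;        // character after a backslash is literal
      } else if (ch === "\\") {
        escaped = true;
      } else if (ch === '"') {
        inString = false;       // closing quote ends the string
      }
      continue;                 // braces inside strings are never counted
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0 && start !== -1) {
        return buffer.slice(start, i + 1); // first balanced object found
      }
    }
  }
  return null; // object not yet complete in the stream
}
```

A greedy regex like `/\{.*\}/` would match from the first `{` to the last `}` in the buffer, swallowing trailing text and failing on nested objects; counting depth is what makes the extraction reliable mid-stream.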

// how it works

The Sudo Check is an adversarial interruption. When the model detects that a request is too vague to produce something specific, it refuses to generate and instead asks hard questions. For image prompts, it asks about art style with specific artist references, lighting setup, camera angle, lens type, color palette, and mood, each as a single focused choice. For agent architectures, it asks about personality, tools, safety constraints, and output format. For documentation, it asks about the problem statement, target users, and success criteria. The questions are contextual to the module. The system feels like a senior consultant pushing back on a brief.
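The clarification payload might look like the sketch below. Only `sudoCheck` is confirmed by the text; the other field names are assumptions chosen to illustrate the modal's clickable options and free-text fields.

```typescript
// Hypothetical shape of a Sudo Check response; field names beyond
// sudoCheck are illustrative, not the actual schema.
interface SudoCheckResponse {
  sudoCheck: true;
  questions: Array<{
    id: string;
    prompt: string;          // e.g. "What lighting setup?"
    options: string[];       // clickable choices rendered in the modal
    allowFreeText: boolean;  // whether a text field accompanies the options
  }>;
}

// Example payload a vague image-prompt request might trigger:
const example: SudoCheckResponse = {
  sudoCheck: true,
  questions: [
    {
      id: "lighting",
      prompt: "What lighting setup?",
      options: ["golden hour", "overcast softbox", "hard rim light"],
      allowFreeText: true,
    },
  ],
};
```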

Each module operates through a dedicated system prompt template that defines a complete persona and workflow. Visual Studio's direct skill acts as a precision prompt engineer who structures output into seven layers: subject, style, environment, lighting, camera, color, and technical parameters. The storyboard skill enforces a strict one-panel-per-response constraint, preventing the model from generating all nine panels at once in a lower-quality batch. The video sequence skill produces Frame A, Frame B, and a stitch description with camera motion, speed, and duration. Agent Forge's code agent skill produces XML-wrapped system prompts with chain-of-thought reasoning. Its custom agent skill designs full architectures with role definitions, capabilities, behavioral guidelines, and edge case handling. Docs Builder produces structured markdown across four skills: PRDs, implementation plans, task breakdowns, and READMEs. Every template follows the same structure: role definition, then task, then workflow, then critical rules, then output format.
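The shared five-part structure (role, task, workflow, critical rules, output format) could be modeled as a typed template that is serialized in a fixed order. The content strings below are invented for illustration; only the section ordering comes from the text.

```typescript
// A sketch of the shared template structure. Section order is fixed:
// role → task → workflow → critical rules → output format.
interface ModuleTemplate {
  role: string;
  task: string;
  workflow: string[];
  criticalRules: string[];
  outputFormat: string;
}

// Illustrative stand-in for the storyboard skill's template.
const storyboardTemplate: ModuleTemplate = {
  role: "You are a precision storyboard prompt engineer.",
  task: "Produce exactly one storyboard panel per response.",
  workflow: [
    "If the request lacks style, lighting, or camera detail, emit a Sudo Check.",
    "Otherwise, generate the next panel only.",
  ],
  criticalRules: ["Never generate more than one panel per response."],
  outputFormat:
    "Seven layers: subject, style, environment, lighting, camera, color, technical.",
};

// Serialize the template into a single system prompt string.
function renderTemplate(t: ModuleTemplate): string {
  return [
    `# Role\n${t.role}`,
    `# Task\n${t.task}`,
    `# Workflow\n${t.workflow.map((s, i) => `${i + 1}. ${s}`).join("\n")}`,
    `# Critical Rules\n${t.criticalRules.map((r) => `- ${r}`).join("\n")}`,
    `# Output Format\n${t.outputFormat}`,
  ].join("\n\n");
}
```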

The system actively resists the biases that language models default to. The image prompt templates maintain a forbidden terms list: words like cinematic, stunning, breathtaking, epic, and masterpiece are explicitly banned because models gravitate toward them when given freedom, producing homogeneous output. The model can only use these terms if the user specifically requests them. This is a countermeasure against the tendency of language models to optimize for impressive-sounding language over precise description. The same principle applies across modules: the agent templates prevent generic personality descriptions, and the documentation templates prevent vague requirement statements.
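The forbidden-terms rule can be sketched as a simple guard: a banned word is re-allowed only when it appears in the user's own input. The word list matches the examples in the text; the function names and rule wording are assumptions.

```typescript
// Terms banned from generated prompts unless the user asks for them.
const FORBIDDEN_TERMS = ["cinematic", "stunning", "breathtaking", "epic", "masterpiece"];

// A term is re-allowed only when the user's input explicitly contains it.
function allowedForbiddenTerms(userInput: string): string[] {
  const lower = userInput.toLowerCase();
  return FORBIDDEN_TERMS.filter((term) => lower.includes(term));
}

// Build the rule sentence appended to the system prompt.
function buildTermsRule(userInput: string): string {
  const allowed = allowedForbiddenTerms(userInput);
  const banned = FORBIDDEN_TERMS.filter((t) => !allowed.includes(t));
  return (
    `Never use the words: ${banned.join(", ")}.` +
    (allowed.length ? ` The user explicitly requested: ${allowed.join(", ")}.` : "")
  );
}
```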

The system accepts images, audio files, text documents up to fifty thousand characters, and code in any major language, all processed alongside text input to influence generation. Images are compressed on the client and sent as base64 to the model. Audio is transcribed and analyzed. Code files are extracted and included as context. A user can drop a reference image alongside a text description and get a prompt that accounts for both. For documentation, a user can paste an entire codebase and get a README that reflects the actual implementation. Multimodal input is how the system gathers the specificity it needs.
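Assembling the multimodal request might look like the following. The part shapes loosely follow common AI SDK message conventions, but the exact field names here are assumptions, not SudoPrompt's actual payload.

```typescript
// Hypothetical multimodal message assembly: text, optional extracted
// code context, and an optional base64-encoded image.
type Part =
  | { type: "text"; text: string }
  | { type: "image"; image: string; mimeType: string };

function buildUserMessage(
  text: string,
  imageBase64?: string,
  codeContext?: string,
) {
  const parts: Part[] = [{ type: "text", text }];
  if (codeContext) {
    // Extracted code files ride along as fenced reference context.
    parts.push({
      type: "text",
      text: "Reference code:\n```\n" + codeContext + "\n```",
    });
  }
  if (imageBase64) {
    // Images are client-compressed, then sent as base64.
    parts.push({ type: "image", image: imageBase64, mimeType: "image/jpeg" });
  }
  return { role: "user" as const, content: parts };
}
```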

Session persistence was engineered to never lose work. The common pattern in React applications is to watch for state changes with useEffect and save when something updates. This creates race conditions: if a user closes a tab while a save is in flight, or if two rapid state changes trigger overlapping writes, data is lost. SudoPrompt avoids this entirely by using event-driven callbacks. When the model finishes generating, the onFinish handler fires exactly once and writes the result directly to Supabase. Session references are held in refs rather than state closures, so the callback always has the current value. The sidebar loads session metadata only, not full message histories, keeping navigation fast even with hundreds of sessions. An in-memory cache with a sixty-second TTL prevents redundant database queries without requiring complex invalidation.
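The sixty-second cache described above needs no invalidation machinery because entries expire by timestamp on read. A minimal sketch, with an API shape assumed for illustration:

```typescript
// In-memory cache with a fixed TTL. Stale entries are evicted lazily
// on read, so there is no invalidation logic to maintain.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(private ttlMs: number = 60_000) {} // sixty seconds by default

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

On a miss, the caller queries Supabase and repopulates the entry; within the TTL window, repeated sidebar navigations never touch the database.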

// what we observed

The most significant observation was how dramatically the clarification loop changed output quality. Before building SudoPrompt, generating a usable image prompt took an average of five attempts. The user would write something, get a mediocre result, adjust the prompt, try again, and iterate until something landed. With the Sudo Check forcing specificity upfront, the first or second generation was usually good enough to use. The improvement came from the input becoming more precise. The same model, given the same task, produced dramatically different results depending on whether it received a vague sentence or a constrained specification. This confirmed the core thesis: the bottleneck in most AI workflows is input quality.

Using SudoPrompt extensively for AI-generated video production revealed something about consistency that has implications beyond image generation. Maintaining visual coherence across dozens of frames, keeping the same character, the same lighting, and the same environment from different angles, is nearly impossible with freeform prompting because each prompt introduces subtle drift. The structured output format, with its seven explicit layers, acted as an anchor. Once the style, lighting, and camera were locked in during the Sudo Check, every subsequent frame inherited those constraints. The model stopped inventing new interpretations because the constraints left no room for invention. This is what it means to turn a probabilistic system into something approaching deterministic behavior: narrowing its decision space until the remaining choices are all acceptable.

The adversarial questioning pattern taught us something about how people interact with language models. Most users approach AI the way they approach a search engine: type something short, hit enter, evaluate the result. The Sudo Check interrupts this habit deliberately. It asks questions that the user did not think to answer, surfacing assumptions they did not know they were making. Over time, frequent users began writing more specific initial prompts, anticipating the questions the system would ask. They were learning to communicate with language models more effectively. The tool was training its users through repeated exposure to the gap between what they said and what they actually meant.

The forbidden terms list produced a measurable effect on output diversity. When language models are allowed to use words like cinematic, stunning, or masterpiece without constraint, they converge on a narrow band of aesthetically similar outputs. This is not a bug in the model. It is a consequence of training data: these words appear overwhelmingly in contexts that describe a specific visual style, and the model reproduces that association faithfully. By banning these terms and forcing the user to specify what they actually want, the range of generated outputs widened significantly. Images became more varied, more specific, and more reflective of the user's actual intent. The same principle applied to agent prompts: banning generic personality descriptors like helpful and professional forced users to define actual behavioral characteristics.

The most unexpected use case came from using SudoPrompt to build other AI tools. When designing system prompts for autonomous agents, the Agent Forge module forced a level of architectural thinking that freeform prompting skipped over. Instead of writing a system prompt as a stream of consciousness, the Sudo Check asked about edge cases, error handling, tool definitions, and safety constraints before a single line was generated. The result was a better-designed agent, because the clarification process doubled as an architectural review. The tool designed for prompt generation had become a design tool, forcing structured thinking about systems that most people build through trial and error.