Claude Code VS OpenCode

Architecture, Design & The Road Ahead

An in-depth multi-dimensional analysis comparing three AI coding agent systems — OpenCode, Oh-My-OpenCode, and Claude Code — from shared architectural patterns to distinctive innovations, culminating in best practices for agent design.

Language: 中文版 (Chinese Edition)

The Three Systems

System	Role	Philosophy
OpenCode	Open-source foundation	Model-agnostic, multi-interface, programmable
Oh-My-OpenCode	Orchestration layer (OpenCode plugin)	Multi-agent, extreme autonomy, “human intervention = failure signal”
Claude Code	Commercial benchmark (Anthropic)	Safety-first, enterprise-ready, deep model-tool co-optimization

What This Book Covers

Part I — The evolution from code completion to autonomous agents
Part II — Shared architecture: ReAct loops, tools, sessions, LLM abstraction, MCP, configuration
Part III — What makes each system unique
Part IV — Deep dive: how Oh-My-OpenCode builds a 130K-LOC orchestration layer as an OpenCode plugin
Part V — Head-to-head comparison across philosophy, tools, orchestration, extensibility, security
Part VI — Best practices for agent design distilled from all three systems
Part VII — The future of coding agents, and a thought experiment: designing “Oh-My-Claude-Code”

Who This Book Is For

Senior engineers interested in AI agent architecture — whether you’re building your own coding agent, evaluating tools for your team, or just want to understand how these systems work under the hood.

Generated with the assistance of AI. Source code analyzed from OpenCode, Oh-My-OpenCode, and Claude Code repositories.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 1 — The Evolution of Coding Agents
Token Usage: ~8,500 tokens

1.1 From Code Completion to Autonomous Agents

The modern coding agent did not appear all at once. It emerged through three distinct eras, each defined by a different answer to a simple question: how much initiative should the machine take? In the first era, the machine only suggested text. In the second, it could discuss code. In the third, it could act on a codebase, run tools, inspect failures, revise its plan, and continue until a task was complete. That progression, from suggestion to collaboration to delegated execution, is the core story of AI coding systems between 2021 and 2026.

Era One: Code Completion, 2021-2022

The first major wave of AI coding products was the code completion era. GitHub Copilot and Tabnine became the signature products of this phase. Technically, these systems were remarkable, but architecturally they were narrow. Their job was to predict the next token or the next span of code given local context: the current file, the cursor location, a few surrounding lines, perhaps a function signature or comment.

This was the purest expression of language modeling applied to programming. The model did not inspect a repository, decide what to edit, run tests, or reason about a bug report. It merely continued the text already on the screen. If a developer wrote a comment such as // sort users by last login descending, the model might generate a plausible implementation. If the developer started a for loop or a React component, the model could autocomplete the rest.

This was useful because software contains vast amounts of repetition. Boilerplate, API glue code, test fixtures, serializers, data mappers, CRUD handlers, and configuration formats all contain statistical regularities. Next-token prediction is surprisingly powerful in such environments. But the autonomy level was effectively zero. The model had no persistent goal, no planning loop, and no ability to observe consequences in the external world. It could not notice that a test failed because it never ran a test. It could not compare files because it had no tool to read them. It could not decide between multiple paths because it had no internal execution loop tied to action.

That limitation matters. Code completion systems improved typing speed and reduced context switching, but they did not yet behave like agents. They behaved like unusually capable editors. The human remained fully responsible for problem decomposition, file navigation, correctness checking, and integration.

Era Two: Chat Copilot, 2023-2024

The second era began when instruction-following chat models entered the coding workflow. ChatGPT, GitHub Copilot Chat, Cursor Chat, and similar interfaces changed the interaction model from “complete this line” to “help me with this programming task.” This was a profound shift. Developers no longer had to encode intent indirectly through comments and partial code. They could issue explicit instructions in natural language.

The technical foundation changed as well. These systems were not just next-token predictors trained to continue code. They were instruction-tuned assistants optimized to follow requests, explain output, answer questions, summarize files, propose refactors, and generate multi-step responses. A developer could ask, “Why does this function allocate so much memory?” or “Refactor this into a strategy pattern,” and receive a structured answer.

Autonomy increased, but only slightly. The model could reason across a broader prompt, explain alternatives, and synthesize larger code fragments. Yet the workflow was still mostly synchronous and human-driven. The assistant did not usually choose when to inspect another file, when to run the test suite, or when to stop. The human asked. The model answered. If there was iteration, it depended on another user turn.

This produced what might be called low-autonomy coding assistance. The system could participate in problem solving, but it still lacked independent execution. Even when integrated into IDEs, many chat copilots were essentially sophisticated request-response systems wrapped around code search and editing utilities. They made developers more effective, but they did not yet “own” a task in the way an engineer delegates work to a teammate.

Era Three: Agentic Coding, 2024-2026

The third era is the era of agentic coding. Here the crucial innovation is not that models got smarter in isolation, but that they were embedded inside systems that can perceive, act, and iterate. Products and frameworks such as Devin, Claude Code, OpenHands, OpenCode, and related agents moved from single-turn assistance toward bounded autonomy.

An agentic coding system usually has access to a workspace, a shell, file operations, search tools, test execution, and often browser or issue-tracker integrations. It does not merely emit text for the human to paste. It directly interacts with the development environment. This creates a new feedback loop: inspect repository, form hypothesis, edit files, run tests, observe errors, update plan, repeat.

That loop is what makes the system agentic. The model is no longer only generating candidate code; it is coordinating a sequence of actions toward a goal. The goal may be stated as “fix issue #281,” “migrate this endpoint,” or “add retry logic and tests.” The agent decomposes the task, chooses tools, handles intermediate failures, and terminates only when it believes the objective has been satisfied or blocked.

This is high autonomy relative to previous eras, though not infinite autonomy. The better systems still operate within permission boundaries, tool limits, runtime budgets, and human review constraints. But compared to autocomplete and chat copilots, the difference is architectural, not cosmetic.

The Devin Moment

Every market has a moment when a category becomes legible. For coding agents, that moment was Devin’s launch and the benchmark discussion around it. In March 2024, Cognition reported that Devin resolved 13.86% of issues on a sampled SWE-bench evaluation. At the time, that result mattered less as an absolute number than as a proof of category. It showed that a system framed as an autonomous software engineer could outperform prior unassisted baselines by a wide margin on real repository tasks.

What happened next was even more important. Within roughly two years, frontier agentic systems pushed reported SWE-bench-class performance toward the high seventies, with numbers such as 79.6% becoming part of the public discussion. Whether one focuses on Devin specifically, on later Cognition iterations, or on the broader field, the directional message is unmistakable: the benchmark moved from 13.86% to roughly 79.6% in about two years. That is one of the fastest visible capability jumps in applied software engineering AI.

This was the “Devin Moment” because it forced both engineers and investors to update their mental model. AI coding was no longer just about suggestion quality. It was about closed-loop task execution.

ReAct: Reason + Act

To understand why agentic systems differ so much from chat copilots, we need one concept that is still absent from most traditional CS textbooks: ReAct. The term comes from Yao et al. (2022), short for Reason + Act. The idea is straightforward but powerful. Instead of generating a single answer in one shot, the model alternates between reasoning traces and actions in a loop.

In a coding context, a simplified ReAct cycle looks like this:

Reason about what to do next.
Take an action using a tool.
Observe the tool result.
Update the internal plan.
Repeat until done.

For example, if the goal is to fix a failing test, the agent may reason that it first needs to locate the relevant module, then call a search tool, then inspect the file, then edit the implementation, then run tests, then interpret the failure output, then make a second edit. The essential point is that the model is not reasoning in a vacuum. It is reasoning in contact with the environment.

This differs from traditional programming abstractions in a subtle way. In a classical algorithm, the control flow is explicitly authored by a human. In a ReAct-based agent, the control flow is partly generated at runtime by the model, under the constraints of the host system. The host defines the tools, permissions, and stopping conditions; the model decides how to sequence them. That is why system design matters so much in modern agents.

What SWE-bench Measures

For non-specialists, SWE-bench can be understood as a benchmark built from real GitHub issues and real repository histories. Instead of asking a model to solve toy algorithm puzzles, SWE-bench asks whether a system can take a genuine software engineering problem, modify the codebase, and pass the relevant tests.

More concretely, a SWE-bench instance includes an issue description, a repository snapshot from the time of the issue, and tests that verify whether the correct behavior has been restored. The agent or model must inspect the code, infer what needs to change, make edits, and produce a patch that satisfies the test harness. This is much closer to real software maintenance than standalone coding exams.

That is why SWE-bench became such an important reference point. It captures several abilities at once: code understanding, repository navigation, bug localization, editing accuracy, and iterative debugging. A high score does not mean an agent is a universal software engineer, but it does mean the agent can complete a nontrivial subset of real-world maintenance tasks under evaluation conditions.

From Model Capability to System Capability

The historical lesson of these three eras is that intelligence in coding tools is no longer usefully described only by model size or benchmarked reasoning. The decisive question is whether the system can turn language capability into reliable software work. Code completion systems could not. Chat copilots could only partially. Agentic systems can, at least within bounded environments.

That is why the next chapters focus not only on models, but on architecture, scaffolding, permissions, tools, and orchestration. The evolution from Copilot-style completion to autonomous agents is not just a story of better predictions. It is a story of building systems that can observe, act, and recover. In software engineering, that difference is everything.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 1 — The Evolution of Coding Agents
Token Usage: ~7,400 tokens

1.2 Market Landscape and Technical Divide

The market for AI coding systems is no longer a side category within developer tools. It is becoming one of the central battlegrounds of applied AI. By 2025, market research firms were already sizing the AI code assistant market at roughly $4.7 billion, with forecasts projecting approximately $14.62 billion by 2033. Exact numbers should always be treated cautiously because market reports vary in methodology, but the directional signal is clear: coding assistance has become one of the few AI product categories with large-scale enterprise willingness to pay.

That willingness to pay exists because the value proposition is unusually legible. If a tool can reduce implementation time, speed debugging, lower onboarding cost, improve test coverage, and compress the path from ticket to merge request, teams can attach direct economic meaning to the product. This makes AI coding agents different from many consumer AI experiences that generate engagement but not clear budget lines.

A Rapidly Stratifying Vendor Landscape

The space is already stratified into incumbents, editor-native challengers, agent-first startups, and open-source frameworks. The leading names are not interchangeable, because each competes on a different layer of the stack.

System	Company / Project	Approximate Market Signal	Primary Positioning
GitHub Copilot	GitHub / Microsoft	~$800M ARR	Enterprise-scale coding assistant integrated into GitHub and Microsoft workflows
Cursor	Anysphere	$100M+ ARR, later reported well beyond that	AI-native editor with strong UX and fast product iteration
Claude Code	Anthropic	Strategic product within Anthropic	Commercial coding agent tightly coupled to Claude models
Devin	Cognition	High-profile agent product	Autonomous software engineer framing, task execution over codebases
OpenCode	Open-source project	Community traction, no closed revenue narrative	Open foundation for agentic coding across providers
Aider	Open-source project	Strong practitioner adoption	Lightweight terminal-first pair programming and repository editing
OpenHands	Open-source project	Research and community traction	Open autonomous software development framework

These entries reveal an important truth: this is not one market but several overlapping ones. GitHub Copilot competes as enterprise infrastructure. Cursor competes as the AI-native development environment. Claude Code and Devin compete as higher-autonomy coding agents. OpenCode, Aider, and OpenHands compete as open systems, reference architectures, and modifiable substrates.

Why the Market Is Splitting in Two

The most important technical divide in this market is not IDE versus terminal, or startup versus incumbent. It is commercial closed systems versus open-source systems.

Commercial systems usually optimize for polished user experience, safe defaults, integrated billing, managed authentication, compliance features, telemetry, and predictable support. They can move quickly when they control the full product surface: model access, editor integration, cloud execution, permissions, pricing, and release cadence. For many enterprise buyers, this matters more than ideology. Procurement prefers a vendor that can sign contracts, publish trust documentation, and provide a clear escalation path.

Open-source systems optimize for a different set of values: inspectability, modifiability, model choice, workflow control, and architectural transparency. In an open system, an engineering team can inspect the prompt assembly logic, add tools, change providers, patch context management, or fork the permission model. That makes open systems attractive not only to hobbyists, but also to serious platform teams building internal agents.

This leads to a structural contrast:

Commercial systems often exhibit model lock-in. Their best experience usually depends on the vendor’s preferred model stack, safety assumptions, pricing model, and hosted control plane.
Open systems are more often model-agnostic. They can route to different providers, self-hosted endpoints, or local models depending on cost, privacy, or task type.
Commercial systems usually deliver better polished UX out of the box.
Open systems provide source access and therefore deeper adaptability.

Neither side wins universally. A large bank may choose a commercial product because governance and support outweigh hackability. A research lab or AI platform team may choose open infrastructure because experimentation speed and architectural control matter more.

The Autonomy Spectrum

The second major divide is the autonomy spectrum. “AI coding tool” is too broad a label because it collapses fundamentally different products into one bucket. A useful taxonomy looks like this:

Autocomplete: predicts local code continuations. Minimal initiative.
Chat: answers coding questions and generates code on request. Low initiative.
Guided: can perform bounded edits or flows with human confirmation. Moderate initiative.
Semi-autonomous: can execute multi-step tasks with tool use, but under tighter supervision. High initiative within limits.
Fully autonomous: aims to own tasks end-to-end, including planning, editing, testing, and iteration. Maximum initiative within system boundaries.

Most first-generation coding assistants lived in the first two categories. The most interesting systems in 2025 and 2026 operate in categories four and five. This shift matters because revenue potential often rises with autonomy. Autocomplete saves keystrokes. Semi-autonomous and autonomous systems can save hours or even days of engineering time if they are reliable enough.

But autonomy also raises the bar for architecture. An autocomplete system can tolerate occasional nonsense because the human sees every suggestion before accepting it. A fully autonomous system cannot. Once a product edits files, runs shell commands, or manages long task chains, reliability depends on far more than model quality. Permissioning, context selection, rollback behavior, tool design, and failure handling all become first-class product concerns.

Commercial Strengths: UX, Safety, and Distribution

Commercial leaders have three strong advantages.

First, they can invest heavily in product polish. Cursor is a strong example of how much adoption can be unlocked by making AI feel native inside the editing experience. GitHub Copilot benefits from extraordinary distribution through GitHub, Visual Studio Code, and Microsoft’s enterprise footprint. Claude Code benefits from tight integration with Anthropic’s own model roadmap and safety infrastructure.

Second, commercial vendors can build managed safety layers. This includes permission workflows, administrative controls, audit surfaces, secure defaults, and organization-wide settings. These are not glamorous features, but they become decisive the moment an AI system is allowed to execute commands or touch production-adjacent repositories.

Third, they control packaging. Enterprises rarely want to assemble their own agent platform out of shell scripts, prompt files, SDK adapters, and open-source repos. They want something that can be rolled out with policy.

Open-Source Strengths: Transparency, Portability, and Innovation Speed

Open-source systems have a different advantage: they are where architectural experimentation often happens first. Multi-agent orchestration, provider abstraction, custom tool injection, prompt layering, hook systems, and local workflow integration often evolve faster in open projects because users can directly modify the product.

OpenCode, Aider, and OpenHands illustrate three versions of this. Aider shows how much value can be delivered with a sharp terminal workflow and repository-aware editing. OpenHands demonstrates open experimentation in autonomous software tasks. OpenCode provides a broader substrate for model-agnostic agentic coding, which becomes especially interesting when orchestration layers such as Oh-My-OpenCode are built on top.

Open systems also mitigate supplier risk. If a commercial product depends on one provider’s proprietary stack, customers inherit that stack’s pricing, rate limits, roadmap, and policy boundaries. Model-agnostic open systems let organizations swap providers or combine them. In a fast-moving market, that flexibility is strategically valuable.

What the Revenue Numbers Actually Mean

The revenue signals around GitHub Copilot and Cursor are useful, but they should be interpreted carefully. They do not merely show customer demand. They show where the market believes workflow ownership will accumulate.

GitHub Copilot’s scale indicates that broad, enterprise-distributed coding assistance is already real business. Cursor’s rapid ARR growth indicates that developers will switch tools if the interface, responsiveness, and AI workflow are materially better. The attention around Devin indicates that people are willing to pay not only for suggestions, but for delegated execution. Claude Code demonstrates that model vendors themselves increasingly want to own the agent layer instead of remaining upstream suppliers.

In other words, the market is converging on a new thesis: the winning product may not be the one with the best base model alone. It may be the one that best packages model capability into a reliable software engineering workflow.

The Real Technical Divide

That observation leads to the deepest divide of all. The field is no longer separating only by brand, price, or deployment model. It is separating by system design philosophy.

One camp treats AI coding as enhanced text generation with some workflow features added. The other treats it as an agentic runtime for software work. The first camp can still build excellent products. The second camp is more likely to define the long-term category if agent reliability keeps improving.

This book is concerned with that second camp. The important comparisons are not just who has the best chat response or the nicest editor. The important questions are: who controls tools well, who manages context well, who balances autonomy with safety, who supports extension, and who can turn model intelligence into repeatable engineering outcomes.

That is the technical divide beneath the market divide. Once seen clearly, much of the industry becomes easier to interpret.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 1 — The Evolution of Coding Agents
Token Usage: ~6,300 tokens

1.3 Scaffolding Matters More Than the Model

One of the most important lessons in the coding-agent era is also one of the least intuitive: the model is not the whole system, and often not even the main differentiator. A strong model inside a weak agent architecture can underperform a weaker model inside a better-designed system. This is the meaning of the claim that scaffolding matters more than the model.

This is not a slogan. It is increasingly supported by empirical observation. Morph LLM and other practitioners have reported cases where the same underlying model performs dramatically differently depending on the surrounding agent design, sometimes with gaps as large as 17 benchmark problems on the same evaluation set. When a single model can swing that far without changing the weights, the explanation cannot primarily be “the model got smarter.” The explanation must lie in the system wrapped around it.

What “Scaffolding” Means

Scaffolding is not a standard textbook computer science term in this context, so it requires explicit definition. In AI coding agents, scaffolding means the entire operational structure around the language model that allows it to do useful work. That structure typically includes:

tool definitions and tool quality
context selection and context compression
system prompts and behavioral constraints
memory and session persistence
permission handling
error recovery strategies
retry policies
planning loops
stopping criteria
output parsing and validation
repository search mechanisms
file editing strategies

In older software terminology, one might loosely compare scaffolding to runtime orchestration, middleware, and control logic combined. But that analogy is incomplete because the LLM is not a deterministic function. The scaffolding exists to shape a probabilistic planner into something operationally reliable.

That is why the term matters. It names the layer where agent products now win or lose.

Why the Same Model Produces Different Agents

Suppose two products both use the same frontier model. One gives the model a blunt prompt, a weak file-search tool, noisy tool output, no recovery logic, and an oversized context window stuffed with irrelevant history. The other gives the model carefully designed tools, compact and relevant context, explicit permission boundaries, high-signal feedback, and a robust think-act-observe loop. These are not equivalent systems, even if the model is identical.

The second system will usually perform better because it reduces decision entropy. It helps the model spend its reasoning budget on the task itself rather than on navigating a poorly structured environment. In human terms, this is obvious. A senior engineer with a broken terminal, missing documentation, and chaotic logs will perform worse than the same engineer with good tooling and a clean workflow.

Coding agents are no different. Their failure modes are often environmental:

choosing the wrong file because retrieval is weak
wasting tokens on irrelevant context
producing invalid edits because the editing primitive is poor
looping because stop conditions are unclear
failing to recover because test output is too noisy
asking for unnecessary permissions because risk classification is crude

None of these failures is fundamentally a model-weights problem. They are systems problems.

The Shift from Model Comparison to System Comparison

This insight reframes how serious engineers should evaluate coding agents. The right unit of comparison is no longer just the model name on the box. It is the full system.

In the earlier LLM era, it was reasonable to ask: which model writes better code? That question still matters, but it is no longer sufficient. In the agent era, the more relevant question is: which system most reliably turns model capability into completed engineering work?

This is a deeper question. It includes not only generation quality, but also:

how the agent reads large repositories
how it chooses between tools
how it handles permissions
how it recovers from failed tests
how much irrelevant context it drags into each turn
how it coordinates subagents or background tasks
how it balances autonomy with user control

Once this shift is understood, many industry debates look different. A product demo that highlights only the raw eloquence of the model may miss the real battle. The real battle is whether the architecture creates a reliable software process around the model.

Architecture as the New Differentiator

This is why architecture has become the primary differentiator in coding agents. Model progress is still rapid, but frontier models are increasingly accessible through APIs, cloud platforms, and provider abstraction layers. If multiple products can access similarly capable models, then durable advantage moves upward in the stack.

That higher layer includes prompt assembly, permission design, tool ergonomics, session handling, and orchestration logic. Some systems are optimized for a single model vendor and can tune every assumption around that stack. Others are model-agnostic and invest in translation layers, capability detection, and provider portability. Some systems expose hooks and plugin APIs. Others prioritize locked-down reliability.

All of these are architectural choices. They determine what kind of agent the model becomes.

A Concrete Example: Error Recovery

Consider one practical case: failing tests. A naive agent may run the tests, receive 800 lines of output, and paste the whole result back into context. A better system may summarize the failure, isolate the relevant stack trace, link it to the edited files, and ask the model to focus only on the changed behavior. A stronger system may go further and automatically preserve test files, validate the patch format, and prevent the model from overwriting unrelated code.

In all three cases, the model may be identical. Yet the probability of successful recovery is not identical. Better scaffolding improves signal quality and reduces the chance of flailing.

This is why benchmarking agent systems is so difficult. If an evaluation only reports the model family, it hides a large fraction of the real explanatory variables.

Scaffolding as Applied Software Engineering

There is also a meta-lesson here. Building a good coding agent is itself a software engineering problem. The frontier is not only about bigger models or better pretraining data. It is about careful system construction.

In fact, many of the best ideas in agent design look like classic engineering disciplines in updated form:

permission systems resemble security engineering
context management resembles cache design and memory management
tool design resembles API design
retry and fallback logic resembles distributed systems resilience
hook systems resemble extensibility architecture
benchmarking and observability resemble production operations

This matters because it means strong coding agents will not be built by model researchers alone. They will be built by teams that understand product architecture, developer workflows, systems reliability, and human-computer interaction.

Implication for This Book

This chapter provides the conceptual lens for the rest of the book. We are not primarily comparing model brands. We are comparing systems. OpenCode, Oh-My-OpenCode, and Claude Code matter not only because of which models they can call, but because of how they structure action, memory, permissions, tools, and extensibility.

That is why a serious technical comparison cannot stop at benchmark tables or model provider lists. It must inspect the scaffolding. Once autonomy enters the loop, scaffolding stops being implementation detail. It becomes the product.

The strategic consequence is straightforward: in the coding-agent market, raw model capability is increasingly necessary but insufficient. The winners will be those who build the most effective operational envelope around the model. Architecture is no longer support machinery. It is the source of performance.

From Scaffolding to Harness Engineering

Appendix Addendum: generated by openai/gpt-5.4
Token Usage for this added section: ~2,900 tokens

The phrase “scaffolding matters more than the model” began as a sharp intuition. It explained why two products built on similar frontier models could behave like entirely different engineering systems. By late 2025 and early 2026, however, that intuition started to harden into something more formal: an engineering discipline increasingly referred to as Harness Engineering.

This shift matters because “scaffolding” is useful but still somewhat informal. It names the surrounding structure, but it does not yet sound like a design discipline with reusable principles, measurable outcomes, and operational patterns. “Harness engineering” does. It says that the wrapper around the model is not an implementation detail. It is the main object of design.

Origin Story: From Intuition to Named Discipline

Several public moments helped crystallize the idea.

First, Viv Trivedy systematized the concept in September 2025 through the phrase “HaaS — Harness as a Service.” The important move here was not only the name. It was the claim that teams should stop treating agent reliability as a matter of prompt tweaking and instead treat it as a matter of designing the runtime envelope around the model.

Second, Mitchell Hashimoto independently gave the idea a memorable operational definition on February 5, 2026: “anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.” That sentence is concise, but it captures the heart of the discipline. Harness engineering is not mainly about admiring emergent behavior. It is about converting repeated failure into deterministic infrastructure.

Third, OpenAI amplified the idea on February 11, 2026 when it described a team that built a million-line product with 0 hand-written code using harness engineering. Regardless of how one interprets the exact boundaries of “hand-written,” the rhetorical impact was clear: if a sufficiently engineered harness can repeatedly constrain, verify, and recover model behavior, then large-scale software production becomes less about one brilliant prompt and more about a robust execution environment.

Fourth, Martin Fowler’s website discussed the concept on February 17, 2026, which mattered because it translated the idea into language legible to mainstream software architects. Once Fowler’s ecosystem starts analyzing a term, the term is no longer just social-media jargon. It has entered software architecture discourse.

What Is a Harness?

The metaphor comes from a horse harness. A harness does not create the horse’s strength. It channels it. It turns raw power into directed work. Without a harness, the horse may still be strong, but strength alone does not guarantee useful motion. With a harness, the animal’s energy can be coordinated, bounded, and connected to a larger system.

In AI, the harness is the runtime environment wrapping the LLM. It includes instructions, tools, permissions, validators, memory policies, hooks, retrieval logic, sub-agent boundaries, and verification loops. If we want a compact formula, it is this:

coding agent = AI model(s) + harness

That equation is deceptively important. It means the model is only one term. A better model inside a bad harness may lose to a slightly weaker model inside a superior harness. Once that point is accepted, engineering effort naturally shifts from “which model do we buy?” to “what environment do we build?”

The Seven Principles of Harness Engineering

The emerging literature and practice can be summarized in seven principles.

1. Environment over Model

The first principle is blunt: “It’s not a model problem. It’s a configuration problem.” This does not mean models do not matter. It means many observed failures should be diagnosed at the environment layer first. When an agent edits the wrong file, misses a repository rule, loops on noisy output, or forgets to run tests, the problem is usually not lack of intelligence in the abstract. The problem is that the environment made the wrong action too easy.

2. Deterministic Constraints

A harness should add hard edges around probabilistic behavior. Examples include custom linters, schema validation, patch-format validation, permission gates, post-edit hooks, and repository-specific policy checks. Good harnesses do not merely ask the model to behave. They make bad behavior structurally harder.

3. Progressive Disclosure

Not everything belongs in the initial context. A high-quality AGENTS.md should often be an index, not an encyclopedia. Skills, docs, and rules should load on demand. This keeps the working set smaller and reduces context dilution. In classical systems terms, this resembles lazy loading and hierarchical lookup rather than eager monolithic configuration.

4. Back-Pressure

A serious harness forces the agent to face the consequences of its own actions. Tests, type checks, builds, and validators are forms of back-pressure: they push reality back into the reasoning loop. Without back-pressure, the model can remain internally coherent while being externally wrong. With back-pressure, reality becomes part of the control system.

5. Context Firewall

Sub-agents are not only about parallelism. They are also about context isolation. A specialized sub-agent can work inside a narrow task boundary, preventing unrelated transcript drift from contaminating its reasoning. This is why the context-firewall idea matters so much. It limits context rot, reduces distraction, and makes specialization cheaper.

6. Knowledge as Code

If a rule matters, it should live in the repository. If it is not in the repo, it effectively does not exist for the agent. This principle treats operational knowledge the way modern engineering treats infrastructure: explicit, versioned, reviewable, and reproducible. Tribal knowledge is the enemy of reliable agents.

7. Entropy Garbage Collection

Agents learn bad habits from local history. They copy stale patterns, obsolete commands, dead conventions, and accidental structures. Harness engineering therefore requires periodic cleanup: deleting outdated examples, pruning stale rules, and removing misleading patterns that the model may imitate. In other words, agent environments accumulate entropy and need garbage collection.

Mapping the Idea to OpenCode, OMO, and Claude Code

These three systems can be understood as three different answers to the harness question.

Claude Code represents a closed harness with deep model-harness coupling. This is its biggest architectural strength. Claude Opus is not merely exposed through a general API shell; it is deeply aligned with the Claude Code environment, including workflow assumptions, tool schemas, memory structures, permission logic, and safety rails. In the strongest version of this pattern, the model is effectively post-trained against the harness. That means the model does not just know how to code; it knows how to behave inside this exact runtime. Claude Code also appears to have the most mature hooks-and-skills discipline among the three, especially in the way persistent instructions, memory, and operational workflows interact.

OpenCode represents an open harness. Its strength lies in adaptability. Rather than tightly fusing one model with one runtime, it builds a general agent shell that can work across multiple model providers and exposes extension surfaces such as plugins, tool registries, and composable configuration. This makes OpenCode especially important for harness engineering as a field, because it turns the harness itself into an object that users can inspect and modify. It is less vertically integrated than Claude Code, but more transparent and hackable.

Oh-My-OpenCode (OMO) is best understood as a harness configuration layer on top of OpenCode. In practice, it is one of the clearest implementations of HaaS. OMO does not invent the entire runtime from scratch; instead, it aggressively engineers the runtime through orchestration, hooks, prompt assembly, skill loading, rule injection, wisdom accumulation, and agent specialization. That is why OMO lowers the barrier to harness engineering. It packages many harness ideas that would otherwise require significant custom work.

This is also why OMO matters disproportionately in any comparison. It demonstrates that a large fraction of agent performance and usability can be created above the base model and above the base host, in the harness layer itself.

Why the Term Matters

The strategic shift from “scaffolding” to “harness engineering” is not cosmetic. “Scaffolding” tells us that the model is not enough. “Harness engineering” tells us what to do next.

It tells teams to stop treating agent errors as isolated incidents and start treating them as design opportunities. It tells them to convert soft advice into deterministic mechanisms. It tells them to formalize knowledge, constrain entropy, isolate context, and instrument feedback loops. Most importantly, it reframes the core product question. The question is no longer merely: how smart is the model? The question becomes: how well does the system transform model intelligence into repeatable engineering work?

That is the deeper lesson of this era. Frontier models provide raw cognitive power. Harness engineering decides whether that power becomes reliable software production.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 1 — The Evolution of Coding Agents
Token Usage: ~7,100 tokens

1.4 The Three Systems Under Study

This book focuses on three systems because together they expose the most important design space in modern coding agents: an open foundation, a high-complexity orchestration layer, and a commercial vertically integrated product. Those three systems are OpenCode, Oh-My-OpenCode, and Claude Code.

They should not be treated as interchangeable products that happen to share a category label. They represent different architectural bets about how an AI coding agent should be built, extended, governed, and deployed. OpenCode asks what a general-purpose open agent substrate should look like. Oh-My-OpenCode asks how far that substrate can be pushed through plugins, hooks, and multi-agent orchestration. Claude Code asks what happens when a model vendor integrates agent design, safety systems, and commercial delivery into one stack.

OpenCode: The Open-Source Foundation

OpenCode is the foundational system in this comparison. It is an open-source TypeScript and Bun codebase that functions less like a narrow assistant and more like an extensible agent platform. Its importance lies not only in what it can do directly, but in what it makes possible for downstream systems such as Oh-My-OpenCode.

Architecturally, OpenCode is built around a provider abstraction layer, a session engine, a tool system, and a plugin surface. The provider layer is especially significant because it supports a broad set of model backends through Vercel AI SDK integration and related adapters. In practice, that means OpenCode can work with more than twenty providers or provider-like endpoints, including Anthropic, OpenAI, Google, AWS Bedrock, Azure OpenAI, Ollama, LM Studio, vLLM, OpenRouter, and others. This immediately places it on the model-agnostic side of the market divide discussed in the previous section.

That model agnosticism is not an incidental feature. It changes the product’s strategic role. OpenCode is not primarily trying to force the user into one vendor’s model economics or one hosted stack. Instead, it acts as a coordination layer above models. This makes it especially relevant for engineers who care about portability, experimentation, hybrid local/cloud setups, or minimizing supplier dependence.

OpenCode is also notable for its interface breadth. It is not limited to one terminal binary. The inspected repository shows a multi-interface design spanning CLI or TUI usage, web surfaces, desktop packaging, SDK-based programmatic embedding, and ACP, an agent communication protocol. That range matters because it suggests OpenCode is not merely a command-line tool; it is an attempt at a reusable agent runtime that can appear in multiple developer contexts.

The tool system is correspondingly broad. The repository inspection identifies roughly two dozen core tools, including file reading, editing, bash execution, grep, globbing, LSP-backed operations, patch application, and web retrieval. In an agentic system, this tool layer is where the abstract model becomes an operational developer. OpenCode therefore deserves attention not as a UI product alone, but as a host environment for controlled software work.

From the perspective of this book, OpenCode is best understood as the foundation layer: open, extensible, provider-agnostic, and deliberately shaped to support richer orchestration above it.

Oh-My-OpenCode: The Orchestration Layer

If OpenCode is the foundation, Oh-My-OpenCode is the argument that foundation alone is not enough. Oh-My-OpenCode, or OMO, is not a fork in the usual sense. It is a large plugin-based orchestration layer built on top of OpenCode. That relationship is central. OMO does not replace the host system; it composes with it.

The scale of that composition is one of the most striking facts in this comparison. Repository inspection shows approximately 1,134 TypeScript files and roughly 129,754 lines of code in the OMO codebase. That is a very large amount of logic to implement as an extension layer. It means OMO is not a thin customization package. It is effectively a second system living inside the extension surfaces of the first.

This is why OMO deserves special attention from senior engineers. It demonstrates that the architecture of the host matters enormously. A host with shallow extension points can only support cosmetic customization. A host with deep, well-placed hooks can support an entire orchestration framework. OMO exists because OpenCode’s architecture is extensible enough to host it.

OMO’s design philosophy is notably more interventionist than that of simpler assistants. One of its explicit ideas is that human intervention is often treated as a failure signal. In other words, if the user has to step in frequently, the system interprets that as evidence that the orchestration layer should have handled more of the process itself. Whether one agrees with that philosophy in all settings, it is a clear agentic stance: reduce dependency on manual steering.

Repository inspection shows OMO organized around agents, hooks, tools, configuration, and feature modules. Public descriptions of OMO often emphasize large counts such as 11 agents and 41 hooks. In the inspected repository snapshot used for this book, the verifiable built-in counts were lower: 8 builtin agents and 37 hook directories. That discrepancy is itself informative. It reminds us that agent systems evolve quickly, marketing numbers drift, and source inspection is often more reliable than secondary summaries.

Even with the conservative repository counts, OMO is still a sophisticated multi-agent orchestration system. The inspected built-in agents include roles such as Sisyphus, Hephaestus, Oracle, Librarian, Explore, Metis, Momus, and Multimodal Looker. These names reflect specialization. Instead of assuming one monolithic agent prompt can do all work equally well, OMO partitions responsibilities across role-shaped agents with different prompts and behavior patterns.

The hook system is just as important. OMO multiplexes a large semantic hook layer onto a much smaller host hook surface. This is a powerful design pattern. Rather than demanding that the host expose dozens of lifecycle events directly, OMO uses the host’s core hooks as insertion points and builds its own higher-level execution structure on top. That gives OMO a distinct identity while still remaining a plugin.

Another notable aspect is compatibility strategy. The inspection documents indicate that OMO includes loaders for Claude Code-style agents, commands, plugins, and MCP configuration paths. That is not merely a convenience feature. It is a migration argument. OMO is effectively saying that an open system should lower switching cost for users coming from a commercial one.

In short, OMO is the orchestration thesis in code form: tools are not enough, one generic agent is not enough, and the real leverage lies in structured coordination.

Claude Code: The Commercial Vertical Stack

Claude Code represents a different path. Where OpenCode emphasizes openness and OMO emphasizes orchestration on top of openness, Claude Code emphasizes a commercially integrated stack from Anthropic. It is also built in TypeScript and Bun, but the similarities with OpenCode end quickly once one looks at product posture.

Claude Code is designed as a commercial coding agent tied closely to Anthropic’s model ecosystem, safety posture, and product packaging. The inspected local codebase is large, with roughly two thousand TypeScript files and more than half a million lines of source. That alone signals a broad product surface. But the more important observation is architectural integration.

Claude Code includes a substantial tool surface, with inspection indicating roughly 56 core tools and more than 100 commands, specifically about 126 slash commands in the local snapshot. This points to a product optimized not merely for raw model interaction, but for structured workflows. Commands are a form of operational affordance. They compress repeated interaction patterns into discoverable entry points. A large command system often indicates maturity in user workflow design.

Claude Code also appears more enterprise-ready by construction. The inspected notes point to commercial-grade permissioning, cost tracking, service-layer organization, plugin support, hooks, skills, and background task systems. One especially important detail is the presence of an ML-based permission classifier. This is a deeply commercial feature. It suggests the system is trying to reduce unnecessary prompts without giving up safety boundaries, which is exactly the kind of tradeoff that becomes critical when autonomous tools are deployed in real teams.

In a traditional tool, permissions are often rule-based and static. In an agentic tool, static rules alone can become too blunt. Ask for permission too often and the product becomes unusable; ask too rarely and it becomes unsafe. A learned permission classifier is therefore not just an implementation detail. It is a sign that Claude Code is investing in operational usability at scale.

Claude Code also differs from the two open systems in product ownership. Anthropic controls the model roadmap, the agent product, and much of the surrounding commercial surface. This allows tighter optimization but also increases vendor coupling. From a buyer’s perspective, that may be an advantage or a drawback depending on priorities. For teams that want a supported, integrated experience, it is attractive. For teams that prioritize modifiability and provider independence, it is constraining.

Relationship Among the Three

The relationship between these systems can be stated simply.

OpenCode is the foundation.
Oh-My-OpenCode is an orchestration layer atop that foundation.
Claude Code is an independent commercial path.

But that simple summary hides an important structural lesson. OpenCode and OMO are compositionally related. Claude Code is not downstream of either. It is a separate line of evolution. This matters because it lets us compare two different ways of reaching advanced coding-agent behavior.

One path is open substrate plus extension. That is OpenCode plus OMO. The other path is vertically integrated commercial development. That is Claude Code.

These paths produce different kinds of strengths. The open path tends to optimize for transparency, portability, and extensibility. The commercial path tends to optimize for product polish, integrated safety, and enterprise packaging. Neither path is automatically superior. The deeper question is what tradeoffs each path makes, and what those tradeoffs teach us about designing the next generation of coding agents.

Why These Three Matter Together

Taken together, these systems form a particularly useful study set.

OpenCode shows how to design a reusable open agent substrate. OMO shows how far orchestration, specialization, and hook-driven extension can go without forking the host. Claude Code shows what a commercially optimized agent looks like when the model vendor itself owns the product surface.

That combination is analytically powerful because it prevents a shallow debate. We are not comparing three copies of the same product with different branding. We are comparing three architectural positions in the market.

This is also why the rest of the book alternates between common patterns and distinctive choices. All three systems participate in the same broader era of agentic coding. All three use tools, context assembly, and some form of ReAct-like loop. But they diverge sharply in provider strategy, extensibility philosophy, orchestration depth, and safety implementation.

Those divergences are the real subject of the book. The systems matter not only as products, but as design arguments.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 2 — The Three Systems at a Glance
Token Usage: ~10,500 input + ~1,700 output

2.1 OpenCode: The Open-Source Foundation

OpenCode is the substrate in this three-system comparison. If Claude Code represents a polished commercial product and Oh-My-OpenCode represents an aggressive orchestration layer, OpenCode is the underlying open engine: modular, type-heavy, multi-interface, and intentionally extensible. Its importance is not just that it is open source. Its importance is that it exposes enough internal surface area for a third party to build an entire higher-order agent system inside it without forking the host runtime.

At the repository root, https://github.com/anomalyco/opencode/tree/main/package.json defines the baseline stack: TypeScript 5.8.2, Bun 1.3.10, and Turbo 2.8.13. That combination matters. TypeScript gives strong compile-time guarantees across the codebase, Bun reduces dependency on Node-oriented tooling overhead and accelerates local execution, and Turbo turns the repository into a scalable monorepo rather than a loose collection of packages. The monorepo orchestration itself is declared in https://github.com/anomalyco/opencode/tree/main/turbo.json, while workspaces are managed through Bun at the root package layer.

The framework choices are also highly revealing. OpenCode is built around Vercel AI SDK for LLM abstraction, Hono for HTTP serving, Drizzle ORM for SQLite persistence, Zod for schema validation, and Solid.js for both terminal and web UI layers. This is not an arbitrary collection. Each library corresponds to a specific architectural value. Vercel AI SDK normalizes access to more than twenty providers through a shared streaming interface. Hono keeps the server layer lean. Drizzle preserves static typing at the persistence boundary. Zod makes runtime validation explicit. Solid.js optimizes for reactive UI without the full weight of a conventional React stack.

The package layout shows OpenCode’s intended breadth. The central runtime lives in https://github.com/anomalyco/opencode/tree/main/packages/opencode. The browser-facing application is in packages/app. The desktop wrapper is in packages/desktop, where Tauri and Rust provide native packaging. The JavaScript SDK is in packages/sdk/js, and the plugin-facing interface is anchored in packages/plugin/src/index.ts. Together, these packages make OpenCode less like a single CLI and more like an agent platform with multiple access surfaces.

Those surfaces are unusually broad for an open coding agent. OpenCode supports a classic CLI, a reactive TUI built with Solid.js terminal components under /packages/opencode/src/cli/cmd/tui/, a web app built with Solid.js, Tailwind, and Vite under /packages/app, a desktop application through Tauri under /packages/desktop, and an HTTP API implemented with Hono under /packages/opencode/src/server/. This multi-interface architecture is strategically important: the execution core is not tightly coupled to one presentation layer. In systems terminology, the UI is a client of the agent runtime, not the runtime itself.

OpenCode’s internal organization relies heavily on what can be called a namespace organization pattern. Instead of flattening behavior into ad hoc modules, the codebase groups concepts into stable domains such as Agent, Tool, Session, Provider, Plugin, and Bus. For senior engineers, this matters because it reduces conceptual entropy. The code starts to read like an operating system for agent workflows rather than a chat app with utilities attached.

One of the best examples of this philosophy is instance management. In https://github.com/anomalyco/opencode/tree/main/packages/opencode/src/project/instance.ts, OpenCode implements instance-based context management. An “instance” is not just a singleton service; it is an execution context bound to project state, cached resources, and lifecycle events. This design makes multi-project operation practical and lets the system treat project scope as a first-class runtime boundary.

Eventing is equally deliberate. The bus system under /packages/opencode/src/bus/ uses typed event definitions rather than stringly typed broadcast calls. That makes the event bus effectively type-safe, a useful property in a long-running agent system where hidden event contracts can otherwise rot quickly. In conventional software engineering terms, OpenCode uses a message bus to decouple subsystems without surrendering all compile-time guarantees.

Zod appears throughout the system, not only at external interfaces but inside core architecture. Tool schemas, config schemas, message structures, and plugin contracts all flow through explicit validation. This matters for agentic software because LLM-adjacent systems constantly cross the boundary between structured and unstructured data. Without validation, silent schema drift becomes a correctness bug. OpenCode’s use of Zod turns those boundaries into checked gates.

The built-in agent roster is split between user-facing agents and internal utility agents. In https://github.com/anomalyco/opencode/tree/main/packages/opencode/src/agent/agent.ts, the repository defines seven built-in agents: build, plan, general, explore, compaction, title, and summary. For everyday use, four are the most important. build is the main execution agent. plan is a read-only planner. general is a broader utility executor. explore is a read-focused codebase investigator. The remaining three are infrastructure agents used for session management and summarization.

The tool system is also mature rather than minimal. Under https://github.com/anomalyco/opencode/tree/main/packages/opencode/src/tool/, OpenCode ships roughly 22 built-in tools, including Bash, Read, Write, Edit, Glob, Grep, LSP, Task, TodoWrite, TodoRead, WebFetch, WebSearch, ApplyPatch, MultiEdit, and plan-mode tools. The important part is not only count but execution model. In /packages/opencode/src/tool/tool.ts, tool execution receives rich contextual input: sessionID, messageID, agent identity, abort signal, message history, permission callbacks, and metadata channels. This is a good example of agent-computer interface design: tools are not dumb functions; they are runtime-aware capabilities.

Permissions are part of this design. Tools are permission-aware, meaning the runtime can reason about safety at invocation time rather than only trusting the prompt layer. That is a subtle but important distinction. Prompt instructions can guide. Permission systems can enforce.

Persistence is another core strength. Sessions are stored in SQLite through Drizzle ORM, with the schema implemented in https://github.com/anomalyco/opencode/tree/main/packages/opencode/src/session/session.sql.ts. The underlying message abstraction is not a flat chat log. In /packages/opencode/src/session/message-v2.ts, MessageV2 is defined as a multi-part structure containing parts such as text, reasoning, tool, file, snapshot, and patch. From a CS perspective, this is a structured log format rather than plain transcript text. That distinction is crucial because it lets the runtime reconstruct not only what the agent said, but what it inspected, changed, and reasoned about.

OpenCode’s protocol support further extends its role as an integration platform. Under /packages/opencode/src/mcp/, it implements a full MCP client with stdio, SSE, and HTTP transports, plus OAuth-related files such as auth.ts, oauth-callback.ts, and oauth-provider.ts. It also includes ACP, the Agent Client Protocol, under /packages/opencode/src/acp/, which supports editor integrations such as Zed. MCP gives it a universal capability bridge. ACP gives it a structured client bridge.

The plugin model is one of the clearest reasons OpenCode matters historically. In https://github.com/anomalyco/opencode/tree/main/packages/plugin/src/index.ts, plugins are modeled as async functions returning hook implementations such as auth, event, tool, chat.params, and chat.headers, along with more advanced lifecycle points. This is a powerful pattern because it preserves laziness: plugins can initialize resources on demand, then expose only the hooks they need. In effect, the host exports a stable kernel and plugins register interrupt handlers.

Finally, OpenCode is packaged for multiple deployment channels. The project supports npm, Homebrew, Scoop, and Pacman, and infrastructure configuration exists through SST in https://github.com/anomalyco/opencode/tree/main/sst.config.ts and the infra/ directory for cloud deployment. That deployment story matters because it signals operational seriousness. OpenCode is not only a hackable local tool. It is designed to run as distributed infrastructure.

In summary, OpenCode earns the label “open-source foundation” not merely because the code is visible, but because the architecture is composable. It has a strong type system, a broad interface surface, a structured session model, real protocol support, and a plugin system powerful enough to host an entire orchestration layer. That makes it the most important baseline in this comparison: not the most autonomous system, not the most commercialized one, but the one that exposes the deepest architectural seams.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 2 — The Three Systems at a Glance
Token Usage: ~11,000 input + ~1,900 output

2.2 Oh-My-OpenCode: The Orchestration Revolution

If OpenCode is the engine, Oh-My-OpenCode is the attempt to turn that engine into an autonomous engineering organization. The most important fact about the project is architectural, not cosmetic: it is built as a plugin that runs inside OpenCode, not as a fork that replaces it. That single decision proves the expressive power of the host platform and also explains why Oh-My-OpenCode can move so quickly. It inherits OpenCode’s interfaces, session model, tools, and plugin contracts, then layers a far more opinionated control plane on top.

The scale of the plugin is already a statement. Repository analysis shows roughly 1,134 TypeScript files and about 129,754 lines of code and related TypeScript surface area when tests and surrounding code are counted at full project scale, making it less a “plugin” in the casual sense and more a resident subsystem. The main integration point is https://github.com/code-yeongyu/oh-my-openagent/tree/main/src/index.ts, while /src/plugin-interface.ts and /src/create-hooks.ts map OpenCode hook points into OMO’s own orchestration mechanisms.

The philosophy is explicit in https://github.com/code-yeongyu/oh-my-openagent/tree/main/docs/ultrawork-manifesto.md. The Ultrawork Manifesto treats human intervention as a failure signal, argues that AI-generated code should be indistinguishable from senior-engineer code, and evaluates token cost against productivity instead of against frugality alone. This is a very different worldview from ordinary assistant software. The implicit benchmark is not “Did the model answer correctly?” but “Did the system absorb enough operational burden that the human mostly supplied intent?”

Technically, that worldview materializes as a three-layer orchestration architecture, documented in https://github.com/code-yeongyu/oh-my-openagent/tree/main/docs/guide/understanding-orchestration-system.md. The first layer is Planning, where Prometheus, Metis, and Momus operate. The second layer is Execution, coordinated by Atlas. The third layer is Workers, where specialized agents actually perform the task. In classical distributed-systems language, this is a planner-conductor-worker pipeline. In agent language, it is a deliberate attempt to replace one large generic prompt with a controlled society of smaller roles.

The agent roster is unusually specific, both in behavior and model allocation. OMO’s documented set includes Sisyphus as the main orchestrator on Claude Opus 4.6, Hephaestus on GPT 5.3 Codex as a deep worker, Oracle on GPT 5.2 as a read-only consultant, Librarian on GLM-4.7 for external search, Explore on Grok Code Fast 1 for rapid codebase search, Sisyphus-Junior on Sonnet 4.5 as a focused executor, Prometheus on Opus 4.6 as planner, Metis on Opus 4.6 as pre-planning analyst, Momus on GPT 5.2 as plan reviewer, and Atlas as orchestration conductor. This list is rooted in files such as /src/config/schema/agent-names.ts, /src/agents/builtin-agents.ts, and the feature documentation under /docs/features.md.

The design choice here is worth underlining. OMO does not merely diversify prompts. It diversifies model assignments by role. That is a practical answer to an old systems problem: specialized components should not all be optimized for the same constraint. Some tasks need broad reasoning, some need cheap iteration, some need external retrieval, and some need visual processing. OMO treats model selection as a scheduling problem rather than a branding problem.

This is why its semantic category system matters. In https://github.com/code-yeongyu/oh-my-openagent/tree/main/src/config/schema/categories.ts, categories such as visual-engineering, ultrabrain, quick, and deep describe intent, not explicit model names. The system then resolves those intents to models, variants, temperatures, effort levels, and tool policies. The rationale is subtle but important: if users or higher-level prompts always choose models directly, the system inherits model self-perception bias and configuration sprawl. By routing through semantic categories, OMO decouples task semantics from backend assignment.

Hooking is where OMO becomes revolutionary. The plugin defines 41 active hooks across 5 tiers, with named hook inventory living in https://github.com/code-yeongyu/oh-my-openagent/tree/main/src/config/schema/hooks.ts and orchestration wiring in /src/create-hooks.ts. The five major layers are commonly described as Session, Tool-Guard, Transform, Continuation, and Skill. Concretely, this means OMO can intercept or modify context before model calls, guard tools before and after execution, enforce continuation logic, attach skill-specific behavior, and inject session-aware control logic. This is no longer simple prompt customization. It is runtime policy programming.

Several innovations emerge from this hook-heavy design.

First is Ultrawork mode, usually triggered by keywords and special control flow. Its goal is to suppress half-finished assistant behavior and push the system toward end-to-end completion. Second is the Ralph Loop, implemented under /src/hooks/ralph-loop/, which keeps the agent in a relentless continuation cycle until the task is genuinely complete. Third is the Todo Continuation Enforcer under /src/hooks/todo-continuation-enforcer/, which prevents the agent from stopping while todos remain incomplete. In software engineering terms, this is a liveness constraint: the runtime tries to guarantee forward progress instead of relying on polite prompt instructions.

Fourth is Wisdom Accumulation, implemented around .sisyphus/notepads/ and related support code such as /src/hooks/sisyphus-junior-notepad/. This mechanism stores learnings, decisions, verification outcomes, and issues discovered during execution. In CS textbook terms, this is a derived memory layer: not raw transcript replay, but distilled operational state. It reduces repeated mistakes across subtasks and gives the orchestrator something closer to working memory rather than mere conversation history.

Fifth is the Tmux visual multi-agent system under /src/features/tmux-subagent/. Instead of hiding concurrency behind a single terminal transcript, OMO can expose multiple agent panes in real time. This changes the ergonomics of trust. The user can literally watch agents work in parallel, which turns orchestration from an invisible scheduler into an observable process.

Another major feature is skill-embedded MCPs. Rather than treating skills as static prompt fragments, OMO allows skills to carry MCP services with them. This is implemented through features such as /src/features/skill-mcp-manager/ and the OpenCode skill loader integration. The consequence is architectural convergence: documentation, prompt logic, and external tool connectivity can travel as one unit.

Background agents are also treated as first-class citizens. Under https://github.com/code-yeongyu/oh-my-openagent/tree/main/src/features/background-agent/manager.ts and /src/features/background-agent/concurrency.ts, OMO manages session-spawned background tasks with a concurrency limit of 5 concurrent agents per model/provider by default. Parent sessions receive completion notifications, and child sessions can be resumed through session IDs. This is an important distinction from simple subagent invocation: OMO treats asynchronous delegation as durable work, not as a hidden nested call.

Finally, OMO’s Claude Code compatibility layer is strategically brilliant. It can load Claude Code plugins, commands, agents, and MCP definitions from .opencode/-style compatibility directories and .mcp.json files, via loaders such as /src/features/claude-code-plugin-loader/loader.ts, /src/features/claude-code-agent-loader/loader.ts, /src/features/claude-code-command-loader/loader.ts, and /src/features/claude-code-mcp-loader/loader.ts. This lowers migration cost and positions OMO not as a closed ecosystem but as a compatibility super-layer.

The net effect is striking. OpenCode gives a host runtime. OMO turns that runtime into a managed labor system with planning, review, delegation, memory, observability, and continuity enforcement. Its greatest strength is not any single agent or hook. It is the fact that it reconceives a coding assistant as an organization whose failures should be routed into orchestration logic rather than dumped onto the user. That is why “orchestration revolution” is not marketing language here. It is a fairly literal description of what the software is trying to do.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 2 — The Three Systems at a Glance
Token Usage: ~10,800 input + ~1,850 output

2.3 Claude Code: The Commercial Benchmark

Claude Code is the commercial reference point in this comparison because it demonstrates what happens when an AI coding agent is pushed beyond “useful developer tool” into “managed product surface.” It is not merely a model shell. It is a full CLI environment with strong permissions, extensive tool coverage, cost accounting, compaction strategies, team-oriented memory, and multiple operating modes. If OpenCode is a platform kernel and Oh-My-OpenCode is an orchestration experiment, Claude Code is the benchmark for productization.

At the stack level, Claude Code is built in TypeScript and runs on Bun, with its main CLI entry point in https://github.com/anthropics/claude-code/tree/main/src/entrypoints/cli.tsx. That file already reveals a product mindset. It contains feature-gated fast paths, aggressive startup branching, and lazy module loading so that trivial operations like version reporting do not pay the cost of booting the full application. This is a minor detail only if one has never shipped a CLI at scale. For a tool that users invoke dozens or hundreds of times per week, startup latency is part of the user experience contract.

The user interface is far more ambitious than a minimal terminal wrapper. Claude Code contains a custom Ink-based terminal UI implementation, with approximately 100 files under src/ink/ and a broad component tree extending across roughly 148 component-oriented directories and substructures when the wider UI and command surfaces are included. The implementation uses React for terminal rendering but does not stop at stock Ink primitives. The repository includes layout, rendering, focus, cursor, selection, wrapping, and bidi support, indicating that terminal UX is treated as an application platform rather than as plain text streaming.

From a capability perspective, the tool inventory is one of Claude Code’s clearest differentiators. The repository contains 61 built-in tool directories under https://github.com/anthropics/claude-code/tree/main/src/tools/, representing more than seventy concrete tool classes. Representative tools include BashTool, FileReadTool, FileWriteTool, FileEditTool, WebFetchTool, WebSearchTool, browser-related tooling, MCPTool, AgentTool, task tools such as TaskCreateTool, TaskGetTool, TaskListTool, and TaskOutputTool, SendMessageTool, and LSPTool. The breadth matters because it reduces the need to encode workflow as prompt tricks. Instead, more operations become explicit capabilities.

The permission model is arguably Claude Code’s strongest architectural feature. It exposes four main user-facing modes in common discussion—default, auto, bypass, and plan—while internal types include additional states such as acceptEdits and dontAsk, as seen in https://github.com/anthropics/claude-code/tree/main/src/types/permissions.ts. More important than the labels is the layered enforcement approach. There is an ML-based YOLO classifier in /src/utils/permissions/yoloClassifier.ts, a bash command safety classifier in /src/utils/permissions/bashClassifier.ts, dangerous pattern detection in /src/utils/permissions/dangerousPatterns.ts, and OS-level sandboxing via bubblewrap- or Seatbelt-style controls mediated through /src/utils/sandbox/sandbox-adapter.ts. The often-cited effect is roughly 84% permission prompt reduction, not because the product abandons safety, but because it shifts more decisions into policy engines and classifiers.

This is an instructive software engineering move. In simpler agents, permissioning is binary and reactive: ask the user or do nothing. In Claude Code, permissioning is predictive, risk-weighted, and mode-sensitive. The result is lower friction without fully surrendering control. In systems terms, Claude Code optimizes both safety and throughput by introducing a smarter approval plane.

Session handling is similarly productized. Instead of SQLite, Claude Code persists conversations in JSONL under ~/.claude/sessions/, managed by code such as https://github.com/anthropics/claude-code/tree/main/src/utils/sessionStorage.ts and /src/utils/sessionRestore.ts. The core transcript abstraction is based on a TranscriptMessage-style log, and the product supports operational features such as resume, compact, and snip. JSONL is a notable choice. It is simpler and more inspectable than a relational store, easier to append to, and better aligned with transcript-like event streams. The tradeoff is weaker ad hoc query structure compared to a database-backed message graph.

On the model side, Claude Code prioritizes Anthropic first-party access while also supporting enterprise deployment paths such as AWS Bedrock, Google Vertex, and Foundry-style provider indirection, visible in files such as /src/utils/model/model.ts, /src/utils/model/providers.ts, and /src/utils/model/bedrock.ts. The model resolution process is not a single constant; it is a priority chain with capability checks, aliases, and environment-aware fallback logic. This is what one expects in a commercial system that must operate across direct SaaS usage and enterprise cloud boundaries.

Its MCP implementation is also unusually complete. The client in https://github.com/anthropics/claude-code/tree/main/src/services/mcp/client.ts is approximately 3,348 lines long and implements a broad MCP 1.0 feature set including official registry interactions, OAuth, elicitation, permissions, and in-process transport support. This is significant because many products “support MCP” only at the level of launching a few stdio servers. Claude Code treats MCP as a serious extension substrate.

The command system is another marker of maturity. Under https://github.com/anthropics/claude-code/tree/main/src/commands/, the repository contains 100+ slash commands—in this snapshot, about 112 command directories—covering prompt-oriented commands, direct actions, and interactive workflows. This moves Claude Code closer to a shell environment or editor command palette than a single-purpose chat agent.

Commercialization also shows up in money handling. In https://github.com/anthropics/claude-code/tree/main/src/cost-tracker.ts, Claude Code performs built-in USD cost calculation, tracks model usage, and supports per-session budgeting through concepts such as maxBudgetUsd. It also tracks cache read and cache write token behavior. This is not just billing trivia. Cost telemetry is part of control theory for agent systems. Once an agent can reason, browse, delegate, and compact, cost becomes an operational metric akin to CPU time.

Project memory is richer than many open systems. Claude Code supports multiple memory types—commonly described as user, feedback, project, and reference—and extends into team memory sharing via files under /src/memdir/ and related team memory utilities. In other words, the product treats memory not as one notebook but as a typed hierarchy. That is a sign of organizational rather than purely personal usage.

Compaction is also multi-strategy rather than monolithic. The repository includes auto-compact, snip-compact, micro-compact, session memory compact, and broader context-collapse logic under /src/services/compact/. This is a hard-won product lesson. There is no single best compression strategy for every conversation. Sometimes one needs summarization. Sometimes one needs selective deletion. Sometimes one needs token-level condensation. Claude Code encodes that pluralism directly into the runtime.

Enterprise support rounds out the picture: remote managed settings, team memory, policy-aware limits, and a plugin marketplace orientation. But perhaps the most interesting aspect is that Claude Code is not truly single-agent anymore. Through AgentTool, background DreamTasks-style task execution, and SendMessageTool for inter-agent communication, it supports a practical multi-agent architecture with isolated contexts. It also exposes higher-level modes such as Coordinator mode, Bridge mode for remote control and IDE-style interaction, and Assistant mode under the KAIROS family of functionality.

That said, Claude Code’s strength comes with a tradeoff. It exposes enormous capability, but through a more curated surface than OpenCode. It supports orchestration, but in a more controlled and product-safe way than OMO. It is therefore the commercial benchmark precisely because it is optimized for the difficult middle: enough autonomy to be powerful, enough safety to be deployable, and enough product structure to be governable inside organizations.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 2 — The Three Systems at a Glance
Token Usage: ~9,200 input + ~1,450 output

2.4 Quick Comparison Matrix

The three systems can be understood as occupying different points in the design space of AI coding agents. OpenCode is the extensible open runtime. Oh-My-OpenCode is the orchestration super-layer built inside that runtime. Claude Code is the commercial benchmark with stronger product controls. The table below compresses the key differences.

Dimension	OpenCode	Oh-My-OpenCode	Claude Code
Language	TypeScript 5.8.2	TypeScript	TypeScript / TSX
Runtime	Bun 1.3.10	Runs inside OpenCode as plugin	Bun
Architecture type	Open monorepo agent platform	Multi-agent orchestration layer on host runtime	Commercial CLI agent platform
Model support	20+ providers via Vercel AI SDK in `https://github.com/anomalyco/opencode/tree/main/packages/opencode/src/provider/provider.ts`	Multi-model role routing via semantic categories in `https://github.com/code-yeongyu/oh-my-openagent/tree/main/src/config/schema/categories.ts`	Anthropic first-party plus Bedrock, Vertex, Foundry-style provider chain in `https://github.com/anthropics/claude-code/tree/main/src/utils/model/`
Agent count	7 built-ins total; 4 user-facing core agents (`build`, `plan`, `general`, `explore`)	10+ named orchestration agents plus Sisyphus-Junior and multimodal roles	Subagent support via AgentTool; agent count is product-defined rather than fixed roster
Tool count	~22 built-in tools under `https://github.com/anomalyco/opencode/tree/main/packages/opencode/src/tool/`	26+ custom tools layered atop OpenCode tools	61 built-in tool directories under `https://github.com/anthropics/claude-code/tree/main/src/tools/`
Hook system	Plugin hooks in `/packages/plugin/src/index.ts`; rich host lifecycle	41 hooks across 5 tiers in `/src/config/schema/hooks.ts` and `/src/create-hooks.ts`	Product hooks and command surfaces, but less open than OpenCode’s plugin kernel
UI	CLI, Solid.js TUI, Solid.js web app, Tauri desktop, HTTP API	Inherits OpenCode UI and adds tmux visual multi-agent panes	Custom Ink terminal UI with large component stack
Storage	SQLite via Drizzle ORM; MessageV2 multi-part persistence	Inherits OpenCode session base plus extra notepads and orchestration state	JSONL transcripts in `~/.claude/sessions/` with metadata
Permission model	Permission-aware tools and runtime checks	Inherits host permissions, adds tool guards and continuation constraints	Multi-mode permissions, YOLO classifier, safety classifier, dangerous pattern detection, sandboxing
MCP support	Full MCP client with stdio/SSE/HTTP and OAuth	Uses host MCP and adds skill-embedded MCPs plus compatibility loaders	Full MCP 1.0 client with registry, OAuth, elicitation, permissions, in-process transport
Context management	Instance-based context management; structured MessageV2 parts	Context injection, todo enforcement, wisdom accumulation, continuation loops	Resume, compaction, snip, micro-compact, memory-aware context collapse
Cost tracking	Basic provider abstraction; not the main differentiator	Token cost discussed philosophically as productivity tradeoff	Built-in USD cost tracking, budgeting, cache token accounting
Extensibility	Plugin SDK, JS SDK, MCP, ACP, multi-package platform	OpenCode plugin plus categories, hooks, skills, Claude Code compatibility	Commands, plugins, MCP, managed settings, enterprise-oriented extension model
Deployment	npm, brew, scoop, pacman, SST-backed infra	npm plugin deployment into OpenCode ecosystem	Official product distribution and enterprise deployment channels
License posture	Open-source foundation	Open-source orchestration plugin	Commercial / proprietary benchmark

The first point of comparison is where each system is strongest.

OpenCode excels at architectural openness. Its biggest advantage is not raw autonomy but composability. The monorepo structure under https://github.com/anomalyco/opencode/tree/main/, the SDK and plugin packages, the typed tool context, the SQLite-backed session model, and the MCP plus ACP support together make it a genuine platform. For engineers who want to understand or modify the internals of a coding agent, OpenCode is the best substrate of the three. It exposes enough structure that one can study or extend real runtime behavior rather than merely configure prompts.

Oh-My-OpenCode excels at orchestration and autonomy. Its strength is that it treats “assistant failure” as an orchestration bug to be solved. The three-layer architecture, the semantic category system, background sessions, Ralph Loop, Todo Continuation Enforcer, and .sisyphus/notepads/ wisdom accumulation all push toward a system that can sustain momentum with less human micromanagement. It is the most aggressive design in the comparison. For teams interested in multi-agent workflow design, it is the most conceptually ambitious system.

Claude Code excels at product maturity and operational control. It combines broad tool coverage, cost governance, session recovery, compaction strategies, enterprise settings, and a serious safety stack. Its permission model is the best-developed of the three, and its terminal experience is the most polished. For organizations that care about deployability, repeatability, compliance, and supportability, Claude Code defines the benchmark.

The second point of comparison is where each system is limited.

OpenCode’s limitation is that openness does not automatically produce opinionated autonomy. It provides many core ingredients, but by itself it is less forceful than OMO in planning, delegation, and continuity enforcement. A platform can be architecturally elegant while still requiring downstream products or plugins to impose workflow discipline.

Oh-My-OpenCode’s limitation is complexity. Once a system introduces many agents, many hooks, category routing, background sessions, skill-embedded MCPs, and continuation enforcement, it also introduces more failure surfaces. Every orchestrator eventually faces a classic distributed-systems problem: coordination logic can become more complicated than the work it coordinates. OMO gains power by accepting this complexity debt.

Claude Code’s limitation is reduced openness. It is highly capable, but more controlled. Extension exists, but not with the same kernel-like transparency as OpenCode. Multi-agent features exist, but within a product boundary. For researchers and systems builders, this means Claude Code is exemplary to study but less ideal as a freely moldable substrate.

This leads to the most useful high-level frame for the whole book: the tradeoff triangle of openness, safety, and autonomy.

If a system maximizes openness, it tends to resemble OpenCode. Internal seams are visible, extension is powerful, and architectural learning is easy. But openness alone does not guarantee operational safety or autonomous completion.

If a system maximizes autonomy, it starts to resemble Oh-My-OpenCode. It pushes more planning, delegation, and continuation into the machine. But greater autonomy typically increases orchestration complexity and can pressure the safety model.

If a system maximizes safety and governability, it moves toward Claude Code. More permissions, more budgeting, more compaction control, more enterprise policy. But stronger governance often means less radical openness and slower experimentation at the deepest runtime layers.

The central design lesson is not that one corner of the triangle wins absolutely. It is that every serious coding agent must choose where to sit inside it. OpenCode, OMO, and Claude Code are valuable precisely because they occupy three different but adjacent positions. Studying them together gives us not only a comparison of products, but a map of the design space itself.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 3 — The Core Loop: The ReAct Paradigm
Token Usage: ~11,500 input + ~1,650 output

3.1 The Think-Act-Observe Cycle

The deepest commonality across OpenCode, Oh-My-OpenCode (OMO), and Claude Code is not the exact model, the user interface, or even the tool inventory. It is the agent loop: the system receives an instruction, thinks about what to do next, decides whether to call a tool, observes the result, and then repeats until it can produce a final answer. This is the modern coding-agent form of the ReAct paradigm—short for Reason + Act.

graph TD
    A["👤 User Input"] --> B["📋 System Prompt Assembly"]
    B --> C["🧠 LLM Inference"]
    C --> D{Output Type?}
    D -->|"Text Only"| E["✅ Return Response"]
    D -->|"Tool Call"| F["🔧 Tool Execution"]
    F --> G["📎 Append Result to Context"]
    G --> C
    D -->|"Reasoning"| H["💭 Extended Thinking"]
    H --> C
    
    style A fill:#4a9eff,color:#fff
    style C fill:#ff6b6b,color:#fff
    style F fill:#51cf66,color:#fff
    style E fill:#ffd43b,color:#000

In classic computer science textbooks, one usually finds concepts such as control loops, event loops, finite-state machines, and search procedures. The ReAct loop is related to all of them, but it is not identical to any one of them. It is best understood as a language-model-driven control loop in which the transition policy is generated dynamically in natural language and structured tool calls rather than being fully hard-coded ahead of time.

At a high level, the shared cycle looks like this:

User request
   ↓
Assemble system prompt + conversation + tool schemas
   ↓
LLM inference begins streaming
   ↓
Emit reasoning / text / tool-call blocks
   ↓
If tool call appears: execute tool
   ↓
Append tool result to conversation state
   ↓
Ask model again with updated context
   ↓
Repeat until no more tool calls are needed
   ↓
Final answer

The same idea can be written as generic pseudocode:

messages = [system_prompt, prior_history, user_message]

while not stopped:
    response = LLM(messages, tools)

    stream response parts to UI

    if response contains tool_calls:
        for each tool_call in response:
            result = execute(tool_call)
            messages.append(tool_result(result))
        continue

    messages.append(response)
    stopped = true

return final_response

This pseudocode is deceptively simple. Most of the engineering difficulty lies in how each system represents response parts, how it streams them, how it retries failures, and how it decides when the loop should continue or stop.

From user input to loop state

The first phase is prompt assembly. Before the model can “think,” the runtime must decide what the model sees. In all three systems, this includes at least five categories of input:

a system prompt or instruction stack,
the prior conversation history,
the latest user message,
the available tool schemas,
runtime options such as model, temperature, and permission context.

In OpenCode, this assembly is visible in session/llm.ts. The LLM.stream() function gathers provider configuration, merges model and agent options, constructs the system prompt array, triggers plugin hooks such as experimental.chat.system.transform and chat.params, resolves tools, and then passes the final bundle into Vercel AI SDK’s streamText(). That means prompt assembly is not just string concatenation. It is an instruction pipeline with transforms, provider-specific options, and tool filtering.

Claude Code performs the same conceptual step in a more productized way. QueryEngine.ts owns the query lifecycle, while the much larger query.ts manages the live loop. The system prompt, user context, system context, tool permission context, and ToolUseContext are combined into the per-turn execution state. Claude Code’s loop is therefore not just “messages plus tools”; it is “messages plus tools plus policy plus budget plus UI state plus resumability metadata.”

OMO inherits OpenCode’s basic prompt-and-stream machinery, but adds orchestration hooks around it. At the plugin interface level, OMO installs tool.execute.before, tool.execute.after, and experimental.chat.messages.transform. This is important: OMO does not replace the host ReAct loop. It bends it. It intercepts tool arguments before execution, modifies tool outputs after execution, and transforms message history before the next model call.

Thinking, acting, observing

Once prompt assembly is complete, the second phase is LLM inference. Here the model begins streaming a structured response. In older chat systems, a response was mainly plain text. In coding agents, a response is often a heterogeneous stream of parts:

natural language text,
reasoning or thinking blocks,
tool invocation blocks,
tool-result follow-up context,
finish markers with usage and stop reasons.

OpenCode’s SessionProcessor in session/processor.ts shows this clearly. It iterates over stream.fullStream and handles multiple event types: reasoning-start, reasoning-delta, reasoning-end, text-start, text-delta, text-end, tool-input-start, tool-call, tool-result, tool-error, start-step, and finish-step. In other words, the OpenCode runtime treats one assistant turn as a stream of typed sub-events, not a single blob.

This is the operational meaning of ReAct in a production agent: the “think” phase and the “act” phase are not separate API requests by default. They can both appear inside one streamed assistant trajectory.

Claude Code exposes the same structure in Anthropic-native terms. query.ts collects assistant messages, inspects tool_use blocks, runs tools, gathers tool results, and feeds them back into the next loop iteration. Its state object tracks messages, toolUseContext, turnCount, recovery flags, compaction state, and transitions. The loop continues while tool-use blocks imply follow-up work. The core idea is the same as OpenCode’s, but Claude Code names and organizes the control surface differently.

OMO again inherits the host behavior, but adds extra leverage points. Its messages-transform.ts chains context injection and thinking-block validation before the next model call. Its tool-execute-before.ts can rewrite tool arguments, inject rules, or even reinterpret task delegation metadata. Its tool-execute-after.ts can truncate outputs, enrich metadata, recover from errors, or trigger reminders. So OMO’s contribution is not a new ReAct loop; it is a hook-augmented ReAct loop.

Extended thinking as a non-standard concept

One term that deserves special explanation is extended thinking, especially in Claude-family systems. This is not a standard textbook computer science term like stack, heap, parser, or scheduling policy. It is a newer agent-era concept.

The simplest way to define it is this: extended thinking is a model mode in which the system allocates explicit budget for intermediate reasoning before or alongside visible output and tool use. In implementation terms, it usually means the API and runtime preserve or budget for “thinking” blocks, often with special validity rules.

Why call this non-standard? Because in traditional CS, internal reasoning is usually modeled as algorithmic state transitions hidden inside the program. Here, the “reasoning” is partly exposed as a first-class streamed artifact. It behaves almost like an intermediate representation, but it is probabilistic, language-native, and model-generated rather than compiler-generated.

Claude Code’s query.ts includes comments about the “rules of thinking”: thinking blocks must be preserved consistently across a turn, especially when tool use occurs. OMO even includes a thinkingBlockValidator in its message-transform pipeline, which shows how fragile this structure can be. Once a system exposes reasoning as a protocol artifact, it must enforce invariants around that artifact.

So, extended thinking can be understood as a new kind of runtime-managed deliberation budget. It is not identical to chain-of-thought as discussed in research papers, and it is not identical to a classical planner either. It is closer to a managed reasoning channel inside the agent loop.

Comparing the three implementations

The three systems share the same abstract loop, but their implementation emphasis differs.

OpenCode is the clearest architectural baseline. session/processor.ts is explicit about the lifecycle of reasoning parts, text parts, tool parts, status changes, snapshot tracking, retry behavior, and compaction triggers. Combined with session/llm.ts, it presents the ReAct loop as a clean streaming state machine over Vercel AI SDK’s streamText() interface.

Claude Code implements a heavier-weight version of the same pattern. QueryEngine.ts manages persistent session state and configuration; query.ts runs the live loop, processes tool_use blocks, updates ToolUseContext, manages stop hooks, retries, compaction, token budgets, and fallback paths. The loop is more entangled with product concerns—permissions, memory extraction, prompt suggestion, streaming fallback, and resume semantics—but it is still fundamentally ReAct.

OMO is best described as a loop amplifier. It relies on OpenCode’s streaming processor, yet inserts itself at strategically powerful seams. tool.execute.before modifies intent before action. tool.execute.after modifies observation before reinsertion into context. experimental.chat.messages.transform modifies the remembered history before the next reasoning step. This makes OMO architecturally significant: it demonstrates that once a host agent exposes the right hooks, major changes in autonomous behavior can be achieved without forking the whole loop.

Why this cycle matters

The Think-Act-Observe cycle is the reason coding agents feel qualitatively different from ordinary chatbots. A chatbot can answer questions. A coding agent can enter a loop, inspect files, run commands, reflect on results, and then decide what to do next. The intelligence is therefore not only in one model completion. It is in the composition of many bounded completions plus tool-mediated state transitions.

That is also why the ReAct loop has become the canonical execution model for coding agents in 2025–2026. It is simple enough to implement, general enough to support many workflows, and extensible enough to host very different product philosophies. OpenCode embodies the open-source baseline, Claude Code the commercial control-rich version, and OMO the orchestration-heavy extension model. But under all three sits the same core idea: think, act, observe, and continue until the work is actually done.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 3 — The Core Loop: The ReAct Paradigm
Token Usage: ~11,500 input + ~1,700 output

3.2 Streaming Architecture

If the ReAct loop is the brainstem of a coding agent, then streaming is its nervous system. Without streaming, the model would think in silence, tools would run in silence, and the user would see only a delayed final result. That is unacceptable for serious coding work. Developers need to watch progress, inspect partial output, catch mistakes early, and maintain trust that the system is still alive.

Streaming architecture therefore answers a practical question: how does partial model output travel from provider to runtime to UI while preserving structure, correctness, and responsiveness? OpenCode, Claude Code, and OMO all solve this problem, but they do so in different styles.

Why streaming is harder for agents than for chat

A normal chat application can often get away with token streaming that appends plain text to a chat bubble. Coding agents are much more difficult because the stream contains multiple semantic layers at once:

visible text for the user,
invisible or semi-visible reasoning blocks,
tool-use requests that may arrive before the text is complete,
progress messages from long-running tools,
post-tool assistant continuation,
errors, retries, and fallback events.

This means agent streaming is not just a UI nicety. It is a protocol translation problem between the model’s event stream and the human-facing interface.

OpenCode: `streamText()` into structured session parts

OpenCode’s streaming design is the clearest baseline. In session/llm.ts, the runtime delegates provider interaction to Vercel AI SDK’s streamText(). This gives OpenCode a normalized streaming abstraction across multiple model providers. That choice matters because the project supports a broad provider matrix; the agent runtime wants one common stream interface even if the underlying vendors differ.

The key architectural move happens in session/processor.ts. SessionProcessor iterates over stream.fullStream and handles each event as a typed part update. Text is accumulated incrementally through text-start, text-delta, and text-end. Reasoning is accumulated with analogous reasoning-* events. Tool calls are represented as tool parts that move from pending to running to completed or error states. Step boundaries are tracked with start-step and finish-step, and usage metadata is recorded at the end of the step.

This means OpenCode does not stream raw provider events directly to the UI. It streams them into an internal message-part database model first. That model is then broadcast through the event bus. The UI layers—TUI, web, desktop, and API consumers—can progressively render the same underlying state.

Architecturally, this is elegant for two reasons.

First, it decouples provider streaming from rendering. The model stream updates session state; the UI observes session state. Second, it supports multi-part accumulation naturally. Text, reasoning, and tool blocks coexist as siblings rather than being flattened into a single mutable string.

OpenCode also adds reliability work around streaming. In provider/provider.ts, the wrapSSE() helper wraps text/event-stream responses with a read timeout. If no SSE chunk arrives within the configured window, the stream is aborted with an error. This is a concrete solution to a real agent problem: providers and proxies sometimes stall mid-stream, and a coding agent cannot wait forever without signaling failure.

Claude Code: custom streaming into an Ink terminal renderer

Claude Code takes a more vertically integrated approach. Instead of leaning on a generic multi-provider SDK abstraction plus a neutral event bus, it operates through its own query pipeline and custom terminal rendering stack.

The main loop in query.ts yields a rich stream of message objects—assistant messages, progress messages, attachment messages, tool summaries, tombstones, request-start events, and more. These are consumed by the terminal application, whose entry point sits in main.tsx and whose rendering system is built on a substantial Ink/React component tree. In practice, Claude Code progressively updates the terminal as the query loop emits new objects.

This is not simply “print each token as it arrives.” Claude Code performs structured progressive rendering. Assistant messages may be streamed and later tombstoned if a streaming fallback occurs. Tool-use blocks may be backfilled before yield so the displayed stream and transcript stay coherent. Streaming tool execution can begin while the model is still emitting later content. Boundary messages for microcompact or stop hooks can appear mid-turn. The terminal view is therefore rendering a live execution graph, not just a chat transcript.

This is where the custom Ink architecture becomes important. A large terminal UI stack allows Claude Code to treat the CLI as a full reactive application rather than a simple stdout log. The prompt input, message list, tool loaders, progress lines, warnings, dialogs, and state banners all coexist in one render tree. The user’s reference to “148 component directories” captures the scale of that investment: Claude Code’s stream is rendered inside a product-grade terminal frontend, not a thin shell.

Claude Code also contains significant streaming resilience logic. In services/api/claude.ts, the runtime detects cases where a stream ends without valid assistant content and triggers fallback behavior. It also tracks streaming stalls and records them for diagnostics. This is the same class of reliability issue OpenCode addresses with SSE timeout wrapping, but handled inside Claude Code’s own API and recovery infrastructure.

OMO: inherited streaming, altered message flow

OMO does not replace OpenCode’s basic stream transport. It inherits the host’s use of streamText(), evented session updates, and progressive rendering across OpenCode’s interfaces. But OMO changes what the stream means before it reaches the model again or before tool outputs settle into context.

The most important addition is experimental.chat.messages.transform, wired in plugin-interface.ts and implemented through plugin/messages-transform.ts. That handler invokes the context injector and thinking-block validator over the message stream representation. In other words, OMO can alter the conversation state between iterations without rewriting the lower-level streaming machinery.

The tool.execute.before and tool.execute.after hooks further modify stream-adjacent behavior. Before execution, OMO can rewrite tool arguments, inject instructions, and reinterpret task-routing data. After execution, it can truncate outputs, add metadata, repair errors, or run reminder logic. So while OpenCode’s stream carries raw tool observations, OMO can post-process those observations before they influence the next reasoning step or the rendered trace.

That distinction is subtle but powerful. OMO shows that there are two layers of streaming architecture:

the transport layer, where bytes and typed events arrive from the model, and
the semantic layer, where those events are transformed into remembered context.

OpenCode exposes both layers well enough that OMO can innovate primarily at the second layer.

The multi-part accumulation problem

All three systems must solve the problem of partial accumulation. When a model response is streaming, the runtime rarely has the whole answer at once. It may have half a sentence, the beginning of a reasoning block, or a tool-use frame whose argument object is still incomplete.

OpenCode solves this explicitly with distinct start/delta/end handlers for text and reasoning. Claude Code solves it with mutable arrays of assistant messages, toolUseBlocks, toolResults, and optional streaming tool executors. OMO inherits OpenCode’s accumulation semantics but adds transformation stages that assume the intermediate structure remains valid.

This is why streaming architecture in coding agents is closer to incremental parsing than to simple output buffering. The runtime is constantly deciding whether a partial artifact is renderable, executable, storable, or still incomplete.

Backpressure, partial tool results, and concurrent streams

Three streaming challenges deserve special attention.

1. Backpressure

Backpressure means the producer emits data faster than downstream consumers can safely process it. In agent systems, the producer may be the LLM stream, while consumers include the session store, UI renderer, transcript recorder, and tool executor. If the runtime does too much work on every token, the interface becomes sluggish or the stream falls behind.

OpenCode mitigates this by normalizing the stream into durable part updates and letting UIs subscribe through the bus. Claude Code mitigates it by routing everything through a controlled query generator and reactive renderer rather than ad hoc printing. OMO inherits the host risk but adds more hook work, which increases semantic power while also raising the chance of downstream overhead.

2. Partial tool results

A tool may finish before the model has fully finished its assistant turn, or a tool may run long enough that the user needs progress updates. Claude Code explicitly supports streaming tool execution through StreamingToolExecutor, allowing completed tool results to be surfaced while the broader turn is still in flight. OpenCode models tools as structured parts with running/completed/error states. OMO’s post-tool hooks can further rewrite completed outputs before they become stable context.

The hard part is preserving coherence. A partial tool result must remain associated with the correct tool call ID, the correct assistant turn, and the correct follow-up reasoning chain.

3. Multi-agent concurrent streams

This challenge becomes especially acute in orchestration-heavy systems. OMO supports background agents and multi-agent execution; Claude Code also has subagent and task-oriented flows. Once multiple agents can stream simultaneously, the system must separate their transcripts, correlate notifications correctly, and avoid mixing outputs into the wrong UI surface.

This is not merely a rendering issue. It is a state-isolation issue. Every concurrent stream needs its own abort signal, message history, tool execution context, and persistence strategy. Otherwise, one agent’s stream can corrupt another’s control loop.

Comparative lesson

The three systems represent three different philosophies of streaming design.

OpenCode emphasizes architectural clarity: provider stream in, typed parts out, event bus distribution, many frontends on top. It is the cleanest reference model.

Claude Code emphasizes product integration: query loop, custom streaming logic, UI-specific progressive rendering, fallback handling, and a rich terminal application. It is the most polished end-user streaming experience.

OMO emphasizes semantic intervention: inherit the transport, then transform messages and tool results to shape the next reasoning cycle. It demonstrates that the most powerful streaming customizations may happen not in the transport itself, but in the interpretation layer around it.

In short, streaming in coding agents is not only about speed. It is about maintaining a faithful, interruptible, structured representation of thought and action while the system is still in motion. That requirement is one of the major reasons modern coding agents are systems problems, not just prompt problems.

Deep Dive: Type-Safe Event Bus

To understand why OpenCode’s streaming architecture feels unusually clean, it helps to look beneath the visible stream and study the event bus. An event bus is the internal communication channel that lets one subsystem announce that something happened so other subsystems can react. In a serious agent runtime, many “departments” need this mechanism at once: the UI needs to know when new text arrives, the tool executor needs to announce when a tool starts or finishes, the MCP manager needs to signal when a remote server disconnects, the session store needs to persist state changes, and analytics or logging layers may want a copy of everything.

This is easiest to picture as a company building. Imagine a large company with many departments: reception, finance, shipping, legal, HR, and operations. A customer complaint enters at reception; finance may need to know whether a refund is required, shipping may need to stop a package, legal may need a record, and HR may need to notify the support team. In software terms, that is exactly what happens in an agent system. One event—say, “tool finished” or “session started”—may matter to many modules at the same time.

The naive design is a string broadcaster. Anyone can shout a string event name plus some arbitrary data object, and everyone else can decide whether to listen. It sounds flexible. It is also dangerous.

bus.emit("tool-done", { result: "something" })

At first glance, this seems harmless. But what does result contain? Is there also a toolName field? A success boolean? A duration? An error message? Is the event called tool-done, tool.done, tool_complete, or tool-finished? The receiver does not know. A typo in the event name may silently cause nothing to happen. A sender may rename a field and the listener may keep compiling because both sides are just using loose strings and plain objects. A new developer joining the team has no reliable map of which events even exist. The system works like a construction-site walkie-talkie: people shout whatever seems useful, often loudly, often informally, and sometimes nobody is sure who heard what.

That is acceptable for a toy chat demo. It is disastrous for an agent runtime, where many independent parts are coordinating in real time.

The opposite design is more like aviation radio. In aviation, communication is not casual. A radio message has a strict pattern: who is speaking, where they are, what altitude they are at, and what they intend to do. If the format is wrong or incomplete, air traffic control may refuse to acknowledge it because ambiguity in a high-speed system is dangerous. That is the spirit of a type-safe event bus. Every event has a defined shape. The event name is not a free-form string floating around the codebase. The required fields are known in advance. If you send the wrong structure, the compiler and validator complain.

This is close to how OpenCode approaches internal events. Instead of treating events as vague string labels, it defines them with Zod schemas. Zod is a TypeScript-first schema library: it lets developers describe what a valid object must look like, then use that description both for compile-time typing and runtime validation. In practice, this means an event can be specified precisely.

For example, a tool-completion event can be described like this:

const ToolCompleteEvent = z.object({
  type: z.literal("tool.complete"),
  toolName: z.string(),
  duration: z.number(),
  success: z.boolean(),
  result: z.unknown().optional(),
  error: z.string().optional(),
})

And a session-start event can be defined separately:

const SessionStartEvent = z.object({
  type: z.literal("session.start"),
  sessionId: z.string(),
  agent: z.string(),
  timestamp: z.number(),
})

Notice the key pattern: each schema has a type field with a fixed z.literal(...) value. That literal becomes the event’s discriminator—the field that announces which kind of event this is. Once many such schemas are collected together, they automatically form what TypeScript calls a discriminated union.

For non-CS readers, the phrase sounds intimidating, but the idea is simple. Think of a government or school form where the first box determines what all other boxes must contain. If the first field says type = "tool.complete", then the form must include fields such as toolName, duration, and success. If the first field says type = "session.start", then the form must instead include sessionId and agent. One first field decides the rest of the structure. That is all a discriminated union means: one label tells the system which exact data shape is valid.

This gives the event bus three major advantages.

First, sending becomes safer. If a developer tries to emit a tool.complete event without success, or accidentally uses duraton instead of duration, TypeScript can flag the problem before the code even runs. If the wrong event name is used, it is no longer “just another string”; it fails to match the allowed set. In other words, the sender must speak proper aviation radio, not construction-site shouting.

Second, receiving becomes easier. Suppose a listener checks:

if (event.type === "tool.complete") {
  // IDE now knows event.toolName, event.duration, event.success...
}

Inside that branch, the IDE can auto-complete the correct fields because the type check narrows the union to the exact event schema. If a programmer tries to access event.sessionId inside the tool.complete branch, the editor can immediately mark it as invalid. This is not just convenience. It is live documentation embedded in the code.

Third, the event catalog becomes visible. New developers no longer need to search random strings scattered across the repository. The schemas themselves become the authoritative map of the system’s conversations: what events exist, what each one means, and what payload each requires.

This matters especially in agent systems because events are both frequent and diverse. A single user message may trigger a session record update, token accounting, transcript persistence, and a UI refresh. A model calling a tool may trigger permission checks, execution status events, logging, retries, and output rendering. An MCP disconnect may require a tool registry update, a warning banner in the UI, and telemetry for debugging. These are not rare edge cases. They are the normal daily traffic of an agent runtime.

Without typed events, such a system slowly becomes socially dysfunctional. Adding a new event type raises the question: who needs to know? Nobody is sure. Changing an event payload becomes dangerous because downstream consumers may silently break. One listener may still expect tool, another may now expect toolName, and nothing forces the mismatch to become visible. The software is “working” in the sense that it runs, but it is rotting in the sense that internal contracts are dissolving.

With Zod-typed events, the situation changes. Every event has a contract. If you modify the contract, affected senders and receivers trigger compile errors. IDE auto-completion updates immediately. At runtime, Zod can still validate external or dynamically created payloads as a safety net, which is important because agent systems often interact with plugins, remote tools, MCP servers, and partially trusted boundaries. Compile-time checks catch developer mistakes early; runtime validation catches bad data that arrives from the messy outside world.

This dual layer is important. TypeScript alone is like a clean office checklist: helpful when your own staff follow the rules. Zod adds the front-desk guard who checks IDs when outsiders arrive. Agent systems need both, because they operate across internal code, plugin hooks, provider adapters, and user-triggered workflows.

There is also a design lesson here for future agent builders. When people talk about agent architecture, they often focus on glamorous components: the model, the tool list, the memory strategy, the multi-agent planner. Those matter. But if the internal communication layer is sloppy, the whole organism becomes unreliable. Good agents are not just smart thinkers; they are well-coordinated institutions. The event bus is part of that coordination fabric.

A mature event system therefore does more than “move messages around.” It documents subsystem boundaries, makes refactoring safer, teaches new contributors how the runtime behaves, and turns architectural intent into enforceable contracts. In a field full of rapidly evolving codebases, that discipline becomes a competitive advantage.

One-line summary: A type-safe event bus makes the “conversations” between all subsystems as strict as a signed contract — who sends what, who receives what, what format, all documented, changes trigger alarms.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 3 — The Core Loop: The ReAct Paradigm
Token Usage: ~11,500 input + ~1,750 output

3.3 Stop Conditions and Loop Control

An agent loop is only useful if it can stop at the right time. If it stops too early, the system returns incomplete work. If it stops too late, it wastes tokens, repeats itself, or falls into destructive loops. This is one of the central engineering tensions in coding agents: how do you distinguish “done” from “not done yet,” and how do you enforce that distinction when the model itself is imperfect?

OpenCode, Claude Code, and Oh-My-OpenCode all implement the same fundamental idea—a repeated ReAct loop—but they differ sharply in how much control they give the model versus the runtime.

The basic termination condition

The canonical “normal completion” case is simple: the model produces a response that contains no further tool calls and ends the turn. In Anthropic terminology this often corresponds to stop_reason="end_turn". Conceptually, the meaning is: I have nothing left to execute; I can answer directly now.

This is the cleanest stop condition because it aligns model intention with runtime policy. The model says it is done, and the runtime agrees.

In practice, however, agent loops cannot trust this condition blindly. Models sometimes stop when they should continue, especially after partial progress, mild errors, or premature summaries. They also sometimes continue when they should stop, especially when tool feedback is ambiguous or repetitive. Therefore, every serious coding agent adds external loop control around the model’s native stopping behavior.

OpenCode: continue, stop, compact

OpenCode’s SessionProcessor exposes a concise but revealing control model. At the end of processing a streamed step, process() can return one of three outcomes: "continue", "stop", or "compact".

That is a very useful abstraction. It says the loop does not merely decide whether to continue or stop. It may also decide that the next necessary action is context management. If the session is approaching overflow, the correct control transition is not another model turn yet, but compaction.

Several stop-affecting conditions appear in session/processor.ts and related files:

normal completion when the turn finishes without requiring more tool execution,
blocked stop after permission or question rejection,
stop on unrecoverable assistant error,
compaction trigger when the context is too large,
retry on retryable API errors using session/retry.ts.

session/retry.ts implements exponential backoff with header-aware delay parsing. It explicitly treats context overflow as non-retryable, which is a subtle but correct design choice: if the prompt is too large, retrying the same request is not recovery; it is repetition.

OpenCode also includes doom-loop detection. If the same tool is invoked repeatedly with the same input for the last several parts, the runtime escalates through the permission system. This is a good example of a runtime deciding that “the model wants to continue, but it probably should not.”

Claude Code: a much richer stop-control surface

Claude Code’s query.ts shows a far more elaborate stop-control regime. Here the loop can terminate, continue, retry, compact, escalate output-token limits, invoke stop hooks, or continue because token-budget policy nudges it onward.

Some major stop conditions include:

1. Normal end-of-turn completion

If streaming finishes and there are no relevant tool-use blocks requiring follow-up, the loop can end normally. This is the baseline case.

2. `maxTurns`

Claude Code supports a configurable maximum number of turns in the query loop. Near the end of query.ts, if nextTurnCount > maxTurns, the runtime yields a max_turns_reached attachment and stops. This is the classic safeguard against infinite loops.

In CS terms, this is a bounded-iteration guard. It is conceptually simple, but essential in probabilistic control loops where no proof of convergence exists.

3. Token and budget control

Claude Code is notably explicit about budget management. QueryEngineConfig includes maxBudgetUsd, and the query loop contains token-budget logic that can either continue or complete depending on measured usage. The important point is that “should stop” is not determined only by task semantics. It is also determined by resource policy.

This is a major product distinction. Open-source agent runtimes often focus first on capability; Claude Code treats cost governance as a first-class stop condition.

4. Abort propagation

Claude Code threads abort signals through ToolUseContext. During tool execution and streaming, it checks abortController.signal.aborted. If the user interrupts the run, the loop does not merely stop the next turn; it actively propagates cancellation through the live execution context. This matters because a coding agent is frequently doing real external work—running shell commands, accessing MCP tools, or waiting on network operations.

5. Error recovery and withheld errors

Claude Code distinguishes between errors that should surface immediately and errors that should trigger recovery attempts first. Prompt-too-long errors, max_output_tokens conditions, and some streaming failures may be withheld while the runtime attempts compaction, truncation, escalation, or retry paths.

This is a sophisticated answer to the problem “the model cannot continue right now, but perhaps the runtime can create a world in which it can.” It turns stop logic into a recovery pipeline rather than a single boolean.

6. Stop hooks

Perhaps the most interesting mechanism is handleStopHooks() in query/stopHooks.ts. Even if the model appears to be done, the runtime can run stop hooks that inspect the turn and decide whether continuation should be blocked, allowed, or forced through meta-messages. This directly addresses the case “the agent wants to stop but should not.”

OMO: loop intervention as a philosophy

OMO pushes this tension further than either OpenCode or Claude Code. Its philosophy is that many premature stops are not random model mistakes but systematic orchestration failures. Accordingly, it adds explicit continuation-enforcement machinery.

Ralph Loop

The ralph-loop hook family maintains loop state and can start a continuation regime for a session. This is an intentional override of normal stop behavior. Rather than trusting a single assistant turn to declare completion, Ralph Loop can keep the system moving toward a broader promised objective.

In conceptual terms, Ralph Loop says: completion is not whatever the model says at the end of one turn; completion is whatever satisfies the higher-order task contract.

This is powerful, but dangerous. It directly increases the risk of “the agent will not stop when it should.” OMO accepts that risk because it is optimizing for sustained autonomy.

Todo Continuation Enforcer

The todo-continuation-enforcer watches session events such as session.idle and session.error. When the session goes idle, it can decide that pending todos imply unfinished work and trigger continuation behavior. That is a runtime answer to the classic failure mode where the model writes a polished-looking summary before actually finishing the implementation.

This mechanism formalizes an idea that ordinary agent runtimes often leave implicit: visible task state can override model self-assessment.

Context-window overflow recovery

OMO’s anthropic-context-window-limit-recovery hook is another important example of loop control. When a session hits a token limit, OMO parses the error, marks the session as pending compaction, and automatically attempts recovery after idle or on a timed path. The runtime even surfaces UI toasts such as “Token limit exceeded. Attempting recovery…” before calling compaction logic.

This is a practical illustration of a broader principle: overflow should not always mean stop. Sometimes the correct transition is “repair context, then continue.”

The two core failure modes

All three systems are trying to solve two opposite problems.

1. The agent wants to stop but should not

This happens when the model produces a plausible completion message even though real work remains. Common reasons include:

it completed analysis but not implementation,
it called one tool and mistook partial evidence for final evidence,
it encountered friction and chose a polite exit,
it lost track of explicit subtasks.

OpenCode mostly handles this indirectly through the natural loop: tool calls create follow-up turns, and compaction or retries preserve progress. But it is relatively restrained.

Claude Code handles it through stop hooks, task logic, memory extraction side flows, and policy-aware continuation controls. It is willing to question the model’s own claim of completion.

OMO is the most aggressive. Ralph Loop and Todo Continuation Enforcer explicitly exist to resist premature stopping. In OMO, “done” is a managed condition, not merely a model utterance.

2. The agent will not stop but should

This happens when the model keeps calling tools, keeps revising the same idea, or keeps trying unproductive recovery paths.

OpenCode addresses this with doom-loop detection, retry limits, permission denials, and compaction branching.

Claude Code addresses it with maxTurns, token budgets, USD budget controls, abort propagation, and stop-hook logic that can prevent continuation.

OMO is structurally more exposed to this risk because it adds continuation pressure. It compensates with explicit stop commands such as stop-continuation, cancellation pathways, loop-state control, and recovery guards. But the tradeoff remains real: a system optimized to resist premature stopping must work harder to avoid pathological persistence.

Comparative lesson

The stop problem reveals the philosophical differences among the three systems more clearly than almost any other area.

OpenCode treats loop control as a clean runtime concern: retry the retryable, compact the oversized, stop the blocked, continue otherwise.

Claude Code treats loop control as a product policy problem: completion depends on model output, budget, user interruption, hook evaluation, and recovery state.

OMO treats loop control as an orchestration battleground: the default loop is often not enough, so the system adds explicit mechanisms to push or restrain continuation depending on higher-level task intent.

The deeper design lesson is that future coding agents will need dual control. The model should propose whether to continue, but the runtime must arbitrate that proposal using external evidence: tool traces, budgets, task lists, permissions, and user intent. A coding agent that relies only on model self-termination will stop too early. A coding agent that never trusts the model will loop forever. Good design lives in the disciplined middle.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 4 — Tool System Design
Token Usage: ~7,600 input + ~1,500 output

4.1 Tool Definition Paradigm

The first design question in any coding agent is deceptively simple: what is a tool? In practice, the answer determines how safely the model can act, how much context the runtime can supply, how richly the UI can render execution, and how easy it is for plugin authors to extend the system. OpenCode, Claude Code, and Oh-My-OpenCode (OMO) all treat tools as first-class architectural objects rather than as thin function-calling wrappers, but they express that idea through different definition paradigms.

OpenCode’s core pattern appears in /packages/opencode/src/tool/tool.ts. The center of gravity is Tool.define(), which takes an identifier and an initializer. That initializer returns a description, a parameter schema, and an execute() function. The parameter schema is expressed with Zod, a TypeScript runtime schema validation library. Zod is not a classic textbook CS concept; it is a developer tool for declaring data structure expectations in code and then checking them at runtime. That matters in agent systems because LLMs often produce syntactically plausible but structurally wrong arguments. Compile-time types alone cannot protect against that, because the model’s tool call arrives at runtime.

OpenCode therefore places validation directly at the execution boundary. Tool.define() wraps the supplied execute() implementation and calls toolInfo.parameters.parse(args) before the real body runs. If the schema check fails, the system throws an error telling the model to rewrite the input so it satisfies the expected schema. This is more than defensive programming. It is a form of machine-facing affordance: the runtime is teaching the agent how to recover from malformed calls.

The execution context supplied by OpenCode is also unusually rich. A tool’s execute() receives not just arguments but a context containing sessionID, messageID, agent, an AbortSignal, optional callID, message history, a metadata() callback, and an ask() permission callback. This means tools are session-aware, cancellable, permission-aware, and capable of annotating their own results. In other words, OpenCode does not model tools as stateless RPC endpoints. It models them as capabilities operating inside an ongoing agent runtime.

There is one more important move in Tool.define(): output truncation is applied automatically unless the tool explicitly marks that it already handled truncation. After execution, OpenCode calls Truncate.output(...) and returns a modified result with truncated metadata and, when needed, an outputPath. This turns output control into a framework guarantee rather than a per-tool courtesy.

Claude Code’s pattern, defined in /src/Tool.ts, is broader and more industrialized. Instead of Tool.define(), it uses a buildTool() factory. The file is much larger because it does more than register a tool; it defines the full interface between the model, the runtime, the permission layer, the UI renderer, and the classifier stack. Claude Code tools can specify an inputSchema using Zod, but the system also explicitly supports inputJSONSchema for tools whose schemas arrive in raw JSON Schema form, especially MCP-related tools.

JSON Schema is different from Zod. It is an IETF-standard way to describe the shape of JSON documents: object properties, required fields, enums, nested arrays, and similar constraints. Unlike Zod, which is a TypeScript library with executable validators, JSON Schema is a language-agnostic specification format. In agent systems, JSON Schema is useful because it travels well across APIs, SDKs, and external tool protocols.

Claude Code’s buildTool() factory fills in safe defaults for many behaviors: whether a tool is enabled, concurrency-safe, read-only, destructive, how permissions are checked, how user-facing names are rendered, and what classifier-facing input should look like. This is a subtle but important design improvement. It means tool authors write only the deltas while the system preserves fail-closed defaults. For example, isConcurrencySafe defaults to false, isReadOnly defaults to false, and isDestructive defaults to false unless the tool says otherwise. The factory is therefore not just ergonomic; it is policy-bearing.

The heart of Claude Code’s tool implementation is the run() method, which receives a ToolUseContext. That context is much larger than OpenCode’s, including available commands, the full tool list, MCP clients, MCP resources, app state getters and setters, notification hooks, prompt request handlers, file-reading limits, glob limits, progress emitters, query tracking, and more. The practical meaning is that Claude Code tools are deeply integrated into the application shell. A tool can stream progress, update UI state, coordinate with background tasks, and interact with memory and session infrastructure without escaping the formal interface.

This leads to a major philosophical difference. OpenCode’s tool system is elegant and compact; Claude Code’s is expansive and productized. OpenCode defines the minimum strong contract for runtime tool execution. Claude Code defines a full operating environment for tools as product components.

OMO sits between those two. In /src/tools/index.ts, OMO largely reuses OpenCode’s SDK-style tool definitions through ToolDefinition objects imported from @opencode-ai/plugin, but then layers additional context and orchestration behavior on top. OMO exports tool factories such as createBackgroundOutput, createBackgroundCancel, createCallOmoAgent, createDelegateTask, and various LSP and skill tools. The important point is not merely that OMO adds more tools. It wraps OpenCode’s tool substrate and injects extra lifecycle semantics through surrounding hooks and feature modules.

That is why OMO should be understood as an extension architecture rather than an alternative tool kernel. OpenCode gives it the base definition model. OMO then enriches execution through metadata restoration, extra session awareness, background-agent integration, and pre/post-execution hooks. The result is a hybrid paradigm: tools are still defined in OpenCode’s style, but they operate inside a denser orchestration layer.

Anthropic’s broader ACI principle helps explain why these details matter. The guideline can be summarized as: invest as much effort in ACI, the Agent-Computer Interface, as in HCI, the Human-Computer Interface. In practice, that means tool descriptions should read like docstrings written for a junior developer: concrete, constrained, explicit about inputs, outputs, and failure modes. Tool definition is therefore not only a type problem. It is also an instruction design problem. The schema tells the runtime what valid input looks like; the description tells the model what good usage looks like.

Across the three systems, the pattern is converging. A modern agent tool definition has at least five layers: a natural-language contract for the model, a machine-checked input schema, a rich runtime context, a controlled execution body, and a post-processing stage for output and metadata. OpenCode expresses this in a compact kernel form. Claude Code expands it into a product-grade interface with rendering and policy hooks. OMO proves that once the base abstraction is sound, an orchestration layer can reuse it and still add substantial new behavior.

The design lesson is clear. Tool definition should not be treated as boilerplate around function calling. It is the main programming model of the agent runtime. The more carefully that contract is shaped, the more reliable, safe, and composable the entire agent becomes.

Deep Dive: Zod Schema as Full-Chain Penetration

Model: openai/gpt-5.4
Token Usage: not exposed in this local append operation

If the previous section explained that tools need schemas, this section asks a deeper question: why does a schema matter so much in an agent system? The short answer is that a schema is not just a form. It is a control surface that reaches through the entire system. In OpenCode-style architectures, Zod does not merely validate one function call. It acts like a structured checkpoint that keeps meaning stable as data moves from prompt, to model output, to tool execution, to storage, to APIs, and back again.

Start with a high-school-level analogy: ordering at McDonald’s. Imagine a restaurant with no fixed menu fields. A customer says, “I want that burger thing, maybe large, no, medium, and add something cold.” The cashier guesses. The kitchen guesses again. The packer guesses whether “cold” means Coke, Sprite, or ice cream. By the time the tray reaches the customer, everyone has interpreted the sentence differently. The system did not fail because people were stupid. It failed because each stage had to invent structure for itself.

Now compare that with a structured order system. The cashier screen requires: main item, size, drink, remove ingredients, extra items. If the customer says something vague, the cashier cannot finalize the order until the ambiguity is resolved. The kitchen receives the exact same structured record. The drink station receives a smaller but still validated subset. The receipt reflects the same fields. In other words, the order is not “understood” separately by each person. It is validated once and then trusted everywhere else.

That is what schema does in software. Without schema, every layer guesses meaning. With schema, meaning is declared once and checked at every step.

In the TypeScript ecosystem, Zod is one of the most popular tools for doing this. Zod is a runtime schema validation library. “Schema” here means a precise description of what data should look like: which fields exist, what type each field has, which ones are required, which values are allowed, and how nested structures are organized. This is not just a textbook “type system” idea. A normal TypeScript type disappears after compilation. Zod does not disappear. It stays alive at runtime and can inspect real incoming data.

That distinction matters enormously. TypeScript can tell a programmer, during development, that lineNumber should be a number. But if an LLM later emits { "lineNumber": "42" }, the TypeScript compiler is no longer there to protect you. Zod is. It checks the actual payload when the program is running.

Here is a very small example:

import { z } from "zod"

const ToolInput = z.object({
  filePath: z.string().min(1),
  lineNumber: z.number().int().positive(),
  includeContext: z.boolean().default(false),
})

const rawInput = {
  filePath: "/project/src/app.ts",
  lineNumber: "18",
  includeContext: true,
}

const result = ToolInput.safeParse(rawInput)

if (!result.success) {
  console.error(result.error.issues)
} else {
  const input = result.data
  console.log(input.lineNumber)
}

The key call is safeParse. It does not crash the process immediately. Instead, it returns either a success result with validated data, or a failure result with structured error details. That means the system can respond precisely: “lineNumber must be a positive integer, but received a string.” This is much better than letting bad data drift deeper into the stack.

Why do AI agents especially need this? Because LLMs are powerful, but they are also non-deterministic. “Non-deterministic” means the same instruction can produce slightly different outputs on different runs. That is acceptable for language generation. It is not acceptable at system boundaries. A tool call is not poetry. It is an operation that may read files, write files, call an API, or mutate state. Once we cross into execution, the system must become deterministic.

Typical failures look almost trivial, but they are exactly the kinds of tiny mismatches that break tool systems:

path is given as src/main.ts when the tool contract requires an absolute path like /Users/.../src/main.ts
lineNumber is emitted as the string "88" instead of the number 88
a required field such as session_id is missing entirely
a tool expecting { mode: "fast" | "safe" } receives { mode: "quick" }
a union-shaped object is half one format and half another, so downstream code does not know which branch to trust

These mistakes are common because the model is predicting tokens, not executing a proof. It may output something that looks reasonable to a human reader but is invalid for a machine. That is why agent architecture must follow a strict philosophy: LLMs may be probabilistic at the center, but boundaries must be deterministic at the edges.

The danger without schema is what we can call silent failure propagation. Suppose the model emits a relative path. The tool implementation assumes absolute paths and tries to resolve it using the current working directory. That accidentally points to the wrong file. The wrong file produces the wrong diff. The wrong diff is stored in message history. Another model step reads that history and makes a wrong repair. The system still “works” in the sense that no hard exception happened, but the error has contaminated multiple layers.

That flow looks like this:

LLM emits fuzzy arguments
  -> tool accepts them anyway
  -> code makes hidden assumptions
  -> wrong file / wrong state / wrong result
  -> misleading output enters conversation history
  -> later steps build on corrupted context
  -> failure spreads silently

Now compare that with a schema-first flow:

LLM emits fuzzy arguments
  -> Zod validates at boundary
  -> validation fails immediately and specifically
  -> system returns exact correction message
  -> LLM retries with corrected structure
  -> tool executes on trusted data
  -> clean result flows downstream

This is the difference between “the system discovers the problem after damage” and “the system contains the problem at the smallest possible scope.” Good schema design makes errors local.

In OpenCode-like systems, Zod’s importance goes beyond the single execute() call. It penetrates the full chain. Think of the entire runtime as a sequence of borders, and every border needs a customs checkpoint.

graph LR
    A["System Prompt"] -->|"Zod ✓"| B["LLM Inference"]
    B -->|"Zod ✓"| C["Tool Parameters"]
    C -->|"Zod ✓"| D["Tool Execution"]
    D -->|"Zod ✓"| E["Tool Result"]
    E -->|"Zod ✓"| F["Message History"]
    F -->|"Zod ✓"| G["SQLite Storage"]
    G -->|"Zod ✓"| H["Event Bus"]
    H -->|"Zod ✓"| I["HTTP API"]
    
    style A fill:#4a9eff,color:#fff
    style I fill:#51cf66,color:#fff

System prompt assembly: tool definitions, parameter names, and descriptions are assembled into the prompt. The schema indirectly shapes how the model understands the tool before generation even starts.
LLM inference: the model decides to call a tool and emits arguments. This is the most probabilistic point in the chain.
Tool call parameters: Zod parses the emitted payload and checks whether it matches the declared structure.
Tool execution: only validated data reaches the real execution body. The tool can now operate with stronger assumptions.
Tool result: output often has its own structure, truncation rules, metadata flags, or discriminated states like success vs error.
Message history (MessageV2 parts): the result is turned into structured message parts that later inference steps will consume.
SQLite storage: persisted records need stable shapes so sessions can be resumed without guessing old formats.
Event bus payload: UI updates, notifications, and observers depend on consistent event objects.
HTTP API response: external callers need the same contract discipline, otherwise integrations become fragile.

Seen this way, schema is not a local helper. It is an architectural membrane. Each boundary says: “Before you pass, prove your shape.”

This is why “full-chain penetration” is a useful phrase. The schema is not sitting at the edge like a decorative label. It is drilling through the whole stack. Prompt design, tool calling, storage format, event payloads, and transport APIs all become more reliable because they are speaking through declared structures instead of informal guesses.

Another analogy makes the same point from a different angle: airport security. The LLM is the passenger. Most passengers are harmless, many are messy, some forget things, some carry items in the wrong bag, and a few behave unpredictably. The tool execution environment is the boarding gate. Once you let someone through incorrectly, the cost of a mistake rises quickly. The schema is the X-ray machine and document check. It does not try to predict human intention philosophically. It checks whether what is presented satisfies concrete rules right now.

This analogy matters because agent engineers sometimes make a category mistake. They think, “The model is smart enough; it probably meant the right thing.” Airport security does not work that way. “Probably fine” is not a policy. The checkpoint must be explicit, repeatable, and inspectable. Zod gives software a machine-readable version of that checkpoint.

The core design philosophy can be pictured like this:

+-------------------------------------------------------------+
| Deterministic system shell                                  |
|                                                             |
|  Prompt assembly -> Schema check -> Tool exec -> Storage    |
|        |                 |              |           |        |
|        v                 v              v           v        |
|                 [ Non-deterministic LLM core ]              |
|                                                             |
|  Events -> API responses -> Session resume -> UI rendering  |
|                                                             |
+-------------------------------------------------------------+

Or even more simply:

deterministic boundary
    -> probabilistic generation
        -> deterministic validation
            -> deterministic execution
                -> deterministic persistence

That shell is what makes an agent usable in production. You do not remove non-determinism from the model; you surround it with deterministic contracts.

There are at least four major implications.

First, errors are contained to the minimum scope. If argument validation fails before execution, you have not touched the file system, network, database, or downstream context. The blast radius stays small.

Second, error messages become precise. Instead of “tool failed,” you can say “filePath must be absolute” or “lineNumber must be an integer greater than 0.” Precise feedback is not only useful for humans. It is useful for the LLM itself, because the model can often repair its next attempt when given a sharply bounded correction.

Third, the system can self-heal. A schema failure is often recoverable. The model can retry with corrected arguments. This is much harder when the failure appears late as a vague side effect, such as corrupted history or an invalid stored record.

Fourth, subsystems can collaborate safely. Prompt builders, tool runners, storage layers, event emitters, and HTTP handlers do not need to rely on tribal knowledge. Shared schemas become shared truth. That reduces accidental coupling and makes extension safer.

One more technical concept is worth naming because it often appears in robust agent systems: the discriminated union. A union means data may legally take one of several shapes. A discriminated union adds a clear tag field such as type or kind so the system knows which branch it is looking at. For example, a tool result might be either { type: "success", data: ... } or { type: "error", message: ... }. Without the discriminator, downstream code may guess the branch. With it, downstream code can route confidently. This is the same philosophy again: no guessing at boundaries.

So when we say “Zod Schema as full-chain penetration,” we are really stating a broad architectural principle. A schema is not just validation glue for nervous developers. In an AI agent, it is the mechanism that keeps a probabilistic reasoning engine connected to a deterministic software world. It turns fuzzy generation into accountable execution. It transforms silent corruption into precise correction. It allows many subsystems to interoperate without inventing their own private meanings.

That is why the best summary is simple and worth repeating: LLMs can be non-deterministic, but system boundaries must be deterministic. Zod is one of the clearest ways to enforce that rule all the way through the stack.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 4 — Tool System Design
Token Usage: ~7,900 input + ~1,650 output

4.2 Shared Tool Inventory

One of the most striking findings in this three-system comparison is not how different the tool sets are, but how similar their core inventory has become. OpenCode, Claude Code, and OMO differ in packaging, naming, orchestration depth, and product polish, yet they converge on a common minimum toolkit for practical coding autonomy. That convergence is important because it suggests a de facto standard architecture for AI coding agents in 2026.

At the highest level, all three systems expose tools in six broad categories: file operations, search, command execution, web access, composition, and interactive clarification. These categories correspond closely to the real workflow of software engineering. A coding agent must inspect code, modify code, search for symbols or strings, run commands, retrieve external information, delegate subtasks, and occasionally ask the user a question. Remove any one of these and the agent becomes noticeably weaker.

The first shared category is file operations. All three systems include read, write, and edit capabilities. The details differ, but the pattern is stable: read returns bounded file content; write creates or replaces content; edit performs targeted structured modification, often with diff-like semantics. This matters because software engineering is not merely text generation. It is stateful transformation of an existing artifact. A useful coding agent must be able to inspect current state and produce incremental edits rather than regenerate whole files unnecessarily.

The second shared category is search. The common primitives are grep, glob, and some kind of code-oriented search. grep performs regex-based content search across files. glob performs filename or path-pattern matching, such as src/**/*.ts. Code search goes one step further by searching source structure or indexed code content rather than raw file names alone. These tools are fundamental because an agent cannot reason well about a repository it cannot navigate. Search tools are the eyes of the coding agent.

The third shared category is execution, centered on a shell tool such as bash. In all three systems, command execution is not a casual extra; it is a central capability. Tests, builds, linters, package managers, Git operations, local scripts, and environment inspection all flow through shell access. In practice, the shell tool is the highest-leverage tool in the system, which is precisely why it also becomes the highest-risk one. Claude Code adds deeper classification and sandbox logic, while OpenCode and OMO wrap execution with permissions and truncation, but all three accept the same basic reality: a coding agent that cannot execute commands remains only half-empowered.

The fourth shared category is web access. All three systems expose some version of webfetch, typically combining HTTP retrieval with a conversion layer that returns text or markdown instead of raw HTML. All three also support web search, either directly or through provider-dependent enablement. This is an important evolution beyond early “offline” coding agents. Modern software work often depends on package documentation, API references, issue threads, release notes, and current ecosystem knowledge. The web tool turns the agent from a repository-local assistant into a networked researcher.

The fifth shared category is composition. Here the common tools are task and skill. A task tool delegates work to a subagent or background worker. A skill tool loads reusable instruction bundles, templates, or domain-specific prompt modules. This category reveals a major shift in agent design. Tools are no longer only world-interaction primitives like file I/O and shell. They are also cognition-structuring primitives. Delegation and skill loading are mechanisms for reorganizing the model’s problem-solving process.

The sixth shared category is interactive clarification, represented by tools like question or ask user. Even highly autonomous coding agents still need a safe channel for explicit user input. Sometimes the issue is ambiguity. Sometimes it is permission. Sometimes the user must choose among alternatives or provide a secret that the runtime cannot infer. The existence of a dedicated question tool is therefore a design acknowledgement that autonomy has limits and that explicit human checkpoints remain valuable.

The equivalence is easiest to see in table form.

Capability	OpenCode	Claude Code	OMO	Notes
File read	`read`	`FileReadTool`	inherited `read`	Bounded file reading in all three
File write	`write`	`FileWriteTool`	inherited `write`	Full-file creation or replacement
Structured edit	`edit` / `apply_patch`	`FileEditTool` / notebook edit variants	inherited `edit` + `hashline-edit`	Diff-oriented change application
Content search	`grep`	`GrepTool`	`createGrepTools()` / `grep` wrappers	Regex search across repository
Path search	`glob`	`GlobTool`	`createGlobTools()` / `glob` wrappers	Pattern-based file discovery
Code search	`codesearch`	`ToolSearchTool` plus broader indexed discovery	`ast_grep_search`, GitHub/code-oriented extras	Different implementations, same need
Shell execution	`bash`	`BashTool`	inherited `bash` + `interactive_bash`	Core command runner
Web fetch	`webfetch`	`WebFetchTool`	inherited `webfetch`	HTTP fetch plus readable conversion
Web search	`websearch`	`WebSearchTool`	inherited `websearch` + extra search tools	Provider-dependent but present
Subagent delegation	`task`	`AgentTool` / task tools	`delegate-task`, `call_omo_agent`, background tools	Composition primitive
Reusable prompt loading	`skill`	`SkillTool`	`createSkillTool()` / `skill_mcp`	Loads reusable expertise
Ask user	`question`	`AskUserQuestionTool`	inherited `question`	Human clarification channel

This table should not be read as claiming one-to-one identity. The tools are not always semantically identical. Claude Code’s AgentTool participates in a more productized background-task system. OMO’s delegation tools integrate with its specialized multi-agent architecture. OpenCode’s tools remain comparatively direct and kernel-like. Yet the capability lattice is unmistakably shared.

That shared inventory reveals a deeper architectural truth. The tool system of a coding agent is converging toward an operating-system-like substrate. File tools correspond to storage operations. Search tools correspond to indexing and lookup. Shell tools correspond to process execution. Web tools correspond to network I/O. Task tools correspond to scheduling and parallel work. Skill tools correspond to dynamically loaded procedures or operator manuals. Question tools correspond to privileged interrupts from the human supervisor.

This is why tool count alone is a misleading metric. A system with fifty tools is not necessarily more capable than one with twenty if the extra thirty are just aliases, UI conveniences, or narrow variants. What matters more is whether the system covers the full workflow graph of software engineering. On that test, all three systems already cover the essentials.

OMO then demonstrates what happens when one keeps the shared base inventory but extends the composition layer aggressively. OpenCode supplies the foundation. Claude Code broadens and hardens the base set within a commercial product surface. OMO adds orchestration-centric tools like session management, background output retrieval, and agent spawning, but it does so without abandoning the common substrate. That is why OMO feels familiar despite being much more interventionist.

The design lesson is that the field has already discovered a stable minimum viable tool stack for coding agents. New systems can innovate above it, but they rarely replace it. If you are designing an agent from scratch, you should assume you need at least the shared inventory described here. Everything else is an optimization, specialization, or product strategy layered on top of that core.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 4 — Tool System Design
Token Usage: ~8,200 input + ~1,700 output

4.3 Tool Permissions and Safety

The moment an AI coding agent gains tools, it stops being only a language system and becomes an action system. That transition creates a new security problem. A model that can read files, write files, run shell commands, fetch network resources, or spawn subagents can produce real-world side effects. Tool permissions are therefore not a peripheral UX feature. They are the control plane of the entire runtime.

All three systems in this comparison share the same foundational permission pattern: allow, deny, and ask. This tri-state model is more expressive than a simple boolean. allow means the action can proceed silently, deny means it is blocked automatically, and ask means a human approval checkpoint is required. This is now close to a standard design pattern for agent systems because it separates policy from execution while still leaving room for human intervention.

OpenCode implements this pattern in a relatively compact and elegant way in /packages/opencode/src/permission/next.ts. Permission rules are represented as triples of permission, pattern, and action, where action is a Zod enum over allow, deny, and ask. Rules can be loaded from configuration, merged, and evaluated against a requested permission and a concrete pattern such as a file path. The system uses wildcard matching rather than full semantic analysis. That choice is important. It keeps the mechanism simple, inspectable, and portable.

Rule precedence in OpenCode is resolved through ordered merging and findLast() matching. In effect, later rules override earlier ones, which gives users and higher-precedence configuration sources a straightforward way to specialize or override defaults. OpenCode also normalizes related editing tools under a shared edit permission bucket when determining disabled tools. This is a practical simplification: from a safety perspective, write, edit, patch, and multiedit are variations of the same class of risk.

Another notable OpenCode feature is that permissions are session-aware and can be asked interactively. If a matching rule resolves to ask, the runtime constructs a permission request, publishes a bus event, and pauses until the user replies with once, always, or reject. That creates a clean bridge between static policy and live decision-making. Per-agent permissions also fit naturally into this architecture because the request already carries tool and session metadata, allowing different agents to operate under different behavioral envelopes.

Claude Code takes the same tri-state logic and expands it into a much more layered permission architecture. At the user-facing level, it exposes a four-mode control system commonly described as default, auto, bypass, and plan. These are not just UI presets. They are macro-configurations for how aggressively the runtime should seek approval, how much to trust automated checks, and whether the model is currently allowed to act or only to plan.

The key implementation sits in /src/utils/permissions/permissionSetup.ts, a file well over 1,500 lines long. That size is not accidental complexity alone; it reflects the fact that Claude Code treats permissioning as a first-class product subsystem. The file contains logic for loading rules from settings and CLI arguments, applying them to the runtime context, managing mode transitions, and identifying dangerous permissions that would undermine the classifier layer.

Two especially important additions appear beside the rule engine. First, Claude Code has a bash command classifier. Second, it has an ML-based YOLO classifier in yoloClassifier.ts. The bash classifier and dangerous-pattern logic look for commands or rule patterns that are broad enough to smuggle arbitrary execution, such as permitting interpreters or nested shell launches too loosely. The YOLO classifier adds a probabilistic policy layer: instead of matching only static rules, it inspects the action and context to predict whether a tool call should be blocked or require review.

This is a major architectural difference from OpenCode. OpenCode’s permission system is mostly symbolic: patterns, rules, precedence, and user approval. Claude Code keeps that symbolic layer but adds statistical judgment on top. The benefit is lower friction. The cost is greater complexity and less transparency. Yet in a commercial product this tradeoff is often justified, because the goal is not just correctness but throughput under realistic user impatience.

Claude Code also contains explicit dangerous-pattern detection. In permissionSetup.ts, rules that broadly allow Bash, PowerShell, or Agent spawning can be flagged as unsafe for auto mode because they would bypass classifier checks. This is a crucial insight: permission rules themselves can become vulnerabilities if they are too permissive. A safe permission system must therefore validate not only actions but also the rules used to authorize actions.

OMO largely inherits OpenCode’s base permission model, but it does not stop there. Its added hooks create a second defensive layer around tool execution. In src/plugin/tool-execute-before.ts, OMO runs a sequence of pre-tool hooks including a write-existing-file guard, a question label truncator, and a rules injector. In practice, this means OMO can impose additional checks or mutate tool invocation context before the underlying tool runs. That is a different style of safety engineering from Claude Code’s classifier-heavy design. It is more hook-driven and compositional.

OMO also applies post-tool safety and hygiene measures. Its output truncation hook can shrink oversized results after execution, and its metadata store restores execution metadata that OpenCode’s plugin wrapper would otherwise discard. Although these are not “permissions” in the narrow sense, they are part of the same safety envelope: controlling what tools may do, what they may return, and how much unbounded information they may inject back into the conversation.

The broader strategic lesson comes from Anthropic’s reported result that OS-level sandboxing can reduce permission prompts by roughly 84%. The key idea is simple: if the runtime can safely constrain what a command can affect at the operating-system level, then fewer actions require human review. Bubblewrap on Linux and Seatbelt on macOS are examples of such sandboxing technologies. They restrict filesystem, process, and network reach so that a large class of tool calls becomes mechanically safer.

This is an important shift in design philosophy. The best way to reduce permission fatigue is not to nag the user less. It is to make more actions intrinsically safe. If the environment itself prevents dangerous side effects, the agent can be granted more autonomy without increasing real risk. In security engineering terms, this moves protection from policy-only enforcement toward capability confinement.

Across the three systems, we therefore see a layered model emerging. First comes the symbolic permission rule: allow, deny, ask. Next comes runtime context: which agent, which file, which command, which session. Then come secondary defenses: hooks, dangerous-pattern detectors, classifiers, and sandboxing. The strongest systems do not rely on any single layer.

The design conclusion is straightforward. A serious coding agent should not ask for permission only because the prompt says so. It should have a formal permission model, explicit precedence rules, path or command pattern matching, support for human escalation, and, ideally, environmental confinement. Safety is not one feature. It is the combined architecture around tool execution.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 4 — Tool System Design
Token Usage: ~7,800 input + ~1,550 output

4.4 Tool Output Handling

Defining tools is only half the problem. Once a tool runs, the system must decide how to carry its result back into the agent loop. This is harder than it looks. A tool can return too much text, produce metadata the model needs but the user should not see directly, stream progress over time rather than finish instantly, or emit output that is useful for storage but too large for prompt inclusion. Tool output handling is therefore a core part of context engineering.

All three systems in this comparison recognize the same basic constraint: raw tool output cannot be passed back to the model without bounds. A single large command result, giant file read, or verbose web fetch can consume the context window, drown out relevant information, and degrade the next reasoning step. For that reason, each system implements some combination of truncation, structured metadata, and staged rendering.

OpenCode handles this directly in tool/tool.ts. After a tool executes, Tool.define() automatically routes the textual output through Truncate.output(...) unless the tool has already declared its own truncation behavior by setting result.metadata.truncated. This is an important framework-level decision. Instead of trusting each tool author to remember output budgeting, OpenCode makes bounded output the default behavior of the tool runtime.

The return object in OpenCode is also structured rather than ad hoc. A tool returns title, metadata, output, and optionally attachments. The metadata() callback exposed in the execution context lets the tool annotate itself during execution, while the final post-processing layer can append truncation indicators and an outputPath when the content is too large and has been offloaded. The result is then stored in session state as part of a multi-part message structure. In other words, tool output is not just printed; it becomes a typed artifact in the conversation log.

This is one of OpenCode’s understated strengths. By treating tool results as structured session parts rather than transient terminal noise, it preserves the possibility of later compaction, replay, summarization, and UI-specific rendering. The tool result is both machine-usable and persistence-friendly.

Claude Code generalizes this idea much further. In /src/Tool.ts, tools return a typed ToolResult, and the interface includes explicit mechanisms for mapping content into Anthropic tool-result blocks, rendering result messages, extracting searchable transcript text, rendering progress messages, and grouping parallel tool uses. This is a broader notion of output handling than simple truncation. Claude Code treats tool output as something that may need to be represented differently for different audiences: the model, the transcript indexer, the live UI, the brief view, and the progress renderer.

A major Claude Code addition is progress events. Many tools do not simply start and end. They fetch, stream, search, spawn sub-processes, or orchestrate subagents. Claude Code therefore includes structured progress types such as BashProgress, MCPProgress, SkillToolProgress, TaskOutputProgress, and WebSearchProgress. A tool can emit intermediate progress updates while still producing a final result object at the end. Architecturally, this is significant because it separates execution observability from final output.

Claude Code also explicitly tracks maximum result size through maxResultSizeChars. When a result exceeds that bound, the system can persist the full output to disk and give the model a preview plus a file path instead. This pattern is similar in spirit to OpenCode’s outputPath, but it is more deeply integrated into the product’s rendering and transcript systems. The runtime is not only preventing overflow; it is deciding how overflow should still remain inspectable.

Another important detail is that Claude Code distinguishes between model-facing serialization and user-facing rendering. The content used in mapToolResultToToolResultBlockParam() is not necessarily identical to the content shown in the terminal UI or indexed in transcript search. This is good ACI design. The model needs concise, structured, semantically important information. The user may want richer formatting or progress summaries. The search layer needs flattened visible text. Treating these as separate surfaces avoids forcing one representation to serve incompatible purposes.

OMO inherits OpenCode’s basic tool result model, but because it is implemented through the plugin layer it encounters a specific problem: OpenCode’s generic fromPlugin() wrapper can overwrite plugin metadata with truncation-related metadata. OMO compensates for this through its own metadata store in src/features/tool-metadata-store/store.ts. During execution, a tool can stash pending metadata keyed by sessionID and callID. Then, in src/plugin/tool-execute-after.ts, OMO consumes that stored metadata and merges it back into the tool result before the session processor finalizes the message.

This is a small but revealing design move. It shows what happens when an orchestration layer has to preserve richer semantics than the host runtime originally exposed through the plugin bridge. OMO is effectively repairing a metadata-loss boundary so that higher-level orchestration tools can carry titles, session identifiers, and custom annotations through the full execution pipeline.

OMO also adds a dedicated post-tool truncation hook in src/hooks/tool-output-truncator.ts. The hook targets selected high-volume tools such as grep, glob, lsp_diagnostics, ast_grep_search, interactive_bash, skill_mcp, and webfetch, with tighter limits for especially noisy sources like web pages. The truncator operates after execution and can be configured to apply broadly. This gives OMO a second chance to enforce context discipline even when the base tool or host runtime is not enough.

This layered output handling is especially important in a multi-agent system. A background agent may produce a large amount of intermediate reasoning or tool output that the parent agent does not need in full. Without aggressive output shaping, subagent orchestration rapidly becomes a context-window disaster. OMO’s post-tool hooks are therefore not just convenience features. They are essential for keeping multi-agent work economically viable.

Across all three systems, a common pattern emerges. First, tool output is structured rather than free-form. Second, large outputs are truncated, summarized, or spilled to disk. Third, metadata is preserved as a separate channel from human-readable text. Fourth, there is a growing separation between execution progress, final machine-facing result, and final user-facing rendering.

This has a direct implication for agent design. When people discuss tool quality, they often focus on input schema and capability coverage. But output handling may be equally important. A great tool with sloppy output handling can still harm the system by flooding context, obscuring key facts, or losing essential metadata. Conversely, a well-designed output pipeline can make even noisy external systems usable by converting them into bounded, structured observations.

The larger lesson is that tool output is part of the agent’s perception system. If tool definitions specify what the agent can do, output handling specifies what the agent can successfully understand afterward. In an autonomous coding agent, that distinction is foundational.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 5 — Session and Context Management Token Usage: Approx. 1,450 output tokens for this section

5.1 Session Persistence

Session persistence is the hidden spinal cord of a coding agent. Without it, an agent is just a short-lived chatbot with temporary memory. With it, the agent becomes a continuing software worker: it can survive long tasks, resume after interruption, preserve audit trails, and support higher-level behaviors such as compaction, recovery, forked sessions, and background delegation. OpenCode, Claude Code, and Oh-My-OpenCode (OMO) all solve this problem, but they do so with notably different storage philosophies.

OpenCode: relational persistence on SQLite + Drizzle ORM

OpenCode uses a relational storage model. Its session data is stored in SQLite, with Drizzle ORM defining the schema and managing access. The core schema is exposed through storage/schema.ts, which re-exports tables such as SessionTable, MessageTable, PartTable, TodoTable, and PermissionTable from session/session.sql.ts. The database bootstrapping logic in storage/db.ts shows a Bun SQLite database initialized with Drizzle and then migrated through SQL migration files.

This choice matters because OpenCode treats sessions as structured entities, not just logs. A session row contains an id, slug, project_id, directory, title, version, timestamps, optional summary statistics, revert metadata, permission state, and archival state such as time_archived. In other words, OpenCode models a session almost like a first-class business object. The message history is normalized into separate message and part tables, which means the system can query, update, or clean up individual layers of the conversation instead of rewriting one giant blob.

This relational model gives OpenCode several advantages. First, it is queryable. You can ask for all sessions in a project, all messages in a session, all parts in a message, or all todos bound to a session. Second, it supports schema evolution in a disciplined way. storage/db.ts explicitly loads and applies migrations, which means new features can be added without abandoning existing data. Third, it supports richer workflows such as revert, archival, diff summaries, and structured session metadata because those are easy to represent in columns and related tables.

There is also an architectural implication: OpenCode assumes that the session store is part of the application’s internal control plane. Persistence is not merely for debugging; it is an active substrate for the agent runtime.

Claude Code: JSONL transcripts as append-only logs

Claude Code takes a very different approach. Instead of a relational database, it persists sessions as JSONL files under ~/.claude/sessions/-style project storage, with the concrete path logic handled by src/utils/sessionStorage.ts. The active transcript path is computed dynamically, and each session is written as a .jsonl file. A JSONL file, short for JSON Lines, is a simple line-oriented format where each line is one independent JSON object. It is widely used in engineering systems, but it is not one of the classic named formats usually emphasized in CS textbooks like CSV, XML, or normalized SQL tables. Conceptually, JSONL is a stream-friendly logging format rather than a formally standardized database model.

Claude Code’s persisted unit is the TranscriptMessage. In src/types/logs.ts, a transcript message is essentially a serialized message plus metadata such as parentUuid, isSidechain, sessionId, timestamp, optional agent information, and other resume-related fields. The session store also records extra entry types such as summaries, task summaries, worktree state, PR links, and content replacement records. This makes the transcript file a hybrid of conversation log and event log.

The key design principle is append-only persistence. As the conversation evolves, Claude Code can stream new entries into the JSONL file incrementally. That makes writes simple and operationally robust. There is no need to manage multi-table transactions for ordinary transcript growth. The format is also easy to inspect with generic tooling, easy to move between machines, and naturally aligned with replay-oriented recovery.

Just as important is what Claude Code deliberately excludes. sessionStorage.ts explicitly treats progress messages as ephemeral UI state rather than transcript truth. Comments in that file explain that progress entries must not participate in the parent chain, because doing so can fork the conversation graph and break resume behavior. This is a subtle but important distinction: Claude Code separates persistent semantic history from transient runtime chatter.

OMO: OpenCode persistence plus continuation overlays

OMO inherits OpenCode’s core persistence architecture because it is built on top of OpenCode rather than replacing the host runtime. That means its baseline conversation storage remains SQLite plus Drizzle-managed session/message/part state. However, OMO extends the persistence story with continuation-specific features.

The claude-code-session-state feature keeps track of relationships such as main session ID, subagent session membership, and session-to-agent mappings. This is small in code, but conceptually important: OMO needs to know not only that a session exists, but also whether that session belongs to the main thread or to a delegated specialist.

The run-continuation-state feature adds file-based continuation markers. Its storage module writes small JSON files keyed by session ID and records which continuation source is active, when it was updated, and why. OMO also introduces boulder-state tracking for plan continuity. In features/boulder-state/storage.ts, it persists plan identity, start time, associated session IDs, and progress-related information in a .sisyphus state file. This is not just conversation persistence; it is workflow persistence.

So OMO’s persistence model is layered:

OpenCode relational storage for canonical sessions and messages.
Auxiliary continuation state for orchestration-specific recovery.
Plan and agent lineage state for long-running autonomous workflows.

This layering reflects OMO’s broader philosophy: a session is not merely a chat transcript but an execution thread in a multi-agent system.

Relational vs log-based persistence: the real tradeoff

The contrast between OpenCode and Claude Code is not “database good, files bad.” It is a deeper design tradeoff between two persistence metaphors.

The relational model gives structured queries, referential integrity, explicit migrations, and easy support for features like summaries, permissions, reverts, todos, and archival flags. It fits systems that treat sessions as entities with internal structure. The cost is complexity. A relational store requires schema design, migration discipline, and more careful update logic.

The log-based model gives streaming writes, append-only simplicity, human-inspectable artifacts, and natural replay semantics. It fits systems that think in terms of event history and incremental transcript growth. The cost is that queries become harder, compaction and cleanup may require bespoke parsing logic, and “state” often has to be reconstructed by replaying entries or scanning the file.

OpenCode leans toward stateful structure. Claude Code leans toward durable event history. OMO shows a third pattern: hybrid layering, where a structured core is augmented with extra lightweight files for orchestration continuity.

Design lesson

For agent designers, the lesson is clear. If your agent needs search, analytics, revert, permissions, per-part editing, and deep orchestration, a relational substrate is powerful. If your highest priorities are streaming durability, operational simplicity, and portable replay, JSONL-style append-only transcripts are extremely attractive. In practice, the strongest future systems may mix both: an append-only event log for raw truth, plus indexed relational or materialized views for fast control-plane operations.

Session persistence is therefore not a storage afterthought. It encodes how an agent thinks about memory, identity, and time.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 5 — Session and Context Management Token Usage: Approx. 1,520 output tokens for this section

5.2 Message Structure

If session persistence defines where an agent remembers, message structure defines what it remembers. Modern coding agents do not store a conversation as a simple list of strings. They store heterogeneous, multi-part records containing user text, assistant output, reasoning traces, tool calls, files, patches, snapshots, attachments, and system events. This richer structure is essential because coding work is not just dialogue. It is dialogue interleaved with actions.

OpenCode: `MessageV2` as a typed multi-part object

OpenCode’s most explicit message model appears in session/message-v2.ts. The key design is a parts array governed by discriminated unions. In plain English, a discriminated union is a typed structure where every variant contains a field such as type, and that field tells the program which shape of data follows. This pattern is common in modern typed programming, although the term itself is not always highlighted in introductory CS textbooks.

In OpenCode, the message does not flatten everything into raw text. Instead, it decomposes content into typed parts such as:

text
reasoning
tool
file
snapshot
patch
compaction
retry
step-start / step-finish
subtask
agent

This is a very agent-native design. A file attachment is not forced into plain text. A patch is not merely prose that says “I changed three files”; it is represented as a patch part with file metadata. A reasoning block is not mixed with user-visible output. A tool call has state transitions such as pending, running, completed, and error, each with its own structure.

The result is a message model that can support sophisticated behavior: structured rendering in a TUI, selective compaction, replay, tool-result pruning, cost accounting, patch recovery, and machine-readable analytics. It also means OpenCode can convert stored messages into provider-facing model messages later, rather than storing only the provider-ready form.

OpenCode’s message model is therefore not just a transport format. It is an internal semantic representation of agent work.

Claude Code: `TranscriptMessage` plus content blocks

Claude Code’s structure is different but no less sophisticated. The persistent layer uses TranscriptMessage in src/types/logs.ts, while the runtime message system uses message variants such as user, assistant, attachment, and system. Each of those can carry structured content blocks rather than a single string.

This block-oriented design closely follows Anthropic-style API messaging. A single assistant message may contain visible text, thinking blocks, tool-use blocks, and other structured content. Likewise, a user message may include text, tool results, images, or documents. In other words, Claude Code stores message content in a shape that is already aligned with model interaction.

This has two consequences. First, Claude Code’s transcript format is naturally suited to replaying the exact conversation into the next model call. Second, it makes the persistent transcript feel like an execution log of API-compatible blocks. OpenCode, by contrast, feels more like a typed internal domain model that can later be transformed into API messages.

The Claude Code model also extends beyond ordinary dialogue. logs.ts defines transcript-adjacent entry types for worktree state, content replacement, task summaries, context-collapse commits, attribution snapshots, and more. So while the main conversation is block-based, the full session history is actually a mixed event stream.

Extended thinking and the separation of reasoning from output

One of the most important message-structure questions in modern agents is how to handle reasoning. Claude-family systems often distinguish between user-visible answer text and internal or semi-internal “thinking” content. In Claude Code, the codebase explicitly handles thinking and redacted_thinking blocks during token estimation and compaction. This shows that reasoning is stored as its own content type rather than being fused into ordinary assistant prose.

That separation is architecturally significant. It allows the system to:

render only final output to the user,
preserve or strip reasoning depending on policy,
estimate token costs more precisely,
compact conversations without losing visible intent,
recover from malformed ordering problems involving thinking and tool blocks.

OpenCode has a parallel concept through reasoning parts in MessageV2. Again, the idea is not that the assistant’s chain-of-thought is dumped into one blob. Instead, reasoning is recognized as a distinct layer of the message. That makes it easier to keep reasoning structurally separate from the answer text while still tracking it as part of the execution history.

This separation is one of the defining differences between agent-era message systems and classic chatbot transcripts. In a simple chatbot, “the answer” is a string. In a coding agent, “the answer” may be only one visible slice of a larger structured action record.

Why multi-part messages matter

The practical value of multi-part messages becomes obvious in coding workflows.

When an agent reads a file, the event is not just “assistant said something.” It may involve a tool-use block, a tool-result block, a file attachment, a reasoning segment explaining why the file matters, and a later patch block. If all of this is flattened into plain text, the system loses operational meaning. It becomes much harder to compact safely, recover from interruption, or resume a tool chain without ambiguity.

Typed parts preserve semantics. They let the platform answer questions like:

Which content came from a tool?
Which content was visible output?
Which content was hidden reasoning?
Which files were attached or edited?
Which message boundary should survive compaction?
Which tool results can be cleared without harming continuity?

This is why modern agent systems increasingly resemble miniature event-sourced operating systems rather than chat logs.

Explaining JSONL in context

Claude Code persists transcript entries in JSONL. Since this term is often unfamiliar outside systems practice, it is worth defining carefully. JSONL means JSON Lines: one JSON object per line in a plain text file. Each line stands on its own as valid JSON, and the file as a whole is a sequence of records rather than one enclosing array.

Example idea:

{"type":"user", ...}
{"type":"assistant", ...}
{"type":"summary", ...}

Why use it? Because it is easy to append. A program can write one new line at a time without rewriting the whole file. That makes JSONL ideal for streaming systems, logs, and large incremental transcripts. It is simple in practice, but it is better understood as an engineering convention than as a classic textbook data model.

Comparative reading

OpenCode’s MessageV2 is more explicit as an internal ontology of agent actions. Claude Code’s TranscriptMessage ecosystem is more log-oriented and closer to API block semantics. OMO inherits OpenCode’s part-based structure and then overlays orchestration behaviors on top of it.

The tradeoff is similar to the persistence tradeoff in the previous section. OpenCode optimizes for typed internal manipulation. Claude Code optimizes for durable replay and incremental transcript growth. Both are valid, but they reflect different priorities.

Design lesson

The deeper lesson is that message structure should be designed for the actual unit of work in an agent system. That unit is not a sentence. It is a bundle of intent, reasoning, tool interaction, and artifact mutation. Systems that preserve that structure gain better compaction, better recovery, better analytics, and better UI rendering.

If future coding agents converge on a common message standard, it will likely look less like “chat messages” and more like a typed graph of conversational and operational blocks. OpenCode and Claude Code approach that future from different directions, but both already show that plain text transcripts are no longer enough.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 5 — Session and Context Management Token Usage: Approx. 1,560 output tokens for this section

5.3 Context Compaction

Every long-running coding agent eventually collides with the same physical limit: the context window is finite, but real engineering work is not. A difficult bug hunt, a multi-file refactor, or a day-long autonomous run can easily produce far more tokens than a model can safely carry forward. Context compaction is the family of strategies used to survive that limit without losing the thread of work.

The background problem: context rot

The overflow problem is not only about hard token ceilings. It is also about quality degradation before the ceiling is reached. As context grows, retrieval inside the model becomes less reliable. Attention computation in transformers scales roughly with sequence length, often described in simplified form as quadratic, or O(n²), with respect to token count. Even when the model technically accepts a large prompt, the practical quality of recall, prioritization, and grounding can decline. This deterioration is often informally called context rot: the conversation still fits, but the agent remembers less clearly and reasons less sharply.

So compaction serves two purposes:

avoid hard overflow errors,
restore cognitive sharpness by reducing stale or noisy context.

OpenCode: summarize, mark, and continue

OpenCode’s compaction logic lives in session/compaction.ts. Its design is comparatively direct and elegant. The system estimates overflow by comparing token usage against the model’s usable input window after reserving room for output. If auto-compaction is enabled and the threshold is reached, OpenCode can summarize the session and continue.

The summary prompt in compaction.ts is revealing. It asks for a continuation-oriented summary with sections such as goal, instructions, discoveries, accomplished work, and relevant files. That is a strong signal that OpenCode views compaction not as compression in the abstract, but as a handoff from one working memory state to the next.

OpenCode also includes a lighter-weight mechanism: pruning old tool outputs. The prune function walks backward through earlier parts and clears completed tool outputs once enough token mass has accumulated, while protecting certain tools. This is important because tool output is often high-volume and low-value after its immediate use. Clearing old tool results is the mildest possible compaction: the message graph remains mostly intact, but bulky observations are trimmed.

So OpenCode effectively has two levels:

light compaction: prune old tool results,
hard compaction: summarize the conversation and reset the active context.

Claude Code: a five-layer defense system

Claude Code takes compaction further and turns it into a multi-strategy subsystem. The files under services/compact/ reveal a layered design rather than a single summarization trigger. Conceptually, Claude Code uses at least five defensive layers:

Auto-compact — proactive compaction when token use passes a threshold.
Snip-compact — selective removal or reduction of older context slices.
Micro-compact — minimal surgery, especially clearing heavyweight tool results.
Session memory compact — preserve durable extracted memory while compressing transcript history.
Context collapse — a stronger restructuring mechanism that commits summarized spans and reconstructs them later.

autoCompact.ts shows a threshold-driven policy with reserved output budget, warning buffers, error buffers, and circuit-breaker logic for repeated failures. This is not a naive “if full, summarize” implementation. It is a production-grade traffic control system.

microCompact.ts shows the lightest-touch strategy: identify compactable tool uses and clear old tool result content. This is especially effective because file reads, shell outputs, grep results, and web fetches often dominate token volume without needing verbatim preservation forever. In other words, Claude Code tries to delete the least semantically valuable tokens first.

sessionMemoryCompact.ts adds another layer: preserve extracted session memory and compact around it while maintaining API invariants such as tool-use and tool-result pairing. This file is full of careful boundary logic, showing how hard real compaction becomes once a transcript contains streaming fragments, tool calls, and thinking blocks.

The “context collapse” mechanism, referenced in transcript entry types such as marble-origami-commit, goes even further. It treats context management as structured archival, not just summarization. This is closer to partial checkpointing than ordinary compaction.

In short, Claude Code treats compaction as a strategic stack, not a single feature.

OMO: compaction with orchestration awareness

OMO inherits OpenCode’s base compaction but adds orchestration-aware safeguards. Three additions are especially important.

1. Preemptive compaction

The preemptive-compaction hook watches token usage after assistant updates and tool executions. When usage reaches a defined fraction of the actual limit, it calls session summarization before a hard overflow occurs. This matters because autonomous multi-agent runs are more vulnerable to abrupt overflow than interactive chat sessions. A preemptive trigger buys safety margin.

2. Compaction context injector

The compaction-context-injector hook strengthens the summary prompt. Instead of relying on a generic “summarize what matters,” it explicitly demands sections for original user requests, final goal, completed work, remaining tasks, active files, constraints, verification state, and delegated agent sessions with session_id values. This is a crucial innovation. OMO knows that summary failure in a multi-agent environment is often not about missing prose; it is about missing execution context.

3. Todo preservation

The compaction-todo-preserver hook snapshots todos before compaction and restores them afterward if needed. This is a subtle but powerful feature. In autonomous workflows, the todo list is not just UI decoration. It is externalized short-term intent. Losing it during compaction can derail the run even if the prose summary is good.

Together these features make OMO less likely to suffer from “successful compaction, failed continuation.” It does not merely shrink context; it protects the scaffolding required to keep working.

graph TD
    subgraph "Claude Code: 5-Layer Defense"
        CC1["Auto-Compact<br/>Token limit trigger"] --> CC2["Snip-Compact<br/>History snipping"]
        CC2 --> CC3["Micro-Compact<br/>Incremental"]
        CC3 --> CC4["Session Memory<br/>Compress memory files"]
        CC4 --> CC5["Context Collapse<br/>Old msgs → summaries"]
    end
    
    subgraph "OMO: Preemptive + Protective"
        OMO1["Preemptive Compaction<br/>Trigger BEFORE overflow"] --> OMO2["Context Injector<br/>Preserve essential context"]
        OMO2 --> OMO3["Todo Preserver<br/>Protect todos during compact"]
    end
    
    subgraph "OpenCode: Summarize & Reset"
        OC1["Detect threshold"] --> OC2["Generate summary"]
        OC2 --> OC3["Reset with summary"]
    end

Tool-result clearing: the lightest touch

Across these systems, the most underrated compaction strategy is tool-result clearing. It is attractive because it minimizes semantic damage. Old bash output, grep listings, and file-read bodies are often useful at the moment of observation but not as permanent prompt residents. Clearing or stubbing them yields large token savings without rewriting the task narrative.

This is why micro-compact style approaches are so important. They act like garbage collection before full checkpointing becomes necessary.

Comparative analysis

OpenCode’s approach is conceptually clean: estimate overflow, summarize, optionally replay the user prompt, and continue. Claude Code’s approach is broader and more defensive: warning thresholds, micro-compaction, session memory, context collapse, and fallback behaviors. OMO extends the OpenCode path with orchestration-preserving hooks that make summaries more continuation-safe.

The tradeoff is complexity versus robustness.

OpenCode is easier to reason about and easier to extend.
Claude Code is harder to reason about but more resilient under pathological long-context conditions.
OMO demonstrates that compaction quality is not only about compression algorithms; it is also about preserving workflow control state.

Design lesson

The best future architecture is probably multi-layered:

clear stale tool outputs first,
preserve explicit task state such as todos and active files,
summarize only when necessary,
keep resumable memory artifacts outside the main prompt,
reserve a stronger archival mode for very long sessions.

Compaction is therefore less like deleting history and more like building a hierarchy of memory: hot context, warm summaries, and cold archives. Coding agents that master that hierarchy will scale to much longer, more autonomous work without collapsing under their own transcript weight.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 5 — Session and Context Management Token Usage: Approx. 1,500 output tokens for this section

5.4 Session Recovery and Continuation

Persistence keeps the past. Recovery and continuation decide whether that past can become a usable future. This distinction is crucial. A coding agent may have perfectly stored transcripts and still fail catastrophically after interruption if it cannot reconstruct state, resume intent, and continue from the correct branch of work. Long-running agent systems live or die by how well they solve this continuation problem.

The continuation problem

The continuation problem can be stated simply: after a crash, compaction, model error, machine switch, or context reset, how does an agent continue the same job without losing momentum or repeating work?

This is harder than ordinary persistence because the system must restore more than conversation text. It may need to recover:

active task intent,
working directory or worktree,
todo state,
background subagent lineage,
plan progress,
file diffs or snapshots,
tool-use invariants,
the last safe restart point.

In other words, continuation requires reconstructing an execution state, not just a transcript.

OpenCode: archive, load, fork, and revert

OpenCode’s recovery model begins with its structured session store. Because sessions are modeled explicitly, OpenCode can archive them, list them, load them, and fork them. In session/index.ts, a session includes time.archived, and queries can include or exclude archived sessions. That gives the platform lifecycle control over dormant conversations.

More interesting is session/revert.ts. This file implements revert logic that can roll the session back to a target message or part, restore file snapshots, compute diffs over the reverted range, and then clean up later messages or parts. This is not merely “undo the last reply.” It is session-level time travel backed by snapshots and patch tracking.

This makes OpenCode resilient in a very particular way. It can recover not only from interruption but also from bad turns. If an agent takes the wrong path, the runtime can rewind the session’s semantic and file-system state together. That is a powerful continuation primitive because resuming the wrong state is sometimes worse than not resuming at all.

OpenCode also supports branching behavior through session forking. A forked session can preserve history up to a point and then continue independently. That is another answer to the continuation problem: sometimes the safest resume is not overwrite-in-place but branch-and-continue.

Claude Code: `/resume`, worktrees, and teleportation

Claude Code’s continuation features are centered around transcript replay and environment restoration. The CLI logic in src/cli/print.ts shows explicit handling for --continue, --resume, and --teleport. The code restores messages, reuses or switches the session ID, reloads session metadata, and restores worktree state when applicable.

The worktree support is especially important. types/logs.ts defines PersistedWorktreeSession, storing fields such as originalCwd, worktreePath, worktreeName, branch information, and the session ID. That means a resumed conversation can return not only to the right transcript but also to the right isolated Git working environment. For coding agents, this is a major practical feature because environment drift is one of the most common causes of broken continuity.

Claude Code also supports “teleport” workflows, visible in both CLI handling and bootstrap state. Teleportation is essentially continuation across machines or execution contexts. Instead of treating a session as tied to one local process, Claude Code can hydrate or resume it elsewhere. This is a strong example of log-based persistence paying off: append-only transcript artifacts are relatively portable.

In addition, Claude Code stores subagent metadata, content replacement records, task summaries, and worktree state sidecars. This means resume is not merely loading old chat text; it is a reconstruction pipeline.

OMO: recovery hooks and multi-agent continuity

OMO raises the bar because it must recover not only a main thread but also an orchestration graph.

Session-recovery hook

The session-recovery hook in hooks/session-recovery/hook.ts detects recoverable assistant errors such as missing tool results, thinking block order problems, and thinking-disabled violations. It can abort the session, inspect recent messages, repair structural issues, notify the UI, and optionally auto-resume from the last user message. This is a more surgical form of recovery than ordinary resume. It repairs the transcript so continuation becomes possible again.

Boulder-state tracking

OMO’s boulder state persists active plan identity, start time, associated session IDs, and plan name. This is critical for Sisyphus-style long autonomous runs. If the process dies, the system does not only know “which conversation existed”; it knows “which plan was active, which sessions participated, and where the run was in the broader workflow.”

`session_id` continuation for subagents

OMO also treats background agent continuity as a first-class concern. The compaction context injector explicitly tells the summarizer to preserve delegated agent session_id values and to resume existing agent sessions rather than spawning fresh ones. This is a major design insight. In multi-agent systems, naively restarting a subagent after compaction wastes tokens, loses learned context, and can duplicate research or edits.

The small claude-code-session-state module tracks main versus subagent sessions and session-to-agent mapping, while run-continuation markers in features/run-continuation-state/storage.ts persist whether continuation is active and why. These mechanisms externalize orchestration state so it survives beyond one model call.

Recovery philosophies compared

OpenCode’s recovery is strongest in stateful rollback and branching. Claude Code’s recovery is strongest in portable replay and environment restoration. OMO’s recovery is strongest in workflow continuity across orchestration layers.

That difference mirrors the core identities of the systems.

OpenCode thinks in terms of sessions as structured objects that can be reverted and forked.
Claude Code thinks in terms of sessions as replayable transcripts that can be resumed, moved, and rehydrated.
OMO thinks in terms of sessions as nodes in a larger multi-agent process that must survive interruption without losing delegated work.

Why continuation is harder after compaction

Compaction sharpens the continuation problem because it deliberately destroys detail. After a compact, the agent is no longer resuming from full history but from a compressed representation of history. That means the quality of the summary, the preservation of todo state, and the retention of agent lineage all become decisive.

This is why OMO’s preservation hooks matter, why Claude Code persists extra metadata like worktree state and content replacements, and why OpenCode’s revert and snapshot mechanisms remain valuable. A continuation architecture must assume that not all future resumes happen from pristine full transcripts.

Design lesson

The best continuation design for future coding agents should combine four capabilities:

durable transcript replay for basic resume,
environment restoration for cwd, branch, and worktree continuity,
stateful rollback and branching for recovering from bad turns,
workflow lineage preservation for subagents, plans, and todos.

In short, session recovery should be treated as distributed systems engineering, not as a UX convenience feature. The stronger the agent’s autonomy, the more important continuation becomes. A truly capable coding agent is not the one that can merely start well. It is the one that can stop, recover, and keep going.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 6 — LLM Provider Abstraction Token Usage: Approx. 1,550 output tokens for this section

6.1 Multi-Model Support Architecture

An agent that can only talk to one model is not really an agent platform; it is an application welded to one vendor. Chapter 6 begins with the layer that prevents that weld. The LLM provider abstraction is the architectural boundary that translates agent intent into provider-specific API calls while trying to preserve a consistent runtime contract for streaming, tool use, multimodal input, pricing, and authentication. OpenCode, Claude Code, and Oh-My-OpenCode (OMO) all solve this problem, but they solve it at different levels of abstraction.

OpenCode: Vercel AI SDK as universal transport layer

OpenCode is the most explicitly provider-agnostic of the three systems. In packages/opencode/src/provider/provider.ts, the runtime imports a broad catalog of SDK adapters: Anthropic, OpenAI, Azure, Google, Vertex, Vertex Anthropic, Amazon Bedrock, Groq, Mistral, DeepInfra, Cerebras, Cohere, Together, Perplexity, XAI, OpenRouter, Vercel, GitLab, and GitHub Copilot, among others. The file’s bundled provider map makes the architecture obvious: OpenCode treats each upstream LLM service as a pluggable backend behind one common provider interface.

The enabling substrate is the Vercel AI SDK ecosystem. This is not a classic textbook abstraction, so it deserves explanation. Conceptually, the Vercel AI SDK is a unification layer for multiple LLM vendors. Instead of writing separate request code for every provider, an application targets a shared TypeScript interface and swaps concrete provider adapters underneath it. OpenCode leans heavily on this strategy. The result is a system that can integrate 20+ providers without having 20 entirely separate agent implementations.

This approach is reinforced by two internal layers. First, provider/provider.ts resolves models and loads the right SDK constructor, sometimes with custom loader logic for edge cases such as OpenAI Responses, Azure completion URLs, or GitHub Copilot chat versus responses mode. Second, provider/transform.ts defines a ProviderTransform layer that normalizes the messy edge conditions of cross-provider use. That file rewrites message shapes, tool call IDs, caching hints, reasoning payload placement, unsupported modality handling, and provider-specific option keys. In other words, OpenCode does not just abstract transport. It also abstracts incompatibility.

That normalization layer is the real reason the design scales. A naive provider abstraction usually collapses at the first serious mismatch: one vendor wants top_p, another rejects empty assistant content, another requires tool IDs with exact formatting, another exposes reasoning text in a provider-specific field. OpenCode centralizes those differences rather than leaking them into the agent loop. The more than 40 @ai-sdk/* packages in the ecosystem matter less as a raw number than as evidence of architectural ambition: OpenCode wants model choice to be a deploy-time concern, not a rewrite event.

Claude Code: Anthropic-first, fallback-capable, deeply optimized

Claude Code takes almost the opposite stance. It is not model-agnostic by default; it is Anthropic-first by design. Its main logic lives in src/utils/model/model.ts, a 618-line file that handles model naming, defaults, aliases, subscriptions, 1M-context upgrades, and provider-aware defaults. src/utils/model/providers.ts shows the provider surface clearly: firstParty, bedrock, vertex, and foundry. That is a much narrower provider set than OpenCode’s catalog.

But narrow does not mean simplistic. Claude Code’s model resolution pipeline is more opinionated and more tightly integrated with product behavior. The priority order is explicit: session override from /model, then startup override from CLI flags, then ANTHROPIC_MODEL environment variable, then saved settings, then the built-in default. This precedence chain matters because it tells us the abstraction is not merely about API transport. It is about product control over model identity across a long-lived session.

Claude Code therefore practices a form of deep model-tool co-optimization. That phrase also deserves explanation. In textbook software architecture, we often separate the engine from the application logic. In agent systems, however, model behavior and tool behavior often need joint tuning. Prompt format, thinking mode, tool schema style, context compaction strategy, and permission UX may all be calibrated for one family of models. Claude Code benefits from this because Anthropic controls both the model family and much of the surrounding runtime. The system can assume more, optimize harder, and expose richer product behaviors such as context upgrades, model aliases, and provider-specific defaults for Bedrock, Vertex, and Foundry when first-party access is not the transport.

The cost is portability. Claude Code is not trying to be a universal LLM shell. It is trying to be the best possible runtime for Claude-family workflows, with carefully bounded enterprise fallbacks.

OMO: semantic abstraction above provider abstraction

OMO adds a third architectural pattern. It inherits OpenCode’s provider layer, so it already benefits from the wide provider surface and the Vercel AI SDK ecosystem. But it introduces another abstraction above raw model names: semantic categories. In src/tools/delegate-task/category-resolver.ts, categories such as visual-engineering, ultrabrain, quick, deep, and writing are resolved into actual model choices using available-model inspection, category defaults, fallback chains, and user overrides. subagent-resolver.ts performs similar logic for agent-specific model resolution.

This is a more important innovation than it may first appear. Most multi-model systems still force the user or the agent to think in model IDs: “use sonnet,” “switch to gemini,” “call a fast model,” and so on. OMO instead asks for the nature of the work. That is a semantic scheduling layer rather than a provider layer. The user selects intent; the runtime selects the model.

This design is meant to reduce what OMO calls model self-perception bias. A frontier model may claim it is good at everything, or a user may overfit to brand reputation rather than workload shape. By routing through categories, OMO shifts the decision from self-description to policy. It is, in effect, a task-to-model compiler.

This also explains why OMO is best understood as an orchestration system, not just an OpenCode plugin bundle. OpenCode says, “many providers can fit behind one interface.” OMO says, “many models can fit behind one task category.” Those are different abstraction levels.

The fundamental tradeoff: portability versus optimization

These three designs expose the central tradeoff of LLM provider abstraction.

The model-agnostic strategy, represented by OpenCode, maximizes portability, bargaining power, and ecosystem reach. It is attractive for open-source systems, experimental teams, and organizations that want to avoid single-vendor dependence. It also creates healthy competitive pressure: if one provider improves pricing, latency, or context length, the platform can adopt it without redesigning the agent core.

The model-specific strategy, represented by Claude Code, maximizes optimization. If the runtime knows its model family deeply, it can design tool schemas, context policies, thinking behaviors, and UX affordances that align tightly with actual model behavior. This often produces a better end-user product, but it increases lock-in and narrows substitutability.

OMO points toward a hybrid future. It accepts OpenCode’s provider portability, but then adds a policy layer that is not provider-centric at all. That may be the more durable design pattern for advanced agents: abstract the transport at the bottom, abstract task semantics at the top, and leave raw model names as an expert escape hatch rather than the primary UX.

Design lesson

The deeper lesson is that “provider abstraction” is not one thing. There are at least three layers:

Transport abstraction: one code path for many provider SDKs.
Behavior normalization: one runtime contract despite incompatible APIs.
Policy abstraction: one task vocabulary that hides model-brand complexity.

OpenCode is strongest at the first two. Claude Code is strongest at optimizing the whole stack around a narrower provider set. OMO is strongest at the third. The best future agent architecture will probably combine all three: universal provider adapters, strong transformation layers, and semantic task routing that keeps users and agents from obsessing over model names.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 6 — LLM Provider Abstraction Token Usage: Approx. 1,500 output tokens for this section

6.2 Model Capability Detection

Supporting many models is only the first half of provider abstraction. The second half is knowing what those models can actually do at runtime. A modern coding agent cannot safely assume that every endpoint supports the same sampling controls, the same context length, the same multimodal inputs, or the same reasoning features. That is why capability detection is not a cosmetic feature. It is a negotiation layer between the agent runtime and the model backend.

Why runtime capability negotiation exists

In theory, the agent could send the same parameters to every provider and let the remote API reject what it does not understand. In practice, that would produce brittle behavior, wasted requests, confusing user experience, and unnecessary token cost. A strong runtime instead inspects or encodes model capabilities in advance and shapes requests accordingly.

The mismatches are real. Some providers support temperature but interpret its range differently. Some support top_p; others ignore it. Some expose max_tokens; others split this into input and output limits or cap it differently by model family. Some models can accept image or PDF inputs; some are text-only. Some expose explicit reasoning or thinking modes, while others only simulate step-by-step behavior through prompting.

This is a classic systems problem: a supposedly common interface sits on top of a heterogeneous substrate. The abstraction survives only if the runtime actively manages variance.

OpenCode: transform-driven capability normalization

OpenCode handles much of this in packages/opencode/src/provider/transform.ts. That file is a good example of capability detection through normalization rules rather than through one giant capability registry. The transform layer maps npm package names to provider option keys, rewrites messages to fit provider constraints, strips empty content that Anthropic rejects, sanitizes tool-call IDs for Claude and Mistral, inserts assistant bridge messages for Mistral sequencing quirks, rewrites reasoning payloads into provider-specific fields, and downgrades unsupported image or file inputs into explicit error text for the model to relay.

This last point is particularly important. If a model lacks image support, OpenCode does not simply crash. It transforms the unsupported input into a textual explanation such as “this model does not support image input.” That is an example of capability-aware degradation. The runtime preserves the conversation loop even when the requested modality is unavailable.

OpenCode’s design also reflects another reality: capability detection is sometimes indirect. You do not always have a clean machine-readable capability manifest from the provider. Instead, you infer capability from provider identity, model metadata, naming conventions, or known incompatibility rules. That is why ProviderTransform contains so much conditional logic. Runtime negotiation in the real world is often heuristic, not purely declarative.

Claude Code: explicit capability tracking for the Claude ecosystem

Claude Code is narrower in provider scope, but stronger in explicit capability accounting. src/utils/model/modelCapabilities.ts caches model metadata retrieved from Anthropic’s model listing endpoint and stores fields such as max_input_tokens and max_tokens in a local cache file. That may sound small, but it has major architectural significance. Claude Code is not guessing context size from branding alone; it is building a local capability record that can influence runtime decisions.

This is complemented by other files in the same model subsystem. contextWindowUpgradeCheck.ts encodes logic for context upgrades such as opus[1m] or sonnet[1m], while model.ts and related files reason about aliases, defaults, provider branches, and model availability. Together, these files act like a capability-aware control plane.

The phrase context window deserves a precise definition because it is often used casually. In textbook terms, it is the maximum amount of input a model can consider in one inference request, measured in tokens rather than characters. Tokens are subword units used by the model’s tokenizer. A larger context window changes agent behavior materially: it affects whether a tool result must be truncated, whether a large diff can be inlined, whether compaction must happen early, and whether a planning model can keep long task history in memory.

Claude Code’s approach is therefore less about universal provider negotiation and more about precise operating knowledge of the Claude family and its hosted variants. That fits its Anthropic-first philosophy.

Reasoning mode and thinking support

One of the newest capability dimensions is reasoning support. Different vendors expose it differently. Anthropic uses extended or interleaved thinking modes; OpenAI reasoning families such as o1 introduced a distinct reasoning-oriented interaction style; other providers may expose hidden thinking, explicit reasoning blocks, or no special control at all.

This creates a subtle challenge. “Reasoning” is not a single API standard. It is a vendor-specific feature family. OpenCode’s transform layer reflects that by moving reasoning content into provider-specific fields when a model’s interleaved capability definition demands it. Claude Code, because it is tuned around Claude-family models, can assume more coherent semantics when enabling thinking-related behaviors. OMO, sitting on top of OpenCode, inherits the lower-level complexity and must rely on underlying model selection and provider normalization to avoid assigning thinking-heavy workloads to weak or unsupported models.

For agent design, reasoning support matters because it changes planning quality, tool choice discipline, and latency/cost tradeoffs. A deep-thinking model may produce better decomposition but at higher latency and higher token use. A quick model may be sufficient for file lookup or formatting work. Capability detection therefore feeds orchestration policy, not just request formatting.

Vision and multimodal support

Vision capability detection is another practical necessity. Coding agents increasingly inspect screenshots, diagrams, PDFs, and UI mockups. But multimodal support is uneven. Some providers support images but not PDFs. Some support image understanding but not tool-call-rich agent loops. Some support vision only on certain endpoints.

OpenCode’s transform layer handles this with explicit modality checks and fallback text when a model cannot consume image, audio, video, or PDF input. Claude Code, being more tightly coupled to its supported model family, can integrate capability assumptions more directly into model policy. In both cases, the lesson is the same: multimodal support must be tested as a real capability dimension, not assumed from marketing copy.

Parameter support variance

Even classic controls like temperature, top_p, and max_tokens are not stable across providers. A model abstraction layer must decide whether to omit unsupported parameters, translate them, clamp them, or reject them. This is one reason OpenCode’s provider transform exists at all. The runtime is effectively doing protocol mediation between one agent interface and many subtly incompatible APIs.

That mediation affects agent behavior. If the runtime cannot rely on temperature, it may need to achieve determinism through prompt structure or model selection instead. If max_tokens differs drastically, the agent may need more aggressive output truncation or chunking. Capability detection is thus operational, not merely descriptive.

Design lesson

The essential lesson is that model support must be treated as negotiated capability, not assumed identity. “Using model X” does not mean the same thing across providers, regions, hosted variants, or API generations. Good agent runtimes therefore need at least four capability dimensions:

Sampling controls: temperature, top-p, token caps.
Modalities: text, image, PDF, audio, video.
Reasoning features: thinking modes, reasoning channels, interleaving.
Context budget: maximum input and output token windows.

OpenCode shows how to survive heterogeneity with aggressive transformation. Claude Code shows the value of precise, cached capability knowledge in a narrower ecosystem. OMO reminds us that capability detection eventually shapes orchestration policy: the agent should not just know what a model can do, but choose work based on that knowledge.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 6 — LLM Provider Abstraction Token Usage: Approx. 1,600 output tokens for this section

6.3 Authentication and Key Management

If provider abstraction is the control plane for model access, authentication is the trust boundary that makes that access possible. An agent runtime may support dozens of providers, but each provider is useless until the system can securely acquire, store, refresh, and apply credentials. This is why authentication and key management are not peripheral utilities. They are part of the agent architecture itself.

The baseline problem: too many providers, too many credential styles

LLM systems now authenticate through several patterns: raw API keys, OAuth access tokens, refresh tokens, enterprise-managed credentials, device authorization flows, and cloud-provider identity chains. The runtime must also decide where credentials live: environment variables, config files, OS keychains, encrypted credential stores, or managed remote settings.

From a textbook perspective, this is a standard credential management problem, but agent runtimes amplify the stakes because they may call providers autonomously and continuously. A leaked credential is not just a one-time compromise; it can become a silent billing drain or an exfiltration channel.

OpenCode: broad auth surface for a broad provider surface

OpenCode’s auth model reflects its universal-provider ambition. packages/opencode/src/auth/index.ts defines three stored credential forms: api, oauth, and wellknown. The runtime persists these in auth.json with restrictive file permissions (0o600), which is a Unix-style mode meaning the file should be readable and writable only by its owner. That is a small implementation detail, but it shows that OpenCode treats credential storage as security-sensitive state.

The three forms correspond to three access patterns. API auth is the classic model: a provider key is stored and then attached to requests. OAuth auth stores refresh and access tokens, expiration, and optional account metadata. Well-known auth is more specialized and points to dynamic endpoint or token discovery patterns.

The phrase well-known endpoint discovery deserves explanation because it is not usually presented in undergraduate CS textbooks. In modern web identity systems, a service may publish a standard metadata document under a predictable path such as /.well-known/.... Clients can fetch it to discover authorization URLs, token endpoints, issuer information, or capability metadata rather than hardcoding every endpoint. This pattern reduces manual configuration and supports interoperable auth flows across deployments.

OpenCode extends this general auth substrate with provider-specific plugins. plugin/codex.ts implements an OAuth PKCE flow for OpenAI Codex-style access. plugin/copilot.ts implements GitHub Copilot login, including GitHub Enterprise device authorization. These files are important because they show that provider abstraction is not only about request formats; it is also about credential acquisition workflows.

PKCE, short for Proof Key for Code Exchange, is another term outside the usual core CS curriculum. It is an OAuth security extension that protects public clients such as CLIs or mobile apps. The client creates a random verifier, hashes it into a challenge, sends the challenge during authorization, and later proves possession of the original verifier when redeeming the code. This reduces the risk of intercepted authorization codes being replayed by an attacker.

In plugin/codex.ts, OpenCode spins up a localhost callback server, generates PKCE values, builds an authorization URL, exchanges the code for tokens, and refreshes access tokens when needed. In plugin/copilot.ts, it supports GitHub.com and enterprise domains through the device-code flow, polling until authorization succeeds. Both flows demonstrate a product-grade understanding of CLI authentication rather than mere API-key injection.

GitHub Copilot and Codex: auth as protocol adaptation

The Copilot integration is especially revealing. OpenCode’s Copilot plugin does not simply store a token and send it unchanged. It adapts headers, distinguishes agent versus user initiation, handles vision flags, and supports enterprise base URLs. In effect, auth logic becomes part of protocol shaping.

That is a recurring theme in agent systems. Authentication is often intertwined with routing, endpoint selection, tenant identification, and capability enablement. It is not a separate layer you can fully isolate from provider behavior.

Claude Code: narrower provider scope, stronger enterprise control

Claude Code’s authentication surface is narrower but more enterprise-oriented. At its simplest, the system uses an Anthropic API key. But its model provider logic also supports Bedrock, Vertex, and Foundry branches, and its commercial design includes remote managed settings and enterprise control patterns rather than only local end-user secrets.

This distinction matters. OpenCode is primarily a bring-your-own-provider platform, so local credential flexibility is essential. Claude Code, by contrast, often operates inside organizational purchasing, governance, and policy frameworks. In that world, the crucial features are not just “store a key safely,” but also “enforce which provider is allowed,” “apply centrally managed settings,” and “let enterprise administrators shape model access without editing local code.”

You can think of this as a difference between developer credential management and organizational credential management. The former optimizes for flexibility. The latter optimizes for consistency, auditability, and policy.

OMO: inherited provider auth plus MCP OAuth expansion

OMO inherits OpenCode’s provider authentication stack because it runs atop OpenCode. That means API keys, OAuth provider tokens, and the existing auth store remain the foundation for direct model access. But OMO extends the credential story through its MCP ecosystem.

The relevant files in src/features/mcp-oauth/ and src/features/skill-mcp-manager/oauth-handler.ts show that OMO adds OAuth handling for MCP-backed tools and skills. This is strategically significant. Once an agent platform expands beyond model providers into MCP servers, credentials are no longer just for LLM APIs. They are also for search providers, documentation services, spreadsheets, design systems, ticketing tools, and internal enterprise APIs.

In other words, OMO broadens key management from “how do we authenticate to models?” into “how do we authenticate the whole agent tool universe?” That is a natural consequence of becoming a multi-agent orchestration layer with rich external integrations.

Enterprise patterns: remote config, managed keys, policy enforcement

Across the three systems, several enterprise patterns emerge.

First is remote configuration. Instead of expecting every user to hand-edit environment variables, the platform can distribute managed settings from a central source. Second is managed API keys, where the user may never directly handle raw provider credentials because the organization injects them or brokers access through approved services. Third is policy enforcement, where the system restricts providers, models, or auth methods according to compliance or billing rules.

These patterns are likely to become more important than local secret storage alone. The reason is simple: coding agents are moving from hobbyist tools into regulated enterprise infrastructure.

Design lesson

The design lesson is that authentication for coding agents must be treated as a first-class subsystem with at least four responsibilities:

Credential acquisition: API key entry, browser OAuth, device flow, cloud identity.
Credential storage: secure local files, OS stores, or managed remote config.
Credential refresh and rotation: expiration handling, token renewal, revocation.
Policy enforcement: enterprise restrictions, provider allowlists, tenant routing.

OpenCode is strongest in auth diversity because it must support a wide provider ecosystem. Claude Code is strongest in enterprise alignment because its narrower model surface allows tighter governance. OMO shows the next frontier: unifying model credentials and MCP credentials under one orchestration-aware security model.

The long-term implication is clear. The more capable an agent becomes, the more credentials it touches. That means authentication is no longer a setup step. It is part of the runtime’s core safety architecture.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 7 — MCP: The USB-C of AI
Token Usage: ~11,000 input + ~1,900 output (estimated)

7.1 Why MCP Changes Everything

When Anthropic introduced MCP, the Model Context Protocol, in November 2024 under the MIT license, the announcement sounded modest: here is an open standard for connecting AI systems to tools, data, and prompts. In practice, it marked a deep architectural shift. MCP matters for agents in the same way USB-C mattered for hardware ecosystems. Before USB-C, every device family invented its own cable, power profile, and negotiation quirks. Before MCP, every AI framework invented its own tool format, transport assumptions, and integration layer. The result in both worlds was the same: fragmentation, duplicated work, and poor composability.

The “USB-C for AI” analogy is more than marketing. USB-C is valuable not because it is a cable, but because it standardizes the interface between many independent producers of capability. A laptop vendor, a monitor vendor, and a charger vendor can all move faster because they share the same connector and negotiation model. MCP does the same for software capabilities. It standardizes how an AI host discovers what an external server can do, how it invokes those capabilities, how it reads structured resources, and how it receives reusable prompt templates. Once that interface is stable, ecosystems can grow without every integration being custom.

At a high level, the architecture is simple:

Host → MCP Client ←→ JSON-RPC 2.0 ←→ MCP Server

graph TB
    subgraph "Host Application"
        H["Agent Runtime<br/>(OpenCode / Claude Code)"]
        MC1["MCP Client 1"]
        MC2["MCP Client 2"]
        MC3["MCP Client 3"]
    end
    
    H --> MC1
    H --> MC2
    H --> MC3
    
    MC1 -->|"stdio"| S1["🗄️ PostgreSQL<br/>MCP Server"]
    MC2 -->|"HTTP/SSE"| S2["🔍 GitHub<br/>MCP Server"]
    MC3 -->|"stdio"| S3["📁 Filesystem<br/>MCP Server"]
    
    S1 -.->|"Tools + Resources"| MC1
    S2 -.->|"Tools + Resources"| MC2
    S3 -.->|"Tools + Resources"| MC3
    
    style H fill:#4a9eff,color:#fff
    style S1 fill:#51cf66,color:#fff
    style S2 fill:#51cf66,color:#fff
    style S3 fill:#51cf66,color:#fff

The host is the user-facing application: Claude Desktop, Cursor, VS Code, Zed, ChatGPT, or an agent runtime such as OpenCode. Inside the host sits an MCP client, which knows how to speak the protocol. On the other side is an MCP server, which exposes three primary capability classes:

Tools: executable actions, such as search, file access, browser automation, or API calls
Resources: readable data objects, such as documents, memory entries, schemas, or generated artifacts
Prompts: reusable prompt templates supplied by the server

This split is important. Many early agent systems flattened everything into “tools.” MCP is more expressive. Some things are actions, some are data, and some are structured instructions. That distinction reduces friction. A documentation server, for example, may expose search as a tool, individual pages as resources, and task-specific prompt templates for summarization or migration guidance.

The wire protocol underneath MCP is usually JSON-RPC 2.0. JSON-RPC is a standardized remote procedure call protocol where messages are encoded as JSON objects. Remote procedure call, often abbreviated RPC, is not always emphasized in mainstream CS curricula, but the idea is simple: call a function that lives somewhere else as if it were local, by packaging the method name, arguments, and response in a structured message format. JSON-RPC gives MCP a widely understood request-response envelope: method name, params, id, result, or error. That choice matters because it makes the protocol predictable, debuggable, and transport-agnostic.

Transport-agnostic is the next major idea. MCP does not force a single deployment model. It supports multiple transport styles because agent environments differ:

stdio for local servers: the host spawns a subprocess and communicates over standard input/output
HTTP + SSE for remote servers: the client sends requests over HTTP and receives server-sent events for streaming updates
WebSocket for web and long-lived bidirectional environments

This flexibility is a big part of why MCP took off. A local SQLite assistant, a remote documentation server, and a browser-based collaborative IDE can all participate in the same capability ecosystem without pretending they have the same runtime constraints. The protocol stays stable while deployment changes.

The most famous structural benefit of MCP is the N+M advantage. Without a shared protocol, if you have N agent frameworks and M tool providers, you need N × M integrations. Every host must write a custom adapter for every tool family. That scales terribly. With MCP, a host implements MCP once and a tool provider implements MCP once. The total work approaches N + M instead of N × M. This is the key economic argument for standards. Standards are not just elegant; they change the cost curve.

That cost curve explains adoption. By 2026, MCP support had spread far beyond Anthropic’s own products. Claude Desktop gave the protocol its first major home, but adoption quickly broadened to Cursor, VS Code, Zed, and eventually ChatGPT and many third-party agent shells. At the community layer, there are now 1,000+ MCP servers, ranging from GitHub search and browser control to documentation retrieval, databases, design tools, Slack, local memory systems, and internal enterprise connectors. The exact count matters less than the pattern: MCP became the default packaging format for “capabilities that an agent can use.”

That packaging standard changes how agent architectures are designed. Before MCP, tool integration was often a private implementation detail hidden inside each agent product. A company shipping an agent had to decide not only how to reason, plan, and act, but also how to define tool schemas, how to launch local processes, how to handle auth, how to expose data sources, and how to normalize errors. This meant every agent product became part runtime, part app platform, part integration SDK. MCP unbundles that problem. The host can focus on reasoning, UX, permissions, and orchestration; the server can focus on a domain capability.

This is why MCP changes not just interoperability, but division of labor. It lets specialized teams build specialized servers without needing to own an entire agent runtime. A documentation company can ship a docs MCP server. A search company can ship a search MCP server. A design tool vendor can ship a design MCP server. Meanwhile, agent frameworks can compete on context engineering, tool selection policy, safety, and user experience. That is a healthier ecosystem because it creates modular competition instead of monolithic duplication.

There is also a subtler benefit: MCP improves conceptual hygiene. In agent design, many failures come from mixing layers. We confuse the model with the tool runtime, the host UI with the data provider, or prompt templates with executable actions. MCP draws cleaner boundaries. The client is where model-facing adaptation happens. The server is where capability exposure happens. The transport is just transport. JSON-RPC is just message framing. That separation makes systems easier to reason about, test, secure, and swap.

Of course, MCP does not solve every problem. It does not magically make tools safe, nor does it eliminate the need for strong permission models, schema validation, output truncation, or contextual grounding. A bad host can still expose dangerous tools badly. A bad server can still return enormous or misleading outputs. But standards do not need to solve everything to be transformative. They only need to stabilize the interfaces that were previously slowing everyone down.

The long-term significance of MCP is therefore architectural. It turns tool access from a product-specific feature into a shared substrate. Once that substrate exists, ecosystems compound. Open-source communities can build on it. commercial vendors can extend it. agent platforms can differentiate above it. That is exactly what happened with USB, HTTP, JSON, and OAuth: once a common interface became trustworthy enough, innovation moved up the stack.

For AI coding agents, this is the turning point. Models may still vary in quality, context windows may still expand, and orchestration patterns may still evolve. But MCP establishes something more durable than a model release cycle: a common language for capability exchange. That is why it changes everything.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 7 — MCP: The USB-C of AI
Token Usage: ~12,500 input + ~2,000 output (estimated)

7.2 MCP Implementation Across Three Systems

If Chapter 7.1 explained why MCP matters in theory, this section asks a different question: what does it look like when three real agent systems implement it? OpenCode, Claude Code, and Oh-My-OpenCode (OMO) all treat MCP as strategically important, but they express that importance differently. Reading the code makes the contrast clear. OpenCode aims for a broad, reusable client substrate. Claude Code builds a much deeper, product-grade MCP platform. OMO takes a more creative route by combining built-in remote MCPs, Claude Code compatibility loading, and skill-embedded MCP servers that can travel with reusable skills.

OpenCode: the broad open-source MCP substrate

OpenCode’s MCP implementation lives primarily in packages/opencode/src/mcp/, with index.ts as the center of gravity and auth.ts, oauth-provider.ts, and oauth-callback.ts filling in authentication support. The design is notable for how much standard MCP surface it covers in a relatively compact package.

First, OpenCode is a full MCP client, not a token gesture. It supports multiple transports from the official SDK, including:

StdioClientTransport for local subprocess servers
SSEClientTransport for remote SSE servers
StreamableHTTPClientTransport for streamable HTTP servers

That already gives it strong deployment breadth. A local filesystem MCP, a remote SaaS search MCP, and a streaming web service can all be consumed through the same client layer.

Second, OpenCode actively converts MCP capabilities into its internal tool world. In mcp/index.ts, convertMcpTool() turns an MCP tool definition into an AI SDK dynamic tool. This is an important bridging step. MCP servers speak in JSON Schema terms; OpenCode’s execution pipeline expects tool objects that its runtime and models can consume. So the system normalizes schemas, forces type: "object", disables unexpected extra properties, and wraps execution in a call to client.callTool(...). This is a classic adapter pattern: MCP becomes a native citizen inside the host runtime rather than an alien subsystem.

Third, OpenCode manages more than tools. It also fetches prompts and resources through listPrompts() and listResources(), caches metadata, and publishes events when tool lists change. That matters because a serious MCP client cannot stop at “tool calls work.” MCP is a richer protocol, and OpenCode respects that richness.

Fourth, OpenCode includes meaningful OAuth support. The code in auth.ts stores access tokens, refresh tokens, client information, code verifier state, and OAuth state in mcp-auth.json. oauth-provider.ts implements an MCP OAuth client provider, while oauth-callback.ts runs the callback server and validates state to reduce CSRF risk. This is not enterprise-scale auth machinery, but it is enough to make real remote MCP usage practical instead of theoretical.

The result is that OpenCode has excellent breadth. It covers most of the protocol surface an open-source host needs: tools, resources, prompts, local and remote transports, OAuth-aware remote flows, dynamic conversion into the host’s tool runtime, and lifecycle cleanup. It feels like infrastructure meant for others to build upon.

Claude Code: the deep, productized MCP stack

Claude Code’s implementation, centered on src/services/mcp/client.ts, is on another scale entirely. The file is 3,348 lines long, which is a signal in itself. This is not just “MCP support.” It is a full product subsystem with auth, policy, UI integration, transport variations, Claude.ai integration, internal server support, and advanced interaction models.

The first striking difference is transport depth. Claude Code supports stdio, sse, http, ws, and sdk server types in its schemas, plus special internal types such as claudeai-proxy. It also has an explicit InProcessTransport for linked in-process communication and SdkControlTransport for SDK MCP servers that run inside another process boundary but are controlled by Claude Code. That means Claude Code is not just consuming the public MCP ecosystem; it is using MCP as an internal systems boundary as well.

Second, Claude Code goes far beyond basic OAuth. Its MCP auth stack handles OAuth discovery, token refresh, token revocation, cached auth state, callback ports, claude.ai auth interactions, and even XAA-style configurations. The auth.ts implementation is large because remote MCP in a production product is messy: consent flows fail, tokens expire, metadata varies, and users need recovery paths. Claude Code embraces that complexity rather than hiding from it.

Third, Claude Code integrates MCP into the product’s broader UX and governance model. There is explicit support for elicitation, where an MCP server can request structured user input or URL-based confirmation and the client can queue, render, and complete that interaction through the app state. elicitationHandler.ts is a good example: the client registers request handlers, waits for user response, emits analytics, and processes completion notifications. In other words, Claude Code does not treat MCP as just tool execution. It treats MCP as interactive product functionality.

Fourth, Claude Code adds channel permissions and policy-aware gating for certain MCP servers. The channelNotification.ts and related files show that some MCP servers are treated as communication channels with specific authentication and allowlist rules. This is an example of commercial hardening: once MCP reaches real users, you need differentiated trust policies, not just generic connectivity.

Fifth, Claude Code includes an official registry layer through officialRegistry.ts, which fetches and caches official MCP URLs from Anthropic’s registry endpoint. This is a subtle but important move. It indicates ecosystem curation, not just raw protocol support. In open ecosystems, discoverability and trust become product problems.

Finally, Claude Code ties MCP to Claude.ai integration and internal SDK servers. It can fetch eligible Claude.ai MCP configs, proxy them as MCP servers, and bridge SDK-side servers through SdkControlClientTransport. That is a very deep use of MCP: one protocol used for external integrations, internal components, product connectors, and ecosystem distribution.

If OpenCode emphasizes breadth, Claude Code emphasizes depth. It is what MCP looks like when a protocol becomes part of a commercial operating environment.

OMO: creative recombination on top of OpenCode

Oh-My-OpenCode approaches MCP differently. It does not try to out-build Claude Code’s full MCP client stack. Instead, it combines several clever ideas.

First, OMO ships three built-in remote MCPs in src/mcp/:

websearch via Exa or Tavily
context7 for documentation lookup
grep_app for GitHub code search

These are small configuration wrappers, but they are strategically chosen. They map directly to recurring agent research needs: web discovery, official docs, and real-world code examples. OMO is not trying to expose every possible MCP server by default. It picks the three that most improve autonomous agent performance.

Second, OMO includes a Claude Code MCP loader in src/features/claude-code-mcp-loader/. It reads .mcp.json configurations from multiple scopes, transforms them, and imports them into OMO/OpenCode format. This is a migration bridge. Instead of telling users to rebuild their MCP setups from scratch, OMO says: bring your Claude Code MCP ecosystem with you.

Third, OMO introduces skill-embedded MCPs. This is arguably the most original idea of the three systems. A skill can ship with its own MCP configuration, and SkillMcpManager manages connections for those servers. The manager supports both stdio and HTTP-style connections, performs env expansion, handles retry/reconnect behavior, manages OAuth providers, and maintains connection pooling/caching keyed by session, skill, and server name. Idle clients are cleaned up automatically. This turns MCP from a global app configuration into a modular capability bundled with reusable expertise.

That design has major implications. In most systems, skills are prompt-level artifacts and MCPs are host-level infrastructure. OMO partially fuses them. A skill is no longer just “instructions for how to think”; it can also carry “private external capability for how to act.” That is an unusually powerful extensibility pattern.

The skill_mcp tool in src/tools/skill-mcp/tools.ts makes this usable. It lets the agent invoke a tool, resource, or prompt from an MCP server declared inside a loaded skill. That means the skill package can ship both behavioral guidance and its own external integration surface. It is an elegant answer to a real problem: specialized expertise often needs specialized tools.

Comparative judgment

We can summarize the three implementations with a simple triad:

OpenCode = breadth
Claude Code = depth
OMO = creativity

OpenCode’s strength is being an open, capable, reusable MCP foundation. Claude Code’s strength is turning MCP into a fully governed product subsystem with auth, registry, permissions, interactive elicitation, in-process bridges, and cloud integration. OMO’s strength is recombination: built-in research MCPs, Claude Code compatibility loading, and skill-embedded MCPs with managed lifecycles.

This comparison also reveals an important lesson for agent architecture. MCP support is not one thing. It has at least three layers:

Protocol coverage: transports, tool calls, resources, prompts, auth
Product integration: permissions, UI, state, recovery, governance
Extensibility design: how MCP fits plugins, skills, and user migration paths

OpenCode is strongest at layer 1. Claude Code dominates layer 2. OMO is the most inventive at layer 3.

And that is exactly why MCP is such a powerful lens for comparing agent systems. It reveals not just whether a system supports a protocol, but what kind of platform the system is trying to become.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 7 — MCP: The USB-C of AI
Token Usage: ~10,000 input + ~1,800 output (estimated)

7.3 MCP vs A2A

Once MCP became widely discussed, another protocol question immediately followed: if MCP is standardizing tool access, what standard will govern communication between agents themselves? That is where A2A, Google’s Agent-to-Agent Protocol, enters the picture. The easiest mistake is to treat MCP and A2A as rivals. They are not. They solve different coordination problems at different layers.

The shortest distinction is this:

MCP = agent ↔ tools/data
A2A = agent ↔ agent

MCP is primarily about vertical integration. An agent sits at the top and reaches downward into capabilities: tools, resources, prompts, files, APIs, search systems, databases, memory stores, documentation servers, or browsers. A2A is about horizontal communication. One agent delegates to another agent, negotiates work, tracks progress, and retrieves results.

That difference sounds simple, but it has major design consequences.

What MCP optimizes for

MCP assumes that the main intelligence sits in the client host. The server exposes capabilities, but it does not need to be a planner. It just declares what it can do and responds to requests. The host decides when to call a tool, which resource to read, which prompt to fetch, and how to combine those results into ongoing reasoning.

So MCP is usually shaped like request-response capability access. Even when it streams progress or supports richer interaction, the core relationship is still: the client is orchestrating and the server is exposing capabilities.

MCP also tends to be session-scoped. A client connects to a server, lists tools/resources/prompts, and uses them inside an ongoing session. State may exist, but it is usually subordinate to the host’s session lifecycle. The server is part of the environment, not a peer principal with its own explicit task queue and work contract.

This is why MCP feels natural for things like:

search servers
docs servers
filesystem bridges
browser automation
design APIs
local database tools

These are capabilities an agent uses. They are not independent workers with their own agenda.

What A2A optimizes for

A2A starts from a different assumption: the thing on the other side is not just a capability provider, but another agentic actor. That means the other side may have its own model, memory, planning loop, permissions, and execution environment. Communication therefore cannot be reduced to “call method X with JSON params.”

Instead, A2A emphasizes task delegation. One agent submits work to another. The receiving agent may accept, reject, defer, clarify, start working, stream status, and eventually complete the task. The protocol therefore needs an explicit task lifecycle, often represented as states such as:

submitted → working → completed

In practical systems there may be richer variants such as failed, canceled, blocked, or needs-input, but the core idea is that work has durable lifecycle semantics.

A2A also introduces the idea of Agent Cards. An Agent Card is a machine-readable description of an agent’s identity and capabilities: what it does, what inputs it expects, what constraints it has, and how to contact it. If MCP servers are capability endpoints, Agent Cards are more like service advertisements for autonomous workers.

This makes A2A better suited for patterns like:

delegating research to a specialist agent
handing off implementation to a coding agent
asking a verification agent to validate a result
coordinating multi-step workflows across organizational boundaries
allowing agents from different vendors to collaborate without sharing the same internal runtime

In other words, A2A is built for coordination among intelligences, not just access to utilities.

Intelligence placement: the most important distinction

The most useful conceptual difference is where the intelligence primarily lives.

In MCP, the intelligence is mostly in the client. The server may be sophisticated internally, but from the protocol’s perspective it is a tool/data provider.

In A2A, the intelligence exists on both sides. The caller must decide whom to delegate to and how to interpret progress. The callee must decide how to execute the task and what intermediate states to expose.

That difference changes everything from trust to UX. In MCP, the host can usually flatten results into tool output. In A2A, the host often needs explicit task tracking, partial completion handling, cancellation, retries, and responsibility boundaries.

Why they are complementary

Once you see the vertical-versus-horizontal split, the false competition disappears. A capable future agent system will probably need both.

Imagine a software engineering workflow in 2026:

A coordinator agent receives a feature request.
It uses A2A to delegate architecture review to one agent, codebase exploration to another, and test strategy to a third.
Each of those agents, while doing its work, uses MCP to access tools and data: docs servers, GitHub search, browser tools, file tools, build systems, memory systems, design systems.
The coordinator aggregates the completed A2A task results and decides the next step.

This is the natural layered future:

MCP for tool access
A2A for agent coordination

One gives agents hands. The other gives them coworkers.

Comparison table

Dimension	MCP	A2A
Primary relationship	Agent to tool/data server	Agent to agent
Integration type	Vertical	Horizontal
Main unit	Tool, resource, prompt	Task, delegate, status
Intelligence placement	Mostly in client	In both caller and callee
Interaction style	Request-response capability access	Delegation and collaboration
State model	Usually session-scoped	Explicit task lifecycle
Discovery artifact	Server capabilities	Agent Card
Best for	Search, docs, browser, DB, files, APIs	Research delegation, specialist agents, workflow routing
Failure mode	Tool error, auth error, bad schema, timeout	Task blocked, rejected, incomplete, conflicting responsibilities
Future role	Standard substrate for external capabilities	Standard substrate for multi-agent ecosystems

Design implications for coding agents

For coding agents, the practical lesson is straightforward. If you are designing a single-agent coding assistant, MCP should likely come first. A single agent becomes dramatically more useful when it can access standardized tools, resources, and prompts. That is the minimum viable extensibility layer.

But once your system evolves into orchestration, background specialists, or organization-level automation, MCP alone is insufficient. Passing every delegated subtask through tool calls is a conceptual mismatch. At that point, you need agent-native delegation semantics: ownership, progress, cancellation, and result return. That is where A2A becomes necessary.

The deeper lesson is that the industry is converging on a two-protocol architecture. One protocol connects an agent to the world of capabilities. Another connects the agent to a world of peers. That mirrors distributed systems history. We use one class of protocols for services consuming resources and another for services coordinating work. Agent ecosystems are arriving at the same separation.

So the right question is not “MCP or A2A?” The right question is: where does your current boundary lie? If the boundary is between an agent and a capability provider, MCP is the right abstraction. If the boundary is between one autonomous worker and another, A2A is the right abstraction.

The future is not a winner-take-all protocol war. The future is layered interoperability. MCP will standardize what agents can use. A2A will standardize how agents cooperate. Together, they define the control plane of the agentic software stack.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 8 — Configuration and Customization Token Usage: N/A (runtime does not expose per-file token counts)

8.1 Multi-Level Config Precedence

Configuration precedence is where an agent stops being a toy and becomes an operating system. The core question is simple: when the same option is defined in multiple places, which one wins? But behind that question lies a deeper design problem: how much flexibility can a system offer before it becomes mentally expensive to use?

OpenCode, Claude Code, and Oh-My-OpenCode (OMO) answer this question very differently.

OpenCode: seven layers, maximum flexibility

OpenCode implements the richest precedence stack of the three. In config/config.ts, the loading order is explicitly documented from low to high precedence:

remote .well-known/opencode
global ~/.config/opencode/opencode.jsonc
custom OPENCODE_CONFIG path
project opencode.jsonc
.opencode/ directory config
inline OPENCODE_CONFIG_CONTENT
managed enterprise config

This is not just a list of files. It is a policy hierarchy.

The remote .well-known/opencode layer acts like organization defaults. It allows a hosted control plane to publish baseline behavior. The global config in the user home directory then personalizes the tool for one developer across projects. OPENCODE_CONFIG adds an explicit override path, useful for scripts, experiments, or containerized environments. Project config makes the repository itself opinionated. The .opencode/ directory adds a content-oriented layer: not only config files, but agents, commands, modes, and plugins. Then OPENCODE_CONFIG_CONTENT provides an inline ephemeral override, ideal for automation. Finally, managed enterprise config wins over everything else.

In other words, OpenCode treats configuration as a merge graph rather than a single settings file. That is powerful, especially in teams that need central policy, local personalization, and per-project conventions simultaneously.

config/paths.ts reinforces this design by traversing upward for project files and .opencode directories. This upward search is important. It means configuration is not only attached to the current directory, but to the directory tree. The closer a config is to the working directory, the more specific it becomes.

There is also a subtle implementation detail with large consequences: OpenCode uses JSONC parsing, environment substitution, file inclusion, and deep merging. JSONC means JSON with comments and trailing commas. That sounds minor, but in practice it converts a machine-only format into a human-maintainable one.

Claude Code: fewer layers, lower cognitive load

Claude Code is much simpler. At a high level, its configuration model is:

global ~/.claude/settings.json
project .claude/settings.json
managed settings

The surrounding codebase contains more nuance, including managed file directories and remote managed settings, but the mental model exposed to the user is intentionally smaller than OpenCode’s. The settings system centers on one global file and one project file, with enterprise policy layered on top.

This simplicity is a product decision. Claude Code is optimized for reliability and predictability in a commercial environment. Fewer layers mean fewer surprise interactions. A developer usually knows where to look: either their global settings, the repository’s .claude/settings.json, or the organization policy.

The tradeoff is reduced composability. OpenCode supports use cases like inline config injection and alternate config roots much more naturally. Claude Code instead prefers a bounded configuration surface. That makes support, documentation, and enterprise governance easier, but it gives advanced users fewer knobs.

OMO: three-level plugin config, deliberately narrow

OMO adds its own configuration system on top of OpenCode, but it keeps that system intentionally small. In src/plugin-config.ts, the load order is:

defaults
user ~/.config/opencode/oh-my-opencode.jsonc
project .opencode/oh-my-opencode.jsonc

This is just three levels. Compared with OpenCode’s seven, it is refreshingly constrained.

Why can OMO afford to do that? Because it is not replacing OpenCode’s base config. It is configuring the OMO plugin layer itself. OpenCode still governs host-level behavior such as global agent configuration, project discovery, remote config, and managed enterprise overrides. OMO only needs enough structure to express plugin-specific concerns: agents, categories, hooks, commands, skills, Claude Code compatibility options, experimental behavior, and so on.

The schema footprint is large even though the precedence chain is small. The directory src/config/schema/ contains 22 Zod v4 schema files. Zod is a TypeScript-first validation library; a schema here means a machine-checked description of valid config structure. This separation is important: OMO keeps the number of precedence layers low while still allowing a rich internal shape.

OMO also supports JSONC. That matters even more in OMO than in many systems because OMO config is often read and edited by humans experimenting with agent orchestration. Comments and trailing commas make exploratory tuning safer.

Another notable design choice is partial fallback. If a config file contains an invalid section, OMO can still load the valid sections and skip the broken ones. This is a developer-experience choice. It favors graceful degradation over all-or-nothing rejection.

How OMO’s three layers coexist with OpenCode’s seven

The cleanest way to think about OMO is as a nested configuration subsystem.

OpenCode first resolves its own precedence stack and loads the host environment. Inside that host, OMO then loads oh-my-opencode.jsonc using its own narrower precedence chain. So the systems do not compete for the same namespace; they operate at different architectural levels.

That separation is one of OMO’s most elegant design decisions. If OMO had tried to mirror all seven OpenCode layers, the resulting mental model would be chaotic: users would have to reason about two overlapping precedence ladders. Instead, OMO says: host-level concerns belong to OpenCode; orchestration-level concerns belong to OMO.

This is a useful general lesson in extensible agent design. Plugins should not duplicate the host’s configuration hierarchy unless they absolutely must. Otherwise, every extension becomes a second operating system inside the first.

Flexibility versus confusion

This chapter’s real theme is not “which system is best,” but “what kind of confusion each system is willing to tolerate.”

OpenCode maximizes flexibility. It is excellent for power users, platform engineers, and plugin authors. But seven precedence levels are hard to hold in working memory. When something behaves unexpectedly, debugging config interactions can become a genuine systems problem.

Claude Code minimizes that problem by shrinking the stack. It gives up some flexibility in exchange for a configuration model that is easier to explain, teach, and support.

OMO chooses a hybrid path. It inherits a complex host, but its own layer stays intentionally compact. That is arguably the most scalable pattern: let the platform be expressive, while each extension remains disciplined.

The broader design principle is clear. More layers increase adaptability, but they also increase interpretive burden. In human-computer interaction terms, configuration precedence is a form of hidden control flow. The more hidden branches exist, the more users must simulate the system in their heads.

The best agent architectures therefore do not merely add precedence levels because they can. They add them only when each level corresponds to a distinct social scope: organization, user, project, runtime, or policy. When the levels map cleanly to real-world ownership boundaries, users can understand them. When they do not, flexibility turns into fog.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 8 — Configuration and Customization Token Usage: N/A (runtime does not expose per-file token counts)

8.2 Project Memory System

Configuration tells an agent how to behave before a session starts. Memory tells it what should persist after the session ends. In modern coding agents, memory is no longer just “chat history.” It is a structured persistence layer for conventions, context, and learned collaboration patterns.

All three systems in this book share one foundational convention: markdown files such as CLAUDE.md, AGENTS.md, and README.md serve as a human-readable knowledge base. This convention matters because it bridges two audiences at once: humans can edit the files directly, and agents can ingest them as instruction context.

Shared convention: project markdown as durable context

The common pattern across OpenCode, Claude Code, and OMO is that project knowledge is stored in plain text files checked into, or adjacent to, the repository. README.md explains the project to humans first, but it is also useful agent context. CLAUDE.md and AGENTS.md are more agent-oriented: they encode rules, conventions, workflows, and local expectations.

This is significant because it represents a move away from opaque memory. Instead of hiding all persistence inside a database, these systems externalize key parts of memory into artifacts that developers can inspect, diff, review, and version.

Claude Code: a typed memory model

Claude Code goes furthest in formalizing memory as a taxonomy. In src/memdir/memoryTypes.ts, it defines four memory types:

user
feedback
project
reference

This is a powerful design choice because it narrows what “memory” is allowed to mean.

user memory stores facts about the user’s role, preferences, and expertise. feedback memory stores guidance about how Claude should work with the user or the team. project memory captures ongoing work, deadlines, goals, and incidents that are not directly derivable from the codebase. reference memory stores pointers to external systems such as dashboards, ticket trackers, or documentation.

Just as important is what Claude Code explicitly forbids saving. The memory rules state that code patterns, architecture, file paths, git history, and information already present in CLAUDE.md should not be stored as memory. That distinction is crucial. Memory is reserved for non-derivable context, not for facts the agent can simply re-read.

Claude Code also supports scope. There is private memory and team memory, reflected in directories such as .claude/memory/ and .claude/memory-team/ at the conceptual level, with team-aware path logic implemented in files like teamMemPaths.ts. This turns memory into a social system as much as a technical one: some knowledge is personal, some should be shared.

Even more mature is Claude Code’s staleness logic. memoryAge.ts computes human-readable memory age like “today,” “yesterday,” or “47 days ago,” and emits freshness warnings for older memories. This is a subtle but deeply important safeguard. Persistent memory can become false over time. Claude Code acknowledges that memory is a snapshot, not a source of truth.

OpenCode: instruction memory rather than typed memory

OpenCode approaches persistence from another angle. In session/instruction.ts, it discovers AGENTS.md, CLAUDE.md, and legacy CONTEXT.md, both globally and by traversing upward through project directories. It also supports config-driven instruction paths and URL-based instruction sources.

This is less a “memory database” and more an instruction-loading framework. The persistent layer is document-centric. OpenCode assumes that long-lived knowledge is best represented as instruction files, not as many small typed records.

That does not make it weaker; it makes it different. OpenCode is optimized for explicit, inspectable context injection. The system walks the directory tree, discovers instruction files, and loads them into the system prompt pipeline. It also resolves directory-level instruction inheritance as the agent reads files deeper in the project.

OpenCode therefore treats memory mainly as curated instruction context. Claude Code treats memory as both instruction context and an evolving semantic store.

OMO: automatic context injection plus working memory

OMO extends the memory idea in a highly operational way.

First, its context-injector feature automatically injects important project documents, especially AGENTS.md and README.md, into the active prompt flow. This means the agent does not have to remember to read them manually every time. OMO turns project context into an always-on substrate.

Second, the rules-injector hook adds .sisyphus/rules/*.md content when relevant files are touched. This is a form of targeted memory activation. Instead of loading every possible rule up front, OMO can inject rule fragments in response to actual work. In computer systems language, this is closer to demand paging than static initialization.

Third, OMO emphasizes directory-level instruction locality. The codebase and related docs explicitly support hierarchical AGENTS.md placement. The deeper the agent goes into the tree, the more local instruction files can become active. This is important because large repositories often have subdomains with different norms.

Fourth, OMO introduces the Atlas notepad system under .sisyphus/notepads/. These notepads store learnings, issues, decisions, and unresolved problems. This is not exactly the same as Claude Code’s typed long-term memory. It is closer to structured working memory across a larger plan or orchestration run. In classical CS terms, it resembles an external scratchpad: durable enough to survive turns, but still tightly tied to a specific effort.

The distinction matters. Long-term memory stores durable truths about the user or project. A notepad stores intermediate operational knowledge accumulated during coordinated work.

From prompt engineering to context engineering

The deeper shift across all three systems is conceptual. Early AI tooling focused on prompt engineering: write a clever instruction once and hope it generalizes. These systems instead practice context engineering.

Context engineering means designing what information is available, when it becomes available, how it is scoped, and how it ages.

Claude Code expresses this through typed memories, freshness checks, and scope-aware storage. OpenCode expresses it through file discovery, frontmatter-aware instruction parsing, and prompt assembly rules. OMO expresses it through automatic injection, event-triggered rule loading, directory-level knowledge, and orchestration notepads.

This is a major architectural evolution. A prompt is a string. A context system is an information architecture.

Design implications

The best memory systems for coding agents have at least four properties.

First, they distinguish derivable facts from non-derivable facts. If the agent can re-read it cheaply from the repo, it usually should not be stored as memory.

Second, they acknowledge staleness. Memory without aging becomes accumulated hallucination.

Third, they support multiple scopes: personal, team, project, and task-local. These correspond to different ownership boundaries.

Fourth, they make persistence inspectable. Markdown files, notepads, and typed memory documents all share one virtue: humans can audit them.

That is why the future of coding agents is not simply “larger context windows.” Bigger windows help, but they do not solve organization. The real challenge is deciding what belongs in durable instructions, what belongs in long-term memory, what belongs in a temporary working scratchpad, and what should be recomputed from live state.

That is the true shift from prompt engineering to context engineering: not writing better words, but designing better memory boundaries.

Model: openai/gpt-5.4 Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 8 — Configuration and Customization Token Usage: N/A (runtime does not expose per-file token counts)

8.3 Instruction System

If tools are the hands of a coding agent, the instruction system is its nervous system. The system prompt is not merely an introductory paragraph sent to the model. It is a composite artifact assembled from multiple sources: base identity, behavioral policy, tool descriptions, project instructions, memory, and runtime constraints.

Across OpenCode, Claude Code, and OMO, the most important shared pattern is this: the “system prompt” is no longer a single prompt. It is a prompt assembly pipeline.

OpenCode: prompt assembly as layered composition

OpenCode makes this explicit in session/prompt.ts. During prompt creation, it combines environment instructions from session/system.ts with loaded instruction documents from session/instruction.ts. The result is not just a generic role description, but a stitched context bundle.

session/system.ts provides environment facts such as model identity, working directory, platform, date, and repository status. This matters because an agent behaves differently when it knows whether it is inside a git repository, which OS it is running on, and where it is operating.

session/instruction.ts then gathers instruction files like AGENTS.md and CLAUDE.md, both from global locations and by walking upward through the project tree. It also supports config-provided instruction paths and remote URLs. This means OpenCode’s system prompt is partly static and partly discovered.

The rest of the prompt assembly process adds agent prompt material, project context, tool schemas, and MCP-derived capabilities. The effect is a layered prompt architecture: environment, instructions, tools, and active task context each contribute a slice.

This is a very “systems” way to build prompts. Instead of authoring one giant monolithic template, OpenCode composes the final behavior from typed subsystems.

Claude Code: multi-source prompt synthesis

Claude Code follows the same broad pattern, but with a stronger emphasis on memory and policy-rich runtime context.

At a high level, Claude Code’s effective system prompt draws from:

CLAUDE.md and related rules files
memory files and memory instructions
MCP prompt/command material
tool descriptions
permission context and execution mode
optional appended system prompt fragments

The file utils/claudemd.ts shows that Claude Code loads managed memory, user memory, project memory, and local memory in a priority-aware order. It also supports @include directives, allowing one memory file to reference other files. This makes the instruction layer compositional and modular.

QueryEngine.ts shows the assembly step clearly: the final system prompt can combine a default system prompt or custom prompt, optional memory-mechanics instructions, and appended policy text. Elsewhere in the codebase, tool descriptions and MCP prompt data are prepared so the model sees not just the user request, but the action vocabulary available to it.

Claude Code also injects permission context. This is easy to underestimate. Permission mode is not just a UI concern; it shapes the agent’s behavioral boundary. An agent running in a strict approval environment should reason differently from one running in a permissive auto mode. In that sense, permission context is part of the prompt, because it changes what the agent believes it is allowed to do.

This is one reason commercial agents often feel more coherent in practice. They do not rely on a single prose identity statement. They encode runtime policy directly into the instruction substrate.

OMO: tailored prompts per agent

OMO pushes the instruction system further toward specialization. In dynamic-agent-prompt-builder.ts, prompts are assembled from modular sections such as tool selection guidance, delegation tables, category-and-skill rules, anti-pattern warnings, and hard behavioral constraints.

This is especially visible in agents like Prometheus and the Sisyphus variants. Rather than giving every agent one universal identity, OMO generates a tailored prompt for each role.

Typical sections include:

identity and role framing
interview or task-intake behavior
plan generation rules
tool and agent selection tables
delegation guides
cost awareness
anti-patterns and hard blocks

In other words, OMO treats prompt construction almost like compiling a job description. The system does not ask, “What should the model generally be?” It asks, “What exact constraints and affordances should this particular agent instance carry?”

That is a major step beyond standard prompt templates. OMO is essentially doing prompt modularization. A module is a reusable instruction component with a specific responsibility. This resembles software architecture more than classic prompt writing.

System prompt as personality, policy, and perimeter

There is a recurring anti-pattern in AI discourse: people say the system prompt is the agent’s “personality.” That is partly true, but incomplete.

The system prompt actually defines three things at once.

First, it defines personality: tone, initiative level, response style, and collaboration posture.

Second, it defines policy: what the agent should prioritize, what workflows it should follow, and how it should interpret project instructions.

Third, it defines perimeter: what the agent must not do, when it must ask for approval, when it should delegate, and what kinds of reasoning shortcuts are forbidden.

OpenCode emphasizes composition of environment plus instructions. Claude Code emphasizes policy-rich synthesis from memory, tools, MCP, and permission context. OMO emphasizes role-specific prompt specialization.

These are not mutually exclusive; they are different emphases along the same architectural spectrum.

Why modular instruction systems matter

As coding agents grow more powerful, the instruction system becomes a scalability bottleneck. A monolithic prompt is hard to maintain, hard to debug, and hard to localize. Modular assembly solves several problems.

It allows source attribution: a behavior can be traced to AGENTS.md, memory, managed policy, or a generated agent section. It allows selective override: team policy can replace local habit, or project context can override generic defaults. It also allows specialization: different agents can inherit shared foundations while diverging in carefully controlled ways.

This has a direct analogy in computer science. A monolithic prompt is like a giant global variable. A modular instruction system is like dependency injection: behavior is assembled from components with clearer ownership boundaries.

That analogy matters because prompt complexity is now large enough to deserve software engineering discipline.

The road ahead

The likely future is not one giant universal system prompt. It is a prompt graph.

A prompt graph is a network of instruction components activated by scope, role, tool availability, and task type. Some components are always present, such as identity and safety policy. Others are conditional, such as project rules, memory reminders, MCP-specific prompts, or delegation protocols.

OpenCode already hints at this with layered discovery and composition. Claude Code hints at it with memory scopes, permission-aware context, and MCP prompt conversion. OMO makes it most explicit by generating different prompt bodies for different agents.

The key lesson is that an instruction system should not be written like copy. It should be engineered like infrastructure.

The best coding agents of the next generation will therefore be distinguished less by having a more eloquent base prompt and more by having a better instruction assembly architecture: modular, inspectable, scope-aware, role-aware, and tightly coupled to the tool and memory systems around it.

That is when the system prompt stops being a paragraph and becomes what it really is: the software-defined personality boundary of the agent.

Chapter: 9 — OpenCode’s Unique Contributions Book Title: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Model: openai/gpt-5.4 Token Usage: ~2,900 tokens Generated: 2026-04-01

9.1 Multi-Interface Architecture

Among the three systems discussed in this book, OpenCode stands out for one reason that is architectural rather than cosmetic: it is not merely a command-line agent. It is a shared agent core exposed through multiple user interfaces. In practice, OpenCode is built as one execution engine with four frontends: a CLI built on yargs, a terminal user interface (TUI) rendered with Solid.js on top of terminal rendering libraries, a browser-based web app built with Solid.js, Tailwind, and Vite, and a desktop application built with Tauri and Rust. This is a meaningful design choice because it separates agent capability from presentation layer.

The CLI entry point in packages/opencode/src/index.ts makes the first part of this strategy obvious. OpenCode uses yargs to register commands such as run, serve, web, acp, and TUI-related commands. This means the shell interface is not a separate product; it is simply one control surface over the same underlying services. In many coding agents, the CLI is the product. In OpenCode, the CLI is one client.

The TUI takes the same core and turns it into a richer interactive terminal environment. packages/opencode/src/cli/cmd/tui/app.tsx shows a deep Solid.js provider tree: route providers, SDK providers, sync providers, local state providers, dialog providers, prompt history providers, theme providers, and more. That stack matters because it reveals a terminal application treated like a real reactive app rather than a thin ANSI wrapper. In computer science terms, a TUI is a terminal user interface: a full-screen interactive program inside the terminal, not just line-by-line command output. OpenCode invests heavily in this mode, making the terminal experience stateful, navigable, and synchronized.

The web interface extends the same idea into the browser. packages/app/package.json shows a Solid.js + Vite + Tailwind stack, which is a modern frontend stack for reactive rendering, fast development builds, and utility-first styling. The significance is not aesthetic. Because the web app is built as a separate package instead of being hardwired into the CLI, OpenCode can expose agent functionality through HTTP and browser-native workflows. This makes the system easier to embed into local dashboards, internal tools, or hosted control planes.

The desktop layer pushes the architecture one step further. packages/desktop/package.json and packages/desktop/src-tauri/Cargo.toml show a Tauri-based application with a Rust host and a web frontend. Tauri is a desktop framework that wraps a web UI in a native shell while delegating system-level tasks to Rust. This provides access to OS integrations such as file dialogs, deep links, notifications, window management, and local storage without abandoning the shared UI/application model. packages/desktop/src/index.tsx makes this concrete: the desktop app imports AppBaseProviders and AppInterface from @opencode-ai/app, which means the desktop surface is reusing the shared application layer rather than reinventing it.

What keeps these interfaces coherent is OpenCode’s event and service infrastructure. packages/opencode/src/bus/index.ts defines a Bus namespace with publish/subscribe behavior and wildcard subscriptions. In software architecture, an event bus is a messaging backbone inside one application: components publish events, other components subscribe, and the system stays loosely coupled. OpenCode uses this to synchronize state changes without forcing every layer to know every other layer directly. When sessions update, instances dispose, or global events occur, different interfaces can react in near real time.

The server layer in packages/opencode/src/server/server.ts is the bridge between core engine and external clients. It uses Hono as the HTTP framework, exposes OpenAPI-backed routes, supports server-sent events through streamSSE, and enables real-time terminal or process interaction via WebSocket endpoints. This matters because once the agent core is available over HTTP and streaming protocols, the UI stops being privileged. The browser app, desktop shell, CLI helpers, and external tools can all connect through a shared service layer.

This is where OpenCode differs from both Claude Code and Oh-My-OpenCode. Claude Code is highly refined, but its center of gravity remains the terminal product. Oh-My-OpenCode, meanwhile, is an orchestration layer built on top of OpenCode rather than a separate multi-interface runtime. Neither system presents the same “one core, many first-class frontends” pattern. OpenCode therefore contributes something structurally important to the coding-agent ecosystem: it suggests that an agent should be designed like a platform service with multiple native clients, not like a single shell program with optional wrappers.

The broader lesson is strategic. Multi-interface architecture is not just about user preference. It changes extension strategy, deployment strategy, and product resilience. A team can prototype in CLI, operate visually in TUI, embed in browser workflows, and ship to end users as a desktop application—all while relying on the same session model, tool system, provider abstraction, and event stream. That drastically lowers duplication.

For agent designers, the implication is clear: if the core logic is truly separable from presentation, new interaction surfaces become cheap. If it is not, every new interface becomes a rewrite. OpenCode chose the former path. That choice may be one of its most underappreciated innovations.

Chapter: 9 — OpenCode’s Unique Contributions Book Title: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Model: openai/gpt-5.4 Token Usage: ~2,800 tokens Generated: 2026-04-01

9.2 ACP: Agent Client Protocol

OpenCode’s support for ACP, or Agent Client Protocol, is one of its most strategically important features. The code under packages/opencode/src/acp/—especially agent.ts, session.ts, and types.ts—shows that OpenCode is not only a standalone agent runtime. It is also designed to be controlled by an external client through a formal protocol boundary. That distinction matters because it moves the system from “tool you run” to “agent engine other software can drive.”

ACP in OpenCode is implemented through the @agentclientprotocol/sdk package. At a high level, the pattern is JSON-RPC over stdio. JSON-RPC is a lightweight remote procedure call protocol where client and server exchange structured JSON messages describing methods, parameters, results, and errors. stdio means standard input and standard output, the simplest transport available to local processes. Put together, JSON-RPC over stdio creates a robust way for an editor or IDE to launch an agent process and talk to it without inventing a custom wire protocol.

This should immediately be contrasted with MCP, the Model Context Protocol. MCP standardizes how an agent talks to tools and external resources. ACP standardizes how a client application talks to the agent itself. The two protocols live at different layers. MCP is agent-to-capability. ACP is client-to-agent. That separation is conceptually clean and practically useful.

acp/types.ts contains the minimal session state structure OpenCode needs to maintain ACP-backed conversations: session ID, current working directory, MCP servers, optional model selection, variant, and mode ID. acp/session.ts then wraps this into an ACPSessionManager, which can create, load, retrieve, and update ACP sessions. This is important because editor integration is never just about sending prompts. It requires lifecycle control: restoring sessions, switching models, changing modes, and maintaining working-directory context.

The heavy lifting lives in acp/agent.ts. There, OpenCode defines an ACP.Agent class implementing the protocol-facing agent surface. The class owns the connection, references the OpenCode SDK, manages session state, subscribes to global events, translates permission requests, and emits updates back to the client. In other words, ACP is not a thin adapter. It is a full mediation layer between the OpenCode runtime and an external host.

Several details reveal why this matters. First, the ACP layer listens to OpenCode event streams and forwards meaningful updates. That lets a client stay synchronized with agent progress, usage, and permissions. Second, the ACP implementation contains explicit logic for permission handling. When the agent needs approval for an operation such as file editing, ACP can request that decision from the external client instead of assuming a terminal prompt. This is critical for IDE integration, where the approval UX may need to appear as a native editor dialog rather than a shell confirmation.

Third, OpenCode includes URI handling for editor-specific contexts. In agent.ts, parseUri includes support for zed:// URIs, directly signaling an intended integration path with the Zed editor. This is not hypothetical protocol design. It is protocol design informed by a concrete client.

Why is this such a big deal? Because most agent-integrated editors today rely on one of two awkward strategies. Either they embed the entire agent logic inside an editor extension, creating duplication and maintenance burden, or they communicate with a CLI in an ad hoc way, parsing logs and inventing custom commands. ACP offers a third path: treat the coding agent as a protocol-speaking local service that any editor can host.

That has several consequences. IDE vendors no longer need to write a totally custom plugin layer for every agent. Agent developers no longer need to maintain separate integrations for each editor from scratch. The integration point becomes standardized enough that clients can focus on user experience, while the agent runtime focuses on planning, tools, permissions, and execution.

This is especially notable when compared with Claude Code. Claude Code integrates well with editor-adjacent workflows, but its architecture is not centered on an openly inspectable client-agent protocol of this kind. Oh-My-OpenCode extends agent behavior dramatically, but it inherits OpenCode’s surfaces rather than redefining the external client protocol layer. ACP is therefore a distinct OpenCode contribution.

There is also a philosophical point here. The first protocol wave in agent systems focused on tool interoperability. That was necessary, but incomplete. If agents are to become infrastructure, the industry also needs standards for agent hosting, session control, permission mediation, streaming updates, and editor context attachment. ACP hints at this second wave.

For builders of future coding agents, the design lesson is straightforward: do not fuse the agent and the client into one inseparable application. Put a protocol boundary between them. Once that boundary exists, IDEs, terminals, notebooks, browsers, and automation systems can all speak to the same engine. OpenCode’s ACP support is an early but important step in that direction.

Chapter: 9 — OpenCode’s Unique Contributions Book Title: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Model: openai/gpt-5.4 Token Usage: ~2,700 tokens Generated: 2026-04-01

9.3 Namespace Organization Pattern

One of OpenCode’s least glamorous but most intellectually interesting contributions is its heavy use of TypeScript namespaces as an organizational pattern. In much of modern TypeScript, namespaces are treated as old-fashioned. Developers often default to flat exports, utility modules, or class-heavy designs. OpenCode chooses differently. Across the codebase, we repeatedly see structures such as Agent, Tool, Session, Provider, Bus, and others defined as namespaces that contain schemas, types, state initialization, helper methods, and operational logic together.

This is visible immediately in files such as agent/agent.ts, tool/tool.ts, session/index.ts, provider/provider.ts, and bus/index.ts. Each file exports a conceptual module as a named namespace rather than as a loose collection of unrelated functions. For example, the Tool namespace defines the core tool interface, context types, helper types for inference, and the define function that standardizes tool registration. The Agent namespace contains schema definitions, default built-in agent definitions, permission composition, model parsing, and related logic. The Session namespace combines session schemas, row conversion helpers, event definitions, and lifecycle operations. This is not accidental style; it is a recurring architecture.

Why does this matter? Because naming is one of the quiet failure modes of large agent systems. Coding-agent codebases accumulate many concepts with overlapping vocabulary: agent info, session info, provider config, tool metadata, event data, message parts, and so on. In flat module structures, this often produces either verbose naming (createSessionInfoFromRow, toolDefine, providerDefaultModel) or collision-prone generic names (Info, create, get, update) exported from many places. Namespaces let OpenCode keep internally natural names while maintaining external disambiguation.

Inside Session, it is perfectly reasonable to define Info, create, fromRow, toRow, Event, or GlobalInfo. Inside Tool, it is perfectly reasonable to define Context, Info, and define. Because those identifiers are scoped under Session.* or Tool.*, the code remains readable without becoming globally chaotic.

There is also a state-management benefit. OpenCode frequently pairs a namespace with an Instance.state(...) pattern, meaning the namespace is not just a bag of utilities but the home of a concept’s local state and lifecycle rules. This creates what we might call a self-contained domain module. In software design terms, a domain module is a unit organized around one concept in the business logic rather than around one technical layer. OpenCode’s namespaces often bundle schema, state, events, and operations for a single domain concept in one place.

That matters in agent systems because the complexity is conceptual before it is algorithmic. The hard part is not usually clever data structures. It is maintaining clean boundaries between sessions, tools, providers, permissions, events, prompts, and user interfaces. Namespace-based organization gives OpenCode a way to express those boundaries directly in code.

Compare this with flatter module structures, including what we often see in commercial codebases such as Claude Code. Flat export systems can be elegant for small packages, and they align well with modern ES module idioms. But as the surface area grows, they can drift toward a sprawl of helper functions, wrapper files, and ever longer symbol names. OpenCode accepts a slightly more opinionated style in exchange for stronger conceptual locality.

There are trade-offs, of course. Some TypeScript developers dislike namespaces because they feel less idiomatic in the ES module era. Tooling conventions also tend to emphasize direct named exports. Yet OpenCode demonstrates that the relevant question is not fashion; it is whether a pattern reduces confusion in a large, fast-moving codebase. Here, the answer appears to be yes.

This pattern is especially effective for agent architecture because agent platforms are full of parallel abstractions. There is an Agent concept, but also agent permissions, agent prompts, agent models, and agent modes. There is a Session concept, but also session messages, session events, session summaries, session compaction, and session status. Namespaces let these systems grow inward before they grow outward.

The deeper lesson is that extensible agent runtimes need not choose between “everything is a class” and “everything is a loose function.” OpenCode shows a third option: domain-centric namespaces that act almost like internal modules with their own vocabulary and gravity. This pattern will not fit every project, but it deserves more attention than it gets.

In a field obsessed with models and benchmarks, it is easy to overlook code organization. That would be a mistake. Architecture is not only about distributed systems and protocols. It is also about whether humans can navigate the code six months later. OpenCode’s namespace organization pattern is a serious answer to that problem.

Deep Dive: Semantic Adjacency Explained

Model: openai/gpt-5.4 Token Usage: ~1,950 tokens

To understand why OpenCode’s namespace pattern matters, it helps to leave programming for a moment and think about a desk. Imagine two ways of organizing a student’s study space.

In Method A, everything is sorted by object type. All pens go into one cup. All paper goes into one tray. All notebooks go onto one shelf. At first glance, this looks neat. Pens are with pens, paper is with paper, notebooks are with notebooks. But now imagine the student wants to do math homework. Suddenly, the required items are split across three different places: the pen is in one area, the paper in another, and the math notebook somewhere else. Every small task requires physical jumping.

In Method B, everything is sorted by subject or purpose. There is one math drawer, and inside it are the math pen, math paper, math notebook, maybe even the calculator. There is an English drawer with English notes and English pens. There is a science drawer with science materials. This arrangement looks less “pure” from a classification point of view, because pens are no longer all together. But for real work, it is dramatically better. When doing math, everything needed for math is already nearby.

OpenCode is much closer to Method B.

That is the heart of the idea. Many codebases optimize for taxonomic tidiness: all types together, all schemas together, all queries together, all services together. OpenCode instead optimizes for task-oriented understanding: all the things that belong to the concept of Agent live close to each other. If you want to understand Agent, you do not go on a scavenger hunt across the entire repository. You stay near Agent.

This is particularly different from how many TypeScript projects are commonly structured. A conventional codebase often separates code by technical layer. If you want to understand the concept of an agent, you may need to visit files like these:

types/agent.ts
services/agent.ts
queries/agent.ts
schemas/agent.ts

Sometimes there are even more: constants/agent.ts, validators/agent.ts, hooks/useAgent.ts, db/agent.ts, mappers/agent.ts. None of those files is wrong by itself. The problem is cumulative. To form one complete mental model of “what is an Agent in this system?”, the reader must open four, five, or eight files and mentally stitch them together.

This style can work well for CRUD applications with stable layers, but agent systems are not simple CRUD applications. In agent systems, concepts are behavior-heavy. An Agent is not just data. It has a schema, default configuration, permission rules, parsing logic, lifecycle operations, often state transitions, and sometimes provider-specific handling. When those pieces are scattered across multiple directories, the developer pays a repeated context-switch cost.

OpenCode takes a different approach. Instead of saying “all schemas belong in the schema folder” and “all logic belongs in the service folder,” it says: the concept of Agent deserves one coherent home. That home is often a single file exporting one namespace. Inside Agent.*, the concept can keep its own natural internal vocabulary without leaking confusion into the rest of the system.

The pattern looks roughly like this:

export namespace Agent {
  export const Info = z.object({
    id: z.string(),
    name: z.string(),
    model: z.string(),
    description: z.string().optional(),
  })

  export type Info = z.infer<typeof Info>

  export async function create(input: Info) {
    // validate, normalize, persist
    return input
  }

  export async function list() {
    // fetch agent definitions
    return [] as Info[]
  }

  export async function start(id: string) {
    // start lifecycle / attach session / initialize state
  }

  export async function stop(id: string) {
    // cleanup resources / update status
  }
}

The exact details vary, but the shape is what matters. The schema is there. The type is there. The core operations are there. If there are helper converters, they can be there. If there is local state or lifecycle logic, it can be there too. When a developer sees Agent.Info, Agent.create(), Agent.list(), Agent.start(), and Agent.stop(), the meaning is obvious, and the reading path is short.

This is why the phrase semantic adjacency is useful.

Let us break it apart. Semantic means “related by meaning.” Two things are semantically related when they belong to the same idea or serve the same conceptual purpose. For example, hospital, pharmacy, and rehabilitation center are semantically related because they all belong to healthcare. Adjacent means physically close together. So semantic adjacency means: things that belong together in meaning are also placed together in space.

A city-planning analogy makes this clearer. Suppose a city places a hospital in the east district, the pharmacy in the west district, and the rehabilitation center in the south. Technically, all three facilities exist. But for patients, families, and doctors, the city has created friction. The healthcare workflow is scattered. Now imagine another city where the hospital sits next to the pharmacy, and the rehabilitation center is across the street. This is semantic adjacency. Meaning-related institutions are physically neighboring each other. The city becomes easier to use because the arrangement reflects the real task flow of human life.

OpenCode applies the same philosophy to code. Instead of semantic scatter—schema over here, logic over there, validation somewhere else—it creates semantic neighborhoods. Agent things live near Agent things. Tool things live near Tool things. Session things live near Session things.

This matters even more in agent systems than in many other software categories. Why? Because agent runtimes contain a large number of conceptually parallel modules: tools, sessions, providers, MCP integrations, permissions, prompts, buses, message parts, event streams, and so on. And each of these concepts often has the same internal quadrilateral:

a type or data structure,
a logic layer or operations,
a query/loading mechanism,
a validation/schema definition.

In a layer-first architecture, every one of those concepts gets broken into multiple locations. To understand Tool, you jump between type files, schema files, execution logic, and registry code. To understand Session, you do the same. To understand Provider, again the same. The result is not just more files. It is more interrupted thinking.

OpenCode’s namespace pattern reduces that jump from four or five files to often one primary file. That does not eliminate complexity, but it compresses it into a more navigable shape. This is a major advantage for systems where developers constantly need to answer questions like:

What exactly is a session?
Where is provider validation defined?
How does this tool get registered?
What data shape does an agent expose?
Where do permissions and defaults meet?

If the answer is “mostly in one conceptual home,” maintenance becomes easier.

This does not mean namespaces are universally superior. There is a real trade-off. Some developers see TypeScript namespaces as old-fashioned because the broader ecosystem has moved toward ES modules and direct named exports. They may associate namespaces with earlier eras of TypeScript, or worry that the pattern feels less trendy, less minimal, or less aligned with current style guides.

That criticism is not entirely irrational. Every pattern carries cultural baggage. But software architecture should not be judged mainly by whether it looks fashionable on a conference slide. The deeper question is simpler: does this pattern reduce confusion for the kind of system being built? In OpenCode’s case, the namespace pattern appears to do exactly that.

So the real lesson is not “everyone should use namespaces.” The lesson is more precise: for concept-dense systems such as coding agents, semantic adjacency may matter more than stylistic popularity. If a pattern lets readers understand a module without bouncing across half the repository, that pattern deserves serious consideration.

Put differently, OpenCode chooses operational clarity over trend conformity. It is willing to group a concept’s schema, type, validation, and behavior together if that is what helps human readers think. That decision may look small, even boring. In practice, it is one of the reasons the codebase feels more coherent than many equally powerful systems.

And that is what semantic adjacency really names: not a syntax trick, but a design principle. Things that belong together in meaning should stay together in code. OpenCode treats that principle as architecture, not housekeeping.

Chapter: 9 — OpenCode’s Unique Contributions Book Title: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Model: openai/gpt-5.4 Token Usage: ~2,850 tokens Generated: 2026-04-01

9.4 SDK & Programmability

One of OpenCode’s most forward-looking ideas is that a coding agent should be programmable as infrastructure, not only operated as an interactive tool. This idea appears clearly in two places: the JavaScript SDK under packages/sdk/js/ and the HTTP server in packages/opencode/src/server/server.ts. Together, they reveal a product vision larger than “a good terminal assistant.” OpenCode is trying to become an agent service with a developer-facing API surface.

The SDK package, @opencode-ai/sdk, is not an afterthought. Its package.json exports multiple entry points, including a v2 client and server surface. The build script in packages/sdk/js/script/build.ts is especially revealing: it runs bun dev generate inside the OpenCode package to produce openapi.json, then feeds that spec into @hey-api/openapi-ts to generate client code. In other words, the SDK is derived from the server contract rather than maintained manually.

This is a significant design choice. OpenAPI is a machine-readable format for describing HTTP APIs: routes, parameters, schemas, authentication, and responses. When a system publishes an OpenAPI specification, it becomes much easier to generate typed clients, validate requests, and keep server and client behavior aligned. OpenCode’s generated SDK implies that its API is treated as a first-class interface, not just an internal convenience for its own frontend.

The server file reinforces that conclusion. server.ts builds a Hono application, wires in route groups like /project, /session, /config, /provider, /mcp, /tui, and /permission, and exposes /doc using openAPIRouteHandler. That means the runtime is self-documenting through an API schema. It also supports streaming and real-time interaction: server-sent events for subscription to internal events, and WebSocket support for PTY-style real-time communication. This is much closer to an application platform than to a traditional CLI utility.

Why does that matter? Because programmable agents unlock very different use cases from interactive agents. A CLI user wants help in a terminal. A programmable agent can also be embedded into editor integrations, custom dashboards, CI systems, internal platforms, test harnesses, research workflows, and multi-agent orchestrators. Once the agent is accessible through a stable API, other software can compose it.

OpenCode’s architecture therefore supports a crucial inversion: instead of forcing all automation through shell scripting around a CLI, developers can talk to the agent through typed client libraries and HTTP endpoints. That improves reliability, error handling, and maintainability. Shell wrappers are convenient but brittle. SDK-backed integrations are heavier initially but scale better organizationally.

This design also complements OpenCode’s multi-interface story. The web app and desktop app are easier to build because the core engine is already service-oriented. The SDK and HTTP server are not extra features stuck on top; they are part of the same philosophy that keeps frontends thin and the runtime centralized.

Compare this with many coding-agent products whose extensibility still revolves around “run the CLI and parse its output.” That approach can work surprisingly well for small automations, but it becomes painful when the agent needs session continuity, structured event streams, permission mediation, tool invocation, or model/provider configuration. OpenCode’s API-oriented approach solves those at the right level of abstraction.

It also explains why Oh-My-OpenCode could emerge as a substantial orchestration layer. OpenCode already behaves enough like a programmable substrate that another system can build on top of it. That is only possible when the host exposes composable primitives instead of burying them behind a monolithic UI.

There is a larger industry lesson here. We are moving from “AI assistant as app” to “AI agent as platform primitive.” In the first phase, success is measured by interaction quality in a primary UI. In the second, success is also measured by how easily the system can be embedded, automated, and extended by other software. OpenCode leans strongly toward the second phase.

That does not mean every team should build an HTTP server and generated SDK on day one. But it does suggest a useful maturity path. If an agent proves valuable interactively, the next leverage point is not necessarily another command. It may be an API, a schema, a typed client, and a stable service boundary.

OpenCode’s contribution, then, is not just technical. It is conceptual. It argues that the right mental model for a coding agent is increasingly “programmable service with multiple clients,” not “smart terminal app with some extras.” As coding agents become core engineering infrastructure, that viewpoint is likely to become more important, not less.

Chapter: 9 — OpenCode’s Unique Contributions Book Title: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Model: openai/gpt-5.4 Token Usage: ~2,750 tokens Generated: 2026-04-01

9.5 Plan Mode

OpenCode’s plan mode is a deceptively simple idea with profound implications for agent safety and workflow design. Rather than treating planning as a soft instruction inside a general-purpose agent, OpenCode gives planning its own agent identity, its own permission profile, and its own transition mechanism. The relevant files include packages/opencode/src/tool/plan.ts and the built-in agent definitions in packages/opencode/src/agent/agent.ts.

The key principle is straightforward: plan first, execute later. But OpenCode turns that principle into runtime structure. In agent.ts, the built-in plan agent is defined separately from the default build agent. Its description is explicit: “Plan mode. Disallows all edit tools.” The permission rules confirm this. Under the plan agent, edit is denied broadly, with narrow exceptions only for plan files stored under approved plan paths. This means planning is not merely encouraged; file modification is structurally constrained.

This is important because many agent systems rely on prompt wording alone to preserve a planning phase. The model is told to think before acting, or to propose a plan before editing files. That works sometimes, but it remains probabilistic. OpenCode instead adds an operational boundary. The planner can inspect, search, reason, and write the plan artifact itself, but it cannot casually drift into implementation. In effect, plan mode is a read-only exploration mode with a limited write target for planning output.

In computer science terms, this is a capability restriction. A capability is the set of actions a process is permitted to perform. By changing the capability set of the agent, OpenCode changes behavior more reliably than prompt-only instruction can. This is an important design lesson for AI systems: when behavior matters, prefer enforceable constraints over advisory text.

tool/plan.ts shows the transition side of the design. The PlanExitTool asks whether the user wants to switch from the completed plan to the build agent and begin implementation. If approved, OpenCode creates a synthetic user message targeting the build agent and instructing it to execute the approved plan. This is elegant for two reasons. First, it keeps plan and execution as separate phases in the session history. Second, it provides an explicit handoff from analysis to action.

That handoff is more valuable than it may seem. One of the persistent problems in coding agents is action leakage: the system begins researching, then makes a small change, then another, and soon the distinction between exploration and implementation disappears. That can lead to premature edits, shallow solutions, or user distrust. OpenCode’s plan mode creates a formal checkpoint. The planning phase concludes, the plan is reviewed or accepted, and only then does the execution-capable agent take over.

This is especially useful for larger engineering work: refactors, migrations, debugging with uncertain root causes, architecture changes, or any task where early edits are risky. A read-only planning agent can inspect the repository, compare options, map dependencies, and propose sequencing without the temptation to “just start patching.” The result is often better strategy and fewer reversible mistakes.

Compared with Claude Code, this is a noteworthy distinction. Claude Code certainly supports planning behavior, but OpenCode makes planning an explicit runtime mode with hard permission boundaries. Oh-My-OpenCode, in turn, expands orchestration and specialized agents, yet this foundational plan/build split originates in OpenCode itself. It is part of the host’s core design vocabulary.

Plan mode also reflects a broader principle in agent architecture: not every cognitive phase should have the same action privileges. Research, planning, execution, review, and summarization are different kinds of work. If they all share the same tool permissions, the system becomes simpler to implement but harder to control. OpenCode shows that even a minimal agent platform can benefit from phase-specific agents.

There is an educational angle here as well. Human engineers often work best with the same separation. We investigate first, design second, implement third, and review fourth. Experienced developers know that collapsing those phases can save time on trivial tasks but create chaos on serious ones. OpenCode encodes that mature workflow into software.

The larger lesson for future agent systems is clear. If planning truly matters, do not merely ask for it in prose. Give it a protected mode, a separate permission set, a dedicated artifact, and an explicit exit path. OpenCode’s plan mode is not flashy, but it may be one of the cleanest examples of how to turn good engineering process into enforceable agent behavior.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~4,200 input + ~1,050 output

10.1 Three-Layer Orchestration

Oh-My-OpenCode’s most important innovation is not a single agent, a single hook, or a single model choice. It is the decision to organize coding work as a three-layer orchestration stack. In practical terms, OMO treats software work less like “one assistant answering a request” and more like “a small engineering organization with planning, management, and execution roles.” That distinction explains much of its architecture.

graph TB
    subgraph "Layer 1: Planning"
        P["Prometheus<br/>📝 Plan Generator"] 
        Met["Metis<br/>🔍 Pre-Analyzer"]
        Mom["Momus<br/>✅ Plan Reviewer"]
        Met -->|"hidden intentions"| P
        P -->|"plan draft"| Mom
        Mom -->|"approved plan"| Atlas
    end
    
    subgraph "Layer 2: Execution"
        Atlas["Atlas<br/>🎯 Conductor"]
    end
    
    subgraph "Layer 3: Workers"
        Atlas -->|"visual task"| SJ["Sisyphus-Junior<br/>⚡ Task Executor"]
        Atlas -->|"deep problem"| Hep["Hephaestus<br/>🔨 Deep Worker"]
        Atlas -->|"need advice"| Ora["Oracle<br/>👁️ Read-Only Consultant"]
        Atlas -->|"search external"| Lib["Librarian<br/>📚 External Search"]
        Atlas -->|"search codebase"| Exp["Explore<br/>🔎 Fast Search"]
    end
    
    Atlas -.->|"📓 wisdom"| NB["Notepad System<br/>learnings / decisions / issues"]
    NB -.->|"inject to all"| SJ
    NB -.->|"inject to all"| Hep
    
    style Atlas fill:#ff6b6b,color:#fff
    style P fill:#4a9eff,color:#fff
    style Ora fill:#ffd43b,color:#000

The first layer is Planning. Here the central agents are Prometheus, Metis, and Momus. Prometheus is not merely a planner that writes markdown checklists. Its prompt system is explicitly interview-driven. It asks questions, clarifies scope, and converts vague user requests into a structured plan. This matters because many agent failures begin before implementation: the system answers the wrong question because it never modeled the real problem well enough. Prometheus is designed to reduce that failure mode by acting like a strategic consultant rather than an eager coder.

But OMO does not trust a planner alone. Before Prometheus finalizes a work plan, Metis performs a pre-analysis pass. In Greek mythology, Metis represents wisdom and cunning; in OMO, that translates into catching hidden intentions, unspoken constraints, missing acceptance criteria, implicit assumptions, and likely failure points. This is a notable design move. In textbook CS terms, it resembles an additional validation phase inserted before a plan is committed. Instead of assuming the first plan is good, OMO introduces a dedicated agent whose job is to ask, “What did the planner miss?”

Then comes Momus, the critic. Momus is not optional window dressing. In high-accuracy mode, Prometheus must loop until Momus returns an approval verdict, typically the literal “OKAY.” This mandatory retry loop is one of OMO’s most revealing design choices. Many agent systems contain review prompts, but few make the review agent a real gate. OMO does. The result is that plan quality is treated as an enforceable constraint rather than a polite suggestion. That is a substantial difference from systems where planning and critique are merged into one model call.

The second layer is Execution, and the key figure is Atlas. If the planning layer decides what should happen, Atlas decides how it should happen operationally. Atlas decomposes plans into executable units, identifies which parts can run in parallel, dispatches work to specialized subagents, collects results, verifies completion, and maintains working memory through the notepad system. In distributed-systems language, Atlas acts like a runtime scheduler plus verifier. In management language, Atlas is the project lead. It does not simply hand out tasks once. It keeps checking whether the tasks were actually done, whether the evidence is good enough, and whether accumulated knowledge should be passed downstream.

This middle layer is crucial because it solves the gap between plan generation and real-world execution. A good plan alone does not guarantee a good run. Work must be chunked correctly, delegated to the right specialists, synchronized, and verified. Atlas is therefore the control plane of OMO’s organization model.

The third layer is the Worker layer, where specialized agents perform the actual task-level labor. The user-highlighted examples show the design clearly. Sisyphus-Junior, typically aligned with Sonnet 4.5, is the focused executor: narrow, disciplined, and meant to finish a bounded task without turning into a general coordinator. Oracle, commonly resolved toward GPT-5.2 or equivalent high-reasoning models, is intentionally read-only. It is the expensive consultant for debugging, architecture, and difficult judgment calls. Librarian, often mapped to GLM-4.7, searches external references and documentation. Explore, mapped through a fast search-oriented model such as Grok Code Fast or Haiku-class fallbacks, is optimized for codebase exploration. Hephaestus, associated with GPT-5.3 Codex, is the deep worker for hard engineering tasks requiring thorough autonomous execution.

Those are not the only agents in the system. OMO’s broader roster includes Sisyphus, Prometheus, Metis, Momus, Atlas, Oracle, Librarian, Explore, Hephaestus, Multimodal-Looker, and category-driven Sisyphus-Junior variants. The exact count is less important than the principle: OMO explicitly separates orchestration roles from execution roles.

This is what makes the architecture novel. Most coding agents still behave like a single polymath model that alternates between planning, searching, coding, and summarizing in one long context window. OMO instead distributes cognition across role-specific agents. That has several benefits.

First, it reduces role conflict. A planner can stay conservative while an executor stays action-oriented. Second, it allows model specialization. Different roles can use different fallback chains and effort levels. Third, it enables parallelism. Atlas can dispatch multiple workers at once rather than forcing a single sequential thought stream. Fourth, it improves auditability. Because the planner, critic, and executor are distinct, the system can reason about failure more clearly: was the problem misunderstanding, planning, execution, or verification?

There is also a philosophical implication. OMO is not merely adding more agents because more agents sound impressive. It is implementing a theory of software work: complex tasks should pass through distinct cognitive phases—intent extraction, pre-analysis, criticism, decomposition, execution, and verification. In classical software engineering, similar phase distinctions appear in requirements analysis, design review, implementation, and QA. OMO compresses those phases into an agent runtime.

That is why the three-layer architecture deserves to be seen as a real innovation. It is not just a multi-agent feature list. It is an argument that good coding agents should resemble organizations, not monologues.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~4,000 input + ~980 output

10.2 Semantic Category System

One of OMO’s smartest ideas is that categories should describe intent, not models. This sounds simple, but it solves a real architectural problem. In many agent systems, prompts or users select models directly: “use GPT-X for this,” “switch to model Y,” and so on. That approach works in the short term, but it couples task semantics to backend infrastructure. OMO introduces an abstraction layer that says: describe the nature of the work first, then let the runtime resolve the best model and fallback chain.

This logic is implemented in files such as src/tools/delegate-task/category-resolver.ts, src/tools/delegate-task/subagent-resolver.ts, src/tools/delegate-task/categories.ts, and src/shared/model-requirements.ts. Together, these files form a small scheduling system. The caller specifies a semantic category like visual-engineering, ultrabrain, quick, deep, or writing; OMO then resolves that category into actual model candidates, variants, and guardrails.

The concrete mappings are revealing. visual-engineering prioritizes Gemini 3 Pro and related high-visual-capability options. ultrabrain prefers GPT-5.3 Codex with a high reasoning variant. quick aims for Haiku 4.5 or similarly fast, low-cost models. deep also centers on GPT-5.3 Codex, but with different activation rules. writing leans toward K2P5 and other writing-friendly fallbacks. These are not arbitrary labels. They encode assumptions about task structure: visual work needs strong multimodal or layout-sensitive reasoning, quick work values latency and cost, deep work values high-end code reasoning, and writing values fluency and long-form composition.

The immediate benefit is operational flexibility. If a model disappears, gets rate-limited, becomes too expensive, or is replaced by a better one, the category interface does not need to change. Users and higher-level agents still say “this is deep work” or “this is quick work.” The resolver layer handles the rest. In software design terms, this is classic indirection: a stable semantic interface above an unstable implementation substrate.

But OMO’s argument goes further. The category system also tries to eliminate model self-perception bias. That phrase deserves explanation. Large models are often prompted with role claims that mention their own identity or expected strengths. If a system says “you are the best creative model” or “you are a fast search model,” some behaviors come not from actual capability routing but from narrative priming inside the prompt. OMO tries to avoid over-relying on that. By assigning tasks through category semantics and resolver code, it shifts more of the intelligence into system architecture and less into the model’s self-description.

This matters because models are unreliable narrators about themselves. Their prompt identity can influence style, confidence, verbosity, and even failure behavior. OMO’s category resolver therefore works as a structural antidote. Rather than asking the model to believe what it is, the runtime decides what it should do based on explicit category requirements and availability checks.

category-resolver.ts shows this clearly. It checks user categories, merged defaults, model availability, explicit overrides, and fallback requirements. If a category has a hard requirement—for example, a particular Codex-class model—then the resolver can reject execution or explain which model is missing. If the preferred model is not available, it walks a fallback chain. It also computes prompt append behavior and unstable-agent flags. In other words, the category is not a cosmetic label. It is a policy bundle.

subagent-resolver.ts complements this by resolving named subagents against available agent configs and their own model requirements. This lets OMO distinguish two orthogonal dimensions: named agents such as Oracle or Explore, and semantic categories such as quick or deep. The agent answers “who should do the job?” The category answers “what kind of job is this?” That separation is elegant.

Another advantage is that categories improve prompt portability. A higher-level orchestrator can write reusable task templates like:

use visual-engineering for UI implementation,
use quick for small obvious edits,
use writing for docs,
use ultrabrain for difficult reasoning.

Those templates survive model churn much better than hard-coded provider references. They also make the system easier to teach. Humans naturally classify tasks by intention, not by model SKU.

This design compares favorably with both naive auto-routing and pure manual model selection. Naive auto-routing hides too much and can feel magical or arbitrary. Pure manual selection exposes too much backend detail and causes configuration sprawl. OMO’s semantic categories sit between those extremes. They expose meaningful control while preserving backend flexibility.

There is also an economic dimension. Category routing allows the system to reserve expensive models for high-value work. If every task defaults to a frontier reasoning model, the agent becomes powerful but financially irrational. If every task defaults to a cheap model, quality collapses. Categories create a middle path: cheap where cheap is enough, expensive where expensive is justified.

Ultimately, the semantic category system is a sign of maturity. It treats model selection as infrastructure, not user interface. More importantly, it recognizes that the enduring abstraction in coding agents is not the model name but the work intent. That is a lesson many future agent systems will likely copy.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~3,900 input + ~980 output

10.3 Ultrawork Mode

Among OMO’s most opinionated features, Ultrawork mode is probably the clearest statement of intent. It is triggered by user keywords such as ultrawork or ulw, with implementation living under src/hooks/keyword-detector/ultrawork/. Once activated, the agent is supposed to shift from ordinary conversational assistance into an aggressively autonomous execution pattern.

The best way to understand Ultrawork is not as a small prompt trick, but as a runtime philosophy. Its underlying belief is simple and radical: human intervention is a failure signal. If the human must repeatedly remind the agent to continue, inspect more carefully, verify results, or stop simplifying the task, then the system has failed at autonomy. Ultrawork tries to operationalize the opposite standard.

The injected instructions are intentionally severe. The agent is told not to begin implementation until it fully understands the request, has explored the codebase, resolved ambiguity, and formed a precise plan. It is explicitly instructed to use specialist agents such as Explore, Librarian, Oracle, and the plan agent. It is told that partial work, scope reduction, and “you can extend this later” behavior are unacceptable. In other words, Ultrawork is not mainly about speed. It is about reducing the classic assistant failure mode of stopping at eighty percent and then narrating excuses.

This is why the keyword trigger matters. By making ultrawork a lexical switch, OMO gives the user a way to opt into a more autonomous contract. That contract includes several expectations.

First, the system should explore before acting. Rather than patching the first plausible file, it should inspect architecture, conventions, edge cases, and neighboring implementations. Second, it should research best practices when external knowledge matters. Third, it should implement end to end, not just sketch. Fourth, it should verify through builds, tests, or observable evidence. Fifth, it should continue iterating when the first attempt fails.

Those steps—explore, research, implement, verify, continue—form the real Ultrawork loop.

The feature is especially important because it addresses a subtle weakness in many coding agents: they are too conversationally polite. They often optimize for sounding helpful rather than for finishing the job. Ultrawork flips that priority. In CS terms, it introduces a stronger liveness expectation. Liveness means the system should keep making forward progress toward completion. Ultrawork tries to enforce that by combining prompt pressure, planner invocation, delegation, and downstream continuation hooks.

It is also notable that Ultrawork is agent-sensitive. OMO’s keyword detector does not blindly inject the same instructions everywhere. Planner-family agents may be treated differently, and model-specific prompt variants exist, such as GPT-oriented or planner-oriented ultrawork messages. This is an important engineering detail. It shows OMO recognizes that autonomy patterns should be adapted to role, not sprayed uniformly across the system.

There is another reason Ultrawork matters: it changes the economics of context use. Instead of forcing the main agent to do all exploration inline, the mode encourages heavy use of background subagents. Explore agents can search the repository. Librarian agents can fetch documentation or examples. Oracle can review architecture. This reduces clutter in the primary reasoning stream and turns autonomy into orchestration rather than mere verbosity.

From a design standpoint, Ultrawork also makes a strong claim about user experience. Ordinary assistants ask for frequent confirmation because that is safe. OMO argues that, for many engineering tasks, this is not actually good UX. It interrupts flow, shifts burden back to the user, and encourages underpowered execution. Ultrawork therefore treats reduced interruption as a feature. The ideal is that the user provides intent once, then watches the system work.

Of course, this philosophy has trade-offs. Higher autonomy can increase token cost, runtime length, and the risk of overcommitted behavior if the task was misunderstood. OMO’s answer is not to reject autonomy, but to sandwich it between planning and verification. That is why Ultrawork pairs naturally with Prometheus, Atlas, Ralph Loop, and Todo Continuation. The keyword is only the trigger; the broader orchestration system is what makes it sustainable.

In this sense, Ultrawork is a small feature with large conceptual significance. It translates a slogan into an executable control mode. The slogan is that coding agents should behave less like chatbots and more like responsible workers. Whether or not one agrees with its intensity, OMO deserves credit for making that ambition explicit, configurable, and technically grounded.

Many future agent systems will likely include something similar, even if under different names. A mature coding agent probably needs a high-autonomy mode, a low-autonomy mode, and a way to move between them. OMO’s Ultrawork mode is one of the earliest serious attempts to formalize the high-autonomy end of that spectrum.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~4,100 input + ~1,020 output

10.4 Ralph Loop and Todo Continuation

Autonomy is easy to promise and hard to enforce. OMO’s answer to this enforcement problem appears in two linked mechanisms: the Ralph Loop under src/hooks/ralph-loop/ and the Todo Continuation Enforcer under src/hooks/todo-continuation-enforcer/. Together they try to solve one of the most persistent agent failures: stopping too early.

The Ralph Loop is conceptually simple. If the agent did not actually finish, the system should push it back into work rather than accept a polished but incomplete response. The prompt builder in continuation-prompt-builder.ts makes this explicit: if the completion promise was not emitted, the system injects a continuation directive telling the agent to review progress, continue from where it left off, and stop only when the task is truly done. In Ultrawork mode, the continuation prompt is even prefixed with ultrawork, preserving the autonomous posture across retries.

The name “Ralph Loop” is idiosyncratic, but the deeper metaphor in OMO is the Sisyphus myth. In Greek mythology, Sisyphus is condemned to keep pushing a boulder uphill, only for it to roll back again. OMO transforms that image into a design principle: unfinished work must be pushed forward again and again until it reaches completion. The “boulder pushing” metaphor therefore means relentless continuation under interruption or incomplete output. This is not a standard CS textbook term, so it helps to translate it into system language: Ralph Loop is a forced continuation mechanism with persistent retry semantics.

The Todo Continuation Enforcer addresses the same problem from a second angle. Instead of looking for a missing completion promise, it looks at the task state. If todos remain incomplete, the session should not be allowed to conclude normally. The event handler watches session lifecycle events such as session.idle, session.error, and recovery-related transitions. When the session goes idle, the hook can inspect whether work remains and trigger further continuation logic.

This is important because todos in OMO are not treated as decorative UI elements. They are operational commitments. A system that writes a todo list and then ignores it is not really using structured task management; it is performing task theater. OMO tries to close that loophole. The continuation enforcer makes incomplete todos a runtime condition, not merely a social expectation.

The two mechanisms complement each other well.

Ralph Loop says: if the explicit completion signal is missing, continue.
Todo Continuation says: if the state still shows unfinished tasks, continue.

One is output-oriented; the other is state-oriented. Together they reduce the space in which the agent can escape.

This design also reveals a broader truth about advanced agent systems: prompt instructions alone are insufficient. A model can always ignore, forget, compress away, or prematurely summarize instructions. OMO therefore pushes these expectations into hooks and event handlers. In systems theory terms, it converts soft norms into hard control flow.

The reminder injection behavior matters too. OMO can insert system reminders when the task is incomplete. That means the agent is not merely scolded after failure; the runtime actively re-contextualizes the situation and tells it what is still missing. This resembles a watchdog or supervisor process in conventional computing, where one component monitors whether another component has reached a desired state and intervenes if not.

There is also a recovery benefit. If a session is aborted or interrupted, OMO can preserve enough continuation state to resume the same unfinished run rather than start over. This prevents a common failure mode in long tasks: the model forgets why it stopped and gives a generic wrap-up instead of re-entering the exact missing step.

The main trade-off is obvious. Aggressive continuation can create longer runtimes, more tokens, and occasional over-persistence when a task truly should stop. OMO addresses this with stop guards and explicit stop-continuation commands, but the default bias is unmistakable: better to continue than to quit too soon.

That bias is arguably justified. In real coding workflows, premature stopping is more common than dangerous over-completion. Users of coding agents complain far more often that the agent left a task half-finished than that it was slightly too persistent. OMO therefore optimizes for the dominant failure mode.

Seen together, Ralph Loop and Todo Continuation amount to a theory of agent discipline. The theory is that completion should not be inferred from tone, confidence, or narrative closure. It should be inferred from explicit signals and state checks. That is a strong engineering instinct. Human readers are easily fooled by fluent prose; a continuation runtime should not be.

This is why these hooks are more significant than they first appear. They are not just persistence hacks. They are part of OMO’s attempt to make “continue until done” an architectural property rather than a motivational slogan.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~4,000 input + ~1,000 output

10.5 Wisdom Accumulation System

One of OMO’s deepest innovations is the wisdom accumulation system centered on .sisyphus/notepads/. At first glance, this may look like a simple note-taking convention. It is more than that. It is OMO’s answer to a fundamental multi-agent problem: how can one task’s discoveries become durable working knowledge for later subtasks without forcing every future agent to reread the entire transcript?

The mechanism appears in prompts, hooks, and Atlas reminders. The notepad protocol instructs agents to append findings into structured files such as:

learnings.md
decisions.md
issues.md
verification.md
problems.md

The exact emphasis varies across prompts and reminders, but the idea is consistent. Execution should produce not only code changes, but also distilled operational knowledge.

This is conceptually important because transcript history is not the same thing as knowledge. A transcript is raw chronological memory. It contains dead ends, repeated discussion, and irrelevant phrasing. A notepad entry is distilled memory. It captures what matters for future execution: conventions discovered, architectural decisions made, blockers encountered, tests run, and unresolved risks.

In CS terms, this is a derived memory layer. The system is compressing high-entropy conversational context into lower-entropy structured artifacts optimized for reuse. That distinction matters greatly in multi-agent settings, where every new subagent spawned from scratch faces a context budget problem.

Atlas is the key beneficiary and distributor of this knowledge. Its prompt instructs it to read notepad files before delegation and to treat them as accumulated wisdom. Verification reminders explicitly tell Atlas to inspect the notepads after a subagent run. The orchestration idea is elegant: every completed subtask should leave behind a usable residue, and every later subtask should start by consuming that residue.

This produces several advantages.

First, it reduces repeated mistakes. If one agent discovers a local convention, a naming pattern, a hidden dependency, or a blocker, the next agent does not need to rediscover it. Second, it improves consistency. Architectural decisions recorded once can shape many later edits. Third, it helps verification. If a worker ran certain checks or encountered certain issues, the orchestrator can reason more clearly about what remains uncertain.

Fourth, and perhaps most importantly, it transforms the system from isolated episodes into a cumulative workflow. The system is not merely answering a sequence of prompts. It is building a task-specific memory base.

This is fundamentally different from Claude Code’s context isolation tendency. Claude Code has strong persistence and session recovery, but its orchestration style is generally more transcript-centric and less built around explicit accumulated wisdom artifacts shared across later delegated workers. OMO takes a more aggressive stance: knowledge should be extracted after each task and passed to all subsequent subagents.

That difference reflects two philosophies. One philosophy says: each subagent should work from the local context it is given, minimizing coupling. The other says: valuable discoveries should propagate. OMO chooses the second.

The trade-off is worth noting. Shared wisdom can spread errors as well as insights. If a wrong assumption is written into decisions.md, later agents may inherit it confidently. This means notepad quality matters. The system therefore works best when Atlas and verification steps act as filters rather than blindly accepting every note as truth.

Still, the architecture is powerful. Multi-agent systems often suffer from what might be called context amnesia through delegation: every subagent starts smart in general but ignorant in specifics. OMO’s notepads are a direct countermeasure. They create local, project-specific, task-specific memory that survives beyond any single subagent’s context window.

There is a broader lesson here for agent design. Long-term memory in coding agents does not always require vector databases, embeddings, or global retrieval systems. Sometimes a simpler approach works better: structured plaintext files, clear semantics, append-only discipline, and orchestrator-level reminders to read them. Because the notes are human-readable, they are also inspectable. That improves trust and debuggability.

In effect, OMO’s wisdom accumulation system sits between raw logs and formal knowledge bases. It is lightweight, local-first, and execution-oriented. That middle ground is often underappreciated.

The deeper innovation is not the directory name or file list. It is the recognition that autonomous work should leave behind reusable judgment, not just changed files. Coding is not only about modifying source code. It is also about preserving what the system learned while modifying source code. OMO turns that insight into runtime policy.

For future coding agents, this may be one of the most transferable ideas in the whole project. Multi-agent autonomy becomes dramatically more useful when work products include accumulated wisdom, not just task outputs.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~4,300 input + ~1,060 output

10.6 41-Hook Five-Tier System

OMO’s hook system is one of the clearest examples of how far a plugin architecture can be pushed. Internally, OMO organizes 41 hooks into five tiers:

Session — 19 hooks
Tool-Guard — 9 hooks
Transform — 4 hooks
Continuation — 7 hooks
Skill — 2 hooks

These groupings are visible in files such as create-session-hooks.ts, create-tool-guard-hooks.ts, create-transform-hooks.ts, create-continuation-hooks.ts, and create-skill-hooks.ts.

Why does this matter? Because OMO is not merely adding more tools or more prompts. It is programming behavior across the agent lifecycle. A hook is essentially an interception point: code that runs when a certain event or phase occurs. OMO uses OpenCode’s limited plugin hook surface and multiplexes many internal behaviors onto it.

The word multiplex deserves explanation because it is not always familiar outside systems or networking contexts. In CS, multiplexing means carrying multiple logical channels over a smaller number of physical channels. OMO does something analogous. OpenCode exposes only a handful of major plugin hook points, but OMO routes dozens of internal policies through them. Many logical behaviors share the same host entry point.

The Session tier contains behaviors tied to runtime state and lifecycle, such as context-window monitoring, session recovery, think mode, Ralph Loop, delegate-task retry, start-work support, notepad injection, and preemptive compaction. This tier is about shaping the overall rhythm of a session.

The Tool-Guard tier governs tool usage. This is where OMO can block bad patterns, inject rules, preserve safety constraints, or clean up tool outputs. Important examples include comment-checker, rules-injector, write-existing-file-guard, hashline-read-enhancer, and the task-todowrite disabler. Tool-guards matter because the tool boundary is where an LLM’s intentions touch the external world.

The Transform tier handles message transformation before the model sees context. Here live pieces such as the keyword detector, context injection, Claude Code hook compatibility, and the thinking-block validator. This tier is especially powerful because it can alter the model’s effective prompt without changing the core host runtime.

The Continuation tier includes the systems that keep work alive across interruptions and incomplete runs: stop-continuation guard, compaction context injector, compaction todo preserver, todo continuation enforcer, unstable-agent babysitter, background notification, and Atlas integration. This tier is a major reason OMO feels more persistent and more supervisory than ordinary agents.

The Skill tier is smaller but strategically important. It includes category-skill reminders and auto-slash-command behavior, helping skills become active participants in runtime guidance rather than static assets.

Several hooks illustrate OMO’s design maturity particularly well.

comment-checker exists to prevent generic AI-generated comments from polluting code. This is an unusual but insightful safeguard. It recognizes that low-value comments are a common output artifact of LLM coding and treats them as a quality issue worth intercepting systematically.

rules-injector helps bring repository-specific rules into the working context automatically. That makes the agent more likely to honor local conventions without the user re-explaining them every time.

think-mode manages reasoning posture and token budget. In practical terms, it helps the system dynamically decide whether to act in a shallow or deeper thinking style. This is an important pattern because agent quality is not only about which model is used, but also about how much reasoning budget is allocated at the right moment.

Another key point is configurability. Hooks can be enabled or disabled through configuration. That matters because a 41-hook system would be unbearable if it were rigid. OMO instead treats the hook inventory as a configurable policy graph. Teams can adopt the full orchestration posture or dial parts of it down.

The bigger lesson is architectural. Many agent builders focus first on model choice and only later on behavioral control. OMO shows the reverse path: once a host exposes enough lifecycle hooks, a plugin can become an entire policy engine. That policy engine can control quality, safety, continuity, prompt assembly, and UX far more precisely than prompt text alone.

This is also why OMO feels much more like an operating layer than a thin extension. The 41-hook system is effectively its nervous system. It watches, transforms, retries, guards, reminds, and preserves. Without it, OMO would still have interesting agents. With it, OMO becomes a true orchestration runtime.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~3,900 input + ~980 output

10.7 Skill-Embedded MCPs

OMO’s skill-embedded MCP design is a significant extensibility innovation because it collapses what are often three separate things—prompt guidance, packaged capability, and external tool connectivity—into one portable unit. The relevant implementation lives around src/features/skill-mcp-manager/, supported by tools like skill_mcp and integrated with the broader plugin loader system.

To appreciate why this matters, it helps to recall what MCP is for. Model Context Protocol gives agents a standard way to talk to external tools, resources, and prompts. In many systems, MCP servers are configured globally or at the project level. Skills, meanwhile, are just prompt assets or instruction bundles. OMO bridges these worlds by allowing a skill to embed its own MCP server definitions.

That means a skill can carry not only instructions about how to do a task, but also the machine-accessible capabilities needed to do it.

The architecture is three-tiered.

First are OMO’s built-in MCPs, including remote integrations such as Context7, Exa/Websearch, or grep.app-style services. Second is the Claude Code compatibility layer, which can import MCP definitions from .mcp.json. Third is the genuinely distinctive layer: skill-embedded MCPs. This third tier is what makes OMO stand out.

The SkillMcpManager class shows the core responsibilities. It handles connection pooling, pending-connection tracking, retry and reconnect logic, step-up authentication flows, idle cleanup, and per-session client identity. It supports both stdio transport and HTTP transport. That dual support is important. Stdio means a local process communicates over standard input and output streams; HTTP means the tool is exposed as a remote network service. OMO can treat both as first-class MCP backends for skills.

This is not a trivial convenience feature. It solves a packaging problem. Suppose a skill teaches the agent how to use a certain service or workflow. In a weaker system, the user must separately install and configure the needed external tools, then hope the prompt and tooling line up correctly. In OMO, the skill can arrive with its own MCP declaration, reducing mismatch between instructions and capability.

In effect, the skill becomes an executable knowledge package.

This also improves portability. If a team shares a skill, they can share the operational surface with it. That is much closer to dependency packaging in software engineering than to traditional prompt snippet sharing. The skill is no longer just advice. It is advice plus attached machinery.

The manager’s design also shows engineering seriousness. It keys clients by session, skill name, and server name. It retries operations, handles “not connected” states through forced reconnection, and exposes listing and invocation methods for tools, resources, and prompts. That means OMO is not merely stuffing MCP metadata into skill files; it is providing lifecycle management robust enough for real use.

There is a strategic implication here as well. OMO is moving toward a model where extensibility is content-centric rather than platform-centric. Instead of saying “all capabilities must be installed into the platform globally,” it says “capabilities can travel with the skill that teaches them.” This resembles the way modern software ecosystems increasingly bundle code, configuration, and metadata together.

Compared with Claude Code and basic OpenCode setups, this is unusual. Both can certainly use MCPs, and Claude Code has its own MCP configuration story. But the explicit integration of MCPs inside reusable skill artifacts is much more distinctive in OMO.

The trade-off, of course, is complexity and security surface. If skills can bring executable capability with them, then skill loading becomes much more sensitive. Trust, review, and cleanup become essential. OMO’s architecture therefore becomes more powerful but also more supply-chain aware.

Even so, the direction is compelling. Skills are far more useful when they can do more than speak. MCPs are far easier to use when they are bundled with the context that explains them. OMO’s skill-embedded MCP architecture unifies those two facts.

For future coding agents, this may prove to be an important design pattern: treat skills not as static prompt files, but as capability capsules containing instructions, conventions, and tools in one deployable package.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 10 — Oh-My-OpenCode’s Innovations
Token Usage: ~4,000 input + ~990 output

10.8 Tmux Visual Multi-Agent

Most multi-agent systems are visually disappointing. They may truly run several workers in parallel, but from the user’s perspective everything still looks like a black box. One terminal, one transcript, one hidden scheduler. OMO’s features/tmux-subagent/ changes that by making background agents visible in separate tmux panes.

This is more important than it first sounds. A black-box agent system asks for trust without observability. OMO’s tmux feature turns orchestration into something the user can literally watch.

The architecture behind this is not a gimmick. When a background subagent session is created, OMO can invoke tmux callbacks, query the current window state, decide whether there is room to split or whether an older pane should be replaced, spawn or replace panes, wait for sessions to become ready, then poll for session stability and cleanup finished panes. Files such as session-created-handler.ts, spawn-action-decider.ts, action-executor.ts, polling-manager.ts, and related helpers show a fairly complete pane lifecycle manager.

That means the feature is doing several real systems tasks:

maintaining mappings between subagent sessions and panes,
respecting layout capacity limits,
preserving a main pane area,
recycling older agent panes when space runs out,
closing panes once sessions complete or disappear.

This is much closer to a visual scheduler than to a decorative terminal split.

The key design effect is observability. In systems engineering, observability means the ability to understand internal state by inspecting outputs and behavior. OMO’s tmux view makes multi-agent execution more observable in a very direct way. You can see which subagents exist, when they started, what they are doing, and when they finish.

That has several benefits.

First, it improves user trust. If three Explore agents and one Librarian agent are supposedly running in parallel, the user can watch them. Second, it improves debugging. If one pane stalls, errors repeatedly, or appears to chase the wrong path, that is easier to spot than if everything is hidden behind a final summary. Third, it improves intuition about cost and concurrency. Users develop a better mental model of what “multi-agent” actually means in practice.

There is also a cultural effect. Many agent systems advertise autonomy as magic. OMO’s tmux mode instead makes autonomy look like work. Panes appear, commands run, logs update, and background effort becomes visible. This is arguably healthier. It encourages users to see the system as an orchestrated process rather than as an oracle.

The feature also reinforces OMO’s core architectural claim: the future agent is not one model with one voice. It is a coordinated set of workers. Tmux simply exposes that structure instead of hiding it.

From a UX standpoint, this is clever because it uses infrastructure developers already understand. Tmux is a familiar terminal multiplexer, especially in Unix-like environments. OMO is therefore not inventing a custom GUI dashboard for observability. It is leveraging an existing developer-native medium. That lowers cognitive friction.

There are trade-offs. Tmux-based observability is most natural in terminal-centric workflows and less useful for users outside that environment. It also adds complexity to session management and pane cleanup. But as a design experiment, it is extremely valuable.

It suggests a broader lesson for coding agents: if multi-agent orchestration becomes normal, users will need ways to observe it. Today that might be tmux panes. Tomorrow it might be richer dashboards, timelines, dependency graphs, or live task boards. OMO’s implementation is an early but concrete example of this shift.

In that sense, the tmux visual multi-agent system is not merely a nice quality-of-life feature. It is a statement about transparency. The system is saying: if we claim many agents are working for you, we should be willing to show them working. That moves coding agents away from theatrical autonomy and toward inspectable autonomy.

For a field that often hides its most important runtime behavior behind a single stream of chat bubbles, that is a meaningful step forward.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 11 — Claude Code’s Commercial Design Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

11.1 Security Architecture

Claude Code’s security design is notable because it does not treat safety as a single permission dialog. Instead, it builds a layered execution pipeline around risk classification, mode switching, semantic command analysis, and OS-level isolation. In source terms, this logic concentrates in src/utils/permissions/permissionSetup.ts, with supporting roles played by yoloClassifier.ts, bashClassifier.ts, and dangerousPatterns.ts. The result is a commercial-grade model in which autonomy is increased only after the system has narrowed the risk surface.

The first visible layer is the four-mode permission model: default, auto, bypass, and plan. Default mode is the conventional guarded state: sensitive actions still require explicit confirmation. Auto mode is more ambitious. Here Claude Code tries to approve low-risk operations automatically, but only after classifier checks. Bypass mode is the high-trust escape hatch for expert users who accept the consequences of fewer prompts. Plan mode, by contrast, is deliberately restrictive: it privileges thinking and task decomposition over execution. Commercially, this is a strong design because it gives Anthropic a gradient between safety and speed rather than forcing a binary “locked down vs. unlimited” experience.

The most interesting mechanism inside auto mode is the so-called ML YOLO classifier. The term “YOLO” here does not mean reckless execution; it refers to a learned approval layer that decides whether a pending action is safe enough to run without interrupting the user. yoloClassifier.ts builds prompts, tracks classifier transcripts, estimates token usage, and records decisions. In practical terms, Claude Code is using one model-driven judgment system to supervise another model-driven agent. This is important commercially because it reduces friction while preserving a review step. Anthropic has publicly tied this architecture to an 84% reduction in permission prompts when paired with sandboxing, which shows the point of the design: fewer interruptions without abandoning control.

The second intelligence layer is the bash command classifier. A command string is not judged only by syntax; it is judged by semantics. That matters because rm -rf build and python -c "..." are both shell commands, but they present very different threat profiles. Claude Code therefore analyzes what a command is trying to do, not just whether it belongs to the Bash tool. This is what “semantic classification” means in practice: inferring intent from command structure, arguments, and context. Compared with simpler allowlists, this is much closer to how a human security reviewer thinks.

The third layer is explicit dangerous pattern detection. dangerousPatterns.ts and the related logic in permissionSetup.ts blacklist rule shapes that would silently open arbitrary code execution. Examples include script interpreters such as python, node, ruby, perl, php, and lua; shell wrappers such as bash, sh, and zsh; package runners like npx, bunx, and npm run; and wildcard-style permission rules such as python:*, node*, or bare *. The reasoning is straightforward: once a model can freely invoke an interpreter or a wildcard shell prefix, it can generate effectively unlimited new behavior beyond what the permission rule seemed to describe. Claude Code also extends this logic to PowerShell and even sub-agent spawning, because unrestricted delegation can itself become a safety bypass.

The fourth layer is OS-level sandboxing. Claude Code does not rely only on prompt engineering or classifier accuracy. It also uses platform isolation: Bubblewrap on Linux and Seatbelt on macOS. Bubblewrap is a process sandboxing mechanism common in Linux desktop security; Seatbelt is Apple’s policy-based sandbox framework. Both move safety enforcement below the agent layer and into the operating system boundary. This matters because commercial agents cannot assume perfect model behavior. If the classifier makes a mistake, the sandbox can still constrain filesystem writes, process access, or network behavior. The system is therefore designed according to defense in depth: model guardrails first, platform guardrails second.

Architecturally, the commercial value of this system is not merely that it is “secure.” The real value is that it makes high-autonomy behavior operationally sellable. Enterprises do not buy autonomy alone; they buy bounded autonomy. Claude Code’s permission stack answers that requirement by combining policy modes, semantic command analysis, pattern blacklists, and kernel-adjacent containment. OpenCode and Oh-My-OpenCode expose more of their control surface to users and plugin authors, which is excellent for experimentation. Claude Code instead packages safety as an integrated product feature. That is a different philosophy: less freedom at the edges, more confidence at the center.

This chapter therefore highlights an important principle for future agent design. The best security architecture is rarely the harshest or the most permissive one. It is the one that can selectively convert low-risk work into invisible flow, while still escalating ambiguous or dangerous operations to explicit control points. Claude Code’s four-mode model, ML approval layer, semantic Bash analysis, dangerous-pattern stripping, and OS sandboxing together form one of the clearest examples of that principle in production AI tooling.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 11 — Claude Code’s Commercial Design Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

11.2 Cost Control

One of Claude Code’s most commercially mature traits is that cost is not hidden behind a provider bill or a back-office report. It is surfaced inside the product as a first-class runtime concern. The key implementation lives in src/cost-tracker.ts, supported by src/costHook.ts, the /cost command, and budget checks in QueryEngine.ts. This matters because enterprise AI tools are judged not only by quality and safety, but also by whether they can be governed financially.

At the center is a built-in USD cost tracker per session. Claude Code records more than just total input and output tokens. It tracks per-model usage, total API duration, wall-clock duration, lines added and removed, cache creation tokens, cache read tokens, and web search calls. In cost-tracker.ts, usage is accumulated by model and normalized through canonical model names so the session can show both a total bill and a per-model breakdown. This is not a toy estimate. It is an attempt to translate agent behavior into accounting units that teams can understand.

This design becomes more interesting once caching is included. Modern model APIs often price cached token creation and cached token reads differently from fresh prompt tokens. Claude Code explicitly stores cache creation input tokens and cache read input tokens, then factors them into cost accounting. That means the system can distinguish between “expensive new context” and “cheap reused context.” Architecturally, this is important because many agent systems aggressively cache without making the savings visible. Claude Code instead turns caching into an auditable efficiency mechanism.

The session lifecycle is also handled carefully. cost-tracker.ts can restore the saved cost state when a session resumes and persist the current totals when a session exits. costHook.ts attaches to process exit so the summary can be printed and the latest values saved. In other words, cost accounting is not tied to a single uninterrupted CLI invocation. It survives continuation, which is essential for long-running coding sessions and remote workflows.

Claude Code also supports a hard session budget through maxBudgetUsd. In QueryEngine.ts, after each yielded message, the runtime checks whether total cost has reached the configured ceiling. If it has, the session stops with an explicit error_max_budget_usd result. This is more than a warning banner. It is overspend blocking. The agent is not merely informed that it is becoming expensive; it is prevented from crossing a user- or organization-defined spending line. For enterprises, this is the difference between observability and enforceability.

The /cost command completes the loop by making this information available on demand. From a UX perspective, this is subtle but important. Many AI products expose cost only in dashboards after the work is done. Claude Code exposes it inside the working conversation, where it can change user behavior in real time. A developer can decide to switch models, compact context, or end a session before costs drift upward. In commercial design terms, /cost is not just a diagnostic command; it is a behavioral control surface.

This architecture also supports enterprise chargeback. Chargeback means attributing usage to the correct team, project, repository, or business unit so internal billing is possible. The source code already tracks usage with a granularity that makes downstream allocation feasible: model-level cost, session identity, duration, and work output metrics. Even when Claude Code does not implement a full ERP-style billing layer inside the CLI, it clearly prepares the telemetry needed for one. That is exactly what enterprise buyers want: not only “How much did we spend?” but “Who spent it, on what, and was it worth it?”

Compared with many open-source agents, Claude Code is less romantic about token usage. It treats tokens as economic events. That perspective shapes product design. Compaction becomes a cost tool, not just a context tool. Model choice becomes a budget decision, not just a quality decision. Session continuation becomes an accounting boundary, not just a conversational convenience. OpenCode and Oh-My-OpenCode can certainly add their own cost layers, but Claude Code integrates cost into the core control plane.

The larger lesson is that serious agent systems need a financial architecture, not just an inference architecture. Once agents can search the web, call tools, spawn workers, and remain active for long periods, token consumption stops being an invisible backend detail. It becomes part of system design. Claude Code’s session-level USD accounting, cache-aware pricing, budget ceilings, and /cost visibility show how a commercial coding agent turns that reality into product discipline.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 11 — Claude Code’s Commercial Design Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

11.3 Enterprise Readiness

Claude Code differs from many open-source coding agents in one decisive way: it is designed not only for individuals, but for organizations that need centralized control. That design shows up in services such as remoteManagedSettings, policyLimits, teamMemorySync, GrowthBook-based feature gates, and Anthropic-backed authentication. Together they form a management plane around the agent runtime.

The first enterprise pillar is remote managed settings sync. In src/services/remoteManagedSettings/index.ts, Claude Code can fetch centrally managed settings from an Anthropic backend, cache them, validate them, and poll for updates. The implementation uses checksums, retries, eligibility checks, and fail-open behavior. “Fail open” here means the CLI continues to function if the settings endpoint is unavailable; it does not brick the user’s workflow. This is a very enterprise-style compromise: central control is important, but availability remains critical.

The second pillar is team memory sharing. src/services/teamMemorySync/index.ts synchronizes repo-scoped memory files through a server API so knowledge can be shared across authenticated organization members. In practical terms, Claude Code is not treating memory as a purely local artifact under ~/.claude; it is treating memory as a collaborative organizational asset. This is significant because many agent systems have excellent personal memory but weak team memory. Claude Code moves toward shared institutional context, which is exactly what enterprises want when multiple developers work in the same repository.

The third pillar is policy limits. src/services/policyLimits/index.ts fetches organization-level restrictions and uses them to disable CLI features. The code comments explicitly mention admin-configurable restrictions such as remote sessions. More broadly, the policy-limits layer is the mechanism through which an enterprise can say: remote control is allowed or not allowed; certain MCP pathways are permitted or forbidden; specific plugin surfaces can be constrained. This is the difference between a user-owned tool and an admin-governed tool.

That policy model naturally extends to managed plugin lists and plugin whitelisting. Claude Code’s plugin system is not simply an unrestricted marketplace. The broader codebase includes managed-plugin and plugin-installation paths precisely because enterprises need supply-chain control. In a commercial setting, “extensibility” without curation is often unacceptable. A plugin whitelist solves the governance problem by allowing extension only through approved packages, publishers, or marketplaces.

Another major enterprise element is GrowthBook feature gates. src/services/analytics/growthbook.ts implements runtime feature evaluation, refresh listeners, targeting attributes, cached remote values, and exposure logging. In product terms, this enables live rollout control, segmentation, and A/B testing. An A/B test is a controlled experiment in which different users receive different variants of a feature so the vendor can compare outcomes such as engagement, latency, or success rate. For consumer apps this is usually about growth. For enterprise software it is often about safe rollout. Claude Code can enable a feature for internal users first, then for selected organizations, then for general availability. That is a mature operating model.

Authentication is also part of enterprise readiness. Remote managed settings, team memory, and policy limits all depend on Anthropic-backed auth, supporting both API-key and OAuth-style flows depending on user type. This gives Claude Code a trusted identity substrate for enterprise features. A central backend knows which organization a user belongs to, what subscription tier they have, and which controls should apply. Open-source agents often push this burden onto self-hosted infrastructure or local config. Claude Code internalizes it as product capability.

The commercial significance of all this is easy to underestimate. Enterprises do not merely ask whether an AI coding agent works. They ask whether it can be rolled out, governed, audited, restricted, and updated across a fleet of users. Remote settings answer rollout. Policy limits answer governance. Team memory answers organizational knowledge sharing. Managed plugins answer supply-chain control. GrowthBook answers staged experimentation. Auth answers identity and entitlement.

In comparative terms, OpenCode and Oh-My-OpenCode excel at openness, composition, and user-owned extensibility. Claude Code excels at operational controllability. That difference does not make one universally better than the others, but it does explain why Claude Code looks more like a commercial platform than a hackable framework. Its enterprise readiness is not a marketing add-on. It is encoded directly into the architecture.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 11 — Claude Code’s Commercial Design Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

11.4 Custom Ink Implementation

Claude Code’s terminal UI is far more than a thin wrapper around prompts and logs. It contains a substantial custom rendering system under src/ink/, described here as a 52-file terminal React renderer with 148 component directories in the broader UI surface. The important architectural point is not the exact count, but the ambition: Claude Code treats the terminal as a serious application platform, not as a fallback shell.

At the core is a custom React reconciler in src/ink/reconciler.ts. A reconciler is the part of React that decides how declarative component changes are translated into updates in a rendering target. In a browser, that target is the DOM. In Claude Code, the target is a terminal-specific object tree with layout nodes, event handlers, screen buffers, and cursor state. This means the team did not simply use React for component organization; they adapted React’s rendering engine to a non-browser environment.

Layout is handled through a Yoga-based engine. Files such as layout/yoga.ts, layout/engine.ts, render-to-screen.ts, and screen.ts show a pipeline from React tree to Yoga layout to terminal paint buffer. This matters because terminal UIs are historically awkward: widths are variable, wrapping is complex, and reflow is fragile. Claude Code solves that by borrowing a flexbox-style layout model and then translating the computed geometry into screen cells. In effect, it imports modern UI layout thinking into the command line.

Rendering is equally sophisticated. The termio layer, Ansi.tsx, RawAnsi.tsx, colorize.ts, and related files handle ANSI output, alternate screens, cursor movement, and text measurement. ANSI escape codes are control sequences that terminals interpret as commands rather than plain text: move the cursor, change color, clear a line, switch screen buffers, and so on. Claude Code uses these primitives to produce a responsive full-screen interface with styling, selection, search highlighting, and dynamic redraws. This is one reason the product feels closer to a native TUI application than to a streaming chatbot.

The system also supports animation and interaction at a deeper level than many CLI tools. The presence of frame timing, throttling constants, focus management, event dispatchers, resize events, paste events, click events, and input hooks suggests that Claude Code’s UI is built like a real reactive app. This is commercially valuable because it improves perceived quality. Responsiveness, smooth spinners, preserved layout, and stable scrolling all make the product feel deliberate rather than improvised.

Input ergonomics are another major theme. Claude Code includes Vim keybindings through dedicated command support and a larger vim-state subsystem. That matters because expert users in terminal environments value modal editing, fast navigation, and keyboard-first control. The product is not designed only for casual newcomers; it explicitly accommodates power users.

Accessibility is also present. Comments in App.tsx, ink.tsx, and cursor-related hooks reference screen readers, screen magnifiers, IME composition, and visible cursor behavior in accessibility mode. In other words, the team is thinking not only about raw terminal cleverness but also about assistive technologies and input diversity. Commercial software increasingly requires this. Open-source projects may accept “works on my terminal”; enterprise-grade products usually cannot.

The comparison with OpenCode is instructive. OpenCode’s TUI is built around Solid.js, a reactive UI framework optimized for fine-grained updates. Claude Code instead doubles down on React with a custom terminal reconciler. Solid.js tends to favor leaner reactive granularity and simpler runtime costs. React offers a larger ecosystem, more standardized mental models, and a mature component workflow. Claude Code’s choice reflects a commercial bias toward a widely understood UI architecture, even if it requires more internal renderer engineering.

The broader lesson is that interface architecture matters for agent quality. A coding agent is not only a model plus tools. It is also a human control environment. Claude Code’s custom Ink implementation shows what happens when a company treats the terminal as a premium surface: layout, rendering, accessibility, interaction, and editor ergonomics become part of the product’s competitive advantage.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 11 — Claude Code’s Commercial Design Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

11.5 Multi-Strategy Compaction

Claude Code’s context-management system is commercially important because it does not rely on a single “summarize the conversation” fallback. Instead, it uses multiple compaction mechanisms distributed across services/compact/ and services/contextCollapse/. The best way to understand it is as a five-layer defense against context-window exhaustion.

The first layer is auto-compact, implemented in autoCompact.ts. This is the proactive threshold-based system. Claude Code estimates token usage, reserves headroom for summary output, computes an effective context window, and triggers compaction before the session reaches a blocking limit. This is the standard commercial answer to long conversations: compress before failure, not after failure. The important point is that auto-compact is policy-driven. It can be enabled, disabled, and tuned against context-window thresholds.

The second layer is snip-compact, associated here with history snipping. Conceptually, snipping removes or trims older context segments rather than fully re-summarizing everything. In product terms, this is a lower-latency, more surgical response to context bloat. Instead of immediately producing a large replacement summary, the runtime can discard or replay only the history segments that are no longer worth their token cost. This keeps the working set small while preserving recent conversational fidelity.

The third layer is micro-compact, implemented in microCompact.ts. This mechanism operates below the level of full conversation summaries. It focuses on compacting bulky tool results from selected tools such as file reads, shell output, grep, glob, web fetch, web search, and file edits. That is a crucial insight. In coding-agent sessions, the biggest context offender is often not user dialogue but verbose tool output. Micro-compaction therefore attacks the real payload inflation source: large, stale, low-value tool results.

The fourth layer is session memory compaction, implemented in sessionMemoryCompact.ts. Claude Code maintains session memory as a separate artifact, then compresses or truncates it when needed while preserving API invariants such as tool-use and tool-result pairing. This is a more advanced idea than ordinary conversation summarization. It recognizes that memory itself can become oversized and must therefore be compacted as an object in its own right. In other words, Claude Code manages both the visible transcript and the hidden memory substrate.

The fifth layer is context collapse, exposed through services/contextCollapse/. Even though some external builds stub parts of this area, the architectural idea is clear: old messages are progressively converted into committed summaries or collapsed context structures. This is not just “compaction after overflow.” It is a long-horizon strategy for turning raw history into compressed state. The distinction matters. Compaction is reactive; collapse is structural.

What makes this system impressive is the combination. Auto-compact handles predictable growth. Snip-compact removes low-value history segments. Micro-compact shrinks tool-result payloads. Session-memory compact reduces persistent memory inflation. Context collapse turns aging conversation into durable summary state. Each mechanism is addressing a different failure mode.

Commercially, this is a much stronger approach than one-shot summarization. Large agent sessions fail in several ways: too many messages, too much tool output, too much remembered state, or too much long-tail history. A single summarizer is rarely optimal across all four. Claude Code’s architecture therefore resembles storage hierarchy design in systems engineering: short-term context, cached artifacts, compressed memory, and collapsed history each play distinct roles.

This also connects directly to cost and reliability. Smaller active context means lower token spend, fewer prompt-too-long failures, and more predictable latency. It also makes high-autonomy modes more viable because the system can keep working deep into long-running sessions instead of degrading suddenly. OpenCode and Oh-My-OpenCode also care deeply about context engineering, but Claude Code’s commercial edge here is the breadth of mechanisms assembled behind one user-facing experience.

The larger lesson is that future agent systems will need context hierarchies, not just context windows. Claude Code’s multi-strategy compaction stack shows one path forward: treat conversational history, tool artifacts, memory files, and long-tail transcript state as different data classes, then compress each one differently.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 11 — Claude Code’s Commercial Design Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

11.6 Bridge and Coordinator Mode

Claude Code’s commercial design becomes especially distinctive when we look at Bridge, Coordinator mode, and the related assistant mode (KAIROS). These are not just interface features. They are operating modes that extend the agent beyond a single local REPL. In the source tree, this capability spans src/bridge/ and src/coordinator/.

Bridge is the remote-control substrate. The user-facing entry point is claude remote-control, while the implementation is spread across files such as bridgeMain.ts, initReplBridge.ts, replBridge.ts, bridgeApi.ts, and session-spawning utilities. Architecturally, Bridge allows a local Claude Code session to become remotely accessible from the web or mobile surfaces. This means the CLI is no longer a purely local interface; it can become a resumable endpoint in a wider Anthropic ecosystem.

That capability matters for two reasons. First, it enables remote session support. A session can persist, be resumed, and be managed across devices rather than dying with the terminal window. Second, it gives Claude Code a pathway into enterprise and mobile workflows where local shell presence is inconvenient. The command-line product effectively becomes a remotely addressable work session.

The implementation reflects real product maturity. bridgeMain.ts handles environment registration, spawning, work dispatch, heartbeat logic, token refresh scheduling, timeouts, worktree creation, and session continuation flags such as --continue and --session-id. In other words, Bridge is not a simple websocket tunnel. It is session infrastructure.

Coordinator mode addresses a different problem: not remote access, but multi-agent orchestration. In coordinatorMode.ts, Claude Code defines a coordinator persona whose job is to launch workers, synthesize findings, direct implementation, and manage verification. The file even includes detailed guidance about when to parallelize workers, when not to delegate trivial tasks, how worker notifications arrive, and how to continue or stop workers. This is a formal orchestration layer, not an improvised “spawn an agent” helper.

The significance is that Claude Code productizes a pattern that many open-source agent users try to build manually: one supervising agent coordinating several worker agents. The coordinator is given explicit rules for concurrency, synthesis, verification rigor, and worker lifecycle. This is a notable commercial move because it turns multi-agent work from a power-user experiment into a supported operating mode.

Then there is assistant mode, associated with KAIROS gates throughout the codebase. KAIROS is tied to long-running, continuous interaction patterns: scheduled check-ins, resumed sessions, remote-control continuation, brief views, and related behavior that makes the agent feel less like an isolated prompt-response loop and more like an ongoing collaborator. From an architectural perspective, assistant mode shifts the frame from “single request execution” to “persistent relationship with the session.”

These features are especially important because they are still relatively rare in coding agents. OpenCode and Oh-My-OpenCode are stronger in open extensibility and community-driven orchestration experimentation. Claude Code is stronger in turning advanced orchestration into a product surface. Bridge gives remote continuity. Coordinator mode gives supervised parallelism. Assistant mode gives longer-lived engagement. Together they form a suite of enterprise and power-user capabilities that go beyond the classic local-agent mold.

The strategic lesson is clear. The future of coding agents is not only better reasoning inside one context window. It is better session topology: local and remote, single-agent and multi-agent, synchronous and long-running. Claude Code’s Bridge and Coordinator modes are important precisely because they show how a commercial system begins to design that topology explicitly.

Deep Dive: Claude Code’s Sub-Agent Architecture

Model: openai/gpt-5.4
Token Usage: unavailable in current environment

The previous discussion framed Bridge and Coordinator mode as product surfaces. But underneath those surfaces sits a deeper idea: Claude Code treats multi-agent work as a first-class runtime capability rather than a prompt trick. That distinction matters. In many agent systems, “sub-agent” really means “the main agent writes a chunk of text that imitates delegation.” Claude Code goes further. It provides explicit spawning, task tracking, result retrieval, and constrained role definition. In practice, this creates a structured sub-agent architecture with its own design philosophy.

The best way to understand that philosophy is through the lens of context management. A sub-agent is not valuable merely because it is another model call. It is valuable because it can operate with a different objective, different tool permissions, different temporal lifecycle, and—most importantly—a different context boundary. Claude Code’s architecture consistently pushes toward the idea that one problem should not automatically drag the entire conversation history into every worker. That is why the system is best understood as an implementation of a context firewall: a deliberate boundary that prevents irrelevant or noisy history from contaminating specialized work.

graph TB
    Main["🧠 Main Agent<br/>(Parent Context)"]
    
    Main -->|"AgentTool"| SA1["🤖 Sub-Agent 1<br/>Isolated Context"]
    Main -->|"AgentTool"| SA2["🤖 Sub-Agent 2<br/>Isolated Context"]
    Main -->|"TaskCreate"| BG1["⏳ Background Task<br/>(DreamTask)"]
    
    SA1 -->|"compressed result"| Main
    SA2 -->|"compressed result"| Main
    BG1 -->|"TaskOutput"| Main
    
    SA1 -.->|"SendMessage"| SA2
    
    subgraph "Context Firewall"
        SA1
        SA2
        BG1
    end
    
    CA[".claude/agents/<br/>Custom Agent Definitions"] -.->|"defines"| SA1
    CA -.->|"defines"| SA2
    
    style Main fill:#4a9eff,color:#fff
    style SA1 fill:#51cf66,color:#fff
    style SA2 fill:#51cf66,color:#fff
    style BG1 fill:#ffd43b,color:#000

1. AgentTool: spawning a new agent behind a context firewall

At the center of this design is AgentTool. Conceptually, AgentTool is Claude Code’s canonical sub-agent launcher. The parent agent uses it to create another agent instance and delegate a task. But the key point is not just spawning. The key point is how the spawned agent is framed.

The child does not simply inherit the parent’s full transcript. Instead, it receives a more controlled package: the delegated goal, relevant instructions, selected constraints, and a fresh working context. This is what people in modern agent engineering often call a context firewall. The term is not a classical textbook expression, so it deserves explanation. In networking, a firewall filters traffic crossing a boundary. In agent systems, a context firewall filters informational baggage crossing from one agent process to another. It is an architectural mechanism for saying: “Take the task, not the entire psychological residue of the session.”

This has several benefits. First, it reduces distraction. A research-oriented worker does not need every earlier implementation debate. Second, it improves robustness. If the parent took a wrong turn, not all of that confusion has to be inherited. Third, it makes roles clearer. A worker can behave like a worker instead of half-replaying the parent’s indecision. Finally, it is good for token economics: less inherited context means more room for actual work.

The tradeoff is equally important. A clean child context may lose subtle tacit knowledge accumulated by the parent. If the parent learned three failed approaches earlier in the session, a fully isolated worker may rediscover them. Claude Code accepts that tradeoff because it prioritizes reliability and role purity. In other words, it would rather risk some duplicated discovery than allow cross-task contamination to silently degrade reasoning quality.

2. Compressed return path: why results matter more than raw transcripts

AgentTool also implies a return contract. The child agent is not supposed to dump an unbounded transcript back into the parent. Instead, it returns a compressed result: findings, conclusions, recommended actions, possibly structured summaries. This is architecturally crucial. If every sub-agent returned its full chain of exploration, the parent context would quickly collapse under its own weight.

So Claude Code’s design is asymmetric in a productive way. Outbound delegation is selective; inbound reintegration is compressed. That means the parent works more like a coordinator reading reports than like a memory sink swallowing raw cognition. This is a major difference between “multi-call prompting” and real orchestration. Real orchestration needs bandwidth control between roles.

Bandwidth control is another non-classical term worth clarifying. In distributed systems, bandwidth describes how much data can move between nodes. In agent systems, the equivalent question is how much reasoning artifact should travel between contexts. Too little transfer and the system fragments. Too much transfer and every worker pollutes every other worker. Claude Code’s sub-agent system leans toward disciplined summarization.

3. Custom agents in `.claude/agents/`: declarative specialization

Claude Code’s sub-agent model becomes more interesting when we look at custom agents. These are defined in .claude/agents/ as Markdown files with YAML frontmatter. That format is deceptively simple but strategically important. It means specialization is not hard-coded only in TypeScript classes or internal runtime enums. It is also exposed as a declarative artifact that users or teams can author.

A custom agent definition can typically express several dimensions:

agent name and description,
a system prompt or role framing,
allowed tools or tool restrictions,
model preference,
invocation metadata for slash commands or agent selection.

This is Claude Code’s closest analogue to OMO’s specialized agents such as Oracle, Explore, and Librarian. But the implementation philosophy differs. OMO’s agents are more programmatically embedded: factory functions, routing logic, fallback chains, category resolution, and runtime prompt builders are part of the orchestration engine itself. Claude Code’s model is more declarative. The agent is described as content, and the runtime interprets that content.

That distinction matters because it changes who owns extensibility. Declarative agent files are easier to inspect, version, and share. They are friendlier to teams that want policy-readable specialization. Programmatic agent systems are often more powerful because they can encode richer logic, fallback trees, and dynamic assembly. So the contrast is not “one is better.” It is “one optimizes legibility and packaging, the other optimizes orchestration logic density.”

4. Task Tools: sub-agents that do not block the parent

If AgentTool is the primary spawn mechanism, the Task Tools are the lifecycle manager for asynchronous work. This is where Claude Code becomes recognizably more than a single-threaded conversational agent.

The key components are:

TaskCreateTool: creates a background task,
TaskGetTool: checks task status,
TaskListTool: enumerates tasks,
TaskOutputTool: retrieves results,
DreamTask: supports longer-running background processing patterns.

The architectural significance is that the parent agent does not have to block while waiting for delegated work. This is a classic asynchronous design move. In operating-systems terms, blocking means a process waits idly until another process finishes. Non-blocking or asynchronous behavior allows the controller to continue other useful work while the background unit runs. Claude Code applies that logic to agent orchestration.

This matters especially for engineering workflows that mix different timescales. Code search may finish quickly. A broad analysis pass may take longer. A long-running synthesis or remote continuation flow may run much longer still. If everything is forced into one synchronous loop, the system becomes either sluggish or context-bloated. Task Tools separate delegation time from result collection time. That is one of the clearest marks of a serious orchestration runtime.

Compared with OMO, the similarity is obvious: both systems support background workers and later retrieval. The difference is in the surrounding architecture. OMO wraps background work in a more explicit hierarchy with model/provider concurrency policies, wisdom accumulation, and parent session continuation. Claude Code’s task layer feels flatter and more product-integrated. It has the feel of a curated operating system primitive rather than an externally visible orchestration doctrine.

5. SendMessageTool: minimal but real inter-agent communication

Claude Code also includes SendMessageTool, which enables one running agent to send a message to another running agent. This is a modest feature on paper, but conceptually it is important. Without it, sub-agents are mostly isolated workers that report only back to the parent. With it, Claude Code gains a primitive form of inter-agent communication.

Again, this needs a small conceptual explanation. In textbook distributed computing, processes can coordinate through shared memory or message passing. Claude Code clearly leans toward message passing rather than shared memory. Agents do not appear to merge into one universal context pool. Instead, they send explicit messages across boundaries.

That is a safer choice. Shared memory makes coordination powerful, but it also makes contamination easy: every node can implicitly modify the cognitive environment of every other node. Message passing is narrower and slower, but more inspectable. You can ask who said what, when, and to whom. Claude Code’s SendMessageTool is still primitive relative to a full actor framework or a research-grade agent bus, but it proves the architecture is not limited to one-way delegation. Workers can coordinate without collapsing all isolation guarantees.

6. Coordinator mode: from sub-agent capability to orchestration doctrine

The existence of AgentTool and Task Tools would already make Claude Code a capable multi-agent system. Coordinator mode goes further by turning those raw capabilities into an explicit operating pattern.

Coordinator mode formalizes a hub-and-spoke arrangement. One central coordinator acts as the hub. Multiple workers act as spokes. The coordinator decomposes the task, launches sub-agents, monitors progress, requests verification, and aggregates outputs into a final decision. Each spoke gets its own context window, which keeps the workstreams relatively clean.

This is comparable to OMO’s Atlas concept in spirit, but there is a structural difference. OMO is more clearly layered: planning agents, execution agents, and specialist workers often belong to a more visible hierarchy. Claude Code’s coordinator design is flatter. The hub is strong, the spokes are specialized, but the architecture is less explicitly stratified into multiple authority tiers. That flatter model has advantages: it is easier to understand, easier to productize, and easier to keep safe. The downside is that it may be less expressive for very large workflows where planning, execution, validation, and archival memory all need distinct control planes.

Still, Coordinator mode matters because it shows Claude Code moving from “I can launch extra agents” to “I know when and how to use them.” In system design, that is the difference between capability and protocol. A capability is a primitive. A protocol is a disciplined way of combining primitives to produce reliable behavior.

7. Context isolation versus OMO’s wisdom transfer

The deepest comparison point between Claude Code and OMO is not simply tooling. It is memory philosophy.

Claude Code’s instinct is isolation. Each sub-agent gets a clean or mostly clean context window. Results come back summarized. Cross-contamination is minimized. This aligns strongly with the context firewall principle.

OMO’s instinct is accumulation. Earlier work can be distilled into reusable “wisdom” that later sub-agents inherit. This creates continuity across a longer mission. It also means later workers may benefit from prior discoveries without rerunning the same exploration.

These are two different answers to the same systems question: when one agent learns something useful, should later agents automatically receive it?

Claude Code answers: only in compressed, controlled form. OMO answers: often yes, because cumulative task memory is strategically valuable.

Neither answer is universally correct. Isolation improves local reliability, reduces context drift, and preserves role integrity. Accumulation improves mission continuity, decreases redundant search, and can increase strategic coherence. But accumulation also raises the risk of what we might call context pollution—another non-textbook term that means stale, irrelevant, or mistaken information continuing to shape future reasoning simply because it is present.

This is why Claude Code’s approach feels closer to a commercial safety posture. Commercial systems tend to prefer bounded behavior, predictable failure modes, and clearer responsibility boundaries. OMO, as a more ambitious orchestration layer, is willing to trade some purity for compounding mission intelligence.

8. Why this architecture matters for agent design generally

Claude Code’s sub-agent system is important beyond Claude Code itself because it demonstrates a mature answer to a recurring industry problem: how do you scale agent work without turning every task into one giant, polluted transcript?

Its answer has four parts:

spawn specialized workers explicitly,
isolate their contexts,
manage them asynchronously,
reintegrate only compressed outcomes.

That combination is one of the clearest reusable orchestration patterns in modern coding agents. It is not the only good pattern, but it is one of the most defensible for production use.

For a future “best-of-both-worlds” system, the likely synthesis is not choosing Claude Code or OMO. It is combining Claude Code’s context firewall discipline with OMO’s richer mission memory. In other words: keep worker contexts clean by default, but allow carefully validated knowledge objects—not raw conversation residue—to survive across sub-agents. That would preserve reliability while still letting the system learn cumulatively over longer projects.

Seen this way, Claude Code’s sub-agent architecture is not merely an implementation detail. It is a concrete argument about how coding agents should scale: not by making one context window infinitely smart, but by building controlled societies of bounded agents.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 11 — Claude Code’s Commercial Design Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

11.7 Slash Command System

Claude Code’s slash-command architecture is one of the clearest signs that it is a platform, not just a chat loop. The central registry lives in src/commands.ts, a file of roughly 754 lines that imports, filters, composes, and exposes a command surface spanning more than one hundred command directories. This system turns the CLI into a structured application shell.

Conceptually, Claude Code supports three command categories: prompt commands, action commands, and interactive commands. Prompt commands generate text or instructions for the model. Action commands execute behavior directly. Interactive commands launch richer UI flows or local views. This division is more important than it first appears. It means slash commands are not treated as one flat namespace of ad hoc shortcuts. They are typed interfaces into different execution modes.

The built-in catalog includes core commands such as /init, /compact, /memory, /session, /model, /effort, /cost, /mcp, /skills, /agents, /tasks, and /plugins, along with many others. The result is that Claude Code exposes configuration, context management, model control, extensibility, remote features, and diagnostics through a unified command language. This helps explain why the product feels operable rather than merely conversational.

What makes the system especially powerful is that the registry is dynamic. commands.ts does not only return built-in commands. It loads skill-directory commands, plugin commands, plugin skills, bundled skills, workflow commands, built-in plugin skill commands, and even dynamically discovered skills. In other words, the slash-command layer is the convergence point for several extensibility systems. Commands are where features become visible.

This design also blurs the boundary between commands and skills. The file explicitly defines filtering paths such as getSkillToolCommands, getSlashCommandToolSkills, and getMcpSkillCommands. Some commands are user-invocable from the slash interface; others are model-invocable via the Skill tool; some MCP-loaded commands become skills if they satisfy prompt-based criteria. The important architectural insight is that Claude Code uses one command substrate to serve both human control and model control.

That is commercially elegant because it reduces conceptual sprawl. Instead of having one system for human macros, another for prompt snippets, another for plugins, and another for skills, Claude Code unifies them around command objects with availability rules, enablement checks, source metadata, and loading pipelines. Even authentication or provider requirements can hide or expose commands at runtime.

The result is a command surface that behaves like a miniature operating system. /model changes model choice. /effort changes reasoning posture. /compact changes context state. /mcp manages external tool servers. /plugins and /skills extend capabilities. /session and /memory manage persistence. /cost reveals financial state. /agents and /tasks expose orchestration. This is much richer than the conventional “slash command as prompt template” model seen in many chat products.

In comparative terms, OpenCode and Oh-My-OpenCode also have strong command and extension stories, often with more open customization paths. Claude Code’s edge lies in how comprehensively the slash system is woven into the product architecture. It is not decoration. It is one of the main control planes through which the user manages the agent.

For future agent design, the lesson is simple: once an agent gains tools, sessions, memory, plugins, policies, costs, and multiple operating modes, natural-language chat alone becomes an inefficient control surface. A robust slash-command system is the structured complement to free-form prompting. Claude Code’s 100+ command architecture shows what that looks like when taken seriously.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~10,200 input + ~1,450 output

12.1 Plugin Entry Point and Bootstrap

The most important architectural fact about Oh-My-OpenCode, or OMO, is that it does not fork OpenCode. It enters through the host’s plugin surface and then expands itself into a much larger runtime. The whole maneuver starts in src/index.ts, a file barely over one hundred lines long. That small size is misleading. The file is not where the complexity lives; it is where the complexity is sequenced.

The exported value is an async Plugin function. OpenCode calls it during plugin loading, before the first real user message is processed. In other words, OMO boots early enough to shape the environment in which the first prompt will be formed. That timing matters. A later extension point could add tools or tweak output, but it could not reliably establish session policies, hook graphs, model caches, or message transforms before the first turn.

The first visible actions are infrastructural. OMO logs plugin entry, injects server auth into the client via injectServerAuthIntoClient(ctx.client), and starts a tmux environment check with startTmuxCheck(). These are not “business features” in the narrow sense. They are bootstrap stabilizers. They make sure the client can authenticate correctly and that visual multi-agent support knows whether tmux is viable.

After that comes configuration loading. loadPluginConfig(ctx.directory, ctx) merges user-level and project-level OMO config, parses JSONC, validates with Zod, and even falls back partially when one config section is broken. The result is immediately used to compute disabled_hooks, isHookEnabled, and safeHookEnabled. This is the first sign that OMO’s runtime is policy-driven. It does not hardwire all behaviors on every boot. It constructs a selectively enabled graph.

Next, createFirstMessageVariantGate() is instantiated. This gate is subtle but important. It remembers whether a newly created session has already received its first-message model variant override. That lets OMO treat the very first turn differently from later turns, which is useful for model variants such as thinking effort levels or special startup behavior.

Then OMO normalizes tmux settings into a dedicated tmuxConfig object. Even if the user omitted values, the plugin computes defaults like main-vertical layout, main pane size, and minimum widths. This is classic bootstrap design: convert optional raw config into explicit operational state as early as possible, so later subsystems do not have to repeatedly interpret missing values.

The next state object is modelCacheState = createModelCacheState(). This cache is shared across later phases, especially hooks that monitor context windows or provider-specific model capabilities. It allows one plugin invocation to maintain internal knowledge about model context limits and Anthropic 1M-context capability without repeatedly rediscovering them.

Only after those primitives are ready does OMO build managers through createManagers(...). This manager phase bundles stateful services that need to exist independently of any one hook or tool call: a BackgroundManager for background agents, a TmuxSessionManager for pane orchestration, a SkillMcpManager for skill-embedded MCP servers, and the all-important configHandler that will later inject agents, MCPs, commands, and permissions into OpenCode’s own config object. The manager layer is the bridge between bootstrap and runtime.

At this point we can describe OMO’s effective six-phase initialization pipeline:

Provider preparation — auth injection, tmux check, config load, model cache creation.
Plugin components — indirectly prepared through the config handler, which will later load Claude Code plugin components and compatibility assets.
Agents — builtin agents, Claude-style agents, plugin agents, and overrides are assembled during config handling.
Tools — createTools(...) loads skill context, available categories, and the full tool registry.
MCPs — builtin MCPs, .mcp.json servers, and plugin MCP servers are merged by the config handler.
Commands — builtin commands, skills-as-commands, Claude/OpenCode command directories, and plugin commands are merged last.

sequenceDiagram
    participant OC as OpenCode Host
    participant OMO as Oh-My-OpenCode Plugin
    participant Reg as OpenCode Registry
    
    OC->>OMO: Load plugin (src/index.ts)
    activate OMO
    OMO->>OMO: Phase 1: Initialize providers
    OMO->>OMO: Phase 2: Build plugin components
    OMO->>OMO: Phase 3: Create agents (11)
    OMO->>OMO: Phase 4: Create tools (26)
    OMO->>OMO: Phase 5: Setup MCPs (3 built-in)
    OMO->>OMO: Phase 6: Load commands
    OMO->>Reg: Register 8 hook handlers
    OMO-->>OC: Plugin ready
    deactivate OMO
    
    Note over OC,OMO: First user message arrives
    OC->>OMO: chat.message hook
    OMO->>OMO: Variant gate + session setup
    OMO-->>OC: Modified message

The user asked for the “actual entry point file,” and the entry file confirms that phases 3 through 6 are not all executed inline. Instead, index.ts prepares the objects that will later materialize them. This is a common bootstrap pattern in systems design: the entry point does not do all work directly; it wires the factories that will do the work at the correct lifecycle boundary.

The explicit tool phase begins with const toolsResult = await createTools(...). This is an important ordering decision. Hooks are created after tools, not before. Why? Because some hooks need information derived from tool creation, especially mergedSkills and availableSkills. OMO therefore treats tool construction as a prerequisite for part of prompt and context engineering.

Only then does createHooks(...) run. It receives the plugin config, model cache, background manager, hook enablement predicate, safe-creation flag, and skill inventories. The hook factory itself is layered: core hooks, continuation hooks, and skill hooks. In effect, the entry point is assembling a policy engine whose internal policies can be turned on or off individually.

After hooks comes createPluginInterface(...). This is the moment the internal machinery is projected onto OpenCode’s external plugin contract. The returned interface exposes exactly the host hook names OpenCode understands: tool, chat.params, chat.message, experimental.chat.messages.transform, config, event, tool.execute.before, and tool.execute.after. This is the “8-hook handshake.” OMO has many more internal policies, but it presents them through the eight hook points the host runtime actually calls.

There is one extra extension: experimental.session.compacting. This handler captures todos before compaction, delegates to Claude Code hook compatibility if present, and then pushes extra context produced by compactionContextInjector. That tells us something profound about OMO’s design. It does not merely react to turns; it also intervenes in the host’s memory-management path.

The final lesson of src/index.ts is architectural restraint. A 130K-LOC plugin could have become a monolith. Instead, the entry point remains a disciplined bootstrap sequencer: prepare state, build managers, build tools, build hooks, expose interface. That is why OMO can behave like an orchestration layer while still technically remaining “just a plugin.”

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~10,600 input + ~1,520 output

12.2 Eight Hook Handlers

OpenCode gives OMO eight plugin-facing hook handlers. OMO uses all of them. This is the minimum surface area through which a very large orchestration system enters the host runtime.

The first handler is config, created by createConfigHandler(...) in plugin-handlers/config-handler.ts. This is arguably the most strategic hook because it decides what OpenCode will even know exists. The handler first applies provider config into modelCacheState, then loads plugin components, then applies agent config, tool permissions, MCP config, and command config in that order. That sequence is not arbitrary. Agents must exist before agent-specific permission rewrites can be applied; MCPs and commands come later because they depend less on agent ordering. In practice, the config hook is how OMO injects its world into OpenCode’s registry.

The second handler is tool, bound to tools in createPluginInterface(...). This is not a “callback” in the event sense; it is the tool catalog itself. OMO returns a record of ToolDefinitions, built in plugin/tool-registry.ts, and OpenCode adds them to the tool namespace. This is how LSP tools, background-task tools, delegation tools, skill tools, session tools, and optional task-system tools become callable from the model.

The third handler is chat.message, implemented by createChatMessageHandler(...). This one runs on every incoming chat message and is the most immediate place where OMO can alter the request path before the model sees it. The handler does several things. It tracks session-to-agent mapping through setSessionAgent, applies the first-message variant gate, resolves model variants either from agent config or from model-specific rules, and then invokes a chain of sub-hooks: stop-continuation guard, keyword detector, Claude Code compatibility hooks, auto-slash-command, and start-work. It also detects Ralph Loop templates embedded in prompt text and starts or cancels loop state accordingly. The key idea is that chat.message is not only about editing a message; it is about session-aware pre-processing.

The fourth handler is chat.params, implemented by createChatParamsHandler(...). It looks modest, but it is where OMO can influence generation effort level. The handler normalizes raw host input into a stricter internal shape and then delegates to the anthropicEffort hook. In plain terms, this is where OMO can say: for this agent, session, provider, model, and message variant, adjust model options before inference. The word “effort” here refers to reasoning budget or generation mode, not human labor.

The fifth handler is event, implemented by createEventHandler(...). This is the session-lifecycle nerve center. It dispatches host events such as session.created, session.deleted, message.updated, and session.error into a long list of internal hooks: auto-update, Claude compatibility, background notifications, session notifications, todo continuation, context monitoring, directory injectors, rules injection, think mode, agent reminders, Ralph Loop, compaction preservation, Atlas, and more. It also deduplicates “synthetic idle” versus “real idle” events through a 500ms window. That small detail is an example of runtime maturity: when OMO normalizes session status into idle, it also avoids double-triggering downstream logic.

The sixth handler is tool.execute.before, implemented by createToolExecuteBeforeHandler(...). This is OMO’s pre-interception point for tool calls. The execution pipeline includes write-existing-file guard, question-label truncator, Claude Code hooks, non-interactive environment adaptation, comment checking, directory agents injection, README injection, rules injection, todo-write disabling for task mode, Prometheus markdown-only guard, Sisyphus Junior notepad logic, and Atlas logic. It also rewrites task tool arguments: if a category is provided, it forces subagent_type = "sisyphus-junior"; if a session_id continuation is provided without agent type, it resolves the session’s agent and uses that. This is pre-execution normalization as a policy pipeline.

The seventh handler is tool.execute.after, implemented by createToolExecuteAfterHandler(...). Here OMO can rewrite titles and metadata from a side-channel store, then run post-processing hooks: Claude compatibility, tool output truncation, preemptive compaction, context monitoring, comment checking, directory injectors, rules injection, empty-task detection, agent reminders, category/skill reminders, interactive bash handling, edit/json recovery, delegate-task retry, Atlas, task-resume info, and hashline read enhancement. If tool.execute.before is the input-side policy gate, tool.execute.after is the observation-side policy gate.

The eighth handler is experimental.chat.messages.transform, implemented by createMessagesTransformHandler(...). This hook is powerful because it transforms the message history that will be sent to the model. In current source, the handler explicitly calls context injection first and thinking-block validation second. The pipeline is therefore: augment context, then verify that thinking blocks remain structurally valid. The chapter prompt mentioned todo preservation here; in current source, todo preservation is implemented through the compaction path and lifecycle hooks rather than directly in this transform handler. That distinction matters because it shows OMO spreading related concerns across different host surfaces rather than cramming them into one mega-transform.

Put differently, OMO maps eight OpenCode hook handlers to three broad functions:

Registry mutation: config, tool
Prompt/inference shaping: chat.message, chat.params, experimental.chat.messages.transform
Lifecycle/tool interception: event, tool.execute.before, tool.execute.after

That decomposition explains why OMO feels deeper than a normal extension. A normal extension might add commands or tools. OMO adds registry entries, rewrites message flow, tunes generation parameters, intercepts tool execution, and supervises session lifecycle. The eight handlers are not many in number, but they cover the decisive choke points of an agent runtime.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~10,900 input + ~1,480 output

12.3 Tool Injection Architecture

OMO’s tool system is not just a bag of extra tools. It is a wrapping architecture. The central file is src/plugin/tool-registry.ts, where OMO turns OpenCode’s plugin SDK tool definitions into a larger operational vocabulary with extra context, permissions, state managers, and session-aware behavior.

The registry starts from a small builtin core: six LSP tools exported from src/tools/index.ts — goto definition, find references, symbols, diagnostics, prepare rename, and rename. That is the seed, not the whole story. The registry then layers in grep, glob, AST-grep, session-manager tools, background-task tools, call_omo_agent, optional look_at, the delegation tool named task, skill, skill_mcp, slashcommand, interactive_bash, and optionally the newer task-system tools plus hashline edit. In the base non-experimental path, this is the 26-tool universe typically associated with OMO.

The important design pattern is wrapping. At the outermost level, each exported tool is still an OpenCode SDK ToolDefinition. That means the host can register and call it normally. But the tool factory usually closes over extra OMO state: plugin config, managers, available categories, available skills, current session resolution helpers, or notification side channels. So the pattern is not “replace OpenCode tools.” It is “host-compatible tool object, plugin-specific execution context.”

createToolRegistry(...) makes that explicit. It accepts the plugin context, plugin config, a subset of managers, a skillContext, and available categories. Then it constructs higher-level tools from those ingredients. For example, createBackgroundTools(manager, client) does not merely expose background-output and cancel verbs; it binds them to the current BackgroundManager instance. createCallOmoAgent(...) does not just define an RPC-style tool; it binds disabled-agent rules and the current client. createSkillTool(...) binds merged skills, an MCP manager, session ID resolution, git-master config, and disabled-skill filters.

The most revealing example is delegate-task, exposed to the model as task. Its factory receives the background manager, host client, current directory, user category config, git-master config, the Sisyphus Junior model override, browser provider, disabled skills, available categories, and available skills. It also receives an onSyncSessionCreated callback that forwards session creation into the tmux session manager. In other words, task is not a simple subagent launcher. It is a policy-rich delegation gateway.

That gateway matters because OMO’s multi-agent story is built on OpenCode sessions. A delegated task is often implemented by creating a new session, setting its parent session, choosing an agent, choosing or inferring a model, pushing prompt content, and then either waiting synchronously or polling asynchronously. This is why the book repeatedly emphasizes that OMO is built on top of OpenCode rather than beside it. Delegation is not simulated outside the host. It is expressed through the host’s own session abstraction.

The call_omo_agent tool shows another variation on the wrapping pattern. It only allows a small whitelist of subagent types, normalizes the requested agent case-insensitively, rejects disabled agents, and then chooses either background execution or sync execution. In sync mode it can continue an existing session through session_id. In background mode it explicitly rejects session_id, because the background manager owns that lifecycle differently. This is tool design shaped by runtime semantics, not by surface convenience.

The background-task tools are equally important. background_output and background_cancel are thin from the model’s perspective, but behind them sits a concurrency-aware manager with polling, task history, notifications, descendant-task tracking, and cleanup. This is a good example of why the word “injection” is more accurate than “registration.” A registered tool might simply become callable. An injected tool arrives with a backend.

Concurrency management is one of the clearest backends in this design. ConcurrencyManager groups tasks by either exact model key (provider/model) or agent name. It has per-model and per-provider limits from config, with a default of five concurrent jobs. Acquire/release is implemented with queued waiters and a settled-flag pattern to avoid double resolution. This sounds low-level because it is low-level. Multi-agent UX requires queue correctness. Without it, “background agents” quickly degenerate into races, leaks, and duplicate wakeups.

The tool registry also enforces feature gating. If multimodal-looker is disabled, look_at is not registered. If the experimental task system is off, task_create, task_get, task_list, and task_update are absent. If hashline edit is disabled, the plugin does not expose the edit override. OMO therefore uses registration-time pruning as a safety and complexity control mechanism.

Another important detail is filtering. After building the full allTools record, OMO passes it through filterDisabledTools(allTools, pluginConfig.disabled_tools). This matters because the plugin does not assume that all of its own tools should always be live. Administrators can remove tool surface area at config time.

The larger architectural lesson is that OMO treats tools as the operational ABI, the application binary interface, between the model and the runtime. Once a tool is injected, it is not just a function name. It carries orchestration policy, concurrency rules, model restrictions, side-channel metadata, and in many cases access to whole sub-systems such as tmux, skill MCP, session history, or background polling. That is why OMO’s tool layer feels more like a micro-platform than a plugin appendix.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~11,100 input + ~1,560 output

12.4 Agent System Implementation

OMO’s agent system is where it most clearly turns OpenCode’s native agent abstraction into a higher-level orchestration design. OpenCode already understands the concept of an AgentConfig. OMO does not replace that concept. It manufactures many AgentConfigs, arranges them in a specific priority order, decorates them with prompts and permissions, and then uses those agents as a coordinated society.

The central factory is src/agents/builtin-agents.ts. The file defines agentSources, a map from builtin agent name to either a factory function or a direct AgentConfig source. The visible builtin set includes sisyphus, hephaestus, oracle, librarian, explore, multimodal-looker, metis, momus, and atlas; surrounding config logic also introduces sisyphus-junior and prometheus, which is why OMO is commonly described as having eleven first-class agents in its standard operating posture.

The first major job of createBuiltinAgents(...) is model discovery without deadlocking plugin initialization. The code contains a very explicit warning: do not call OpenCode client APIs here, because this function is reached from the config handler and could deadlock. Instead, it uses cached provider connectivity and fetchAvailableModels(...). That detail is revealing. OMO’s agent factory is constrained by the host lifecycle. Good plugin architecture means respecting host timing, not merely generating the right output eventually.

Next, categories are merged with mergeCategories(categories), producing a list of AvailableCategory objects. Skills are also turned into prompt-usable metadata through buildAvailableSkills(...). These two inventories are then passed into prompt builders later. This means OMO agents are not defined only by fixed text. They are partly synthesized from the current environment.

General builtin agents are assembled through collectPendingBuiltinAgents(...). This function iterates agent sources, skips disabled agents, checks model requirements, resolves a model through applyModelResolution(...), builds the agent, optionally injects environment context, applies overrides, and stores the result. The phrase fallback chain deserves explanation because it is easy to treat as jargon. In systems design, a fallback chain means an ordered list of substitutes tried when the preferred option is unavailable. OMO uses fallback chains for models: if the user’s preferred model is unavailable, it can fall back across providers or variants according to policy.

Sisyphus, Hephaestus, and Atlas are handled specially. maybeCreateSisyphusConfig(...) and maybeCreateHephaestusConfig(...) look at explicit overrides, available models, first-run-no-cache cases, and agent-specific requirements. In first-run conditions, they can choose the first fallback model even without cache evidence. Atlas, as an orchestrator, is also created specially because its prompt depends on available agents and skills.

The real differentiator, however, lies in prompt construction. dynamic-agent-prompt-builder.ts builds prompt modules such as key-trigger sections, tool-selection tables, explore/librarian usage guides, delegation tables, category-and-skill selection protocols, oracle consultation rules, hard blocks, and anti-pattern lists. This is modular system-prompt assembly. Instead of one giant static prompt file, OMO assembles prompt sections from reusable builders.

This modularity matters for two reasons. First, it keeps prompts synchronized with actual runtime capabilities. For example, categorizeTools(...) groups tool names into LSP, AST, search, session, command, and other. The resulting tables are derived from what the runtime actually exposes. Second, it lets OMO inject live inventories such as available skills or categories. The agent therefore knows not only abstract policy but current affordances.

Sisyphus is the clearest case. Its prompt in src/agents/sisyphus.ts is built from identity, intent gating, codebase assessment, exploration policy, implementation policy, delegation structure, session continuity rules, task or todo discipline, category-plus-skill guidance, hard blocks, and anti-patterns. That is far more than a persona prompt. It is an operating manual assembled into the prompt.

The builder also introduces hard blocks such as “never commit without explicit request” or “never speculate about unread code.” These are not low-level host permissions. They are behavioral constraints encoded into the agent’s reasoning context. This distinction is important. OMO combines prompt-level discipline with permission-level gating.

How does this map onto OpenCode’s native agent concept? In OpenCode, an agent is fundamentally a named prompt/mode/tool-permission package. OMO keeps that structure. What it adds is orchestration metadata and prompt-driven coordination logic. In other words, OMO agents remain native OpenCode agents at the data-structure level, but behave like specialized roles in a larger multi-agent protocol.

This is why AgentPromptMetadata exists. Metadata such as cost class, triggers, use-when guidance, avoid-when guidance, and prompt alias helps Sisyphus and Hephaestus reason about which specialist to use. The agent system is therefore two-layered:

Host layer: OpenCode sees a map of AgentConfigs.
Orchestration layer: OMO agents know about each other, categories, skills, and delegation economics.

The deeper lesson is that a strong agent system is not just “many prompts.” It is model resolution plus fallback policy plus permission shaping plus prompt modules plus specialization metadata. OMO demonstrates that once a host lets you register agents natively, you can build a sophisticated social structure on top of that primitive without changing the host core.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~10,800 input + ~1,500 output

12.5 Hook System Deep Dive

OMO is famous for its “41-hook system,” but the crucial architectural point is not the number 41 by itself. It is the compression ratio. OpenCode gives the plugin only a handful of major lifecycle hook points; OMO multiplexes dozens of its own logical behaviors onto them.

The term multiplex deserves a short explanation. In computer systems, multiplexing means carrying multiple logical channels over a smaller number of physical channels. OMO does something analogous. OpenCode exposes a small host-level hook surface, while OMO routes many internal behavioral channels through it. So the host sees a few hook points; internally, OMO runs a policy pipeline.

The assembly logic begins in create-hooks.ts, which merges three hook families:

createCoreHooks(...)
createContinuationHooks(...)
createSkillHooks(...)

createCoreHooks(...) further merges three subfamilies: session hooks, tool-guard hooks, and transform hooks. That gives a five-tier mental model:

Session hooks
Tool-guard hooks
Transform hooks
Continuation hooks
Skill hooks

The hook names themselves are enumerated in src/config/schema/hooks.ts, where OMO defines 41 named hooks. That schema is important because it means hooks are not an ad hoc implementation detail; they are part of the public configuration surface. Users can disable them individually via disabled_hooks.

Execution order matters. In a hook system, order is often equivalent to policy priority.

Consider tool.execute.before. The code runs hooks in an explicit pipeline, not through unordered iteration. The first three stages are especially revealing:

writeExistingFileGuard — prevent unsafe overwrite patterns.
questionLabelTruncator — normalize question labels.
claudeCodeHooks / other pre-processors / rulesInjector — inject extra rules and compatibility behavior.

The user prompt specified “file guard → label truncator → rules injector,” and the source confirms that this is the backbone ordering, even though OMO inserts additional policy steps between or after them. That ordering makes sense. Safety-sensitive write checks should run early; cosmetic or label normalization can run next; repository or policy rules should be injected before execution proceeds.

The rest of the pre-tool pipeline includes non-interactive environment handling, comment checking, directory agents injection, directory README injection, task/todo coordination guards, Prometheus markdown constraints, Sisyphus Junior notepad logic, and Atlas-specific preprocessing. Finally, task and slashcommand receive special semantic rewriting. In other words, the “before” hook is both a guardrail layer and a semantic adaptation layer.

Now consider tool.execute.after. It begins by consuming stored metadata for the tool call, allowing titles and metadata to be rewritten. Then comes another ordered chain: Claude compatibility hooks, tool output truncation, preemptive compaction, context window monitoring, comment checking, directory injectors, rules injection, empty-task detection, skill reminders, interactive bash handling, error recovery, delegate-task retry, Atlas, task resume info, and hashline enhancement. This is observation-time governance. OMO sees what a tool produced and decides how that output should be shaped, explained, retried, or used for future control flow.

event is the broadest multiplexer. The event dispatcher in plugin/event.ts forwards each lifecycle event into a deterministic list: auto update checker, Claude hooks, background notification, session notification, todo continuation enforcer, unstable-agent babysitter, context monitor, directory injectors, rules injector, think mode, Anthropic context-window recovery, agent reminders, skill reminders, interactive bash session handling, Ralph Loop, stop-continuation guard, compaction todo preservation, and Atlas. This means OMO can attach multiple orthogonal policies to the same event without exposing separate host-level event channels.

The experimental.chat.messages.transform path is smaller but strategically powerful. In current source, the ordering is explicit: context injection first, then thinking-block validation. The user prompt also mentioned todo preservation, and conceptually that belongs to the same “preserve critical context” family. In code, however, todo preservation is implemented mainly through compaction hooks such as compactionTodoPreserver and experimental.session.compacting, not directly in the transform handler. That is an instructive design choice. OMO places each preservation policy at the lifecycle point where it is most reliable.

Hook enablement is configuration-driven. index.ts builds isHookEnabled from disabled_hooks, and all major hook constructors call safeCreateHook(...). The “safe” part means hook creation failures degrade to null rather than crashing plugin boot. That is a resilience feature. A policy module can fail to instantiate without taking down the whole orchestration layer.

Two more nuances deserve mention. First, some hooks are auto-disabled when the host grows native support. directory-agents-injector, for example, checks OpenCode version and stands down if native injection exists. Second, some hooks are gated by feature flags beyond simple disabled status. preemptive-compaction only activates when both the hook is enabled and the experimental config flag is true.

The broad lesson is that OMO’s hook system is not merely a list of callbacks. It is a policy router with explicit execution order, graceful failure semantics, version-aware fallback, and configuration-based pruning. Once you see that, OMO’s architecture becomes easier to understand: the agents supply behavior, the tools supply action, and the hook system supplies governance.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~10,400 input + ~1,420 output

12.6 Configuration Layering

Configuration is where OMO’s plugin identity becomes clearest. It is not just “using OpenCode config.” It introduces its own layered policy system and then coexists with OpenCode’s broader host configuration model.

At the plugin level, the entry point calls loadPluginConfig(ctx.directory, ctx) from src/plugin-config.ts. The loader resolves two main physical config locations: a user-level file under OpenCode’s config directory and a project-level file under .opencode/oh-my-opencode. In both places it prefers .jsonc over .json. That preference is deliberate. JSONC means JSON with comments. Comments are not a runtime necessity, but they are a developer-experience necessity. In a dense orchestration plugin, commented config dramatically lowers the cost of understanding what each switch does.

The user prompt asked for “OMO 3-level layering,” and that is the right conceptual model: defaults → user config → project config. The defaults are implicit in code via optional fields and hardcoded fallbacks. User config is loaded first as the base. Project config is then merged on top. This yields a plugin-specific three-layer precedence stack.

At the same time, OMO does not live alone. It is injected into OpenCode, whose own config ecosystem is larger. The config hook receives the host config object, mutates sections like agent, tools, mcp, command, permission, and inspects provider. So OMO’s three-layer plugin config coexists with OpenCode’s broader multi-section host config world. That is the practical meaning of “3-level coexisting with 7-level”: OMO has its own mini-stack, but it ultimately composes into a richer host config graph.

Schema design is central here. src/config/schema/ contains 22 Zod v4 schema files, including agent names, agent overrides, babysitting, background task, browser automation, categories, Claude Code compatibility, commands, comment checker, dynamic context pruning, experimental flags, git master, hooks, notification, the master oh-my-opencode-config, Ralph Loop, Sisyphus agent, Sisyphus core, skills, tmux, and websearch. This is not accidental fragmentation. It is domain partitioning. A large config surface becomes maintainable only when validation logic is modularized by concern.

The master schema, OhMyOpenCodeConfigSchema, stitches these domains together. It defines arrays for disabled_mcps, disabled_agents, disabled_skills, disabled_hooks, and disabled_commands; nested objects for agents, categories, claude_code, experimental, skills, background_task, notification, git_master, tmux, and more; and migration metadata under _migrations. The result is not a flat options bag. It is a typed policy tree.

Validation is strict, but not brittle. loadConfigFromPath(...) first parses JSONC, runs migration, and then calls OhMyOpenCodeConfigSchema.safeParse(rawConfig). If validation succeeds, the whole config is returned. If not, OMO logs issues, records a config-load error, and then calls parseConfigPartially(rawConfig). This partial parser loops through top-level keys, re-validates each section in isolation, and keeps valid sections while discarding invalid ones. This is an example of partial fallback: when a configuration document is partly broken, salvage the usable subtrees instead of failing closed on the entire plugin.

That strategy is especially valuable in long-lived user configs. A single experimental subsection can become outdated after an upgrade. Without partial fallback, the whole plugin could stop honoring unrelated settings. With partial fallback, broken areas degrade locally.

Merge semantics are also instructive. mergeConfigs(base, override) does more than shallow object spread. Nested agents, categories, and claude_code use deepMerge. Arrays of disabled items are unioned through Sets so that user- and project-level disables accumulate instead of overwrite each other. This is important because “disabled” is usually monotonic in policy design: if either layer wants something off, it should stay off unless a more explicit enable mechanism exists.

One subtle but important detail is that OMO distinguishes plugin config from runtime-derived config. For example, tmuxConfig in index.ts computes explicit defaults like layout and pane size. Likewise, the config handler derives additional host config sections from plugin policy, such as agent permissions and command registries. This means the validated user config is only the beginning. OMO then transforms it into an operational config.

The layering story also crosses compatibility boundaries. During config handling, OMO may merge Claude Code commands, skills, agents, or MCP servers into OpenCode’s host config, while still respecting OMO’s own plugin-level switches like claude_code?.commands or disabled_mcps. In other words, configuration layering is not only vertical across scopes; it is horizontal across ecosystems.

The best way to summarize OMO’s configuration architecture is this: typed inputs, comment-friendly files, partial salvage on failure, domain-specific schemas, precedence-aware merging, and runtime derivation. That combination is exactly what a large plugin needs. Without it, a 130K-line extension surface would collapse under its own options.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~10,700 input + ~1,470 output

12.7 Claude Code Compatibility Layer

One of OMO’s most strategically intelligent moves is not purely technical. It is ecosystem-political. Rather than forcing users to abandon Claude Code assets, OMO tries to absorb them. The compatibility layer is spread across claude-code-plugin-loader, claude-code-command-loader, claude-code-agent-loader, and claude-code-mcp-loader.

Start with commands. claude-code-command-loader/loader.ts recursively loads markdown command definitions from four locations: ~/.claude/commands, project .claude/commands, OpenCode global command under the OpenCode config directory, and project .opencode/command. This is important because it means OMO is not only reading Claude-native command directories; it is also offering an OpenCode-shaped landing zone for the same concept. Commands are parsed with frontmatter, wrapped into a common template containing <command-instruction> and <user-request>, then normalized into an OpenCode-compatible CommandDefinition.

Agents are a little different. claude-code-agent-loader/loader.ts loads markdown-defined agents from ~/.claude/agents and project .claude/agents, parses frontmatter, builds an AgentConfig with mode: "subagent", and optionally maps a comma-separated tools field into OpenCode-style tool permissions. Notably, current source does not implement a parallel .opencode/agents loader. That asymmetry is worth stating clearly because migration is not perfectly symmetric across all asset types.

MCP compatibility is implemented in claude-code-mcp-loader. Here OMO reads .mcp.json files from multiple scopes: ~/.claude.json, ~/.claude/.mcp.json, project .mcp.json, and project .claude/.mcp.json. It transforms Claude-style MCP definitions into OMO/OpenCode MCP server config via transformMcpServer(...). The transformation expands environment variables recursively using ${VAR} and ${VAR:-default} syntax through expandEnvVarsInObject(...). For remote servers, it emits { type: "remote", url, headers }; for stdio servers, it emits { type: "local", command: [cmd, ...args], environment }.

Plugins are the most complex part. Full Claude Code plugins are discovered by claude-code-plugin-loader/discovery.ts, which reads installed_plugins.json from ~/.claude/plugins, consults Claude settings for enabled/disabled state, loads manifests from .claude-plugin/plugin.json, and records install paths. This is another place where source accuracy matters: the current implementation discovers installed Claude plugins from Claude’s own plugin home, not from a project-local .opencode/plugins/ directory. Once discovered, plugin components are loaded in parallel: commands, skills-as-commands, agents, MCP servers, and hooks config.

Plugin commands and agents are namespaced as pluginName:assetName. Plugin MCP servers are also namespaced, and their JSON is processed through two compatibility transforms: plugin-root path substitution via ${CLAUDE_PLUGIN_ROOT} and environment expansion via ${VAR}. This is a subtle but important engineering choice. Compatibility does not just mean “read their files.” It means preserve their indirection model.

Why maintain all this compatibility? The simplest answer is migration path. A user coming from Claude Code may already have a library of commands, agents, skills, hooks, and MCP configs. If OMO required manual rewrites into a brand-new extension format, the switching cost would be high. By contrast, if OMO can ingest much of that surface automatically, it becomes a superset environment: “bring your Claude assets, gain OpenCode runtime control, and keep moving.”

This is the migration-path argument, and it is stronger than it may first appear. Tooling ecosystems usually win less by absolute superiority than by lower transition friction. OMO is effectively saying: your previous investments are not dead. They can be translated, layered, or at least partially reused.

The compatibility layer is also selective. plugin-components-loader.ts respects pluginConfig.claude_code?.plugins, plugins_override, and a plugin-load timeout. applyCommandConfig(...), applyAgentConfig(...), and applyMcpConfig(...) each have their own inclusion flags such as claude_code?.commands, claude_code?.agents, claude_code?.skills, and claude_code?.mcp. This means compatibility is not all-or-nothing. You can opt into only the parts you want.

There is also precedence logic. Builtin commands load first, then various command and skill sources, then plugin components. MCP merging preserves user-disabled entries and deletes explicitly disabled MCP names at the end. Plugin agent and command names are namespaced to reduce collisions. Compatibility, in other words, is governed rather than naive.

The bigger architectural lesson is that extensibility standards compete, but migration layers are what make ecosystems interoperable in practice. OMO’s Claude Code compatibility layer is not just a convenience feature. It is a strategic claim: the future coding-agent platform should be able to absorb neighboring ecosystems rather than forcing everyone to restart from zero.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 12 — Anatomy of a 130K-LOC Plugin
Token Usage: ~11,300 input + ~1,580 output

12.8 Background Agent Spawner

OMO’s background-agent system is where the plugin most clearly becomes a scheduler rather than a prompt wrapper. The core implementation lives under src/features/background-agent/, especially manager.ts, spawner.ts, concurrency.ts, and the supporting state and notification modules.

The first key fact is that background agents are implemented as real OpenCode sessions. In both spawner.ts and BackgroundManager.startTask(...), OMO calls client.session.create(...) with the parent session ID and a generated title such as Background: ${description} or ${description} (@${agent} subagent). This means a background agent is not an in-memory coroutine inside the parent session. It is a child session in the host runtime.

That choice buys OMO several things at once: persistence, message history, parent-child relationships, session IDs, tool execution within the host’s normal machinery, and compatibility with existing session inspection APIs. It also explains why OMO can later expose tools like background_output and session-manager reads. There is something concrete to inspect.

Task creation begins with a LaunchInput and becomes a BackgroundTask object containing ID, status, queue time, description, prompt, agent, parent session ID, parent message ID, parent model, parent agent, and possibly explicit model choice. The manager records tasks immediately in pending state, updates task history, and groups queued tasks by a concurrency key.

That concurrency key is central. If a model is explicitly chosen, the key is providerID/modelID; otherwise it falls back to agent name. ConcurrencyManager then enforces limits with three levels of policy: exact model limits, provider-wide limits, and a default limit. If no config is set, the default is five concurrent tasks per key. This is where the user prompt’s “5 concurrent per model/provider” comes from. It is not hand-wavy documentation; it is literally the fallback return value in getConcurrencyLimit(...).

The queueing logic is careful. acquire() either increments the running count or parks a waiter in a per-key queue. release() first tries to hand the slot directly to the next unsettled waiter. Only if no handoff occurs does it decrement the count. That handoff pattern avoids bursts where a released slot briefly appears free before being re-acquired, and the settled-flag pattern avoids double resolution during cancellation. This is the kind of detail that separates a demo multi-agent system from a production-grade one.

When a queued task starts, OMO looks up the parent session to inherit the correct working directory. That is an important source-level detail. Background agents do not blindly run in the plugin’s own startup directory. They attempt to continue in the parent session’s directory, which preserves project context in multi-root or nested-session cases.

After child session creation, OMO marks the session as a subagent session, optionally triggers a tmux callback to spawn a pane, flips task status to running, stores progress metadata, and then sends the prompt asynchronously with promptWithModelSuggestionRetry(...) or promptAsync(...). The launched prompt explicitly restricts tools: it applies agent-specific tool restrictions, disables task, keeps call_omo_agent enabled, and denies question. So the background worker is not just another free-running model turn; it is a constrained worker session.

Parent-session notification is another major part of the design. BackgroundManager maintains pending-task sets per parent, notification queues, completion timers, idle deferrals, and helper modules such as parent-session-notifier.ts. The goal is to avoid forcing the user to manually poll every child task. The parent session can be informed when background work finishes, errors, or batches of tasks complete.

Session continuation is supported in two distinct ways. First, the call_omo_agent tool supports sync continuation through session_id, verified in subagent-session-creator.ts: if session_id is present, OMO looks up the existing session and reuses it instead of creating a new one. Second, the background manager itself tracks descendant task trees by parent session. These are different continuation patterns: direct agent continuation for sync workflows, and task-tree continuity for async orchestration.

The user prompt asked for “preserving full context” through session_id, and that is exactly the right interpretation. Because the continued object is an actual host session, the child retains its transcript, message structure, and accumulated context within whatever compaction limits the host applies.

The final requested concept is boulder state. This lives in src/features/boulder-state/storage.ts. Boulder state stores active-plan metadata in a JSON file, including the active plan path, start time, associated session_ids, and plan name. The metaphor is obviously Sisyphus: the boulder is the active ongoing plan that must keep rolling. When continuation is interrupted, OMO can inspect persisted boulder state and the associated plan files under .sisyphus/plans/. It can append session IDs, clear state on explicit stop, and compute checkbox progress from the markdown plan itself.

This is an elegant design because it splits persistence responsibilities:

OpenCode sessions preserve conversational/runtime context.
Background task state preserves scheduling and notification state.
Boulder state preserves long-horizon plan continuity across interruptions.

That three-way split is why OMO’s background agents feel more durable than ordinary async tasks. They are not merely jobs. They are jobs backed by sessions, session trees, and plan persistence. In practical terms, this is how OMO turns “spawn a subagent” into a real orchestration substrate.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 13 — Architectural Philosophy Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

13.1 Open vs Closed

The contrast between OpenCode, Oh-My-OpenCode, and Claude Code begins with a foundational architectural question: should a coding agent be built as an open system or a closed system? In software architecture, “open” does not merely mean source-visible. It means the system is designed to be modified, forked, repurposed, and re-composed by third parties. “Closed” does not merely mean proprietary. It means the system is vertically integrated: model behavior, tool semantics, permission policy, UX, and product surface are optimized together under one controlling vendor. These two directions produce very different strengths.

OpenCode is open in the strongest practical sense. It is not just inspectable; it is model-agnostic, provider-agnostic, and extension-friendly. A user can attach different model providers, change the tool surface, alter prompt assembly, or fork the project entirely and take it in a new direction. That openness matters because coding agents are still evolving quickly. In an immature field, the ability to experiment is a strategic asset. OpenCode functions as infrastructure: a host that others can build on. That is exactly what OMO proves. OMO is not a small plugin that tweaks a few prompts. It is a large orchestration layer built on top of the host without replacing the host’s core identity. That kind of fork-friendly evolution is only possible when the base system exposes real control surfaces.

Claude Code follows the opposite philosophy. It is relatively closed, but closed in a productive rather than purely restrictive sense. Anthropic can co-design the model, the tool contract, the permission classifier, the compaction strategy, and the UI affordances as one product. This is what may be called model-tool co-optimization: tools are not designed in isolation and then handed to arbitrary models; they are shaped around the strengths, failure modes, and instruction-following patterns of the target model family. In classical systems language, this is closer to a tightly coupled stack than a modular platform.

That tight coupling produces real advantages. Claude Code can make stronger assumptions about tool output formatting, approval flow, error handling, and long-horizon agent behavior because the vendor owns the full pipeline. When the same organization controls both the model and the execution environment, optimization depth increases. Fewer unknowns exist at runtime. Safety systems can be tuned against observed internal model behavior. UX can be simplified because the product does not need to account for twenty providers with twenty slightly different capabilities. The closed design therefore trades ecosystem freedom for consistency.

The downside is the classic risk of vendor lock-in. In computer systems, lock-in means that the cost of leaving a platform becomes high because workflows, habits, APIs, configuration assumptions, and extension surfaces are all tied to one vendor’s choices. Claude Code users gain polish and integration depth, but they also inherit Anthropic’s boundaries: supported models, supported abstractions, supported extension mechanisms, and supported safety posture. If Anthropic does not expose a hook, the user cannot simply add it. If Anthropic changes a behavior, downstream users adapt. This is not accidental; it is the economic logic of a productized closed platform.

OpenCode makes the opposite trade. Because it must remain portable across models and environments, it cannot optimize as aggressively for a single stack. Portability here means more than “runs elsewhere.” It means the architecture avoids assumptions that only hold for one model vendor. The benefit is resilience and adaptability. If the frontier model changes, OpenCode can change providers. If an organization wants local models, sovereign infrastructure, or custom compliance layers, OpenCode is structurally suited to that. The cost is that the framework designer cannot squeeze every last drop of performance from one specific model-tool pairing.

OMO occupies an interesting intermediate position. It inherits OpenCode’s openness, but it uses that openness to push the system toward a much more opinionated orchestration regime. OMO is therefore open as a platform but internally forceful as a philosophy. It does not close the ecosystem in the Claude Code sense; instead, it closes over a specific worldview: multi-agent delegation, skill-based specialization, prompt-enforced operational discipline, and extreme autonomy. In other words, OpenCode is open infrastructure, OMO is open infrastructure with a strong doctrine, and Claude Code is a commercially closed integrated product.

This distinction also reveals an important design principle for future agents. Openness is best when the field is unstable, when experimentation matters, and when users need architectural sovereignty. Closed integration is best when the goal is product smoothness, predictable safety, and deep optimization between components. Neither side is universally superior. The real question is where one wants flexibility to live. OpenCode places flexibility in the hands of developers and plugin authors. Claude Code places optimization authority in the hands of the product vendor. OMO shows that openness can still support ambitious, highly opinionated systems—provided the host platform exposes enough structural seams.

The broader tradeoff, then, is portability versus optimization depth. Open systems maximize portability, inspectability, and evolutionary potential. Closed systems maximize tuning, coherence, and product-grade reliability. For an engineer building the next generation of agents, the lesson is not “always be open” or “always integrate vertically.” The lesson is to decide deliberately which layer should remain portable, which layer should be co-optimized, and who gets to control that boundary.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 13 — Architectural Philosophy Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

13.2 Simplicity vs Complexity

All three systems claim, in different ways, to help developers do complicated work with less friction. But they embody very different beliefs about where complexity should live. Should complexity be visible and structured? Hidden behind a clean interface? Or embraced as the necessary cost of extreme capability? This question is central to architectural philosophy because complexity is never eliminated; it is only moved.

OpenCode represents a relatively balanced position. It is not minimal in the sense of being tiny, but it is moderately complex and generally layered in a legible way. Provider abstraction, tool registration, session management, MCP integration, permission handling, and the plugin surface are distinct subsystems with reasonably clear boundaries. From an engineering perspective, this matters because systems become maintainable when their complexity is partitioned. Partitioning means that a developer can modify one subsystem without fully simulating the entire codebase in their head. OpenCode’s architecture therefore reflects a classic software engineering virtue: not the absence of complexity, but the disciplined organization of it.

Claude Code presents a different pattern: external simplicity resting on substantial internal complexity. To the end user, the product often feels simpler than OpenCode. There is a polished CLI/TUI experience, stronger defaults, integrated safety behavior, and less visible configuration burden. But this smoothness is not evidence of architectural simplicity. In fact, it often indicates the opposite. The product absorbs complexity so the user does not have to. A good symbol of this is the large main.tsx entry surface—roughly 4,691 lines in the version examined. A very large central file is not automatically bad, but it often signals accumulated orchestration logic, UI state coupling, and product-driven integration pressure. In other words, Claude Code looks simple because Anthropic internalized a lot of the mess.

This is a common pattern in commercial systems. When a vendor owns the whole experience, it can hide branching logic, heuristics, fallbacks, permission paths, compaction modes, and rendering strategies behind a cleaner facade. That is beneficial for users. However, it also means the internal codebase may become dense, because product simplicity and implementation simplicity are often opposites. A calculator looks simple because enormous complexity has already been compressed into the circuitry. Claude Code follows that product tradition.

OMO is the most explicit embrace of complexity. Its scale—around 129K lines of code in the period under discussion—is not accidental excess. It reflects a deliberate attempt to support very high degrees of autonomy, multi-agent delegation, prompt engineering infrastructure, skill loading, hook multiplexing, background tasks, session continuation, and operational doctrine. OMO’s core claim is that serious autonomy is not a small wrapper around a single-agent chat loop. It requires orchestration machinery. In that sense, OMO treats complexity as a capability investment.

This is where an important computer science distinction matters: essential complexity versus accidental complexity. Essential complexity comes from the nature of the problem itself. Accidental complexity comes from poor abstraction, bad coupling, or incidental implementation choices. OMO’s defenders would argue that much of its complexity is essential because extreme autonomy really does require more state, more policies, more coordination, and more recovery logic. Critics would answer that any 129K-LOC orchestration layer risks accumulating accidental complexity as well. Both views can be true. A system can pursue a genuinely hard problem and still pay too much architectural tax along the way.

OpenCode sits closer to the center of this spectrum. It is complicated enough to be useful as a general platform, but it still preserves conceptual clarity. Claude Code moves complexity inward so the interface becomes calmer. OMO moves complexity upward into orchestration because it values power, continuity, and agent discipline more than conceptual minimalism. These are not just implementation differences; they are philosophical commitments about who should bear the cognitive cost.

There is also a deeper UX implication. Simplicity for the user can produce opacity for the modifier. Claude Code users benefit from a straightforward product surface, but external developers cannot easily reshape the system. OpenCode is less “magic,” which makes it easier to understand and extend. OMO is the least gentle for newcomers, but potentially the most rewarding for expert operators who want a programmable autonomy framework rather than a streamlined assistant.

The chapter’s key conclusion is that simplicity should be judged at three levels, not one. First, there is interface simplicity: how easy the system is to use. Second, there is architectural simplicity: how understandable the code and module boundaries are. Third, there is operational simplicity: how much ongoing effort is required to steer the system successfully. Claude Code optimizes interface simplicity. OpenCode gives strong weight to architectural simplicity. OMO sacrifices both kinds of simplicity more readily in pursuit of operational power under hard tasks.

For future agent design, the lesson is clear. Do not ask whether a system is “simple” or “complex” in the abstract. Ask where the complexity resides, who pays for it, and whether that cost buys real capability. Good architecture does not always minimize complexity. Often it relocates complexity to the layer best equipped to manage it.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 13 — Architectural Philosophy Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

13.3 Control vs Autonomy

Perhaps the most consequential philosophical divide among coding agents is the balance between control and autonomy. In agent systems, control refers to how strongly humans, policies, or guardrails constrain execution. Autonomy refers to how much the system can decide and act without interruption. This is not a moral spectrum from good to bad. It is an engineering spectrum with tradeoffs in speed, safety, accountability, and task completion quality.

Claude Code is the clearest example of a safety-first, human-in-the-loop design. Human-in-the-loop is a standard systems term meaning a person remains part of the decision cycle for at least some important actions. Claude Code’s permission modes, classifiers, dangerous-pattern detection, and sandboxing all aim to preserve that relationship even when the agent appears highly capable. The product is willing to automate many low-risk steps, but it treats bounded autonomy as the acceptable commercial form. The underlying philosophy is straightforward: an agent should move fast only inside carefully engineered limits.

This is a rational stance for a commercial product. Enterprises need auditability, predictable failure surfaces, and confidence that the agent will not silently cross a boundary. In such environments, pauses, approvals, and safety checks are not pure friction; they are part of the trust contract. Claude Code therefore presents autonomy as a managed service, not as a pure ideal. It is always asking, implicitly, “How much freedom can we safely sell?”

OpenCode exposes a more user-centered control philosophy. It does not insist on one global answer. Instead, it gives the user and developer more authority to choose the autonomy level they want. This is a meaningful distinction. Claude Code tends to present autonomy within a product-defined operating envelope. OpenCode is closer to a toolkit: the host framework provides tools, sessions, plugins, permissions, and providers, but the final orchestration style is more open-ended. If one team wants careful approvals and another wants aggressive automation, the architecture is flexible enough to accommodate both.

That flexibility reflects a deeper commitment: autonomy should be configurable rather than prescribed. In control theory terms, OpenCode exposes more of the control surface to the operator. A control surface is the set of levers through which system behavior can be tuned. By keeping those levers visible, OpenCode empowers experimentation, but it also shifts responsibility outward. The user gets freedom, yet also bears more burden for defining safe and effective operating modes.

OMO pushes this philosophy much further. Its implicit slogan could be summarized as: human intervention is a failure signal. This is an extreme autonomy view. It does not mean humans are irrelevant; it means the ideal workflow is one in which the system decomposes, delegates, verifies, and resumes on its own so effectively that repeated human steering becomes evidence of architectural insufficiency. OMO therefore treats interruption not merely as inconvenience, but as a design bug to be reduced.

This is a radical and important departure from most commercial agent products. OMO’s prompt discipline, todo enforcement, background agents, session continuation, and specialized subagents all exist to keep work moving without constant re-guidance. In classical AI planning language, OMO is trying to raise the agent from a reactive assistant toward a more persistent autonomous executor. That ambition explains why it needs so much orchestration logic. Extreme autonomy is not just “let the model do more.” It requires recovery machinery, delegation policies, role separation, and continuity systems.

Of course, this philosophy carries obvious risks. High autonomy amplifies errors, makes invisible drift more dangerous, and raises the cost of poor tool usage. When a highly autonomous system misunderstands intent, it may act confidently in the wrong direction for a long time. This is why autonomy without verification becomes reckless. OMO’s strongest form is not blind automation; it is automation combined with structured operational discipline. Whether that discipline is sufficient is an empirical question, but philosophically it is attempting to replace ad hoc human interruption with engineered internal process.

The comparison therefore reveals three distinct positions. Claude Code says: autonomy is valuable, but only when subordinated to safety architecture and product trust. OpenCode says: autonomy is a user-selectable design variable. OMO says: autonomy should be pushed as far as architecture can responsibly support, because repeated human rescue does not scale.

For future agent builders, the lesson is subtle. The wrong question is “Should agents be autonomous?” Of course they should, to some degree. The right questions are: autonomous for whom, under what constraints, with what rollback paths, and with what verification loops? Control and autonomy are not opposites in a mature system. Good control enables useful autonomy. Claude Code achieves that by constraining the agent. OpenCode achieves it by exposing configuration power. OMO tries to achieve it by building a heavier internal operating system for autonomous work.

The best long-term architecture may not be any one of these in pure form. It may be a system that supports gradient levels of autonomy, matched to task risk, user expertise, and environmental guarantees. But if one wants to understand the philosophical extremes of the current generation, these three systems already map the territory clearly.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 13 — Architectural Philosophy Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

13.4 General vs Specialized

The final philosophical axis in this chapter is the tension between generality and specialization. General systems aim to work for many users, tasks, or environments without deep tailoring. Specialized systems deliberately optimize for a narrower audience or operating style. In computer science, this is closely related to the tradeoff between broad applicability and local optimality: a general solution covers more cases, while a specialized one may perform much better on a smaller subset.

Claude Code is best understood as pursuing commercial generality. Anthropic is not building only for AI researchers, plugin authors, or expert agent engineers. It is building for a broad market of developers who want an agent that can help with coding tasks in a reasonably consistent way. That requirement pushes the architecture toward standardization. Defaults must be safe enough for ordinary use. UX must be understandable without studying the codebase. Permission systems must work across many teams, not just high-trust power users. Tool semantics must be broad and stable. In short, Claude Code is trying to be general across developer personas.

This form of generality is not the same as technical neutrality. Claude Code is not especially general at the model layer because it is tied to Anthropic’s vertically integrated product logic. But it is general at the market layer. It aims to be the default professional agent that “just works” for many developers, across many workflows, with minimal ceremony. Its architectural choices therefore prioritize consistency, predictable support, and managed extensibility over maximal adaptability.

OpenCode pursues a different kind of generality: technical generality. Its core promise is not that every developer will have the easiest onboarding experience. Its promise is that the system can host many models, providers, deployment styles, and extension strategies. This is generality at the substrate level. OpenCode tries to be a broad platform rather than a narrowly optimized product. It is general not because it hides differences, but because it can accommodate them.

That distinction matters. A platform can be technically general yet socially narrower, because it may appeal most strongly to advanced users who value control. Likewise, a commercial product can be socially general yet technically narrower, because it deliberately constrains certain axes to make the overall experience more reliable. Claude Code and OpenCode therefore embody two different meanings of the word “general.” One is market-facing; the other is systems-facing.

OMO then moves decisively toward specialization. It is not trying to be the default coding agent for everyone. It behaves more like a power tool for senior engineers who are comfortable with orchestration, delegation, custom operational doctrine, and high-autonomy workflows. Its structure assumes a user who values leverage more than immediate simplicity. Specialization appears everywhere: the prompt regime, the task discipline, the subagent roles, the continuation model, the hook system, and the skill ecosystem. OMO is specialized not around one domain like “frontend” or “database work,” but around one working style: serious engineering through structured autonomous execution.

This is an important nuance. Specialization can target a domain, a user segment, or a mode of work. OMO specializes primarily in the third sense. It is optimized for people who want an agent that behaves less like a conversational assistant and more like an extensible autonomous engineering apparatus. That makes it extremely powerful for the right audience and potentially overbuilt for others.

There is also a strategic implication. General systems tend to dominate distribution because they serve more users. Specialized systems often dominate intensity of use among experts because they solve a smaller set of problems much better. Claude Code’s commercial generality gives it product reach. OpenCode’s technical generality gives it ecosystem reach. OMO’s specialization gives it depth among operators who want more than ordinary chat-based coding help.

From an architectural standpoint, specialization often permits stronger assumptions. A specialized system can assume more about user tolerance, workflow discipline, and desired outcomes. This allows sharper abstractions and more opinionated defaults. General systems must remain broader, which often makes them less aggressive. For example, OMO can assume interest in multi-agent delegation and todo rigor; Claude Code cannot assume every mainstream developer wants that degree of process. OpenCode can assume technical curiosity and configurability matter; a mass-market product cannot rely on that assumption.

The ideal future may combine these layers. One can imagine a technically general platform, supporting multiple models and deployment options, with commercially general default modes for mainstream users, plus specialized operating profiles for expert operators. In fact, the three systems studied here almost form that layered stack already: OpenCode as general substrate, Claude Code as general product benchmark, and OMO as specialized high-agency operating model.

The main lesson is that “general” is not automatically better than “specialized,” nor vice versa. The correct choice depends on what must scale: number of users, number of providers, or depth of task execution. Claude Code scales across users. OpenCode scales across technical environments. OMO scales capability for a narrower, more demanding class of operator. Understanding that difference is essential for anyone trying to design the next generation of coding agents.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 14 — Tool System Deep Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

14.1 Tool Count and Coverage

One of the easiest ways to compare coding agents is to count their tools. On the surface, the numbers seem to tell a simple story: OpenCode exposes roughly 20 core tools, OMO extends the surface to around 26, and Claude Code reaches roughly 61. But raw tool count is one of the most misleading metrics in agent design. A larger catalog may reflect better coverage, but it may also reflect fragmentation, redundancy, or product sprawl. In fact, one of the most important design principles to emerge in modern agent engineering is that bloated tool sets are a primary failure mode.

Why are bloated tool sets dangerous? Because every tool is not only a capability; it is also a cognitive branch in the agent’s action space. The more tools a model sees, the more decisions it must make about which tool to select, when to select it, how to parameterize it, and how to interpret the result. This creates a larger search space for planning and a larger error surface for misuse. In other words, tool count does not grow linearly in cost. It increases decision complexity. Decision complexity is not a standard textbook phrase, but it is easy to understand: each extra option raises the number of possible wrong turns.

OpenCode’s smaller tool set reflects a fairly disciplined platform philosophy. Its tools cover the essential actions required for a useful coding agent: filesystem operations, search, patching, shell execution, web access, MCP-related capabilities, and core session interactions. This coverage is broad enough to make the system powerful, but not so broad that the tool menu itself becomes the architecture. OpenCode therefore leans toward compositional coverage: a moderate number of well-defined primitives that can be combined into many workflows.

OMO adds tools, but it does so with a different purpose than simple feature accumulation. Its move from about 20 tools to 26 is not mainly about broadening environmental reach; it is about deepening orchestration power. Background task management, todo tracking, skill loading, session inspection, AST-aware search and rewrite, richer LSP operations, and agent delegation all expand the system in service of a more autonomous workflow. In other words, OMO’s tool growth is vertically targeted. It is not merely “more tools”; it is “more machinery for the operating model OMO wants.”

Claude Code’s much larger set—around 61 tools in the analyzed snapshot—suggests a different strategy. As a commercial product serving diverse workflows, Claude Code offers more directly exposed capabilities: browser-oriented tools, REPL execution, delegation and message passing, permission-aware shell/file operations, and various product-specific utilities. This larger inventory helps reduce the need for external plugins or hand-built extensions in common cases. It increases product completeness. But it also raises an architectural question: at what point does coverage turn into overload?

The answer depends on coverage quality, not just coverage quantity. Coverage quality asks whether the tools form a coherent basis set. A basis set is a borrowed term from mathematics and physics: a small family of elements that can generate many outcomes through combination. In tool design, a good basis set means a compact collection of primitives with high recombination value. OpenCode is strongest on this dimension. Its tools tend to be broad primitives. Claude Code is stronger on out-of-the-box convenience. OMO sits between them, keeping many of OpenCode’s primitives while adding orchestration-specific operators.

Another useful distinction is between horizontal coverage and vertical coverage. Horizontal coverage means spanning many different task types: web, shell, files, search, networked services, delegation, and more. Vertical coverage means going deeper within one category, such as code intelligence or orchestration. Claude Code emphasizes horizontal breadth because it needs to support a wide range of mainstream developer workflows. OMO emphasizes vertical depth in the autonomy and composition layers. OpenCode aims for a middle ground: enough horizontal breadth to be a foundation, without yet becoming a giant catalog.

This is why the slogan “more tools = better agent” is usually false. Too many tools can make a model indecisive, encourage shallow one-off tools instead of reusable abstractions, and increase prompt overhead. Prompt overhead matters because every tool definition consumes context window budget. A system with dozens of narrowly differentiated tools may waste tokens teaching the model distinctions it should not need to care about. Conversely, too few tools can force awkward workarounds and reduce capability. The art lies in finding the smallest surface that still covers real workflows.

Seen through that lens, the three systems represent three philosophies. OpenCode asks: what is the minimum sufficiently powerful toolbox for an open agent platform? OMO asks: what additional tools are justified if the goal is extreme autonomous orchestration? Claude Code asks: what tool breadth is necessary for a commercial product to feel complete and self-contained? None of these questions produce the same numeric answer.

The practical lesson for future agent designers is clear. Count tools, but do not stop there. Ask whether the tool set is legible, whether the primitives compose cleanly, whether the prompt cost remains reasonable, and whether added tools reduce or increase decision burden. Quantity matters, but only as a crude proxy. The better metric is whether the agent can reliably solve real tasks with minimal confusion. By that standard, a disciplined 20-tool system may outperform a noisy 60-tool one—and a carefully extended 26-tool orchestration stack may outperform both on long-horizon autonomous work.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 14 — Tool System Deep Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

14.2 Unique Tools

The deepest differences among tool systems often do not appear in the shared basics—read, write, edit, search, bash, web—but in the unique tools that reveal each project’s strategic intent. A unique tool is important not because no one else could build it, but because its presence shows what the system treats as first-class behavior rather than an afterthought. In that sense, unique tools are architectural fingerprints.

For OpenCode, one of the most revealing tools is apply_patch. At first glance, patch application may seem like a mundane editing convenience. In practice, it is a powerful design choice. Instead of forcing the agent to rewrite whole files or issue many line-by-line edit operations, apply_patch lets the system express file changes as structured diffs. This matters because diffs align well with how software engineers reason about change: additions, deletions, updates, and moves. They are more compact than full-file rewrites, easier to review mentally, and often safer to execute. OpenCode’s inclusion of apply_patch shows a bias toward code-native manipulation rather than purely text-native manipulation.

OMO’s most distinctive tools point in a different direction. Its AST-grep search and replace tools are especially notable. AST stands for Abstract Syntax Tree, the tree representation that parsers build to describe program structure. An AST-aware tool does not merely search raw text; it searches code as syntax. That means it can match patterns like function calls, imports, or declarations with much greater structural precision than ordinary string search. This is significant because many agent failures come from brittle text matching. By giving the agent AST-level operations, OMO upgrades code manipulation from approximate text handling toward semantic structure handling.

The broader implication is that OMO sees the coding agent not just as a chat interface with files, but as a programmable engineering worker that should have access to stronger code intelligence primitives. Its unique tools therefore tend to enhance the depth of action: session management, background task output retrieval, agent spawning, todo persistence, AST rewriting, and more granular LSP access. These are not tools for casual convenience; they are tools for disciplined, long-horizon execution.

Claude Code’s unique tools tell yet another story. The WebBrowserTool indicates that Anthropic wants browser-mediated interaction to be part of the native execution model, not a plugin afterthought. The REPLTool reveals a desire for interactive code execution inside controlled language-specific or runtime-specific loops. A REPL, or Read-Eval-Print Loop, is the classic interactive programming environment in which code is entered, executed, and inspected incrementally. That is a very different capability from plain shell execution. It supports experimentation, fast feedback, and language-aware interaction. Then there is SendMessageTool, which points directly toward inter-agent or cross-context coordination. It suggests that message passing itself is being productized as an explicit operation.

These Claude Code tools matter because they reflect a product that wants to absorb more workflows natively. Instead of expecting users to glue together browser automation, interactive execution, and delegation through external surfaces, the product chooses to recognize them as core modalities. This is consistent with Claude Code’s commercial philosophy: broad out-of-the-box coverage, integrated safety and UX, and richer first-party behaviors.

Another way to interpret unique tools is to ask what hidden assumptions they encode. apply_patch assumes that structured file diffs are a privileged representation of engineering work. AST-grep assumes that syntax-aware operations are worth exposing directly to the model. WebBrowserTool assumes that browsing is integral to software work. REPLTool assumes that interactive execution loops deserve their own abstraction rather than being collapsed into shell commands. SendMessageTool assumes that agent-to-agent or context-to-context communication is central enough to deserve a first-class channel.

These are not small assumptions. Every unique tool changes the mental model available to the agent. A system limited to bash and text editing nudges the model toward generic procedural behavior. A system with AST operations nudges it toward structural code reasoning. A system with explicit message passing nudges it toward orchestration. Unique tools therefore do more than expand functionality; they shape the style of cognition the system can express.

There is also a cautionary lesson here. Unique tools are valuable only if they earn their conceptual weight. A tool that is highly specific but rarely needed can bloat the action space. A tool that exposes a genuinely recurrent and high-value operation can transform agent quality. OpenCode’s apply_patch earns its place because patch-based edits are frequent and fundamental. OMO’s AST-grep tools earn their place because structural search and rewrite are powerful upgrades for code-focused work. Claude Code’s browser, REPL, and message-passing tools earn their place because they correspond to recurrent workflows in practical software engineering.

For future agent systems, the best unique tools will likely come from identifying actions that are simultaneously high-frequency, high-value, and awkward to express through generic primitives. That is exactly where architectural differentiation becomes meaningful. Shared tools define the baseline. Unique tools reveal the soul of the system.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 14 — Tool System Deep Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

14.3 LSP Integration Depth

If filesystem tools tell us how an agent touches code, LSP integration tells us how deeply it understands code structure. LSP stands for Language Server Protocol, a standardized protocol originally designed to let editors communicate with language-aware backends. Through LSP, an editor can ask for definitions, references, diagnostics, hover information, rename support, symbols, and more. When coding agents integrate LSP deeply, they move from text manipulation toward semantically informed code navigation.

OpenCode includes relatively basic LSP support. That is already meaningful. Even lightweight language-aware operations are a major improvement over pure grep-and-edit behavior, especially in medium or large codebases. Basic LSP support helps the agent avoid some of the classic weaknesses of text-only workflows: missing symbol relationships, misunderstanding imports, or editing only one of many connected references. In OpenCode, LSP is important, but it does not dominate the architecture. It is one useful capability among several in a general-purpose open platform.

OMO pushes much further. In the tool snapshot under discussion, it exposes around 8 LSP-oriented tools, including operations such as references, rename, diagnostics, hover, definition, prepare-rename, symbol search, and related file/workspace analysis. This matters because OMO does not treat code intelligence as one monolithic black box. Instead, it decomposes LSP capability into several precise operations. That decomposition gives the agent finer control. It can ask exactly for definitions when tracing code flow, references when estimating impact, diagnostics before running heavier builds, and prepare-rename before attempting a global rename.

This granularity aligns with OMO’s broader philosophy of disciplined autonomous execution. The more precisely a system can query code intelligence, the less it must rely on brittle inference. Consider rename as an example. A text-only rename is notoriously dangerous because symbols may share names across unrelated scopes. An LSP-assisted rename, by contrast, can use the language server’s understanding of bindings and references. Similarly, diagnostics provide structured compiler- or analyzer-level feedback without always requiring a full build. In long-horizon autonomous work, these capabilities are force multipliers.

Claude Code appears to take a different route, exposing a more unified LSPTool abstraction rather than many narrow, separately named LSP tools. This is an elegant product decision. A unified tool can reduce prompt surface area and simplify tool selection for the model. Instead of making the model choose among many code-intelligence operations explicitly, the system can encapsulate that complexity behind a single interface. This fits Claude Code’s general preference for externally simpler product surfaces. Internally, of course, the unified tool may still route to many sub-operations.

The tradeoff here is subtle. OMO’s multi-tool LSP design offers explicitness, precision, and inspectability. Claude Code’s unified LSP tool offers abstraction, compactness, and potentially easier tool selection. OpenCode remains lighter-weight, preserving useful code intelligence without building a whole semantic operating layer around it. The right choice depends on how central language-aware reasoning is to the system’s identity.

This brings us to the notion of integration depth. Integration depth is not just “does the system have LSP?” It is a composite property involving at least four dimensions. First, breadth of operations: how many distinct semantic queries are supported? Second, workflow centrality: are those operations peripheral helpers or normal steps in the agent’s planning loop? Third, granularity of access: does the agent get one generic gateway or multiple sharp instruments? Fourth, fallback discipline: does the system know when to prefer LSP over text search and when to degrade gracefully if LSP is unavailable?

By that measure, OMO currently shows the deepest and most explicit LSP integration of the three. OpenCode includes LSP as an important supporting capability. Claude Code likely integrates semantics deeply as well, but packages that depth behind a more unified product abstraction. The difference is not one of seriousness, but of architectural style: exposed semantic primitives versus consolidated semantic service.

There is a larger lesson here for agent design. Pure text tooling is sufficient for small or disposable tasks, but it becomes fragile as codebases grow. Semantic tooling—LSP, ASTs, static analyzers—reduces ambiguity and makes agents more reliable. However, exposing too much semantic detail as separate tools can itself create choice overload. The challenge is therefore to provide enough semantic power without making the tool layer unreadable.

In the long run, the strongest coding agents will almost certainly combine multiple semantic layers: LSP for symbol intelligence, AST tooling for structure-preserving transforms, diagnostics for early feedback, and build/test systems for final verification. OpenCode, OMO, and Claude Code each point toward that future, but they differ in how explicitly they surface the pieces. That difference is one of the clearest windows into their broader design philosophies.

Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 14 — Tool System Deep Comparison Model: openai/gpt-5.4 Generated: 2026-04-01 Token Usage: unavailable in current environment

14.4 Tool Composition Patterns

The most mature way to compare tool systems is not by looking at individual tools in isolation, but by asking how tools are composed into higher-level workflows. Tool composition is the discipline of combining small operations into repeatable patterns that solve larger tasks. In coding agents, composition patterns often matter more than any single tool, because real software work is almost never one-step execution. It is investigate, plan, inspect, modify, verify, delegate, summarize, and continue.

The clearest example is the emergence of the Task tool as an orchestration primitive. A primitive is a low-level building block from which more complex behavior can be constructed. Traditional coding agents treated tools mainly as environment access methods: read file, run shell, search code. Newer systems increasingly treat some tools as workflow operators. A Task tool is not merely a way to interact with the environment; it is a way to create another unit of work. That is a major conceptual shift. It turns the tool layer into a control layer.

OMO makes this shift especially explicit. Its background agent spawning, task output retrieval, session continuation, and todo discipline all combine into a composition pattern where the agent can offload subproblems, continue working elsewhere, and rejoin results later. This is orchestration through tools, not just prompting. The tool system is therefore partly functioning as an agent operating system: one tool creates work, another tracks it, another retrieves output, another manages context continuity. This is why OMO’s tools often look unusual when compared with ordinary coding assistants. They are less about local actions and more about process structure.

Claude Code also points in this direction through its delegation and message-oriented capabilities. When a system includes tools such as SendMessageTool or task-oriented coordination mechanisms, it acknowledges that large tasks may need explicit decomposition and mediated communication. The difference is stylistic. Claude Code tends to package composition in a more productized and unified way, whereas OMO exposes more of the orchestration surface directly to the model and operator.

OpenCode, meanwhile, provides a more classic compositional substrate. Its tools are broader primitives that can be chained into workflows, but the orchestration doctrine is lighter. This is consistent with OpenCode’s role as a foundation rather than a fully opinionated autonomous stack. It enables composition without fully prescribing it. Developers can build richer orchestration layers on top—which is exactly what OMO does.

Another important composition pattern is the rise of the Skill tool. A skill is more than a static prompt snippet. In mature systems, skills become packaged operational knowledge: instructions, constraints, step-by-step playbooks, sometimes bundled with embedded MCP servers or domain-specific utilities. This evolution is significant. It means tool composition is no longer just tool plus tool plus tool; it can become tool plus procedural memory. Procedural memory is the know-how of how to do something, as distinct from factual memory about what something is.

OMO’s skill system demonstrates this strongly. Skills can begin as reusable prompt templates, but they grow into richer operational modules that teach the agent how to approach a class of tasks. With embedded MCPs, a skill may even bring its own external capabilities. That blurs the line between prompt, policy, and tool. A skill is becoming a packaged micro-runtime: part instructions, part workflow, part capability extension. This is one of the most important innovations in agent extensibility because it allows expertise to be distributed without hardcoding every behavior into the core system.

Composition patterns also reveal another principle: the best tool systems minimize direct dependence on any one tool. Instead, they create small, reusable paths. For example, a robust coding workflow may repeatedly use a pattern like search → read → analyze → patch → diagnostics → test. A more advanced orchestration workflow may use task-spawn → continue-local-work → retrieve-output → integrate-results → verify. These patterns become more stable than any particular tool implementation. Good architecture therefore encourages reusable compositions rather than one-off tool explosions.

This is why the earlier warning about tool bloat matters so much. A bloated tool set makes composition harder by fragmenting the action space. A disciplined tool set makes composition easier because primitives can be combined predictably. The highest form of tool design is not building ever more isolated tools. It is designing tools whose interactions create reliable higher-order behavior.

The long-term implication is that coding agents may increasingly resemble layered operating environments. At the bottom sit raw capability tools: filesystem, shell, search, web, code intelligence. Above them sit semantic tools: AST, LSP, diagnostics. Above those sit orchestration tools: task creation, session management, message passing, continuation. Alongside them sit skills: reusable procedural modules that teach strategy. OpenCode supplies a strong foundational layer, OMO builds a rich orchestration and skill layer, and Claude Code integrates many of these patterns into a commercial, product-centered experience.

So when comparing tool systems, the decisive question is no longer “What tools do you have?” It is “What kinds of workflows can your tools compose into?” That question cuts much closer to real agent capability. And on that axis, the rise of Task and Skill as orchestration primitives may be one of the most important shifts in modern coding-agent architecture.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 15 — Agent Orchestration Comparison
Token Usage: ~4,900 input + ~1,180 output

15.1 Orchestration Pattern Taxonomy

When people say “multi-agent system,” they often blur together several very different coordination designs. That is a mistake. The important question is not whether more than one agent exists, but how responsibility, authority, and context move among them. Across OpenCode, Oh-My-OpenCode (OMO), and Claude Code, five orchestration patterns are especially useful: Orchestrator-Worker, Pipeline, Swarm, Mesh, and Hierarchical.

The Orchestrator-Worker pattern is the most familiar. One primary agent decomposes a task, delegates subproblems to workers, then integrates results. OMO uses this pattern heavily. Sisyphus-style executor prompts, delegate-task tooling, background subagents, and parent-session notification all push toward a world where one session remains accountable while workers explore, verify, or research. Claude Code also supports this pattern through Task tools, DreamTask-style background execution, and specialized agent invocations. OpenCode by itself uses it only lightly. It has agents and subtask-oriented commands, but it does not impose a rich supervisory runtime around them.

The Pipeline pattern means work flows through ordered stages, where each stage transforms the artifact before passing it onward. In classic systems terms, a pipeline is not about many minds talking at once; it is about deterministic sequencing. OpenCode naturally fits pipeline thinking because its core architecture is simple and compositional: prompt assembly, tool call, observation, next step. OMO adds more explicit runtime phases through hooks: session preprocessing, message transformation, tool guards, continuation logic, and post-task notification. Claude Code also exhibits pipeline behavior in compaction, permission classification, hooks, slash commands, and context rebuilding. All three systems therefore use pipelines, but they use them at different layers. OpenCode pipelines execution primitives; OMO pipelines policy; Claude Code pipelines safety and product UX.

The Swarm pattern describes multiple agents working in parallel with relatively weak central control. A swarm is useful when tasks can be explored independently: code search, design alternatives, broad research, parallel hypothesis testing. OMO comes closest to a practical swarm among the three because it explicitly supports concurrent background agents, model-based concurrency limits, and role-specific prompts for wide exploration. Claude Code can approximate swarm behavior through parallel tasks and background work, but the experience is more product-curated than architecture-exposed. OpenCode alone does not natively feel swarm-oriented; you can build it, but the host does not strongly encourage it.

The Mesh pattern is rarer. In a mesh, agents exchange information laterally rather than always routing through a single boss. Peer nodes can consult, critique, or update each other. None of the three systems are pure mesh systems. That is not accidental. Mesh designs are flexible, but they are expensive, hard to debug, and prone to context drift. Still, there are partial mesh tendencies. OMO’s wisdom accumulation and reusable session knowledge give later agents access to earlier discoveries, creating indirect peer-to-peer influence through shared artifacts. Claude Code’s task ecosystem can create weak mesh effects if multiple task threads summarize into common context. OpenCode, being thinner, leaves mesh implementation mostly to external builders.

The Hierarchical pattern combines delegation with multiple authority levels. Think manager, specialist lead, and worker. This matters when tasks differ in abstraction level: planning, implementation, validation, and external research should not always sit in one context. OMO is the clearest hierarchical system here. It has layered agents, semantic categories, enforcement hooks, continuation logic, and role constraints. Some agents are read-only, some are exploratory, some are knowledge-oriented, and some are execution-heavy. Claude Code has a softer hierarchy: the main coding agent can spawn task agents, coordinate specialized work, then re-enter a controlling role. OpenCode mostly provides the substrate for hierarchy rather than a strongly opinionated hierarchy itself.

A useful comparison is this:

Pattern	OpenCode	OMO	Claude Code
Orchestrator-Worker	Basic support	Strong, explicit	Strong, productized
Pipeline	Core execution style	Core policy style	Core safety/UX style
Swarm	Weak native support	Strongest of three	Moderate
Mesh	Builder-implemented	Partial via wisdom sharing	Partial via task summaries
Hierarchical	Possible, not emphasized	First-class	Present, less formal

The biggest architectural distinction is not “single-agent versus multi-agent.” It is whether the system provides an explicit control plane for orchestration. OpenCode is intentionally minimal and composable. It gives developers a good base loop, tool system, plugin surface, and agent concept, but it does not overdetermine the orchestration style. OMO is the opposite: it turns orchestration into a visible runtime discipline. Claude Code sits between them. It exposes enough multi-agent behavior to improve outcomes, but it packages the complexity in a more curated product form.

This is why the five-pattern taxonomy matters. Without it, people make shallow claims such as “System A supports subagents” or “System B is multi-agent.” Those statements hide the real design question: which coordination topology is optimized, and at what cost?

If the goal is simplicity, reproducibility, and hackability, OpenCode’s light orchestration bias is attractive. If the goal is persistent supervision, background delegation, and policy-rich coordination, OMO is the most ambitious design. If the goal is commercially polished multi-agent assistance with strong safety framing, Claude Code is the most balanced implementation.

The broader lesson for agent designers is straightforward: no single orchestration pattern wins universally. Orchestrator-Worker is best for accountable decomposition. Pipelines are best for reliable stage control. Swarms are best for breadth. Meshes are best when peer critique truly matters. Hierarchies are best when tasks span multiple abstraction levels. Great agent systems do not merely “support multi-agent.” They choose, combine, and constrain these patterns deliberately.

Claude Code’s Orchestration in Practice

Model: openai/gpt-5.4
Token Usage: unavailable in current environment

The taxonomy above becomes sharper once Claude Code is described in its own terms rather than as a vague “commercial multi-agent system.” In practice, Claude Code is best classified as a Hub-and-Spoke orchestration design with an optional Coordinator overlay.

The hub-and-spoke term is worth defining because it is more common in systems design than in undergraduate CS textbooks. A hub-and-spoke topology means one central node coordinates several peripheral nodes. The center receives requests, allocates work, and integrates outputs; the spokes do specialized work without each becoming a new center of authority. Applied to Claude Code, the hub is the main coding agent or coordinator, and the spokes are the spawned sub-agents and background tasks.

This framing explains several Claude Code features at once. AgentTool is the spoke-creation mechanism. It gives the hub a way to create a worker with its own role and context boundary. TaskCreateTool and the wider Task tools add asynchronous lifecycle control to the hub, so the center can dispatch work without blocking. TaskGetTool, TaskListTool, and TaskOutputTool make the hub observable: the coordinator can ask what is running, what finished, and what results should be reintegrated. DreamTask extends this pattern for longer-duration work, making the topology less like one conversation and more like a lightweight job system.

What makes Claude Code especially interesting is that the spokes are not just extra model calls. They are separated by context isolation. This means the hub does not simply clone its full state into every worker. Instead, it selectively delegates a bounded problem. In orchestration terms, Claude Code combines hub-and-spoke control with a context firewall. The system is therefore not only topologically centralized; it is informationally compartmentalized.

That gives Claude Code a different feel from OMO. OMO is more naturally described as a three-layer hierarchy: Planning → Execution → Workers. In such a hierarchy, there are not only multiple agents but also multiple authority layers and abstraction levels. A planning layer thinks about the mission, an execution layer manages operational flow, and worker layers do narrower tasks such as search, editing, or analysis. Claude Code can approximate some of that behavior, especially in Coordinator mode, but its default structure is flatter. The center is strong, the workers are bounded, and most reintegration happens back into one supervising node rather than through multiple formal tiers.

This flatter structure has practical consequences. It makes the orchestration easier to reason about. It reduces the number of protocol transitions. It is likely easier to secure and productize because there are fewer independent control loops. But it also means Claude Code relies more heavily on the quality of the central coordinator. In a deeper hierarchy, some burden can be delegated to middle layers. In a flatter hub-and-spoke design, the hub remains cognitively important.

Claude Code also differs from OpenCode in a concrete way. OpenCode has agent and task-like foundations, but its orchestration surface is comparatively thin. It gives developers primitives. Claude Code gives users a more opinionated operational mode. That is why the same abstract pattern can feel different across systems: in OpenCode, hub-and-spoke is something a builder assembles; in Claude Code, it is closer to a productized behavior.

The introduction of SendMessageTool is another useful clue. Strict hub-and-spoke systems often route everything through the center. Claude Code mostly works that way, but SendMessageTool allows limited lateral coordination among running agents. This does not transform the system into a mesh, but it does create a hybrid edge case: a hub-and-spoke architecture with constrained side-channel communication. In practice, that means Claude Code is centralized by default, but not totally incapable of peer-level signaling.

graph TB
    subgraph "OpenCode: Orchestrator-Worker"
        OC_M["Build Agent"] --> OC_W1["Plan"]
        OC_M --> OC_W2["Explore"]
        OC_M --> OC_W3["General"]
    end
    
    subgraph "Claude Code: Hub-and-Spoke"
        CC_M["Main Agent"] --> CC_S1["Sub-Agent"]
        CC_M --> CC_S2["Sub-Agent"]
        CC_M --> CC_BG["Background Task"]
        CC_M --> CC_CO["Coordinator"]
        CC_CO --> CC_W1["Worker"]
        CC_CO --> CC_W2["Worker"]
    end
    
    subgraph "OMO: 3-Layer Hierarchy"
        OMO_P["Prometheus"] --> OMO_A["Atlas"]
        OMO_A --> OMO_W1["Oracle"]
        OMO_A --> OMO_W2["Hephaestus"]
        OMO_A --> OMO_W3["Librarian"]
        OMO_A --> OMO_W4["Explore"]
        OMO_A --> OMO_W5["Junior"]
    end
    
    style OC_M fill:#4a9eff,color:#fff
    style CC_M fill:#ff6b6b,color:#fff
    style OMO_P fill:#51cf66,color:#fff
    style OMO_A fill:#ffd43b,color:#000

The following table extends the earlier taxonomy with a more concrete orchestration feature comparison:

Feature	OpenCode	OMO	Claude Code
Sub-agent spawning	task tool (basic)	delegate-task with category routing	AgentTool with context isolation
Background tasks	limited	5 concurrent per model/provider	TaskCreate/Get/List/Output + DreamTask
Inter-agent messaging	none	parent session notification	SendMessageTool
Custom agent definitions	4 built-in	11 built-in + dynamic	`.claude/agents/` markdown + built-in
Orchestration pattern	Orchestrator-Worker	Hierarchical (3-layer)	Hub-and-Spoke + Coordinator
Context strategy	shared session	wisdom accumulation	context isolation (firewall)

This table highlights an important point: Claude Code is not “less advanced” than OMO simply because it is flatter. It is advanced in a different direction. OMO maximizes orchestration richness. Claude Code maximizes orchestration discipline inside a commercial product boundary. OpenCode, by contrast, maximizes substrate simplicity.

For taxonomy purposes, Claude Code therefore occupies a distinctive middle position:

more structured than OpenCode’s builder-oriented substrate,
less deeply layered than OMO’s explicit hierarchy,
stronger on context isolation than either,
stronger on productized async task handling than casual sub-agent systems.

From a design perspective, this makes Claude Code a good case study in practical orchestration minimalism. That phrase is another derived term rather than a textbook one, and here it means: provide enough orchestration machinery to gain real parallelism, specialization, and task continuity, but avoid turning the runtime into a sprawling, difficult-to-govern society of agents. Claude Code does not chase maximum orchestration complexity. It chases a usable, bounded subset of it.

The result is a system whose default orchestration logic can be summarized simply:

keep one accountable center,
delegate bounded work outward,
let some work continue asynchronously,
allow limited agent-to-agent signaling,
bring back compressed results,
preserve isolation wherever possible.

That operating model is likely one reason Claude Code feels comparatively stable in multi-agent usage. It does not ask users to manage a whole agent civilization. It offers a curated topology: one hub, many bounded spokes, optional coordinator behavior, and controlled information flow. In the broader taxonomy of coding-agent orchestration, that is Claude Code’s clearest identity.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 15 — Agent Orchestration Comparison
Token Usage: ~5,100 input + ~1,240 output

15.2 Single-Agent vs Multi-Agent

The central debate in coding agents is not whether one model instance can solve a task. It often can. The question is whether splitting cognition across multiple task contexts produces enough quality gain to justify extra latency, token cost, and orchestration overhead.

Anthropic’s reported result is the best-known data point: a multi-agent Opus+Sonnet system achieved 90.2%, outperforming a single Opus agent on the benchmark in question. The exact benchmark setup matters, but the broader lesson is clear. A stronger topology can outperform a stronger single thread of reasoning. In other words, scaffolding is not cosmetic; it changes problem-solving power.

Why does multi-agent help? Mainly because a single agent is forced to do four jobs inside one context window: understand the task, explore the space, execute changes, and evaluate its own work. That creates interference. Search contaminates action, action contaminates reflection, and local details crowd out higher-level planning. Multi-agent designs reduce that interference by separating concerns. One context can stay focused on planning while another searches code, another gathers external evidence, and another verifies outputs.

OpenCode leans toward the single-agent baseline. Its runtime is elegant precisely because it does not assume complicated delegation. A primary session can use tools, maintain context, and solve many tasks end to end. For many repositories, especially small and medium ones, this is enough. Lower token burn, less orchestration complexity, and easier debugging are real advantages. Single-agent systems also fail more legibly: there is only one chain of responsibility to inspect.

OMO argues that single-agent simplicity eventually hits a wall. Once a task involves broad repository search, external documentation, conflicting constraints, or long-running execution, it becomes useful to split the work. OMO’s orchestration system therefore treats delegation not as an exotic option but as a normal operating mode. Explore agents search. Oracle agents stay read-only. Librarian-style helpers gather external information. Parent sessions coordinate. This makes OMO far more willing than plain OpenCode to spend tokens in exchange for bounded specialization.

Claude Code takes a pragmatic middle path. It is not as orchestration-maximalist as OMO, but it clearly accepts the value of subagents and background tasks. DreamTask-style execution changes the subjective feel of the product: the user no longer has to sit inside a single foreground thread. Parallel subwork becomes part of the interaction model, not merely an implementation detail.

Still, the cost question is unavoidable. Multi-agent systems can easily cost 10x to 15x more tokens than a single pass once you include delegation prompts, repeated repository summaries, worker outputs, evaluator passes, and result integration. So when is that worth it?

It is worth it when the task has at least one of five properties.

First, search breadth. If the correct answer may be hidden across dozens of files, a single context often wastes tokens serially exploring dead ends. Parallel workers can search much faster.

Second, heterogeneous work modes. External research, code editing, architecture synthesis, and safety review are cognitively different activities. Isolating them improves performance.

Third, high error cost. If a bad answer is expensive—a security patch, migration plan, or release-blocking refactor—paying more for redundancy and checking is rational.

Fourth, long horizon tasks. The longer the task, the more a single context accumulates irrelevant residue. Delegated contexts keep local work local.

Fifth, parallelizable subtasks. Multi-agent only pays when there is true independence. If all workers need the same constantly changing state, the overhead overwhelms the benefit.

Conversely, multi-agent is usually not worth it for quick edits, isolated bug fixes, narrow file-local refactors, or tasks where the main difficulty is not search but precise execution. In those cases, orchestration becomes theater. You spend tokens to feel sophisticated while making little progress.

There is also a governance issue. Single-agent systems are easy to reason about. Multi-agent systems need explicit control over permissions, stopping conditions, result merging, and duplicate work. OMO addresses this through role constraints, background limits, and hooks. Claude Code addresses it through a product-level task model. OpenCode leaves more responsibility to the builder.

The deeper point is that “single versus multi” is the wrong binary. The real axis is adaptive delegation. A good agent platform should default to single-agent for cheap, well-bounded work, then escalate into multi-agent only when the structure of the problem justifies it. That means the system needs some way to estimate task breadth, risk, and decomposability.

From a design perspective, the best architecture is not one that always uses many agents. It is one that knows when not to. Anthropic’s 90.2% result shows that multi-agent can be materially better. It does not show that every problem deserves a swarm. OMO’s lesson is to operationalize delegation. OpenCode’s lesson is to respect simplicity. Claude Code’s lesson is to make advanced coordination feel natural rather than burdensome.

The synthesis is clear: start with one agent, escalate deliberately, and treat token cost as an investment that must earn its return.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 15 — Agent Orchestration Comparison
Token Usage: ~4,700 input + ~1,130 output

15.3 Agent Specialization

Specialization is the difference between “many copies of one assistant” and “a coordinated team.” A multi-agent system only becomes structurally interesting when different agents are allowed, encouraged, or forced to behave differently.

OpenCode has the raw substrate for specialization. It supports agents, model selection, commands, and tools, so builders can define different prompts and responsibilities. But its philosophy remains general-purpose. The host does not strongly encode occupational identities. In practice, specialization in plain OpenCode is mostly a builder convention.

OMO, by contrast, treats specialization as a first-class control mechanism. Three examples illustrate the point well: Oracle, Explore, and Librarian.

Oracle is read-only. That sounds like a minor implementation detail, but it has profound architectural consequences. A read-only specialist cannot impulsively “fix” things while investigating. It is forced to inspect, reason, and report. This is useful for auditing, debugging, code review, dependency tracing, and repository explanation. Many agent failures come from overreach: the system edits before it fully understands. Oracle exists to block that failure mode.

Explore is optimized for fast internal discovery. Its job is not to produce polished prose or to take full ownership of execution. Its job is to search, map, and surface signals quickly. In a large repository, that role is valuable because search itself is a cognitive workload. By isolating it, OMO reduces the chance that the main executor burns half its context budget just trying to locate the relevant files.

Librarian handles external knowledge. This creates a boundary between repository truth and outside truth. That boundary matters. Documentation lookup, web references, or API behavior research should not be confused with local code inspection. A dedicated external-information role keeps provenance clearer.

The key design choice is not merely prompt wording. It is the coupling of prompts with permission constraints. OMO does not just say “behave like an auditor.” It also restricts what an auditor can do. This is important because LLMs are opportunistic reasoners. If the tools allow editing, many prompts will eventually drift toward editing. Behavioral constraints are much stronger when they are backed by capability constraints.

This permission-backed specialization prevents several common pathologies.

One is scope creep. An exploration agent starts writing code “just to help,” and suddenly the system has unreviewed edits from a context that was never meant to own the outcome.

Another is context contamination. If the same agent both gathers noisy possibilities and commits final actions, speculative branches can bleed into authoritative decisions.

A third is accountability blur. When a worker is clearly read-only, or clearly external-only, its output is easier to interpret. Humans and parent agents know how much trust to place in it.

Claude Code also supports specialization, though in a more product-packaged style. Its task-oriented structure, custom agents, and subagent execution allow different roles to emerge. But Claude Code usually emphasizes smooth user experience over explicit role taxonomy. OMO is more opinionated. It names the jobs, narrows them, and wires them into orchestration logic.

There is a broader systems lesson here. In human organizations, specialization appears whenever coordination cost is lower than cognitive switching cost. The same is true for agents. A specialist is valuable when staying in one mode produces better results than repeatedly reconfiguring one generalist context.

However, specialization has costs. Too many roles create handoff overhead. Too many constraints can make the system brittle. Too many identity distinctions can force the orchestrator to spend more effort routing work than solving the problem. OpenCode avoids this by staying sparse. Claude Code smooths it through product curation. OMO embraces the complexity because it wants more control.

The ideal design principle is therefore not “specialize everything.” It is specialize where failure modes differ. Read-only roles help when premature action is dangerous. Search roles help when discovery is expensive. External-knowledge roles help when provenance matters. Verification roles help when trust is costly.

Seen this way, OMO’s role division is not cosmetic branding. Oracle, Explore, and Librarian encode a theory of agent failure. The theory says that coding assistants go wrong when they mix investigation, execution, and evidence sourcing in one unbounded context. Permission constraints then turn that theory into enforceable architecture.

This is one of OMO’s strongest contributions to the field. It shows that specialization should not live only in prompt text. It should live in capabilities, workflow position, and trust semantics.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 15 — Agent Orchestration Comparison
Token Usage: ~4,850 input + ~1,160 output

15.4 Background Tasks and Parallel Execution

Parallelism changes an agent system in two ways at once. It changes benchmark behavior, and it changes user experience. Many discussions focus only on the first. That is incomplete. In tools such as OMO and Claude Code, the ability to run work in the background changes how developers think, wait, and coordinate with the machine.

OMO’s implementation is unusually explicit. It supports up to five concurrent background agents per model/provider combination. That limit matters because unconstrained concurrency is not a feature; it is a denial-of-wallet attack waiting to happen. By bounding parallelism, OMO turns “many agents” into a managed resource rather than a chaotic burst of calls.

The concurrency model also reflects a practical insight: parallel work must be observable and resumable. OMO therefore couples background spawning with session tracking, notifications, and follow-up retrieval. Workers are not fire-and-forget threads. They are addressable sessions with outputs that can be checked, summarized, or continued. This gives parallelism operational dignity. It is not just async plumbing; it is a user-facing coordination model.

Claude Code’s DreamTask-style behavior points in a similar direction, even if the product framing is different. The important shift is that work no longer has to remain attached to the foreground conversational turn. A developer can ask for a larger task, let the system continue, and return later. This makes the agent feel less like a chat toy and more like a junior collaborator that can be assigned bounded work.

OpenCode, in contrast, is more synchronous in spirit. It can be extended into richer concurrency, and OMO proves this by building on top of it, but the base product does not foreground background orchestration in the same way. That simplicity has advantages: fewer moving pieces, fewer lifecycle surprises, easier mental models. But it also means the raw developer experience is less “delegative.”

Why does parallelism matter so much in practice? Because coding tasks are rarely purely sequential. Search can happen in parallel with documentation lookup. Test triage can happen while a planner prepares a patch strategy. Multiple solution candidates can be explored at once. Human developers naturally do this with browser tabs, terminals, and teammates. A serious coding agent should eventually learn the same trick.

Yet parallelism brings hard systems problems.

The first is duplication. Two agents can search the same area, repeat the same reasoning, or propose incompatible edits. OMO explicitly warns against this in its operating discipline. Delegation must create non-overlapping work, or the token budget collapses.

The second is merge cost. Work done in parallel is not free unless the parent can integrate results cheaply. If summaries are vague, or workers depend on stale state, the supposed speedup disappears.

The third is context skew. One worker may see an older repository state than another. Background tasks therefore need careful framing: read-only exploration is easier to parallelize than concurrent editing.

The fourth is UX clarity. If users do not know what is running, what finished, and what remains blocked, background execution feels magical in the bad sense of the word. Progress visibility is essential.

OMO’s design is strong precisely because it treats background work as part of orchestration policy. Limits, notifications, session IDs, continuation, and role constraints all work together. Claude Code’s design is strong because it productizes the same underlying idea: developers should be able to hand off work and keep moving. OpenCode’s contribution is subtler: by staying modular, it provides the foundation upon which a richer async layer can be built.

There is also a psychological effect. In a purely synchronous agent, the user sits inside the model’s thought loop. In a background-capable agent, the user becomes a manager of workstreams. That is a meaningful transition in developer experience. It changes the interaction from “please answer now” to “go do this and report back.”

This is why background tasks are not a minor convenience feature. They are part of the transition from chat assistant to execution system. OMO demonstrates the architectural version of that transition. Claude Code demonstrates the polished-product version. OpenCode demonstrates that a good base substrate can remain simple while still enabling more advanced orchestration layers.

The final design lesson is simple: parallelism is worth adding only if the system can bound cost, prevent duplicate labor, preserve observability, and reintegrate results cleanly. Without those controls, concurrency becomes expensive noise. With them, it becomes one of the clearest markers that an agent has matured beyond a single conversational loop.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 15 — Agent Orchestration Comparison
Token Usage: ~4,950 input + ~1,210 output

15.5 Wisdom Transfer vs Context Isolation

One of the deepest differences in agent design is not about tools or models. It is about what survives between task contexts. Should knowledge discovered by one agent flow into future work, or should each new context start mostly clean?

OMO leans toward wisdom accumulation. Claude Code leans more strongly toward context isolation. OpenCode, as usual, provides the substrate and lets builders choose.

Wisdom accumulation means that the system captures useful discoveries, heuristics, repository-specific lessons, or operational reminders and makes them available later. In OMO, this idea appears in explicit wisdom systems, continuation support, remembered work patterns, and reusable artifacts generated across sessions. The architectural intuition is attractive: if the system already learned where the build traps are, which commands are expensive, or which files are authoritative, why pay to rediscover that every time?

This creates compounding returns. The more the system works in a repository, the more efficient it can become. Recurring pain points are surfaced earlier. Known anti-patterns can be avoided sooner. Frequently used recovery moves can be encoded into future behavior. In economic terms, OMO tries to turn prior token spend into future savings.

Claude Code is more cautious. Its design philosophy gives substantial weight to context hygiene. Isolated task contexts reduce contamination from stale assumptions, speculative notes, and outdated local conventions. A fresh task has a cleaner epistemic starting point. The word epistemic simply means “related to knowledge and how we know things.” In agent systems, epistemic cleanliness matters because bad retained context can be worse than missing context. False certainty is dangerous.

Context isolation has several benefits.

First, it reduces prompt barnacles—small leftovers that accumulate over time and slowly distort future behavior.

Second, it improves debuggability. If a task goes wrong, there are fewer hidden inherited assumptions.

Third, it limits context pollution, where facts that were once useful become misleading after the repository changes.

Fourth, it preserves stronger boundaries between tasks, which is often desirable in commercial or enterprise settings.

But isolation also imposes a tax. The system must repeatedly relearn local reality. The same file map may be rediscovered. The same rules may be re-read. The same architectural caveats may be reconstructed across sessions. This is safe, but expensive.

OMO accepts more contamination risk because it believes the upside of accumulated operational knowledge is substantial. Claude Code accepts more rediscovery cost because it believes stale context is a serious failure mode. Neither side is obviously wrong. They are optimizing different failure budgets.

This tradeoff resembles a classic systems tension between cache and fresh read. A cache speeds repeated access but can become stale. A fresh read is slower but safer. Wisdom accumulation is a cognitive cache. Context isolation is a bias toward fresh reads. The right answer depends on how quickly the underlying world changes and how costly stale assumptions are.

OpenCode does not force a strong answer. Its minimalism allows builders to implement persistent memory, skill systems, config layering, or custom continuation strategies without making them mandatory. This is philosophically consistent: OpenCode is a platform more than an opinionated memory regime.

The best future design will probably not choose one side absolutely. It will separate retained knowledge into layers.

One layer should contain stable knowledge: repository conventions, preferred commands, non-obvious build rules, known dangerous paths. This can be preserved aggressively.

Another layer should contain volatile task state: half-finished hypotheses, temporary branch assumptions, exploratory notes. This should decay quickly or stay isolated.

A third layer should contain verified summaries produced at explicit checkpoints. These act as compressed, trust-rated knowledge transfer objects.

OMO already points toward this direction by distinguishing kinds of injected or preserved context. Claude Code points toward it by respecting isolation and avoiding uncontrolled carryover. The synthesis would be selective transfer rather than total memory or total reset.

For agent designers, this chapter’s key insight is simple. Knowledge transfer and context hygiene are in tension because memory is both an asset and a liability. If you retain too much, you inherit ghosts. If you retain too little, you stay forever naïve. OMO explores the power of retained wisdom. Claude Code explores the safety of fresh contexts. OpenCode preserves the freedom to build either. The best systems will learn not merely to remember, but to remember with calibrated trust.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 16 — Extensibility Comparison
Token Usage: ~4,800 input + ~1,150 output

16.1 Plugin System

Plugins are the most powerful and dangerous extensibility layer in coding agents. They are powerful because they can modify runtime behavior close to the core. They are dangerous because they can also blur trust boundaries, destabilize execution, and create subtle compatibility debt. Comparing OpenCode, OMO, and Claude Code through their plugin systems reveals three different philosophies of extensibility.

OpenCode offers the clearest host-level plugin substrate. Its plugin API exposes a small but high-leverage set of lifecycle interception points: configuration injection, tool registration, chat-message interception, parameter transformation, event handling, pre/post tool execution, and message transformation. This surface is intentionally compact. It is large enough to let a plugin alter fundamental behavior, but small enough to remain understandable.

That compactness is exactly what made OMO possible. OMO is not a fork that rewrites the host. It is a massive orchestration layer built as a plugin on top of OpenCode. This is a crucial proof point. A plugin system is only truly powerful when it can support not just small add-ons, but entire secondary runtimes. OMO demonstrates that OpenCode’s plugin surface has enough leverage to host agent orchestration, tool injection, background task systems, compatibility layers, and hook multiplexing.

Claude Code also has a plugin story, but it is shaped by commercial constraints. Its plugin model is generally more curated, more product-bound, and more safety-conscious. The goal is less “let extensions reshape everything” and more “let extensions add meaningful capability without compromising the product envelope.” That is a rational tradeoff for a commercial system, especially one with enterprise expectations.

The most interesting contrast is therefore this:

System	Plugin role
OpenCode	Host-level extensibility substrate
OMO	Plugin as orchestration runtime
Claude Code	Plugin as curated product extension

OpenCode’s plugin system is developer-first. The extension author is trusted to do serious engineering. That openness is empowering, but it also means quality discipline must come from documentation, conventions, and the surrounding community. A bad plugin can easily make the host feel unreliable.

OMO pushes that openness to its logical extreme. It treats the plugin layer not as a place for a few callbacks, but as a way to graft an entire behavioral nervous system onto the host. OMO’s success therefore says two things at once. First, OpenCode’s plugin API is genuinely capable. Second, once a plugin surface becomes that capable, it stops being “just extensibility” and starts becoming a second-order platform.

Claude Code is more conservative for good reasons. Commercial products need predictability, supportability, and tighter security guarantees. So its plugin story is less about maximum leverage and more about bounded extension. This usually means clearer product boundaries, fewer ways to intercept deep lifecycle events, and more emphasis on approved patterns.

One useful way to think about plugin systems is along a spectrum of host sovereignty. If the host retains strong sovereignty, plugins are guests. If the host gives away more lifecycle control, plugins become co-governors. OpenCode moves meaningfully toward co-governance. OMO proves how far that can go. Claude Code keeps more sovereignty in the host itself.

This difference has practical consequences.

In OpenCode, plugins can become architectural instruments. You can add tools, rewrite context, intercept tool boundaries, and build composite behavior. In OMO, the plugin layer becomes the main vehicle for orchestration innovation. In Claude Code, plugins are less likely to become alternate operating systems living inside the product. That makes the system safer and more consistent, but less radically modifiable.

There is also a maintenance lesson. A very powerful plugin surface creates ecosystem innovation, but it also creates version-coupling risk. When plugins rely on subtle lifecycle behavior, host updates can break them in non-obvious ways. Commercial systems reduce this by narrowing the surface. Open systems accept more risk in exchange for more creativity.

For agent platform designers, the key question is not simply “should we support plugins?” The deeper question is what kind of extension contract the product can afford. If the goal is a platform others can build on deeply, OpenCode is the stronger model. If the goal is to prove how much orchestration can be layered atop a host, OMO is the strongest case study. If the goal is to preserve product coherence and enterprise trust, Claude Code’s more bounded model is sensible.

The synthesis is that plugin systems should be judged by leverage, clarity, and blast radius. OpenCode maximizes leverage. OMO demonstrates leverage compounded through architecture. Claude Code minimizes blast radius. All three are valid, but they serve different strategic goals.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 16 — Extensibility Comparison
Token Usage: ~4,650 input + ~1,120 output

16.2 Skill System

Skills are a different kind of extension from plugins. A plugin changes runtime behavior. A skill changes agent behavior indirectly by adding structured knowledge, procedures, heuristics, or usage guidance. In human terms, plugins change the machine; skills train the operator.

OpenCode’s skill system is notable for how simple its content model is. A SKILL.md file with YAML frontmatter can define a reusable body of guidance. Discovery happens across global directories, project directories, config directories, explicit paths, and even remote URLs. This is a powerful design because it treats skills as content assets rather than code modules. The barrier to extension becomes much lower.

That simplicity also creates interoperability. OpenCode intentionally scans .claude/skills/ and .agents/skills/ locations, which means the system is not trying to monopolize skill content. Instead, it behaves like an extensibility substrate that can absorb neighboring ecosystems. This is strategically smart. Content portability reduces migration friction and grows the effective skill ecosystem.

OMO inherits this foundation and makes it more dynamic. In OMO, skills are not just static markdown files waiting to be manually invoked. Skill reminders, category-triggered guidance, embedded MCP access, and runtime nudges make skills active participants in execution. This is an important difference. OpenCode makes skills discoverable. OMO makes them operational.

Claude Code also treats skills as a major extension layer, but usually within a more product-defined workflow. Skills become a sanctioned way to encode domain expertise, team conventions, or procedural patterns without opening the much riskier plugin surface. This is a strong commercial design choice because markdown knowledge assets are easier to audit, share, and review than arbitrary executable extensions.

Across the three systems, the comparison looks like this:

System	Skill philosophy
OpenCode	Simple portable knowledge assets
OMO	Knowledge assets activated by orchestration
Claude Code	Safer behavior extension for product workflows

The skill layer matters because LLM systems often fail not from missing raw intelligence, but from missing local procedure. A model may know programming well in general and still fail a repository-specific workflow: which script to run, which directory to avoid, which testing ritual matters, which style rules are non-negotiable. Skills encode that local know-how.

Another advantage of skills is governance. Compared with plugins, they are easier to inspect. Markdown is legible. Frontmatter is explicit. Reviewers can reason about what a skill is trying to teach. Of course, a malicious or low-quality skill can still mislead the model, but the risk profile is very different from arbitrary code execution.

OMO adds an especially interesting twist with skill-embedded MCPs and runtime reminders. This collapses the distance between “knowledge about what to do” and “access to the external capability needed to do it.” That is a strong pattern because a skill becomes not only guidance, but also an integration entry point.

The main design tension is discoverability versus overload. If a system loads too many skills too aggressively, the agent becomes noisy and distracted. If it loads too few, useful local knowledge remains dormant. OpenCode solves this partly through a clean discovery hierarchy. OMO solves it partly through category-aware reminders and orchestration logic. Claude Code solves it partly through more curated product-level invocation patterns.

For future agent platforms, the lesson is clear. Skills are the best extensibility layer for organizational knowledge: cheap to create, easy to review, portable across projects, and far safer than executable plugins. OpenCode shows the elegance of a simple content model. OMO shows how to animate that content at runtime. Claude Code shows why skill systems are commercially attractive: they extend behavior without giving away the host.

Deep Dive: The Skill System — As Important as MCP

Model: openai/gpt-5.4
Token Usage (this appended section, estimated): ~5,500 input + ~2,200 output

Skills deserve their own deep analysis because they sit in the same strategic tier as plugins and MCP. In this book, MCP already received a full chapter because it is the most visible protocol innovation in the current agent era. But in practice, many real-world agent failures are not caused by a missing protocol. They are caused by missing local know-how: the unwritten procedure for this repository, this team, this deployment path, this release ritual, this compliance checklist. That is exactly the territory of skills.

Put differently, plugins extend the host, MCP extends the host’s tool reach, and skills extend the host’s operational judgment. All three matter, but skills are often underestimated because they look “too simple.” A markdown file does not feel as glamorous as a protocol server. Yet simplicity is precisely the point. A skill can encode domain knowledge at the level where work actually breaks: naming conventions, release order, safety rituals, review rules, documentation style, escalation conditions, team-specific heuristics, and hidden dependencies between steps. That kind of local procedure is rarely in a CS textbook because it is not universal theory; it is situated practice. But agent success in production depends on it.

This is why skills should be treated as one of the three foundational extension primitives. If a team only thinks in terms of “tools” and “integrations,” it will miss the layer that teaches the model how to use those tools responsibly. A skill is often the difference between an agent that merely has access and an agent that knows what good use of that access looks like.

OpenCode’s skill architecture: content-first extensibility

OpenCode’s skill system is architecturally elegant because it treats skills as content assets rather than code packages. At the center is a SKILL.md file with YAML frontmatter. The frontmatter supplies lightweight metadata such as name, description, and trigger-oriented information; the markdown body carries the actual procedural or reference content. This is important because it sharply lowers extension cost. To create a skill, a team does not need to learn an SDK, package a bundle, publish a registry artifact, or maintain a runtime process. They only need to write down expertise clearly.

That small design choice has large consequences. First, it democratizes extensibility. A senior engineer, tech lead, infra specialist, security reviewer, or staff writer can all author skills without becoming plugin developers. Second, it improves auditability. Markdown is readable. Frontmatter is explicit. A reviewer can inspect a skill and understand what behavior it is trying to induce. Third, it makes skills portable across repositories and even ecosystems, because markdown plus metadata is easier to move than executable logic.

OpenCode’s discovery hierarchy reinforces that philosophy. The loader scans multiple locations: global user-level locations such as ~/.config/opencode/skills/ or equivalent config directories, project-level .opencode/skills/, and external ecosystems such as .claude/skills/ and .agents/skills/. This is more than convenience. It expresses a strategic position: skill content should not be trapped inside one tool brand. By scanning Claude Code and Agents-style directories, OpenCode acts like an extensibility substrate willing to absorb neighboring conventions instead of forcing a clean-room ecosystem.

The discovery order also reflects an important governance pattern. Home-level skills provide durable personal knowledge. Project-level skills provide repository-local procedure. External skill directories provide compatibility and migration bridges. In effect, OpenCode lets knowledge accumulate at multiple scopes: individual, project, and cross-tool. That is exactly how real engineering knowledge is distributed in organizations.

Operationally, the bridge into runtime is the skill tool. The agent does not have to preload every skill body into context. Instead, the system exposes available skills as a discoverable inventory, then loads a specific skill on demand when the task matches. This matters for token economy and focus. Massive skill libraries are useful only if the model can selectively pull the right one at the right moment. OpenCode’s design says: advertise broadly, inject narrowly.

There is also a subtle architectural benefit here. Because skills are not code modules, they are versioned and reviewed more like documentation than like software. That makes them an ideal carrier for reference content and task content. A style guide, API usage note, release playbook, or incident checklist can all live in the same basic format. The result is an unusually low-friction knowledge plane for the agent.

OMO’s innovation: turning skills from static content into active orchestration

Oh-My-OpenCode inherits OpenCode’s discovery model, but then pushes the concept further in several directions. The first step is obvious but important: OMO keeps the discovery compatibility story. It can discover OpenCode-style skills, Claude Code-style skill directories, and additional custom paths. So the base portability story is preserved.

The second step is the real innovation: dynamic activation. In plain OpenCode, a skill is primarily a content asset waiting to be loaded. In OMO, skills become runtime actors in the orchestration system. They are not only available; they are contextually suggested, reminded, merged, filtered, and sometimes paired with capabilities. This is the point where a skill system stops being a passive library and becomes part of agent control flow.

The most distinctive mechanism is the skill-embedded MCP design. In OMO, a skill can bundle an MCP server configuration, whether over stdio or HTTP. Conceptually, this collapses “knowledge” and “capability” into one portable unit. A traditional skill says: here is how to do X. A skill-embedded MCP says: here is how to do X, and here is the tool endpoint you will need to do it. That is a major leap because it shortens the path from procedure to execution.

The SkillMcpManager is the engineering layer that makes this practical rather than merely aspirational. It handles connection reuse, session scoping, pooling, retry logic, step-up authentication handling, idle cleanup, and teardown. In other words, OMO does not just let a markdown skill mention a server; it creates a lifecycle manager so those servers behave like first-class, efficiently shared runtime resources. This is crucial, because without pooling and cleanup, embedded MCPs would quickly become operationally messy.

This leads to OMO’s three-tier MCP picture. First, there are built-in remote MCPs such as Exa/web search, Context7 documentation lookup, and grep.app-style code search. These provide high-value external reach out of the box. Second, there are Claude Code-compatible MCP definitions loaded from .mcp.json, which gives OMO a migration and interoperability layer with an existing ecosystem. Third, there are skill-embedded MCPs, which bind a particular capability to a particular knowledge package. The three tiers correspond to three different use cases: platform defaults, environment compatibility, and domain-specific packaged workflows.

Another OMO contribution is runtime skill reminders. This may sound minor, but it addresses a real failure mode in agent systems: dormant knowledge. A skill can exist and still go unused because the model never chooses to load it. OMO reduces that risk by nudging relevant skills into consideration based on task category and agent role. In effect, it creates a lightweight recommendation layer for procedural knowledge. The orchestration system reminds the model that certain expertise exists before the model wanders off into generic behavior.

This is especially important in multi-agent settings. If Atlas or another orchestrator is delegating work, the cost of forgetting a relevant skill is amplified across subagents. A reminder system helps maintain consistency: not every subagent has to rediscover from scratch that there is a repo-specific frontend, docs, security, or release workflow. OMO turns skills into a semi-active part of routing logic.

OMO also addresses the messy reality of overlapping skill sources. If the same skill name appears in multiple places, naive loading can produce collisions, ambiguity, or duplicated context. OMO therefore includes skill deduplication and merge behavior, preferring a coherent final set over brute-force accumulation. This matters because interoperability only works if the result stays understandable. A compatibility-friendly system must also be a conflict-management system.

Finally, OMO extends the skill story into governance through the skill-guardian pattern: a pre-installation security gate for skills. This is not about arbitrary code execution in the same way as plugin review, but it still matters because skill installation changes model behavior and may introduce unsafe or low-quality procedure. The guardian concept treats skills as auditable but still worthy of scrutiny. That is a mature stance: “lower risk” does not mean “no review needed.”

Claude Code’s skill system: productized, safe, and organization-friendly

Claude Code approaches skills from a different angle. The design is less about ecosystem absorption and more about productized behavior extension. Claude Code includes bundled skills and supports custom skills under .claude/skills/. Like OpenCode, it uses markdown plus YAML frontmatter, which confirms a broader convergence in the industry: skills work best when they are authored as structured text.

Claude Code also offers the /skills slash command for discovery and management. This makes the system visibly user-facing in a way that many developer-oriented extensibility layers are not. The slash-command surface matters because discoverability is one of the biggest practical problems in any extension system. A feature that exists but is hard to inspect may as well not exist for many users. /skills therefore acts as a usability layer over the raw file-based mechanism.

The most important thing about Claude Code’s skill design is that it is arguably the safest extension layer in the product. Plugins or arbitrary external tooling inevitably expand the attack surface because they involve executable logic, network boundaries, or host integration points. Skills, by contrast, are mostly inspectable text. They can still carry prompt-injection-like risks or encode bad procedure, but they are far easier to audit than arbitrary code. That makes them especially attractive in enterprise settings.

Claude Code further enriches the concept by integrating skills with the memory system. A skill can reference durable memory entries, session memory, or organizational context, allowing instructions to point toward living project knowledge rather than only static prose. This is a powerful pattern. It means a skill is not only a frozen recipe; it can become a structured doorway into a maintained knowledge base. For teams, that dramatically improves long-term usefulness. Instead of rewriting the entire world into each skill, the skill can define the procedure while memory stores changing facts.

That combination makes skills highly suitable for enterprise use. A company can define team skills for coding standards, release policy, architectural review expectations, support runbooks, compliance requirements, or onboarding patterns. Because the artifact is text, it fits existing review culture. Because the triggers are explicit, it fits governance. Because the execution risk is low compared with plugins, it fits security review. In commercial product terms, skills are the sweet spot between power and controllability.

The Three Content Types framework: why skills matter beyond “prompt files”

Chapter 22 introduces a useful framework that becomes especially illuminating here: three content types.

Reference content: the LLM reads it to understand something. Style guides, API notes, repository conventions, domain definitions, and policy summaries belong here.
Task content: the LLM follows it as a workflow. Runbooks, deployment steps, review checklists, release procedures, and decision trees belong here.
Executable tools: the computer executes them deterministically. Commands, APIs, MCP servers, and scripts belong here.

Skills primarily occupy the first two categories. They are usually reference content, task content, or a blend of both. A good skill might say, “Here is the architectural style of this repository” and also, “Here is the exact sequence you must follow when changing that architecture.” That duality is why skills are so effective.

OMO is especially interesting because skill-embedded MCPs let a single skill span all three content types. The markdown body provides reference material. The workflow sections provide task guidance. The embedded MCP bridges to executable capability. This makes OMO’s skill model unusually comprehensive. A skill is no longer just a prompt file; it becomes a portable package of understanding, procedure, and operational reach.

Skills vs. MCP

Dimension	Skills	MCP
Creation cost	Low — mostly markdown and frontmatter	High — must build and run a server
Audit difficulty	Easy — readable text artifacts	Harder — arbitrary logic and protocol behavior
Capability	Knowledge, heuristics, workflow guidance	Tools, resources, prompts, external data
Security risk	Low to moderate — prompt misuse, bad guidance	Medium or higher — code execution and integration risk
Portability	High — files move easily across repos and tools	Medium — standard protocol helps, but runtime packaging still matters

This table should not be read as “skills good, MCP bad.” They solve different problems. MCP is what you use when the model needs deterministic access to external systems. Skills are what you use when the model needs contextual judgment about how to operate. In other words, MCP answers, “What can the agent reach?” Skills answer, “What should the agent do, and how should it do it here?”

Design lessons

The first lesson is that skills are the best 80/20 extension mechanism in current agent systems. They cover a very large share of customization needs with a very small share of the complexity. Most teams do not actually need to start with plugins or custom MCP servers. They need a clean way to encode local procedure, standards, preferences, and repeatable workflows. Skills do that well.

The second lesson is to think in a progression of extensibility, from simplest to most powerful: CLAUDE.md or equivalent project memory, then skills, then MCP, then plugins. Each step increases power, but also increases implementation cost, governance cost, and security exposure. The mistake many teams make is jumping too early to the bottom of the stack when their actual need is still near the top.

The third lesson is that for most organizations, skills are enough for a surprisingly long time. If the problem is “the agent keeps forgetting our procedure,” that is usually a skill problem, not a plugin problem. If the problem is “the agent needs structured access to an external system,” that may justify MCP. If the problem is “the host itself must be reprogrammed,” then plugins become appropriate. But the number of teams that truly need to start at the plugin layer is much smaller than the excitement around plugins suggests.

OpenCode demonstrates the elegance of a minimal, interoperable skill substrate. OMO demonstrates how to animate that substrate with reminders, deduplication, and capability bundling. Claude Code demonstrates why skills are commercially and organizationally attractive: they are powerful enough to matter, but safe enough to govern. Taken together, the lesson is clear. MCP may be the USB-C of the agent ecosystem, but skills are closer to its operating manual, local etiquette, and field procedures. Without them, the agent may be connected to everything and still do the wrong thing.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 16 — Extensibility Comparison
Token Usage: ~4,720 input + ~1,140 output

16.3 Command System

Commands are the most user-facing extensibility primitive among the three systems. Unlike plugins, which target runtime internals, or skills, which target agent cognition, commands target invocation patterns. They package recurring intent into a stable entry point.

OpenCode’s command system is elegant because it unifies three sources: built-in commands, user-configured commands, and commands derived from MCP prompts or skills. This is more powerful than it first appears. It means the system does not treat slash commands as a narrow UI convenience. It treats them as a routing layer between user intent and reusable prompt templates.

That template model is also thoughtfully designed. Variables such as $1, $2, and $ARGUMENTS make commands parameterizable without requiring the user to write a full prompt every time. This lowers cognitive load and encourages repeatable workflows.

OMO builds on that foundation by making commands part of its orchestration ergonomics. Auto-slash-command behavior, category-linked invocation, and compatibility with adjacent ecosystems mean commands become not merely shortcuts, but a way to trigger structured agent behaviors. In OMO, commands can route into specialized agents, subtasks, and orchestration-aware flows rather than only expanding text.

Claude Code is famous for going much further in command count and productization. A large slash-command inventory creates a feeling that the system has many native “verbs.” That matters for user experience. Commands become the visible surface of the platform’s behavioral repertoire. Instead of teaching users to write long prompts, the product teaches them to select actions.

The architectural comparison is therefore not just about quantity.

System	Command emphasis
OpenCode	Template-based extensible routing layer
OMO	Command-driven orchestration trigger layer
Claude Code	Rich product verb surface

Commands are important because they externalize structure. A good command system makes advanced behavior discoverable, repeatable, and teachable. It turns hidden prompt craftsmanship into explicit product affordances.

There is also a workflow governance angle. Commands can encode safe defaults: the right agent, the right model, the right subtask mode, the right review template. This prevents users from reinventing procedures poorly. In organizational settings, that is valuable. Instead of every engineer writing a slightly different “please review this PR” prompt, a command can standardize review intent.

OpenCode’s design is especially clean because commands are not isolated from the rest of extensibility. Skills can surface as commands. MCP prompts can surface as commands. User-defined templates coexist with built-ins. This gives the command layer unusual breadth.

OMO’s improvement is contextual intelligence. It asks not just “can a command expand into text?” but “can a command become an orchestration entry point?” That shift is subtle but important. A command no longer maps only to words. It can map to a workflow.

Claude Code’s strength is polish and memorability. A large, curated command inventory makes the product feel capable immediately. The risk, of course, is command sprawl: too many verbs, unclear distinctions, and discoverability overload. Commercial systems must continually manage this.

For agent platform designers, the best lesson is that command systems should be treated as structured intent APIs for humans. Plugins are for developers. Skills are for behavioral guidance. Commands are for repeatable user entry points. OpenCode gives the most elegant compositional model, OMO demonstrates orchestration-aware commands, and Claude Code shows how commands can become a major part of product UX.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 16 — Extensibility Comparison
Token Usage: ~5,000 input + ~1,220 output

16.4 Hook System

Hooks are where extensibility becomes lifecycle control. A hook is an interception point tied to a system event or phase: before a tool runs, after a response arrives, when compaction begins, when files change, and so on. If plugins are the broad extension chassis, hooks are the fine-grained attachment points.

OpenCode exposes a relatively small hook surface, but it is strategically chosen. The host offers roughly five major plugin-level hook points that matter most for orchestration: configuration, message handling, parameter/message transformation, event handling, and pre/post tool interception. This small set is enough to be highly expressive without becoming impossible to reason about.

OMO performs the most dramatic transformation of any system in this comparison: it maps 41 internal hooks onto 5 OpenCode hook points. That ratio is the whole story. OMO treats the host’s limited lifecycle surface as a carrier for a much richer policy graph. Session control, tool guards, message transforms, continuation enforcement, and skill reminders are all layered onto that compact substrate.

Claude Code, meanwhile, exposes five hooks of its own, but with a different flavor. Its hooks are more obviously shaped around product safety and lifecycle integration, such as session starts, compaction boundaries, post-sampling moments, and file-change events. The commercial philosophy is visible here: hooks should support customization and automation without letting extensions take over the whole runtime.

This produces a striking comparison:

System	Hook profile
OpenCode	Small, powerful host hook surface
OMO	41-hook policy system on 5 host hook points
Claude Code	5 curated product/safety hooks

The number mismatch matters. OpenCode’s five hook points are host primitives. OMO’s forty-one are not host primitives; they are a second-level hook taxonomy built atop those primitives. That means OMO has effectively created a meta-runtime. It is no longer just “using hooks.” It is building its own lifecycle language on top of the host lifecycle.

Why is this important? Because many of the hardest agent problems are not solved in the prompt. They are solved at transition boundaries. Before a tool call, you may want to inject rules or block dangerous writes. After compaction, you may want to restore task state. Before model input, you may want to validate thinking blocks or add context. These are hook-shaped problems.

OpenCode’s philosophy is minimalist leverage. Give a few strong interception points and let developers compose. Claude Code’s philosophy is bounded customization. Give some high-value hooks, but keep the product sovereign. OMO’s philosophy is to turn hooks into a programmable governance layer.

That governance role is especially visible in OMO’s tool-guard hooks. File guards, comment checkers, rules injectors, and continuation enforcers show how hook systems can improve quality and safety without changing the core model. This is one of the strongest arguments for hook-rich architectures: behavioral reliability often comes from structured interception, not just stronger prompts.

There is, however, a real cost. The richer the hook system, the harder it becomes to reason about precedence, interactions, and unintended side effects. Hook debugging is notoriously subtle. A message transformed in one phase may trigger a rule in another phase and be silently restored in a third. This is the price of power.

For platform designers, the lesson is to distinguish between host hooks and policy hooks. OpenCode demonstrates a good host surface. OMO demonstrates how a plugin can build a second-level policy surface on top of it. Claude Code demonstrates that a commercial product can expose a smaller, safer set oriented toward predictable automation.

If extensibility is about adding capabilities, hooks are about adding timing-aware control. OpenCode makes this possible. OMO radicalizes it. Claude Code domesticates it.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 16 — Extensibility Comparison
Token Usage: ~4,780 input + ~1,170 output

16.5 Manifest-Driven Discovery

Extensibility is not only about how extensions run. It is also about how they are found, loaded, and identified. This is where manifest-driven discovery becomes important. A manifest is a structured declaration of what an extension is, where it lives, and how the system should interpret it.

OpenCode uses a relatively lightweight but effective discovery model. Skills are discovered through directory scanning plus frontmatter. Commands can come from config, MCP prompts, or skills. Plugins and other artifacts are found through conventional paths and configuration. The model is not overly formal, but it is practical. Discovery is based on filesystem conventions plus small metadata headers.

That lightweight approach has advantages. It lowers friction for authors, preserves hackability, and makes extension artifacts easy to inspect. A markdown file with frontmatter is a friendly manifest. A config block defining a command is a friendly manifest. The barrier to entry remains low.

OMO extends this idea into a compatibility and orchestration strategy. It can load assets from OpenCode-style paths, Claude-style paths, .mcp.json, skill directories, command directories, and plugin-defined registries. In effect, OMO treats manifest-driven discovery as an ecosystem bridge. Discovery is not just about loading its own artifacts; it is also about recognizing neighboring formats and converting them into a common runtime view.

Claude Code’s approach is more formalized by product structure. Plugins, agents, hooks, skills, and MCP configuration typically live in expected locations with expected schemas. This gives the product stronger predictability. It also supports clearer UX around what is installed and active. Commercial systems benefit from such explicitness because supportability matters.

The deeper question is how much ceremony a manifest should impose.

Too little ceremony leads to ambiguity. Tools may be discoverable but hard to validate. Extensions may load accidentally from the wrong place. Naming conflicts become messy.

Too much ceremony raises authoring cost. Small, reusable extensions stop feeling lightweight and start feeling like packaged software releases.

The three systems sit at different points on this spectrum. OpenCode prefers lightweight discoverability. OMO prefers broad compatibility and cross-format ingestion. Claude Code prefers more controlled structure.

Manifest-driven discovery also intersects with security. Once extension loading is based on explicit metadata and declared locations, the system can validate schema, inspect provenance, warn on duplication, and apply policy before execution. This is much harder when discovery is entirely ad hoc.

Another key issue is ecosystem portability. OpenCode’s willingness to scan adjacent directory conventions is a clever move because it reduces lock-in. OMO doubles down on that portability by acting as a compatibility layer. Claude Code, by contrast, benefits from strong product conventions but may be less naturally format-agnostic.

For future agent platforms, the best design may be hybrid: lightweight manifests for content assets like skills and commands, stronger manifests for executable extensions like plugins or MCP integrations, and compatibility readers for neighboring ecosystems. That balance would preserve accessibility without giving up auditability.

So while manifest systems can seem mundane, they quietly shape the health of the extension ecosystem. OpenCode shows how far lightweight manifests can go. OMO shows how discovery can become an interoperability layer. Claude Code shows why stronger structure helps commercial reliability. Together they suggest that extensibility succeeds not just when extensions are possible, but when they are legible, discoverable, and governable.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 17 — Security Model Comparison
Token Usage: ~4,950 input + ~1,200 output

17.1 Permission Paradigms

Permission design is where coding agents reveal what they really believe about trust. Every system claims to care about safety, but the mechanism matters. Do permissions rely on static rules, learned classification, runtime guards, or some hybrid? OpenCode, OMO, and Claude Code each choose a different primary paradigm.

OpenCode leans toward a pattern-based permission model. This is a classical engineering approach. Commands, paths, or action categories are checked against explicit rules. The advantage is transparency. Developers can usually understand why a decision was made. The downside is brittleness. Pattern-based systems are only as good as the cases they enumerate, and real-world tool usage often falls into gray areas that are hard to capture with fixed rules.

Claude Code moves further toward a four-mode permission model with machine-learned classification. This is a notably modern approach. Instead of relying only on hardcoded allow/deny logic, the system can infer whether an action looks risky based on broader context. Anthropic has highlighted the impact of this strategy in reducing unnecessary prompts while still preserving guardrails. The tradeoff is familiar from all ML-mediated policy systems: you gain adaptability, but you lose some crisp explainability.

OMO adds a third layer of thinking with file-guard hooks and policy interception. Because it sits atop OpenCode as an orchestration runtime, it can insert additional safety checks before tool execution. This is not a replacement for host permissions; it is a second defensive ring. Write-existing-file guards, comment checkers, and rules injectors collectively create a more situational policy boundary. OMO’s security posture is therefore less about one monolithic permission oracle and more about many targeted runtime interventions.

These paradigms can be summarized like this:

System	Primary permission logic
OpenCode	Explicit rule and pattern matching
OMO	Host permissions plus hook-based file/policy guards
Claude Code	Multi-mode permissions plus ML classifier

Each model reflects a different trust philosophy.

Pattern-based permissions assume that dangerous behavior can be approximated through recognizable shapes. This is useful when the environment is relatively predictable and the developer values control.

ML-based permissions assume that risk cannot be fully enumerated, so the system should learn broader notions of suspiciousness. This is useful in consumer and enterprise products where user experience would suffer if every decision required static overblocking.

Hook-based guards assume that many risks only become legible at the exact boundary where an action is about to occur. This is useful when extensibility itself is rich and runtime policy needs to be composable.

The crucial point is that permission systems are not only about blocking danger. They are also about minimizing interruption. An agent that asks for approval too often becomes annoying and ineffective. An agent that asks too rarely becomes unsafe. Claude Code’s classifier-based approach explicitly tries to optimize this tension. OpenCode keeps the logic more legible. OMO adds repo-specific and workflow-specific enforcement where the host’s generic model may be insufficient.

There is also a layering lesson here. The best permission systems are rarely single-layer. Static rules are good at obvious cases. Learned classifiers are good at ambiguous cases. Hook guards are good at repository- or workflow-specific cases. These three systems, taken together, suggest a stack rather than a single answer.

For future agent platforms, the ideal permission paradigm is probably hybrid: explicit allow/deny rules for high-certainty boundaries, ML-assisted classification for ambiguous operations, and local hook-based policy for repository-specific enforcement. OpenCode contributes clarity. OMO contributes local guardrails. Claude Code contributes adaptive prompting reduction. The direction of travel is obvious: smarter permissions with more context, but still anchored by auditable rules.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 17 — Security Model Comparison
Token Usage: ~4,600 input + ~1,110 output

17.2 Sandboxing

If permission prompts are the psychological layer of safety, sandboxing is the operating-system layer. It asks a much harder question: what if the model is wrong, the prompt is bypassed, or the extension surface is abused? At that point, only execution containment remains.

Claude Code is the clear leader among the three systems here because it incorporates OS-level sandboxing, using mechanisms such as bubblewrap on Linux and Seatbelt on macOS. Anthropic has cited an 84% reduction in a key risk or permission-related burden through this broader safety strategy. The exact internal denominator matters less than the architectural lesson: once the system can contain processes at the OS boundary, many previously high-risk actions become more manageable.

This is a profound difference from relying only on prompts or policy checks. Prompts can be ignored. Policies can be misclassified. A sandbox changes what the process is materially able to access. Filesystem scope, network scope, subprocess behavior, and credential reach can all be narrowed. In security engineering terms, this is a move from intention-based safety to capability-based safety.

OpenCode does not natively offer comparable OS-level sandboxing as a defining architectural feature. Nor does OMO. That does not mean they are unsafe by default, but it does mean their protection model relies more heavily on permission logic, tool boundaries, and extension discipline. If a tool is allowed to run on the host, the host’s own privileges remain highly relevant.

This gap matters especially for autonomous coding agents because their danger is not only model hallucination. It is also the combination of code execution, shell access, file mutation, and external integrations. Sandboxing is the one control that still works even when upstream reasoning fails.

There are tradeoffs, of course. Sandboxes complicate implementation, reduce compatibility, and can frustrate users when legitimate tasks hit containment walls. Some developer workflows genuinely need broader filesystem or network access. Commercial products must therefore balance containment with usability and support load.

Still, the security lesson is hard to escape: if an agent can execute shell commands, edit files, or interact with credentials, sandboxing is one of the few controls that directly limits blast radius after a bad decision. It is not a replacement for permissions, but a backstop behind them.

OMO’s file guards and OpenCode’s permission logic remain useful, but they live at a higher layer. They are better at deciding what should happen. Sandboxing is better at limiting what can happen anyway. That distinction is central.

For future open-source agent systems, Claude Code’s design points toward an important maturity milestone. Permission UX and safety prompts are not enough. Once agents become more autonomous, OS-level containment stops being optional and starts becoming table stakes. OpenCode and OMO currently illustrate the strengths and limits of non-sandbox-first design. Claude Code illustrates the next step.

The best future architecture will likely combine all three layers: clear permissions, runtime policy guards, and OS sandboxing. In that stack, sandboxing is the last line of defense—and often the most honest one.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 17 — Security Model Comparison
Token Usage: ~4,880 input + ~1,180 output

17.3 Supply Chain Security

The more extensible an agent platform becomes, the more it inherits a software supply chain problem. Plugins, skills, commands, MCP servers, manifests, remote skill URLs, and third-party tool integrations all expand capability—but they also expand trust dependencies.

OpenCode, by design, is highly extensible and therefore highly exposed to supply chain questions. Remote skills, externally sourced MCP endpoints, and powerful plugins mean that the host can ingest assets from outside its immediate codebase. This is excellent for ecosystem growth and terrible if provenance is weak. Open platforms win innovation by accepting more trust surface.

OMO inherits all of that and adds its own complexity. Because it can load and interoperate across multiple extension formats, its discovery layer is also a compatibility layer. Compatibility is great for migration and reuse, but it widens the space of things that must be audited. The more formats and locations the system accepts, the more careful it must be about validation, naming collisions, and untrusted code or prompts.

Claude Code is structurally more conservative, but it still faces the same basic problem. Skills, plugins, hooks, and MCP integrations create attack paths if installation or activation is not reviewed carefully. Commercial products usually respond with stronger schema enforcement, more product-level structure, and tighter default trust boundaries.

The key security issue is that not all extensions are equal.

Plugins are executable and therefore highest risk.

MCP servers are external capability bridges and can be high risk depending on scope.

Skills and commands are content-first and generally lower risk, but can still manipulate behavior or socially engineer the model into poor choices.

Manifests matter because they determine how extensions are discovered and declared; weak discovery controls can become a supply chain weakness of their own.

OMO’s skill-guardian concept is especially important here. It recognizes that installation itself is a security event. Before adding a skill, the system should inspect provenance, relevance, and risk. This is the right mindset. Supply chain safety is not something you bolt on after installation; it begins at discovery and admission time.

There are a few best practices that emerge across the comparison.

First, executable extensions should require stronger trust than content extensions.

Second, remote assets should be pinned, checksummed, signed, or at least sourced through explicit reviewable locations whenever possible.

Third, extension manifests should be schema-validated before activation.

Fourth, the system should keep clear provenance metadata so users know what came from where.

Fifth, installation should be treated as policy, not convenience.

The broader lesson is that agent ecosystems are converging toward the same supply chain questions long familiar in package managers and browser extensions. Who authored this? What can it do? How is it updated? What trust does it inherit? OpenCode provides the open ecosystem case. OMO adds the complexity of compatibility and orchestration-aware loading. Claude Code shows the case for stronger product curation.

As agent platforms mature, supply chain security will likely become one of the main differentiators between hobbyist extensibility and production extensibility. OMO’s skill-guardian is an early signal of that maturity.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 17 — Security Model Comparison
Token Usage: ~4,900 input + ~1,190 output

17.4 Credential Isolation

Credential isolation is one of the least glamorous topics in agent architecture and one of the most important. As soon as an agent platform supports plugins, MCP servers, cloud APIs, package registries, or third-party integrations, the question becomes unavoidable: who gets direct access to secrets?

The safest answer is simple and should be treated as a design law: plugins must never have direct access to core credentials. Not to the primary model key, not to repository-wide secrets, not to global tokens unless explicitly and separately granted.

OpenCode’s openness makes this especially important. A highly extensible host invites innovation, but it also increases the chance that third-party code will run near privileged operations. If extensions can casually inherit the host’s full credential context, then one plugin compromise can become a total platform compromise.

OMO intensifies the need for isolation because it introduces more orchestration layers, more extension types, and more indirect execution paths. A background agent, a skill-embedded MCP bridge, or a compatibility-loaded component should not accidentally inherit secrets just because they are participating in a task. The richer the extension graph, the more carefully credentials must be segmented.

Claude Code’s commercial posture makes credential isolation more obviously central. Enterprise-ready systems cannot rely on goodwill alone. They need principled separation between product-core secrets, user-provided tokens, extension-scoped secrets, and ephemeral task-scoped credentials.

The best pattern here is the scoped token proxy. Instead of handing extensions raw core credentials, the host exposes a broker or proxy that issues narrowly scoped, purpose-limited access on demand. The extension asks for a capability, not a secret. The host decides whether to grant it, under what scope, for how long, and with what audit record.

This design has several advantages.

First, the extension never sees the most sensitive root credential.

Second, permissions can be limited by service, repository, operation type, or time window.

Third, revocation becomes practical because the host controls the proxy layer.

Fourth, auditability improves because capability grants can be logged as discrete events.

Fifth, secret rotation becomes easier because extensions are not tightly coupled to long-lived raw tokens.

Credential isolation also interacts with sandboxing. Even if an extension has some scoped access, OS-level containment can reduce how easily it exfiltrates or expands that access. This again shows why Claude Code’s sandboxing direction matters.

There is also a developer-experience challenge. Secret management that is too rigid becomes painful. Engineers will route around it. So the right model is not maximal inconvenience; it is clear separation plus ergonomic brokering. Good platforms make safe credential access the path of least resistance.

Across the three systems, the comparative lesson is straightforward. OpenCode highlights why open extension hosts need strict secret separation. OMO highlights how orchestration complexity multiplies credential paths. Claude Code highlights why commercial systems must formalize isolation as infrastructure rather than convention.

The future of secure agent extensibility will depend on this principle: extensions should be granted capabilities, not inherited trust. Once that principle is violated, every new plugin, skill bridge, or MCP connector becomes a potential credential leak. Once it is respected, extensibility can grow without turning the host into a secret-sharing accident.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 18 — Architecture of the Ideal Coding Agent
Token Usage: ~7,400 input + ~1,950 output

18.1 Consensus Architecture

By 2026, the industry consensus for a serious coding agent is no longer mysterious. Different products still disagree on branding, packaging, and monetization, but the architectural center of gravity is increasingly clear. The ideal coding agent is not just a large prompt wrapped around a model. It is a layered system with a small strategic core and several highly disciplined supporting subsystems: a minimal high-altitude system prompt, a structured tool registry, a compact context manager, a ReAct-style execution loop, sub-agent delegation for parallelized or isolated work, and an OS-level sandbox that turns autonomy from a risk into a practical operating mode.

This consensus did not emerge from theory alone. It emerged because naive designs kept failing. Huge system prompts became brittle. Tool lists without strong descriptions became ambiguous. Unmanaged context became bloated and contradictory. Single-loop agents became too slow for broad tasks. Uncontained execution made autonomy politically and operationally unacceptable. The resulting lesson is that the best coding agent is not the one that stuffs the most instructions into the model. It is the one that distributes responsibility to the right layer.

The most important shift is that the system prompt becomes smaller, not larger. Early agent builders often tried to encode the entire product in prompt text: role, policy, formatting, style, tool tutorials, workflow, safety, edge cases, domain knowledge, and fallback behavior. That approach works for demos and fails at scale. A production coding agent needs a system prompt that is high-altitude, stable, and durable across tasks. Its job is to define identity, task orientation, reasoning discipline, and global boundaries. It should not be the primary home for all operational detail.

That operational detail belongs in the tool registry and surrounding scaffolding. A tool registry is not just a list of functions. It is the machine-readable contract that tells the model what actions exist, what they do, what arguments they accept, what side effects they have, and when they should or should not be used. In practice, this increasingly means MCP clients, native tools, and host-managed wrappers all exposed through a unified abstraction. The ideal agent sees tools through one coherent interface even if the underlying implementation comes from local built-ins, remote MCP servers, LSP services, or platform plugins.

This is where OpenCode, OMO, and Claude Code converge in spirit even though they differ in execution. OpenCode shows the value of a clean programmable host. OMO demonstrates that tools can also be orchestration primitives, not merely file or shell utilities. Claude Code shows that the host, not just the model, should enforce safety and execution structure. Together they imply a consensus stack.

┌─────────────────────────────────────────────────────────────┐
│                    USER / IDE / CLI / API                  │
└────────────────────────────┬────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────┐
│   MINIMAL SYSTEM PROMPT                                    │
│   - identity and scope                                     │
│   - global safety boundaries                               │
│   - planning / reasoning posture                           │
└────────────────────────────┬────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────┐
│   CONTEXT MANAGER                                          │
│   - compact conversation state                             │
│   - JIT retrieval                                          │
│   - structured notes / memory                              │
│   - compaction / overflow recovery                         │
└────────────────────────────┬────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────┐
│   REACT EXECUTION LOOP                                     │
│   think → choose tool / delegate → observe → update state  │
└───────────────┬───────────────────────────────┬─────────────┘
                │                               │
                v                               v
┌──────────────────────────────┐   ┌──────────────────────────┐
│ TOOL REGISTRY / MCP CLIENTS  │   │   SUB-AGENT ORCHESTRATOR │
│ - file / git / search / LSP  │   │ - explore / review / doc │
│ - browser / test / deploy    │   │ - parallel isolated work │
│ - typed schemas + docstrings │   │ - result summarization   │
└───────────────┬──────────────┘   └──────────────┬───────────┘
                │                                 │
                └──────────────┬──────────────────┘
                               v
┌─────────────────────────────────────────────────────────────┐
│   HOST RUNTIME + POLICY LAYER                              │
│   - permissions, approvals, auditing, cost controls        │
│   - retries, rate limits, provenance, policy checks        │
└────────────────────────────┬────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────┐
│   OS-LEVEL SANDBOX                                          │
│   - filesystem scope   - network policy                    │
│   - process isolation  - secret boundary                   │
└─────────────────────────────────────────────────────────────┘

The context manager is the most underrated layer in this diagram. Prompt engineering taught people how to write instructions. Context engineering teaches them how to control the full token state. A coding agent rarely fails because it lacks raw intelligence; it fails because it is carrying the wrong state at the wrong time. The ideal context manager therefore does four things well.

First, it keeps the active context compact. The agent should not carry entire transcripts, giant diffs, and irrelevant files when it only needs the current task, a few constraints, and a short task history.

Second, it retrieves information just in time. If a file, note, documentation chunk, or previous session detail is relevant, it should be injected when needed rather than pinned permanently in the prompt.

Third, it writes structured notes. A compact summary of “what matters now” is often more valuable than replaying dozens of raw exchanges. This is why memory systems increasingly look like disciplined state objects rather than scrapbook logs.

Fourth, it handles overflow gracefully. The ideal agent assumes context windows will sometimes be exceeded and treats compaction as a first-class workflow, not as an emergency patch.

On top of that sits the ReAct loop, still the dominant execution pattern for coding agents. ReAct here means an iterative cycle in which the model reasons, acts through tools, observes outcomes, and revises its working state. This remains the right primitive because coding work is neither purely conversational nor fully scriptable. The agent must inspect files, test hypotheses, compare outputs, and adapt to unexpected states. However, the 2026 consensus version of ReAct is tighter than the early form. It is less about verbose chain-of-thought and more about disciplined state transition. The loop should expose just enough thought to guide tool use and state updates, while the host runtime handles logging, policy, retries, and safety checks around the loop.

The next layer of maturity is sub-agents. A single agent can solve many tasks, but coding work often benefits from isolation and parallelism. One sub-agent can explore a codebase. Another can research documentation. Another can verify a hypothesis or run a review pass. The critical insight is that sub-agents are not magic; they are controlled context partitions. Their value comes from narrowing scope, reducing interference, and allowing concurrent work. OMO pushes this idea furthest through explicit orchestration. Claude Code uses it more selectively. OpenCode provides the underlying programmability that makes such patterns composable.

Still, sub-agents are only valuable if orchestration quality is high. An ideal architecture does not spawn agents casually. It uses them when at least one of three conditions holds: parallel work materially reduces latency, scope isolation materially reduces confusion, or specialized prompts materially improve execution. Otherwise, delegation becomes ceremony.

Finally, the OS-level sandbox is what makes the whole architecture socially deployable. Without sandboxing, every autonomy improvement increases fear. With sandboxing, autonomy becomes governable. Filesystem limits, network controls, process isolation, credential boundaries, and audit trails allow the host to grant broad operating freedom inside a bounded environment. This is the crucial lesson from Claude Code’s direction: safety is not a speed bump after the agent; it is infrastructure beneath the agent.

Put differently, the consensus architecture is a layered answer to six different failure modes. Minimal system prompts prevent instruction sprawl. Tool registries prevent action ambiguity. Context managers prevent token rot. ReAct loops prevent passivity. Sub-agents prevent monolithic overload. Sandboxes prevent operational distrust. None of these layers alone is sufficient. Together they turn a model into a coding system.

This is also why “best model wins” is no longer an adequate framing. The ideal coding agent in 2026 is a coordination machine. The model remains central, but increasingly as the planner and interpreter inside a larger architecture. If we ask what an ideal agent should look like, the answer is not a giant prompt and a giant context window. It is a restrained prompt, a precise tool surface, a disciplined context pipeline, a robust ReAct loop, selective delegation, and a trustworthy execution boundary. That is the current consensus because every serious system has been forced toward it by reality.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 18 — Architecture of the Ideal Coding Agent
Token Usage: ~7,000 input + ~1,900 output

18.2 Five Defining Tensions

The architecture of a coding agent is not defined by a checklist of features. It is defined by how the system resolves a handful of persistent tensions. These tensions do not disappear as the models improve. In fact, better models often intensify them because stronger autonomy, broader context windows, and richer tool ecosystems make the trade-offs more consequential, not less. Across OpenCode, OMO, and Claude Code, five tensions stand out as the most important: autonomy vs safety, context richness vs context rot, generality vs specialization, simplicity vs capability, and open vs closed.

1. Autonomy vs Safety

Every coding agent wants to be useful without becoming reckless. More autonomy usually means fewer confirmation prompts, longer action chains, and more initiative. But more autonomy also increases the blast radius of mistakes. The wrong rm, the wrong commit, the wrong deploy, or the wrong network call can turn “helpful” into “expensive” very quickly.

OpenCode resolves this tension by leaning toward developer control. It provides a programmable host and a flexible tool model, but historically places more burden on the operator and extension author to decide what should be allowed. This makes it excellent for experimentation and powerful customization, but it means safety depends more on surrounding usage patterns and less on strongly opinionated product enforcement.

OMO resolves the tension through orchestration discipline rather than through a deep security substrate alone. It introduces more autonomous behaviors, including background work, specialized agents, and workflow-level continuation, but counterbalances this with stronger prompt rules, todo discipline, guard hooks, and structured execution patterns. In other words, OMO often tries to make the agent behave better by improving the orchestration logic above the core host.

Claude Code resolves this tension most aggressively through productized safety infrastructure: permission systems, classifier-assisted reductions in human interruption, explicit approval semantics, policy layers, and increasingly sandbox-oriented execution. Its lesson is that if safety is embedded into runtime architecture, autonomy can be increased with less fear. It does not treat safety as the opposite of autonomy. It treats safety as the condition that makes higher autonomy deployable.

2. Context Richness vs Context Rot

A coding agent needs context. It needs user intent, repository state, tool outputs, prior decisions, conventions, and sometimes even historical work. Yet the more context it carries, the greater the chance that the context becomes stale, contradictory, redundant, or simply noisy. Richness helps understanding; rot destroys precision.

OpenCode teaches the foundational discipline here: context should be structured and managed, not merely appended. Its session and compaction logic recognize that long-running conversations require state management, not just bigger windows. But OpenCode’s orientation remains relatively general-purpose, so the policy for what should stay resident versus be compacted is not always as opinionated as a specialized orchestration layer might prefer.

OMO pushes harder on active context shaping. Its hooks, rules injection, continuation logic, and orchestration prompts treat context as something to curate dynamically. It is especially strong in understanding that the right context is not always “more context.” A delegated agent often performs better with a narrower prompt and a smaller task-specific slice of state. OMO therefore resolves this tension by using selective injection and task partitioning to keep context purposeful.

Claude Code resolves it through memory systems, auto-compaction strategies, and a more explicit distinction between persistent memory and transient task state. It treats context overflow as an expected operational problem. Rather than hoping the model can sort everything out, the product compacts, summarizes, and re-anchors. This is a more industrial answer: context management is a subsystem, not a habit.

3. Generality vs Specialization

A general coding agent is easy to explain: one assistant, many tasks. A specialized agent system is harder to explain but often easier to scale: one agent explores, another reviews, another researches, another plans. The tension lies in whether the architecture should optimize for broad adaptability or focused role performance.

OpenCode leans general. It gives developers a broad substrate for building agents, tools, plugins, and interfaces. The host itself is not deeply committed to one worldview about specialization. This is a strength because it preserves programmability. But by itself it does not force a specialized orchestration strategy.

OMO leans specialized, but importantly not through rigid hardcoding alone. Its real contribution is demonstrating that specialization can be prompt-defined, tool-scoped, and workflow-aware. It builds a taxonomy of roles and uses orchestration as the mechanism for deciding when specialization is worth the cost. This is a more mature answer than simply adding many agents. It says specialization should exist where it reduces cognitive collision.

Claude Code sits between the two. Its dominant experience is still a unified product agent, but it increasingly supports specialized modes, task delegation, and surrounding structures that let narrower roles emerge when useful. It does not foreground a theatrical swarm identity, but it accepts specialization where it materially improves outcome quality or task throughput.

4. Simplicity vs Capability

Every new feature makes the system more capable and potentially harder to reason about. Every simplification makes it easier to operate and potentially less powerful. The trap is obvious in both directions. Too simple, and the system cannot support serious workflows. Too capable, and it becomes a maze of overlapping tools, prompts, hooks, and modes.

OpenCode is notable because it chooses a relatively clean host architecture while still exposing significant extensibility. Its simplicity is architectural rather than minimalist in absolute feature count. It tries to keep the substrate legible. This is why it is such a strong foundation: capability can be added without making the core unintelligible.

OMO embraces more complexity because it is trying to solve orchestration problems that simpler systems leave to the user. But its best insight is that capability should be added in structured layers: hooks, skills, categories, background tasks, continuation systems, embedded MCPs. The complexity is real, but it is organized around specific orchestration functions. When OMO works well, it shows that complexity can be justified if it creates leverage rather than confusion.

Claude Code attempts to hide complexity behind product polish. Internally it may have many tools, classifiers, compaction strategies, modes, and command systems, but the user experience aims to remain coherent. This is the commercial answer: capability may expand, but the interface should still feel simple. The cost is that some internal power becomes less visible or less user-programmable.

5. Open vs Closed

This is the deepest strategic tension. Open systems maximize inspection, modification, and community experimentation. Closed systems maximize consistency, controlled safety, and integrated product quality. Both produce real value. Neither is universally superior.

OpenCode resolves this tension strongly on the open side. Its identity is inseparable from programmability, extension, and inspectability. This makes it ideal for researchers, tool builders, and developers who want to treat the agent as a platform. The trade-off is that open systems inherit more integration burden and less centralized quality control.

OMO is also open, but in a layered way. It uses openness not just to expose a base system, but to construct a meta-system on top of it. In this sense, OMO demonstrates a second-order argument for openness: open platforms do not merely allow customization; they allow entirely new orchestration philosophies to be built without forking the host.

Claude Code resolves the tension on the more closed side, though not absolutely closed. It exposes extension surfaces, but under product boundaries. The advantage is reliability, safety integration, and coherent user experience. The trade-off is that some forms of experimentation remain impossible or require official support. Its architecture says, in effect, that not every powerful mechanism should be user-accessible if product integrity would suffer.

The Comparative Pattern

What matters is not which system “wins” each tension in the abstract. What matters is that each system reveals a valid design stance.

OpenCode says: preserve an understandable and programmable foundation.

OMO says: use that foundation to push orchestration sophistication where it most improves task execution.

Claude Code says: if you want mass adoption and higher-trust autonomy, you must invest heavily in runtime safety, context handling, and polished control surfaces.

The ideal coding agent for 2026 is therefore not a copy of any one of the three. It is a synthesis. It should inherit OpenCode’s openness where programmability matters, OMO’s orchestration intelligence where task decomposition matters, and Claude Code’s safety-first runtime where real-world autonomy matters. The defining tensions remain, but the best systems no longer pretend they can eliminate them. They resolve them deliberately, layer by layer, with clear priorities.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 18 — Architecture of the Ideal Coding Agent
Token Usage: ~6,000 input + ~1,500 output

18.3 Lessons from Three Systems

When we compare OpenCode, OMO, and Claude Code deeply enough, three design lessons become hard to ignore. Each lesson is stronger than a product slogan and more useful than a feature list. OpenCode teaches that programmability is non-negotiable. OMO teaches that orchestration quality matters more than raw agent count. Claude Code teaches that safety enables autonomy rather than constraining it. Together, these three ideas form a practical doctrine for building the next generation of coding agents.

OpenCode: Programmability Is Non-Negotiable

The most durable insight from OpenCode is that a coding agent should be a platform, not just a personality. If the agent cannot be extended, inspected, adapted, and embedded into different workflows, then its usefulness will plateau quickly. Coding work is too heterogeneous for one fixed interface and one fixed workflow to satisfy everyone.

Programmability matters at several levels. It matters at the tool level, where developers need to add new capabilities or wrap existing ones. It matters at the interface level, where the same engine may need to power CLI, TUI, IDE, API, or automation flows. It matters at the configuration layer, where teams need to express policy, defaults, memory, and workflow conventions. And it matters at the architectural level, where other systems may want to build on top of the agent rather than merely use it interactively.

OpenCode’s contribution is not just that it is open source. Plenty of open-source systems are still rigid. Its contribution is that it is structured in a way that invites composition. That is why it can function as an engine for things larger than itself. Once we accept this, a design principle follows: never build a coding agent whose useful behavior cannot be programmatically shaped. If a team cannot bend the agent to its own repository practices, runtime policies, or product surface, the agent remains a demo with good marketing.

OMO: Orchestration Quality Beats Agent Count

OMO’s most important lesson is a correction to the simplistic multi-agent hype cycle. The breakthrough is not “more agents.” The breakthrough is better orchestration. A badly orchestrated swarm is usually worse than one disciplined agent. It costs more, takes longer, duplicates work, pollutes context, and makes failures harder to localize.

OMO earns its importance because it treats orchestration as a first-class design problem. It asks which agent should do what, under which prompt, with which tools, under which continuation rules, and with which transfer of findings back to the parent workflow. It recognizes that delegation is valuable only when scope is well chosen and result integration is well controlled.

This lesson generalizes far beyond OMO itself. Whenever builders discuss sub-agents, the right questions are not “how many?” or “what cool roles can we invent?” The right questions are: what work benefits from isolation, what work benefits from parallelism, what work benefits from specialization, and how should findings be reintegrated? Agent count is a surface metric. Orchestration quality is the real performance variable.

An ideal coding system therefore uses delegation sparingly but powerfully. It spawns a sub-agent because the task boundary is real, not because the architecture wants to appear sophisticated. This is why OMO matters. It shows that orchestration is a systems problem, not a branding trick.

Claude Code: Safety Enables Autonomy

Claude Code’s deepest contribution is architectural rather than rhetorical. It demonstrates that safety mechanisms should not be viewed as obstacles placed in front of autonomy. When designed properly, they are the very things that make autonomy acceptable at scale.

Without good safety structure, every gain in initiative creates anxiety. Users hesitate to trust the agent with filesystem changes, network access, long execution chains, or multi-step workflows because the downside risk feels unbounded. In that environment, autonomy becomes politically fragile. Teams either disable it, heavily gate it, or use it only for low-stakes tasks.

Claude Code points toward a better answer: move safety down into the runtime and policy layers. Use permissions, approvals, classifiers, compaction, auditability, and especially sandboxing so that the agent can operate with meaningful freedom inside meaningful constraints. This reframes safety from “friction” to “infrastructure.” Once that shift happens, autonomy can rise without producing the same level of institutional fear.

This is a very important lesson for open systems as well. If they want to match or exceed commercial systems in real-world adoption, they cannot rely only on openness and clever prompting. They need trustworthy execution boundaries. Otherwise the architecture will remain attractive to enthusiasts but difficult to deploy in more demanding environments.

The Combined Doctrine

Taken together, the three lessons point toward a coherent doctrine for ideal coding-agent design.

First, make the system programmable. If it cannot be extended and recomposed, it will not survive contact with real engineering diversity.

Second, make orchestration intelligent. If you add delegation, do it because task structure demands it, not because multi-agent branding sounds advanced.

Third, make safety infrastructural. If you want real autonomy, you must create bounded environments in which that autonomy can operate safely.

These are mutually reinforcing. Programmability without safety can become dangerous. Safety without programmability can become rigid. Orchestration without either can become expensive theater. But when the three are combined, the result is a system that is adaptable, powerful, and trusted.

That is the real legacy of the three systems. OpenCode gives us the platform instinct. OMO gives us the orchestration instinct. Claude Code gives us the deployment instinct. A builder who learns from all three will stop asking whether the future belongs to open source or commercial products, single agents or multi-agents, prompts or tools. The future belongs to architectures that can be shaped, coordinated, and trusted.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 19 — The Art of Tool Design
Token Usage: ~6,400 input + ~1,700 output

19.1 ACI Principles

If the twentieth century taught software builders to invest in HCI, the twenty-first century will require them to invest just as seriously in ACI: Agent-Computer Interface. HCI asks how a human understands and operates a system. ACI asks how a model understands and operates a tool environment. These are related problems, but they are not identical. A tool that looks straightforward to a human engineer may still be deeply confusing to an agent, especially when multiple tools overlap, descriptions are vague, argument schemas are inconsistent, or outputs are harder to interpret than the raw task itself.

The practical design rule is simple: invest as much in ACI as in HCI. If you would carefully label a button, design a clear menu, and prevent a human from making obvious mistakes, you should do the same for an agent-facing tool surface. Tool names, descriptions, parameter schemas, side-effect warnings, and examples are not secondary metadata. They are the agent’s operating interface.

This is why tool descriptions should be written like docstrings for junior developers. That phrase is useful because it prevents two common failures. First, it prevents writing descriptions that are so short they are useless, such as “Runs command” or “Reads file.” Second, it prevents writing descriptions that are so dense and abstract that they assume hidden expertise. A good docstring for a junior developer explains what the tool does, when to use it, when not to use it, and what pitfalls matter. The same is true for agent tools.

Consider the difference between two descriptions:

“Searches files.”
“Search file contents using regex. Prefer this over manual file reading when you need to locate text patterns across many files. Returns matching lines or filenames. Do not use for filename discovery.”

The first is technically true and operationally weak. The second is longer but strategically stronger because it improves tool selection. The cost of a few extra tokens in the description is tiny compared with the cost of repeated wrong-tool decisions during execution.

ACI design therefore starts with clarity of affordance. In HCI, an affordance is a design cue that suggests how an object should be used. In ACI, the affordance is textual and structural. The agent needs to infer from the tool signature not only capability but intent. A good tool definition answers four questions immediately:

What problem does this tool solve?
What input shape does it expect?
What are its side effects or constraints?
What nearby tools should be preferred or avoided in similar situations?

The fourth question is particularly important because most tool confusion is not about whether a capability exists. It is about choosing between partially overlapping capabilities. If the system exposes glob, grep, read, and lsp_symbols, then the agent must understand not merely each tool in isolation, but the boundaries between them. Good ACI makes these boundaries explicit.

This is where poka-yoke design becomes highly relevant. Poka-yoke is a manufacturing and quality-engineering term, often translated as “mistake-proofing.” The core idea is to design the system so that common errors are hard or impossible to make. For agent tools, poka-yoke means building descriptions, schemas, defaults, and runtime checks that gently force correct usage patterns.

Examples include:

making workdir explicit so shell tools do not encourage fragile cd && ... patterns;
warning the agent not to use bash for file reads when a safer dedicated file tool exists;
requiring strongly typed parameters rather than freeform blobs when precision matters;
annotating destructive tools with clear guardrails;
constraining tool outputs to useful sizes so the agent does not drown in noise;
documenting recommended alternatives directly in the tool help text.

The ideal agent tool is not merely capable. It is shaped so that the likely first choice is also the correct choice. That is pure poka-yoke.

Another essential ACI principle is semantic consistency. If one tool uses filePath, another uses path, another uses filename, and a fourth uses target, the agent must continuously translate between similar concepts with different names. Humans tolerate this kind of inconsistency surprisingly well. Models are more brittle. Consistent naming conventions reduce cognitive switching cost and improve tool reliability. The same is true for return shapes. If one search tool returns raw content, another returns filenames, and another returns opaque JSON without strong descriptive framing, the agent will spend tokens just normalizing understanding.

ACI also benefits from progressive disclosure. Not every tool description needs to include everything. But the critical path should be easy to parse. Put the primary use case first. Put strong “do/don’t” guidance near the top. Reserve edge cases and rare notes for lower in the description. This mirrors good API documentation for humans, but is arguably even more important for agents because tool selection often happens under token pressure and within an active execution loop.

OpenCode contributes an important lesson here through its programmable and typed approach to tool definition. A clean substrate encourages coherent tool registration. OMO contributes the lesson that ACI is not just about basic utilities; orchestration tools also need sharp descriptions or the agent will misuse delegation. Claude Code contributes the lesson that product-grade tool design includes strong routing hints and anti-footgun guidance directly in the tool contract.

Taken together, these systems suggest that tool builders should maintain an ACI checklist. For every tool, ask:

Is the name action-oriented and specific?
Does the description explain when to use it?
Does it explicitly say when not to use it?
Are arguments typed and named consistently with the rest of the system?
Are side effects and safety boundaries clear?
Is there overlap with another tool, and if so, is the distinction documented?
Does the output shape minimize post-processing burden?

The broader point is strategic. Agents do not “just figure it out” from raw capability. They operate through interfaces. A poor ACI layer silently taxes every task: more wrong tool picks, more corrections, more retries, more token waste, more brittle workflows. A strong ACI layer quietly improves everything: faster planning, cleaner execution, lower error rate, and more predictable autonomy.

In human-computer history, interface quality often separated useful systems from frustrating ones. The same is now true for agent systems. Tool design is not plumbing. It is cognition infrastructure. If you want reliable coding agents, write tool contracts with the same care that great software teams once reserved for user interfaces. That is what it means to take ACI seriously.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 19 — The Art of Tool Design
Token Usage: ~6,300 input + ~1,650 output

19.2 Minimal Tool Set

One of the easiest mistakes in coding-agent design is to assume that more tools automatically mean more power. In practice, the opposite is often true. A bloated tool surface increases selection ambiguity, duplicates capabilities, complicates descriptions, and raises the planning burden on the model. The right principle is ruthless but useful: if a human cannot confidently choose which tool to use, an agent probably cannot either.

This is not because agents are weak. It is because tool selection is itself a cognitive task. When several tools overlap heavily, the model must spend tokens deciding among them before it can even begin solving the actual user problem. If those distinctions are poorly drawn, performance degrades quietly. The agent may still complete the task, but with more hesitation, more wrong turns, and more corrective loops.

That is why a strong tool ecosystem usually starts with a minimal tool set. “Minimal” does not mean tiny in absolute count. It means minimal overlap. Each tool should justify its existence by covering a distinct operational need, not by providing a stylistic variation of an existing capability.

Imagine a system with six ways to discover information in code: a shell search, a filename globber, a regex content searcher, a file reader, an AST matcher, and an LSP query. This can be excellent if each tool owns a clear slice of the problem space. It becomes terrible if their boundaries are muddy. The system should make it obvious that filename discovery belongs to glob, content search belongs to grep, semantic symbol lookup belongs to LSP, and syntax-tree pattern matching belongs to AST search. If those distinctions blur, the tool set may be larger but the effective capability is lower because the agent’s decision cost rises.

This is where the minimal tool set principle becomes operational. When designing or reviewing a tool inventory, ask three questions.

First, what distinct task does this tool uniquely enable or substantially improve?

Second, if this tool did not exist, what would the agent use instead, and would that fallback be materially worse?

Third, does the addition of this tool make the overall system easier to use or harder to reason about?

The third question is the one teams most often skip. A new tool may be individually useful and still systemically harmful if it introduces ambiguity. For example, a second file search tool that is slightly faster but semantically similar to the first may not be worth adding unless the distinction is dramatic and well documented.

The principle “start small, grow by need” follows naturally. Early in a system’s life, builders should resist the temptation to expose every internal helper as a first-class tool. Instead, begin with a small set of high-frequency, high-leverage tools: read, search, edit, run commands, inspect symbols, maybe fetch web content, maybe delegate a task. Then observe actual usage. Where do agents struggle? What transformations do they repeatedly simulate mentally that a tool could perform deterministically? Which tasks are expensive because the model is doing procedural work that the host could do faster and more reliably? Add tools only where recurring pain justifies new surface area.

OpenCode’s lesson here is that a good substrate can host many tools, but the existence of extensibility does not imply that all possible tools should be exposed by default. OMO adds a subtle insight: some of the most valuable tools are not end-user utilities but orchestration primitives. A task-delegation tool can create more leverage than five narrowly specialized low-level helpers because it changes how work is organized. Claude Code’s broader tool inventory shows both the upside and the tension: many tools can be justified in a mature product, but only if routing guidance and interface clarity are correspondingly strong.

Minimality also improves safety. Fewer overlapping destructive paths mean fewer ways for the agent to make the wrong move. If file modification can happen through a single clearly preferred mechanism, the host can wrap that mechanism with stronger validation, better previews, and more consistent auditability. When the same side effect is scattered across many interfaces, enforcing safety becomes harder.

Another advantage is compositional clarity. Small, distinct tools tend to combine better than medium-sized overlapping tools. A clean stack might look like this: locate candidates with glob, search content with grep, read the chosen file with read, navigate semantically with LSP, then edit with a structured patch tool. Each step has a clear role. By contrast, a bloated system may offer hybrid tools that do several of these at once, making it harder for the model to build predictable workflows.

There is also a maintenance argument. Every additional tool has ongoing cost: documentation, testing, versioning, safety review, interaction effects, prompt references, and support burden. Tool count therefore should be treated like API surface area in a public platform. Once exposed, it becomes part of the agent’s mental world and the platform’s compatibility burden.

The strongest version of the principle can be stated plainly: a tool should earn its place by reducing total system cognition, not by increasing theoretical capability alone. If it adds capability but increases ambiguity more than it reduces work, it is a net loss.

That does not mean all mature systems should have the same tool count. Different products sit at different points on the spectrum. But the discipline should remain the same. Prefer fewer, sharper tools over many fuzzy ones. Grow from observed need. Document boundaries aggressively. Remove or hide weak overlaps where possible.

In agent design, a minimal tool set is not asceticism. It is respect for the finite planning bandwidth of the model. The best systems do not merely ask “what can we expose?” They ask “what should this agent have to think about at all?” The more of that answer you can eliminate through careful tool curation, the more reliable the agent becomes.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 19 — The Art of Tool Design
Token Usage: ~6,500 input + ~1,750 output

19.3 Cognitive Cost of Tool Output

Most discussions of tool design focus on capability selection: what tools should the agent have? An equally important question is often neglected: what cognitive work does the tool output force the model to do? In agent systems, output shape is not a cosmetic concern. It directly affects reliability, latency, and token efficiency.

Here is the central principle: operations should be routed to the cheapest reliable executor. If a deterministic algorithm can do the work, the host should do it. If a database query can do it, use the query. If a sorted list can be returned already sorted, return it sorted. Do not hand the model a raw pile of material and make it simulate basic computation at inference time unless interpretation genuinely requires model judgment.

The classic example is sorting. Asking an LLM to sort a list is expensive and less reliable than running a sorting algorithm. The machine already knows how to do that perfectly. The model should receive the list in the right order, not waste tokens producing order. This sounds obvious when stated that bluntly, yet tool ecosystems frequently violate the principle in subtler ways.

A search tool may dump hundreds of raw matches when the real need is the top ten grouped by file. A directory reader may return entries in arbitrary order even though sorted output would improve stability. A diagnostics tool may return unstructured text when severity-grouped structured JSON would make prioritization easier. A diff tool may emit noise that the host could have filtered. In all these cases, the system is charging the model for work the computer could do more cheaply and accurately.

This is the hidden cognitive cost of tool output. Every token the model spends cleaning, deduplicating, sorting, grouping, filtering, or normalizing machine-produced data is a token not spent on higher-level reasoning. Worse, these low-level transformations are exactly where models can become inconsistent. The result is a double penalty: higher cost and lower reliability.

The tool designer should therefore ask, for every output: what part of this information is useful signal, and what part is mechanical burden? Useful signal supports decisions. Mechanical burden is structure work the host could have performed already.

A strong output design often has four traits.

First, it is pre-structured. Information comes back in a shape aligned with the likely next decision. For example, a search tool can expose content, files_with_matches, and count modes. This lets the agent choose the right abstraction level instead of always receiving maximal raw detail.

Second, it is bounded. Outputs should not explode by default. If a result may be huge, the system should support limits, paging, or summarized views plus a path to fetch more. This keeps the agent from drowning in irrelevant detail.

Third, it is stable. Ordering should be deterministic where possible. Field names should remain consistent. Repeated calls on unchanged state should look similar. Stability reduces confusion and makes reasoning chains more reproducible.

Fourth, it is purpose-aware. A tool should return what is useful for action, not merely whatever is easy for the underlying library to emit. A file reader with line numbers is more actionable than a raw blob. A diagnostics tool grouped by severity is more actionable than an undifferentiated dump.

This principle also tells us when not to use the model. Some operations belong to procedural execution, not inference. Examples include sorting, exact counting, schema validation, regular expression filtering, topological traversal, AST transforms, permission checks, and deterministic merges of structured state. The host runtime, not the LLM, should own these whenever possible.

The best coding-agent architectures increasingly route work across three executor classes:

Deterministic executor: algorithms, parsers, sorters, validators, structured transforms.
Retrieval executor: search engines, indexes, databases, LSPs, MCP servers.
Model executor: interpretation, prioritization, synthesis, planning, explanation, ambiguity resolution.

Trouble begins when the model is forced to impersonate the first two classes. That is often what people mean when they say an agent feels “dumb” or “wasteful.” The model is not failing at intelligence. The architecture is assigning it the wrong labor.

OpenCode’s clean separation of tool responsibilities already points in this direction. OMO reinforces it by adding orchestration tools that move work between executors more intentionally. Claude Code’s more mature product surface shows the value of aggressively shaping output so that the model spends less effort on procedural cleanup and more on software-engineering judgment.

There is also a token-economics dimension. Rich tool ecosystems can be undermined if every call returns verbose text that the model must repeatedly scan. A tool that does “less” but returns exactly the right summary can be more valuable than a “powerful” tool that emits raw exhaust. In agent systems, output precision often matters more than output volume.

An important corollary follows: design tools around decision points, not around backend convenience. If the next model step is “which file should I read?”, then search output should help answer that. If the next step is “what should I fix first?”, then diagnostics should foreground severity and location. If the next step is “is this safe to run?”, then the tool or runtime should attach relevant policy signals. Tool outputs should prepare the next cognitive move.

The future of agent tooling will likely look less like a bag of shell wrappers and more like a carefully staged cognitive pipeline. Retrieval and transformation layers will precompute cheap structure. The model will spend its budget where judgment is actually needed. That is the essence of good tool-output design.

So when evaluating a tool, do not stop at “can it get the information?” Ask the more important question: “what work does its output force the model to do next?” If the answer is lots of mechanical cleanup, the tool is underdesigned. The best tools do not merely return data. They return decision-ready state.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 19 — The Art of Tool Design
Token Usage: ~6,600 input + ~1,750 output

19.4 Tool Composition vs Bloat

One of the clearest contrasts in coding-agent design is the difference between a tool ecosystem that supports composition and one that drifts into bloat. This contrast is not captured by tool count alone, but tool count does provide a revealing starting point. A system with 61 tools, such as Claude Code in one snapshot, may be highly capable. A system with 26 tools, such as OMO in one snapshot, may also be highly capable. The real question is not “which number is bigger?” It is “how much useful work can the agent perform through clean combinations of tools without increasing planning confusion?”

Composition means the tool set behaves like a language with a small number of clear primitives that combine well. Bloat means the tool set behaves like a crowded menu full of overlapping dishes whose distinctions are difficult to remember. In the first case, adding one more tool can create leverage across many workflows. In the second case, adding one more tool mostly increases routing burden.

The most interesting example from OMO is the task tool as orchestration primitive. This is not just another utility in the inventory. It changes the problem-solving grammar of the system. Instead of solving everything in one context, the agent can decide to create a sub-task, assign it to a specialized role, let it run in the background, and later retrieve the result. One tool therefore opens an entire class of workflows: scoped exploration, parallel research, isolated verification, and staged synthesis.

That is composition at a high level. The tool’s value is not merely what it does directly, but what larger structures it enables in combination with memory, prompts, and other utilities. By contrast, a bloated tool is usually one whose primary effect is local. It duplicates a neighboring capability, requires extra explanation, and rarely unlocks new patterns.

Claude Code’s broader inventory illustrates both the power and the danger of expansion. A large product serving many workflows naturally accumulates more tools: browser interaction, task management, search, diagnostics, memory, system operations, integrations, and more. This can be justified if the system also invests in routing cues, descriptions, permissions, and output shaping. In that case, the larger inventory is not necessarily bloat. It may simply reflect product maturity. But the burden of proof rises with each added tool. A 61-tool system that lacks sharp boundaries will often perform worse than a 26-tool system with stronger composition.

OpenCode’s lesson is foundational: a good host architecture should make composition easy. Typed registration, coherent namespaces, and clean execution semantics allow higher-level systems to build compositional workflows on top. Without such a substrate, tool ecosystems tend either toward rigidity or accidental sprawl.

There are several practical signals that a tool surface is compositional rather than bloated.

First, workflows can be expressed as short chains of distinct tools with obvious handoffs. Search leads to read. Read leads to edit. Edit leads to diagnostics. Diagnostics lead to test. Task delegation leads to background output retrieval. The transitions feel natural.

Second, tools have low conceptual overlap. The agent rarely hesitates between three nearly identical options.

Third, some tools act as multipliers. They do not merely solve one narrow task; they reshape how other tools can be used together. Task creation, structured patch application, session search, and semantic code navigation often belong in this category.

Fourth, the documentation of the tool set can be understood as a system rather than as 50 unrelated entries. This is subtle but important. A compositional tool inventory has internal logic.

By contrast, tool bloat shows up in predictable ways. The agent repeatedly chooses the wrong adjacent tool. Descriptions need long warnings to distinguish tools that should perhaps never have coexisted. Some tools are used rarely or only because prompts insist on them. Others return outputs so similar that the model must normalize them mentally. The overall feeling is not leverage but clutter.

One useful evaluation method is to think in terms of workflow basis vectors. In linear algebra, a basis is a minimal set of vectors that can generate a larger space through combination. By analogy, a healthy tool set is one where a relatively small number of well-designed primitives can span the major workflows of the product. Search, inspect, modify, verify, delegate, remember, retrieve, and communicate may be enough to cover a very large space if each primitive is strong. Adding more tools should ideally expand the reachable space significantly, not merely offer alternative roads to the same destination.

This is why more is not always better. A new tool must be evaluated not only on standalone usefulness, but on whether it improves the basis. Does it unlock a new class of composition? Does it replace a clumsy multi-step pattern with a clean primitive? Does it reduce cognitive load elsewhere? If not, it may be bloat dressed as progress.

The strategic lesson from OMO is especially valuable: orchestration primitives can dominate raw tool count. A single well-designed task primitive may contribute more to effective capability than many low-level helpers because it changes the architecture of action. Claude Code reminds us that large inventories can still work if the system invests proportionally in product-quality ACI, permissions, and output shaping. OpenCode reminds us that composition depends on substrate quality.

In short, the goal of tool design is not maximal inventory. It is maximal leverage per unit of cognitive surface. Composition increases leverage. Bloat increases friction. The best coding-agent platforms know the difference, and they treat tool count as a consequence of design quality, not a badge of sophistication.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 20 — Context Engineering
Token Usage: ~6,300 input + ~1,700 output

20.1 From Prompt to Context Engineering

For a while, “prompt engineering” was the dominant phrase for practical LLM work. That made sense in the early era, when the most visible leverage came from phrasing instructions well. If you asked clearly, specified format, and added a few examples, performance improved. But coding agents exposed the limits of that framing. A serious agent does not operate on one static prompt. It operates on a continuously changing token state composed of system instructions, tool schemas, user requests, notes, retrieved files, prior tool outputs, memory, summaries, and hidden runtime scaffolding. Once that becomes clear, the center of gravity shifts. The real discipline is no longer prompt engineering. It is context engineering.

Prompt engineering is about writing instructions. Context engineering is about curating the entire state the model sees at the moment of action. That difference is more than semantic. It changes what builders optimize.

Under the prompt-engineering mindset, teams ask questions like: what wording works best, how many examples should we include, how should we phrase the role, what sentence makes the model more careful? These questions still matter, but they are only one slice of the real problem.

Under the context-engineering mindset, teams ask different questions: what should be present right now, what should be omitted, what should be summarized, what should be retrieved on demand, what belongs in persistent memory, what belongs in transient working state, how should tool outputs be shaped, how do we recover from overflow, and how do we keep context from rotting over long sessions?

That is a much larger systems problem.

Coding agents force this transition because their failures are rarely caused by bad instruction wording alone. More often they fail because the active context is wrong. They are looking at the wrong files, carrying stale assumptions, overloaded with irrelevant tool output, missing one critical repository rule, or burdened by a transcript that has become too long and noisy. In other words, the issue is not usually “the prompt could have been phrased better.” The issue is “the model is standing in the wrong informational environment.”

This is why context engineering should be understood as token-state curation. Every token present in the model input competes for attention. Some tokens sharpen action. Some dilute it. Some are globally important. Others are locally relevant only for one step. Some should remain persistent. Others should be compacted away as soon as they have served their purpose. Designing this flow is now one of the primary architecture jobs in agent systems.

OpenCode points toward this shift by treating sessions, messages, compaction, and instruction layering as formal architectural concerns rather than incidental prompt plumbing. OMO pushes further by dynamically injecting rules, continuation state, and task-scoped context through hooks and orchestration logic. Claude Code productizes the same insight through memory systems, automatic compaction, and explicit differentiation between persistent instructions and ephemeral working state. The vocabulary differs, but the conceptual move is the same: the problem is not one prompt. It is one evolving context state.

There are several practical consequences.

First, the best system prompt is often smaller than people expect. If too much operational detail is stored there, the prompt becomes bloated and brittle. High-altitude principles belong in the system prompt. Time-sensitive, task-local, or retrievable detail belongs elsewhere.

Second, retrieval becomes central. Instead of pinning every possible relevant fact into the prompt, the agent should pull in repository rules, prior decisions, docs, examples, and file slices when needed. This keeps context smaller and more relevant.

Third, summaries become an engineering artifact, not just a convenience. A good compact summary of completed work, remaining uncertainties, and current hypotheses often outperforms replaying a long raw transcript.

Fourth, output shaping matters. Tool outputs are part of context. If tools dump noisy or poorly structured data, the context degrades quickly even when retrieval is correct.

Fifth, memory must be stratified. Not everything deserves the same persistence. Stable user preferences, project conventions, and long-lived repository facts belong in durable memory. Temporary observations, one-off experiments, and transient hypotheses belong in short-lived working context.

The shift from prompt to context engineering also changes evaluation. Instead of only comparing prompt variants, teams should compare context policies. What happens if we inject fewer files? What happens if we summarize tool outputs after each phase? What happens if memory is separated into stable rules and ephemeral notes? What happens if delegation occurs earlier, reducing local context load? These design choices often matter more than small wording tweaks.

There is also a philosophical lesson here. Prompt engineering implicitly imagines the model as a very smart reader of a carefully written instruction sheet. Context engineering imagines the model as a component operating inside a larger information architecture. That second framing is much closer to reality for serious coding agents. It acknowledges that the model’s behavior depends not just on what it is told, but on what information is nearby, what information is absent, what structure surrounds it, and how the surrounding system edits its view over time.

This is why future progress in coding agents will not come only from larger models or better prompts. It will come from better control over what the model sees, when it sees it, and in what form. That is context engineering in its clearest form.

Once you understand that, many best practices fall into place. Minimal system prompts. Just-in-time retrieval. Compaction pipelines. Structured notes. Long-term memory layers. Delegation for context isolation. Overflow recovery. None of these are separate tricks. They are all parts of one discipline: managing token state as carefully as software engineers manage CPU, memory, and network resources.

Prompt engineering taught the industry how to speak to models. Context engineering teaches it how to build environments in which models can think well. For coding agents, that is the more important frontier.

The Full Picture: Prompt Engineering → Context Engineering → Harness Engineering

Appendix Addendum: generated by openai/gpt-5.4
Token Usage for this added section: ~3,500 tokens

The transition from prompt engineering to context engineering is already a major conceptual upgrade. But it is no longer the final frame. A third layer has emerged above both: Harness Engineering. If prompt engineering asks how to phrase instructions, and context engineering asks what information the model should see, harness engineering asks the deeper systems question: what environment should we build so that many mistakes become impossible, unlikely, or cheaply recoverable?

This is why the three terms should be understood as an evolution ladder rather than competing slogans.

Paradigm	Core Question	Time Scale	Determinism
Prompt Engineering	“How do I phrase this?”	Per-prompt	Low
Context Engineering	“What info does the model see?”	Per-session	Medium
Harness Engineering	“What environment prevents mistakes?”	Per-project	High

Prompt engineering operates at the most local layer. It tunes wording, style, examples, and instruction shape for a given turn or template. Context engineering moves up one level. It manages the token-state seen by the model across a session: retrieval, summaries, memory, compaction, tool output shaping, and delegation boundaries. Harness engineering moves up another level again. It treats the agent runtime itself as the design object: system prompts, tools, MCP servers, hooks, sub-agent topologies, validators, permissions, memory rules, and feedback loops.

Why Harness Engineering Subsumes Context Engineering

Harness engineering does not replace context engineering. It subsumes it. Context is one of the dimensions of the harness, but not the whole harness. A model can receive the perfect context and still fail if the tool contracts are weak, permissions are blunt, post-edit verification is absent, or sub-agent isolation is missing. Conversely, a strong harness can often compensate for imperfect context by validating actions, narrowing options, shaping feedback, and forcing self-correction.

This is precisely why the field has moved upward in abstraction. Once teams realized that “the prompt” was too narrow, they adopted context engineering. Once they realized that “the context” was still too narrow, they began describing the broader runtime as a harness.

The Four Harness Dimensions — Plus Hooks

One useful formulation, associated with Viv Trivedy, breaks the harness into four dimensions:

System Prompt — AGENTS.md, CLAUDE.md, skill instructions, policy layers, and runtime behavioral contracts.
Tools / MCPs — built-in tools, custom MCP servers, shell wrappers, repository-specific commands, validators, and operational interfaces.
Context — knowledge documents, retrieved file slices, observability data, browser state, memory layers, summaries, and session compaction.
Sub-agents — specialization, context firewalling, parallel execution, role separation, and task decomposition.

To this, a fifth item is often added by practitioners such as HumanLayer: Hooks. Hooks are event-driven control points that inject deterministic logic into an otherwise probabilistic workflow. They let the system react at session start, before or after tool execution, during compaction, after sampling, or on file changes. Hooks matter because they turn architecture into control flow.

graph TB
    subgraph "Agent Harness"
        SP["📋 System Prompt<br/>AGENTS.md / CLAUDE.md / Skills"]
        T["🔧 Tools & MCPs<br/>Built-in + Custom MCP Servers"]
        CTX["📚 Context<br/>Knowledge Docs / Observability / Browser"]
        SA["🤖 Sub-Agents<br/>Context Firewall / Specialization"]
        HK["⚡ Hooks<br/>Deterministic Control Flow"]
    end
    
    LLM["🧠 LLM Core<br/>(Non-deterministic)"]
    
    SP --> LLM
    T --> LLM
    CTX --> LLM
    LLM --> SA
    HK -->|"intercept & validate"| LLM
    
    style LLM fill:#ff6b6b,color:#fff
    style SP fill:#4a9eff,color:#fff
    style T fill:#51cf66,color:#fff
    style CTX fill:#ffd43b,color:#000
    style SA fill:#9775fa,color:#fff
    style HK fill:#f06595,color:#fff

Three Quotes That Define the Paradigm

Three public statements help capture the shift.

From Mitchell Hashimoto on February 5, 2026: “anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.” This is the operational ethic of harness engineering: repeated failures should be converted into durable constraints.

From OpenAI on February 11, 2026: a team reportedly built a million-line product with 0 hand-written code using harness engineering. Whether one interprets that literally or as an emphasis statement, the implication is unmistakable: with enough runtime structure, verification, and recovery, large-scale software generation becomes feasible as a systems problem.

From HumanLayer: the key addition is that hooks provide event-driven deterministic control flow around the model. This matters because it clarifies that reliability is not only about what the model knows, but also about when the surrounding system intervenes.

Mapping the Harness Dimensions Across OpenCode, OMO, and Claude Code

The most useful way to compare the three systems is to map them across these dimensions.

Harness Dimension	OpenCode	OMO	Claude Code
System Prompt	instructions in config, AGENTS.md	dynamic-agent-prompt-builder, 41 hooks inject context	CLAUDE.md, 4-type memory, managed settings
Tools	~22 tools, Zod schema	26 tools + skill-embedded MCPs	61 tools, ML permission classifier
Context	session compaction, instruction management	preemptive compaction, wisdom accumulation, rules injection	5-layer compaction, memory system
Sub-agents	4 built-in agents, task tool	11 agents, Atlas orchestrator, background spawner	AgentTool, TaskTools, DreamTask, Coordinator
Hooks	5 plugin hook types	41 hooks on 5 tiers	5 hook types (session/compact/sampling/file)

This table is compact, but the implications are deep.

1. System Prompt Layer

In OpenCode, the system-prompt layer is relatively open and modular. Instructions can be assembled through configuration and project files such as AGENTS.md. This makes it attractive for teams that want transparency and control, but it also means some burden falls on the operator to design clean prompt architecture.

In OMO, the prompt layer becomes more engineered. The dynamic-agent-prompt-builder assembles tailored prompts for different agents, while hooks inject additional context or rules at runtime. The result is not just a larger prompt, but a more conditional and operational one. Prompt construction becomes part of orchestration.

In Claude Code, the prompt layer is more managed. CLAUDE.md, multi-type memory, and product-level settings form a curated instruction architecture. The user gets less raw flexibility than in fully open systems, but more vertical integration.

2. Tools and MCPs

OpenCode ships with a compact but serious tool surface—roughly twenty-plus tools, strongly schema-defined, with emphasis on composability. This supports the open-host model.

OMO extends the host with more tools and, crucially, with skill-embedded MCPs. This is a major harness-engineering move because it lets capability bundles travel together: not just instructions, but instructions plus operational interfaces.

Claude Code appears to expose the largest direct tool inventory among the three, along with an ML permission classifier that reduces human interruption. This is a classic example of harness sophistication: not merely adding tools, but adding a control system that decides when the tools can act.

3. Context Layer

OpenCode already treats session compaction and instruction management as architectural concerns, which is why it naturally fits the context-engineering paradigm.

OMO pushes beyond this into proactive orchestration: preemptive compaction, wisdom accumulation, and rules injection all indicate that context is being actively managed, not merely stored. This is an important bridge from context engineering into full harness engineering.

Claude Code productizes context management at a high maturity level through layered memory and sophisticated compaction strategies. Its context system is not just a technical implementation detail; it is a user-facing behavior model.

4. Sub-agents

Sub-agents are where harness design becomes visibly architectural.

OpenCode includes built-in agents and a task mechanism, enough to support decomposition and specialization.

OMO treats sub-agents as a first-class orchestration layer: eleven agents, an Atlas orchestrator, and background spawning. This is where the context firewall principle becomes concrete. Different roles can work with narrower views and better specialization.

Claude Code uses a different vocabulary—AgentTool, TaskTools, DreamTask, Coordinator—but the same underlying idea appears: isolate work, manage parallelism, and offload subtasks into bounded execution contexts.

5. Hooks

Hooks deserve special attention because they are where deterministic control enters the loop.

OpenCode provides five plugin hook types. This already gives builders meaningful interception points.

OMO turns those base points into a much richer control fabric with 41 hooks across 5 tiers. This is an unusually clear example of harness engineering in practice: the host runtime is being instrumented so that policy, context mutation, validation, and workflow guidance can happen at many points in the lifecycle.

Claude Code exposes a smaller but important set of hook types around sessions, compaction, sampling, and files. The surface is more controlled, but it reflects the same architectural principle.

The Key Insight: OMO as HaaS in Practice

The most important comparative insight may be this: OMO is essentially a pre-built harness that turns OpenCode into a highly engineered agent runtime. In other words, it is one of the closest practical expressions of Harness as a Service.

Why does that matter? Because many teams do not fail to build agents because they lack model access. They fail because harness engineering is difficult. It requires good defaults, many small control points, a library of patterns, and the discipline to turn every recurring failure into a reusable mechanism. OMO lowers that barrier by shipping a large amount of harness intelligence in advance.

Claude Code reaches a similar destination through vertical integration. OpenCode provides the base host for custom harness construction. OMO occupies the middle ground: it packages harness sophistication on top of an open host.

From Session-Level Optimization to Project-Level Reliability

The ladder from prompt engineering to context engineering to harness engineering also corresponds to a shift in time horizon.

Prompt engineering optimizes a request.
Context engineering optimizes a session.
Harness engineering optimizes a project runtime.

This is why determinism rises as we move upward. Prompt phrasing is relatively soft. Context policy is stronger but still dynamic. Harness design introduces durable project-level structure: validators, hooks, permissions, skills, sub-agent workflows, and knowledge files. These mechanisms do not merely advise the model. They shape the space of possible behavior.

The Design Consequence

For builders, the practical lesson is straightforward. If an agent fails, the right response is not only to rewrite the prompt. It is to ask at which layer the fix belongs.

If the wording is ambiguous, fix the prompt.
If the model saw the wrong information, fix the context policy.
If the model should never have been allowed to make that class of mistake in the first place, fix the harness.

That final category is where the field is heading. Prompt engineering taught us to write better instructions. Context engineering taught us to curate token state. Harness engineering teaches us to build environments where reliability compounds over time.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 20 — Context Engineering
Token Usage: ~6,500 input + ~1,750 output

20.2 Four Memory Strategies

Context engineering becomes practical only when memory is treated as a set of distinct strategies rather than a single bucket called “history.” In coding agents, four memory strategies matter especially: just-in-time retrieval, compaction, structured notes, and sub-agents. They solve different problems. The best systems know when to use each.

1. Just-in-Time Retrieval

Just-in-time retrieval means the agent fetches relevant material at the moment it becomes useful rather than carrying everything in the prompt from the beginning. This can include files, docs, previous session summaries, rules, diagnostics, symbol definitions, or web references.

Use JIT retrieval when the search space is large and only a subset is likely to matter. Repositories are the obvious example. It is usually wasteful to preload many files “just in case.” Instead, the agent should search first, then read only what the current hypothesis requires. The same principle applies to long documentation sets and historical session data.

JIT retrieval is best when the agent does not yet know exactly what will be relevant. It preserves context budget and reduces noise. However, it can be slower if overused naively because repeated retrieval incurs extra steps. That is why retrieval quality and routing matter. Good search tools and semantic lookup tools make JIT practical.

OpenCode benefits from this strategy through its file, search, and session structure. OMO extends it by using hooks and orchestration logic to inject rules or task-specific context only when relevant. Claude Code uses similar logic in memory and retrieval workflows even when the user experiences it as seamless product behavior.

2. Compaction

Compaction is the controlled compression of context when the active transcript becomes too large or too noisy to carry forward directly. This usually means replacing a long interaction history with a summary that preserves goals, completed actions, important findings, unresolved questions, and next steps.

Use compaction when the agent already learned what it needed from prior exchanges and raw replay is no longer the best use of tokens. Long-running coding sessions almost always require it. Without compaction, the model eventually spends too much effort rereading its own past.

Good compaction is selective. It should preserve commitments, decisions, and actionable findings while dropping verbosity and dead ends. Poor compaction creates brittle summaries that omit crucial constraints. That is why compaction is not trivial summarization; it is state preservation under token pressure.

Claude Code’s auto-compaction systems illustrate the industrial version of this idea. OpenCode’s session compaction logic shows it at the framework level. OMO’s continuation and hook-driven state management show that compaction can also be shaped by workflow rules rather than only by window size.

3. Structured Notes

Structured notes are explicit state objects or disciplined summaries that track what matters now: task goal, current plan, discovered facts, files of interest, hypotheses, unresolved risks, and remaining steps. Unlike raw transcript history, structured notes are designed for machine consumption.

Use structured notes when the task spans many steps, when intermediate findings need to persist cleanly, or when human and agent both benefit from a stable checkpoint. They are especially useful in coding because exploration often yields facts that are small in size but critical in value: “build fails only on macOS,” “symbol renamed in migration layer,” “tests green except snapshot update,” and so on.

Structured notes reduce the need to infer state from chat history. They turn implicit memory into explicit working memory. OMO’s todo and continuation discipline is closely aligned with this strategy. Claude Code’s memory system also points in this direction, though often with stronger product abstraction. OpenCode’s flexible architecture makes it possible for extensions to layer these note structures on top.

4. Sub-Agents as Memory Strategy

Sub-agents are usually discussed as an orchestration strategy, but they are equally a memory strategy. A delegated sub-agent creates a new context partition. Instead of loading more and more material into one monolithic prompt, the system spawns a narrower agent with a narrower goal and a cleaner state.

Use sub-agents when scope isolation matters. This includes parallel exploration, isolated documentation research, hypothesis verification, or tasks that would otherwise pollute the main context with too much local detail. Delegation is especially useful when different tasks require different tool emphasis or role framing.

Sub-agents are not free. They incur setup cost, summary cost, and orchestration complexity. Use them when their isolation benefit exceeds that cost. OMO is the strongest demonstration of this strategy. Claude Code uses it more selectively. OpenCode provides the foundation that allows such patterns to exist.

When to Use Which

These four strategies are complementary.

Use JIT retrieval when the agent needs facts but does not yet know which facts.

Use compaction when the agent already learned useful things and needs to keep moving without carrying full history.

Use structured notes when intermediate state must remain explicit and durable across many steps.

Use sub-agents when a task deserves a fresh local context instead of one more addition to the parent context.

The best systems chain these together. A parent agent retrieves relevant information, records stable findings in notes, compacts older history when needed, and delegates scoped work to sub-agents where isolation or parallelism improves quality. This is not redundancy. It is layered memory architecture.

The underlying lesson is that memory in coding agents is not one thing. It is an ensemble of strategies for deciding what should be loaded, retained, condensed, externalized, or isolated. Builders who reduce memory to “bigger context windows” miss the point. Better windows help, but they do not remove the need for disciplined memory design.

In practice, the strongest coding agents are those that remember selectively, not those that remember indiscriminately. These four strategies are how that selectivity becomes architecture.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 20 — Context Engineering
Token Usage: ~6,400 input + ~1,700 output

20.3 Automatic Context Injection

One of the most powerful and dangerous ideas in agent architecture is automatic context injection. The idea is straightforward: instead of forcing the user or the model to manually restate every relevant rule, memory, or repository convention, the host system injects the right context automatically at the right time. When done well, this dramatically improves reliability. When done poorly, it creates invisible prompt bloat and hidden behavioral surprises.

Automatic context injection matters because coding agents rarely operate in a vacuum. They work inside projects with conventions, rules, memory files, safety constraints, style preferences, and long-lived workflows. If those constraints must be manually restated in every task, the system becomes tedious and brittle. If they are injected intelligently, the agent can behave consistently without requiring constant repetition.

OMO provides one of the clearest examples through its rules injector and broader hook system. The architecture assumes that some instructions should be attached dynamically based on task state, tool use, or workflow phase. This is important because not all rules belong in the static system prompt. Some are situational. Some are project-local. Some are only relevant before a certain class of tool call. OMO’s injection model therefore treats context as an active stream assembled at runtime.

Claude Code illustrates a different version through its memory system. Persistent guidance can live outside the immediate task transcript and be surfaced when relevant. The user experiences this as continuity: the system remembers preferences, project conventions, or standing instructions. Under the hood, this is still context injection. The host decides which durable memory should become active context for the current turn.

OpenCode, by contrast, highlights instruction management as a compositional framework concern. Because its architecture is extensible, instruction assembly can itself become programmable. This is strategically important. It means context injection is not merely a product trick; it can be an exposed architectural pattern.

The strongest case for automatic injection is consistency. Project-wide rules such as “run diagnostics before claiming completion,” “avoid editing generated files,” “prefer structured patch tools over ad hoc shell editing,” or “explain nonstandard terms” should not depend on the user remembering to repeat them. Nor should the model have to rediscover them from prior conversation every time. Injection ensures the rules are nearby when action occurs.

Another advantage is locality. Some instructions are only useful in specific moments. A pre-tool injection may remind the agent not to use shell for file reads. A pre-commit injection may restate git safety rules. A context-transform hook may insert a compact note about ongoing task constraints. Local injection is often better than global prompt stuffing because it puts instructions near the decision point that needs them.

However, the dangers are real.

The first danger is invisible bloat. If the system keeps injecting helpful fragments without strong discipline, the effective context grows quietly until it becomes cluttered. Because the user does not directly see all injected material, diagnosing degraded performance becomes harder.

The second danger is instruction collision. Automatically injected project rules, user memory, task-local notes, and system-level constraints can overlap or conflict. Without careful precedence rules, the model may receive mixed signals.

The third danger is opacity. If users do not understand why the agent behaved a certain way, trust can erode. They may not realize that an injected rule or memory item influenced the result.

This means good automatic context injection needs design discipline.

First, injected context should be selective, not indiscriminate. Inject only what is relevant to the current task or decision point.

Second, it should have clear precedence. The system must define how project rules, user memory, system policy, and task-local instructions interact.

Third, it should be compact. If a rule can be expressed in one crisp sentence, it should not be expanded into a long essay every time.

Fourth, it should be observable enough. Users and developers should have some way to understand what context layers are active, at least for debugging and trust.

Fifth, injection should prefer runtime assembly over prompt inflation. The whole point is to avoid one giant static prompt.

The comparative lesson from the three systems is that automatic context injection is no longer optional in advanced agent design. OMO shows how hooks can make injection deeply dynamic. Claude Code shows how memory systems can make it persistent and ergonomic. OpenCode shows that the assembly pipeline itself can be architected as an extension surface.

In the long run, the best agents will likely behave less like chatbots with one big instruction block and more like runtime-assembled systems whose active context is tailored continuously. But that future only works if injection remains disciplined. Otherwise “automatic context” becomes just another name for hidden prompt sprawl.

The goal is not to give the model more words. It is to give it the right words at the right time. That is what automatic context injection should achieve.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 20 — Context Engineering
Token Usage: ~6,500 input + ~1,750 output

20.4 Context Window Overflow Recovery

Large context windows changed the economics of agent design, but they did not eliminate the problem of overflow. Coding sessions still grow. Tool outputs still accumulate. Memory layers still stack. Long explorations still create local clutter. That is why the best agent systems no longer treat context-window overflow as an exceptional failure. They treat it as a normal operational condition requiring explicit recovery strategies.

Overflow recovery is the art of preserving task continuity when the raw token state can no longer be carried forward intact. This is not just summarization. It is graceful degradation under memory pressure.

OMO offers a vivid example through its anthropic-context-window-limit-recovery hook. The name is specific, but the architectural idea is general. Instead of letting the session collapse when the context gets too large, the system intercepts the condition and initiates a recovery path. That path may include preserving todo state, summarizing critical findings, restating constraints, and re-establishing the active task with a compact working context. The important lesson is not the exact hook name. It is the mindset: overflow is a workflow event.

Claude Code reflects the same principle through auto-compact behavior. Rather than confronting the user with hard failure whenever context pressure rises, it compacts conversation state, preserves useful memory, and keeps the session moving. This is a more productized manifestation of the same architecture. Overflow is expected. Recovery is automated.

OpenCode contributes the framework perspective. Once sessions, messages, and compaction are first-class architectural concepts, overflow recovery can be implemented systematically rather than as ad hoc emergency logic. That is important because recovery quality depends on good state boundaries. If the system does not know what counts as durable instruction, what counts as transient history, and what counts as task state, it cannot compact safely.

The key distinction in overflow handling is graceful degradation vs hard failure.

Hard failure means the session simply exceeds limits and the task continuity breaks. The user may need to restart, restate goals, and manually reconstruct state. This is the worst-case experience. It wastes time, loses trust, and often destroys subtle context the user may not remember to reintroduce.

Graceful degradation means the system sheds low-value context first while preserving high-value state. Verbose histories become summaries. Local exploration details become notes. Completed branches are collapsed. Stable instructions are retained. The active task is re-anchored.

To do this well, a system needs at least five design elements.

First, it needs a notion of state tiers. Persistent rules, durable memory, active task state, and raw transcript should not all be treated equally.

Second, it needs compaction triggers. Waiting until the window is fully exhausted is usually too late. Better systems recover proactively.

Third, it needs structured preservation. Recovery should capture goals, completed actions, unresolved questions, files of interest, and current next steps.

Fourth, it needs resumability. After recovery, the agent should know how to continue rather than merely remember what happened.

Fifth, it needs observability. Users or developers should be able to tell that compaction occurred and, ideally, what class of state was preserved.

Sub-agents also play a role in overflow recovery. Delegation can isolate local exploratory detail in separate contexts so that the parent context remains cleaner. This means good orchestration reduces the probability of overflow in the first place. OMO’s architecture makes this particularly visible, but the lesson applies generally.

Another important principle is that overflow recovery should preserve intent, not just information. A bad summary may retain facts but lose the operational posture: what the agent was trying to prove, what path it rejected, what still matters, and why the next step follows. Good recovery therefore captures not only data but direction.

This is why compaction and structured notes are so closely related. A system that already maintains explicit task state will recover better than one that relies entirely on raw transcript replay. The less the system must infer during recovery, the less it loses.

As context windows continue to grow, some builders may be tempted to ignore overflow engineering. That would be a mistake. Bigger windows postpone the problem; they do not abolish it. In fact, larger windows can encourage sloppier habits, leading to even bigger accumulations of low-value context before the system finally hits limits or degrades in quality.

The mature stance is therefore simple: design for overflow from the beginning. Assume long-running tasks. Assume noisy tool outputs. Assume memory layering. Assume local clutter. Then build compaction and recovery paths that preserve continuity.

The contrast between OMO’s explicit recovery hook and Claude Code’s polished auto-compaction captures a useful spectrum. One exposes overflow handling more visibly as an orchestration event. The other integrates it smoothly into product behavior. Both are better than pretending overflow will not happen.

In the end, context-window overflow recovery is not a niche feature. It is part of what makes an agent feel robust. Systems that fail hard at the edge of context limits feel fragile. Systems that compact, recover, and continue feel engineered. For long-horizon coding work, that difference matters enormously.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 21 — The Art of Multi-Agent Orchestration
Token Usage: ~6,200 input + ~1,950 output

21.1 Five Orchestration Patterns

“Multi-agent” is not a single architecture. It is a family of coordination topologies. Two systems may both spawn subagents, yet behave very differently because authority, context, and result integration move differently through the system. For practical agent engineering, five orchestration patterns matter most: Orchestrator-Worker, Pipeline, Swarm, Mesh, and Hierarchical. These patterns are not mutually exclusive. Mature systems often combine two or three of them. But separating them analytically is essential, because each one optimizes a different tradeoff.

The first and still most common pattern is Orchestrator-Worker. One agent, process, or supervisory session owns the user contract. It decomposes the task into smaller pieces, delegates them, waits for results, and synthesizes the final answer. This is the most legible pattern for coding agents because accountability stays centralized. OMO uses it explicitly: the parent session remains responsible, background subagents work under bounded roles, and notification flows return control to the parent. Claude Code also supports this pattern through task creation and specialized subagents, though the experience is more product-curated than architecture-exposed. OpenCode supports it at the substrate level, but it does not force it as a first-class worldview. The strength of Orchestrator-Worker is control. The weakness is supervisor bottleneck: if the orchestrator decomposes badly, the whole system underperforms.

The second pattern is Pipeline. Here the work moves through ordered stages rather than through a central delegator talking to many peers. One stage gathers evidence, the next interprets it, the next edits code, the next verifies, and the final stage packages output. Pipelines are valuable when reliability matters more than creativity, because stage boundaries make behavior easier to inspect and reproduce. OpenCode naturally fits pipeline thinking because its host architecture is compositional and linear. OMO pushes pipeline design into policy space through message transforms, hook chains, tool guards, continuation recovery, and notification stages. Claude Code also demonstrates pipeline logic in permissions, compaction, sandbox setup, task execution, and result presentation. The pipeline pattern is often underestimated because it does not look glamorous, but much real-world robustness comes from deterministic ordering rather than from additional “intelligence.”

The third pattern is Swarm. In a swarm, multiple agents explore in parallel with relatively weak central sequencing. They may be given variants of the same problem, different search regions, or independent hypotheses to test. Swarm is useful when breadth matters more than strict consistency: large repository exploration, broad research, multiple design candidates, or independent verification. OMO comes closest to a practical swarm architecture among the three systems discussed in this book because it explicitly supports background agents, concurrency limits, specialized prompts, and result collection. Claude Code can approximate swarm behavior through parallel tasks, but it remains more bounded. OpenCode can host swarm-style behavior, but developers must construct the orchestration logic themselves. Swarm’s strength is coverage; its weakness is cost explosion, duplicate work, and difficult aggregation.

The fourth pattern is Mesh. In a mesh, agents communicate laterally rather than always routing through one supervisor. Peer nodes can critique one another, request clarification, or update a shared knowledge layer. Mesh is attractive in theory because it resembles distributed expertise. In practice, it is the hardest pattern to debug. If three agents influence each other recursively, it becomes difficult to know where an error was introduced. None of OpenCode, OMO, or Claude Code are pure mesh systems, and that is probably wise. But partial mesh behavior appears when outputs from one subagent become shared artifacts that later subagents consume. OMO’s wisdom accumulation creates an indirect mesh: agents do not always talk directly, yet their discoveries shape subsequent agents. Claude Code’s clean-window subagents reduce mesh effects, but summaries reintroduced into the parent context still create limited peer influence.

The fifth pattern is Hierarchical orchestration. This combines delegation with multiple layers of abstraction: planner, coordinator, specialist, verifier, executor. Hierarchy matters because not all agent work lives at the same conceptual altitude. Planning a refactor, searching a codebase, editing files, running verification, and auditing security are different cognitive strata. OMO expresses hierarchy most explicitly through role-specific agents such as Oracle, Explore, Librarian, and Hephaestus, plus category routing and permission constraints. Claude Code has softer hierarchy: a main agent may delegate to subtasks and later resume supervisory synthesis. OpenCode mostly provides the raw host for hierarchy rather than an opinionated hierarchy system. The strength of hierarchy is clarity of responsibility by level. The weakness is managerial overhead and latency.

These five patterns can be compared more concretely:

Pattern	Primary control style	Scalability	Fault tolerance	Debugging difficulty	Latency profile	Best fit
Orchestrator-Worker	Centralized	Moderate	Moderate; orchestrator is a single logical choke point	Low to moderate	Medium	Decomposition with accountable final synthesis
Pipeline	Stage-based	High for repeatable flows	High if stages are isolated and retryable	Low	Medium to high, but predictable	Deterministic workflows and compliance-sensitive tasks
Swarm	Loose coordination	Very high in breadth-first tasks	Mixed; redundant search helps, aggregation can fail	High	Low initial latency, high integration latency	Wide search, ideation, parallel hypothesis testing
Mesh	Lateral peer exchange	Theoretically high, practically bounded	Potentially resilient, but prone to drift	Very high	Variable and often unstable	Critique networks, distributed reasoning research
Hierarchical	Multi-level command	High for complex tasks	Moderate to high if layers are well bounded	Moderate to high	High upfront, lower rework later	Large engineering problems spanning abstraction layers

No pattern dominates on every dimension. Centralized control usually helps debugging but hurts scalability. Loose peer collaboration improves exploration but worsens interpretability. Pipelines reduce ambiguity but may suppress adaptive behavior. Hierarchy improves specialization but introduces coordination tax. This is why agent-system design should begin with task topology, not with ideology. If the task is broad repository reconnaissance, swarm may be ideal. If the task is a production migration with verification requirements, pipeline plus hierarchy is safer. If the task is a user-facing coding workflow where one answer must remain coherent, Orchestrator-Worker is often the default.

There is also a hidden systems lesson here. The same visible behavior—“the agent used subagents”—can be implemented with completely different failure modes. A swarm that appears fast may hide huge token waste. A hierarchy that appears slow may save time by reducing rework. A mesh that appears collaborative may actually be impossible to audit. Therefore, orchestration should be evaluated not by novelty but by operational properties: who owns the answer, how state moves, how retries happen, and how errors are localized.

In practice, the most effective coding-agent systems are hybrids. OMO, for example, is largely Orchestrator-Worker at the top level, hierarchical in role design, swarm-like for exploration, and pipeline-like in hook-driven governance. Claude Code is primarily centralized and product-supervised, but it uses task delegation, isolation boundaries, and safety pipelines to capture some of the same advantages with fewer exposed degrees of freedom. OpenCode is not weaker because it is less opinionated; it is more foundational. It leaves orchestration design to the builder.

The art of multi-agent orchestration lies in choosing the minimum topology that solves the task. Use the richest pattern only when its benefits outweigh its coordination cost. Most systems do not fail because they lack agents. They fail because they choose the wrong graph of responsibility.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 21 — The Art of Multi-Agent Orchestration
Token Usage: ~6,100 input + ~1,900 output

21.2 When to Use Multi-Agent

Many teams adopt multi-agent design for the wrong reason: it looks sophisticated. That is not enough. Multi-agent architecture is not free. It increases token consumption, orchestration complexity, result-merging burden, and debugging difficulty. The right question is not “Can we use multiple agents?” but “Is the value of decomposition larger than the cost of coordination?”

The economics are the first filter. In practical coding workflows, a competent single-agent run may already consume substantial context. Adding one or two specialized agents often increases total token usage to roughly 4x the single-agent baseline once decomposition, task setup, result summaries, and reintegration are counted. A more ambitious multi-agent workflow—with parallel exploration, verification passes, and external research—can easily reach 15x the token cost of a simple direct run. These numbers are not laws of physics, but they are directionally right enough to shape architecture decisions. Multi-agent systems should therefore be treated as high-cost, high-upside machinery.

This does not mean multi-agent is bad. It means it should be reserved for problems with one of four properties. First, the task is too broad for one context window. Large monorepo search, multi-file architecture understanding, and cross-service tracing all fit here. Second, the task benefits from parallelism: independent research threads, alternative design proposals, or broad test-failure diagnosis. Third, the task contains heterogeneous cognitive modes—for example, read-only inspection, external documentation lookup, deep implementation, and security review. Fourth, the task is high value, meaning the cost of a wrong answer or a slow answer is significantly larger than the cost of extra model calls.

Anthropic’s own research gives strong evidence that multi-agent can be worth it in the right regime. In one widely cited result, a multi-agent research system achieved roughly 90.2% improvement on a benchmarked information-finding task relative to a weaker baseline configuration. The exact benchmark context matters, and such numbers should never be naively generalized to all coding tasks. But the broader lesson is clear: when the problem rewards decomposition, independent retrieval, and synthesis, coordinated subagents can materially outperform a single monolithic reasoning thread.

Why does this happen? Because many complex tasks are not bottlenecked by reasoning depth alone. They are bottlenecked by evidence coverage. A single agent may spend too much of its context budget locating information, leaving less room for synthesis. It may also anchor too early on one path. Parallel subagents reduce this risk. One agent can search the codebase, another can inspect external docs, another can compare implementation options, and the parent can integrate. This changes the problem from “one mind must do everything in sequence” to “several bounded minds gather structured evidence for one accountable answer.”

Still, multi-agent should not be the default for ordinary work. If the task is a small code edit, a localized bug fix, a single documentation update, or a straightforward command execution, single-agent is usually superior. It is cheaper, faster, and easier to audit. Multi-agent introduces failure modes that do not exist in a direct workflow: duplicate searches, contradictory outputs, lost context, excessive supervisor verbosity, and integration mistakes. A mediocre orchestration can be worse than a strong single-agent pass.

The decision framework can be phrased economically:

Estimate task value. What is the cost of being wrong or slow?
Estimate decomposition benefit. Can subtasks proceed independently and improve coverage?
Estimate coordination overhead. How much token and latency tax will orchestration add?
Estimate integration risk. Will combining outputs be straightforward or error-prone?

Only when expected benefit exceeds these costs should multi-agent be activated.

This is why good systems increasingly rely on triggers rather than ideology. OMO’s category system is valuable partly because it routes only some tasks into richer orchestration. If the problem looks like research, search, or high-complexity implementation, specialized agents become worth invoking. If the problem is simple, escalating to a full multi-agent graph is wasteful. Claude Code’s more curated task model reflects a similar instinct: subagents are helpful, but not every user request should become a distributed system.

There is also a human-factors angle. Multi-agent systems can create the illusion of rigor because they produce more intermediate artifacts—plans, subreports, summaries, notifications. But more text does not necessarily mean more truth. Engineers should watch for “coordination theater”: the system appears intelligent because several agents spoke, yet all of them relied on the same shallow evidence. The right metric is not how many subagents were involved. It is whether orchestration improved outcome quality, coverage, or reliability enough to justify cost.

The strongest cases for multi-agent in coding systems are therefore concentrated in high-complexity zones: large-scale code understanding, migration planning, benchmarked search tasks, cross-tool workflows, multi-stage debugging, and scenarios requiring explicit separation of powers. Read-only agents can inspect safely, external agents can consult documentation, and deeper execution agents can act only after enough evidence has been gathered. This is not just about performance. It is also about risk shaping.

One practical rule is useful: default to one agent, escalate to many only when one of the failure modes of single-agent becomes visible. Those failure modes include context overflow, repeated search loops, mixed cognitive tasks inside one thread, or a need for independent verification. In other words, multi-agent should usually be a response to observed complexity, not a decorative starting point.

The future will probably bring cheaper models and better orchestration, but the economic principle will remain. Every additional agent introduces both opportunity and tax. The art is knowing when the upside is nonlinear. Anthropic’s 90.2% finding matters because it reminds us that there are domains where decomposition creates step-change gains. The 4x and 15x token realities matter because they remind us those gains are expensive.

Good builders do not worship multi-agent. They reserve it for moments when the task is valuable enough, broad enough, or risky enough that specialization and parallel evidence gathering are worth the bill.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 21 — The Art of Multi-Agent Orchestration
Token Usage: ~6,400 input + ~2,020 output

21.3 Designing Agent Specialization

The simplest way to build a multi-agent system is to create several copies of the same general agent and give them different task prompts. That works, but only up to a point. Real specialization begins when the system changes not just the prompt, but also the permission boundary, tool access, latency expectation, and evaluation role of each agent. OMO is instructive here because its specialized agents are not merely branded personalities. They encode operational constraints.

Consider four representative OMO roles. Oracle is read-only. It exists to inspect, reason, and advise without mutating the environment. This is an important design move because it separates diagnosis from execution. When an agent cannot write files or run destructive operations, it becomes safer to ask it broad architectural questions. Explore is optimized for fast search and reconnaissance. Its value is speed and breadth, not polished synthesis. Librarian is oriented toward external knowledge: documentation, references, and information outside the local repository. Hephaestus is the deep worker: slower, more thorough, suited to harder implementation or design-heavy tasks. These roles differ in objective function, not just wording.

This is the key principle of specialization: agents should differ because the task differs, not because branding differs. If two agents have identical permissions, identical tools, identical latency expectations, and identical success criteria, then they are not meaningfully specialized. They are replicas. Real specialization requires at least one structural difference.

Permission constraints are especially powerful because they prevent overreach. A read-only Oracle cannot impulsively “fix” the thing it is auditing. A fast search agent should not have the same authority as a deep implementation agent. An external-research agent may need web access but not filesystem mutation. This is more than safety engineering; it also improves cognition. When an agent knows it is not responsible for acting, it can focus on diagnosis. When it knows it is responsible for deep execution, it can spend more budget on careful implementation. Boundaries sharpen role behavior.

The OMO category system adds another layer of sophistication. Rather than routing tasks based only on model preference or arbitrary user choice, categories frame the nature of the work: coding, research, visual design, documentation, and so on. This helps eliminate a subtle failure mode: model bias masquerading as role selection. In many systems, developers pick a “smart” model for difficult work and a “cheap” model for simple work, but they never cleanly separate task semantics from model economics. Category-driven dispatch forces the system to reason first about what kind of problem this is, then about which agent and model pairing best fits it. That is architecturally healthier.

Good specialization design usually follows five dimensions.

First is scope. What class of problems should the agent handle? Repository search, documentation retrieval, architecture review, code generation, verification, or triage? Scope should be narrow enough that the role is legible. Second is authority. What is the maximum action the agent may take? Read-only, local file edits, shell commands, network access, external credentials, or deployment operations? Third is tempo. Is the role optimized for fast reconnaissance or slow, deliberate depth? Fourth is artifact style. Does the agent return raw findings, decision-ready summaries, diffs, or executable plans? Fifth is handoff contract. Who consumes the output, and in what format?

These dimensions matter because the hardest failures in multi-agent systems are often handoff failures. A search agent returns prose when the parent needed file paths. A research agent returns fifteen links when the implementation agent needed one clear recommendation. A verifier reruns the same investigation already done by a searcher. Good specialization means not only clear identity, but also interoperable output.

There is also an important anti-pattern: over-specialization too early. Teams sometimes create dozens of narrow agents—frontend-linter-agent, markdown-fixer-agent, YAML-auditor-agent—before they understand their actual workload. This creates routing confusion, prompt maintenance burden, and duplicated capability descriptions. A better path is to start with a few coarse but meaningful roles: read-only advisor, fast explorer, external researcher, deep executor, verifier. Then split roles only when the error profile justifies it.

Another useful insight from OMO is that role quality depends on tool curation. Explore is powerful partly because it has the right search-oriented tools. Oracle is safe partly because it lacks write authority. Specialization therefore lives at the intersection of prompt, policy, and tooling. Designers who focus only on the prompt are missing the systems half of the problem.

Claude Code offers a contrasting lesson. Its subagents generally operate in cleaner, more isolated task windows. This reduces role contamination and makes it easier to reason about what each subagent did. OMO, by contrast, leans harder into cumulative orchestration and role interaction. Both approaches can work. The common lesson is that specialization should be legible to the parent runtime. If the system cannot explain why one agent was chosen instead of another, specialization has not really been designed; it has only been improvised.

The best agent specialization systems feel less like “multiple models chatting” and more like organizational design. A good organization does not assign every employee the same badge, tools, and escalation rights. It allocates responsibility carefully. Oracle, Explore, Librarian, and Hephaestus are useful not because they sound mythic, but because they represent four enduring task modes: inspect, search, research, and deeply build.

That is the broader design rule. Specialize by function, boundary, and contract. Use permission constraints to prevent overreach. Use category systems to route by problem type rather than by model superstition. And make every specialized agent produce outputs that another agent can actually use. Otherwise, a multi-agent architecture becomes a naming scheme rather than a real operating model.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 21 — The Art of Multi-Agent Orchestration
Token Usage: ~6,300 input + ~1,980 output

21.4 Wisdom Accumulation vs Context Isolation

One of the deepest architectural differences in multi-agent systems is not visible in the UI. It lies in how information flows between subagents across time. OMO and Claude Code illustrate two distinct philosophies. OMO emphasizes wisdom accumulation: extract learnings from one agent’s work and pass them to subsequent subagents or future turns. Claude Code leans more heavily on context isolation: each subagent receives a cleaner task window, minimizing accidental contamination from previous agent runs. Both approaches are defensible. They optimize different failure modes.

Wisdom accumulation treats the multi-agent system as an organization that learns. If one subagent already discovered where the relevant files live, which hypothesis failed, or which external documentation contradicted the initial plan, the next subagent should not have to rediscover all of that from scratch. OMO’s architecture is sympathetic to this idea. It can extract useful findings, preserve task state, and feed structured summaries back into the parent or onward into later delegated work. In effect, the system tries to turn temporary effort into reusable operational memory.

This has obvious benefits. First, it reduces duplicate work. Search results, failed hypotheses, and known constraints become durable context instead of evaporating between runs. Second, it improves continuity across interruptions. If the orchestration resumes later, accumulated wisdom acts like a breadcrumb trail. Third, it allows later agents to start at a higher level of abstraction. Rather than burning context on rediscovery, they can focus on synthesis or execution. In long-running coding tasks, these savings compound.

But wisdom accumulation has costs. The more prior material you inject, the greater the risk of bias inheritance. A wrong early assumption can spread through later agents as if it were ground truth. Accumulated summaries can also become stale, overly compressed, or misleadingly confident. In other words, wisdom accumulation increases efficiency, but it can reduce epistemic freshness. The system becomes smarter when the early learnings are good and more fragile when they are not.

Claude Code’s context isolation attacks the opposite problem. By giving each subagent a relatively clean window, it reduces cross-contamination. A fresh subagent can evaluate a task with less anchoring bias from previous threads. This is especially useful for verification, critique, and alternative-path exploration. Isolation also helps debugging. If a subagent produces a bad conclusion, the developer can inspect a narrower causal chain. The runtime is easier to reason about because each child session is less entangled with historical residue.

Isolation, however, is not free either. It often recreates search cost. A clean window means previously discovered evidence may need to be reintroduced manually or rediscovered. The system gains freshness but loses continuity. It also shifts burden onto the parent agent, which must decide what minimal context to pass down and how to summarize results back up. If the summaries are too thin, the child lacks key facts. If they are too thick, isolation is weakened.

The tradeoff can be expressed simply:

Wisdom accumulation optimizes continuity, memory, and reduction of duplicate effort.
Context isolation optimizes freshness, auditability, and resistance to contamination.

Neither is universally superior. The right choice depends on task type.

For exploratory work in a large repository, wisdom accumulation is often valuable. Once one agent has mapped the terrain, throwing that map away is wasteful. For independent verification, context isolation is better. A verifier should not be overly shaped by the same narrative that produced the candidate solution. For long-lived autonomous sessions that may resume after compaction or interruption, accumulation is powerful. For safety-sensitive tasks where one wants independent judgment, isolation is healthier.

OMO’s approach is especially attractive when the system is expected to operate like a persistent engineering team. Teams do not deliberately forget every previous discovery. They maintain working notes, shared findings, and status artifacts. Wisdom accumulation pushes agent systems toward this organizational model. Claude Code’s approach is more like controlled consulting engagements: bring in a specialist with a defined brief, keep the scope clean, then merge the result. This is often better for product reliability and bounded reasoning.

A mature architecture will probably combine both. Not all knowledge deserves persistence. Some information should be treated as stable operating truth: repository layout, previously validated findings, explicit constraints, user preferences, known failures. Other information should remain quarantined: speculative hypotheses, unverified interpretations, and potentially biased summaries. The design challenge is not whether to store or isolate. It is what to store, at what confidence level, and for whom.

This suggests a useful three-tier model. Tier one is durable facts, safe to reuse broadly. Tier two is working hypotheses, useful but confidence-labeled. Tier three is ephemeral reasoning, which should stay local to a subagent unless explicitly promoted. OMO leans toward promoting more information upward. Claude Code leans toward preserving cleaner local reasoning compartments. Both are valid partial answers to the same problem: how can a distributed reasoning system remain both cumulative and trustworthy?

For builders, the practical lesson is to treat memory transfer as a first-class interface, not a side effect. If wisdom is accumulated, label its provenance and confidence. If isolation is used, make summaries deliberate and minimal. Do not let either pattern happen accidentally. Uncontrolled accumulation produces dogma. Excessive isolation produces amnesia.

The broader architectural insight is that multi-agent quality depends not only on how agents are spawned, but on how learning moves between them. OMO teaches that an orchestrator can become more effective over time by preserving useful findings. Claude Code teaches that fresh context windows remain one of the strongest defenses against runaway drift. The art is knowing when to remember and when to forget.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 21 — The Art of Multi-Agent Orchestration
Token Usage: ~6,500 input + ~2,030 output

21.5 Challenges of Parallel Execution

Parallel execution is the glamorous side of multi-agent orchestration. It promises speed, breadth, and better evidence coverage. In practice, it also introduces a new class of systems problems: concurrency limits, duplicate work, result aggregation, conflict resolution, and notification management. A single-agent loop mainly worries about reasoning quality. A parallel agent runtime must additionally behave like a scheduler.

OMO makes this explicit. Its background agent spawner is not just a convenience feature; it is an admission that once agents run concurrently, orchestration becomes infrastructure. One notable design choice is concurrency control at roughly five concurrent background tasks per model/provider combination. This matters because unconstrained spawning would create both cost blowups and provider-level instability. Parallelism is useful only when bounded. Otherwise, the orchestrator turns one user request into an accidental denial-of-wallet attack against its own budget.

Concurrency limits solve only the first problem. The next problem is task partitioning. Parallel agents must be given subproblems that are actually independent enough to benefit from concurrent execution. If two search agents inspect the same code paths without coordination, the system burns tokens for little gain. If several deep-execution agents attempt overlapping edits, the parent must later untangle conflicting patches. Good orchestration therefore requires explicit partitioning by file region, hypothesis, abstraction layer, or evidence source.

Then comes result aggregation. This is harder than it looks. Subagents rarely return perfectly aligned outputs. One may produce raw evidence, another a summary, another a recommendation, another a contradiction. A robust parent runtime needs an aggregation policy: which outputs are authoritative, which are advisory, how duplicates are collapsed, and how uncertainty is represented. Without this layer, parallelism merely creates more text to read.

Conflict resolution is even more subtle. Suppose one agent says a failure is caused by configuration drift, while another says it is caused by an API mismatch. Or two implementation agents propose incompatible edits to the same subsystem. The system cannot simply concatenate both answers. It needs either a tie-breaker policy or an escalation path. Common strategies include:

Priority-based resolution: some roles outrank others, such as verifier over implementer.
Evidence-based resolution: outputs with concrete file paths, traces, or external citations dominate vague claims.
Reconciliation pass: a dedicated synthesizer agent compares conflicting outputs.
User escalation: unresolved contradictions are surfaced explicitly rather than hidden.

Parallel systems also face the problem of time asymmetry. The fastest subagent is not always the most valuable. A quick search result may arrive before a deeper analysis, tempting the orchestrator to converge too early. Schedulers therefore need stopping rules: wait for all tasks, wait for quorum, or allow early completion if confidence exceeds a threshold. Each choice has tradeoffs. Waiting for all tasks increases latency. Returning early risks ignoring late but important evidence.

Notification mechanisms matter more than most designers expect. When subagents run in the background, the parent session needs structured updates: task started, task completed, task failed, output available, continuation required. OMO’s parent-session notification design is important because it transforms asynchronous work from a silent side channel into a visible orchestration event. Without notifications, users and parent agents both lose track of which parallel branches still matter.

There is also a debugging problem unique to concurrency: causal ambiguity. In a serial workflow, it is usually clear which step introduced an error. In a parallel workflow, several branches may contribute partial evidence that later gets merged incorrectly. Was the bug in the child task prompt, the scheduler, the aggregator, or the synthesizer? This is why observability for multi-agent systems should include per-task IDs, start and end times, role labels, input scope, and merge provenance. Parallelism without traceability is operationally brittle.

Claude Code’s more curated multi-task approach implicitly acknowledges these challenges. By keeping subagents more isolated and product-bounded, it reduces some of the coordination surface exposed to the user. OMO, by contrast, is architecturally richer and therefore more powerful, but also more exposed to orchestration complexity. OpenCode again serves as neutral substrate: it can host concurrency, but it does not solve these scheduler-level issues by itself.

The hardest conceptual mistake is to assume that parallelism automatically reduces wall-clock time. Sometimes it does. But when result synthesis, conflict resolution, and retries are counted, the total end-to-end latency may remain high or even increase. Parallelism helps most when the branches are genuinely independent and the merge contract is clear. Otherwise, the system simply moves work from execution time into integration time.

Three design rules follow.

First, parallelize search and evidence gathering more readily than mutation and execution. Independent read operations merge more safely than concurrent writes. Second, keep branch outputs schema-like whenever possible: file paths, ranked findings, citations, confidence, recommended next action. Structured outputs are easier to aggregate than free-form prose. Third, treat notifications and provenance as part of the product, not backend detail. Users need to understand why the orchestrator is waiting, what completed, and how the final answer was assembled.

In other words, multi-agent parallelism is not just “run five agents at once.” It is a compound systems problem involving resource limits, partitioning, scheduling, merging, and observability. OMO’s concurrency cap of five per model/provider, its aggregation pathways, and its notification mechanisms show one credible answer. Claude Code shows another: hide more of the machinery behind a safer product surface. The broader lesson is that parallel execution becomes valuable only after coordination becomes disciplined.

The future of coding agents will almost certainly involve more concurrency, not less. But the winners will be the systems that treat parallelism as a control problem rather than a spectacle.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 22 — Designing for Extensibility
Token Usage: ~5,900 input + ~1,820 output

22.1 Three Content Types

One of the most common extensibility mistakes in agent systems is treating all extension content as if it were the same thing. It is not. A practical architecture must distinguish at least three categories: reference content, task content, and executable tools. These categories differ not only in format, but in who or what is supposed to consume them. Once that distinction becomes explicit, extensibility design becomes far cleaner.

The first category is reference content. This is material the language model should read, consult, and incorporate into reasoning, but not necessarily obey as a procedural script. Examples include API documentation, architectural notes, coding standards, product requirements, design rationale, glossaries, and prior decisions. Reference content expands the model’s situational knowledge. It says, in effect, “Here is the world you should understand.” Skills, memory files, READMEs, and documentation often fall into this bucket.

Reference content is valuable because LLMs are reasoning systems that depend heavily on available context. However, reference content is easy to misuse. If too much of it is injected indiscriminately, the model spends precious attention budget parsing background instead of solving the immediate task. Good extensibility systems therefore need retrieval, prioritization, or conditional injection. The design question is not merely how to store reference material, but when to surface it.

The second category is task content. This is content the LLM is expected to follow as an operational brief rather than simply absorb as background knowledge. It includes step-by-step instructions, workflows, rubrics, checklists, runbooks, templates, and domain-specific procedures. A skill file that says “when user asks for X, first inspect Y, then produce Z” is task content. So is a command definition that wraps a repeatable workflow. Task content changes model behavior directly. It tells the system how to proceed.

The distinction from reference content matters. A style guide is reference content; a release checklist is task content. A glossary is reference content; a migration procedure is task content. When systems blur the two, models either under-follow important procedures or over-obey background text that should have remained advisory.

The third category is executable tools. Here the consumer is not primarily the LLM; it is the computer. Tools expose actions the runtime can perform: read files, edit code, query GitHub, run tests, call MCP servers, inspect diagnostics, or access external APIs. The model may decide when to call a tool, but the actual effect is carried out by the host environment. This means tool design belongs partly to software engineering, not just prompt engineering.

This three-way split suggests a crucial routing principle:

Reference content should be routed to the LLM as selectively retrieved context.
Task content should be routed to the LLM as active procedural guidance.
Executable tools should be routed to the runtime as callable capabilities.

Many poor extension architectures fail because they route everything through the prompt. They stuff reference docs, procedural instructions, and tool descriptions into one giant text blob and hope the model sorts it out. That approach scales badly. It confuses status, increases prompt entropy, and makes auditing difficult. Better systems recognize that different artifacts belong to different executors.

OMO’s architecture illustrates this separation particularly well. Skills can contain reference material, explicit task procedures, and even embedded MCP references—but those pieces do not play the same role. OpenCode also benefits from keeping tools, commands, and configuration distinct. Claude Code’s plugin, skill, and command surfaces similarly imply that not every extension artifact should be treated as model prose. The broader lesson extends beyond these systems. Any serious agent platform needs a content taxonomy before it needs a marketplace.

Why is this so important? Because the executor determines the failure mode. If reference content is misrouted as task content, the model may over-constrain itself. If task content is treated as reference content, critical steps may be skipped. If tools are described only as prose rather than exposed as executable interfaces, the model must simulate action instead of performing it. These are fundamentally different errors.

There is also a maintenance benefit. Reference content is usually updated by subject-matter experts. Task content is often maintained by workflow designers. Tools are maintained by engineers. A clear content-type model lets teams update one layer without accidentally destabilizing the others. This becomes increasingly important as organizations grow.

A useful mental model is this: reference content tells the model what is true or relevant; task content tells it what process to follow; tools define what can actually be done. Put differently, these correspond to knowledge, procedure, and capability. That triad is more durable than any particular implementation framework.

The design implication is straightforward. Extension systems should classify content explicitly, store metadata about intended executor, and provide routing rules. If the host cannot tell whether an artifact is meant to be read, followed, or executed, then extensibility will eventually collapse into prompt soup.

For agent builders in 2026, this may be one of the most underappreciated architectural disciplines. Extensibility is not just about adding more things. It is about sending the right thing to the right executor in the right form. Once that discipline is in place, skills become clearer, plugins become safer, and tools become more composable.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 22 — Designing for Extensibility
Token Usage: ~6,000 input + ~1,860 output

22.2 MCP as Universal Extensibility Substrate

By 2026, one recommendation has become increasingly clear: new capabilities should, whenever possible, be exposed as MCP servers rather than as framework-specific plugins. This does not mean plugins disappear. It means the default portability layer for new capabilities should be MCP.

Why? Because MCP solves a general problem that plugin systems usually solve only locally. A plugin is often tied to one host’s lifecycle, configuration format, packaging style, and permission model. An MCP server, by contrast, packages capability behind a protocol boundary. If multiple hosts speak the protocol, the same server can be reused across them. That is a profound leverage gain.

This is why MCP is best understood as a universal substrate. A substrate is a base layer that supports many higher-level constructions. In classic computer systems, TCP/IP became a substrate for networked applications, and POSIX became a substrate for portable UNIX-like tooling. MCP aims to do something analogous for tool-using LLM hosts. Rather than requiring every ecosystem to reinvent “how a model calls an external capability,” it standardizes the contract.

There are several advantages to this approach.

First, MCP is language-agnostic. A server can be implemented in Python, TypeScript, Go, Rust, or any other language, as long as it speaks the protocol correctly. This matters because framework-specific plugin ecosystems often silently force teams into one implementation language.

Second, MCP is host-agnostic. If OpenCode, OMO, Claude Code, desktop clients, IDE integrations, and future agent platforms all support MCP, then a tool built once can travel across hosts with relatively small adaptation cost. That sharply reduces ecosystem fragmentation.

Third, MCP supports transport flexibility. Servers can run locally over stdio, remotely over network transport, or through other supported channels. This means capability placement can be tuned to the use case. A sensitive credential broker may run in a hardened environment, while a local code-query tool may run beside the developer’s editor.

Fourth, MCP provides a cleaner separation between capability implementation and host orchestration. The host focuses on prompts, sessions, safety, and UI. The MCP server focuses on doing one job well. This separation of concerns tends to produce more reusable software.

That is why new capabilities such as repository search, issue trackers, deployment controls, database introspection, enterprise APIs, or credential mediation should increasingly be shipped as MCP servers first. A framework-specific plugin can still exist, but ideally it should mostly help the host discover, authorize, and route to the MCP capability rather than reimplement the capability from scratch.

This recommendation also avoids one of the biggest long-term problems in agent ecosystems: duplicated integration logic. If every host needs a bespoke plugin for Jira, GitHub, Linear, Postgres, browser automation, observability systems, and internal company APIs, the maintenance burden grows quadratically. A shared protocol reduces that duplication.

Of course, plugins still matter. There are host-local concerns that MCP does not replace: hook integration, UI augmentation, lifecycle management, local policy injection, settings pages, command registration, and product-specific workflows. But these should increasingly be thin coordination layers around MCP-accessible capability, not giant monolithic extensions that bury transport, policy, and business logic inside one host-specific package.

The OpenCode/OMO/Claude Code comparison reinforces this point. OpenCode offers extensibility surfaces where MCP can be a durable external capability layer. OMO’s skill-embedded MCP pattern is especially instructive: it lets skills teach the model how and when to use an MCP-exposed capability. Claude Code, too, benefits from MCP because it allows external tools to be integrated without requiring the commercial host to expose every conceivable native extension API. In all three cases, MCP reduces lock-in.

There is also a strategic angle. Framework-specific plugins create host moats. MCP creates ecosystem compounding. For builders who want their capability to outlive any one host, MCP is the safer bet. It turns an integration from “software for one product” into “infrastructure for many products.”

This is why the 2026 best-practice recommendation can be stated plainly: if you are inventing a new capability, ask first whether it can be an MCP server. Only if the capability fundamentally depends on host-local lifecycle hooks, UI embedding, or privileged in-process integration should you start with a framework-specific plugin. Even then, consider a hybrid architecture where the plugin is thin and the heavy lifting lives behind MCP.

There is also an ecosystem-design benefit that is easy to underestimate: protocols create competition at the capability layer rather than lock-in at the host layer. When a capability is exposed through MCP, builders can improve quality, latency, and safety without forcing users to switch entire agent platforms. That is healthy for innovation. It lets hosts compete on orchestration and UX while capability authors compete on implementation excellence.

The long arc of extensibility tends toward protocols, not silos. MCP is not important because it is fashionable. It is important because it gives agent ecosystems a shared substrate on which portable capabilities can accumulate. In the same way that standardized ports enabled a richer hardware ecosystem, MCP gives software agents a common port for external intelligence and action.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 22 — Designing for Extensibility
Token Usage: ~5,800 input + ~1,800 output

22.3 Lifecycle Hooks

Most extension systems focus on installation and execution, but mature extensibility requires a fuller lifecycle. A practical extension should be designed around at least six phases: Install → Configure → Activate → Operate → Deactivate → Upgrade → Uninstall. Skipping any of these phases usually creates operational debt. Among them, the most commonly neglected is uninstall.

Start with Install. Installation is where the artifact is fetched, verified, registered, and made known to the host system. This phase should handle dependency setup, integrity checking, and initial metadata registration. It should not do too much else. Systems that overload install with user-specific mutation become hard to reason about and harder to reverse.

Next comes Configure. Installation only places the extension into the system; configuration makes it useful. This may involve API endpoints, credentials, model choices, feature flags, path settings, or scope limitations. Good configuration systems separate host defaults, user overrides, and project-local settings. They also validate early. Nothing is more frustrating than a plugin that “installs successfully” but fails later because required configuration was deferred until runtime.

Then comes Activate. Activation means the extension becomes live in the host’s operational graph. Hooks are registered, commands become available, tools are advertised, background services may start, and permissions may be requested. Activation should be explicit because not every installed extension should always be running. This is especially true in enterprise environments where capability presence and capability activation are different governance decisions.

After activation comes Operate. This is the steady-state runtime phase: the extension receives events, responds to tool calls, injects policy, serves UI, or handles user workflows. Most designers spend all their time here because it is the visible part. But runtime behavior only remains healthy if the surrounding phases are equally well designed.

Then comes Deactivate. Deactivation is not the same as uninstall. It means the extension is still present but not currently active. Hooks should be detached, background services stopped, permissions released where possible, and UI clutter removed. Deactivation is important for debugging, experimentation, and staged rollouts. Systems without a clean deactivate pathway tend to require hard removal whenever anything goes wrong.

Next is Upgrade. Upgrades are where compatibility issues surface. Schemas change, tool contracts evolve, cached artifacts become invalid, and credentials may need rotation. A robust extension system should define upgrade hooks or migration scripts so that changes are deliberate rather than accidental. Otherwise, version drift becomes a hidden source of user pain.

Finally comes Uninstall. This is where many ecosystems fail. Designers assume uninstall is trivial—just delete files or remove a registry entry. In reality, uninstall must reverse as much of the extension’s footprint as possible. Temporary files, cached state, service registrations, modified configuration, background daemons, downloaded models, and credential references may all remain if uninstall hooks are not handled properly. This is why skipping uninstall hooks is one of the most common and most damaging lifecycle mistakes.

Why is uninstall so neglected? Because it is not part of the happy path. It is only tested when something is no longer wanted, and by then the original design attention has moved elsewhere. But from the user’s perspective, a bad uninstall is a trust violation. It means the extension system can only add entropy, not remove it cleanly.

OpenCode, OMO, and Claude Code all reveal parts of this lifecycle challenge. Hook systems make activation and deactivation more important. MCP integrations make configuration and uninstall more subtle. Skill systems make installation easy but can hide stale artifacts unless removal is carefully managed. As ecosystems grow, lifecycle completeness becomes a key measure of maturity.

The best design principle is simple: every extension artifact should declare which lifecycle phases it participates in and what cleanup responsibilities it has. If install created state, uninstall must know how to remove or retire it. If activation attached hooks, deactivation must know how to detach them. If upgrade changes schemas, migration logic must be versioned.

Extensibility is often described in terms of power—how much the user can add. A better framing is reversibility. A healthy extension system is one where capabilities can be added, paused, changed, and removed without leaving the host in a mysterious state. That requires lifecycle hooks, not just runtime hooks.

This is also where extension trust is won or lost. Users forgive a lot if an extension can be disabled cleanly and removed fully. They become suspicious when uninstall leaves behind daemons, files, configuration edits, or credential residue. In other words, lifecycle discipline is not only an implementation concern. It is a product trust concern.

Install, configure, activate, operate, deactivate, upgrade, uninstall: this sequence may look mundane, but it is what separates a toy plugin model from an extension platform that can survive real-world use.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 22 — Designing for Extensibility
Token Usage: ~5,700 input + ~1,780 output

22.4 The Simplicity Spectrum

Extension architecture should scale with organizational size. One of the most common mistakes in platform design is building a marketplace-grade plugin system for a two-person team—or, in the opposite direction, trying to run a ten-team ecosystem out of a flat folder of markdown files. The right model depends on how many developers are contributing, how often extensions change, and how much governance is required. A useful rule of thumb is a three-stage simplicity spectrum.

For one to two developers, a flat SKILL.md or markdown-first extension model is usually enough. The goal at this stage is not industrial governance; it is speed, legibility, and low cognitive overhead. If a small team can read the files directly, understand the behavior, and edit quickly, that is often optimal. A folder of skills, commands, or simple manifests works well because the extension surface is still socially manageable. Human coordination substitutes for formal infrastructure.

At this scale, every layer of automation has a cost. A registry service, dependency solver, package signing pipeline, or compatibility matrix may sound mature, but it often slows down the people doing the actual work. Flat files are especially attractive when extensions are mostly instructional artifacts, small prompt bundles, or lightweight configuration additions. They are also easy to debug: open the file, inspect the content, change it, rerun.

For three to ten developers, however, social coordination starts to weaken. Multiple contributors may be editing extensions in parallel. Naming collisions appear. Versioning becomes meaningful. Some extensions now need validation, lifecycle hooks, or metadata beyond what a single markdown file expresses well. This is the zone where a JSON registry plus hooks becomes appropriate.

A registry provides several benefits. It centralizes metadata such as name, version, author, compatibility, dependencies, permissions, and lifecycle events. Hooks add controlled automation: install scripts, activation logic, cleanup behavior, validation steps. At this stage, the platform still does not need a fully decentralized plugin marketplace, but it does need explicit structure. The registry is effectively a table of contents for the extension ecosystem.

This middle stage is important because it is where many systems either overbuild or underbuild. Underbuilding means flat files become chaotic; nobody knows which extension depends on which tool, whether removal is safe, or which versions are compatible. Overbuilding means teams waste time inventing package ecosystems before they have enough contributors to justify them. The registry-plus-hooks model is often the right compromise.

For ten or more contributors, especially across teams or organizations, a full plugin system with discovery and marketplace-style governance starts to make sense. At this point, extensibility is no longer an internal convenience layer. It is an ecosystem. You now care about publication flows, trust models, compatibility policies, semantic versioning, signing, dependency resolution, rating or approval systems, and perhaps even commercial distribution. Search and discovery become first-class problems. Governance is no longer optional.

This does not mean every large system must immediately build a public marketplace. “Marketplace” here means a mature plugin-management layer: discoverability, metadata indexing, version distribution, update policy, and trust controls. Whether it is internal or public is a separate question. What matters is that extensibility has reached a scale where direct human familiarity with every extension is impossible.

The simplicity spectrum can therefore be summarized as follows:

1–2 developers: flat files, markdown-first, low ceremony.
3–10 developers: structured registry, metadata, lifecycle hooks.
10+ contributors: full plugin platform, discovery, trust, governance.

This spectrum is valuable because it reframes simplicity. Simplicity is not “always use the least machinery.” Simplicity means using the least machinery that still preserves control. A flat file model is simple for two people and chaotic for twenty. A marketplace is absurd for a tiny team and necessary for a large ecosystem.

OpenCode’s lean extensibility surface, OMO’s richer but still local skill and hook layering, and Claude Code’s more curated plugin boundaries all fit somewhere on this spectrum. None is universally best. Each reflects a different assumption about ecosystem size and governance needs.

The deeper lesson is organizational. Extensibility architecture is not only a technical decision. It is a coordination decision. The structure that works best is the one that matches the social scale of the builders maintaining it. Good platform designers resist both vanity complexity and false minimalism. They let the extension model evolve as contributor count, trust boundaries, and lifecycle demands evolve.

Another way to say this is that extension architecture should preserve the highest-bandwidth coordination mechanism available at each scale. For tiny teams, that mechanism is direct human understanding. For mid-sized teams, it is structured metadata and predictable hooks. For large ecosystems, it is institutionalized governance. The architecture should not outrun the social system that supports it.

In short: start flat when the team is tiny, introduce registries and hooks when coordination starts to strain, and graduate to a full plugin system only when ecosystem scale truly demands it. Extensibility becomes sustainable when the architecture grows at the same pace as the community around it.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 23 — Balancing Safety and Autonomy
Token Usage: ~6,000 input + ~1,900 output

23.1 Threat Model

Agent safety begins with threat modeling, not with vibes. If a coding agent can read repositories, edit files, run shell commands, access the network, and interact with external systems, then it is not merely a chat interface. It is an operational actor. That means we need to ask the standard security-engineering question: what could go wrong, through which path, and with what blast radius?

Four threats are especially important for coding agents: prompt injection, privilege escalation, approval fatigue, and supply chain attacks. These are not hypothetical edge cases. They arise naturally from the way agents combine language understanding with tool invocation.

Start with prompt injection. Prompt injection occurs when untrusted content influences the model’s behavior in ways the user or platform did not intend. In coding environments, that content may come from README files, issue descriptions, pull request text, comments in source code, web pages, or tool output. The dangerous property of prompt injection is that the model does not inherently distinguish “instructions from the user” from “text found during the task” unless the system explicitly helps it do so. A malicious repository can therefore try to smuggle in instructions like “ignore previous guidance,” “exfiltrate secrets,” or “modify these files.”

Prompt injection is especially dangerous because agent systems are built to read widely. The more autonomous and retrieval-heavy they become, the larger the attack surface grows. A system that can browse the web, inspect repos, and call tools is continuously ingesting potentially adversarial text.

The second threat is privilege escalation. In traditional security, privilege escalation means gaining access or authority beyond what was originally intended. In agent systems, this may happen in several ways. A user may trick the agent into using a high-authority tool for a low-authority task. A tool may expose more power than its name suggests. A plugin or MCP server may operate with broader permissions than the host realizes. Or an agent may chain together benign tools in a way that creates dangerous effective power. This is one reason why explicit capability modeling matters so much. If permissions are fuzzy, escalation becomes easy.

The third threat is approval fatigue. This sounds softer than the previous two, but it is extremely important in practice. If the system asks the user for approval too often, the user learns to click through prompts reflexively. At that point, the permission layer still exists visually but has lost much of its protective value. Approval fatigue is a socio-technical vulnerability: the interface trains the human to become the weak link. Good safety design therefore cannot rely entirely on repeated human interruption. It must reduce unnecessary prompts while preserving meaningful checkpoints.

The fourth threat is supply chain attack. Coding agents are extension-heavy systems. They use plugins, MCP servers, third-party packages, shell tools, remote APIs, and community-contributed skills or commands. Every additional extension path creates another place where malicious or compromised code can enter. A supply chain attack may target the plugin itself, the package it depends on, the remote service it contacts, or the content it downloads. Because agent ecosystems encourage composition, supply chain security is not a side concern. It is core infrastructure risk.

These four threats interact. A malicious MCP server may inject instructions through tool output. A compromised plugin may request broad permissions and create privilege escalation opportunities. Frequent prompts meant to protect the system may instead produce approval fatigue, causing the user to authorize a dangerous action without scrutiny. A strong threat model therefore has to reason compositionally, not just one threat at a time.

There are also secondary threats worth noting: secret exfiltration, destructive file mutation, hidden persistence, audit evasion, and unsafe delegation to subagents. But most of these can be analyzed as consequences or variants of the four core classes above. Prompt injection manipulates reasoning. Privilege escalation expands authority. Approval fatigue weakens human oversight. Supply chain attacks compromise the capability surface.

What does a good threat model imply architecturally? First, treat all external text as potentially adversarial. Second, make authority explicit and narrow. Third, do not overuse approval prompts; use better defaults, classifiers, and sandboxes to reduce prompt spam. Fourth, treat every extension source as part of the attack surface. This means signing, verification, provenance, permission scoping, and uninstall hygiene all matter.

The OpenCode/OMO/Claude Code comparison is revealing here. Open systems gain enormous extensibility but also broaden the supply chain surface. OMO’s richer orchestration increases power and therefore requires stronger boundaries. Claude Code’s stronger safety posture shows what happens when autonomy is built alongside a more explicit threat model. The lesson is not that openness is unsafe. It is that openness without clear threat modeling is naive.

Balancing safety and autonomy begins with honesty. Once agents can act, we must stop pretending the risk profile looks like ordinary chat. A coding agent is closer to a junior operator with shell access than to a search box. Threat modeling is therefore not optional overhead. It is the foundation that determines whether autonomy becomes useful leverage or uncontrolled exposure.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 23 — Balancing Safety and Autonomy
Token Usage: ~6,200 input + ~1,960 output

23.2 Sandboxing is Key

If threat modeling defines what can go wrong, sandboxing defines how much damage remains possible after something goes wrong. For autonomous coding agents, this is the decisive safety layer. Prompt rules can fail. Permission prompts can be misclicked. Tool descriptions can be misunderstood. Only execution containment changes what the agent can materially reach.

For coding agents, two forms of isolation matter above all: filesystem isolation and network isolation. Together, they form the practical core of a meaningful sandbox. If either is missing, the safety story becomes incomplete.

Consider network isolation first. Without network isolation, an agent that gains access to sensitive information can exfiltrate it. It may send secrets to a remote endpoint, upload repository fragments, leak environment variables through API calls, or embed stolen content into seemingly innocent requests. Even if the filesystem is partially constrained, network access still gives the agent an escape valve. In security terms, the sandbox may limit local reach but not outbound leakage.

Now consider filesystem isolation. Without filesystem isolation, the agent can often roam the host environment more broadly than intended. It may read unrelated directories, inspect SSH keys, modify shell configuration, tamper with cached credentials, or alter neighboring projects. Even if network access is blocked, unrestricted filesystem reach can still cause serious local harm or create persistence mechanisms that outlive the session. In other words, without filesystem isolation, the agent may not exfiltrate easily—but it may still escape its intended workspace and damage the machine’s local trust boundary.

That is why a meaningful statement can be made strongly: filesystem isolation plus network isolation is the closest thing to complete security for autonomous coding agents. Not perfect security, of course—nothing in systems is perfect—but the most important containment pair. Without network isolation, the agent can leak. Without filesystem isolation, the agent can wander. Remove either one and the sandbox stops being a real boundary.

Claude Code’s architecture is valuable precisely because it treats OS-level sandboxing as a core control rather than an afterthought. Mechanisms such as bubblewrap on Linux and Seatbelt on macOS show the right direction: shift safety from “please behave” to “you cannot cross this boundary even if you try.” This is what capability-based containment looks like in practice.

OpenCode and OMO, by contrast, illustrate the limits of non-sandbox-first design. Their permission systems, hooks, and tool boundaries are helpful, but they mostly operate at higher layers of abstraction. They are better at deciding what should happen than at preventing what can happen after a reasoning error. That does not make them bad systems. It simply means their autonomy ceiling is lower unless stronger runtime isolation is added beneath them.

Why is sandboxing especially important for agentic coding tools? Because these systems combine risky ingredients that used to be separated: shell access, file mutation, repository traversal, external integrations, and natural-language reasoning. Once these are combined, the classical assumption that “the human is in the loop” becomes less reliable. The more autonomous the system becomes, the more we must rely on hard boundaries rather than soft intentions.

There are tradeoffs. Strong sandboxes can break legitimate workflows. Developers often need access to broader repository trees, local build caches, package registries, or internal network resources. Sandboxing also complicates implementation across operating systems. It creates support burden because failures may look like mysterious tool errors rather than clear permission denials. Commercial products must carefully choose defaults, escape hatches, and user education.

But those costs should be compared with the alternative. Without robust sandboxing, every increase in autonomy multiplies blast radius. A model mistake is no longer just a wrong sentence; it becomes a wrong command with host-level consequences. Sandboxing is what makes increased autonomy politically and operationally tolerable.

This is also why sandboxing and autonomy are not enemies. They are complements. A weakly sandboxed system often has to ask for approval constantly because it cannot trust itself. A strongly sandboxed system can safely automate more because the consequences of error are bounded. In that sense, containment enables autonomy.

The best future architecture for coding agents will probably combine several layers: explicit permissions, good defaults, reduced prompt spam, runtime policy enforcement, and OS-level sandboxing. But among these, sandboxing is the honest layer. It does not care whether the model meant well. It only defines what remains possible.

There is even a strategic product implication here. Teams will only delegate more valuable work to agents once they believe the downside is bounded. Sandboxing is what turns that belief from marketing promise into engineering reality. It is the difference between “the agent usually behaves” and “the agent physically cannot cross certain lines.”

If we want coding agents that are both powerful and trustworthy, sandboxing cannot remain a premium feature or a nice-to-have. It has to become baseline architecture.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 23 — Balancing Safety and Autonomy
Token Usage: ~6,000 input + ~1,910 output

23.3 Capability Declaration Pattern

The safest autonomous systems are not the ones with the fewest capabilities. They are the ones whose capabilities are declared explicitly, scoped narrowly, and granted deliberately. This is the essence of the capability declaration pattern.

In many systems, permissions are implicit. A plugin can do something because it happens to run inside the host. A tool can touch the network because no one explicitly prevented it. A subprocess can inherit ambient credentials because the environment already contains them. This style of security is fragile because authority is accidental. The host may not even know the true effective power of the extension it loaded.

Capability declaration reverses that default. Instead of asking “what can this thing do in practice?”, the system asks “what powers has this thing explicitly claimed, and under what scope?” A well-designed extension or tool should declare whether it needs filesystem read access, filesystem write access, network egress, external API tokens, shell execution, background processing, long-lived storage, or privileged host hooks. The declaration should be machine-readable and reviewable.

Why is this better than implicit permission? Because explicit capability surfaces are easier to reason about, limit, audit, and revoke. They also compose better. If a parent agent delegates a task to a subagent, the runtime can decide whether the subagent inherits all capabilities, a subset, or a fresh minimal set. Without explicit capability declaration, delegation tends to inherit ambient authority, which is exactly the wrong default.

This pattern becomes especially powerful when paired with scoped, time-limited tokens issued through a credential proxy. Instead of giving the agent permanent raw credentials, the system can mint temporary credentials for a narrow purpose: read one repository, call one service, upload one artifact, perform one deployment step, and expire soon after. The credential proxy acts as a gatekeeper between the autonomous runtime and long-lived secrets.

This is a major improvement over the common anti-pattern of injecting broad API keys into the environment and hoping the model behaves. Ambient secrets are dangerous because they are invisible in the effective capability model. The agent may not have been “granted” deployment authority in any explicit sense, yet if the environment contains a production token, it effectively has that authority. Capability declaration aims to eliminate this mismatch between stated power and actual power.

The credential-proxy model also improves auditing. If every token is minted for a specific purpose, scope, and duration, logs become meaningful. Security teams can answer questions like: which agent requested access, for what operation, against which resource, and for how long? This is much harder when extensions inherit static credentials from environment variables.

Another benefit of capability declaration is product clarity. Users can understand what an extension, agent, or MCP server is asking for. Instead of vague trust decisions—“install this tool?”—the system can present concrete claims: needs read-only repo access, needs outbound web access, does not need file writes, requests temporary issue-tracker token. This supports informed consent and better defaults.

OpenCode, OMO, and Claude Code all expose reasons to care about this pattern. Open ecosystems tend to accumulate many tools and extensions; explicit declarations help prevent hidden authority creep. OMO’s role-based design already hints at capability separation by agent type. Claude Code’s stronger safety focus suggests the next step: move from ad hoc permission prompts toward more formal capability modeling. The lesson is not specific to any one platform. Any agent system with third-party extensions will eventually need this discipline.

There is an implementation challenge, of course. Capability declarations can become verbose. Developers may request overly broad scopes “just in case.” Hosts must define a standard vocabulary. Users may become desensitized if every extension asks for many granular permissions. But these are design challenges, not arguments against the pattern. They are reasons to create good abstractions, not to return to ambient authority.

The broader architectural principle is simple: explicit capabilities are safer than implicit permissions. And temporary scoped credentials are safer than inherited permanent secrets. Together, these ideas form a path toward agent systems that can be both powerful and governable.

There is also a useful cultural side effect. Once teams must declare capabilities explicitly, they begin designing tools more narrowly. The question shifts from “what might we eventually need?” to “what is the smallest authority that lets this capability be useful?” That change in development culture is often as valuable as the enforcement mechanism itself.

If sandboxing defines where the runtime can go, capability declaration defines what powers it may request within that space. Both are needed. But capability declaration is the layer that makes the system legible. It turns hidden authority into reviewable authority, which is a prerequisite for meaningful autonomy.

Model: openai/gpt-5.4
Generated: 2026-04-01
Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead
Chapter: 23 — Balancing Safety and Autonomy
Token Usage: ~6,300 input + ~1,980 output

23.4 Lessons from Claude Code

Claude Code’s safety architecture matters not because it is perfect, but because it demonstrates a crucial principle: safety and usability are not opposites when the safety mechanisms are designed well. In fact, some safety features make autonomy more usable precisely because they reduce friction.

One of the most striking examples is Anthropic’s reported 84% permission reduction through its machine-learning-based permission classifier and broader safety design. The exact internal definition behind that number matters less than the design lesson. A strong classifier can reduce unnecessary approval prompts by recognizing low-risk actions more accurately. This addresses a serious real-world problem: approval fatigue.

Why is that so important? Because naïve permission systems often punish both the user and the product. If every harmless file read or routine command requires confirmation, users become habituated and approvals lose meaning. Safety theater replaces safety. Claude Code’s approach points toward a better path: use smarter classification and stronger containment so that the user is interrupted less often, but with higher signal when interruption does occur.

This is where the second lesson appears: sandboxing enables autonomy. Claude Code’s OS-level sandboxing does not merely reduce risk; it allows the product to be more confident about letting the system act. If filesystem and network reach are bounded, then many actions that would otherwise seem too risky become tolerable. Containment lowers the expected damage of mistakes, which means the product can automate more without asking permission every few seconds.

That is a profound inversion of the usual safety-versus-usability narrative. Weak safety often forces high friction. Strong safety can lower friction. The user experiences the latter as better usability, even though the architecture underneath is actually more constrained.

Claude Code also illustrates the value of productized safety. Instead of exposing every control as a raw configurable primitive, it integrates safety into the experience: curated hooks, bounded subagents, permission logic, sandboxing, and more deliberate extension surfaces. This reduces the amount of security engineering every end user must do for themselves. Open systems offer more freedom, but they also push more safety responsibility onto the builder or operator.

There is a lot for open ecosystems to learn from this. OpenCode and OMO show the power of extensibility, orchestration, and rapid innovation. Claude Code shows that once autonomy becomes serious, safety cannot remain mostly a prompt-layer concern. It must move downward into runtime architecture, classifier infrastructure, and OS-level containment. The lesson is not “commercial closed systems are safer by nature.” The lesson is that serious autonomy eventually demands serious controls.

Another useful lesson is that safety should be measured not only by how much it blocks, but by how well it preserves legitimate workflow throughput. A system that prevents one risky action but slows all normal work to a crawl will lose user trust just as surely as an unsafe system. Claude Code’s permission reduction number matters precisely because it suggests safety mechanisms were tuned to protect without constantly interrupting. That is the kind of metric future agent systems should care about.

Claude Code also implies a design sequence. First, define the threat model. Second, contain the runtime. Third, classify and reduce unnecessary prompts. Fourth, expose extensibility in bounded ways. Fifth, let autonomy expand only inside those guardrails. This sequence is wiser than starting with raw power and bolting on approvals later.

For OMO-style systems in particular, the lesson is not to abandon rich orchestration. It is to pair orchestration with stronger underlying safety primitives. Multi-agent systems amplify both capability and risk. The better the orchestration becomes, the more the platform needs sandboxing, explicit capabilities, extension provenance, and smarter approval policy beneath it.

The broader takeaway is simple but important: the best agent products will not choose between autonomy and safety. They will use safety architecture to make autonomy acceptable. Sandboxes, classifiers, scoped permissions, and curated extensibility all contribute to this goal. Safety is not merely a brake pedal. It is part of the steering system.

Claude Code is therefore valuable as a case study in maturity. It shows that a powerful coding agent can become more usable by becoming more constrained in the right places. The future does not belong to the least restricted agent or the most paranoid one. It belongs to the systems that know exactly where to constrain so that users can trust what remains free.

For builders, that is perhaps the most important lesson of all: the goal is not to bolt safety onto autonomy, but to architect autonomy inside safety. Once those layers are aligned, the user no longer experiences safety as obstruction. They experience it as confidence.

24.1 From Code Generation to Software Engineering

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 24 — What’s Next Token Usage: N/A (runtime does not expose exact token counts in this environment)

The next frontier for coding agents is not prettier autocomplete. It is a transition from isolated code synthesis to software engineering execution. The first generation of tools proved that a model could generate a function, a test, a component, or a shell command from local context. That was useful, but still narrow. Real engineering work is rarely local. Requirements are incomplete, abstractions leak across modules, tests expose hidden coupling, infrastructure constrains design, and deployment reveals assumptions that looked harmless inside the editor. Future agents will be judged not by how quickly they emit code, but by how reliably they move work through the entire lifecycle.

This is why benchmarks such as SWE-bench Pro matter. Earlier benchmark culture often rewarded “find a patch that makes a failing test pass.” SWE-bench Pro is more directionally important because it favors tasks that resemble actual repository work: long-horizon issue resolution, tool usage under uncertainty, plan revision, and validation beyond one lucky edit. In other words, it measures whether an agent can operate like a software engineer inside a messy codebase rather than like a code completion engine in a vacuum.

24.1.1 Why code generation was the easy layer

Code generation turned out to be the easy part because local code is highly patterned. Syntax is regular, APIs repeat, frameworks teach a shape, and neighboring files reveal style. A strong model can often infer the next 30 lines with impressive fluency. But software engineering demands much more than local fluency. It requires the ability to reason about:

cross-file consistency,
hidden business constraints,
backward compatibility,
build and test pipelines,
deployment environments,
operational risk,
and whether a proposed change should exist at all.

That last point is underrated. Human engineers do not merely produce code; they reject bad approaches, reduce scope, sequence work, and preserve system stability. A future agent must increasingly do the same. Otherwise it remains a fast typist attached to a weak planning loop.

24.1.2 SWE-bench Pro as a signal of the next capability stack

SWE-bench Pro is valuable less as a leaderboard and more as a specification of emerging expectations. To perform well on realistic long-horizon tasks, an agent needs at least five capabilities.

First, it needs repository comprehension. It must build a mental model of architecture, ownership boundaries, naming conventions, and code paths. Second, it needs persistent planning. Long tasks fail when early intent is forgotten after ten tool calls and two test runs. Third, it needs disciplined tool use. Strong results come from sequencing search, read, edit, diagnostics, and verification well. Fourth, it needs verification loops. A patch is not done because one command returned green; the agent must check for regressions, style violations, and broken assumptions. Fifth, it needs recovery behavior. Real engineering includes dead ends, rollbacks, hypothesis changes, and partial success.

Benchmarks that stress these abilities quietly redefine the field. They reward process quality, not only model intelligence.

24.1.3 Multi-file refactoring as the true stress test

If one task reveals the gap between code generation and engineering, it is multi-file refactoring. Renaming a concept, changing an interface, or moving a responsibility across a repository sounds mechanical, but it rarely is. A symbol may appear in source, tests, configuration, generated types, docs, telemetry, migrations, and deployment scripts. One interface update can ripple through serialization logic, API contracts, CLI output, and alerting rules.

Reliable refactoring therefore requires more than text replacement. It needs symbol-aware navigation, dependency tracing, semantic search, staging discipline, and verification at each boundary. This is why the surrounding agent architecture matters so much. A model alone does not guarantee consistent refactors; the agent needs tools like LSP-based rename, diagnostics, AST-aware search, structured diffs, and safe edit primitives.

In practice, future agents will be compared on whether they can preserve global consistency. Humans forgive a generated snippet that needs polishing. They do not forgive a broad refactor that leaves the repository internally contradictory.

24.1.4 Full lifecycle coverage is the real end state

The more important shift is that engineering is a lifecycle, not an edit. A mature coding agent will need usable coverage across at least six stages:

Requirement shaping: clarify ambiguous goals, detect missing constraints, and decompose work.
Design and planning: propose architecture, migration strategy, sequencing, and rollback paths.
Implementation: make coordinated multi-file changes with style and abstraction awareness.
Verification: run diagnostics, tests, builds, and targeted inspections.
Delivery: update docs, changelogs, CI config, release notes, and deployment artifacts.
Operations: help monitor outcomes, analyze incidents, and assist with follow-up remediation.

Today, most agents are strongest in stage three and partially useful in stage four. The long-term winners will cover all six stages well enough that the user feels they are collaborating with a systems-oriented engineer rather than a code-generating appliance.

24.1.5 From patching to ownership

This implies a deeper conceptual shift: future agents will be evaluated on ownership behavior. Ownership means preserving intent across time, noticing adjacent tasks, identifying risks before they surface, and finishing supporting work instead of stopping at the first passing output. A human engineer who only edits code but ignores tests, docs, migrations, and release impact is considered incomplete. Agents will face the same standard.

This is also where product design matters. Systems such as OpenCode and Oh-My-OpenCode emphasize open extensibility and orchestration; Claude Code emphasizes tighter integration, safety, and enterprise-grade workflow support. Those differences affect how quickly each can grow from local coding assistant into fuller engineering collaborator. Open systems can absorb new capabilities fast; commercial systems can productize workflow reliability faster. The likely future combines both patterns.

24.1.6 What must improve next

For the shift to software engineering execution to become real, four technical gaps must close.

The first is long-duration memory with bounded drift. Agents need to remember project intent over many steps without hallucinating stale facts. The second is stronger semantic editing. Refactors must track meaning, not just strings. The third is better environment awareness. Build, dependency, test, and deployment context must become first-class inputs rather than late-stage surprises. The fourth is trustworthy stopping behavior. Agents need to know when a task is done, when it is unsafe, and when escalation is required.

These are not just model problems. They are architecture, tooling, and UX problems.

24.1.7 The likely outcome

The category name “AI coding agents” may itself become too small. The systems that win will not merely write code. They will read issues, inspect architecture, plan changes, coordinate tools, manage validation, update documentation, and participate in release and maintenance loops. That is software engineering behavior.

So the next phase is not a world where models replace engineering. It is a world where the unit of automation expands from the line, to the file, to the pull request, and eventually to the workflow. SWE-bench Pro is one signal of that movement. Multi-file refactoring is one stress test. Full lifecycle coverage is the destination. The agent that reaches that destination first will not just generate better code; it will change what teams mean by “doing software work.”

24.2 “Vibe Coding” and the Evolving Developer Role

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 24 — What’s Next Token Usage: N/A (runtime does not expose exact token counts in this environment)

“Vibe coding” is the popular label for a real shift: developers increasingly describe intent, examples, and desired user experience, while the agent fills in much of the implementation detail. The term is playful, but the underlying change is serious. It captures a move away from hand-authoring every line toward steering the system through goals, feedback, and constraints. In that world, the developer’s value does not disappear. It moves upward.

24.2.1 What vibe coding actually means

Vibe coding does not mean “coding without discipline.” At its best, it means using natural language, sketches, references, and rapid iteration to reach working software faster. A developer may say: build a settings page that matches the current design system, supports dark mode, persists preferences, and includes tests. The agent produces a first version. The developer then critiques structure, product fit, edge cases, and trade-offs instead of manually typing boilerplate.

This changes the unit of work from “authoring statements” to “steering outcomes.” The developer becomes less of a syntax producer and more of a director, reviewer, and systems thinker.

24.2.2 Why the role shifts upward, not away

Many historical automation waves removed low-level manual effort while increasing the value of higher-order judgment. Compilers did not eliminate programmers; they eliminated machine code entry. Frameworks did not eliminate software design; they standardized repetition. AI agents will likely do the same. They reduce the scarcity of implementation labor for common patterns, which means the remaining scarce skills rise in importance:

deciding what to build,
choosing architecture,
setting constraints,
supervising correctness,
aligning with product goals,
and managing organizational risk.

The developer role therefore shifts toward architecture, supervision, and product interpretation.

24.2.3 Architecture becomes more central

When implementation gets cheaper, architecture becomes more valuable because it determines how much generated work composes cleanly. A strong agent can rapidly create ten modules. It can also rapidly create ten incompatible modules if the system boundaries are unclear. Developers will increasingly spend effort on defining interfaces, data ownership, service boundaries, testing strategy, migration plans, and extension points.

This is one reason open systems remain important. OpenCode and Oh-My-OpenCode make internal mechanics more visible, letting advanced users shape toolchains and orchestration to match architectural needs. Claude Code shows the opposite strength: when the environment is more integrated and opinionated, teams can standardize review and control. In both cases, the human role is not typing faster. It is establishing the structure within which fast generation stays safe.

24.2.4 Supervision becomes a first-class engineering skill

Future developers will need to be good at supervising non-human workers. That means more than reviewing diffs. It means setting task scope, checking assumptions, asking for intermediate plans, validating evidence, and knowing when to force a narrow change versus when to allow wider exploration.

Good supervision includes:

decomposing tasks so the agent can succeed,
supplying the right context and constraints,
requiring verification before acceptance,
spotting false confidence,
and preserving coherence across multiple agent runs.

This looks a little like code review, a little like tech lead work, and a little like product management. The exact mix depends on the organization, but the direction is clear: the developer becomes the governor of an execution system.

24.2.5 Product thinking moves closer to coding

Vibe coding also compresses the distance between product intent and implementation. If an agent can turn a paragraph into a prototype, the bottleneck shifts toward specifying the right user outcome. Developers who understand user journeys, business constraints, and operational implications will outperform developers who only know how to implement a ticket literally.

This does not mean every developer becomes a PM. It means product sensitivity becomes a stronger engineering advantage. The best prompts, plans, and reviews will come from people who understand what success means for the user, not just what the story says.

24.2.6 What skills decline, what skills rise

Some skills become less differentiating. Memorizing syntax, writing repetitive CRUD by hand, and manually plumbing common framework patterns will matter less. Other skills become more differentiating.

Rising skills include:

architecture and interface design,
debugging generated systems,
evaluation design,
prompt and context shaping,
repository-level reasoning,
security judgment,
and communication across engineering and product.

In short, developers will be valued more for taste, judgment, and systems thinking than for raw line-by-line throughput.

24.2.7 The danger of shallow vibe coding

There is, however, a weak form of vibe coding: asking the agent for fast output without understanding the resulting system. That produces demo velocity and maintenance debt. Teams can ship attractive surfaces while quietly losing architectural clarity, test trust, and security posture. The risk is not that AI writes code. The risk is that humans accept code without preserving engineering accountability.

This is why future organizations will probably separate “agent-assisted speed” from “agent-governed rigor.” Mature teams will formalize review gates, verification policies, and ownership models. They will treat agent output as accelerant, not exemption from engineering standards.

24.2.8 The likely new workflow

A plausible default workflow for 2026 looks like this:

The developer frames the problem, constraints, and desired outcome.
The agent proposes a plan and initial implementation.
The developer reviews architecture, naming, edge cases, and trade-offs.
The agent performs refactors, tests, docs, and cleanup.
The developer approves, redirects, or rejects based on product and system impact.

This is still software development, but the distribution of labor changes. The agent handles more execution. The human handles more direction.

24.2.9 From maker to orchestrator

So the future developer is not disappearing. The role is becoming more leveraged and more managerial in the best sense of the term. Great engineers will still build things, but increasingly by orchestrating models, tools, and verification loops around a strong internal model of product and system design. Vibe coding is not the end of engineering seriousness. It is the moment when serious engineering moves from keystrokes to control.

The developers who thrive will not be those who resist AI, nor those who surrender all judgment to it. They will be those who can turn fast generative capability into coherent, durable, user-aligned systems.

24.3 Convergence Trends

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 24 — What’s Next Token Usage: N/A (runtime does not expose exact token counts in this environment)

The future of coding agents will not be defined by total fragmentation. It will be defined by convergence. Open-source systems, orchestration layers, and commercial agents are currently differentiated by philosophy, pace, and packaging, but they are steadily borrowing from one another. The likely result is not one winner and two losers. It is a shared baseline architecture with different optimizations layered on top.

24.3.1 Open source keeps pushing security forward

A common assumption is that commercial systems will lead on safety while open systems focus only on flexibility. That is too simple. Open-source ecosystems often move security forward precisely because their internals are inspectable. Tool definitions, permission boundaries, prompt assembly, shell wrappers, and extension mechanisms can be audited by the community. Weak patterns become visible. Good patterns spread fast.

In the coding-agent world, open systems also make it easier to experiment with sandboxing, capability scoping, command allowlists, and safer file operations. Once those patterns prove useful, they are copied widely. Open source therefore acts as a high-speed laboratory for practical security design.

24.3.2 Commercial systems keep pushing extensibility forward

At the same time, commercial systems are no longer defined only by closed polish. They increasingly adopt extensibility because serious users demand it. Enterprise teams want custom commands, internal workflows, organization-specific tools, model routing, repository memory, and integration with ticketing, CI, observability, and documentation systems. A coding agent that cannot adapt to those environments will hit a ceiling.

This is why commercial products often move from “fixed assistant” toward “controlled platform.” Claude Code is a strong example of this pressure: it combines opinionated product design with growing surfaces for tools, hooks, commands, skills, and task orchestration. The competitive center of gravity is shifting from closed experience to governed extensibility.

24.3.3 MCP becomes the universal connector

The strongest convergence trend is the rise of MCP, the Model Context Protocol, as a universal interface layer. MCP matters because it separates three concerns that were previously tangled together: model reasoning, tool access, and application integration. Once tool invocation can be exposed through a standard protocol, ecosystems grow faster. One tool server can work across many hosts. One host can consume many external capabilities. Innovation stops being trapped inside a single vendor boundary.

That is why MCP increasingly looks like the “USB-C” layer of the agent world. It does not eliminate all platform differences, but it creates a shared substrate for extension. Open systems embrace it because it lowers integration cost. Commercial systems embrace it because it expands ecosystem reach without exposing every internal mechanism.

24.3.4 Shared baseline, differentiated orchestration

If MCP becomes universal, what still differs? Mostly orchestration quality. Future systems may converge on a similar baseline stack:

chat plus planning loop,
tool calling,
context management,
memory layers,
standardized external protocols,
and configurable permission controls.

Above that shared baseline, products will differentiate through orchestration strategy. Some will optimize for single-agent simplicity. Others will optimize for multi-agent delegation, background research, specialized roles, or long-running workflows. OpenCode and Oh-My-OpenCode already point toward one end of that spectrum. Claude Code points toward a more integrated but still increasingly extensible end.

24.3.5 Open and commercial are trading strengths

An interesting pattern is that the sides are trading traditional strengths. Open-source systems are learning that security, guardrails, and trust are essential for adoption beyond enthusiasts. Commercial systems are learning that extensibility, transparency, and composability are essential for advanced users. This does not erase the open-versus-closed distinction, but it narrows the practical gap.

That suggests the market is maturing. Early products often compete on ideology. Mature products compete on workflow fit, reliability, and ecosystem depth.

24.3.6 What convergence means for users

For users, convergence is good news. It reduces lock-in at the tool layer, makes skills more portable, and encourages a healthier division between core agent runtime and external capability servers. Teams may eventually choose a host environment the way they choose an editor or CI platform: based on workflow preference and governance needs, not because the surrounding ecosystem is impossible to replace.

It also means architecture decisions become more future-proof. Building internal tools on standard protocols is safer than depending on one vendor’s custom API surface. MCP is especially important here because it lets organizations invest in reusable capability endpoints rather than repeatedly rebuilding integrations for each agent host.

24.3.7 What may not converge

Not everything will converge. Three layers will likely remain differentiated.

First, trust model. Commercial products may continue to lead in enterprise certification, compliance packaging, and centralized governance. Second, developer philosophy. Open systems will remain more hackable and inspectable. Third, workflow opinionation. Some tools will stay deliberately minimal; others will embed stronger process assumptions.

So convergence does not mean sameness. It means the interfaces and baseline capabilities become more interoperable while values and UX remain distinct.

24.3.8 The likely end state

The most plausible end state is a layered ecosystem. MCP or something very much like it becomes universal. Memory, tasks, tools, and permissions become increasingly standardized. Open-source projects continue to generate the fastest ideas in safety and orchestration. Commercial systems productize those ideas, harden them, and deliver them to organizations that need reliability at scale. Users move more easily between environments, and the best ideas spread faster across the whole field.

In that future, the major question is no longer whether coding agents will converge. They already are. The real question is which layer each ecosystem will own: protocol, orchestration, governance, or product experience. The winners may end up owning different layers simultaneously.

24.4 Unsolved Challenges

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 24 — What’s Next Token Usage: N/A (runtime does not expose exact token counts in this environment)

Coding agents have improved quickly, but the hardest problems are still ahead. The next wave of progress will not come from small gains in output fluency. It will come from solving the reliability gaps that stop teams from trusting agents with larger scopes of work. Four challenges stand out: cross-session memory, consistent refactoring, cost predictability, and inter-agent trust.

24.4.1 Cross-session memory without drift

Most agents still work best inside a bounded session. Once the context window resets or a task spans multiple days, continuity becomes fragile. The system may forget why a decision was made, repeat already-failed ideas, or preserve the wrong details while losing the important ones.

Cross-session memory sounds easy: just save notes. In practice, it is hard because memory must be both durable and selective. If an agent stores too little, it loses context. If it stores too much, it accumulates noise. If it stores the wrong summary, it anchors future work to an inaccurate premise. Good long-term memory therefore needs ranking, aging, correction, provenance, and explicit uncertainty.

This is one of the biggest open design areas for future agents. Teams need systems that remember intent, architecture decisions, and unresolved risks across time without freezing mistakes into policy.

24.4.2 Consistent refactoring at repository scale

Agents can already make impressive local edits. They are much less reliable at preserving consistency across a large repository. A broad refactor tests almost every weakness at once: naming discipline, symbol tracking, dependency awareness, test judgment, and stopping behavior.

The unsolved problem is not merely “can the agent rename a symbol.” It is “can the agent update all affected semantics, validate all boundaries, avoid collateral damage, and know what remains unverified.” That requires stronger semantic tooling, better incremental planning, and richer confidence reporting. An agent should be able to say: I changed twelve files, verified seven code paths, but two integration boundaries remain uncertain because deployment config is implicit.

Until that level of self-awareness is normal, large-scale refactors will remain supervision-heavy.

24.4.3 Cost predictability and economic control

One under-discussed barrier is cost predictability. A human engineer can roughly estimate how long a task may take. Agent systems are still much harder to forecast. Long tasks can balloon through repeated tool calls, redundant exploration, failed plans, background agents, and oversized contexts. A workflow that looked cheap in a demo may become expensive in production.

Organizations need better economic controls:

budget-aware planning,
task-level cost estimation,
adaptive model routing,
stop conditions tied to value,
and post-run attribution.

This matters strategically. An agent is only truly useful when teams can predict not just whether it may work, but whether it is worth the cost relative to human time. Cost control is therefore not a billing feature. It is part of the decision engine of the product.

24.4.4 Inter-agent trust and delegation quality

As systems become multi-agent, a new problem appears: how should agents trust one another? One subagent may explore, another may edit, another may verify. But if the parent agent cannot assess the quality of delegated work, parallelism creates confusion rather than leverage.

Inter-agent trust requires more than passing text summaries around. It needs explicit contracts: what was asked, what evidence was gathered, what files were touched, what remains uncertain, and how confidence was derived. Without this, subagents become opaque helpers. With it, they become inspectable collaborators.

This challenge is particularly important for systems like Oh-My-OpenCode that emphasize orchestration. Delegation is powerful only when the resulting knowledge can be integrated without ambiguity.

24.4.5 Verification remains the bottleneck

All four challenges connect to a deeper issue: verification still lags generation. Agents can produce candidate solutions faster than they can prove those solutions safe and complete. Memory errors persist because summaries are weakly verified. Refactors fail because semantic consistency is under-checked. Costs explode because exploration loops lack value-aware stopping rules. Multi-agent systems drift because subagent outputs are not strongly validated.

The field often frames the problem as reasoning quality. In practice, it is just as much a verification architecture problem.

24.4.6 What good solutions might look like

Future progress will likely combine several patterns.

For memory: explicit project journals, decision records, uncertainty markers, and retrieval weighted by recency and authority. For refactoring: symbol-aware tools, dependency maps, automated impact sets, and verification plans generated before edits begin. For cost: per-task budgets, cheap-first exploration, model escalation only when necessary, and transparent accounting. For inter-agent trust: typed delegation outputs, evidence bundles, confidence fields, and parent-level adjudication.

None of these are glamorous, but all of them are necessary. The next generation of winning products may be less defined by raw model novelty than by boring infrastructure that makes ambitious behavior dependable.

24.4.7 The real challenge is calibrated trust

Ultimately, users do not need agents that are always perfect. They need agents whose strengths and weaknesses are legible. The hard problem is calibrated trust: knowing when the system is likely right, when it is uncertain, and when human oversight must increase. Cross-session memory, refactoring, cost, and delegation all collapse into that same requirement.

That is why these are still open challenges. They are not edge cases. They determine whether coding agents remain useful assistants or become dependable engineering systems. The future of the field depends less on making agents look smarter and more on making their behavior measurable, bounded, and trustworthy.

25.1 The Starting Decision Tree

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 25 — Building Your Own Token Usage: ~4,400 input + ~1,400 output

If you want to build your own coding agent, do not begin with a feature list. Begin with a decision tree. A feature list pushes you toward accumulation: more tools, more models, more modes, more automation. A decision tree forces you to answer the harder question first: what kind of system are you actually trying to build?

Chapter 25 is practical by design. The goal here is not to admire mature systems like OpenCode, Oh-My-OpenCode, or Claude Code. The goal is to help a builder choose a sane first architecture. Most failures in agent projects come from choosing the wrong scope before writing the first stable loop.

25.1.1 First Branch: Fork an Existing Host or Start Fresh?

For most teams, this is the highest-leverage decision.

You should fork or extend an existing host such as OpenCode when your main innovation is not the host itself. If you mostly care about orchestration, prompting, workflow, permissions, skills, or UI polish, reusing a host is usually the right move. You inherit a working chat loop, provider integration, tool interfaces, session handling, and often MCP support. That can save months.

You should start fresh only when you can name concrete architectural conflicts. Maybe you want an extremely small local agent with almost no abstraction. Maybe your target domain is not software engineering in general but a narrow workflow such as firmware repair, infrastructure runbooks, or legal-document review. Maybe the host assumes a tool system, session model, or UI structure that actively fights your design.

The rule is simple: if an existing host accelerates you by 70 percent and constrains you by 30 percent, take the leverage. If it accelerates you by 30 percent and constrains you by 70 percent, rebuild.

25.1.2 Second Branch: Single-Model Deep or Multi-Model Broad?

The next decision is about optimization strategy.

Single-model deep means you design the whole system around one primary model family. Your prompts, tool descriptions, compaction strategy, retry logic, and output formatting are all tuned for that model. This usually produces the strongest early experience. It is also how many good systems begin, even if they later add abstraction.

Multi-model broad means you build a provider layer and capability abstraction from the beginning. That sounds attractive, but it is expensive. Different models vary in tool reliability, context window, cost, latency, formatting stability, and how well they follow multi-step instructions. A generic abstraction can easily become a lowest-common-denominator layer.

If you are validating product quality, go single-model first. If you are building infrastructure for others, you may need multi-model support earlier. Even then, pick one model as your primary optimization target. “Supports many” is not the same as “works well on any.”

25.1.3 Third Branch: Assistant or Autonomous Worker?

This branch is really about how tightly humans stay in the loop.

An assistant handles shorter loops. The human stays near the controls, approves important operations, and checks results frequently. An autonomous worker accepts a larger goal and executes longer chains with fewer interruptions.

High autonomy is appealing, but it raises the bar on every subsystem. You now need stronger stop conditions, better failure recovery, more verification, clearer cost control, and more careful trust calibration. In other words, autonomy is not a single feature. It is a systems property built on top of many boring safeguards.

If you are building your first agent, default toward an assistant. A reliable semi-autonomous system is usually more useful than an ambitious autonomous one that cannot recover cleanly from error.

25.1.4 Fourth Branch: Product or Platform?

Another common mistake is deciding too early that your agent must be a platform.

If you build a product, you optimize for coherence. The tool set is constrained, the behavior is opinionated, and the experience is easier to document and support.

If you build a platform, you need extension points: plugins, hooks, third-party tools, versioning, permissions, compatibility promises, and some form of ecosystem governance. Platforms create leverage, but they also create obligations. A plugin API is a long-term maintenance promise, not just a developer convenience.

Do not choose platform because it sounds more strategic. Choose it only if outside extension is central to the product thesis.

25.1.5 Fifth Branch: Single Agent or Multi-Agent?

Many builders ask this too early.

A single-agent system is easier to reason about, cheaper to run, and easier to debug. A multi-agent system can improve throughput or quality when tasks naturally split into independent subproblems, such as exploration, retrieval, implementation, and review.

But multi-agent design introduces overhead: delegation logic, context transfer, merge behavior, duplicate work, inconsistent assumptions, and token multiplication. If your single agent still struggles to read files, maintain a plan, and verify outputs, adding subagents will usually amplify confusion rather than solve it.

A good rule is:

Start single-agent.
Add role prompts before real subagents.
Add true delegation only when recurring workloads clearly benefit from separation.

25.1.6 A Practical Decision Tree

Here is a usable starting tree:

Is this a learning vehicle or a maintained product?
If it is a product, can an existing host accelerate the first release?
Do we need early quality on one model, or portability across many?
Will a human remain in the loop most of the time?
Are third-party extensions essential now, or later?
Is there already a proven workload that needs multiple agents?

Each branch is a tradeoff: leverage vs conceptual purity, depth vs portability, autonomy vs control, extensibility vs simplicity, and orchestration power vs operational clarity.

There is no universally correct answer. There is only a correct answer for your current stage.

25.1.7 Recommended Default for Most Builders

For most teams, the best starting path is conservative: extend an existing host, optimize for one model, stay mostly single-agent, keep the initial tool set small, and postpone platform ambitions.

This path is not glamorous, but it maximizes learning speed. It lets you observe real failure modes before hardening abstractions. That matters because early abstractions in agent systems are often wrong. They are shaped by imagined needs rather than repeated evidence.

The builders who move fastest are rarely the ones who start with the biggest architecture. They are the ones who make the smallest number of irreversible decisions.

25.2 Minimum Viable Agent

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 25 — Building Your Own Token Usage: ~4,500 input + ~1,450 output

If Chapter 25 has one central recommendation, it is this: start with a minimum viable agent, not a maximum imaginable architecture. The early goal is not to reproduce everything seen in OpenCode, Oh-My-OpenCode, or Claude Code. The goal is to build a small system that can do real work, fail in observable ways, and teach you what to add next.

The best place to start is a ReAct loop plus five core tools: read, write, bash, grep, and glob.

25.2.1 Why ReAct Is Still the Right Core

ReAct means the agent alternates between reasoning and acting: think, act, observe, update, repeat. That structure is not fashionable jargon. It is a practical anti-hallucination device. A model that only “thinks” tends to invent missing details. A model that is forced to inspect files, search code, run commands, and re-check results is anchored to reality.

In practice, the loop is straightforward:

Interpret the user’s goal.
Decide what information is missing.
Use a tool to inspect the environment.
Form or update a plan.
Make one bounded change.
Verify the effect.
Continue or stop.

That is enough structure for a usable engineering assistant.

25.2.2 Why These Five Tools?

The recommended starter set is deliberately boring.

Read lets the agent inspect source files, configs, tests, logs, and docs. Without reliable reading, everything downstream is guesswork.

Write lets the agent create or replace content. Whether implemented as full-file write, structured edit, or patch application, the core point is that the agent must be able to persist changes.

Bash gives access to the real execution surface: builds, tests, package managers, linters, formatters, scripts, containers, and runtime checks.

Grep supports content search. When the agent knows what concept it is looking for but not where it lives, grep is often the fastest route.

Glob supports file discovery. It helps the agent build a map of the repository before it understands the code.

Together, these five tools cover the basic human workflow in an unfamiliar codebase: discover, inspect, search, modify, verify.

25.2.3 What This MVP Can Already Do

A surprising amount.

With just this loop and these tools, an agent can fix small bugs, add narrow features, update tests, adjust configuration, revise documentation, and perform many maintenance tasks. It can inspect a failure, locate relevant files, edit them, run the test suite, and report what changed.

More importantly, it can expose the real weaknesses of your system. You will learn where the model guesses instead of checking, where tool outputs are too noisy, where plans drift, where edits become unsafe, and where verification is too shallow. Those lessons are more valuable than prematurely implementing advanced orchestration.

25.2.4 Reliability Matters More Than Breadth

Many builders misunderstand MVP to mean “few features.” In agent design, MVP should mean minimum complexity that still produces dependable behavior.

That means spending early effort on clear tool descriptions, stable input and output schemas, bounded tool behavior, explicit stop conditions, lightweight plan tracking, and mandatory verification before completion.

An agent with five good tools and disciplined verification is more useful than an agent with twenty tools and unclear behavior.

25.2.5 Growth by Need, Not by Imagination

Once the MVP is useful, you should expand in response to recurring failure, not architectural envy. A sensible order is:

Safer editing: structured diff or patch tools.
Smarter navigation: LSP, AST search, or symbol-aware refactors.
Session continuity: plan persistence and resumability.
Permission controls: especially for command execution and file mutation.
External integration: MCP or service connectors.
Delegation: subagents only after the single-agent loop is stable.

This order mirrors the typical pain curve. Most first-generation agents do not fail because they lack multi-agent orchestration. They fail because they misread files, edit sloppily, lose context, and declare victory too early.

25.2.6 Resist Tool Bloat

Tool bloat is one of the fastest ways to make an agent worse.

Every failure tempts the builder to add a new tool. The agent failed to locate code, so add a code-map tool. It forgot the plan, so add a planner tool. It edited incorrectly, so add a refactor tool. It misread search output, so add a summarizer tool.

Sometimes that is the right response. Often it is not. Many failures come from poor tool descriptions, ambiguous output, weak prompting, or the absence of a verification step. Adding tools can mask the real issue while increasing model choice burden and maintenance cost.

The correct default question is: can the current tool set solve this if used more clearly and more reliably?

25.2.7 A Compact Mental Model

The minimum viable agent has three responsibilities: perceive reality through discovery and reading, change reality through writing and execution, and check reality through testing and inspection.

If your system can do those three things well, you already have the foundation of a real coding agent.

25.2.8 Start Here

So where should a builder begin? Start with the ReAct loop. Give it read, write, bash, grep, and glob. Add strong tool descriptions. Make verification non-optional. Then put the system in front of real tasks.

Do not scale by ambition. Scale by evidence. Watch the agent fail, categorize the failures, and add capability one layer at a time. That is how small systems grow into serious ones without collapsing under unnecessary complexity.

25.3 Pitfalls to Avoid

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 25 — Building Your Own Token Usage: ~4,350 input + ~1,420 output

Once you begin building your own coding agent, the danger is usually not lack of imagination. It is misdirected ambition. Builders see rich systems like OpenCode, Oh-My-OpenCode, and Claude Code and assume maturity comes from feature volume. In reality, many early agent projects fail for the opposite reason: they accumulate complexity faster than they accumulate evidence.

This section covers four recurring traps: tool bloat, over-engineering, weak security thinking, and the belief that one agent should solve everything.

25.3.1 Pitfall One: Tool Bloat

Tool bloat happens when every observed failure produces a new tool. The agent misses a file, so you add a repository map tool. It edits poorly, so you add a patch tool, then a refactor tool, then a review tool. It gets lost in large outputs, so you add a summarizer, a compressor, and a smart-context service.

The result is often worse performance, not better. Every new tool adds model choice complexity. The model now has to decide not just what to do, but which interface to use, how they differ, and when each is safe. Tool overlap also creates subtle instability: two search tools, three edit tools, four ways to fetch structure, each with slightly different semantics.

The cure is discipline. Do not add a tool unless you can show repeated failure that cannot be solved by clearer prompts, tighter outputs, or better composition of existing tools.

25.3.2 Pitfall Two: Over-Engineering the First Version

A second trap is building version three before version one works.

This often looks impressive in architecture diagrams: provider abstraction, skill systems, permission middleware, multi-agent routing, memory tiers, plugin SDKs, background tasks, workflow graphs, evaluation dashboards, and enterprise policy hooks. Each component has merit. The problem is timing.

If the base loop is unreliable, these additions do not create a better product. They create more places for failure to hide. You can spend months perfecting extension surfaces for a system that still cannot reliably inspect files, make bounded edits, and run verification.

The right order is brutally simple: prove a small working loop first. Then add structure where real usage demands it.

25.3.3 Pitfall Three: Ignoring Security Because the System Is Local or Internal

Agent builders often postpone security because the first users are trusted teammates or because the system runs on local machines. This is shortsighted.

Even local agents can execute destructive shell commands, overwrite files, leak credentials through logs, follow malicious instructions embedded in repos, or invoke unsafe external tools. Internal is not a threat model. It is an assumption that tends to break the moment the system becomes useful.

Security for agents should begin with capability boundaries. Which tools can write? Which can execute commands? Which can reach the network? What paths are allowed? Which operations require confirmation? How are secrets filtered from outputs? How are third-party extensions isolated?

You do not need enterprise-grade policy on day one. But you do need a threat model on day one.

25.3.4 Pitfall Four: Trying to Solve Every Use Case

Another common failure mode is universalism. The builder wants one agent to be a coder, researcher, DevOps assistant, browser automation tool, enterprise search layer, documentation writer, data analyst, and terminal copilot.

Broad ambition is understandable, but generality has costs. Different tasks need different tools, different safety boundaries, different evaluation criteria, and sometimes different interaction patterns. An agent optimized for code edits is not automatically optimized for web workflows or data pipelines.

The healthier mindset is to choose a sharp initial workload. Maybe your first agent is great at repository-local engineering tasks. Maybe it is strong at test repair. Maybe it specializes in refactors for one language. Narrow focus is not a weakness. It is how quality emerges.

25.3.5 Secondary Trap: Confusing Demo Success with Product Readiness

Some systems look amazing in curated demos because the repo is clean, the task is narrow, and the verification path is obvious. Real environments are not like that. Real repositories are messy, permissions are inconsistent, scripts fail, dependencies drift, and instructions are ambiguous.

A builder who optimizes only for demo moments will underestimate error handling, resumability, and trust calibration. The result is a product that appears magical until it meets ordinary engineering chaos.

25.3.6 Secondary Trap: Weak Verification

Many agent builders focus heavily on generation and too little on verification. But for coding agents, output without checking is not reliability. It is speculation.

Verification can be simple at first: run tests, inspect diffs, check diagnostics, confirm files exist, or validate schemas. What matters is that verification is part of the loop, not an optional epilogue. An agent that edits confidently but verifies weakly will accumulate hidden defects.

25.3.7 Secondary Trap: Making the Model Responsible for Every Guarantee

Language models are flexible, but they should not carry responsibilities that belong in the scaffold. If safe paths can be enforced by the tool layer, enforce them there. If a dangerous command needs confirmation, require it in the execution layer. If token cost must be tracked, instrument it externally. If outputs need structured parsing, define schemas.

A useful principle is: never ask the model to remember a guarantee that software can enforce.

25.3.8 A Better Default Philosophy

To avoid these traps, adopt a restrained philosophy: keep the first tool set small, add abstractions only after repeated evidence, define a real threat model early, choose a narrow workload before expanding, and make verification mandatory.

The strongest agent systems are not the ones that attempt the most on day one. They are the ones that grow in layers without losing coherence.

If there is one meta-pitfall above all others, it is this: assuming maturity comes from adding more. Often maturity comes from refusing to add more until the current layer is truly understood.

25.4 What to Measure

Model: gpt-5.4 (openai/gpt-5.4) Generated: 2026-04-01 Book: Claude Code VS OpenCode Chapter: 25 — Building Your Own Token Usage: ~4,300 input + ~1,430 output

If you build your own coding agent, measurement is where optimism meets reality. Teams often spend enormous effort on prompts, tools, and orchestration, then evaluate progress informally: it feels smarter, the demo looked good, or it solved three tasks yesterday. That is not enough.

You need a compact set of metrics that captures usefulness, reliability, cost, and operator burden. For an early coding agent, four measures matter most: SWE-bench baseline, task completion rate, human intervention frequency, and token cost per task.

25.4.1 Start with a Shared Baseline: SWE-bench

SWE-bench is not a complete picture, but it is a useful anchor. It provides repository-grounded software engineering tasks with real issue-fix expectations. Even if your internal workload differs, a benchmark like SWE-bench gives you an external reference point.

The goal is not to chase the benchmark at the expense of product value. The goal is to avoid self-deception. If your agent performs dramatically worse than a modest baseline on repository-local repair tasks, that is a signal. If it improves after tool or prompt changes, that is also a signal.

Treat SWE-bench as a calibration instrument, not as the whole map.

25.4.2 Measure Task Completion Rate, Not Just Output Quality

For production usefulness, the most important metric is often task completion rate: how often the agent finishes the assigned task correctly within the allowed workflow.

This metric should be defined clearly. Completion should mean more than produced an answer. Depending on the task, completion may require the right files changed, tests or checks passing, no forbidden side effects, and a final result that a human reviewer accepts.

Why is this metric so valuable? Because coding agents are systems, not text generators. A beautiful explanation attached to a broken patch is not success. Completion rate captures the end-to-end property you actually care about.

25.4.3 Measure Human Intervention Frequency

A second crucial metric is human intervention frequency: how often a person must step in to unblock, redirect, correct, or approve the agent.

Two systems can have similar nominal completion rates while producing very different user experiences. One might solve tasks mostly independently. Another might require constant nudges: clarifying prompts, rerunning commands, fixing paths, restoring broken changes, or manually verifying outputs.

If intervention frequency is high, the agent may not really be reducing work. It may simply be changing the shape of work from coding to supervision.

Useful subcategories include intervention for safety approval, intervention for planning correction, intervention for tool failure, intervention for verification failure, and intervention for context loss or confusion.

These subcategories tell you where the real friction lives.

25.4.4 Measure Token Cost per Task

The third operational metric is token cost per task. This matters because agent systems can become expensive long before they become effective. More tools, longer context, retries, delegated agents, and verbose outputs all increase token use.

Cost per task should include the full interaction, not just the final successful turn. Count exploration, retries, failed attempts, verification loops, and summarization if applicable. Otherwise you will underestimate the real operating profile.

This metric matters for both commercial and internal systems. Commercial products need gross-margin discipline. Internal tools need cost predictability and usage budgeting. In both cases, low-value token burn is a design smell.

25.4.5 Measure the Right Ratios, Not Just Raw Totals

Raw numbers are less useful than normalized views. Consider tracking completion rate by task class, tokens per successful task, interventions per successful task, and cost relative to human time saved.

These ratios help separate expensive but effective from expensive and confused. They also prevent misleading conclusions from cherry-picked tasks.

25.4.6 Add Internal Traces for Diagnosis

The four headline metrics tell you whether the system is improving. To learn why, add lightweight traces: number of tool calls, retry count, verification failures, context size at failure, and common termination reasons.

You do not need a giant observability platform at the start. But you do need enough instrumentation to explain why a task failed or why cost spiked.

25.4.7 Beware Goodhart’s Law

Goodhart’s Law says that when a measure becomes a target, it can stop being a good measure. This applies strongly to agents. If you optimize only for benchmark score, you may overfit task distributions. If you optimize only for low token cost, you may cut verification too aggressively. If you optimize only for low intervention, you may let the agent take unsafe actions.

That is why the four measures should be read together. They form a balance: benchmark calibration, real completion, human burden, and operational cost.

No single number is enough.

25.4.8 What a Good Early Dashboard Looks Like

For an early-stage builder, a useful dashboard can stay simple: SWE-bench or internal benchmark pass rate, task completion rate on real tasks, average human interventions per task, average token usage and dollar cost per task, and a short breakdown of the top failure categories.

That is enough to guide iteration. It tells you whether the system is becoming more useful, more autonomous, and more economical.

The strongest builders do not merely ask whether their agent is impressive. They ask whether it is measurably improving at the work they care about. That shift, from anecdote to instrumentation, is one of the clearest signs that an agent project is growing up.

Model: claude-opus-4-6 (anthropic/claude-opus-4-6) Generated: 2026-04-01 Token Consumption: ~4,200 tokens (output) Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 26 — Designing “Oh-My-Claude-Code”

26.1 Claude Code’s Extension Surface

The Five Official Hooks

As of early 2026, Claude Code exposes exactly five lifecycle hooks — carefully chosen to provide observability and safety without compromising the integrity of the core agent loop:

Hook	Trigger Point	Capabilities
`session_start`	Agent session begins	Inject context, set environment, load preferences
`pre_compact`	Before context compaction	Observe what will be compressed, save critical data
`post_compact`	After context compaction	Verify compaction quality, inject lost context
`post_sampling`	After LLM response received	Observe model output, enforce constraints, log telemetry
`file_changed`	File write/edit detected	Trigger linting, formatting, validation pipelines

These hooks share a common design philosophy: they are observation-oriented. You can see what happened and react to it, but you cannot fundamentally alter the flow of the agent loop itself. There is no hook that fires before the LLM receives a prompt, and no hook that wraps tool execution. This is a deliberate security boundary.

Plugins: The Emerging Expansion Layer

The .claude/plugins/ directory represents Claude Code’s answer to extensibility beyond hooks. A plugin can contribute:

Slash Commands: User-invocable actions (e.g., /review, /deploy, /test) that map to predefined tool invocations or agent delegations. These are the primary user-facing extension point.
Custom Tools: MCP-compatible tool definitions that expand what the agent can do — though these route through MCP rather than registering directly into the tool execution pipeline.
Skills: Structured instruction sets (markdown files with triggers, workflows, and constraints) that teach the agent how to perform complex multi-step tasks. Skills are loaded into context based on pattern matching against user requests.
Hooks: Plugin-scoped lifecycle callbacks that piggyback on the five official hook points listed above.

The plugin model is declarative: you describe what your plugin provides via a manifest, and Claude Code decides when and how to incorporate it. This stands in sharp contrast to OpenCode’s imperative plugin model where plugins directly register functions into the runtime.

Custom Agents: Prompt-Defined Specialization

The .claude/agents/ directory allows defining specialized agents as markdown files. Each agent definition includes:

# Agent Name
System prompt content that defines the agent's personality,
constraints, tool access, and behavioral rules.

## Tools
- List of allowed tools
- MCP servers this agent can access

## Rules
- Behavioral constraints
- Stop conditions

These agents are invocable via the AgentTool — a meta-tool that delegates subtasks to a specified agent definition. This is Claude Code’s multi-agent primitive: not a programmatic orchestration framework, but a prompt-engineering pattern backed by tool delegation.

The key insight is that agent specialization happens entirely at the prompt level. An “Oracle” agent and a “Worker” agent share the same underlying model, the same tool execution pipeline, and the same context management system. They differ only in their system prompts — which tools they’re told they can use, what constraints they operate under, and what behavioral patterns they follow.

MCP Servers: The Universal Tool Bridge

The .claude/mcp.json configuration file declares Model Context Protocol servers that provide tools, resources, and prompts to the agent. MCP is the one extension mechanism shared across virtually every coding agent in the ecosystem — Claude Code, OpenCode, Cursor, Windsurf, and others all speak MCP.

MCP servers run as separate processes, communicating via stdio or HTTP. They can be written in any language and provide:

Tools: Functions the agent can invoke (search, fetch, analyze, deploy)
Resources: Data the agent can read (documentation, configs, databases)
Prompts: Pre-built prompt templates the agent can use

This is the most powerful extension mechanism available today, but it operates at the tool level — you can add new tools, but you cannot modify how existing tools behave or intercept tool execution.

CLAUDE.md and the Memory System

CLAUDE.md files (at project root, user home, or nested in directories) provide persistent context that loads automatically at session start. This is Claude Code’s answer to the “agent memory” problem — not a database or vector store, but plain markdown files that the agent reads and the LLM incorporates into its context window.

The memory system (/memory command) extends this with structured note-taking that persists across sessions. The agent can write to memory, read from memory, and search memory — creating a crude but effective form of long-term knowledge accumulation.

What’s Exposed vs. What’s Locked Down

Exposed (Extensible):

Agent definition and specialization (via prompts)
Slash commands (via plugins)
Tool additions (via MCP)
Memory writes and reads
Post-hoc observation of model outputs and file changes
Context injection at session start and after compaction

Locked Down (Not Extensible):

The core agent loop (plan → act → observe cycle)
Tool execution pipeline (ordering, retry, error handling)
Model selection (cannot assign different models to different agents)
Context assembly (how system prompt, tools, history are composed)
Message routing before it reaches the LLM
Token budget management and compaction strategy
Permission and safety checks

This asymmetry is the defining characteristic of Claude Code’s extension model: rich in what the agent can do (tools, agents, skills), but restrictive about how the agent does it (loop control, pipeline interception, context mutation). The next section examines precisely where this creates gaps relative to what OpenCode’s plugin system enables.

Model: claude-opus-4-6 (anthropic/claude-opus-4-6) Generated: 2026-04-01 Token Consumption: ~4,800 tokens (output) Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 26 — Designing “Oh-My-Claude-Code”

26.2 Gap Analysis: OpenCode’s 8 Hooks vs. Claude Code’s 5

OpenCode’s Full Lifecycle Control

OpenCode’s hook system was designed with a single principle: every meaningful point in the agent lifecycle should be interceptable. The result is 8 hook points that collectively give plugin authors control over the entire flow from user input to tool execution to model response:

config — Fires during initialization. Plugins can inject providers, register custom models, modify configuration values. This is the registry injection point where the agent’s capabilities are defined before any conversation begins.
tool — Tool registration phase. Plugins can add new tools, modify existing tool schemas, or conditionally enable/disable tools based on project context. Unlike MCP (which adds tools from external processes), this registers tools directly into the runtime.
chat.message — Fires when a user message arrives, before it reaches the LLM. Plugins can intercept, transform, reroute, or completely replace the message. This is the most powerful hook in the system — it enables middleware-style processing of every user interaction.
chat.params — Fires after message processing, before the API call. Plugins can adjust temperature, max tokens, system prompt content, tool choice, and other LLM parameters on a per-request basis. This enables dynamic parameter tuning based on message content or conversation state.
event — Lifecycle event bus. Fires on session start, session end, error, compaction, and other structural events. This is the observability layer — equivalent to a combination of Claude Code’s session_start, pre_compact, and post_compact hooks.
tool.execute.before — Fires before a tool call is dispatched. Plugins can inspect the tool name and arguments, modify arguments, add logging, enforce rate limits, or block the call entirely. This is the pre-execution interception point.
tool.execute.after — Fires after a tool call returns. Plugins can inspect and transform the result before it’s sent back to the LLM. This enables result caching, output sanitization, and cross-tool coordination.
experimental.chat.messages.transform — The nuclear option. Fires during context assembly and allows plugins to mutate the entire message array before it’s sent to the model. Plugins can inject synthetic messages, remove irrelevant context, reorder conversation history, or add dynamic system prompt sections.

Claude Code’s Safety-First Design

Claude Code’s 5 hooks tell a different story. They are positioned at observation points rather than interception points:

session_start → observe and inject (no blocking)
pre_compact / post_compact → observe compaction (no mutation of compaction strategy)
post_sampling → observe model output (no pre-sampling equivalent)
file_changed → observe file mutations (reactive, not preventive)

The design philosophy is clear: let extensions see what’s happening, but don’t let them alter the core decision-making pipeline. This protects against malicious or buggy plugins that could silently corrupt the agent’s behavior.

The Critical Gaps

Gap 1: No `chat.message` Equivalent

Claude Code provides no way to intercept a user message before it reaches the LLM. You cannot build middleware that transforms user input, adds context based on message content, or routes different message types to different processing pipelines. In Oh-My-OpenCode, this hook powers the entire delegation system — the orchestrator agent uses chat.message to decide whether to handle a request directly or spawn a sub-agent.

Gap 2: No `tool.execute.before` / `tool.execute.after`

The tool execution pipeline is completely opaque. You cannot wrap tool calls with logging, caching, rate limiting, or argument transformation. This means patterns like “retry failed file reads with different encoding” or “cache grep results for 30 seconds” are impossible at the plugin level.

Gap 3: No Message Transform

Without experimental.chat.messages.transform, there is no way to dynamically inject context into the conversation history. CLAUDE.md provides static context injection, and session_start provides one-time injection, but there is no mechanism for ongoing context mutation based on conversation state. This is the hook that enables Oh-My-OpenCode’s “wisdom accumulation” pattern — continuously enriching the context with lessons learned.

Gap 4: No Runtime Tool Registration

MCP provides tool addition, but not tool modification or conditional registration. You cannot dynamically enable or disable tools based on project type, conversation phase, or agent role — at least not at the programmatic level. (Agent definitions can textually restrict tools, but this is prompt-level, not runtime-level.)

Comparison Matrix

Capability	OpenCode Hook	Claude Code Equivalent	Gap Severity
Config/registry injection	`config`	`.claude/settings.json`	Low — static config works
Tool registration	`tool`	MCP servers	Medium — MCP is external only
Message interception	`chat.message`	None	Critical
Parameter adjustment	`chat.params`	None	High
Lifecycle events	`event`	`session_start`, `pre/post_compact`	Low — partial coverage
Pre-tool interception	`tool.execute.before`	None	Critical
Post-tool interception	`tool.execute.after`	None	High
Context mutation	`chat.messages.transform`	None	Critical
Post-model observation	(via `event`)	`post_sampling`	None — Claude Code wins here
File change reaction	(via `event`)	`file_changed`	None — equivalent

The Implication

Three of the four “Critical” gaps relate to the same architectural decision: Claude Code does not allow interception of the core loop. The message goes in, the model processes it, tools execute, and the response comes out — with observation points along the way but no mutation points in the middle.

This is not a bug. It is a deliberate tradeoff between extensibility and safety. OpenCode’s approach trusts plugin authors to not break the agent; Claude Code’s approach guarantees they cannot. The question for Oh-My-Claude-Code is: how much of OpenCode’s power can we recover using only the primitives Claude Code provides?

The answer, as we’ll see in the next section, is “more than you’d expect” — but it requires a fundamentally different architectural approach. Instead of hooks that intercept the pipeline, we encode orchestration logic into prompts that guide the pipeline.

Model: claude-opus-4-6 (anthropic/claude-opus-4-6) Generated: 2026-04-01 Token Consumption: ~4,900 tokens (output) Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 26 — Designing “Oh-My-Claude-Code”

26.3 Architecture Blueprint: Oh-My-Claude-Code

The Core Insight: Prompts as Programs

OpenCode achieves orchestration through programmatic hooks — TypeScript functions that intercept and mutate the agent pipeline. Claude Code does not expose equivalent hooks. But Claude Code does expose something equally powerful in a different dimension: the ability to define agents whose system prompts encode arbitrarily complex behavioral programs.

This is the foundational principle of Oh-My-Claude-Code (OMO-CC): replace imperative pipeline hooks with declarative prompt-level orchestration. Instead of a chat.message hook that routes messages to sub-agents, we write an orchestrator agent whose system prompt contains explicit delegation rules. Instead of tool.execute.before that wraps tool calls with logging, we write agent prompts that demand the agent log before and after every tool invocation.

This approach has a name in the Oh-My-OpenCode lineage: the Sisyphus Prompt pattern.

The Sisyphus Prompt Pattern

Sisyphus, the mythological figure condemned to roll a boulder uphill for eternity — disciplined, relentless, and methodical.

A Sisyphus Prompt encodes the entire orchestration logic of an agent into its system prompt. This includes:

Delegation rules: When to spawn sub-agents, which agent to choose, what context to pass
Tool selection policy: Which tools to prefer for which tasks, in what order
Stop conditions: When to declare a task complete, how many verification passes to run
Anti-patterns: Explicitly forbidden behaviors (e.g., “NEVER re-search what you delegated to an explore agent”)
State management: How to track progress (todo lists), when to compact, what to preserve across compaction
Communication style: Output format, verbosity level, language preferences

The system prompt for the primary OMO-CC agent (internally called “Sisyphus-Junior”) runs to several thousand tokens. It is not a personality description — it is a program specification written in natural language, compiled by the LLM into behavioral patterns.

Multi-Agent Architecture via `.claude/agents/`

┌─────────────────────────────────────────────────────┐
│                  USER INPUT                          │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│              SISYPHUS-JUNIOR (Primary)               │
│  ┌─────────────────────────────────────────────┐    │
│  │  Sisyphus Prompt:                            │    │
│  │  - Delegation rules                          │    │
│  │  - Todo discipline                           │    │
│  │  - Anti-duplication rules                    │    │
│  │  - Verification requirements                 │    │
│  │  - Termination conditions                    │    │
│  └─────────────────────────────────────────────┘    │
│                                                      │
│  Available Delegations (via AgentTool):              │
│  ┌──────────┐ ┌──────────┐ ┌───────────┐           │
│  │  ORACLE  │ │ EXPLORE  │ │ LIBRARIAN │           │
│  │ (consult)│ │ (search) │ │ (research)│           │
│  └──────────┘ └──────────┘ └───────────┘           │
│  ┌──────────┐ ┌──────────┐ ┌───────────┐           │
│  │  METIS   │ │HEPHAESTUS│ │   MOMUS   │           │
│  │ (plan)   │ │ (build)  │ │ (review)  │           │
│  └──────────┘ └──────────┘ └───────────┘           │
└─────────────────────────────────────────────────────┘
                       │
            ┌──────────┼──────────┐
            ▼          ▼          ▼
     ┌──────────┐ ┌────────┐ ┌────────────┐
     │ MCP      │ │ Built  │ │ Memory     │
     │ Servers  │ │ Tools  │ │ System     │
     │(ast-grep,│ │(read,  │ │(.claude/   │
     │ session, │ │ edit,  │ │  notes/)   │
     │ search)  │ │ bash)  │ │            │
     └──────────┘ └────────┘ └────────────┘

Each sub-agent is a markdown file in .claude/agents/ with a specialized Sisyphus Prompt:

Oracle: Read-only consultant. Cannot edit files. Analyzes code, answers questions, provides recommendations. Its prompt explicitly forbids all write operations and enforces citation of evidence.
Explore: Fast codebase search agent. Restricted to grep, glob, read, and LSP tools. Designed for rapid information gathering with minimal token cost.
Librarian: External reference search. Has access to web search, documentation fetching, and Context7. Finds how other projects solve similar problems.
Metis (named for the Greek Titaness of wisdom and planning): Read-only planning agent. Creates detailed implementation plans without executing them. Its prompt enforces structured output with numbered steps, risk analysis, and dependency ordering.
Hephaestus (named for the Greek god of craftsmanship): The builder. Has full tool access. Executes implementation plans. Its prompt enforces verification after every change (LSP diagnostics, build checks).
Momus (named for the Greek god of criticism): Code reviewer. Read-only. Its prompt encodes review criteria, demands specific evidence for every finding, and forbids vague feedback.

Background Tasks via Slash Commands

Claude Code’s slash commands provide the entry point for background operations. Custom commands defined in .claude/commands/ can:

Spawn an agent in the background (via the run_in_background parameter of AgentTool)
Continue working on unrelated tasks while the background agent operates
Collect results when notified of completion

This approximates Oh-My-OpenCode’s background task system — not through programmatic task spawning, but through prompt-guided agent delegation.

Skill System: CLAUDE.md Sections + MCP Servers

Skills in OMO-CC combine two mechanisms:

CLAUDE.md sections: Structured instruction blocks that describe when and how to perform specific tasks. These load automatically and are pattern-matched by the LLM against incoming requests.
MCP servers: External tools that provide the capabilities referenced by skills. A skill might say “use ast-grep for structural code search” while an MCP server provides the actual ast_grep_search tool.

The skill trigger mechanism is entirely prompt-based — the LLM reads the skill descriptions and decides when to activate them. There is no programmatic pattern matching or hook-based activation.

Wisdom Accumulation

Long-term knowledge accumulates through three layers:

Session memory: The /memory command writes structured notes that persist across sessions
Project notes: .claude/notes/ directory with markdown files organized by topic — architectural decisions, common pitfalls, team conventions
AGENTS.md: Project-level instructions that encode accumulated wisdom about how to work with this specific codebase

Each layer has different scope and persistence. Session memory is personal and ephemeral. Project notes are shared and semi-permanent. AGENTS.md is canonical and versioned.

The Fundamental Tradeoff

This architecture trades precision for resilience. Programmatic hooks execute deterministically — a tool.execute.before hook will always fire before every tool call, without exception. Prompt-based orchestration is probabilistic — the agent usually follows its delegation rules, but might occasionally skip steps or misroute tasks.

The mitigation is redundancy. The Sisyphus Prompt doesn’t just state rules once — it reinforces them through multiple mechanisms: explicit rules, anti-patterns, verification requirements, and termination conditions. The agent is told what to do, what not to do, how to check that it did the right thing, and when to stop. This belt-and-suspenders approach compensates for the inherent non-determinism of prompt-based control.

Model: claude-opus-4-6 (anthropic/claude-opus-4-6) Generated: 2026-04-01 Token Consumption: ~4,400 tokens (output) Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 26 — Designing “Oh-My-Claude-Code”

26.4 Practical Implementation: Four Phases

Phase 1: Custom Agents and Prompt Engineering (Week 1)

The fastest path to value is creating specialized agents. No MCP servers, no hooks, no plugins — just markdown files in .claude/agents/ with carefully crafted system prompts.

The Five Foundation Agents

Agent 1: Oracle (oracle.md)

Role: Read-only consultant and advisor
Allowed tools: Read, Grep, Glob, LSP (all read operations)
Forbidden: Edit, Write, Bash (any mutation)
Prompt pattern: “You are a senior consultant. Analyze the codebase and provide recommendations with specific file:line evidence. Never modify files. If asked to make changes, provide the exact edits needed but do not execute them.”
Use case: Code review, architecture questions, impact analysis

Agent 2: Explore (explore.md)

Role: Fast codebase search specialist
Allowed tools: Grep, Glob, Read (with offset/limit), LSP symbols/references
Prompt pattern: “You are a search specialist. Find relevant code quickly. Use Grep for content, Glob for file patterns, LSP for definitions and references. Return results in a structured format with file paths and line numbers. Minimize token usage — show relevant snippets, not entire files.”
Use case: “Where is X defined?”, “What calls Y?”, “Find all implementations of Z”

Agent 3: Librarian (librarian.md)

Role: External documentation and reference search
Allowed tools: Web search, Context7, WebFetch, Grep (for searching fetched docs)
Prompt pattern: “You are a research librarian. Find authoritative external references — official documentation, well-regarded blog posts, production codebases on GitHub. Always cite sources with URLs. Prioritize official docs over blog posts, blog posts over Stack Overflow.”
Use case: “How does library X handle Y?”, “What’s the recommended pattern for Z?”

Agent 4: Worker / Hephaestus (hephaestus.md)

Role: Task executor with full tool access
Allowed tools: All tools
Prompt pattern: “You are a craftsman. Execute tasks precisely. After every file modification: (1) run LSP diagnostics, (2) verify the build, (3) check that tests pass. Never skip verification. If verification fails, fix the issue before proceeding.”
Use case: Implementing features, fixing bugs, refactoring

Agent 5: Planner / Metis (metis.md)

Role: Read-only planning and analysis
Allowed tools: Read, Grep, Glob, LSP (read-only)
Prompt pattern: “You are a strategic planner. Create detailed implementation plans with numbered steps, estimated effort, risk analysis, and dependency ordering. Never execute the plan — only design it. Output format: markdown with headers for each phase, bullet points for steps, and a risk table.”
Use case: “Plan the refactoring of module X”, “Design the migration from Y to Z”

The Orchestrator Prompt (AGENTS.md)

The primary agent receives an AGENTS.md file that encodes delegation rules:

## Delegation Policy
- Read-only analysis → Oracle
- Codebase search (3+ files) → Explore (background)
- External documentation → Librarian (background)
- Implementation tasks → Hephaestus (after Metis plans)
- Complex planning → Metis first, then Hephaestus

## Anti-Patterns (FORBIDDEN)
- Never search for what you delegated to Explore
- Never implement without verifying
- Never have more than 1 task in_progress

Phase 2: Hooks for Enforcement (Weeks 2-3)

With agents established, we add Claude Code’s available hooks to enforce discipline:

`post_sampling` Hook: Todo Enforcement

// .claude/hooks/post_sampling.js
// After every LLM response, check:
// 1. If response contains task steps, was todowrite called?
// 2. If a todo was in_progress, was it completed before starting another?
// Log violations to .claude/notes/discipline_log.md

This hook cannot prevent the response from being sent (it’s post-hoc), but it can log violations and inject a follow-up message reminding the agent to update its todo list. Over time, the log reveals patterns that inform prompt refinements.

`session_start` Hook: Context Injection

// .claude/hooks/session_start.js
// On every session start:
// 1. Load .claude/notes/active_tasks.md (pending work)
// 2. Load .claude/notes/recent_learnings.md (last 5 sessions)
// 3. Load project-specific conventions from .claude/notes/conventions.md
// Inject as initial context

This approximates the chat.messages.transform hook — we cannot inject context during the conversation, but we can front-load it at session start.

Phase 3: MCP Servers for Enhanced Capabilities (Month 1)

AST-Grep MCP Server

Wraps ast-grep CLI with structured search and replace operations. Enables the agent to perform AST-aware code transformations that are impossible with text-based tools.

Session Management MCP Server

Provides tools for reading, searching, and analyzing previous agent sessions. Enables cross-session learning: “What approach did we try last time for X? Did it work?”

Enhanced LSP MCP Server

Extends built-in LSP with project-wide type analysis, dependency graphs, and impact assessment. “If I change this interface, what breaks?”

Structured Notes MCP Server

Provides CRUD operations on .claude/notes/ with tagging, search, and expiration. Turns the notes directory into a lightweight knowledge base.

Phase 4: Plugin Orchestration Layer (Ongoing)

This phase is speculative — it depends on Claude Code exposing richer hooks in future releases. The goal is to build a plugin that:

Registers composite tools that combine multiple sub-tools into workflows
Implements conditional tool routing based on project context
Provides cross-session state management beyond the memory system
Enables agent-to-agent communication via structured message passing

Each new hook Claude Code exposes becomes a building block. The architecture is designed to be incrementally enhanced — Phase 1 agents work today with zero infrastructure, and each subsequent phase adds capabilities without breaking what came before.

Effort Summary

Phase	Timeline	Dependencies	Key Deliverable
1	1 week	None	5 agents + orchestrator prompt
2	2 weeks	Phase 1	Hook-based enforcement
3	1 month	Phase 1-2	4 MCP servers
4	Ongoing	Claude Code releases	Plugin orchestration

The critical insight is that Phase 1 alone delivers 70% of the value. Well-crafted agent prompts with clear delegation rules produce a functional multi-agent system. Hooks, MCP servers, and plugins are optimization layers — they make the system more reliable, more capable, and more efficient, but the core architecture works with prompts alone.

Model: claude-opus-4-6 (anthropic/claude-opus-4-6) Generated: 2026-04-01 Token Consumption: ~4,300 tokens (output) Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 26 — Designing “Oh-My-Claude-Code”

26.5 What’s Missing from Claude Code’s Extension API

The Six Critical Gaps

Despite the ingenuity of prompt-based orchestration, there are capabilities that simply cannot be replicated without deeper extensibility. These represent genuine architectural limitations — not oversights, but conscious tradeoffs that constrain what Oh-My-Claude-Code can achieve.

Gap 1: `chat.message` Hook — Message Interception Before the LLM

What it would enable: Intercepting every user message before it reaches the model. Transforming input, routing to different processing pipelines, adding dynamic context, or short-circuiting entirely for cached responses.

Why it matters for OMO-CC: The orchestrator’s delegation logic currently lives in the system prompt — the LLM must read the rules and decide to follow them. A chat.message hook would allow programmatic routing: “If the message matches pattern X, delegate to agent Y” — deterministically, with zero token cost for the decision.

Current workaround: Encode routing logic in the Sisyphus Prompt. Works ~95% of the time but wastes tokens on routing decisions and occasionally misroutes complex requests.

Token cost of the gap: Estimated 200-500 tokens per interaction for the LLM to evaluate delegation rules. Over a 100-message session, this adds 20,000-50,000 tokens of pure overhead.

Gap 2: `tool.execute.before` / `tool.execute.after` — Tool Pipeline Wrapping

What it would enable: Wrapping every tool call with pre-execution validation and post-execution transformation. Logging all tool calls with timing data. Caching expensive tool results (grep on large codebases). Rate-limiting file writes. Retrying failed operations with different parameters.

Why it matters for OMO-CC: Tool execution is the agent’s primary interaction with the world. Without pipeline hooks, we cannot enforce policies like “never write to node_modules/”, “cache grep results for 30 seconds”, or “log all bash commands to an audit trail” at the system level.

Current workaround: Agent prompts include rules like “always verify before writing” and “log important operations.” These are advisory — the LLM usually follows them, but there is no guarantee.

What falls through the cracks: Timing data (no way to measure tool execution duration), caching (the agent re-runs identical searches), and audit completeness (the log depends on the LLM remembering to log).

Gap 3: Plugin-Level Tool Registration Beyond MCP

What it would enable: Registering tools directly into the agent’s tool list without running a separate MCP server process. Dynamically enabling/disabling tools based on context. Composing existing tools into higher-level operations.

Why it matters for OMO-CC: MCP servers are heavyweight — each one is a separate process with its own lifecycle. For simple tool compositions (“run grep, then read the matched files, then summarize”), a direct tool registration API would be far more efficient.

Current workaround: MCP servers for everything, even trivial tool wrappers. This works but adds process management overhead and cold-start latency.

Gap 4: Agent-Level Model Selection

What it would enable: Assigning different LLM models to different agents. A fast, cheap model (e.g., Haiku) for the Explore agent that just searches code. A powerful reasoning model (e.g., Opus) for the Planner. A balanced model (e.g., Sonnet) for the Worker.

Why it matters for OMO-CC: Cost optimization. An Explore agent does not need Opus-level reasoning — it needs fast pattern matching and structured output. Running all agents on the same model means either overpaying for simple tasks or underperforming on complex ones.

Current workaround: None. All agents use whatever model the user has configured for Claude Code. This is the one gap with no prompt-level mitigation.

Estimated cost impact: Using Haiku for Explore and Librarian agents would reduce token costs by approximately 60-80% for those delegations, translating to roughly 30-40% overall cost reduction for a typical multi-agent session.

Gap 5: Background Agent Spawning with Session Continuation

What it would enable: Spawning an agent in the background, letting it run asynchronously, and then continuing its session later with full context preserved. True long-running tasks that survive session boundaries.

Why it matters for OMO-CC: Background agents currently lose their context when the spawning session ends. You can start a background Explore agent, but if the main session compacts or ends before it returns, the results may be lost or require re-injection.

Current workaround: The session management MCP server (Phase 3) provides partial mitigation by persisting session history to files. But this is a bolted-on solution, not native support.

Gap 6: Programmatic Context Injection into System Prompt Assembly

What it would enable: Dynamically adding or removing sections from the system prompt based on conversation state. “After the user mentions testing, inject the testing guidelines.” “After 50 messages, remove the onboarding section to save tokens.”

Why it matters for OMO-CC: The system prompt is static — what’s loaded at session start stays for the entire session (unless compacted away). Dynamic context would allow the agent to evolve its behavioral rules as the conversation progresses.

Current workaround: session_start hook for one-time injection. CLAUDE.md for static context. No mechanism for mid-session mutation.

The Case for a Richer Plugin SDK

These six gaps share a common theme: Claude Code’s extension model is designed for safety, not for power. Every locked-down surface prevents a class of misuse — but also prevents a class of innovation.

The argument for a richer Plugin SDK is not that Claude Code should match OpenCode hook-for-hook. Rather, it is that Claude Code should provide safe versions of the missing capabilities:

Gap	Safe Alternative
Message interception	Structured routing rules (declarative, not imperative)
Tool pipeline wrapping	Tool middleware with sandboxed execution
Direct tool registration	Plugin-scoped tool definitions with permission model
Model selection	Cost-tier hints (“use cheapest model that supports X”)
Session continuation	Persistent agent state API with encryption
Dynamic context	Context lifecycle rules (add-after, remove-after, TTL)

Each of these preserves the security invariant — extensions cannot silently corrupt the agent — while providing the programmatic control that prompt-based orchestration cannot fully replicate. The question is not whether Claude Code will evolve in this direction, but how quickly the open-source ecosystem can demonstrate the value that drives prioritization.

Model: claude-opus-4-6 (anthropic/claude-opus-4-6) Generated: 2026-04-01 Token Consumption: ~4,500 tokens (output) Book: Claude Code VS OpenCode: Architecture, Design and The Road Ahead Chapter: 26 — Designing “Oh-My-Claude-Code”

26.6 Convergence Forecast: Where the Ecosystem is Heading

MCP as the Common Denominator

If there is one universal truth in the coding agent ecosystem of 2025-2026, it is this: everyone speaks MCP. Model Context Protocol has achieved what few standards manage — adoption across competing products before the standard itself is fully stabilized.

Claude Code, OpenCode, Cursor, Windsurf, Cline, Continue, Zed — every serious coding agent either supports MCP natively or has it on their near-term roadmap. This creates a shared foundation that makes tool interoperability a solved problem at the transport layer.

But MCP is a tool protocol, not an agent protocol. It defines how an agent calls a function and gets a result. It does not define how agents discover each other, how they negotiate capabilities, how they delegate tasks, or how they share context. MCP is necessary infrastructure, but it is not sufficient architecture.

The convergence forecast begins with MCP as the floor, then maps the three dimensions where the ecosystem is rapidly converging.

Dimension 1: Hook Systems are Expanding

Claude Code launched with zero hooks. Then came session_start. Then the compaction pair. Then post_sampling and file_changed. The trajectory is unmistakable: each release adds new lifecycle interception points.

The pattern follows a predictable safety-first progression:

Observation hooks first (post-hoc, read-only) — post_sampling, file_changed
Injection hooks second (additive, non-blocking) — session_start, post_compact
Interception hooks last (blocking, transformative) — not yet released, but likely

OpenCode began at stage 3 from day one — full interception hooks were a design requirement, not an evolution. This means OpenCode (and Oh-My-OpenCode) serve as the proving ground for what interception hooks should look like. When Claude Code eventually adds pre_sampling or tool.execute.before, the design will likely be informed by what the open-source ecosystem has demonstrated is valuable and what is dangerous.

Prediction: Claude Code will reach approximate hook parity with OpenCode within 12-18 months, but with stronger sandboxing and permission models around each hook.

Dimension 2: Agent Definition Formats are Converging

Consider the parallel evolution:

Timeline	Claude Code	OpenCode/OMO
Early 2025	`CLAUDE.md` (project instructions)	`AGENTS.md` (project instructions)
Mid 2025	`.claude/agents/` (agent definitions)	Plugin agents (programmatic)
Late 2025	Skills (structured instruction sets)	Skills (hook-based activation)
2026	Plugin manifest (declarative capabilities)	Provider plugins (declarative capabilities)

The naming differs. The file formats differ. But the concepts are converging toward a common set of primitives:

Agent identity: Name, role, capabilities, constraints
Tool access: Which tools this agent can use
Behavioral rules: What the agent should and shouldn’t do
Activation triggers: When to invoke this agent
Context requirements: What information this agent needs

These primitives exist in both ecosystems, expressed in different syntaxes. The gap between a .claude/agents/oracle.md file and an OpenCode agent plugin is syntactic, not semantic.

Prediction: A standardized agent definition format will emerge within 18-24 months — likely as an extension to MCP, or as a companion spec. The format will be declarative (like Claude Code’s markdown agents) with optional programmatic extensions (like OpenCode’s hook-based agents).

Dimension 3: The Cross-Pollination Cycle

Innovation in the coding agent space follows a characteristic cycle:

Open-source prototype → Commercial adoption → Standardization → New open-source prototype

We have already witnessed multiple complete cycles:

Cycle 1: Project Instructions

CLAUDE.md (Claude Code, early 2025) — project-level persistent context
AGENTS.md (OpenCode/Codex, mid 2025) — same concept, different format
.cursorrules, .windsurfrules (Cursor, Windsurf) — same concept, product-specific
Cross-pollination: all products now support some form of project instructions

Cycle 2: MCP

Anthropic publishes MCP spec (late 2024)
Open-source implementations proliferate (early 2025)
Competitor adoption (mid 2025)
Ecosystem standardization (late 2025 onward)

Cycle 3: Multi-Agent (In Progress)

Oh-My-OpenCode demonstrates prompt-based multi-agent orchestration (late 2025)
Claude Code adds AgentTool and .claude/agents/ (early 2026)
Cursor/Windsurf experiment with background agents (2026)
Standardized agent delegation protocol (predicted: 2027)

Each cycle compresses the timeline. Cycle 1 took approximately 8 months from innovation to cross-pollination. Cycle 2 took about 6 months. Cycle 3 appears to be converging in 4-5 months. The ecosystem is learning to absorb innovations faster.

The Missing Piece: Agent-to-Agent Protocol (A2A)

MCP defines how agents use tools. Agent definition formats define what agents are. The missing piece is how agents communicate with each other.

Google’s Agent-to-Agent (A2A) protocol, announced in 2025, attempts to fill this gap. A2A defines:

Agent discovery: How agents find and negotiate with each other
Task delegation: How one agent assigns work to another
Context sharing: How agents share relevant information without sharing everything
Result reporting: How agents communicate outcomes back to delegators

Today, OMO-CC implements agent delegation via the AgentTool — a Claude Code primitive that spawns sub-agents within the same session. This is single-process, single-model delegation. A2A envisions cross-process, cross-model, even cross-organization delegation.

Prediction: The eventual standard will combine:

MCP for tool invocation (already universal)
Agent definition format for agent identity and capabilities (converging)
A2A-like protocol for agent-to-agent communication (emerging)
Hook lifecycle spec for agent pipeline extensibility (nascent)

The Two-Year Horizon

By early 2028, we predict:

Unified extensibility standard: A single spec (or tightly coupled set of specs) that covers tools, agents, hooks, and communication. Claude Code and OpenCode will both support it, along with the commercial competitors.
Model-agnostic agent definitions: Agents defined once, executable by any model. The agent definition specifies capabilities and constraints; the runtime maps these to the available model’s strengths.
Layered security model: A permission system for hooks and plugins that allows powerful extensibility for trusted plugins while maintaining safety for untrusted ones. Think: npm’s --ignore-scripts but for agent hooks.
Cross-agent memory: A shared knowledge layer that agents can read from and write to, with access control and provenance tracking. Not just per-session or per-project, but per-team and per-organization.

Oh-My-Claude-Code is a waypoint on this journey — a practical demonstration that powerful multi-agent orchestration is achievable today, within the constraints of today’s extension surfaces, and a signpost pointing toward the richer infrastructure that tomorrow’s agents will require.

The boulder rolls uphill. Sisyphus pushes. And every push reveals a little more of what the summit looks like.

Model: openai/gpt-5.4 Token Usage (estimated): ~7,200 tokens Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design & The Road Ahead

Appendix A: Tool Inventory Across OpenCode, OMO, and Claude Code

This appendix consolidates the practical tool surface of the three systems. OpenCode provides the open-source baseline tool set; Oh-My-OpenCode (OMO) inherits that baseline and adds orchestration, session, background-task, skill, and richer search capabilities; Claude Code exposes the broadest commercial tool surface, especially around tasks, teams, workflows, browser interaction, MCP management, and UI-integrated operations. “Yes” means first-class availability; “Conditional” means feature-flag, mode, or environment gated; “Inherited” means OMO gets the capability through OpenCode and then extends it.

Tool Name	OpenCode	OMO	Claude Code	Description
bash	Yes	Inherited	Yes	Run shell commands in the workspace.
powershell	No	No	Conditional	Windows-native shell execution.
read / file read	Yes	Inherited	Yes	Read files or directory listings safely.
write / file write	Yes	Inherited	Yes	Create or overwrite files directly.
edit / file edit	Yes	Inherited or overridden	Yes	Patch existing files with structured edits.
multiedit	Yes	Inherited	No separate public tool	Apply several edits to one file in one call.
apply_patch	Yes	Inherited	No separate public tool	Unified patch-style file modification.
notebook edit	No	No	Yes	Jupyter/notebook-aware cell editing.
ls / list directory	Yes	Inherited	Folded into read/glob	Lightweight directory listing.
external-directory	Yes	Inherited	No direct equivalent	Extend access outside the default project root.
glob	Yes	Enhanced	Yes	Filename/path pattern search.
grep	Yes	Enhanced	Yes	Regex or textual content search.
codesearch	Yes	Inherited	ToolSearch partly analogous	Search code with stronger code-oriented semantics.
ast-grep search	No	Yes	No first-class tool	AST-aware structural code search.
ast-grep replace	No	Yes	No first-class tool	AST-aware structural rewrite.
lsp	Conditional	Split into multiple tools	Conditional	Language-server queries and edits.
lsp_goto_definition	No	Yes	Folded into LSPTool	Jump to symbol definition.
lsp_find_references	No	Yes	Folded into LSPTool	Find all references of a symbol.
lsp_symbols	No	Yes	Folded into LSPTool	Outline/document/workspace symbol lookup.
lsp_diagnostics	No	Yes	Folded into LSPTool	Collect errors, warnings, hints.
lsp_prepare_rename	No	Yes	Folded into LSPTool	Validate whether rename is legal.
lsp_rename	No	Yes	Folded into LSPTool	Cross-workspace symbol rename.
webfetch	Yes	Inherited	Yes	Fetch and summarize a URL.
websearch	Yes	Inherited + provider-aware	Yes	Search the web and return ranked results.
web browser	No	No	Conditional	Interactive browser navigation and extraction.
look_at	No	Yes	Partial via file/browser tooling	Quick image/PDF/media inspection.
question / ask user	Conditional	Inherited	Yes	Ask the human for clarification or approval.
skill	Yes	Upgraded	Yes	Load a skill/slash-command style capability pack.
skill_mcp	No	Yes	No direct equivalent	Invoke MCP resources bundled inside a skill.
slashcommand	No	Yes	Native command system, different surface	Execute discovered slash commands from tool space.
task	Yes	Replaced by delegate-task semantics	No exact equivalent	Spawn a subagent/task in OpenCode-style flow.
call_omo_agent	No	Yes	AgentTool analogous	Spawn specialized OMO agents.
background_output	No	Yes	TaskOutputTool analogous	Fetch output from a background agent/task.
background_cancel	No	Yes	TaskStopTool analogous	Cancel one or more background jobs.
task_create	No	Conditional in OMO	Yes	Create a structured background task.
task_get	No	Conditional in OMO	Yes	Inspect one task’s metadata/status.
task_list	No	Conditional in OMO	Yes	List active or historical tasks.
task_update	No	Conditional in OMO	Yes	Update task state or metadata.
task_output	No	Via background_output	Yes	Retrieve task transcript/output.
task_stop	No	Via background_cancel	Yes	Stop a running task.
todo write	Yes	Inherited	Yes	Maintain structured todo lists for long tasks.
todo read	Internal/limited	Inherited	Task list UI instead	Read stored todo state.
plan enter	Conditional	Inherited	Yes	Switch into planning mode before edits.
plan exit	Conditional	Inherited	Yes	Exit planning mode and resume execution.
verify plan execution	No	No	Conditional	Check whether execution matched an approved plan.
agent tool	No separate public tool	call_omo_agent	Yes	Launch another agent or teammate.
team create	No	No	Conditional	Create a team/swarm of collaborating agents.
team delete	No	No	Conditional	Remove a running team/swarm.
send message	No	No	Conditional	Send messages between peers/teammates.
list peers	No	No	Conditional	Discover peer agents in swarm mode.
workflow	No	No	Conditional	Trigger reusable workflow scripts.
config tool	No	No	Conditional	Inspect or mutate settings from inside the agent.
MCP tool invocation	Via MCP registry	Via MCP + skill MCP	Yes	Call MCP server tools using model-facing schemas.
list MCP resources	No	No	Yes	Enumerate resources exposed by MCP servers.
read MCP resource	No	No	Yes	Read a specific MCP resource URI.
MCP auth	No	No	Yes	Handle MCP auth and connector approval flows.
REPL	No	No	Conditional	Interactive in-process programming shell.
interactive_bash / terminal capture	No	Yes	Conditional	Persistent terminal/tmux or terminal snapshot tooling.
snip / brief	No	No	Yes	Condense prior context or trim transcript slices.
sleep / cron / monitor	No	No	Conditional	Time-based automation and monitoring hooks.
push notification	No	No	Conditional	Trigger push/user-facing notifications.
subscribe PR	No	No	Conditional	Watch pull-request updates.
tungsten / review artifact	No	No	Conditional	Specialized internal or artifact-review flows.

Category Notes

1. File operations

OpenCode centers on a minimal but strong file core: read, write, edit, multiedit, and apply_patch. OMO inherits that layer and optionally overrides edit with a hashline-based editor in experimental mode. Claude Code exposes a comparable but more productized set: FileReadTool, FileWriteTool, FileEditTool, and NotebookEditTool, plus UI affordances such as diff views and rejection messages.

2. Search and retrieval

All three systems treat search as essential. OpenCode provides glob, grep, codesearch, webfetch, and websearch. OMO goes further by splitting LSP into granular operations and adding AST-based structural search/replace plus session search. Claude Code offers GlobTool, GrepTool, WebFetchTool, WebSearchTool, MCP resource browsing, and in some builds a browser tool. The design trend is clear: raw grep is no longer enough; agents need syntactic, semantic, and remote search layers.

3. Execution and composition

OpenCode’s task and skill show the open-source baseline for compositional execution. OMO turns composition into orchestration: call_omo_agent, delegated background tasks, slash commands, skill-embedded MCPs, and persistent tmux-backed sessions. Claude Code generalizes the same idea into task lifecycle tools, team tools, workflow tools, and agent-to-agent messaging. In other words, OMO adds multi-agent depth; Claude Code adds product depth.

4. Interactive and long-running work

OpenCode is mostly request/response oriented. OMO explicitly embraces long-running, visual, and persistent flows through interactive_bash, background_output, background_cancel, and session-manager tools. Claude Code reaches a similar destination through tasks, REPL, terminal capture, cron-like automation, and UI hooks. This matters because real software engineering is often not a single command but a monitored process.

5. LSP and structure-aware development

OpenCode has an experimental monolithic LSP tool. OMO decomposes LSP into atomic operations that are easier for an agent to combine safely. Claude Code also supports LSP, but through a consolidated tool class backed by a large service layer. This reflects an important design choice: broad tools are simpler for product UX; narrower tools are often easier for agent reliability.

6. Special tools and platform identity

OMO’s distinctive tools are the ones that expose orchestration as a first-class primitive: agent spawning, background-output retrieval, skill MCP routing, tmux control, session history, and AST search. Claude Code’s distinctive tools are those that expose a commercial runtime: teams, workflows, browser automation, MCP auth, notifications, and extensive task APIs. OpenCode remains the cleanest baseline and therefore the best reference point for understanding what is fundamental versus what is platform scaffolding.

Model: openai/gpt-5.4 Token Usage (estimated): ~4,100 tokens Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design & The Road Ahead

Appendix B: Configuration Reference

This appendix summarizes the most decision-relevant configuration surfaces. It is not a line-by-line dump of every schema field; instead, it highlights the settings that materially change model behavior, tool availability, orchestration, security, or developer experience.

OpenCode

OpenCode merges configuration from remote well-known config, global config, custom config, project config, .opencode/ directories, inline content, and enterprise-managed overrides. That precedence model is one of its most important architectural traits.

Setting	Default	Meaning
`model`	none	Primary model in `provider/model` form.
`small_model`	none	Cheaper helper model for lightweight work.
`default_agent`	none	Default agent/mode selection.
`username`	current OS username	Injected user identity for prompts/UI.
`plugin`	`[]`	Plugin specifiers; later entries override earlier duplicates.
`instructions`	none	Extra instruction files or glob patterns.
`permission`	none	Tool permission rules such as allow/ask/deny.
`provider`	none	Provider auth, endpoints, and model overrides.
`mcp`	none	MCP server configuration and enable/disable state.
`formatter`	none	Formatter command, env, and file-extension mapping.
`lsp`	none	LSP server definitions; can be `false` or per-server objects.
`compaction.auto`	`true`	Auto-compact when context is near full.
`compaction.prune`	`true`	Prune older tool output during compaction.
`compaction.reserved`	none	Token reserve buffer before compaction.
`share`	none / migrated from `autoshare`	Session sharing policy.
`experimental.batch_tool`	`false` unless enabled	Expose the batch tool.
`experimental.primary_tools`	none	Restrict some tools to primary agents only.
`experimental.continue_loop_on_deny`	`false`	Continue loop after a denied tool call.
`experimental.mcp_timeout`	none	MCP request timeout in milliseconds.

Oh-My-OpenCode

OMO adds a second configuration layer on top of OpenCode. The key idea is selective augmentation: disable built-ins, override agents, wire skills, and tune background orchestration without forking the host runtime.

Setting	Default	Meaning
`new_task_system_enabled`	`false`	Opt into OMO’s newer task system.
`default_run_agent`	none	Default agent for `oh-my-opencode run`.
`disabled_mcps`	none	Remove selected MCP servers from exposure.
`disabled_agents`	none	Disable built-in OMO agents.
`disabled_skills`	none	Disable bundled skills.
`disabled_hooks`	none	Disable specific hook modules.
`disabled_commands`	none	Remove built-in commands.
`disabled_tools`	none	Hide tool names from the model.
`agents`	none	Per-agent overrides, including model selection and permissions.
`categories`	none	Category definitions for delegation and specialization.
`claude_code`	none	Compatibility-layer settings for Claude Code artifacts.
`experimental.task_system`	`false`	Expose task_create/task_get/task_list/task_update.
`experimental.hashline_edit`	`false`	Replace normal edit behavior with hashline edit.
`auto_update`	optional	Plugin self-update behavior.
`skills`	none	Skill registry, sources, and enablement.
`ralph_loop.enabled`	`false`	Enable iterative Ralph Loop execution.
`ralph_loop.default_max_iterations`	`100`	Default cap for loop iterations.
`background_task.staleTimeoutMs`	`180000`	Interrupt tasks with no activity for 3 minutes.
`background_task.messageStalenessTimeoutMs`	`600000`	Timeout if a task never reports progress.
`notification.force_enable`	`false`	Force notification hook even if external notifier exists.
`git_master.commit_footer`	`true`	Add commit footer metadata.
`git_master.include_co_authored_by`	`true`	Include co-author footer lines.
`browser_automation_engine.provider`	`playwright`	Choose browser automation backend.
`websearch.provider`	exa-like default behavior	Choose Exa or Tavily web search provider.
`tmux.enabled`	`false`	Enable tmux visual multi-agent mode.
`tmux.layout`	`main-vertical`	Main pane left, agents stacked right.
`sisyphus.claude_code_compat`	`false`	Enable Claude Code compatible behavior.
`babysitting.timeout_ms`	`120000`	Default babysitting timeout.

Claude Code

Claude Code distinguishes between global user settings, project settings, managed settings, and project-local .mcp.json. Its configuration system is broader than OpenCode’s because it covers not only model/runtime behavior but also onboarding, notifications, IDE integration, browser features, tasks, and marketplace/plugin flows.

Setting	Default	Meaning
`theme`	`dark`	UI theme.
`preferredNotifChannel`	`auto`	Choose notification channel automatically.
`verbose`	`false`	Enable more verbose output/logging behavior.
`editorMode`	`normal`	Input/editor keybinding mode.
`autoCompactEnabled`	`true`	Auto-compaction of long contexts.
`showTurnDuration`	`true`	Show elapsed time after a turn.
`diffTool`	`auto`	Diff rendering strategy.
`todoFeatureEnabled`	`true`	Enable todo/task UX.
`showExpandedTodos`	`false`	Expand todo lists by default.
`messageIdleNotifThresholdMs`	`60000`	Idle threshold before notifying.
`autoConnectIde`	`false`	Auto-connect to IDE when possible.
`autoInstallIdeExtension`	`true`	Auto-install IDE extensions.
`fileCheckpointingEnabled`	`true`	Enable file checkpoints/snapshots.
`terminalProgressBarEnabled`	`true`	Show terminal progress UI.
`respectGitignore`	`true`	Hide/avoid ignored files in relevant operations.
`copyFullResponse`	`false`	Copy full responses instead of shortened variants.
`projects[*].allowedTools`	`[]`	Per-project allowlist of tools.
`projects[*].mcpServers`	`{}`	Per-project MCP server map.
`projects[*].hasTrustDialogAccepted`	`false`	Project trust state.
`projects[*].activeWorktreeSession`	none	Active worktree-mode session metadata.

Practical reading guide

OpenCode’s config is the cleanest if your goal is architectural clarity. OMO’s config is the most expressive if your goal is orchestration and policy tuning. Claude Code’s config is the most product-complete because it must coordinate CLI UX, UI state, tasks, MCP auth, notifications, IDEs, and managed enterprise policy.

For system designers, the lesson is simple: the more “agent platform” features you add, the more your configuration surface stops being mere preferences and becomes an operational control plane.

Model: openai/gpt-5.4 Token Usage (estimated): ~3,600 tokens Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design & The Road Ahead

Appendix C: Source Code Path Mapping

The table below maps the same architectural concept to its primary implementation locations in each codebase. Paths are intentionally selective: they point readers to the best starting file, not every related dependency.

Concept	OpenCode	OMO	Claude Code
Entry point	`packages/opencode/src/index.ts`	`src/index.ts`	`src/main.tsx`, `src/entrypoints/cli.tsx`
Core loop	`packages/opencode/src/session/processor.ts`, `session/llm.ts`	Inherits OpenCode host loop; plugin interception in `src/plugin/chat-message.ts`, `src/plugin/messages-transform.ts`	`src/QueryEngine.ts`, `src/query.ts`
Tool registry	`packages/opencode/src/tool/registry.ts`	`src/plugin/tool-registry.ts`, `src/create-tools.ts`	`src/tools.ts`, `src/Tool.ts`
Tool implementations	`packages/opencode/src/tool/*`	`src/tools/*`	`src/tools/*`
Agents	`packages/opencode/src/agent/*`	`src/agents/*`, especially `builtin-agents.ts`, `dynamic-agent-prompt-builder.ts`	`src/tools/AgentTool/`, `src/coordinator/`, `src/buddy/*`
Sessions	`packages/opencode/src/session/*`	`src/features/claude-code-session-state/`, `src/features/background-agent/`, `src/features/boulder-state/*`	`src/utils/sessionStorage.ts`, `src/utils/sessionState.ts`, `src/tasks/*`
Providers / model layer	`packages/opencode/src/provider/*`	Mostly inherited from OpenCode; agent-level selection in `src/agents/*` and config schemas	`src/utils/model/*`, `src/services/tokenEstimation.ts`
MCP	`packages/opencode/src/mcp/*`	`src/mcp/`, `src/features/skill-mcp-manager/`, Claude-compatible loaders in `src/features/claude-code-mcp-loader/*`	`src/services/mcp/*`
Permissions	`packages/opencode/src/permission/*`	Inherits host permissions plus OMO policy/config gates in `src/config/schema/internal/permission.ts`	`src/utils/permissions/*`, classifier logic in `yoloClassifier.ts`, `bashClassifier.ts`
Plugins / extension surface	`packages/opencode/src/plugin/*`	Plugin package root `src/index.ts`; compatibility loaders in `src/features/claude-code-plugin-loader/*`	`src/services/plugins/`, bundled plugins in `src/plugins/`
Hooks	OpenCode plugin hook points wired through `packages/opencode/src/plugin/index.ts`	Concrete hook pipelines in `src/plugin/tool-execute-before.ts`, `tool-execute-after.ts`, `messages-transform.ts`, `event.ts` and `src/hooks/*`	Runtime/UI hooks in `src/hooks/*`; settings hook schemas in `src/entrypoints/sdk/coreSchemas.ts`
UI / TUI	`packages/opencode/src/cli/` and TUI under `cli/cmd/tui/`	Mostly host UI; tmux visualization in `src/features/tmux-subagent/*`	`src/components/`, `src/ink/`, `src/screens/*`
Config	`packages/opencode/src/config/config.ts`, `paths.ts`, `tui-schema.ts`	`src/plugin-config.ts`, `src/config/schema/*`	`src/utils/config.ts`, `src/utils/settings/`, `src/services/settingsSync/`
Commands	`packages/opencode/src/command/*`	`src/features/builtin-commands/`, `src/tools/slashcommand/`	`src/commands/*`, `src/commands.ts`
Skills	`packages/opencode/src/skill/*`	`src/features/builtin-skills/`, `src/features/opencode-skill-loader/`, `src/tools/skill/*`	`src/skills/`, `src/tools/SkillTool/`

Reading strategy

If you are studying architecture rather than contributing code immediately, start with the following sequence:

OpenCode: session/processor.ts → tool/registry.ts → config/config.ts.
OMO: src/index.ts → src/plugin/tool-registry.ts → src/agents/builtin-agents.ts → src/config/schema/oh-my-opencode-config.ts.
Claude Code: src/tools.ts → src/QueryEngine.ts → src/utils/config.ts → src/services/mcp/*.

This order works because it follows the control plane outward: first the loop, then the capabilities, then the policy/config layer. In CS textbooks this is close to moving from “execution engine” to “I/O interfaces” to “system administration surface.” The names are newer, but the layering logic is classical.

Model: openai/gpt-5.4 Token Usage (estimated): ~3,100 tokens Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design & The Road Ahead

Appendix D: MCP Ecosystem Overview

Model Context Protocol (MCP) has become the common extensibility substrate for agent systems. It standardizes how an agent discovers tools, resources, prompts, and authentication flows exposed by external servers.

Transport types

Transport	How it works	Strengths	Typical use
`stdio`	Agent spawns a local process and talks over stdin/stdout	Simple, local, secure-by-default, easy packaging	Local developer tools, databases, filesystem helpers
`SSE`	Server-Sent Events stream from a long-lived HTTP connection	Good for streaming events over web infra	Hosted MCP services with event streams
`HTTP`	Request/response over ordinary HTTP	Familiar ops model, proxy-friendly	SaaS-hosted MCP endpoints and gateways
`WebSocket`	Full-duplex persistent socket	Bidirectional low-latency interaction	Realtime collaboration, browser/remote control

In systems terms, transport is not the protocol itself; it is the carrier. A CS textbook may discuss this under the distinction between an application protocol and a transport channel.

Popular community servers

Server family	What it exposes	Why it matters
Postgres / SQL	Query execution, schema introspection, migrations, row reads	Lets agents inspect and operate on real application data safely and explicitly.
GitHub	PRs, issues, checks, comments, repo metadata	Essential for coding agents that operate in pull-request workflows.
Slack	Channels, messages, notifications, retrieval	Connects coding work to team communication loops.
Browser / Playwright	Navigation, DOM inspection, screenshots, form interaction	Turns agents into web testers and UI operators.
Filesystem / shell wrappers	Controlled local execution and file access	Still the most common MCP entry point for bespoke automation.
Cloud APIs	Storage, deployment, observability, tickets	Moves agents from “code helper” toward “ops participant.”

Why MCP spread so quickly

MCP succeeded because it solves a boring but foundational problem: capability interoperability. Before MCP, every framework invented its own tool schema, auth model, and lifecycle semantics. That created what operating-systems courses would call a fragmentation problem: many interfaces, little portability. MCP reduces that integration tax.

Official registry

The ecosystem is increasingly organized around the official registry:

Registry UI: https://registry.modelcontextprotocol.io/
Registry docs: https://modelcontextprotocol.io/registry
Registry API docs: https://registry.modelcontextprotocol.io/docs

The registry plays the role that a package index plays in older software ecosystems: it provides discoverability, metadata, version identity, and eventually trust signals. This is why it matters even when you can still install servers directly from GitHub.

Relationship to the three systems

OpenCode treats MCP as a first-class extension mechanism in its config and runtime.
OMO goes further by embedding MCP into skills and combining it with orchestration logic.
Claude Code invests heavily in MCP operations, auth, validation, registry integration, and project/user scope management.

The broad design lesson is that MCP is becoming the “device driver layer” of agent systems. That analogy is not exact, but it is useful: the model does not need to know every external system natively if the runtime can normalize external capabilities into a common protocol.

Model: openai/gpt-5.4 Token Usage (estimated): ~2,700 tokens Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design & The Road Ahead

Appendix E: References

The following references were used as conceptual anchors for this book. They include official vendor engineering posts, protocol documentation, industry analysis, and academic papers.

Anthropic. “Building Effective AI Agents.” Published Dec 19, 2024.
URL: https://www.anthropic.com/research/building-effective-agents
Anthropic. “Effective context engineering for AI agents.” Published Sep 29, 2025.
URL: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Anthropic. “Beyond permission prompts: making Claude Code more secure and autonomous.” Published Oct 20, 2025.
URL: https://www.anthropic.com/engineering/claude-code-sandboxing
Anthropic. “How we built our multi-agent research system.” Published Jun 13, 2025.
URL: https://www.anthropic.com/engineering/built-multi-agent-research-system
Anthropic. “Writing effective tools for agents — with agents.” Published Sep 11, 2025.
URL: https://www.anthropic.com/engineering/writing-tools-for-agents
Model Context Protocol. Official Documentation.
URL: https://modelcontextprotocol.io/introduction
Model Context Protocol. Official Registry.
URL: https://registry.modelcontextprotocol.io/
Zylos Research. “AI Coding Agents 2025-2026: State of the Art.” Published Jan 9, 2026.
URL: https://zylos.ai/research/2026-01-09-ai-coding-agents
Zylos Research. “AI Agent Plugin and Extension Architecture.” Published Feb 21, 2026.
URL: https://zylos.ai/research/2026-02-21-ai-agent-plugin-extension-architecture
GuruSup. “Agent Orchestration Patterns: Swarm vs Mesh vs Hierarchical.” Published Mar 14, 2026.
URL: https://gurusup.com/en/blog/agent-orchestration-patterns
GuruSup. “Agent Communication Protocols: MCP vs A2A and Why They Matter.” Published Mar 16, 2026.
URL: https://gurusup.com/blog/agent-communication-protocols-mcp-a2a
Kang, Minki et al. “Acon: Optimizing Context Compression for Long-horizon LLM Agents.” arXiv preprint.
URL: https://arxiv.org/abs/2510.00615
Morph Team. “We Tested 15 AI Coding Agents (2026). Only 3 Changed How We Ship.” Published Mar 1, 2026.
URL: https://morphllm.com/ai-coding-agent
Yao, Shunyu et al. “ReAct: Synergizing Reasoning and Acting in Language Models.” 2022.
URL: https://arxiv.org/abs/2210.03629
Jimenez, Carlos E. et al. “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” 2023.
URL: https://arxiv.org/abs/2310.06770

Short note on citation scope

Several references above are vendor or industry essays rather than peer-reviewed papers. That is intentional. Coding-agent architecture is advancing faster than traditional textbook publication cycles, so engineering blogs often function as primary-source design documents. When a term or mechanism is not yet standardized in classical CS literature, this book treats such sources as contemporary system documentation rather than timeless theory.

Model: openai/gpt-5.4 Token Usage (estimated): ~5,300 tokens Generated: 2026-04-01 Book: Claude Code VS OpenCode: Architecture, Design & The Road Ahead

Appendix F: Glossary

Term	Definition
LLM	Large Language Model. A neural model trained on large text corpora and used here as the reasoning core of an agent.
Token	A model’s basic unit of text processing. Tokens are not exactly words; they are subword chunks used for billing, limits, and context accounting.
Context Window	The maximum amount of tokenized input a model can consider in one inference pass. It is analogous to a bounded working memory.
Function Calling	A structured mechanism that lets a model emit a machine-readable tool invocation instead of plain text.
ReAct	“Reason + Act.” A prompting and control-loop pattern where the model alternates between thinking, tool use, and observation.
Agent	A runtime that combines an LLM, tools, memory/context, and loop control so it can pursue a goal rather than emit one isolated answer.
MCP	Model Context Protocol. A protocol for exposing tools, prompts, resources, and auth to models through a common interface.
ACP	Agent Client Protocol. OpenCode’s protocol for client/runtime interaction across interfaces.
A2A	Agent-to-Agent communication. A broader category for protocols where agents coordinate with other agents rather than only with tools.
LSP	Language Server Protocol. A standardized protocol for editor-language tooling such as diagnostics, references, rename, and definitions.
AST	Abstract Syntax Tree. A tree representation of parsed code structure used for syntax-aware analysis and rewriting.
SSE	Server-Sent Events. A unidirectional streaming transport over HTTP, often used for incremental updates.
Monorepo	A repository containing multiple packages or applications under one versioned root.
TUI	Text User Interface. A terminal-native interface richer than plain command-line output.
Poka-yoke	A Japanese manufacturing term meaning “mistake-proofing.” In agents, it refers to interface or workflow design that prevents easy failures.
ACI	Agent-Computer Interface. By analogy to HCI, it means designing tools and outputs for machine users rather than human users.
JSONL	JSON Lines. A format where each line is a separate JSON object, useful for logs and append-only event streams.
Zod	A TypeScript-first schema validation library widely used to define tool arguments and config schemas.
JSON-RPC	A remote procedure call protocol using JSON messages, common in MCP and related tooling ecosystems.
Extended Thinking	A mode where the model is allowed more internal reasoning budget before responding or calling tools.
Scaffolding	The non-model runtime structure around an LLM: prompt assembly, tools, memory, retries, policies, compaction, and orchestration.
Context Rot	Informal term for quality degradation when an agent accumulates too much stale, noisy, or weakly relevant context. Classical CS textbooks do not name this phenomenon, so the term needs explicit explanation.
Compaction	The process of shrinking conversation/context state while trying to preserve task-relevant information.
Hook	A lifecycle interception point where custom code can run before or after a system event.
Plugin	An externally loaded extension package that adds capabilities without modifying host source code directly.
Skill	A packaged unit of instructions, workflows, and sometimes tooling for a specialized task domain.
YOLO Classifier	In Claude Code terminology, a classifier that predicts whether a tool action can be auto-approved under a more autonomous mode. The name is informal and not a standard CS term.
bubblewrap / Seatbelt	Sandbox technologies. Bubblewrap is common on Linux; Seatbelt is Apple’s sandbox mechanism on macOS.
Ink	A React-based framework for building TUIs in the terminal. Claude Code uses a custom Ink-heavy UI architecture.
Drizzle ORM	A TypeScript ORM and SQL toolkit often used for typed persistence layers.
Vercel AI SDK	A TypeScript SDK for model/provider abstraction and streaming AI application development.
Hono	A lightweight web framework for JavaScript/TypeScript runtimes such as Bun and Cloudflare.
Solid.js	A reactive JavaScript UI framework emphasizing fine-grained reactivity.
Tauri	A desktop-app framework that pairs a Rust shell with web frontend technologies.

Supplemental explanations for non-classical terms

Context Rot

“Context rot” is not a textbook term like cache invalidation or deadlock, but it names a real systems problem: as context grows, retrieval precision falls, irrelevant observations stay resident, and the model’s effective attention budget is diluted. In practice, it behaves like a memory pollution problem.

Scaffolding

Recent agent engineering repeatedly shows that scaffolding matters as much as, and sometimes more than, the base model. In traditional systems language, scaffolding is the control architecture around the compute engine.

Poka-yoke for agents

In manufacturing, poka-yoke means designing the process so common human mistakes become difficult or impossible. For agents, the same idea applies to tools: constrained parameters, safe defaults, staged approvals, and atomic operations reduce catastrophic behavior.

ACI versus HCI

Human-Computer Interaction optimizes interfaces for people. Agent-Computer Interface optimizes interfaces for models. A good ACI tool usually has explicit schemas, low ambiguity, bounded outputs, and failure modes the model can recover from.

Extended Thinking

This term usually refers not to true transparency into internal cognition, but to a product/runtime setting that allocates more reasoning budget, more intermediate steps, or a more deliberate execution strategy before answer generation.

Keyboard shortcuts

Claude Code VS OpenCode: Architecture, Design & The Road Ahead