Deep Research Report

Claude Code Architecture

A radically simple agentic shell around a powerful model — how Anthropic's coding agent achieves production-grade reliability with deliberate architectural minimalism.

April 2026 · Prepared for Auditoria.ai R&D · 18 sources cited

~163K lines of TypeScript · 92% prompt cache reuse · 45+ built-in tools · ~90% self-written code
Section 01

High-Level Architecture Overview

Claude Code embodies a counterintuitive thesis: a deliberately simple agent loop paired with a capable model outperforms complex multi-agent orchestration. Head engineer Boris Cherny describes the philosophy as "the product is the model" [1] — meaning the thin TypeScript shell exists only to give the Claude model access to tools, files, and the terminal. Every new model release lets the team delete scaffolding code rather than add it. [6]

Figure 1 — High-Level Component Architecture
The user surface (terminal CLI, VS Code / JetBrains, claude.ai/code, desktop app) sits atop the core engine (QueryEngine agentic loop, tool registry, permissions, system prompt builder, compaction, skills/memory), which connects to external systems: the Anthropic API (Sonnet / Opus), MCP servers (stdio / SSE / HTTP), the filesystem and git, and the OS sandbox (Seatbelt / bubblewrap). The identical agentic loop runs across all surfaces: terminal, IDE, web, CI, and Slack.

The architecture is surface-agnostic — the identical agentic loop runs in a terminal, IDE extension, web browser, or CI pipeline. Distribution spans npm, Homebrew, WinGet, standalone binaries with auto-update, and an Agent SDK for programmatic embedding.

Section 02

Technical Stack & Runtime

Technology choices are strategic: the entire codebase is TypeScript in strict mode running on Bun (not Node.js). Bun was chosen for native JSX/TSX support without transpilation, fast bundling, and dead-code elimination. [6] The team chose a stack that Claude models excel at — enabling ~90% of the codebase to be self-authored. [7]

Layer | Technology | Purpose
Runtime | Bun | Fast startup, native TSX, bundling
Language | TypeScript (strict) | Type safety, model-friendly authoring
Terminal UI | React 18 + Ink + Yoga | Flexbox-based terminal rendering
Rendering | Custom reconciler | Double buffering, blitting, interned pools
CLI Parsing | Commander.js | Arguments and subcommands
Validation | Zod | All tool I/O validated at runtime
API Client | @anthropic-ai/sdk | Anthropic Messages API communication

Source Structure

Directory | Contents | Scale
src/QueryEngine.ts | Core LLM session coordinator | ~46K lines
src/tools/ | Tool implementations | 45+ tools
src/commands/ | Slash commands | 100+
src/components/ | React terminal UI components | ~146
src/ink/ | Custom terminal rendering framework | ~90 files
src/hooks/ | React hooks | 85+
src/coordinator/ | Multi-agent orchestration |
src/services/ | API, analytics, MCP, compaction |

Startup Optimization: Three critical operations fire in parallel before any module imports: macOS MDM policy reads, OAuth/API key keychain lookups, and TCP+TLS preconnection to the API endpoint. The init() function orchestrates 16 setup stages.
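The parallel-startup pattern can be sketched as follows; the function names and bodies are illustrative stand-ins for what the text describes, not Claude Code's actual internals:

```typescript
// Illustrative sketch: fire all three slow operations before importing
// the rest of the CLI, then await them once the imports are in flight.
async function readMdmPolicy(): Promise<string | null> {
  return null; // stand-in for a macOS managed-preferences (MDM) read
}

async function lookupCredentials(): Promise<string | null> {
  // stand-in for an OAuth/API-key keychain lookup
  return process.env["ANTHROPIC_API_KEY"] ?? null;
}

async function preconnect(host: string): Promise<void> {
  // stand-in for opening a TCP+TLS connection to `host` so the first
  // real API call skips the handshake
}

async function startup(): Promise<{ policy: string | null; hasKey: boolean }> {
  // Kick all three off in parallel before any heavy module import...
  const pending = Promise.all([
    readMdmPolicy(),
    lookupCredentials(),
    preconnect("api.anthropic.com"),
  ]);
  // ...load the remaining modules here while the I/O runs...
  const [policy, apiKey] = await pending;
  return { policy, hasKey: apiKey !== null };
}
```

The payoff is that keychain, policy, and network latency overlap with module loading instead of adding to it.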

Section 03

The Agentic Loop

The core is a while(true) state machine: accumulate messages, call the Claude API with streaming, execute returned tool calls, append results, and loop until a text-only response. [11] This simplicity is the point — the model does virtually all reasoning.
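A minimal sketch of that state machine, with simplified stand-ins for the real Messages API types:

```typescript
// Simplified types standing in for the Anthropic Messages API shapes.
type ToolCall = { name: string; input: unknown };
type Turn = { text: string; toolCalls: ToolCall[] };

// The while(true) loop described above: call the model, run any tool
// calls it returns, append the results, and repeat until the model
// answers with text only.
async function agentLoop(
  callModel: (messages: string[]) => Promise<Turn>,
  runTool: (call: ToolCall) => Promise<string>,
  userInput: string
): Promise<string> {
  const messages: string[] = [userInput];
  while (true) {
    const turn = await callModel(messages);
    messages.push(turn.text);
    if (turn.toolCalls.length === 0) return turn.text; // text-only response → done
    for (const call of turn.toolCalls) {
      messages.push(await runTool(call)); // append tool results, then loop
    }
  }
}
```

All reasoning lives in `callModel`; the harness only shuttles messages and tool results.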

Figure 2 — Core Agentic Loop Execution Flow
User input triggers a parallel prefetch of memory and skills. If context usage exceeds ~83%, auto-compact runs first (~85% compression). The system prompt is then built and a streaming API call (SSE) goes to Claude Sonnet / Opus. A text-only response is returned to the user; a response with tool calls routes read-only tools (Glob, Grep, Read) through a parallel executor (≤10 at once) and write operations (Edit, Bash, Write) through a serial one, appends the tool results, and loops.

Tool Concurrency Strategy

Each tool declares concurrency safety via isConcurrencySafe(). Read-only tools run up to 10 in parallel; write tools run serially. A streaming executor begins tool execution while the model is still generating. [4] When the output token budget nears exhaustion, an invisible continuation message triggers up to 3 recovery attempts. [5]
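The split described above can be sketched as a simple scheduler; `isConcurrencySafe()` is named in the text, while the batching logic here is an assumed simplification:

```typescript
type Tool = { name: string; isConcurrencySafe: () => boolean };
type PendingCall = { tool: Tool; run: () => Promise<string> };

// Partition a batch of tool calls: concurrency-safe (read-only) tools run
// in parallel waves of at most `limit`, everything else runs serially.
async function runBatch(calls: PendingCall[], limit = 10): Promise<string[]> {
  const safe = calls.filter((c) => c.tool.isConcurrencySafe());
  const unsafe = calls.filter((c) => !c.tool.isConcurrencySafe());
  const results: string[] = [];
  for (let i = 0; i < safe.length; i += limit) {
    // Parallel wave of read-only tools (Glob, Grep, Read, ...).
    results.push(...(await Promise.all(safe.slice(i, i + limit).map((c) => c.run()))));
  }
  for (const c of unsafe) {
    // Write tools (Edit, Bash, Write, ...) run strictly one at a time.
    results.push(await c.run());
  }
  return results;
}
```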

Section 04

Tool System & Registry

The 45+ built-in tools conform to a unified TypeScript interface with Zod schema validation, permission checking, and four-tier rendering. Approximately 18 tools are deferred — hidden until the model discovers them via ToolSearchTool, keeping the prompt lean.
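A hedged sketch of what such a unified interface might look like; the real tools validate with Zod, which is replaced here by a dependency-free stand-in, and the field names are illustrative:

```typescript
// Illustrative tool contract: every tool validates input, declares
// concurrency safety, and can be deferred from the initial prompt.
interface ToolDef<I> {
  name: string;
  validate: (raw: unknown) => I;    // Zod .parse() in the real system
  isConcurrencySafe: () => boolean; // drives parallel vs serial execution
  isDeferred: boolean;              // hidden until found via ToolSearchTool
  run: (input: I) => Promise<string>;
}

const readTool: ToolDef<{ path: string }> = {
  name: "Read",
  validate: (raw) => {
    const r = raw as { path?: unknown };
    if (typeof r?.path !== "string") throw new Error("Read: 'path' must be a string");
    return { path: r.path };
  },
  isConcurrencySafe: () => true, // read-only → safe to parallelize
  isDeferred: false,             // always loaded
  run: async ({ path }) => `contents of ${path}`, // placeholder body
};
```

Model-supplied input is untrusted, so every call is validated at runtime before the tool body runs.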

Figure 3 — Tool Taxonomy & Deferred Discovery
Always loaded (~27 tools): Read, Write, Edit, Grep, Glob, LS, Bash (120s timeout), TodoWrite/Read, WebSearch, WebFetch, Task, SendMessage, MultiEdit, AskUser, NotebookRead, and more, alphabetically sorted for cache stability. Deferred (~18 tools): NotebookEdit, specialized MCP tools, and advanced search and analysis tools, discovered via ToolSearchTool (regex + BM25 semantic matching). Every call passes through the tool registry: Zod validation → permission check → four-tier rendering (use, progress, result, error).
Section 05

Context Management & Compaction

One of the most distinctive decisions: complete rejection of codebase indexing. Boris Cherny confirmed: early versions used RAG with a local vector database, but agentic search outperformed it "by a lot." [2] This eliminated vector databases, embedding pipelines, and index staleness entirely.

Figure 4 — Agentic Search Hierarchy (Ascending Token Cost)
Ascending token cost, from cheapest to most expensive: Glob (~0 tokens, path patterns), Grep via ripgrep (low cost, matching lines), Read (500–5K tokens, full file content), and the Explore sub-agent running Haiku, which returns summaries rather than raw content.

Three-Layer Compaction

Figure 5 — Context Window Compaction Pipeline
Context window: 200K tokens (standard) / 1M (Max/Enterprise). Layer 1, Microcompact (~65%): clear older tool outputs, preserve the "hot tail". Layer 2, Auto-Compact (~83.5%): structured summary with ~85% compression. Layer 3, /compact: manual trigger with optional focus hints. After compaction, context is rebuilt: re-read the 5 most recently accessed files, preserve task lists and plan state, reload CLAUDE.md from disk (it survives compaction), and inject a continuation message.

After each tool call, Claude receives a token usage warning enabling self-aware context management. Extended thinking tokens are automatically stripped from subsequent turns to conserve space.
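The layered thresholds from Figure 5 can be expressed as a small decision function; the exact trigger logic inside Claude Code is not public, so this only illustrates the published percentages:

```typescript
// Published figures: 200K standard window, microcompact around 65% usage,
// auto-compact around 83.5%. The decision function itself is an assumption.
const WINDOW = 200_000;
const MICRO_THRESHOLD = 0.65;  // clear older tool outputs, keep the "hot tail"
const AUTO_THRESHOLD = 0.835;  // structured summary, ~85% compression

type CompactionAction = "none" | "microcompact" | "auto-compact";

function compactionAction(tokensUsed: number): CompactionAction {
  const ratio = tokensUsed / WINDOW;
  if (ratio >= AUTO_THRESHOLD) return "auto-compact";
  if (ratio >= MICRO_THRESHOLD) return "microcompact";
  return "none";
}
```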

Section 06

System Prompt Assembly

The system prompt isn't monolithic — it's built at runtime from modular section-builder functions, split into two segments for prompt caching optimization.

Figure 6 — Dynamic System Prompt Assembly
Static prefix (globally cached): agent identity and personality, permission mode rules, hooks config, code style rules, security instructions, tool preferences, and output efficiency guidelines, identical across turns for maximum cache hits. Then the cache boundary. Dynamic suffix (session-specific): agent tools with full JSON schemas, skills definitions, CLAUDE.md content from disk, model overrides, environment (CWD, OS, git), MCP server instructions, and context budget, ordered for prefix overlap but changing per session. Final prompt: ~20,000+ tokens.

Key behavioral instructions: professional objectivity, anti-over-engineering, and git safety rules (never force-push, never amend without an explicit request). The three-phase model instructs Claude to blend gathering context, taking action, and verifying results rather than treating them as rigid stages.
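The two-segment assembly can be sketched as plain string concatenation over section-builder functions; the builder API shown is an assumption, with section contents abbreviated:

```typescript
// Each section is a small builder function; the prompt is two joined
// segments so the static half stays byte-identical across turns.
type SectionBuilder = () => string;

function buildSystemPrompt(
  staticSections: SectionBuilder[],
  dynamicSections: SectionBuilder[]
): { prefix: string; suffix: string; prompt: string } {
  const prefix = staticSections.map((b) => b()).join("\n\n");  // cache-stable
  const suffix = dynamicSections.map((b) => b()).join("\n\n"); // per-session
  return { prefix, suffix, prompt: prefix + "\n\n" + suffix };
}

// Abbreviated example sections (real contents run to ~20K tokens).
const { prompt } = buildSystemPrompt(
  [() => "You are Claude Code...", () => "Git safety: never force-push."],
  [() => `cwd: ${process.cwd()}`, () => "CLAUDE.md: (project conventions)"]
);
```

Keeping the static half first means every cached byte of the prefix survives environment and project changes.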

Section 07

MCP Integration

Claude Code acts as an MCP client connecting to external servers over three transports. [10] MCP tools use mcp__<server>__<tool> naming and can be configured at three scopes: user, project, and local. [3]

Figure 7 — MCP Client Architecture
Claude Code's MCP client speaks three transports: stdio (local processes), SSE (streaming remote), and HTTP (cloud servers), connecting to servers such as GitHub, Slack, databases, and custom tools. ToolSearch cut tool-definition overhead from 77K to 8.7K tokens (an 85% reduction): only names load at session start, with full definitions fetched on demand.

Capabilities extend beyond tools: resources via @ mentions, elicitation for structured mid-task input, channels for push messaging, and OAuth 2.0 authentication. Claude Code itself serves as an MCP server via claude mcp serve.
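The mcp__<server>__<tool> naming convention is documented; the helper functions below are merely an illustrative encoding of it:

```typescript
// Compose a namespaced MCP tool name from server and tool identifiers.
function mcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Split a namespaced name back apart; returns null for non-MCP tool names.
function parseMcpToolName(name: string): { server: string; tool: string } | null {
  if (!name.startsWith("mcp__")) return null;
  const rest = name.slice("mcp__".length);
  const sep = rest.indexOf("__");
  if (sep < 0) return null;
  return { server: rest.slice(0, sep), tool: rest.slice(sep + 2) };
}
```

The namespace prefix lets the permission system and tool registry treat external MCP tools distinctly from built-ins.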

Section 08

Security Architecture

Security operates on a "fail-closed" principle: unmatched operations require manual approval. Rules evaluate in deny → ask → allow order, first match wins.
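The evaluation order can be sketched directly; the rule shape here is a simplified assumption, but the deny → ask → allow ordering and the fail-closed default follow the text:

```typescript
type Verdict = "deny" | "ask" | "allow";
type Rule = { verdict: Verdict; pattern: RegExp };

// Evaluate rules tier by tier: all deny rules, then ask, then allow.
// The first matching rule wins; anything unmatched falls through to "ask".
function evaluate(action: string, rules: Rule[]): Verdict {
  for (const tier of ["deny", "ask", "allow"] as const) {
    for (const rule of rules) {
      if (rule.verdict === tier && rule.pattern.test(action)) return tier;
    }
  }
  return "ask"; // fail-closed: unmatched operations require manual approval
}
```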

Figure 8 — Layered Security Model
A requested tool action first hits the permission check: a deny rule blocks it, an allow rule lets it through, and an unmatched action (fail-closed) asks the user. Approved actions execute inside the OS sandbox: filesystem isolation (writes limited to the working directory), network isolation via a Unix socket proxy, and child-process inheritance. Auto Mode (March 2026) adds a two-stage classifier: a single-token filter (8.5% false positive rate) followed by chain-of-thought review (0.4%). The classifier is reasoning-blind: it sees only user messages and tool commands, never the agent's reasoning.

OS sandboxing uses macOS Seatbelt or Linux bubblewrap, reducing permission prompts by 84%. [3] The Auto Mode classifier is critically reasoning-blind. [12] Over 20 default block rules prevent force-pushing, mass-deleting cloud storage, and crossing trust boundaries.

Section 09

Configuration Hierarchy

Figure 9 — Configuration Cascade (Highest → Lowest Priority)
From highest to lowest priority: managed enterprise MDM (cannot be overridden) → CLI arguments (--flag values) → local settings.local.json (gitignored) → project settings.json (version controlled) → user ~/.claude/settings (global defaults). CLAUDE.md files load from /etc → ~/.claude → the project tree → rules/; best practice is to keep each under 200 lines. Auto memory and dreams: a MEMORY.md index (200-line limit, 25KB loaded), with periodic "dream" passes for deduplication and pruning.
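The cascade can be sketched as a key-by-key merge in which higher-priority layers override lower ones; the layer names follow Figure 9, while the flat-object merge is a simplification of the real nested settings format:

```typescript
type Settings = Record<string, unknown>;

// Spread lowest priority first so each later (higher-priority) layer
// overrides the previous one; managed (enterprise MDM) always wins.
function resolveSettings(layers: {
  user?: Settings;     // ~/.claude/settings — global defaults
  project?: Settings;  // settings.json — version controlled
  local?: Settings;    // settings.local.json — gitignored
  cliArgs?: Settings;  // --flag values
  managed?: Settings;  // enterprise MDM — cannot be overridden
}): Settings {
  return {
    ...layers.user,
    ...layers.project,
    ...layers.local,
    ...layers.cliArgs,
    ...layers.managed,
  };
}
```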

Hooks provide deterministic lifecycle control at 11 event points. Plugins package skills, sub-agents, hooks, and MCP servers into distributable units with a marketplace ecosystem.

Section 10

Performance & Caching

97.8% peak cache reuse · 81% cost reduction · 5–10× TTFT improvement · ~2.5× Fast Mode speedup

The most impactful optimization is aggressive prompt caching. [9] Real API traces show 92% prefix reuse, with peak execution at 97.83%. This yields 81% cost reduction ($6.00 → $1.15 for a 2M-token session) and 5–10× time-to-first-token improvement. [4]

Figure 10 — Prompt Cache Architecture Across Turns
Turn N sends the static prefix (identity, tools, rules), the dynamic suffix (env, CLAUDE.md), and the conversation history, cached with a 5-minute or 1-hour TTL. On turn N+1, both prefix and suffix hit the cache; only the history plus new tool results are fresh. Cache-friendly invariants: alphabetical tool sort, the static/dynamic split, microcompaction that preserves the prefix, and warm-up calls.

Additional strategies: SSE streaming, parallel read-tools (up to 10), worktree isolation for concurrent sessions, background agents (Ctrl+B), and Fast Mode (/fast) for ~2.5× Opus 4.6 throughput via priority serving.
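The static/dynamic split maps naturally onto the Messages API's cache_control breakpoints. The request builder below is a sketch: `cache_control: { type: "ephemeral" }` is the documented API field, but the surrounding shape and model id are illustrative, and Claude Code's actual request construction is not public:

```typescript
// Build a Messages API request with two cache breakpoints: one after the
// globally reusable static prefix, one after the session-stable suffix.
function buildRequest(staticPrefix: string, dynamicSuffix: string, history: object[]) {
  return {
    model: "claude-sonnet-4-5", // illustrative model id
    max_tokens: 8192,
    system: [
      // Breakpoint 1: identical across all sessions → globally cacheable.
      { type: "text", text: staticPrefix, cache_control: { type: "ephemeral" } },
      // Breakpoint 2: stable within one session → cached from turn 2 onward.
      { type: "text", text: dynamicSuffix, cache_control: { type: "ephemeral" } },
    ],
    messages: history,
  };
}
```

Because cache hits require a byte-identical prefix, invariants like alphabetical tool ordering exist precisely to keep these segments stable.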

Section 11

Comparative Landscape

The table below summarizes key architectural trade-offs across four tools. Each makes different bets about where complexity should live; none is strictly superior across all dimensions. Entries are based on publicly available documentation and may not reflect internal changes since publication.

Dimension | Claude Code | Cursor | Aider | GitHub Copilot Agent
Context Retrieval | Agentic search via grep/glob tools; no indexing step. [2] | RAG with vector indexing; also supports @-mentions for manual context. [15] | PageRank over tree-sitter AST graph; ranks files by reference density. [14] | Hybrid: repository index + embeddings; integrated with GitHub's search infrastructure. [16]
Agent Topology | Single main loop + depth-1 sub-agents (Task tool). Sub-agents use cheaper models for read-only work. [4] | Background agents can run up to 8 parallel sessions in cloud sandboxes. [15] | Single-agent, single-turn architecture. Model-agnostic: works with any LLM provider. [14] | Runs as ephemeral GitHub Actions workflows; multiple agents can work on separate issues concurrently. [16]
Edit Strategy | Direct string replacement (Edit/Write tools). Simple but requires exact match. | Two-stage: frontier model generates diff, cheaper model applies it. Optimizes cost per edit. | Multiple edit formats (whole-file, diff, search-replace) selected per model capability. | Inline suggestions + autonomous agent editing in cloud environments.
Extensibility | MCP servers, hooks (11 lifecycle events), plugins, skills, Agent SDK. [3] | Rules files, MCP servers, custom docs, @-mention context providers. | Fully open-source (Apache 2.0); model-agnostic; community conventions. | Extensions API, MCP servers, GitHub Actions, custom agent instructions. [17]
Primary Surface | Terminal-first; same loop runs in IDE, web, CI, desktop. | IDE-first (VS Code fork); optimized for inline editing UX. | Terminal-only; designed for pair programming over CLI. | Cloud-first; triggered from GitHub Issues/PRs; also in VS Code and CLI.

Limitations of this Comparison

This report analyzes Claude Code in depth but relies on public documentation for competitors. Cursor's internal architecture is largely undocumented beyond marketing materials. [15] GitHub Copilot's agent infrastructure leverages proprietary GitHub Actions internals not fully described in docs. [16] Aider is the most transparent due to being fully open-source. [14] A fair head-to-head comparison would require equivalent reverse-engineering depth for each tool.

Architectural thesis: Claude Code bets that less scaffolding around a more capable model beats more scaffolding around a less capable one. As models improve, the architecture's thinness becomes an advantage — less code to conflict with new capabilities. Whether this "radical simplicity" holds as agent tasks grow more complex remains the central open question.

References


  1. [Primary] Boris Cherny, STATION F Talk: "The Model is the Product" — Describes Claude Code's origins, product overhang thesis, and the philosophy that the model is the product. March 2025.
  2. [Primary] Boris Cherny (@bcherny), X post on RAG vs. agentic search — Direct confirmation that early Claude Code used RAG + local vector DB but agentic search proved better, simpler, and more reliable.
  3. [Docs] Anthropic, Claude Code Documentation — Official documentation covering installation, configuration, MCP integration, permissions, hooks, and usage.
  4. [Docs] Anthropic Engineering, Claude Code Best Practices — Patterns proven by internal and external engineers. Describes Claude Code as "intentionally low-level and unopinionated, providing close to raw model access."
  5. [Docs] Anthropic Engineering, Effective Harnesses for Long-Running Agents — Managing agents across multiple context windows; the initializer agent pattern and state recovery approaches.
  6. [Primary] Boris Cherny and Cat Wu, Latent Space Podcast: "Claude Code: Anthropic's Agent in Your Terminal" — Founding engineers discuss the tool as "not a product as much as it's a Unix utility," the Bun runtime choice, and architecture decisions. May 2025.
  7. [Primary] Gergely Orosz, The Pragmatic Engineer: "Building Claude Code with Boris Cherny" — Interview covering evolution from side project, parallel agents, PR structure, and codebase context retrieval. March 2025.
  8. [Primary] Boris Cherny, Lenny's Newsletter: "What Happens After Coding Is Solved" — Discusses Claude Code reaching 4% of public GitHub commits, product principles, and the future of software engineering. Feb 2026.
  9. [Docs] Anthropic, Prompt Caching Announcement — Up to 90% cost reduction and 85% latency reduction for long prompts. August 2024.
  10. [Docs] Model Context Protocol Specification (2025-11-25) — The authoritative MCP specification defining transports, tool discovery, resources, and authentication.
  11. [Docs] Anthropic Research, Building Effective Agents — Five workflow patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer) forming the conceptual foundation. December 2024.
  12. [Docs] Anthropic Engineering, Claude Code Auto Mode — Two-stage classifier architecture for safer autonomous operation; reasoning-blind design.
  13. [Docs] Anthropic Engineering, Building Agents with the Claude Agent SDK — The agent harness underlying Claude Code, available as a standalone SDK.
  14. [External] Paul Gauthier, Aider: Building a Better Repository Map with Tree Sitter — Tree-sitter + NetworkX PageRank approach to repository mapping. October 2023. Also: official docs.
  15. [External] Cursor, Cursor Documentation — Official feature documentation. Architecture details are limited; see also ByteByteGo analysis for infrastructure-level discussion.
  16. [External] GitHub, About GitHub Copilot Coding Agent — Official docs on the autonomous agent running in ephemeral GitHub Actions environments.
  17. [External] GitHub, About Custom Agents — Documentation on creating specialized custom agents for different tasks.
  18. [Docs] npm, @anthropic-ai/claude-code — Official npm package. Also: GitHub repository.