Deep Research Report

Claude Code Architecture

A radically simple agentic shell around a powerful model — how Anthropic's coding agent achieves production-grade reliability with deliberate architectural minimalism.

April 2026 · Prepared for Auditoria.ai R&D · 18 sources cited

~163K lines of TypeScript · 92% prompt cache reuse · 45+ built-in tools · ~90% self-written code
Section 01

High-Level Architecture Overview

Claude Code embodies a counterintuitive thesis: a deliberately simple agent loop paired with a capable model outperforms complex multi-agent orchestration. Head engineer Boris Cherny describes the philosophy as "the product is the model" [1] — meaning the thin TypeScript shell exists only to give the Claude model access to tools, files, and the terminal. Every new model release lets the team delete scaffolding code rather than add it. [6]

Figure 1 — High-Level Component Architecture
The user surface (terminal CLI, VS Code / JetBrains, claude.ai/code, desktop app) sits atop the core engine (QueryEngine agentic loop, tool registry, permissions, system prompt builder, compaction, skills/memory), which connects to external systems: the Anthropic API (Sonnet / Opus), MCP servers (stdio / SSE / HTTP), the filesystem and git, and the OS sandbox (Seatbelt / bubblewrap). The identical agentic loop runs across all surfaces: terminal, IDE, web, CI, and Slack.

The architecture is surface-agnostic — the identical agentic loop runs in a terminal, IDE extension, web browser, or CI pipeline. Distribution spans npm, Homebrew, WinGet, standalone binaries with auto-update, and an Agent SDK for programmatic embedding.

Section 02

Technical Stack & Runtime

Technology choices are strategic: the entire codebase is TypeScript in strict mode running on Bun (not Node.js). Bun was chosen for native JSX/TSX support without transpilation, fast bundling, and dead-code elimination. [6] The team chose a stack that Claude models excel at — enabling ~90% of the codebase to be self-authored. [7]

Layer | Technology | Purpose
Runtime | Bun | Fast startup, native TSX, bundling
Language | TypeScript (strict) | Type safety, model-friendly authoring
Terminal UI | React 18 + Ink + Yoga | Flexbox-based terminal rendering
Rendering | Custom reconciler | Double buffering, blitting, interned pools
CLI Parsing | Commander.js | Arguments and subcommands
Validation | Zod | All tool I/O validated at runtime
API Client | @anthropic-ai/sdk | Anthropic Messages API communication

Source Structure

Directory | Contents | Scale
src/QueryEngine.ts | Core LLM session coordinator | ~46K lines
src/tools/ | Tool implementations | 45+ tools
src/commands/ | Slash commands | 100+
src/components/ | React terminal UI components | ~146
src/ink/ | Custom terminal rendering framework | ~90 files
src/hooks/ | React hooks | 85+
src/coordinator/ | Multi-agent orchestration |
src/services/ | API, analytics, MCP, compaction |

Startup Optimization: Three critical operations fire in parallel before any module imports: macOS MDM policy reads, OAuth/API key keychain lookups, and TCP+TLS preconnection to the API endpoint. The init() function orchestrates 16 setup stages.
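The parallel-startup pattern can be sketched as follows; the function names and bodies are illustrative stand-ins for what the text describes, not Claude Code's actual internals:

```typescript
// Illustrative sketch: fire all three slow operations before importing
// the rest of the CLI, then await them once the imports are in flight.
async function readMdmPolicy(): Promise<string | null> {
  return null; // stand-in for a macOS managed-preferences (MDM) read
}

async function lookupCredentials(): Promise<string | null> {
  // stand-in for an OAuth/API-key keychain lookup
  return process.env["ANTHROPIC_API_KEY"] ?? null;
}

async function preconnect(host: string): Promise<void> {
  // stand-in for opening a TCP+TLS connection to `host` so the first
  // real API call skips the handshake
}

async function startup(): Promise<{ policy: string | null; hasKey: boolean }> {
  // Kick all three off in parallel before any heavy module import...
  const pending = Promise.all([
    readMdmPolicy(),
    lookupCredentials(),
    preconnect("api.anthropic.com"),
  ]);
  // ...load the remaining modules here while the I/O runs...
  const [policy, apiKey] = await pending;
  return { policy, hasKey: apiKey !== null };
}
```

The payoff is that keychain, policy, and network latency overlap with module loading instead of adding to it.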

Section 03

The Agentic Loop

The core is a while(true) state machine: accumulate messages, call the Claude API with streaming, execute returned tool calls, append results, and loop until a text-only response. [11] This simplicity is the point — the model does virtually all reasoning.
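A minimal sketch of that state machine, with simplified stand-ins for the real Messages API types:

```typescript
// Simplified types standing in for the Anthropic Messages API shapes.
type ToolCall = { name: string; input: unknown };
type Turn = { text: string; toolCalls: ToolCall[] };

// The while(true) loop described above: call the model, run any tool
// calls it returns, append the results, and repeat until the model
// answers with text only.
async function agentLoop(
  callModel: (messages: string[]) => Promise<Turn>,
  runTool: (call: ToolCall) => Promise<string>,
  userInput: string
): Promise<string> {
  const messages: string[] = [userInput];
  while (true) {
    const turn = await callModel(messages);
    messages.push(turn.text);
    if (turn.toolCalls.length === 0) return turn.text; // text-only response → done
    for (const call of turn.toolCalls) {
      messages.push(await runTool(call)); // append tool results, then loop
    }
  }
}
```

All reasoning lives in `callModel`; the harness only shuttles messages and tool results.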

Figure 2 — Core Agentic Loop Execution Flow
User input triggers a parallel prefetch of memory and skills. If context usage exceeds ~83%, auto-compact runs first (~85% compression). The system prompt is then built and a streaming API call (SSE) goes to Claude Sonnet / Opus. A text-only response is returned to the user; a response with tool calls routes read-only tools (Glob, Grep, Read) through a parallel executor (≤10 at once) and write operations (Edit, Bash, Write) through a serial one, appends the tool results, and loops.

Tool Concurrency Strategy

Each tool declares concurrency safety via isConcurrencySafe(). Read-only tools run up to 10 in parallel; write tools run serially. A streaming executor begins tool execution while the model is still generating. [4] When the output token budget nears exhaustion, an invisible continuation message triggers up to 3 recovery attempts. [5]
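The split described above can be sketched as a simple scheduler; `isConcurrencySafe()` is named in the text, while the batching logic here is an assumed simplification:

```typescript
type Tool = { name: string; isConcurrencySafe: () => boolean };
type PendingCall = { tool: Tool; run: () => Promise<string> };

// Partition a batch of tool calls: concurrency-safe (read-only) tools run
// in parallel waves of at most `limit`, everything else runs serially.
async function runBatch(calls: PendingCall[], limit = 10): Promise<string[]> {
  const safe = calls.filter((c) => c.tool.isConcurrencySafe());
  const unsafe = calls.filter((c) => !c.tool.isConcurrencySafe());
  const results: string[] = [];
  for (let i = 0; i < safe.length; i += limit) {
    // Parallel wave of read-only tools (Glob, Grep, Read, ...).
    results.push(...(await Promise.all(safe.slice(i, i + limit).map((c) => c.run()))));
  }
  for (const c of unsafe) {
    // Write tools (Edit, Bash, Write, ...) run strictly one at a time.
    results.push(await c.run());
  }
  return results;
}
```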

Section 04

Tool System & Registry

The 45+ built-in tools conform to a unified TypeScript interface with Zod schema validation, permission checking, and four-tier rendering. Approximately 18 tools are deferred — hidden until the model discovers them via ToolSearchTool, keeping the prompt lean.
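A hedged sketch of what such a unified interface might look like; the real tools validate with Zod, which is replaced here by a dependency-free stand-in, and the field names are illustrative:

```typescript
// Illustrative tool contract: every tool validates input, declares
// concurrency safety, and can be deferred from the initial prompt.
interface ToolDef<I> {
  name: string;
  validate: (raw: unknown) => I;    // Zod .parse() in the real system
  isConcurrencySafe: () => boolean; // drives parallel vs serial execution
  isDeferred: boolean;              // hidden until found via ToolSearchTool
  run: (input: I) => Promise<string>;
}

const readTool: ToolDef<{ path: string }> = {
  name: "Read",
  validate: (raw) => {
    const r = raw as { path?: unknown };
    if (typeof r?.path !== "string") throw new Error("Read: 'path' must be a string");
    return { path: r.path };
  },
  isConcurrencySafe: () => true, // read-only → safe to parallelize
  isDeferred: false,             // always loaded
  run: async ({ path }) => `contents of ${path}`, // placeholder body
};
```

Model-supplied input is untrusted, so every call is validated at runtime before the tool body runs.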

Figure 3 — Tool Taxonomy & Deferred Discovery
Always loaded (~27 tools): Read, Write, Edit, Grep, Glob, LS, Bash (120s timeout), TodoWrite/Read, WebSearch, WebFetch, Task, SendMessage, MultiEdit, AskUser, NotebookRead, and more, alphabetically sorted for cache stability. Deferred (~18 tools): NotebookEdit, specialized MCP tools, and advanced search and analysis tools, discovered via ToolSearchTool (regex + BM25 semantic matching). Every call passes through the tool registry: Zod validation → permission check → four-tier rendering (use, progress, result, error).
Section 05

Context Management & Compaction

One of the most distinctive decisions: complete rejection of codebase indexing. Boris Cherny confirmed: early versions used RAG with a local vector database, but agentic search outperformed it "by a lot." [2] This eliminated vector databases, embedding pipelines, and index staleness entirely.

Figure 4 — Agentic Search Hierarchy (Ascending Token Cost)
Ascending token cost, from cheapest to most expensive: Glob (~0 tokens, path patterns), Grep via ripgrep (low cost, matching lines), Read (500–5K tokens, full file content), and the Explore sub-agent running Haiku, which returns summaries rather than raw content.

Three-Layer Compaction

Figure 5 — Context Window Compaction Pipeline
Context window: 200K tokens (standard) / 1M (Max/Enterprise). Layer 1, Microcompact (~65%): clear older tool outputs, preserve the "hot tail". Layer 2, Auto-Compact (~83.5%): structured summary with ~85% compression. Layer 3, /compact: manual trigger with optional focus hints. After compaction, context is rebuilt: re-read the 5 most recently accessed files, preserve task lists and plan state, reload CLAUDE.md from disk (it survives compaction), and inject a continuation message.

After each tool call, Claude receives a token usage warning enabling self-aware context management. Extended thinking tokens are automatically stripped from subsequent turns to conserve space.
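The layered thresholds from Figure 5 can be expressed as a small decision function; the exact trigger logic inside Claude Code is not public, so this only illustrates the published percentages:

```typescript
// Published figures: 200K standard window, microcompact around 65% usage,
// auto-compact around 83.5%. The decision function itself is an assumption.
const WINDOW = 200_000;
const MICRO_THRESHOLD = 0.65;  // clear older tool outputs, keep the "hot tail"
const AUTO_THRESHOLD = 0.835;  // structured summary, ~85% compression

type CompactionAction = "none" | "microcompact" | "auto-compact";

function compactionAction(tokensUsed: number): CompactionAction {
  const ratio = tokensUsed / WINDOW;
  if (ratio >= AUTO_THRESHOLD) return "auto-compact";
  if (ratio >= MICRO_THRESHOLD) return "microcompact";
  return "none";
}
```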

Section 06

System Prompt Assembly

The system prompt isn't monolithic — it's built at runtime from modular section-builder functions, split into two segments for prompt caching optimization.

Figure 6 — Dynamic System Prompt Assembly
Static prefix (globally cached): agent identity and personality, permission mode rules, hooks config, code style rules, security instructions, tool preferences, and output efficiency guidelines, identical across turns for maximum cache hits. Then the cache boundary. Dynamic suffix (session-specific): agent tools with full JSON schemas, skills definitions, CLAUDE.md content from disk, model overrides, environment (CWD, OS, git), MCP server instructions, and context budget, ordered for prefix overlap but changing per session. Final prompt: ~20,000+ tokens.

Key behavioral instructions: professional objectivity, anti-over-engineering, and git safety rules (never force-push, never amend without an explicit request). The three-phase model instructs Claude to blend gathering context, taking action, and verifying results rather than treating them as rigid stages.
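The two-segment assembly can be sketched as plain string concatenation over section-builder functions; the builder API shown is an assumption, with section contents abbreviated:

```typescript
// Each section is a small builder function; the prompt is two joined
// segments so the static half stays byte-identical across turns.
type SectionBuilder = () => string;

function buildSystemPrompt(
  staticSections: SectionBuilder[],
  dynamicSections: SectionBuilder[]
): { prefix: string; suffix: string; prompt: string } {
  const prefix = staticSections.map((b) => b()).join("\n\n");  // cache-stable
  const suffix = dynamicSections.map((b) => b()).join("\n\n"); // per-session
  return { prefix, suffix, prompt: prefix + "\n\n" + suffix };
}

// Abbreviated example sections (real contents run to ~20K tokens).
const { prompt } = buildSystemPrompt(
  [() => "You are Claude Code...", () => "Git safety: never force-push."],
  [() => `cwd: ${process.cwd()}`, () => "CLAUDE.md: (project conventions)"]
);
```

Keeping the static half first means every cached byte of the prefix survives environment and project changes.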

Section 07

MCP Integration

Claude Code acts as an MCP client connecting to external servers over three transports. [10] MCP tools use mcp__<server>__<tool> naming and can be configured at three scopes: user, project, and local. [3]

Figure 7 — MCP Client Architecture
Claude Code's MCP client speaks three transports: stdio (local processes), SSE (streaming remote), and HTTP (cloud servers), connecting to servers such as GitHub, Slack, databases, and custom tools. ToolSearch cut tool-definition overhead from 77K to 8.7K tokens (an 85% reduction): only names load at session start, with full definitions fetched on demand.

Capabilities extend beyond tools: resources via @ mentions, elicitation for structured mid-task input, channels for push messaging, and OAuth 2.0 authentication. Claude Code itself serves as an MCP server via claude mcp serve.
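The mcp__<server>__<tool> naming convention is documented; the helper functions below are merely an illustrative encoding of it:

```typescript
// Compose a namespaced MCP tool name from server and tool identifiers.
function mcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Split a namespaced name back apart; returns null for non-MCP tool names.
function parseMcpToolName(name: string): { server: string; tool: string } | null {
  if (!name.startsWith("mcp__")) return null;
  const rest = name.slice("mcp__".length);
  const sep = rest.indexOf("__");
  if (sep < 0) return null;
  return { server: rest.slice(0, sep), tool: rest.slice(sep + 2) };
}
```

The namespace prefix lets the permission system and tool registry treat external MCP tools distinctly from built-ins.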

Section 08

Security Architecture

Security operates on a "fail-closed" principle: unmatched operations require manual approval. Rules evaluate in deny → ask → allow order, first match wins.
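The evaluation order can be sketched directly; the rule shape here is a simplified assumption, but the deny → ask → allow ordering and the fail-closed default follow the text:

```typescript
type Verdict = "deny" | "ask" | "allow";
type Rule = { verdict: Verdict; pattern: RegExp };

// Evaluate rules tier by tier: all deny rules, then ask, then allow.
// The first matching rule wins; anything unmatched falls through to "ask".
function evaluate(action: string, rules: Rule[]): Verdict {
  for (const tier of ["deny", "ask", "allow"] as const) {
    for (const rule of rules) {
      if (rule.verdict === tier && rule.pattern.test(action)) return tier;
    }
  }
  return "ask"; // fail-closed: unmatched operations require manual approval
}
```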

Figure 8 — Layered Security Model
A requested tool action first hits the permission check: a deny rule blocks it, an allow rule lets it through, and an unmatched action (fail-closed) asks the user. Approved actions execute inside the OS sandbox: filesystem isolation (writes limited to the working directory), network isolation via a Unix socket proxy, and child-process inheritance. Auto Mode (March 2026) adds a two-stage classifier: a single-token filter (8.5% false positive rate) followed by chain-of-thought review (0.4%). The classifier is reasoning-blind: it sees only user messages and tool commands, never the agent's reasoning.

OS sandboxing uses macOS Seatbelt or Linux bubblewrap, reducing permission prompts by 84%. [3] The Auto Mode classifier is critically reasoning-blind. [12] Over 20 default block rules prevent force-pushing, mass-deleting cloud storage, and crossing trust boundaries.

Section 09

Configuration Hierarchy

Figure 9 — Configuration Cascade (Highest → Lowest Priority)
From highest to lowest priority: managed enterprise MDM (cannot be overridden) → CLI arguments (--flag values) → local settings.local.json (gitignored) → project settings.json (version controlled) → user ~/.claude/settings (global defaults). CLAUDE.md files load from /etc → ~/.claude → the project tree → rules/; best practice is to keep each under 200 lines. Auto memory and dreams: a MEMORY.md index (200-line limit, 25KB loaded), with periodic "dream" passes for deduplication and pruning.
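The cascade can be sketched as a key-by-key merge in which higher-priority layers override lower ones; the layer names follow Figure 9, while the flat-object merge is a simplification of the real nested settings format:

```typescript
type Settings = Record<string, unknown>;

// Spread lowest priority first so each later (higher-priority) layer
// overrides the previous one; managed (enterprise MDM) always wins.
function resolveSettings(layers: {
  user?: Settings;     // ~/.claude/settings — global defaults
  project?: Settings;  // settings.json — version controlled
  local?: Settings;    // settings.local.json — gitignored
  cliArgs?: Settings;  // --flag values
  managed?: Settings;  // enterprise MDM — cannot be overridden
}): Settings {
  return {
    ...layers.user,
    ...layers.project,
    ...layers.local,
    ...layers.cliArgs,
    ...layers.managed,
  };
}
```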

Hooks provide deterministic lifecycle control at 11 event points. Plugins package skills, sub-agents, hooks, and MCP servers into distributable units with a marketplace ecosystem.

Section 10

Performance & Caching

97.8% peak cache reuse · 81% cost reduction · 5–10× TTFT improvement · ~2.5× Fast Mode speedup

The most impactful optimization is aggressive prompt caching. [9] Real API traces show 92% prefix reuse, with peak execution at 97.83%. This yields 81% cost reduction ($6.00 → $1.15 for a 2M-token session) and 5–10× time-to-first-token improvement. [4]

Figure 10 — Prompt Cache Architecture Across Turns
Turn N sends the static prefix (identity, tools, rules), the dynamic suffix (env, CLAUDE.md), and the conversation history, cached with a 5-minute or 1-hour TTL. On turn N+1, both prefix and suffix hit the cache; only the history plus new tool results are fresh. Cache-friendly invariants: alphabetical tool sort, the static/dynamic split, microcompaction that preserves the prefix, and warm-up calls.

Additional strategies: SSE streaming, parallel read-tools (up to 10), worktree isolation for concurrent sessions, background agents (Ctrl+B), and Fast Mode (/fast) for ~2.5× Opus 4.6 throughput via priority serving.
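The static/dynamic split maps naturally onto the Messages API's cache_control breakpoints. The request builder below is a sketch: `cache_control: { type: "ephemeral" }` is the documented API field, but the surrounding shape and model id are illustrative, and Claude Code's actual request construction is not public:

```typescript
// Build a Messages API request with two cache breakpoints: one after the
// globally reusable static prefix, one after the session-stable suffix.
function buildRequest(staticPrefix: string, dynamicSuffix: string, history: object[]) {
  return {
    model: "claude-sonnet-4-5", // illustrative model id
    max_tokens: 8192,
    system: [
      // Breakpoint 1: identical across all sessions → globally cacheable.
      { type: "text", text: staticPrefix, cache_control: { type: "ephemeral" } },
      // Breakpoint 2: stable within one session → cached from turn 2 onward.
      { type: "text", text: dynamicSuffix, cache_control: { type: "ephemeral" } },
    ],
    messages: history,
  };
}
```

Because cache hits require a byte-identical prefix, invariants like alphabetical tool ordering exist precisely to keep these segments stable.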

Section 11

Comparative Landscape

The table below summarizes key architectural trade-offs across four tools. Each makes different bets about where complexity should live; none is strictly superior across all dimensions. Entries are based on publicly available documentation and may not reflect internal changes since publication.

Dimension | Claude Code | Cursor | Aider | GitHub Copilot Agent
Context Retrieval | Agentic search via grep/glob tools; no indexing step. [2] | RAG with vector indexing; also supports @-mentions for manual context. [15] | PageRank over tree-sitter AST graph; ranks files by reference density. [14] | Hybrid: repository index + embeddings; integrated with GitHub's search infrastructure. [16]
Agent Topology | Single main loop + depth-1 sub-agents (Task tool). Sub-agents use cheaper models for read-only work. [4] | Background agents can run up to 8 parallel sessions in cloud sandboxes. [15] | Single-agent, single-turn architecture. Model-agnostic: works with any LLM provider. [14] | Runs as ephemeral GitHub Actions workflows; multiple agents can work on separate issues concurrently. [16]
Edit Strategy | Direct string replacement (Edit/Write tools). Simple but requires exact match. | Two-stage: frontier model generates diff, cheaper model applies it. Optimizes cost per edit. | Multiple edit formats (whole-file, diff, search-replace) selected per model capability. | Inline suggestions + autonomous agent editing in cloud environments.
Extensibility | MCP servers, hooks (11 lifecycle events), plugins, skills, Agent SDK. [3] | Rules files, MCP servers, custom docs, @-mention context providers. | Fully open-source (Apache 2.0); model-agnostic; community conventions. | Extensions API, MCP servers, GitHub Actions, custom agent instructions. [17]
Primary Surface | Terminal-first; same loop runs in IDE, web, CI, desktop. | IDE-first (VS Code fork); optimized for inline editing UX. | Terminal-only; designed for pair programming over CLI. | Cloud-first; triggered from GitHub Issues/PRs; also in VS Code and CLI.

Limitations of this Comparison

This report analyzes Claude Code in depth but relies on public documentation for competitors. Cursor's internal architecture is largely undocumented beyond marketing materials. [15] GitHub Copilot's agent infrastructure leverages proprietary GitHub Actions internals not fully described in docs. [16] Aider is the most transparent due to being fully open-source. [14] A fair head-to-head comparison would require equivalent reverse-engineering depth for each tool.

Architectural thesis: Claude Code bets that less scaffolding around a more capable model beats more scaffolding around a less capable one. As models improve, the architecture's thinness becomes an advantage — less code to conflict with new capabilities. Whether this "radical simplicity" holds as agent tasks grow more complex remains the central open question.

References


  1. [Primary] Boris Cherny, STATION F Talk: "The Model is the Product" — Describes Claude Code's origins, product overhang thesis, and the philosophy that the model is the product. March 2025.
  2. [Primary] Boris Cherny (@bcherny), X post on RAG vs. agentic search — Direct confirmation that early Claude Code used RAG + local vector DB but agentic search proved better, simpler, and more reliable.
  3. [Docs] Anthropic, Claude Code Documentation — Official documentation covering installation, configuration, MCP integration, permissions, hooks, and usage.
  4. [Docs] Anthropic Engineering, Claude Code Best Practices — Patterns proven by internal and external engineers. Describes Claude Code as "intentionally low-level and unopinionated, providing close to raw model access."
  5. [Docs] Anthropic Engineering, Effective Harnesses for Long-Running Agents — Managing agents across multiple context windows; the initializer agent pattern and state recovery approaches.
  6. [Primary] Boris Cherny and Cat Wu, Latent Space Podcast: "Claude Code: Anthropic's Agent in Your Terminal" — Founding engineers discuss the tool as "not a product as much as it's a Unix utility," the Bun runtime choice, and architecture decisions. May 2025.
  7. [Primary] Gergely Orosz, The Pragmatic Engineer: "Building Claude Code with Boris Cherny" — Interview covering evolution from side project, parallel agents, PR structure, and codebase context retrieval. March 2025.
  8. [Primary] Boris Cherny, Lenny's Newsletter: "What Happens After Coding Is Solved" — Discusses Claude Code reaching 4% of public GitHub commits, product principles, and the future of software engineering. Feb 2026.
  9. [Docs] Anthropic, Prompt Caching Announcement — Up to 90% cost reduction and 85% latency reduction for long prompts. August 2024.
  10. [Docs] Model Context Protocol Specification (2025-11-25) — The authoritative MCP specification defining transports, tool discovery, resources, and authentication.
  11. [Docs] Anthropic Research, Building Effective Agents — Five workflow patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer) forming the conceptual foundation. December 2024.
  12. [Docs] Anthropic Engineering, Claude Code Auto Mode — Two-stage classifier architecture for safer autonomous operation; reasoning-blind design.
  13. [Docs] Anthropic Engineering, Building Agents with the Claude Agent SDK — The agent harness underlying Claude Code, available as a standalone SDK.
  14. [External] Paul Gauthier, Aider: Building a Better Repository Map with Tree Sitter — Tree-sitter + NetworkX PageRank approach to repository mapping. October 2023. Also: official docs.
  15. [External] Cursor, Cursor Documentation — Official feature documentation. Architecture details are limited; see also ByteByteGo analysis for infrastructure-level discussion.
  16. [External] GitHub, About GitHub Copilot Coding Agent — Official docs on the autonomous agent running in ephemeral GitHub Actions environments.
  17. [External] GitHub, About Custom Agents — Documentation on creating specialized custom agents for different tasks.
  18. [Docs] npm, @anthropic-ai/claude-code — Official npm package. Also: GitHub repository.