Claude Code Memory for Large Codebases: CLAUDE.md, Memory Files & Subagents Explained
Claude Code does not have a memory plugin. It has something messier and more powerful — a file-based memory system that scales differently than you would expect. This guide walks through CLAUDE.md, memory files, the 3-layer sparse loading pattern, and subagent parallelism, written from daily production use across 5 web projects, 450+ articles, and 100+ memory files.
- No, there is no Claude Code memory plugin. The system is file-based: CLAUDE.md (always loaded), memory files (loaded on demand), skills (reusable workflows), and subagents (parallel workers).
- CLAUDE.md is your durable system prompt. Keep it 150-500 lines. Past ~1000 lines it starts crowding out real conversation.
- Memory files live at ~/.claude/projects/{project}/memory/ — one topic per file, loaded by intent rather than brute force.
- The 3-layer sparse loading pattern (intent → Top-2 memory → interleaved deep-read on demand) is how you scale past a few dozen memory files without blowing the context window.
- Subagents are the real unlock for >500K-line codebases. Four parallel mappers can explore a monorepo while the main agent holds only the structured summaries.
- Biggest gotcha: stale memory. A memory written two weeks ago may cite a function that no longer exists. Always date-stamp and verify before acting.
Is There a Claude Code Memory Plugin?
No. Nothing in the Claude Code plugin ecosystem is called a "memory plugin." People searching for this are usually coming from ChatGPT, where Memory is a toggleable feature that remembers facts across conversations, or from Cursor, where @Codebase builds an embedding index.
Claude Code's approach is different. Instead of one opaque system, there are four distinct mechanisms, and they compose:
- CLAUDE.md — a project-root markdown file auto-loaded at session start.
- Memory files — per-topic files at ~/.claude/projects/{project}/memory/, loaded when relevant.
- Skills — packaged workflows the agent can invoke via the Skill tool.
- Subagents — parallel worker processes that see a scoped sub-task instead of the whole conversation.
On small projects you do not need to think about this. On anything past a few thousand files with mixed concerns, the difference between "Claude Code is magic" and "Claude Code hallucinates constantly" is whether you have used these pieces correctly.
CLAUDE.md: The Always-Loaded Context File
CLAUDE.md lives at the root of your repository and is automatically prepended to the agent's context whenever a new session starts in that directory. Think of it as a durable system prompt that travels with the code.
What belongs in CLAUDE.md:
- Repository layout (where code lives, where docs live, what each top-level folder is for)
- Build and deploy commands (so the agent runs them instead of guessing)
- Iron-law rules — things that must never be done, like "never push directly to main"
- Known pitfalls ("the Prisma client cache needs to be regenerated after schema changes")
- A small index of where detailed docs live for specific topics
What does not belong in CLAUDE.md: the entire API reference, full database schema, every style rule your team has ever written, or a changelog. Those go in separate files and are loaded on demand.
Size sweet spot, from real-world use: 150 to 500 lines. Below 150, you are not giving the agent enough context. Above 500, you start spending a visible fraction of the context window on rules before the real work begins. Past around 1024 lines, recent Claude Code releases hard-truncate CLAUDE.md on load — which turns the "durable system prompt" into a silent data-loss bug if you do not know about the cap.
A test I run every week: search CLAUDE.md for any section that has not been referenced in the last 30 days, and move that section out to a memory file. CLAUDE.md is scarce real estate. Treat it that way.
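The size check side of that weekly audit is trivial to automate. A minimal Python sketch (the 500 and 1024 thresholds are the ones discussed above; the function and its verdict strings are illustrative, not part of Claude Code):

```python
SOFT_LIMIT = 500    # above this, rules start crowding out real conversation
HARD_LIMIT = 1024   # reported truncation point in recent releases

def audit_claude_md(text: str) -> str:
    """Return a one-line verdict on a CLAUDE.md file's size."""
    n = len(text.splitlines())
    if n > HARD_LIMIT:
        return f"TRUNCATION RISK ({n} lines)"
    if n > SOFT_LIMIT:
        return f"BLOATED ({n} lines): move sections out to memory files"
    return f"OK ({n} lines)"
```

Run it against your repo's CLAUDE.md in the same weekly pass; the unreferenced-section check still needs human judgment.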
Memory Files: Per-Topic Durable Storage
Memory files sit in ~/.claude/projects/{project-identifier}/memory/ and are markdown files named by topic. In one of our live SEO projects, that folder currently holds 112 files covering everything from individual site quirks to pitfalls hit during specific deployments. A few examples from that directory:
- site-notes.md — per-site stack, blog path, deploy method, affiliate setup
- pitfalls.md — every production bug we have hit, so the agent does not repeat them
- browser-automation.md — DrissionPage patterns and anti-detection learnings
- affiliate-rejection-history.md — which affiliate programs we applied to, which declined, cooldowns
Each memory file has a short frontmatter block with a name, a one-line description, and a type (user, feedback, project, reference). The description is the key field — it is what the agent scans to decide whether to load the file at all.
The frontmatter pattern that works:
```markdown
---
name: Affiliate Rejection History
description: Record of declined affiliate applications with reasons, thresholds, and when to re-apply — avoids wasting time re-submitting too early
type: project
---

## Affiliate Rejection History

**Why:** Record rejections so we do not re-apply within the cooldown window.
**How to apply:** Before any new affiliate application, check this file first.

### 2026-04-11 — SEMrush (via impact.com) — DECLINED
...
```
Memory files are not auto-loaded at session start — that would defeat the point. They are loaded when the agent routes to them, either because CLAUDE.md's index pointed to the file or because the agent searched for a matching description. On a 100-file memory directory, total overhead from the index pass is around 2-3K tokens, versus roughly 180K if you tried to load all 112 files.
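The cheap index pass boils down to reading frontmatter, not bodies. A minimal Python sketch of that idea (the parser is deliberately naive: one `key: value` per line, no real YAML; function names are illustrative):

```python
import re

def parse_frontmatter(text: str) -> dict:
    """Extract the frontmatter block (--- ... ---) as a flat dict.
    Minimal parser: one 'key: value' per line, no nesting."""
    m = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return {}
    fields = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def build_index(files: dict) -> list:
    """Map {filename: file_text} to (filename, description) pairs:
    the small index the agent scans instead of loading full files."""
    index = []
    for name, text in files.items():
        fm = parse_frontmatter(text)
        if "description" in fm:
            index.append((name, fm["description"]))
    return index
```

One hundred descriptions at a line each is a few thousand tokens; one hundred full files is two orders of magnitude more, which is the whole economic argument for the pattern.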
3-Layer Sparse Loading (The Real Unlock)
This is the pattern that made Claude Code tractable for us on a project with 450+ articles, 5 production sites, and more than a hundred memory files. It is inspired by sparse attention research, but the mechanics are pure markdown:
- Layer 1 — Intent classification. When a user message arrives, the agent first classifies it into one of ~10 task types (blog, backlink, seo-check, debug, code, etc). No files loaded yet. Cost: zero tokens.
- Layer 2 — Top-2 sparse load. The intent maps to a routing table in CLAUDE.md that specifies at most 2 memory files to load for that intent, and which sections of each file matter. Cost: roughly 20-30 lines of context per task, not the entire file.
- Layer 3 — Interleaved deep reads. While executing, if the agent hits a specific problem ("the deploy is failing") it dynamically loads a relevant sub-section of another memory file. Loads are triggered by actual need, not speculation. Cost: usually 5-15 additional lines per deep read.
The observed impact: on one project, the same work that previously consumed about 80 lines of CLAUDE.md context now consumes about 35 lines of interleaved loads — a 55% reduction — with no loss of relevant memory.
The routing table is the ugly, unglamorous piece of CLAUDE.md that does most of the work. Simplified from a production file:
| Intent    | Load these memory files (Top-2)             | Deep-read if…                       |
|-----------|---------------------------------------------|-------------------------------------|
| blog      | site-notes.md + blog-writing-checklist.md   | deploy fails → pitfalls.md          |
| backlink  | backlink-master-playbook.md                 | form breaks → browser-automation.md |
| seo-check | affiliate-accounts.md (compact) + pitfalls  | stale data → seo-history.md         |
| debug     | pitfalls.md                                 | bug specific → target file          |
| code      | site-notes.md (stack section) + pitfalls.md | test fails → testing.md             |
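Layers 2 and 3 are just a lookup against that table. A minimal Python sketch of the routing logic, using a few rows from the table above (the dict structure and function names are illustrative, not a Claude Code API):

```python
# Hypothetical routing table mirroring the CLAUDE.md table:
# intent -> (Top-2 memory files, {condition: deep-read file})
ROUTING = {
    "blog":     (["site-notes.md", "blog-writing-checklist.md"],
                 {"deploy fails": "pitfalls.md"}),
    "backlink": (["backlink-master-playbook.md"],
                 {"form breaks": "browser-automation.md"}),
    "debug":    (["pitfalls.md"], {}),
}

def route(intent: str) -> list:
    """Layer 2: the at-most-2 memory files to load for an intent."""
    top2, _ = ROUTING.get(intent, ([], {}))
    return top2

def deep_read(intent: str, condition: str):
    """Layer 3: the extra file to load when a condition fires, else None."""
    _, triggers = ROUTING.get(intent, ([], {}))
    return triggers.get(condition)
```

The point of encoding it this explicitly is that unknown intents load nothing, so a sloppy classification degrades to "no memory" rather than "all memory".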
Maintenance is the catch. The routing table is only as good as the maintenance discipline behind it. Every few weeks something drifts: a memory file gets renamed, a new topic emerges, an old entry becomes irrelevant. Budget 15 minutes a week to re-audit the table. Without that audit, the pattern gradually degenerates into "load everything" and the context benefits evaporate.
Subagents: Parallel Work Without Context Bloat
The biggest mental shift when moving from single-agent AI tools (Cursor, Copilot) to Claude Code is that you can spawn parallel workers. Each subagent gets a fresh context, a scoped task, and writes structured output back to a file the main agent reads.
The canonical example is codebase mapping. Asking "explain how authentication works in this monorepo" naively requires loading hundreds of files. With subagents:
- Main agent spawns 4 parallel "mapper" subagents, each given one quadrant of the repo
- Each mapper explores its quadrant, writes findings to a short markdown file
- Main agent reads the 4 summary files (maybe 600 tokens total), not the raw code
- Main agent answers the user's original question with a coherent picture
For a 500K-line monorepo, this is the difference between "impossible, will not fit in context" and "tractable in about 90 seconds of parallel work." Our own deployment uses this pattern whenever a task requires touching files across more than three top-level directories.
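The fan-out shape is easy to sketch in ordinary Python. This toy version runs the "mappers" as threads returning summaries in-process (real subagents are separate Claude Code workers that write markdown files; `map_quadrant` and its summary format are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def map_quadrant(name: str, files: list) -> str:
    """Stand-in for one mapper subagent: explore a repo quadrant and
    return a short structured summary instead of the raw code."""
    return f"{name}: {len(files)} files, entry points: {sorted(files)[:2]}"

def fan_out(quadrants: dict) -> list:
    """Spawn one mapper per quadrant in parallel; the 'main agent'
    only ever sees the returned summaries."""
    with ThreadPoolExecutor(max_workers=len(quadrants)) as pool:
        futures = [pool.submit(map_quadrant, q, fs)
                   for q, fs in quadrants.items()]
        return [f.result() for f in futures]
```

The structural trick is the same at any scale: the expensive exploration happens in contexts that are thrown away, and only the compressed findings survive.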
The related workflow for collaborative AI work — where you want multiple specialized agents to tackle different parts of a complex change — is covered separately in our Claude Code multi-agent tutorial and the more advanced agent teams guide.
Where Claude Code Memory Actually Breaks
The marketing says: unlimited context, persistent memory, scales to any codebase. The reality is messier. Four failure modes come up often enough to call out.
1. Stale memory acted on as fact. A memory file written two weeks ago may cite a function that has since been deleted. The agent reads the memory, treats it as current truth, and recommends calling a function that no longer exists. Mitigation: every memory entry needs a date stamp, and the agent's system prompt should include a rule to verify file existence before citing it. In our setup this is literally one of the top five rules in CLAUDE.md.
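Date stamps make the staleness check mechanical. A minimal Python sketch that flags old entries, assuming the `### YYYY-MM-DD — ...` heading convention used in the memory files earlier in this article (the 30-day threshold is an arbitrary example):

```python
import re
from datetime import date, timedelta

DATE_RE = re.compile(r"### (\d{4}-\d{2}-\d{2})")

def stale_entries(memory_text: str, today: date, max_age_days: int = 30) -> list:
    """Return date stamps of memory entries older than max_age_days,
    so they can be re-verified before the agent acts on them."""
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in DATE_RE.findall(memory_text)
            if date.fromisoformat(d) < cutoff]
```

Anything this flags is not necessarily wrong; it is merely no longer trustworthy without a fresh look at the code.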
2. CLAUDE.md bloat eats the conversation. The first six months of any project, CLAUDE.md grows. By the time it hits 1000+ lines, a significant fraction of your context window is being spent on rules before any actual work starts. This caps how complex the work itself can be. Mitigation: the weekly audit mentioned earlier. Brutal pruning. Move details out to memory files, leave only the routing table and the most violated iron-laws.
3. Subagent coordination overhead. Subagents are great for parallel exploration, but they are expensive to orchestrate. Each spawn has latency and token cost. For small tasks (touching under ~5 files) the overhead eats the benefit. Mitigation: use subagents only when the alternative is clearly "cannot fit in one context."
Fourth, quieter failure: the agent trusts memory over reality. This is subtle. The memory says "the deploy command is X." The deploy command is actually Y now. Unless the agent is explicitly told to verify command accuracy against the current package.json, it will confidently run X and report failure. The fix is a rule in CLAUDE.md: treat memory as hypothesis, verify against live state for anything load-bearing.
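The package.json case makes the hypothesis-then-verify rule concrete. A minimal Python sketch of the check (the function name and the `npm run <script>` parsing are simplifying assumptions for illustration):

```python
import json

def verify_deploy_command(remembered: str, package_json_text: str) -> bool:
    """Treat memory as hypothesis: confirm a remembered npm script
    still exists in the live package.json before running it."""
    scripts = json.loads(package_json_text).get("scripts", {})
    # remembered command like "npm run deploy" -> script name "deploy"
    name = remembered.split()[-1]
    return name in scripts
```

The same shape applies to any load-bearing fact: a remembered path gets an existence check, a remembered function gets a grep, before the agent commits to it.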
How It Compares to Cursor, Augment & Copilot
Different tools solve the large-codebase problem with different trade-offs. Cursor's @Codebase builds a local embedding index of your repo and retrieves files when you invoke it. Works smoothly up to roughly 150K lines. Above that, retrieval quality drops — G2 users consistently mention this in 3-star reviews.
Augment Code's Context Engine is the heaviest solution — it indexes the entire repository including dependencies and runs semantic search at query time. On SWE-bench Pro benchmarks, Augment held the #1 position among enterprise coding tools as of early 2026. The trade-off is pricing: real enterprise usage pushes $100-$200/month per seat, which is a real commitment.
GitHub Copilot Enterprise ($39/user/month) leans on GitHub's metadata — pull requests, issues, knowledge bases — rather than semantic code search. It is the safe choice for compliance-heavy teams but is not the sharpest on pure codebase navigation.
Claude Code's approach is the most flexible but also the most configuration-heavy. There is no pre-built index doing the work for you. In exchange, you get CLAUDE.md, memory files, and subagents as first-class primitives you can compose however you want. On a project where you have invested in the memory system, this beats every other tool for maintainability. On a fresh repo with zero memory setup, Cursor's out-of-the-box experience is smoother.
A broader head-to-head on this exact question is in our deeper comparison on AI coding tools for large codebases, which covers pricing, G2 ratings, and benchmarked failure modes for each.
Setting This Up On Your Own Project
Minimum viable memory setup, in order of value:
- Create CLAUDE.md at your repo root. Put in stack, key commands, and two or three iron-law rules. Keep it under 200 lines.
- Create a memory directory at ~/.claude/projects/{your-project}/memory/ and add a pitfalls.md file. Whenever a bug bites you, write 3-5 lines about it.
- After 2-3 weeks, look at which memory files actually got loaded. Move anything unused into an archive folder. Promote anything loaded frequently into CLAUDE.md's routing table.
- When CLAUDE.md crosses ~400 lines, build the intent dispatch routing table. Move detail out of CLAUDE.md into memory files, leaving only routing logic.
- Only introduce subagents when a single task needs to touch more files than your context window can reasonably hold. Before that, they add overhead without benefit.
Skip the temptation to set all of this up on day one. Start with CLAUDE.md and a single pitfalls file. Let friction tell you what to add next — every memory file should exist because you hit a real problem you do not want to hit again.
FAQ
Is there a Claude Code memory plugin?
No. Claude Code does not have a memory plugin the way ChatGPT has the Memory feature. It uses a file-based memory system built around three layers: CLAUDE.md (always loaded), topic-scoped memory files in ~/.claude/projects/{project}/memory/ (loaded on demand), and skills (reusable workflows). Combined with subagents, this scales far beyond what a single 200K context window would suggest.
How does CLAUDE.md work in a large project?
CLAUDE.md lives at the repository root and is auto-loaded into every session. Treat it as a durable system prompt. A healthy CLAUDE.md sits between 150 and 500 lines; past roughly 1024 lines it becomes counter-productive and may be truncated silently.
What is the difference between memory files and skills?
Memory files store knowledge — facts, history, pitfalls. Skills store procedures — reusable step-by-step workflows. A memory file might say "the DB password rotated on April 9"; a skill would say "here is the 12-step recipe for publishing a blog post." Both persist across sessions.
How do I prevent CLAUDE.md from getting too large?
Use an intent dispatch pattern. Instead of dumping every rule into CLAUDE.md, add a short table mapping task types (blog, backlink, debug) to the Top 1-2 memory files that should be loaded. A project with 100 memory files can still run on a 200-line CLAUDE.md if the intent dispatch is clean. Periodically compact large memory files into sibling .compact.md files that store structure and stats.
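One way to produce those compact files mechanically is to keep only headings plus the first line under each. A minimal Python sketch of that idea (the `.compact.md` convention is this article's own, not a built-in Claude Code feature):

```python
def compact(memory_text: str) -> str:
    """Keep each markdown heading and the first non-empty line under it,
    dropping the long body text: structure and stats, not detail."""
    out, keep_next = [], False
    for line in memory_text.splitlines():
        if line.startswith("#"):
            out.append(line)
            keep_next = True
        elif keep_next and line.strip():
            out.append(line)
            keep_next = False
    return "\n".join(out)
```

The full file stays on disk for deep reads; the compact sibling is what the routing table points at by default.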
Can subagents work across a large codebase in parallel?
Yes — and this is the single biggest lever for scaling Claude Code past the context window limit. You spawn multiple subagents (for example, four parallel codebase-mapper agents, one per repo quadrant) and each writes structured findings to a file. The main agent then reads only the summaries, keeping its own context lean.
What are the real failure modes?
Three common ones: stale memory (claims facts that are no longer true), CLAUDE.md bloat (crowds out actual conversation), and memory-over-reality (agent trusts recall instead of verifying against live files). Mitigate with date-stamped memory, weekly CLAUDE.md audits, and explicit "verify before acting" rules in the system prompt.
Jim Liu is an independent developer based in Sydney running 5 production web projects (AI tools, SEO, finance, gaming) that are maintained almost entirely through Claude Code. The memory system described in this article is the setup keeping those projects coherent across hundreds of articles and dozens of recurring workflows. Published on OpenAI Tools Hub since 2024.
- AI Coding Tools for Large Codebases: What Actually Scales Past 100K Lines — head-to-head comparison of Augment, Cursor, Claude Code, Copilot Enterprise
- How to Build a Multi-Agent AI Team with Claude Code — hands-on tutorial for spawning and orchestrating subagents
- Best Claude Code Skills Ranked by GitHub Stars — the skills ecosystem that sits alongside the memory system