The Problem
You're working on a 50-file backend service. You ask Claude to add a new API endpoint. It generates code that breaks existing patterns, ignores your authentication middleware, and reinvents utilities you already have.
Why? The AI never saw those files. Even with a 1M-token window, loading your entire codebase dilutes attention into noise — the agent gets fewer useful signals, not more. You're flying blind, and so is the AI.
The rookie mistake: dumping every file into context and hitting limits. The expert move: strategic budgeting that gives AI exactly what it needs.
The Core Insight
Context windows are RAM, not hard drives.
Just like you don't load your entire filesystem into memory, you shouldn't load your entire codebase into context. You need a paging strategy: what's hot (actively needed), what's warm (might be referenced), what's cold (archived until needed).
Most developers treat context as infinite until it isn't. Smart developers treat it as the constraint it is from day one.
The Walkthrough
Step 1: Know Your Budget
Different AI tools have different limits:
| Tool | Context Window | Practical Limit |
|---|---|---|
| Claude Sonnet 4.6 / Opus 4.6 | 1M tokens | ~700k before attention decay bites |
| Claude Haiku 4.5 | 200k tokens | ~150k usable |
| GPT-5 / Gemini 2.5 Pro | 1M+ tokens | ~500k usable before quality degrades |
The 2026 reality: raw token ceilings aren't the bottleneck anymore — attention decay is. Loading 900k of "context" gets you a model that retrieves less reliably than 150k of focused context. Budget for signal, not ceiling.
Rule of thumb: 1 token ≈ 4 characters in most codebases (more in minified JS, less in verbose configs).
# Quick estimate (recursive ** requires bash with `shopt -s globstar`)
wc -c src/**/*.py | tail -1    # last line is the combined byte total
# Divide by 4 to get a rough token count
Step 2: Categorize Your Files
Not all files are created equal for context. Build a mental model:
- Core Architecture (10-20% of tokens): Files that define patterns everyone else follows. Main entry points, base classes, core interfaces.
- Active Working Set (40-50%): The files you're actually modifying plus their direct dependencies.
- Reference Documentation (20-30%): READMEs, API contracts, type definitions - high information density.
- Supporting Context (10-20%): Related files that might be referenced but won't change.
- Cold Storage (excluded): Everything else. Tests for other modules, config files, generated code.
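These tiers amount to a budget split. A minimal sketch of the allocation, using the midpoint of each range above; the `TIERS` mapping and `allocate` helper are purely illustrative, not a fixed API:

```python
# Rough context-budget allocator for the tiers described above.
# Shares are midpoints of the ranges given; adjust per task.
TIERS = {
    "core_architecture": 0.15,   # entry points, base classes, interfaces
    "active_working_set": 0.45,  # files being modified + direct deps
    "reference_docs": 0.25,      # READMEs, API contracts, type defs
    "supporting_context": 0.15,  # related but unchanging files
}

def allocate(budget_tokens: int) -> dict:
    """Split a token budget across tiers; cold storage gets nothing."""
    return {tier: int(budget_tokens * share) for tier, share in TIERS.items()}

print(allocate(150_000))  # e.g. a Haiku-sized usable budget
```

Cold storage deliberately has no entry: excluded files cost zero tokens by definition.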
Step 3: Build a Context Loading Strategy
Three patterns for managing what gets loaded:
Pattern 1: Explicit Minimal Context (Best for focused tasks)
# Example: Adding a new API route
Context to load:
1. src/routes/_template.js (pattern example)
2. src/middleware/auth.js (will be referenced)
3. src/models/User.js (data structure)
4. API_STYLE_GUIDE.md (conventions)
Skip: All other routes, tests, config
Pattern 2: Layered Context Discovery (Best for exploratory work)
# Start minimal, expand as needed
Layer 1: "Where does authentication happen in this codebase?"
→ AI discovers auth.js, session.js
Layer 2: "@auth.js @session.js Add rate limiting to login"
→ Now AI has targeted context
Layer 3: If AI asks about config, load config.js
→ Just-in-time context expansion
Pattern 3: Architecture-First Context (Best for large refactors)
# Front-load the mental model
1. PROJECT_STRUCTURE.md
2. src/index.js (entry point)
3. src/core/*.js (base abstractions)
Then: Specific files for the task
This gives AI the "map" before the "territory"
The 70/30 Rule
Spend 70% of context on files that will be modified, 30% on files that provide understanding. If you're spending more than 30% on reference material, you're diluting the working set.
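The rule is simple arithmetic worth checking before you load anything. A hypothetical helper; the file names and token counts below are made up for illustration:

```python
# Check a planned context load against the 70/30 rule.
# Each entry maps a file to (estimated tokens, will this file be modified?).
def working_set_ratio(load: dict) -> float:
    """Return the fraction of tokens spent on files being modified."""
    total = sum(tokens for tokens, _ in load.values())
    modified = sum(tokens for tokens, will_modify in load.values() if will_modify)
    return modified / total

plan = {
    "src/routes/auth.js":   (8_000, True),   # being modified
    "src/auth/strategy.js": (6_000, True),   # being modified
    "API_STYLE_GUIDE.md":   (4_000, False),  # reference only
    "src/models/User.js":   (2_000, False),  # reference only
}
print(f"{working_set_ratio(plan):.0%} working set")  # aim for ~70%
```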
Step 4: Prune Aggressively
Signs you're wasting context budget:
- AI mentions files you didn't load (hallucination from token pressure)
- Responses slow down significantly
- AI starts giving generic advice instead of codebase-specific guidance
- You're @-mentioning 10+ files for a simple change
Pruning tactics:
- Show excerpts, not full files: "Here's the authentication pattern (lines 45-80 from auth.js)" instead of the whole file
- Summarize then reference: "We use JWT in cookies. See auth.js for implementation" instead of pasting auth.js
- Use type definitions over implementations: Share the interface, not the 500-line class
- Start new sessions: if context from earlier in the conversation is no longer needed, start fresh.
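The "excerpts, not full files" tactic is easy to script. A sketch, assuming a 1-indexed inclusive line range; the `excerpt` helper and the auth.js path are illustrative:

```python
from pathlib import Path

def excerpt(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) with a provenance
    header, ready to paste into a prompt instead of the whole file."""
    lines = Path(path).read_text().splitlines()
    body = "\n".join(lines[start - 1 : end])
    return f"# {path}, lines {start}-{end}\n{body}"

# e.g. print(excerpt("src/middleware/auth.js", 45, 80))
```

The header matters: it tells the AI where the snippet came from, so it can ask for more of that file instead of guessing.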
Failure Patterns
1. The Kitchen Sink
Symptom: You @-mention 15 files "just in case." AI responses are slow and generic.
Fix: Start with 3-5 files maximum. Add more only when AI explicitly needs them.
2. The Context Hoard
Symptom: You keep entire conversation history even though you moved to a different task 10 messages ago.
Fix: Start new chat sessions when switching tasks. Old context is dead weight.
3. The Underload
Symptom: AI invents patterns instead of following yours because it never saw the examples.
Fix: Always include at least one "pattern example" file showing the established approach.
4. The Dilution
Symptom: You include tests, mocks, fixtures - now AI is confused about what's real.
Fix: Test files are context vampires. Only include them when directly working on tests.
The README Trap
Including a massive README.md "for context" often backfires. READMEs are written for humans discovering the project, not for an AI with a limited working memory. Better: write an AI_CONTEXT.md that's dense, technical, and pattern-focused.
Real-World Example: Building a Feature
Task: Add GitHub OAuth to an existing Express app with 80 files.
Rookie Approach (Context Overflow)
@src/ # Tries to load everything
"Add GitHub OAuth login"
Result: AI writes boilerplate that ignores existing auth patterns
Expert Approach (Budgeted Context)
# Step 1: Discovery (minimal context)
"Show me the current authentication implementation"
AI finds: src/auth/local-strategy.js
# Step 2: Pattern extraction (targeted load)
@src/auth/local-strategy.js
"I want to add GitHub OAuth following the same pattern"
# Step 3: Implementation (working set only)
@src/auth/github-strategy.js (new)
@src/auth/passport-config.js (existing)
@src/routes/auth.js (existing)
"Implement GitHub OAuth strategy"
# Step 4: Integration (add necessary config)
@config/auth.js
"Add GitHub OAuth config"
Total context: 4 files + conversation. Fits easily in 30k tokens.
Advanced Techniques
Technique 1: Context Prefetching
For large features, create a "Context Index" file:
// CONTEXT_INDEX.md
# Payment Feature Context Map
Core Files:
- src/payments/processor.js (Stripe integration)
- src/payments/models.js (data models)
Patterns to Follow:
- Error handling: See src/core/errors.js lines 20-45
- API responses: See src/api/base-controller.js
Do NOT reference:
- Legacy payment code in src/legacy/ (deprecated)
- Test mocks in __mocks__/
Load this index first, then specific files as needed.
Technique 2: Differential Context Loading
For refactoring, show the delta, not the whole:
"I'm refactoring authentication. Here's the OLD approach:
[paste 20 lines from old auth.js]
Here's the NEW pattern I want:
[paste example from new-auth.js]
Now update all files in src/routes/ to use new pattern"
AI gets before/after context without loading every route file.
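That before/after prompt can also be generated mechanically with Python's standard difflib. A sketch; the `delta_prompt` name and file paths are assumptions:

```python
import difflib
from pathlib import Path

def delta_prompt(old_path: str, new_path: str) -> str:
    """Build a unified diff of the pattern change, compact enough to
    paste in place of both full files."""
    old = Path(old_path).read_text().splitlines()
    new = Path(new_path).read_text().splitlines()
    diff = difflib.unified_diff(
        old, new, fromfile=old_path, tofile=new_path, lineterm=""
    )
    return "Refactor pattern (old -> new):\n" + "\n".join(diff)

# e.g. print(delta_prompt("src/auth/old-auth.js", "src/auth/new-auth.js"))
```

A diff is denser than two full files: unchanged lines cost no tokens beyond a few lines of surrounding context.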
Quick Reference
Context Budget Rules:
- Target: Use 60-70% of available context (leave room for response)
- Maximum files per task: 5-7 for focused work, 10-15 for architecture changes
- Start new session after 15-20 exchanges or when switching tasks
Priority Loading Order:
- Files you're modifying (highest priority)
- Direct dependencies of those files
- Pattern examples showing established conventions
- Type definitions and interfaces
- Documentation (only if high density)
Warning Signs You're Over Budget:
- AI responses take 30+ seconds
- AI gives generic advice instead of codebase-specific
- AI hallucinates file names or APIs
- AI contradicts itself within same response
Quick Context Estimation:
# Bash one-liner: pipe file contents through a single wc so its per-file "total" lines can't double-count
find src -name "*.js" -exec cat {} + | wc -c | awk '{print int($1/4) " tokens (approx)"}'
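Where a shell isn't available (Windows, CI sandboxes), a Python equivalent using the same ~4-characters-per-token heuristic; the function name and default glob are illustrative:

```python
from pathlib import Path

def estimate_tokens(root: str, pattern: str = "*.js") -> int:
    """Rough token count: total bytes of matching files divided by 4."""
    total_bytes = sum(
        p.stat().st_size for p in Path(root).rglob(pattern) if p.is_file()
    )
    return total_bytes // 4

# e.g. estimate_tokens("src")  # approximate cost of loading all of src/
```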