The Problem
You're working on a 50-file backend service. You ask Claude to add a new API endpoint. It generates code that breaks existing patterns, ignores your authentication middleware, and reinvents utilities you already have.
Why? The AI never saw those files. Even with a 1M-token window, loading your entire codebase dilutes attention into noise — the agent gets fewer useful signals, not more. You're flying blind, and so is the AI.
The rookie mistake: dumping every file into context and hitting limits. The expert move: strategic budgeting that gives AI exactly what it needs.
The Core Insight
Context windows are RAM, not hard drives.
Just like you don't load your entire filesystem into memory, you shouldn't load your entire codebase into context. You need a paging strategy: what's hot (actively needed), what's warm (might be referenced), what's cold (archived until needed).
Most developers treat context as infinite until it isn't. Smart developers treat it as the constraint it is from day one.
The Walkthrough
Step 1: Know Your Budget
Different AI tools have different limits:
| Tool | Context Window | Practical Limit |
|---|---|---|
| Claude Sonnet 4.6 / Opus 4.6 | 1M tokens | ~700k before attention decay bites |
| Claude Haiku 4.5 | 200k tokens | ~150k usable |
| GPT-5 / Gemini 2.5 Pro | 1M+ tokens | ~500k usable before quality degrades |
The 2026 reality: raw token ceilings aren't the bottleneck anymore — attention decay is. Loading 900k of "context" gets you a model that retrieves less reliably than 150k of focused context. Budget for signal, not ceiling.
Rule of thumb: 1 token ≈ 4 characters in most codebases (more in minified JS, less in verbose configs).
# Quick estimate (recursive ** requires bash with `shopt -s globstar`)
wc -c src/**/*.py | tail -1    # last line is the combined byte total
# Divide by 4 to get a rough token count
Step 2: Categorize Your Files
Not all files are created equal for context. Build a mental model:
- Core Architecture (10-20% of tokens): Files that define patterns everyone else follows. Main entry points, base classes, core interfaces.
- Active Working Set (40-50%): The files you're actually modifying plus their direct dependencies.
- Reference Documentation (20-30%): READMEs, API contracts, type definitions - high information density.
- Supporting Context (10-20%): Related files that might be referenced but won't change.
- Cold Storage (excluded): Everything else. Tests for other modules, config files, generated code.
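These tiers amount to a budget split. A minimal sketch of the allocation, using the midpoint of each range above; the `TIERS` mapping and `allocate` helper are purely illustrative, not a fixed API:

```python
# Rough context-budget allocator for the tiers described above.
# Shares are midpoints of the ranges given; adjust per task.
TIERS = {
    "core_architecture": 0.15,   # entry points, base classes, interfaces
    "active_working_set": 0.45,  # files being modified + direct deps
    "reference_docs": 0.25,      # READMEs, API contracts, type defs
    "supporting_context": 0.15,  # related but unchanging files
}

def allocate(budget_tokens: int) -> dict:
    """Split a token budget across tiers; cold storage gets nothing."""
    return {tier: int(budget_tokens * share) for tier, share in TIERS.items()}

print(allocate(150_000))  # e.g. a Haiku-sized usable budget
```

Cold storage deliberately has no entry: excluded files cost zero tokens by definition.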
Step 3: Build a Context Loading Strategy
Three patterns for managing what gets loaded:
Pattern 1: Explicit Minimal Context (Best for focused tasks)
# Example: Adding a new API route
Context to load:
1. src/routes/_template.js (pattern example)
2. src/middleware/auth.js (will be referenced)
3. src/models/User.js (data structure)
4. API_STYLE_GUIDE.md (conventions)
Skip: All other routes, tests, config
Pattern 2: Layered Context Discovery (Best for exploratory work)
# Start minimal, expand as needed
Layer 1: "Where does authentication happen in this codebase?"
→ AI discovers auth.js, session.js
Layer 2: "@auth.js @session.js Add rate limiting to login"
→ Now AI has targeted context
Layer 3: If AI asks about config, load config.js
→ Just-in-time context expansion
Pattern 3: Architecture-First Context (Best for large refactors)
# Front-load the mental model
1. PROJECT_STRUCTURE.md
2. src/index.js (entry point)
3. src/core/*.js (base abstractions)
Then: Specific files for the task
This gives AI the "map" before the "territory"
The 70/30 Rule
Spend 70% of context on files that will be modified, 30% on files that provide understanding. If you're spending more than 30% on reference material, you're diluting the working set.
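The rule is simple arithmetic worth checking before you load anything. A hypothetical helper; the file names and token counts below are made up for illustration:

```python
# Check a planned context load against the 70/30 rule.
# Each entry maps a file to (estimated tokens, will this file be modified?).
def working_set_ratio(load: dict) -> float:
    """Return the fraction of tokens spent on files being modified."""
    total = sum(tokens for tokens, _ in load.values())
    modified = sum(tokens for tokens, will_modify in load.values() if will_modify)
    return modified / total

plan = {
    "src/routes/auth.js":   (8_000, True),   # being modified
    "src/auth/strategy.js": (6_000, True),   # being modified
    "API_STYLE_GUIDE.md":   (4_000, False),  # reference only
    "src/models/User.js":   (2_000, False),  # reference only
}
print(f"{working_set_ratio(plan):.0%} working set")  # aim for ~70%
```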
Step 4: Prune Aggressively
Signs you're wasting context budget:
- AI mentions files you didn't load (hallucination from token pressure)
- Responses slow down significantly
- AI starts giving generic advice instead of codebase-specific guidance
- You're @-mentioning 10+ files for a simple change
Pruning tactics:
- Show excerpts, not full files: "Here's the authentication pattern (lines 45-80 from auth.js)" instead of the whole file
- Summarize then reference: "We use JWT in cookies. See auth.js for implementation" instead of pasting auth.js
- Use type definitions over implementations: Share the interface, not the 500-line class
- Start new sessions: if context from earlier in the conversation is no longer needed, start fresh.
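The "excerpts, not full files" tactic is easy to script. A sketch, assuming a 1-indexed inclusive line range; the `excerpt` helper and the auth.js path are illustrative:

```python
from pathlib import Path

def excerpt(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) with a provenance
    header, ready to paste into a prompt instead of the whole file."""
    lines = Path(path).read_text().splitlines()
    body = "\n".join(lines[start - 1 : end])
    return f"# {path}, lines {start}-{end}\n{body}"

# e.g. print(excerpt("src/middleware/auth.js", 45, 80))
```

The header matters: it tells the AI where the snippet came from, so it can ask for more of that file instead of guessing.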
Failure Patterns
1. The Kitchen Sink
Symptom: You @-mention 15 files "just in case." AI responses are slow and generic.
Fix: Start with 3-5 files maximum. Add more only when AI explicitly needs them.
2. The Context Hoard
Symptom: You keep entire conversation history even though you moved to a different task 10 messages ago.
Fix: Start new chat sessions when switching tasks. Old context is dead weight.
3. The Underload
Symptom: AI invents patterns instead of following yours because it never saw the examples.
Fix: Always include at least one "pattern example" file showing the established approach.
4. The Dilution
Symptom: You include tests, mocks, fixtures - now AI is confused about what's real.
Fix: Test files are context vampires. Only include them when directly working on tests.
The README Trap
Including a massive README.md "for context" often backfires. READMEs are written for humans discovering the project, not for an AI with a limited working memory. Better: write an AI_CONTEXT.md that's dense, technical, and pattern-focused.
Real-World Example: Building a Feature
Task: Add GitHub OAuth to an existing Express app with 80 files.
Rookie Approach (Context Overflow)
@src/ # Tries to load everything
"Add GitHub OAuth login"
Result: AI writes boilerplate that ignores existing auth patterns
Expert Approach (Budgeted Context)
# Step 1: Discovery (minimal context)
"Show me the current authentication implementation"
AI finds: src/auth/local-strategy.js
# Step 2: Pattern extraction (targeted load)
@src/auth/local-strategy.js
"I want to add GitHub OAuth following the same pattern"
# Step 3: Implementation (working set only)
@src/auth/github-strategy.js (new)
@src/auth/passport-config.js (existing)
@src/routes/auth.js (existing)
"Implement GitHub OAuth strategy"
# Step 4: Integration (add necessary config)
@config/auth.js
"Add GitHub OAuth config"
Total context: 4 files + conversation. Fits easily in 30k tokens.
Advanced Techniques
Technique 1: Context Prefetching
For large features, create a "Context Index" file:
// CONTEXT_INDEX.md
# Payment Feature Context Map
Core Files:
- src/payments/processor.js (Stripe integration)
- src/payments/models.js (data models)
Patterns to Follow:
- Error handling: See src/core/errors.js lines 20-45
- API responses: See src/api/base-controller.js
Do NOT reference:
- Legacy payment code in src/legacy/ (deprecated)
- Test mocks in __mocks__/
Load this index first, then specific files as needed.
Technique 2: Differential Context Loading
For refactoring, show the delta, not the whole:
"I'm refactoring authentication. Here's the OLD approach:
[paste 20 lines from old auth.js]
Here's the NEW pattern I want:
[paste example from new-auth.js]
Now update all files in src/routes/ to use new pattern"
AI gets before/after context without loading every route file.
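That before/after prompt can also be generated mechanically with Python's standard difflib. A sketch; the `delta_prompt` name and file paths are assumptions:

```python
import difflib
from pathlib import Path

def delta_prompt(old_path: str, new_path: str) -> str:
    """Build a unified diff of the pattern change, compact enough to
    paste in place of both full files."""
    old = Path(old_path).read_text().splitlines()
    new = Path(new_path).read_text().splitlines()
    diff = difflib.unified_diff(
        old, new, fromfile=old_path, tofile=new_path, lineterm=""
    )
    return "Refactor pattern (old -> new):\n" + "\n".join(diff)

# e.g. print(delta_prompt("src/auth/old-auth.js", "src/auth/new-auth.js"))
```

A diff is denser than two full files: unchanged lines cost no tokens beyond a few lines of surrounding context.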
Quick Reference
Context Budget Rules:
- Target: Use 60-70% of available context (leave room for response)
- Maximum files per task: 5-7 for focused work, 10-15 for architecture changes
- Start new session after 15-20 exchanges or when switching tasks
Priority Loading Order:
- Files you're modifying (highest priority)
- Direct dependencies of those files
- Pattern examples showing established conventions
- Type definitions and interfaces
- Documentation (only if high density)
Warning Signs You're Over Budget:
- AI responses take 30+ seconds
- AI gives generic advice instead of codebase-specific
- AI hallucinates file names or APIs
- AI contradicts itself within same response
Quick Context Estimation:
# Bash one-liner: pipe file contents through a single wc so its per-file "total" lines can't double-count
find src -name "*.js" -exec cat {} + | wc -c | awk '{print int($1/4) " tokens (approx)"}'
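Where a shell isn't available (Windows, CI sandboxes), a Python equivalent using the same ~4-characters-per-token heuristic; the function name and default glob are illustrative:

```python
from pathlib import Path

def estimate_tokens(root: str, pattern: str = "*.js") -> int:
    """Rough token count: total bytes of matching files divided by 4."""
    total_bytes = sum(
        p.stat().st_size for p in Path(root).rglob(pattern) if p.is_file()
    )
    return total_bytes // 4

# e.g. estimate_tokens("src")  # approximate cost of loading all of src/
```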