Context Budgeting for Large Projects

Module 06: Context Window Mastery | Expansion Guide

The Problem

You're working on a 50-file backend service. You ask Claude to add a new API endpoint. It generates code that breaks existing patterns, ignores your authentication middleware, and reinvents utilities you already have.

Why? The AI never saw those files. Even with a 1M-token window, loading your entire codebase dilutes attention into noise — the agent gets fewer useful signals, not more. You're flying blind, and so is the AI.

The rookie mistake: dumping every file into context and hitting limits. The expert move: strategic budgeting that gives AI exactly what it needs.

The Core Insight

Context windows are RAM, not hard drives.

Just like you don't load your entire filesystem into memory, you shouldn't load your entire codebase into context. You need a paging strategy - what's hot (actively needed), what's warm (might be referenced), what's cold (archive until needed).

Most developers treat context as infinite until it isn't. Smart developers treat it as the constraint it is from day one.

The Walkthrough

Step 1: Know Your Budget

Different AI tools have different limits:

| Tool | Context Window | Practical Limit |
|------|----------------|-----------------|
| Claude Sonnet 4.6 / Opus 4.6 | 1M tokens | ~700k before attention decay bites |
| Claude Haiku 4.5 | 200k tokens | ~150k usable |
| GPT-5 / Gemini 2.5 Pro | 1M+ tokens | ~500k usable before quality degrades |

The 2026 reality: raw token ceilings aren't the bottleneck anymore — attention decay is. Loading 900k of "context" gets you a model that retrieves less reliably than 150k of focused context. Budget for signal, not ceiling.

Rule of thumb: 1 token ≈ 4 characters in most codebases (more in minified JS, less in verbose configs).

# Quick estimate (bash; `shopt -s globstar` is needed for ** to recurse)
shopt -s globstar
wc -c src/**/*.py | tail -1   # last line is the total character count
# Divide by 4 to get a rough token count

Step 2: Categorize Your Files

Not all files are created equal for context. Build a mental model using the RAM analogy:

Hot - files you will modify in this task. Always load.
Warm - direct dependencies, data models, pattern examples. Load on demand.
Cold - legacy code, tests, fixtures, generated files. Leave on disk until needed.
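The hot/warm/cold paging model can be sketched in a few lines of Python. The file paths and tier assignments below are illustrative, and the token estimate reuses the 4-characters-per-token rule of thumb from Step 1:

```python
# Sketch: tag files by paging tier for the current task.
# File names here are illustrative, not from any real project.
TIERS = {
    "hot":  {"src/routes/users.js", "src/middleware/auth.js"},  # will be modified
    "warm": {"src/models/User.js", "API_STYLE_GUIDE.md"},       # may be referenced
    "cold": {"src/legacy/old-auth.js"},                         # archive until needed
}

def tier_of(path):
    """Return the paging tier for a file; unknown files default to cold."""
    for tier, files in TIERS.items():
        if path in files:
            return tier
    return "cold"

def estimate_tokens(text):
    """Rough token count: ~4 characters per token."""
    return len(text) // 4
```

Defaulting unknown files to cold matches the pruning bias this guide recommends: nothing gets loaded unless it earns a hot or warm slot.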

Step 3: Build a Context Loading Strategy

Three patterns for managing what gets loaded:

Pattern 1: Explicit Minimal Context (Best for focused tasks)

# Example: Adding a new API route
Context to load:
1. src/routes/_template.js (pattern example)
2. src/middleware/auth.js (will be referenced)
3. src/models/User.js (data structure)
4. API_STYLE_GUIDE.md (conventions)

Skip: All other routes, tests, config

Pattern 2: Layered Context Discovery (Best for exploratory work)

# Start minimal, expand as needed
Layer 1: "Where does authentication happen in this codebase?"
  → AI discovers auth.js, session.js
Layer 2: "@auth.js @session.js Add rate limiting to login"
  → Now AI has targeted context
Layer 3: If AI asks about config, load config.js
  → Just-in-time context expansion

Pattern 3: Architecture-First Context (Best for large refactors)

# Front-load the mental model
1. PROJECT_STRUCTURE.md
2. src/index.js (entry point)
3. src/core/*.js (base abstractions)
Then: Specific files for the task

This gives AI the "map" before the "territory"

The 70/30 Rule

Spend 70% of context on files that will be modified, 30% on files that provide understanding. If you're spending more than 30% on reference material, you're diluting the working set.
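As a sketch, the 70/30 split for a given budget comes down to integer math (the 150k figure below is just an example budget, not a recommendation):

```python
# Sketch: split a context budget per the 70/30 rule.
def split_budget(total_tokens, working_pct=70):
    """Return (working-set tokens, reference-material tokens)."""
    working = total_tokens * working_pct // 100  # integer math avoids float drift
    return working, total_tokens - working

working, reference = split_budget(150_000)
# 105,000 tokens for files being modified, 45,000 for reference material
```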

Step 4: Prune Aggressively

Signs you're wasting context budget:

  1. Responses get slower and more generic as the conversation grows
  2. The AI re-asks about files or decisions you already shared
  3. You're pasting whole files when only a few lines are relevant

Pruning tactics:

  1. Show excerpts, not full files: "Here's the authentication pattern (lines 45-80 from auth.js)" instead of the whole file
  2. Summarize then reference: "We use JWT in cookies. See auth.js for implementation" instead of pasting auth.js
  3. Use type definitions over implementations: Share the interface, not the 500-line class
  4. Start new sessions: Context from earlier in the conversation you no longer need? Start fresh.
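Tactic 1 is easy to automate. A minimal sketch of an excerpt helper (the label and line range are caller-supplied and purely illustrative):

```python
# Sketch: share an excerpt instead of a whole file (pruning tactic 1).
def excerpt(source, start, end, label=""):
    """Return lines start..end (1-indexed, inclusive), optionally labeled."""
    lines = source.splitlines()[start - 1:end]
    header = f"{label} (lines {start}-{end}):\n" if label else ""
    return header + "\n".join(lines)
```

Pasting `excerpt(auth_source, 45, 80, label="auth.js")` into a prompt gives the AI the pattern without the other 400 lines.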

Failure Patterns

1. The Kitchen Sink

Symptom: You @-mention 15 files "just in case." AI responses are slow and generic.

Fix: Start with 3-5 files maximum. Add more only when AI explicitly needs them.

2. The Context Hoard

Symptom: You keep entire conversation history even though you moved to a different task 10 messages ago.

Fix: Start new chat sessions when switching tasks. Old context is dead weight.

3. The Underload

Symptom: AI invents patterns instead of following yours because it never saw the examples.

Fix: Always include at least one "pattern example" file showing the established approach.

4. The Dilution

Symptom: You include tests, mocks, fixtures - now AI is confused about what's real.

Fix: Test files are context vampires. Only include them when directly working on tests.

The README Trap

Including a massive README.md "for context" often backfires. READMEs are written for humans discovering the project, not for an AI with limited working memory. Better: write an AI_CONTEXT.md that's dense, technical, and pattern-focused.

Real-World Example: Building a Feature

Task: Add GitHub OAuth to an existing Express app with 80 files.

Rookie Approach (Context Overflow)

@src/  # Tries to load everything
"Add GitHub OAuth login"
Result: AI writes boilerplate that ignores existing auth patterns

Expert Approach (Budgeted Context)

# Step 1: Discovery (minimal context)
"Show me the current authentication implementation"
AI finds: src/auth/local-strategy.js

# Step 2: Pattern extraction (targeted load)
@src/auth/local-strategy.js
"I want to add GitHub OAuth following the same pattern"

# Step 3: Implementation (working set only)
@src/auth/github-strategy.js (new)
@src/auth/passport-config.js (existing)
@src/routes/auth.js (existing)
"Implement GitHub OAuth strategy"

# Step 4: Integration (add necessary config)
@config/auth.js
"Add GitHub OAuth config"

Total context: 4 files + conversation. Fits easily in 30k tokens.
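A quick sanity check before loading a working set: estimate its token cost and compare it to the budget. A sketch reusing the chars/4 heuristic (the 30k default budget is an assumption, matching the example above):

```python
# Sketch: check whether a working set fits a token budget.
def fits_budget(char_counts, budget_tokens=30_000):
    """char_counts: character sizes of the files to load.
    Returns (fits?, estimated token total)."""
    total_tokens = sum(char_counts) // 4  # ~4 chars per token
    return total_tokens <= budget_tokens, total_tokens
```

Run it over `os.path.getsize(path)` for each file in the working set before @-mentioning them.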

Advanced Techniques

Technique 1: Context Prefetching

For large features, create a "Context Index" file:

// CONTEXT_INDEX.md
# Payment Feature Context Map

Core Files:
- src/payments/processor.js (Stripe integration)
- src/payments/models.js (data models)

Patterns to Follow:
- Error handling: See src/core/errors.js lines 20-45
- API responses: See src/api/base-controller.js

Do NOT reference:
- Legacy payment code in src/legacy/ (deprecated)
- Test mocks in __mocks__/

Load this index first, then specific files as needed.

Technique 2: Differential Context Loading

For refactoring, show the delta not the whole:

"I'm refactoring authentication. Here's the OLD approach:
[paste 20 lines from old auth.js]

Here's the NEW pattern I want:
[paste example from new-auth.js]

Now update all files in src/routes/ to use new pattern"

AI gets before/after context without loading every route file.

Quick Reference

Context Budget Rules:

  1. Follow the 70/30 rule: ~70% for files being modified, ~30% for reference material
  2. Start with 3-5 files maximum; expand just-in-time
  3. Start a new session whenever you switch tasks

Priority Loading Order:

  1. Files you're modifying (highest priority)
  2. Direct dependencies of those files
  3. Pattern examples showing established conventions
  4. Type definitions and interfaces
  5. Documentation (only if high density)
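The priority order above can be expressed as a simple sort. A sketch, where the category labels are shorthand for the five tiers listed (the file names are hypothetical):

```python
# Sketch: order files by the loading priority above.
PRIORITY = ["modified", "dependency", "pattern", "types", "docs"]

def load_order(files):
    """Sort (path, category) pairs so highest-priority files load first."""
    return sorted(files, key=lambda f: PRIORITY.index(f[1]))

order = load_order([
    ("README.md", "docs"),
    ("src/auth.js", "modified"),
    ("types.d.ts", "types"),
])
```

When the budget runs out, you drop files from the tail of this list, never the head.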

Warning Signs You're Over Budget:

  1. Responses turn slow and generic
  2. The AI forgets files or instructions from earlier in the conversation
  3. The AI reinvents utilities or ignores conventions it was already shown

Quick Context Estimation:

# Bash one-liner (skip the "total" lines wc prints per batch,
# which would otherwise be double-counted)
find src -name "*.js" -exec wc -c {} + | awk '$2 != "total" {sum+=$1} END {print sum/4 " tokens (approx)"}'