The Problem
You ask AI to fix a bug in checkout.py. Your tool dumps 50 files into context: the entire user module, payment processing, inventory, logging utilities, even the README. The AI drowns in irrelevance. It suggests changes to unrelated code. Or it hallucinates connections that don't exist.
More context isn't better context. You need signal, not noise.
The challenge: automatically identifying which files, functions, and dependencies actually matter for the task at hand.
The Core Insight
Relevance is task-dependent and graph-based, not filesystem-based.
A file's relevance isn't about proximity in the directory tree - it's about dependency connections, call graphs, and modification history. utils/logger.py might be critical if the bug is in logging. It's irrelevant if you're fixing a CSS layout issue.
The key insight: build a relevance graph based on code relationships, then traverse it from your task entry point.
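The traversal itself can be sketched as a bounded breadth-first search over an adjacency map. The graph construction is what the layers below are about; the `graph` contents and file names here are hypothetical:

```python
from collections import deque

def traverse_relevance(graph, entry, max_depth=2):
    """Breadth-first walk of a dependency graph, collecting every
    file within max_depth hops of the task's entry point."""
    seen = {entry: 0}  # file -> distance from entry point
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # don't expand past the relevance horizon
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    return seen

# Hypothetical graph: checkout imports payment and user; payment imports fraud
graph = {
    'checkout.py': {'payment.py', 'user.py'},
    'payment.py': {'fraud.py'},
    'fraud.py': {'ml_model.py'},
}
traverse_relevance(graph, 'checkout.py', max_depth=2)
# ml_model.py is 3 hops out and excluded
```

The `max_depth` cutoff is the filtering knob: two hops from the entry point is usually where relevance drops off sharply.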
The Walkthrough
Layer 1: Direct Dependency Analysis
Start with the simplest filter: what does this file import?
```python
import ast

def get_direct_dependencies(file_path):
    """Extract all imports from a Python file."""
    with open(file_path) as f:
        tree = ast.parse(f.read())

    dependencies = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                dependencies.add(alias.name)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                dependencies.add(node.module)
    return dependencies

# Usage
deps = get_direct_dependencies('checkout.py')
# Returns: {'payment', 'inventory', 'user', 'logging'}
```
This gives you the first ring of relevance. If you're debugging checkout.py, these are your primary suspects.
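The second ring comes from following those imports recursively. A minimal sketch; the `deps_of` callable stands in for `get_direct_dependencies` above, and `resolve` (mapping a module name to a file path, or `None` for third-party modules) is an assumption whose implementation is project-specific:

```python
def get_transitive_dependencies(entry, deps_of, resolve, max_depth=2):
    """Expand imports ring by ring, up to max_depth rings out.

    deps_of: callable returning a file's direct import module names
    resolve: callable mapping a module name to a file path in this
             repo, or None for third-party modules (project-specific)
    """
    seen = set()
    frontier = {entry}
    for _ in range(max_depth):
        ring = set()
        for path in frontier:
            for module in deps_of(path):
                dep = resolve(module)
                if dep and dep != entry and dep not in seen:
                    seen.add(dep)
                    ring.add(dep)
        frontier = ring  # next ring expands only the newly found files
    return seen
```

Capping at two rings keeps the candidate set small; anything further out is better caught by the git-history layer below.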
Layer 2: Call Graph Analysis
Dependencies tell you what's imported. Call graphs tell you what's actually used.
```python
def build_call_graph(file_path):
    """Map which functions call which."""
    with open(file_path) as f:
        tree = ast.parse(f.read())

    calls = {}
    # Walk each function's body separately so every call is attributed
    # to the right caller (a single flat ast.walk pass visits nodes
    # breadth-first and loses track of the enclosing function).
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        calls[func.name] = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                if isinstance(node.func, ast.Name):
                    calls[func.name].add(node.func.id)
                elif isinstance(node.func, ast.Attribute):
                    calls[func.name].add(node.func.attr)
    return calls

# Usage
graph = build_call_graph('checkout.py')
# {'process_order': {'validate_payment', 'update_inventory'},
#  'validate_payment': {'get_user_balance', 'log_transaction'}}
```
Now you know: fixing process_order requires context on validate_payment and update_inventory, but not send_email_receipt, which is never called.
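That "actually used" set is just reachability over the call graph. A short sketch, using the dictionary shape build_call_graph returns:

```python
def reachable_functions(call_graph, entry):
    """Collect every function transitively reachable from entry
    via depth-first traversal of the call graph."""
    seen = set()
    stack = [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, ()))  # unknown callees map to ()
    return seen - {entry}  # callees only, not the entry point itself

graph = {'process_order': {'validate_payment', 'update_inventory'},
         'validate_payment': {'get_user_balance', 'log_transaction'}}
reachable_functions(graph, 'process_order')
# the four callees above; send_email_receipt never appears
```

Any function outside this set can be dropped from context when the task targets `entry`.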
Layer 3: Git History Relevance
Files changed together are often related. Mine git history for patterns:
```python
import subprocess

def get_cochange_files(target_file, limit=10):
    """Find files frequently changed with target."""
    # Get the most recent commits that touched the target file
    # (list-form arguments avoid shell-injection issues with odd paths)
    out = subprocess.check_output(
        ['git', 'log', '--format=%H', '--follow', '--', target_file])
    commits = out.decode().strip().split('\n')[:limit]

    cochanges = {}
    for commit in commits:
        # Get all files in that commit
        out = subprocess.check_output(
            ['git', 'show', '--name-only', '--format=', commit])
        for f in out.decode().strip().split('\n'):
            if f and f != target_file:
                cochanges[f] = cochanges.get(f, 0) + 1

    # Sort by frequency
    return sorted(cochanges.items(), key=lambda x: x[1], reverse=True)

# Usage
related = get_cochange_files('checkout.py')
# [('payment.py', 8), ('inventory.py', 6), ('user.py', 3)]
```
Why Git History Matters
Static analysis misses runtime relationships and business logic connections. If checkout.py and fraud_detection.py are always changed together, that's signal even if there's no direct import.
Layer 4: Error Context Analysis
When debugging, the error itself tells you what's relevant:
```python
import re

def extract_relevant_from_traceback(traceback_text):
    """Parse a traceback to find the files and functions involved."""
    # Extract file paths from traceback frames
    files = re.findall(r'File "([^"]+)"', traceback_text)
    # Extract function names from the "in <name>" frame suffix
    functions = re.findall(r'in (\w+)', traceback_text)
    # Extract error type and message from the final line
    match = re.search(r'(\w+Error): (.+)$', traceback_text, re.MULTILINE)
    return {
        'files': list(set(files)),
        'functions': list(set(functions)),
        'error_type': match.group(1) if match else None,
        'error_msg': match.group(2) if match else None,
    }

# Usage
context = extract_relevant_from_traceback(error_trace)
# {'files': ['checkout.py', 'payment.py'],
#  'functions': ['process_order', 'charge_card'],
#  'error_type': 'ValueError',
#  'error_msg': 'Invalid card number'}
```
Combining Filters: The Relevance Scoring System
Each layer provides signals. Combine them into a relevance score:
```python
class RelevanceScorer:
    def __init__(self, target_file):
        self.target = target_file
        self.scores = {}

    def score_file(self, candidate_file):
        """Calculate relevance score for a file.

        Helpers like extract_from_call_graph, has_recent_error_involving,
        and was_modified_recently stand in for the layers built above.
        """
        score = 0

        # Direct dependency: +10 points
        if candidate_file in get_direct_dependencies(self.target):
            score += 10

        # In call graph: +8 points
        if candidate_file in extract_from_call_graph(self.target):
            score += 8

        # Co-changed in git: +1 per occurrence
        for f, count in get_cochange_files(self.target):
            if f == candidate_file:
                score += count

        # In error traceback: +15 points
        if has_recent_error_involving(candidate_file):
            score += 15

        # Recently modified: +5 points
        if was_modified_recently(candidate_file, days=7):
            score += 5

        self.scores[candidate_file] = score
        return score

    def get_top_n_relevant(self, all_files, n=10):
        """Return top N most relevant files."""
        for f in all_files:
            self.score_file(f)
        sorted_files = sorted(
            self.scores.items(), key=lambda x: x[1], reverse=True)
        return [f for f, score in sorted_files[:n]]
```
Real Example: Filtering a Debugging Session
Bug report: "Checkout fails for users with store credit."
Your codebase: 200 Python files, 85K total tokens.
| File | Relevance Score | Reason | Include? |
|---|---|---|---|
| checkout.py | 100 | Target file | YES |
| payment.py | 25 | Direct import + call graph + error trace | YES |
| store_credit.py | 18 | In error trace + co-changed | YES |
| user.py | 13 | Direct import + call graph | YES |
| inventory.py | 8 | Direct import only | YES |
| logging.py | 3 | Imported but not in call path | NO |
| email_sender.py | 1 | Co-changed once | NO |
| analytics.py | 0 | No connection | NO |
Result: 5 files (4.2K tokens) instead of 200 files (85K tokens). 95% reduction with no loss of debugging context.
Failure Patterns
1. Over-Filtering Critical Utilities
Symptom: AI can't solve the bug because a critical utility was scored too low.
Fix: Boost scores for files with high fan-out (imported by many others). They're infrastructure.
```python
def boost_infrastructure_files(scores):
    """Give extra points to heavily-imported files."""
    import_counts = count_importers_for_each_file()
    for file, count in import_counts.items():
        if count > 5:  # Imported by 5+ files
            scores[file] = scores.get(file, 0) + 5
    return scores
```
2. Including Too Much Test Code
Symptom: Context window filled with test files that don't help fix the bug.
Fix: Penalize test files unless explicitly debugging tests.
```python
def penalize_tests(file_path, base_score):
    if 'test_' in file_path or '/tests/' in file_path:
        return base_score * 0.3  # 70% penalty
    return base_score
```
3. Missing Dynamic Dependencies
Symptom: AI suggests changes that break runtime imports (plugins, dynamic loading).
Fix: Add runtime analysis or configuration-based dependency tracking.
```python
# Track dynamic imports that static analysis can't see
dynamic_deps = {
    'checkout.py': ['plugins/stripe.py', 'plugins/paypal.py']
}

def include_dynamic_deps(file_path, scores):
    if file_path in dynamic_deps:
        for dep in dynamic_deps[file_path]:
            scores[dep] = scores.get(dep, 0) + 7
    return scores
```
4. Ignoring Documentation
Symptom: AI violates API contracts because relevant docs were filtered out.
Fix: Include docstrings and README sections related to modified code.
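One lightweight way to do that is to pull just the docstrings out of relevant files, so API contracts survive filtering without paying for full file bodies. A sketch using the same `ast` module as the earlier layers:

```python
import ast

def extract_docstrings(source):
    """Collect module, class, and function docstrings from Python
    source, keyed by name, as compact contract context."""
    tree = ast.parse(source)
    docs = {}
    mod_doc = ast.get_docstring(tree)
    if mod_doc:
        docs['<module>'] = mod_doc
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:
                docs[node.name] = doc
    return docs
```

A file that scores too low for full inclusion can still contribute its docstring summary to the context at a fraction of the token cost.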
When to Skip Filtering
- Small codebases: Under 20 files? Just include everything.
- First-time exploration: You don't know what's relevant yet.
- Refactoring across modules: Cross-cutting changes need broad context.
- Architecture questions: "How does this system work?" needs the full picture.
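These exceptions are easy to encode as a gate in front of the scorer. The 20-file threshold comes from the list above; the task-type names are the ones used throughout this piece:

```python
def should_filter(all_files, task_type, min_files=20):
    """Decide whether relevance filtering is worth running at all."""
    if len(all_files) < min_files:
        return False  # small codebase: just include everything
    if task_type in ('explore', 'architecture'):
        return False  # broad questions need the full picture
    return True
```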
Automated Filter Pipeline
Put it all together in a reusable pipeline:
```python
class ContextFilter:
    def __init__(self, codebase_root):
        self.root = codebase_root

    def filter_for_task(self, target_file, task_type, max_files=10):
        """Filter the codebase to relevant files for a task.

        task_type: 'debug', 'feature', 'refactor', 'explore'
        """
        all_files = find_all_code_files(self.root)
        scorer = RelevanceScorer(target_file)

        # Task-specific scoring weights
        if task_type == 'debug':
            # Prioritize error traces and direct deps
            scorer.error_weight = 2.0
            scorer.dep_weight = 1.5
        elif task_type == 'feature':
            # Prioritize tests and related features
            scorer.test_weight = 1.2
            scorer.cochange_weight = 1.5
        elif task_type == 'refactor':
            # Prioritize call graph and dependents
            scorer.call_graph_weight = 2.0
            scorer.dependent_weight = 1.8

        # Score and filter
        relevant_files = scorer.get_top_n_relevant(all_files, max_files)
        return {
            'files': relevant_files,
            'scores': scorer.scores,
            'total_tokens': sum(count_tokens(f) for f in relevant_files),
        }
```
Quick Reference
Relevance Scoring Heuristics:
- +15 points: In error traceback
- +10 points: Direct dependency (import)
- +8 points: In call graph of target
- +5 points: Modified in last 7 days
- +1 per occurrence: Co-changed in git history
- -70% penalty: Test files (unless debugging tests)
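The heuristics above collapse into a small data table plus a one-line scorer, which makes the weights easy to tune per task type. The signal names here are illustrative:

```python
# Heuristic weights as data, mirroring the quick-reference list
WEIGHTS = {
    'in_error_trace': 15,
    'direct_dependency': 10,
    'in_call_graph': 8,
    'recently_modified': 5,
}

def quick_score(signals, cochange_count=0, is_test=False):
    """signals: set of WEIGHTS keys that apply to the candidate file."""
    score = sum(WEIGHTS[s] for s in signals) + cochange_count
    if is_test:
        score *= 0.3  # 70% test-file penalty
    return score

quick_score({'direct_dependency', 'in_call_graph'}, cochange_count=3)
# 10 + 8 + 3 = 21
```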
Filter Decision Tree:
```
if task == "debug":
    include: error_trace + direct_deps + call_graph
    max_files: 8-12
elif task == "feature":
    include: direct_deps + tests + related_features
    max_files: 15-20
elif task == "refactor":
    include: call_graph + dependents + tests
    max_files: 20-30
elif task == "explore":
    skip_filtering()  # Need broad context
```
Implementation Checklist:
- Build dependency graph (AST analysis)
- Build call graph (function usage tracking)
- Analyze git co-change history
- Parse error traces if available
- Score all files based on task type
- Select top N files by score
- Verify token budget not exceeded
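The final checklist item can be sketched as a greedy budget fill over the ranked list. Both `count_tokens` (e.g. backed by a tokenizer library) and the budget number are assumptions:

```python
def fit_to_budget(ranked_files, count_tokens, budget=8000):
    """Take files in relevance order until the token budget is hit.

    count_tokens: callable returning a file's token count (assumed
    to exist, e.g. wrapping a tokenizer such as tiktoken).
    """
    selected, used = [], 0
    for path in ranked_files:
        cost = count_tokens(path)
        if used + cost > budget:
            break  # lower-ranked files won't fit; stop here
        selected.append(path)
        used += cost
    return selected, used
```

Because the input is already sorted by relevance score, cutting from the tail drops the least relevant files first.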