The Problem
You ask AI to fix a bug in checkout.py. Your tool dumps 50 files into context: the entire user module, payment processing, inventory, logging utilities, even the README. The AI drowns in irrelevance. It suggests changes to unrelated code. Or it hallucinates connections that don't exist.
More context isn't better context. You need signal, not noise.
The challenge: automatically identifying which files, functions, and dependencies actually matter for the task at hand.
The Core Insight
Relevance is task-dependent and graph-based, not filesystem-based.
A file's relevance isn't about proximity in the directory tree - it's about dependency connections, call graphs, and modification history. utils/logger.py might be critical if the bug is in logging. It's irrelevant if you're fixing a CSS layout issue.
The key insight: build a relevance graph based on code relationships, then traverse it from your task entry point.
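The traversal itself can be sketched as a bounded breadth-first search over an adjacency map. The graph construction is what the layers below are about; the `graph` contents and file names here are hypothetical:

```python
from collections import deque

def traverse_relevance(graph, entry, max_depth=2):
    """Breadth-first walk of a dependency graph, collecting every
    file within max_depth hops of the task's entry point."""
    seen = {entry: 0}  # file -> distance from entry point
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # don't expand past the relevance horizon
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    return seen

# Hypothetical graph: checkout imports payment and user; payment imports fraud
graph = {
    'checkout.py': {'payment.py', 'user.py'},
    'payment.py': {'fraud.py'},
    'fraud.py': {'ml_model.py'},
}
traverse_relevance(graph, 'checkout.py', max_depth=2)
# ml_model.py is 3 hops out and excluded
```

The `max_depth` cutoff is the filtering knob: two hops from the entry point is usually where relevance drops off sharply.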
The Walkthrough
Layer 1: Direct Dependency Analysis
Start with the simplest filter: what does this file import?
```python
import ast

def get_direct_dependencies(file_path):
    """Extract all imports from a Python file."""
    with open(file_path) as f:
        tree = ast.parse(f.read())

    dependencies = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                dependencies.add(alias.name)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                dependencies.add(node.module)
    return dependencies

# Usage
deps = get_direct_dependencies('checkout.py')
# Returns: {'payment', 'inventory', 'user', 'logging'}
```
This gives you the first ring of relevance. If you're debugging checkout.py, these are your primary suspects.
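The second ring comes from following those imports recursively. A minimal sketch; the `deps_of` callable stands in for `get_direct_dependencies` above, and `resolve` (mapping a module name to a file path, or `None` for third-party modules) is an assumption whose implementation is project-specific:

```python
def get_transitive_dependencies(entry, deps_of, resolve, max_depth=2):
    """Expand imports ring by ring, up to max_depth rings out.

    deps_of: callable returning a file's direct import module names
    resolve: callable mapping a module name to a file path in this
             repo, or None for third-party modules (project-specific)
    """
    seen = set()
    frontier = {entry}
    for _ in range(max_depth):
        ring = set()
        for path in frontier:
            for module in deps_of(path):
                dep = resolve(module)
                if dep and dep != entry and dep not in seen:
                    seen.add(dep)
                    ring.add(dep)
        frontier = ring  # next ring expands only the newly found files
    return seen
```

Capping at two rings keeps the candidate set small; anything further out is better caught by the git-history layer below.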
Layer 2: Call Graph Analysis
Dependencies tell you what's imported. Call graphs tell you what's actually used.
```python
def build_call_graph(file_path):
    """Map which functions call which."""
    with open(file_path) as f:
        tree = ast.parse(f.read())

    calls = {}
    # Walk each function's body separately so every call is attributed
    # to the right caller (a single flat ast.walk pass visits nodes
    # breadth-first and loses track of the enclosing function).
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        calls[func.name] = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                if isinstance(node.func, ast.Name):
                    calls[func.name].add(node.func.id)
                elif isinstance(node.func, ast.Attribute):
                    calls[func.name].add(node.func.attr)
    return calls

# Usage
graph = build_call_graph('checkout.py')
# {'process_order': {'validate_payment', 'update_inventory'},
#  'validate_payment': {'get_user_balance', 'log_transaction'}}
```
Now you know: fixing process_order requires context on validate_payment and update_inventory, but not send_email_receipt, which is never called.
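That "actually used" set is just reachability over the call graph. A short sketch, using the dictionary shape build_call_graph returns:

```python
def reachable_functions(call_graph, entry):
    """Collect every function transitively reachable from entry
    via depth-first traversal of the call graph."""
    seen = set()
    stack = [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, ()))  # unknown callees map to ()
    return seen - {entry}  # callees only, not the entry point itself

graph = {'process_order': {'validate_payment', 'update_inventory'},
         'validate_payment': {'get_user_balance', 'log_transaction'}}
reachable_functions(graph, 'process_order')
# the four callees above; send_email_receipt never appears
```

Any function outside this set can be dropped from context when the task targets `entry`.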
Layer 3: Git History Relevance
Files changed together are often related. Mine git history for patterns:
```python
import subprocess

def get_cochange_files(target_file, limit=10):
    """Find files frequently changed with target."""
    # Get the most recent commits that touched the target file
    # (list-form arguments avoid shell-injection issues with odd paths)
    out = subprocess.check_output(
        ['git', 'log', '--format=%H', '--follow', '--', target_file])
    commits = out.decode().strip().split('\n')[:limit]

    cochanges = {}
    for commit in commits:
        # Get all files in that commit
        out = subprocess.check_output(
            ['git', 'show', '--name-only', '--format=', commit])
        for f in out.decode().strip().split('\n'):
            if f and f != target_file:
                cochanges[f] = cochanges.get(f, 0) + 1

    # Sort by frequency
    return sorted(cochanges.items(), key=lambda x: x[1], reverse=True)

# Usage
related = get_cochange_files('checkout.py')
# [('payment.py', 8), ('inventory.py', 6), ('user.py', 3)]
```
Why Git History Matters
Static analysis misses runtime relationships and business logic connections. If checkout.py and fraud_detection.py are always changed together, that's signal even if there's no direct import.
Layer 4: Error Context Analysis
When debugging, the error itself tells you what's relevant:
```python
import re

def extract_relevant_from_traceback(traceback_text):
    """Parse a traceback to find the files and functions involved."""
    # Extract file paths from traceback frames
    files = re.findall(r'File "([^"]+)"', traceback_text)
    # Extract function names from the "in <name>" frame suffix
    functions = re.findall(r'in (\w+)', traceback_text)
    # Extract error type and message from the final line
    match = re.search(r'(\w+Error): (.+)$', traceback_text, re.MULTILINE)
    return {
        'files': list(set(files)),
        'functions': list(set(functions)),
        'error_type': match.group(1) if match else None,
        'error_msg': match.group(2) if match else None,
    }

# Usage
context = extract_relevant_from_traceback(error_trace)
# {'files': ['checkout.py', 'payment.py'],
#  'functions': ['process_order', 'charge_card'],
#  'error_type': 'ValueError',
#  'error_msg': 'Invalid card number'}
```
Combining Filters: The Relevance Scoring System
Each layer provides signals. Combine them into a relevance score:
```python
class RelevanceScorer:
    def __init__(self, target_file):
        self.target = target_file
        self.scores = {}

    def score_file(self, candidate_file):
        """Calculate relevance score for a file.

        Helpers like extract_from_call_graph, has_recent_error_involving,
        and was_modified_recently stand in for the layers built above.
        """
        score = 0

        # Direct dependency: +10 points
        if candidate_file in get_direct_dependencies(self.target):
            score += 10

        # In call graph: +8 points
        if candidate_file in extract_from_call_graph(self.target):
            score += 8

        # Co-changed in git: +1 per occurrence
        for f, count in get_cochange_files(self.target):
            if f == candidate_file:
                score += count

        # In error traceback: +15 points
        if has_recent_error_involving(candidate_file):
            score += 15

        # Recently modified: +5 points
        if was_modified_recently(candidate_file, days=7):
            score += 5

        self.scores[candidate_file] = score
        return score

    def get_top_n_relevant(self, all_files, n=10):
        """Return top N most relevant files."""
        for f in all_files:
            self.score_file(f)
        sorted_files = sorted(
            self.scores.items(), key=lambda x: x[1], reverse=True)
        return [f for f, score in sorted_files[:n]]
```
Real Example: Filtering a Debugging Session
Bug report: "Checkout fails for users with store credit."
Your codebase: 200 Python files, 85K total tokens.
| File | Relevance Score | Reason | Include? |
|---|---|---|---|
| checkout.py | 100 | Target file | YES |
| payment.py | 25 | Direct import + call graph + error trace | YES |
| store_credit.py | 18 | In error trace + co-changed | YES |
| user.py | 13 | Direct import + call graph | YES |
| inventory.py | 8 | Direct import only | YES |
| logging.py | 3 | Imported but not in call path | NO |
| email_sender.py | 1 | Co-changed once | NO |
| analytics.py | 0 | No connection | NO |
Result: 5 files (4.2K tokens) instead of 200 files (85K tokens). 95% reduction with no loss of debugging context.
Failure Patterns
1. Over-Filtering Critical Utilities
Symptom: AI can't solve the bug because a critical utility was scored too low.
Fix: Boost scores for files with high fan-out (imported by many others). They're infrastructure.
```python
def boost_infrastructure_files(scores):
    """Give extra points to heavily-imported files."""
    import_counts = count_importers_for_each_file()
    for file, count in import_counts.items():
        if count > 5:  # Imported by 5+ files
            scores[file] = scores.get(file, 0) + 5
    return scores
```
2. Including Too Much Test Code
Symptom: Context window filled with test files that don't help fix the bug.
Fix: Penalize test files unless explicitly debugging tests.
```python
def penalize_tests(file_path, base_score):
    if 'test_' in file_path or '/tests/' in file_path:
        return base_score * 0.3  # 70% penalty
    return base_score
```
3. Missing Dynamic Dependencies
Symptom: AI suggests changes that break runtime imports (plugins, dynamic loading).
Fix: Add runtime analysis or configuration-based dependency tracking.
```python
# Track dynamic imports that static analysis can't see
dynamic_deps = {
    'checkout.py': ['plugins/stripe.py', 'plugins/paypal.py']
}

def include_dynamic_deps(file_path, scores):
    if file_path in dynamic_deps:
        for dep in dynamic_deps[file_path]:
            scores[dep] = scores.get(dep, 0) + 7
    return scores
```
4. Ignoring Documentation
Symptom: AI violates API contracts because relevant docs were filtered out.
Fix: Include docstrings and README sections related to modified code.
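One lightweight way to do that is to pull just the docstrings out of relevant files, so API contracts survive filtering without paying for full file bodies. A sketch using the same `ast` module as the earlier layers:

```python
import ast

def extract_docstrings(source):
    """Collect module, class, and function docstrings from Python
    source, keyed by name, as compact contract context."""
    tree = ast.parse(source)
    docs = {}
    mod_doc = ast.get_docstring(tree)
    if mod_doc:
        docs['<module>'] = mod_doc
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:
                docs[node.name] = doc
    return docs
```

A file that scores too low for full inclusion can still contribute its docstring summary to the context at a fraction of the token cost.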
When to Skip Filtering
- Small codebases: Under 20 files? Just include everything.
- First-time exploration: You don't know what's relevant yet.
- Refactoring across modules: Cross-cutting changes need broad context.
- Architecture questions: "How does this system work?" needs the full picture.
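These exceptions are easy to encode as a gate in front of the scorer. The 20-file threshold comes from the list above; the task-type names are the ones used throughout this piece:

```python
def should_filter(all_files, task_type, min_files=20):
    """Decide whether relevance filtering is worth running at all."""
    if len(all_files) < min_files:
        return False  # small codebase: just include everything
    if task_type in ('explore', 'architecture'):
        return False  # broad questions need the full picture
    return True
```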
Automated Filter Pipeline
Put it all together in a reusable pipeline:
```python
class ContextFilter:
    def __init__(self, codebase_root):
        self.root = codebase_root

    def filter_for_task(self, target_file, task_type, max_files=10):
        """Filter the codebase to relevant files for a task.

        task_type: 'debug', 'feature', 'refactor', 'explore'
        """
        all_files = find_all_code_files(self.root)
        scorer = RelevanceScorer(target_file)

        # Task-specific scoring weights
        if task_type == 'debug':
            # Prioritize error traces and direct deps
            scorer.error_weight = 2.0
            scorer.dep_weight = 1.5
        elif task_type == 'feature':
            # Prioritize tests and related features
            scorer.test_weight = 1.2
            scorer.cochange_weight = 1.5
        elif task_type == 'refactor':
            # Prioritize call graph and dependents
            scorer.call_graph_weight = 2.0
            scorer.dependent_weight = 1.8

        # Score and filter
        relevant_files = scorer.get_top_n_relevant(all_files, max_files)
        return {
            'files': relevant_files,
            'scores': scorer.scores,
            'total_tokens': sum(count_tokens(f) for f in relevant_files),
        }
```
Quick Reference
Relevance Scoring Heuristics:
- +15 points: In error traceback
- +10 points: Direct dependency (import)
- +8 points: In call graph of target
- +5 points: Modified in last 7 days
- +1 per occurrence: Co-changed in git history
- -70% penalty: Test files (unless debugging tests)
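The heuristics above collapse into a small data table plus a one-line scorer, which makes the weights easy to tune per task type. The signal names here are illustrative:

```python
# Heuristic weights as data, mirroring the quick-reference list
WEIGHTS = {
    'in_error_trace': 15,
    'direct_dependency': 10,
    'in_call_graph': 8,
    'recently_modified': 5,
}

def quick_score(signals, cochange_count=0, is_test=False):
    """signals: set of WEIGHTS keys that apply to the candidate file."""
    score = sum(WEIGHTS[s] for s in signals) + cochange_count
    if is_test:
        score *= 0.3  # 70% test-file penalty
    return score

quick_score({'direct_dependency', 'in_call_graph'}, cochange_count=3)
# 10 + 8 + 3 = 21
```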
Filter Decision Tree:
```
if task == "debug":
    include: error_trace + direct_deps + call_graph
    max_files: 8-12
elif task == "feature":
    include: direct_deps + tests + related_features
    max_files: 15-20
elif task == "refactor":
    include: call_graph + dependents + tests
    max_files: 20-30
elif task == "explore":
    skip_filtering()  # Need broad context
```
Implementation Checklist:
- Build dependency graph (AST analysis)
- Build call graph (function usage tracking)
- Analyze git co-change history
- Parse error traces if available
- Score all files based on task type
- Select top N files by score
- Verify token budget not exceeded
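The final checklist item can be sketched as a greedy budget fill over the ranked list. Both `count_tokens` (e.g. backed by a tokenizer library) and the budget number are assumptions:

```python
def fit_to_budget(ranked_files, count_tokens, budget=8000):
    """Take files in relevance order until the token budget is hit.

    count_tokens: callable returning a file's token count (assumed
    to exist, e.g. wrapping a tokenizer such as tiktoken).
    """
    selected, used = [], 0
    for path in ranked_files:
        cost = count_tokens(path)
        if used + cost > budget:
            break  # lower-ranked files won't fit; stop here
        selected.append(path)
        used += cost
    return selected, used
```

Because the input is already sorted by relevance score, cutting from the tail drops the least relevant files first.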