Code Review Checklist for AI Output | Module 07

The Problem

AI generates code. It looks clean. Syntax is perfect. Names are reasonable. You accept it. Two weeks later: production crashes, security vulnerabilities, user data lost, 3am debugging session.

The issue: AI optimizes for "looks good," not "works correctly."

AI doesn't think about edge cases, race conditions, or security implications. It generates plausible code based on patterns, but plausible isn't the same as correct. You need a systematic review process to catch what AI misses.

The Core Insight

AI code requires more scrutiny than human code, not less.

When a senior developer writes code, you can usually assume they considered edge cases and security. When AI writes code, it is optimizing for the statistical likelihood of being "code-like." It has no mental model of your system's failure modes.

Think of AI output like code from an intern: potentially brilliant, definitely needs review. Your job is to be the senior engineer who catches the landmines before they explode.

The 5-Layer Review Process

Review AI code in layers, from surface to deep:

Layer 1: Does It Run?

Time: 30 seconds

Basic execution test:

Does it compile/parse?
Are imports correct?
Do functions exist?
Can you run the happy path?

Red flags:

Imports from non-existent libraries
Function calls to undefined methods
Syntax errors (rare, but they happen)
Type errors in strongly-typed languages
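
One quick way to catch the first red flag is to ask Node whether each module the generated code imports can actually be resolved. The sketch below is a minimal, hypothetical helper (findMissingModules is not part of any library) built on require.resolve, which throws when a module cannot be found.

```javascript
// Hypothetical helper: returns the module names that cannot be resolved
// from the current project. Uses only Node's built-in require.resolve.
function findMissingModules(moduleNames) {
    return moduleNames.filter((name) => {
        try {
            require.resolve(name);
            return false; // resolvable: installed or built in
        } catch (error) {
            return true; // not found: the AI may have invented it
        }
    });
}

// 'fs' and 'path' are Node built-ins; the third name is made up.
const missing = findMissingModules(['fs', 'path', 'left-pad-ultra-pro-9000']);
console.log(missing);
```

Anything the helper reports is worth checking against the package registry before you run an install command the AI suggested.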

Layer 2: Does It Follow Patterns?

Time: 2-3 minutes

Architectural consistency check:

Does it match existing file structure?
Does it follow naming conventions?
Does it use established patterns (error handling, logging)?
Does it integrate with existing modules correctly?

Red flags:

Reinvents utilities you already have
Different naming style than the rest of the codebase
Ignores existing error handling patterns
Bypasses established abstractions

Layer 3: Does It Handle Errors?

Time: 5 minutes

The critical layer, and the one AI most often fails:

Check	What to Look For	Common AI Failure
Null/undefined	Guards against null values	Assumes inputs are always valid
Network calls	Try/catch on async operations	Happy path only
Database ops	Transaction handling, rollbacks	No error recovery
User input	Validation and sanitization	Trusts all input
File operations	Checks file exists, handles read errors	Assumes success

Test with bad inputs:

# Ask yourself:
- What if this API returns 500?
- What if the user sends an empty string?
- What if the database connection drops mid-query?
- What if the file doesn't exist?
- What if two users do this simultaneously?
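
The questions above translate directly into a table of hostile inputs. As a rough sketch, the hypothetical normalizeEmail function below stands in for any AI-generated helper; the loop feeds it the kinds of values the happy path never sees.

```javascript
// Hypothetical function under review: returns a trimmed, lowercased email,
// or null when the input is unusable.
function normalizeEmail(input) {
    if (typeof input !== 'string') return null; // null, undefined, numbers, objects
    const trimmed = input.trim().toLowerCase();
    if (trimmed === '' || !trimmed.includes('@')) return null;
    return trimmed;
}

// Bad inputs a reviewer should try before accepting the code.
const badInputs = [null, undefined, '', '   ', 42, {}, [], 'no-at-sign'];

for (const input of badInputs) {
    let outcome;
    try {
        outcome = normalizeEmail(input);
    } catch (error) {
        outcome = `THREW ${error.name}`; // a throw here is a review finding
    }
    console.log(JSON.stringify(input), '->', outcome);
}
```

If any row prints THREW, the function has an unguarded path that will surface in production sooner or later.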

Layer 4: Is It Secure?

Time: 5-10 minutes

AI often generates insecure code because training data includes bad practices:

SQL Injection: Does it use parameterized queries or string concatenation?
XSS: Does it escape or sanitize user-supplied data before rendering it?
Authentication: Does it check permissions before operations?
Secrets: Are API keys/passwords hardcoded, or loaded from environment variables?
CSRF: Are state-changing operations protected?
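
To make the SQL injection item concrete, the sketch below builds the same query two ways without touching a database. The { text, values } shape mirrors what parameterized-query APIs such as node-postgres accept; the table and column names are made up for illustration.

```javascript
// UNSAFE: user input is spliced into the SQL text itself.
function buildQueryUnsafe(email) {
    return `SELECT id FROM users WHERE email = '${email}'`;
}

// SAFER: the SQL text is fixed; input travels separately as a parameter.
function buildQueryParameterized(email) {
    return { text: 'SELECT id FROM users WHERE email = $1', values: [email] };
}

const hostile = "x' OR '1'='1";

console.log(buildQueryUnsafe(hostile));
// SELECT id FROM users WHERE email = 'x' OR '1'='1'

console.log(buildQueryParameterized(hostile).text);
// SELECT id FROM users WHERE email = $1
```

In the unsafe version the attacker's quote characters become part of the SQL; in the parameterized version the query text never changes, whatever the input is.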

AI's Security Blindspot

AI generates "typical" code. Unfortunately, a lot of typical code on the internet is insecure. Always assume AI output needs security hardening.
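
As one small example of such hardening, the sketch below escapes user-supplied text before it is placed into HTML. It is a minimal illustration, not a replacement for a templating engine's built-in escaping; escapeHtml and HTML_ESCAPES are hypothetical names.

```javascript
const HTML_ESCAPES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
};

// Replace the five characters that can break out of HTML text or attributes.
function escapeHtml(value) {
    return String(value).replace(/[&<>"']/g, (char) => HTML_ESCAPES[char]);
}

const comment = '<img src=x onerror="alert(1)">';
console.log(`<p>${escapeHtml(comment)}</p>`);
// <p>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```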

Layer 5: Is It Maintainable?

Time: 5 minutes

Future-you will need to understand this:

Comments: Are complex sections explained?
Function length: Are functions under 50 lines?
Cyclomatic complexity: Too many nested ifs/loops?
Magic numbers: Are constants defined with names?
Testability: Can you unit test this easily?

The Walkthrough: Reviewing an API Endpoint

AI generated this login endpoint. Let's review:

// Generated by AI
app.post('/login', async (req, res) => {
    const { email, password } = req.body;
    const user = await User.findOne({ email });

    if (user.password === password) {
        const token = jwt.sign({ id: user.id }, 'secret123');
        res.json({ token });
    } else {
        res.status(401).json({ error: 'Invalid credentials' });
    }
});

Layer 1 (Does it run?): ✅ Syntax correct, imports assumed present.

Layer 2 (Patterns?): ⚠️ No input validation middleware. Other routes use it.

Layer 3 (Error handling?): ❌ Critical failures:

No try/catch on async operations
No check that the user exists (an unknown email makes user null, so user.password throws a TypeError)
No database error handling
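
The missing-user crash is easy to reproduce without a database. In the sketch below, a stub stands in for User.findOne; the assumption is that it resolves to null when no record matches, which is how Mongoose-style findOne calls behave.

```javascript
// Stub standing in for the real model: no user matches this email.
const User = {
    findOne: async () => null,
};

async function loginLikeTheOriginal(email, password) {
    const user = await User.findOne({ email });
    // Same comparison as the generated code: no null check first.
    return user.password === password;
}

loginLikeTheOriginal('nobody@example.com', 'hunter2').catch((error) => {
    console.log(error instanceof TypeError); // true
    console.log(error.message); // exact wording varies by Node version
});
```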

Layer 4 (Security?): ❌ Multiple vulnerabilities:

Plaintext password comparison (no hashing)
JWT secret hardcoded
No rate limiting (brute force possible)
Timing attack possible (different response times for valid vs invalid emails)

Layer 5 (Maintainable?): ⚠️ Short but missing comments on security requirements.

Verdict: ❌ Do not ship. Major rewrites needed.

Fixed version:

app.post('/login',
    rateLimit({ max: 5, window: '15m' }), // Prevent brute force
    validateInput(loginSchema), // Input validation
    async (req, res) => {
        try {
            const { email, password } = req.body;

            // Look up the user
            const user = await User.findOne({ email });

            // Always run bcrypt.compare, even for unknown emails, so response
            // time does not reveal whether an account exists.
            // DUMMY_HASH: a precomputed bcrypt hash, assumed to be defined elsewhere.
            const hash = user ? user.passwordHash : DUMMY_HASH;
            const isValid = await bcrypt.compare(password, hash);

            // Same response for unknown email and wrong password (prevents user enumeration)
            if (!user || !isValid) {
                return res.status(401).json({ error: 'Invalid credentials' });
            }

            // Generate token with environment secret
            const token = jwt.sign(
                { id: user.id },
                process.env.JWT_SECRET,
                { expiresIn: '24h' }
            );

            res.json({ token });
        } catch (error) {
            logger.error('Login error:', error);
            res.status(500).json({ error: 'Internal server error' });
        }
    }
);
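
One thing the fixed version takes on trust is that JWT_SECRET is actually set. A hedged sketch of a startup check follows; requireEnv is a hypothetical helper and the 32-character minimum is an assumed policy, not a rule from any library.

```javascript
// Hypothetical startup check: fail fast instead of discovering a missing
// or weak secret when the first login request arrives.
function requireEnv(name, env = process.env, minLength = 32) {
    const value = env[name];
    if (typeof value !== 'string' || value.length < minLength) {
        throw new Error(`${name} must be set to at least ${minLength} characters`);
    }
    return value;
}

// Usage at application startup (the env object is injectable for testing):
const fakeEnv = { JWT_SECRET: 'x'.repeat(48) };
const jwtSecret = requireEnv('JWT_SECRET', fakeEnv);
console.log(jwtSecret.length); // 48
```

Calling this once at boot means a misconfigured deployment refuses to start, which is usually easier to diagnose than a stream of failed logins.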

Common AI Code Smells

1. The Trusting Optimist

Smell: AI assumes every operation succeeds.

// AI often generates:
const data = await fetch(url);
const json = data.json(); // No status check, and the missing await leaves json as a Promise

// Should be:
const response = await fetch(url);
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const json = await response.json();
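
A slightly fuller sketch of the same idea wraps the checks in one helper so every call site gets them. fetchJson is a hypothetical name; the fetchImpl parameter exists only so the function can be exercised with a stub instead of a real network call.

```javascript
// Hypothetical helper: fetch a URL and parse JSON, failing loudly on
// non-2xx responses and on bodies that are not valid JSON.
async function fetchJson(url, fetchImpl = fetch) {
    const response = await fetchImpl(url);
    if (!response.ok) {
        throw new Error(`HTTP ${response.status} for ${url}`);
    }
    try {
        return await response.json();
    } catch (cause) {
        throw new Error(`Invalid JSON from ${url}`, { cause });
    }
}

// Stub that simulates a server error, so no network is needed.
const failingFetch = async () => ({ ok: false, status: 500 });

fetchJson('https://example.com/api', failingFetch).catch((error) => {
    console.log(error.message); // HTTP 500 for https://example.com/api
});
```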

2. The Magic String Lover

Smell: Hardcoded values everywhere.

// AI generates:
if (user.role === 'admin') { ... }

// Should be:
const ROLES = { ADMIN: 'admin', USER: 'user' };
if (user.role === ROLES.ADMIN) { ... }

3. The Copy-Paste Architect

Smell: Duplicated logic across functions.

AI generates similar code blocks instead of extracting shared functions. Watch for repeated patterns that should be abstracted.
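
As an illustration, the sketch below shows two near-identical validators of the kind AI tends to produce, followed by one shared helper. All function names are invented for the example.

```javascript
// Duplicated: the same trim/length logic appears twice with different labels.
function validateUsernameDup(value) {
    const trimmed = String(value ?? '').trim();
    if (trimmed.length < 3) return 'Username must be at least 3 characters';
    if (trimmed.length > 30) return 'Username must be at most 30 characters';
    return null;
}
function validateCityDup(value) {
    const trimmed = String(value ?? '').trim();
    if (trimmed.length < 2) return 'City must be at least 2 characters';
    if (trimmed.length > 50) return 'City must be at most 50 characters';
    return null;
}

// Extracted: one helper, configured per field.
function validateLength(label, value, min, max) {
    const trimmed = String(value ?? '').trim();
    if (trimmed.length < min) return `${label} must be at least ${min} characters`;
    if (trimmed.length > max) return `${label} must be at most ${max} characters`;
    return null; // null means "no error"
}

const validateUsername = (value) => validateLength('Username', value, 3, 30);
const validateCity = (value) => validateLength('City', value, 2, 50);

console.log(validateUsername('ab')); // Username must be at least 3 characters
console.log(validateCity('Oslo'));   // null
```

With the shared helper, a fix to the trimming or null handling lands in one place instead of in every copy.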

4. The Synchronous Blocker

Smell: Using sync operations in async contexts.

// AI sometimes generates:
const file = fs.readFileSync('huge-file.json'); // Blocks event loop

// Should be:
const file = await fs.promises.readFile('huge-file.json');
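
Switching to the async API is also a good moment to handle the missing-file case from Layer 3. A minimal sketch, assuming that a missing file should yield null instead of an exception (readJsonIfExists is a hypothetical name):

```javascript
const fs = require('fs');

// Reads and parses a JSON file without blocking the event loop.
// Returns null if the file does not exist; rethrows any other error.
async function readJsonIfExists(filePath) {
    try {
        const text = await fs.promises.readFile(filePath, 'utf8');
        return JSON.parse(text);
    } catch (error) {
        if (error.code === 'ENOENT') return null; // file not found
        throw error; // permission errors, invalid JSON, etc.
    }
}

// Assumes no file with this name exists in the working directory.
readJsonIfExists('./does-not-exist.json').then((result) => {
    console.log(result); // null
});
```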

Automated Checklist Tools

Use these tools to catch issues automatically:

ESLint/Pylint: Code style and common bugs
TypeScript: Type safety catches many AI errors
SonarQube: Security vulnerabilities and code smells
Snyk: Dependency vulnerabilities
Jest/Pytest: Run tests, check coverage
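
For ESLint, a starting point might look like the flat-config fragment below. The rule names are core ESLint rules; which ones to enable, and at what severity, is a judgment call for each project. Note that no-undef only works well once your environment's globals are declared (for example through languageOptions.globals), which this fragment leaves out.

```javascript
// eslint.config.js (flat config, CommonJS form)
module.exports = [
    {
        rules: {
            'no-undef': 'error',              // references to things that do not exist
            'no-unused-vars': 'warn',         // leftovers from generated code
            eqeqeq: 'error',                  // require === and !==
            'no-eval': 'error',               // block eval()
            'no-implied-eval': 'error',       // block setTimeout('code') style eval
            'require-atomic-updates': 'warn', // possible races around await
        },
    },
];
```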

But don't rely on tools alone. They catch known patterns (style violations, type errors, published vulnerabilities), not flaws in your application's logic.

The 10-Minute Rule

Spend at least 10 minutes reviewing AI code, regardless of how simple it looks; a full five-layer pass takes roughly 18 to 24 minutes. Most critical bugs hide in "obviously correct" code.

Quick Reference

Review Checklist (Print this):

[ ] Layer 1: Execution (30 sec)

Runs without errors?
Imports exist?
Happy path works?

[ ] Layer 2: Patterns (2-3 min)

Matches project structure?
Follows naming conventions?
Uses existing utilities?

[ ] Layer 3: Error Handling (5 min)

Null checks present?
Try/catch on async ops?
Input validation exists?
Graceful failure handling?

[ ] Layer 4: Security (5-10 min)

No SQL injection?
Input sanitized?
Auth checks present?
Secrets in environment?
CSRF protection?

[ ] Layer 5: Maintainability (5 min)

Comments on complex logic?
Functions under 50 lines?
No magic numbers?
Testable structure?

Fast Fail Signals:

Hardcoded secrets → Immediate rejection
No error handling → Must fix before accepting
SQL string concatenation → Security risk
Sync file operations → Performance issue
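
A few of these signals are mechanical enough to grep for. The sketch below is a deliberately crude, regex-based scan (hypothetical names throughout); it will miss many cases and flag some harmless ones, so treat a hit as a prompt to look closer, not as a verdict.

```javascript
// Crude heuristics only: each pattern approximates one fast-fail signal.
const FAST_FAIL_PATTERNS = [
    { signal: 'Hardcoded secret', pattern: /jwt\.sign\([^)]*,\s*['"][^'"]+['"]/ },
    { signal: 'Sync file operation', pattern: /\bfs\.\w+Sync\(/ },
    { signal: 'SQL string concatenation', pattern: /(SELECT|INSERT|UPDATE|DELETE)\b[^;]*(\$\{|['"]\s*\+)/i },
];

function scanForFastFailSignals(source) {
    return FAST_FAIL_PATTERNS
        .filter(({ pattern }) => pattern.test(source))
        .map(({ signal }) => signal);
}

const generated = `
const token = jwt.sign({ id: user.id }, 'secret123');
const rows = db.query("SELECT * FROM users WHERE email = '" + email + "'");
`;
console.log(scanForFastFailSignals(generated));
// [ 'Hardcoded secret', 'SQL string concatenation' ]
```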

Prompt for AI to review its own code:

"Review this code for:
1. Error handling gaps
2. Security vulnerabilities
3. Edge cases not handled
4. Performance issues
5. Maintainability concerns

Be critical. What could go wrong?"