Code Review Checklist for AI Output

Module 07: AI Development Workflows | Expansion Guide

The Problem

AI generates code. It looks clean. Syntax is perfect. Names are reasonable. You accept it. Two weeks later: production crashes, security vulnerabilities, user data lost, 3am debugging session.

The issue: AI code generation optimizes for "looks good," not "works correctly."

AI doesn't think about edge cases, race conditions, or security implications. It generates plausible code based on patterns, but plausible isn't the same as correct. You need a systematic review process to catch what AI misses.

The Core Insight

AI code requires more scrutiny than human code, not less.

When a senior developer writes code, you can usually assume they considered edge cases and security. When AI writes code, it optimizes for the statistical likelihood of being "code-like." It has no mental model of your system's failure modes.

Think of AI output like code from an intern: potentially brilliant, definitely needs review. Your job is to be the senior engineer who catches the landmines before they explode.

The 5-Layer Review Process

Review AI code in layers, from surface to deep:

Layer 1: Does It Run?

Time: 30 seconds

Basic execution test:

Red flags:

Layer 2: Does It Follow Patterns?

Time: 2-3 minutes

Architectural consistency check:

Red flags:

Layer 3: Does It Handle Errors?

Time: 5 minutes

The critical layer AI often fails:

Check           | What to Look For                        | Common AI Failure
----------------|-----------------------------------------|---------------------------------
Null/undefined  | Guards against null values              | Assumes inputs are always valid
Network calls   | Try/catch on async operations           | Happy path only
Database ops    | Transaction handling, rollbacks         | No error recovery
User input      | Validation and sanitization             | Trusts all input
File operations | Checks file exists, handles read errors | Assumes success

Test with bad inputs:

# Ask yourself:
- What if this API returns 500?
- What if the user sends an empty string?
- What if the database connection drops mid-query?
- What if the file doesn't exist?
- What if two users do this simultaneously?

Layer 4: Is It Secure?

Time: 5-10 minutes

AI often generates insecure code because training data includes bad practices:

AI's Security Blindspot

AI generates "typical" code. Unfortunately, a lot of typical code on the internet is insecure. Always assume AI output needs security hardening.
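
As one concrete instance of this kind of hardening, generated code frequently builds tokens from `Math.random()`, which is not cryptographically secure. A minimal sketch of the safer pattern using Node's built-in `crypto` module follows; the function name `generateResetToken` is made up for illustration.

```javascript
const { randomBytes } = require("node:crypto");

// Pattern often seen in generated code: predictable, not suitable for secrets.
// const token = Math.random().toString(36).slice(2);

// Safer: 32 bytes from the OS CSPRNG, hex-encoded to 64 characters.
function generateResetToken() {
  return randomBytes(32).toString("hex");
}

console.log(generateResetToken().length); // 64
```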

Layer 5: Is It Maintainable?

Time: 5 minutes

Future-you will need to understand this:

The Walkthrough: Reviewing an API Endpoint

AI generated this login endpoint. Let's review:

// Generated by AI
app.post('/login', async (req, res) => {
    const { email, password } = req.body;
    const user = await User.findOne({ email });

    if (user.password === password) {
        const token = jwt.sign({ id: user.id }, 'secret123');
        res.json({ token });
    } else {
        res.status(401).json({ error: 'Invalid credentials' });
    }
});

Layer 1 (Does it run?): ✅ Syntax correct, imports assumed present.

Layer 2 (Patterns?): ⚠️ No input validation middleware. Other routes use it.

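Layer 3 (Error handling?): ❌ Critical failures: no check that a user was found, so an unknown email makes user.password throw a TypeError; no try/catch around the database call, so any lookup error goes unhandled.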

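Layer 4 (Security?): ❌ Multiple vulnerabilities: passwords compared in plaintext rather than against a hash; JWT secret hardcoded as 'secret123'; token never expires; no rate limiting against brute force.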

Layer 5 (Maintainable?): ⚠️ Short but missing comments on security requirements.

Verdict: ❌ Do not ship. Major rewrites needed.
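
The Layer 3 failure can be reproduced without a database or a running server. The sketch below copies the handler's lookup-and-compare logic into a plain function and swaps in a stub for `User`; the stub and the `checkPassword` name are assumptions made for illustration only.

```javascript
// Stub standing in for the real User model (hypothetical). It simulates
// a lookup for an email that is not in the database.
const User = {
  findOne: async () => null,
};

// Same logic as the generated handler, minus Express and JWT.
async function checkPassword(email, password) {
  const user = await User.findOne({ email });
  return user.password === password; // throws when user is null
}

checkPassword("nobody@example.com", "hunter2").catch((err) => {
  console.log(err instanceof TypeError); // true
});
```

Any login attempt with an unregistered email takes this path, so the crash is reachable by every anonymous visitor.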

Fixed version:

app.post('/login',
    rateLimit({ windowMs: 15 * 60 * 1000, max: 5 }), // Prevent brute force (options assume express-rate-limit)
    validateInput(loginSchema), // Input validation
    async (req, res) => {
        try {
            const { email, password } = req.body;

            // Find user
            const user = await User.findOne({ email });
            if (!user) {
                // Same response for unknown email and wrong password (prevents user enumeration)
                return res.status(401).json({ error: 'Invalid credentials' });
            }

            // Compare hashed password
            const isValid = await bcrypt.compare(password, user.passwordHash);
            if (!isValid) {
                return res.status(401).json({ error: 'Invalid credentials' });
            }

            // Generate token with environment secret
            const token = jwt.sign(
                { id: user.id },
                process.env.JWT_SECRET,
                { expiresIn: '24h' }
            );

            res.json({ token });
        } catch (error) {
            logger.error('Login error:', error);
            res.status(500).json({ error: 'Internal server error' });
        }
    }
);
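
One gap remains in this version: if `JWT_SECRET` is unset, `process.env.JWT_SECRET` is `undefined` and the problem only surfaces when someone tries to log in. A possible mitigation, sketched here with a hypothetical `requireEnv` helper, is to validate required settings once at startup.

```javascript
// Hypothetical helper: read a required setting or fail loudly.
// `env` defaults to process.env and is injectable for testing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// At startup, before the server begins accepting requests:
// const JWT_SECRET = requireEnv("JWT_SECRET");
```

Failing at boot turns a misconfiguration into a deploy-time error rather than a stream of 500 responses.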

Common AI Code Smells

1. The Trusting Optimist

Smell: AI assumes every operation succeeds.

// AI often generates:
const data = await fetch(url);
const json = data.json(); // No status check, and the returned promise is never awaited

// Should be:
const response = await fetch(url);
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const json = await response.json();
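
The same checks can be wrapped once and reused. The sketch below is one possible shape, assuming a runtime with a global `fetch` (Node 18 or later, or a browser); `fetchJson` and the injectable `fetchImpl` parameter are illustrative names, not part of any library.

```javascript
// Fetch a URL and parse the body as JSON, failing loudly at each step.
// `fetchImpl` defaults to the global fetch and can be replaced in tests.
async function fetchJson(url, fetchImpl = fetch) {
  let response;
  try {
    response = await fetchImpl(url);
  } catch (cause) {
    // Network-level failure: DNS error, refused connection, and so on.
    throw new Error(`Request to ${url} failed`, { cause });
  }
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} from ${url}`);
  }
  try {
    return await response.json();
  } catch (cause) {
    // The server answered 2xx but the body was not valid JSON.
    throw new Error(`Invalid JSON from ${url}`, { cause });
  }
}
```

Passing the fetch implementation as a parameter keeps the failure paths testable with simple stubs, with no real network access.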

2. The Magic String Lover

Smell: Hardcoded values everywhere.

// AI generates:
if (user.role === 'admin') { ... }

// Should be:
const ROLES = { ADMIN: 'admin', USER: 'user' };
if (user.role === ROLES.ADMIN) { ... }
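
A slightly stricter variant, offered as a sketch, freezes the constant object and validates role strings against it. The `isKnownRole` helper is a hypothetical name introduced for this example.

```javascript
// Object.freeze prevents accidental reassignment such as ROLES.ADMIN = "user".
const ROLES = Object.freeze({ ADMIN: "admin", USER: "user" });

// Reject role strings that are not in the known set, e.g. a typo like "admn".
function isKnownRole(role) {
  return Object.values(ROLES).includes(role);
}

console.log(isKnownRole("admin")); // true
console.log(isKnownRole("admn"));  // false
```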

3. The Copy-Paste Architect

Smell: Duplicated logic across functions.

AI generates similar code blocks instead of extracting shared functions. Watch for repeated patterns that should be abstracted.
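
A small illustration of this smell and its fix, using made-up user functions: the same email normalization appears twice in the "before" version and once in the "after" version.

```javascript
// Before: the same normalization repeated in two places.
function createUserDuplicated(input) {
  const email = String(input.email).trim().toLowerCase();
  return { email, createdAt: Date.now() };
}
function updateUserDuplicated(user, input) {
  const email = String(input.email).trim().toLowerCase();
  return { ...user, email };
}

// After: one shared helper, so a future rule change happens in one place.
function normalizeEmail(raw) {
  return String(raw).trim().toLowerCase();
}
function createUser(input) {
  return { email: normalizeEmail(input.email), createdAt: Date.now() };
}
function updateUser(user, input) {
  return { ...user, email: normalizeEmail(input.email) };
}
```

When reviewing, searching the diff for near-identical expressions is often enough to spot candidates for extraction.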

4. The Synchronous Blocker

Smell: Using sync operations in async contexts.

// AI sometimes generates:
const file = fs.readFileSync('huge-file.json'); // Blocks event loop

// Should be:
const file = await fs.promises.readFile('huge-file.json');
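
Note that `readFile` without an encoding returns a `Buffer`, and the file still has to be parsed. A fuller sketch, assuming a hypothetical `readJsonFile` helper, reads the file asynchronously and reports read errors and parse errors separately.

```javascript
const { readFile } = require("node:fs/promises");

// Hypothetical helper: read and parse a JSON file without blocking the event loop.
async function readJsonFile(filePath) {
  let text;
  try {
    text = await readFile(filePath, "utf8");
  } catch (cause) {
    // Missing file, permission problem, and similar I/O errors.
    throw new Error(`Cannot read ${filePath}`, { cause });
  }
  try {
    return JSON.parse(text);
  } catch (cause) {
    throw new Error(`Invalid JSON in ${filePath}`, { cause });
  }
}
```

`JSON.parse` itself is still synchronous, so for very large files a streaming parser may be a better fit; that is outside the scope of this sketch.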

Automated Checklist Tools

Use these tools to catch issues automatically:

But don't rely on tools alone. They catch syntax, style, and known-pattern issues, not logic flaws specific to your system.

The 10-Minute Rule

Spend at least 10 minutes reviewing AI code, regardless of how simple it looks. Most critical bugs hide in "obviously correct" code.

Quick Reference

Review Checklist (Print this):

[ ] Layer 1: Execution (30 sec)

[ ] Layer 2: Patterns (2-3 min)

[ ] Layer 3: Error Handling (5 min)

[ ] Layer 4: Security (5-10 min)

[ ] Layer 5: Maintainability (5 min)

Fast Fail Signals:

Prompt for AI to review its own code:

"Review this code for:
1. Error handling gaps
2. Security vulnerabilities
3. Edge cases not handled
4. Performance issues
5. Maintainability concerns

Be critical. What could go wrong?"