The Problem
AI generates code. It looks clean. Syntax is perfect. Names are reasonable. You accept it. Two weeks later: production crashes, security vulnerabilities, user data lost, 3am debugging session.
The issue: AI-generated code optimizes for "looks good," not "works correctly."
AI doesn't think about edge cases, race conditions, or security implications. It generates plausible code based on patterns, but plausible isn't the same as correct. You need a systematic review process to catch what AI misses.
The Core Insight
AI code requires more scrutiny than human code, not less.
When a senior developer writes code, you can trust they considered edge cases and security. When AI writes code, it optimizes for the statistical likelihood of being "code-like." It has no mental model of your system's failure modes.
Think of AI output like code from an intern: potentially brilliant, definitely needs review. Your job is to be the senior engineer who catches the landmines before they explode.
The 5-Layer Review Process
Review AI code in layers, from surface to deep:
Layer 1: Does It Run?
Time: 30 seconds
Basic execution test:
- Does it compile/parse?
- Are imports correct?
- Do functions exist?
- Can you run the happy path?
Red flags:
- Imports from non-existent libraries
- Function calls to undefined methods
- Syntax errors (rare but happens)
- Type errors in strongly-typed languages
Layer 2: Does It Follow Patterns?
Time: 2-3 minutes
Architectural consistency check:
- Does it match existing file structure?
- Does it follow naming conventions?
- Does it use established patterns (error handling, logging)?
- Does it integrate with existing modules correctly?
Red flags:
- Reinvents utilities you already have
- Different naming style than rest of codebase
- Ignores existing error handling patterns
- Bypasses established abstractions
Layer 3: Does It Handle Errors?
Time: 5 minutes
The critical layer AI often fails:
| Check | What to Look For | Common AI Failure |
|---|---|---|
| Null/undefined | Guards against null values | Assumes inputs are always valid |
| Network calls | Try/catch on async operations | Happy path only |
| Database ops | Transaction handling, rollbacks | No error recovery |
| User input | Validation and sanitization | Trusts all input |
| File operations | Checks file exists, handles read errors | Assumes success |
Test with bad inputs:
Ask yourself:
- What if this API returns 500?
- What if the user sends an empty string?
- What if the database connection drops mid-query?
- What if the file doesn't exist?
- What if two users do this simultaneously?
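As a concrete sketch of these checks, here's a hedged example (the `fetchJsonSafely` helper and its injected `fetchFn` are hypothetical, not from any library) that gives each failure mode from the list an explicit, testable branch:

```javascript
// Hypothetical helper: wraps a fetch-like function so every failure
// mode from the checklist above has an explicit branch.
async function fetchJsonSafely(fetchFn, url) {
  // "What if the user sends an empty string?"
  if (typeof url !== 'string' || url.trim() === '') {
    throw new TypeError('url must be a non-empty string');
  }
  let response;
  try {
    // "What if the connection drops mid-request?"
    response = await fetchFn(url);
  } catch (err) {
    throw new Error(`Network failure for ${url}: ${err.message}`);
  }
  // "What if this API returns 500?"
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} from ${url}`);
  }
  return response.json();
}
```

Injecting `fetchFn` also makes the failure cases easy to simulate in tests with stubs, instead of needing a real flaky server.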
Layer 4: Is It Secure?
Time: 5-10 minutes
AI often generates insecure code because training data includes bad practices:
- SQL Injection: Does it use parameterized queries or string concatenation?
- XSS: Does it sanitize user input before rendering?
- Authentication: Does it check permissions before operations?
- Secrets: Are API keys and passwords hardcoded, or loaded from environment variables?
- CSRF: Are state-changing operations protected?
AI's Security Blindspot
AI generates "typical" code. Unfortunately, a lot of typical code on the internet is insecure. Always assume AI output needs security hardening.
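To make the SQL injection check concrete, here's a minimal sketch in the node-postgres style, where a query is a text/values pair (the `findUserQuery` helper is illustrative, not from any library):

```javascript
// Illustrative: build a parameterized query instead of concatenating strings.
// Vulnerable version AI often produces:
//   "SELECT * FROM users WHERE email = '" + email + "'"
function findUserQuery(email) {
  return {
    text: 'SELECT * FROM users WHERE email = $1', // placeholder, no interpolation
    values: [email], // the driver sends values separately from the SQL text
  };
}
```

Because the attacker-controlled value never becomes part of the SQL text, a payload like `'; DROP TABLE users;--` is treated as plain data, not executable SQL.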
Layer 5: Is It Maintainable?
Time: 5 minutes
Future-you will need to understand this:
- Comments: Are complex sections explained?
- Function length: Are functions under 50 lines?
- Cyclomatic complexity: Too many nested ifs/loops?
- Magic numbers: Are constants defined with names?
- Testability: Can you unit test this easily?
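A small before/after sketch of the last few checks (the names `MAX_LOGIN_ATTEMPTS`, `shouldLockAccount`, etc. are illustrative): named constants replace magic numbers, and the decision logic becomes a pure function you can unit test without any I/O:

```javascript
// Named constants instead of magic numbers scattered through the code
const MAX_LOGIN_ATTEMPTS = 5;
const LOCKOUT_MINUTES = 15;

// Pure function: no database, no clock access, trivially unit-testable
function shouldLockAccount(failedAttempts) {
  return failedAttempts >= MAX_LOGIN_ATTEMPTS;
}

// Time source passed in as a parameter, so tests don't depend on the real clock
function lockoutExpiry(now) {
  return new Date(now.getTime() + LOCKOUT_MINUTES * 60 * 1000);
}
```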
The Walkthrough: Reviewing an API Endpoint
AI generated this login endpoint. Let's review:
// Generated by AI
app.post('/login', async (req, res) => {
  const { email, password } = req.body;
  const user = await User.findOne({ email });
  if (user.password === password) {
    const token = jwt.sign({ id: user.id }, 'secret123');
    res.json({ token });
  } else {
    res.status(401).json({ error: 'Invalid credentials' });
  }
});
Layer 1 (Does it run?): ✅ Syntax correct, imports assumed present.
Layer 2 (Patterns?): ⚠️ No input validation middleware. Other routes use it.
Layer 3 (Error handling?): ❌ Critical failures:
- No try/catch on async operations
- No check if user exists (crashes on user.password)
- No database error handling
Layer 4 (Security?): ❌ Multiple vulnerabilities:
- Plaintext password comparison (no hashing)
- JWT secret hardcoded
- No rate limiting (brute force possible)
- Timing attack possible (different response times for valid vs invalid emails)
Layer 5 (Maintainable?): ⚠️ Short but missing comments on security requirements.
Verdict: ❌ Do not ship. Major rewrites needed.
Fixed version:
// Precomputed hash for the unknown-email path, so bcrypt.compare runs either
// way and response timing doesn't reveal whether an account exists
const DUMMY_HASH = bcrypt.hashSync('placeholder', 10);

app.post('/login',
  rateLimit({ windowMs: 15 * 60 * 1000, max: 5 }), // Prevent brute force
  validateInput(loginSchema), // Input validation
  async (req, res) => {
    try {
      const { email, password } = req.body;

      // Find user; fall back to the dummy hash when no account matches
      const user = await User.findOne({ email });
      const hash = user ? user.passwordHash : DUMMY_HASH;

      // Compare hashed password (runs in both cases, keeping timing uniform)
      const isValid = await bcrypt.compare(password, hash);
      if (!user || !isValid) {
        // Same response for invalid email and invalid password
        return res.status(401).json({ error: 'Invalid credentials' });
      }

      // Generate token with environment secret
      const token = jwt.sign(
        { id: user.id },
        process.env.JWT_SECRET,
        { expiresIn: '24h' }
      );
      res.json({ token });
    } catch (error) {
      logger.error('Login error:', error);
      res.status(500).json({ error: 'Internal server error' });
    }
  }
);
Common AI Code Smells
1. The Trusting Optimist
Smell: AI assumes every operation succeeds.
// AI often generates:
const data = await fetch(url);
const json = data.json(); // Missing await, no error check
// Should be:
const response = await fetch(url);
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const json = await response.json();
2. The Magic String Lover
Smell: Hardcoded values everywhere.
// AI generates:
if (user.role === 'admin') { ... }
// Should be:
const ROLES = { ADMIN: 'admin', USER: 'user' };
if (user.role === ROLES.ADMIN) { ... }
3. The Copy-Paste Architect
Smell: Duplicated logic across functions.
AI generates similar code blocks instead of extracting shared functions. Watch for repeated patterns that should be abstracted.
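For instance, AI will often emit one near-identical `if (!req.body.x) ...` guard per field. A hedged sketch of the extraction (the `missingFields` helper is hypothetical):

```javascript
// Instead of one copy-pasted guard per field:
//   if (!body.email) return error('email required');
//   if (!body.name)  return error('name required');
// ...extract the repeated pattern into a single shared helper:
function missingFields(body, required) {
  return required.filter(
    (field) => body[field] === undefined || body[field] === ''
  );
}
```

One helper means one place to fix the inevitable bug, instead of N copies drifting apart.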
4. The Synchronous Blocker
Smell: Using sync operations in async contexts.
// AI sometimes generates:
const file = fs.readFileSync('huge-file.json'); // Blocks event loop
// Should be:
const file = await fs.promises.readFile('huge-file.json');
Automated Checklist Tools
Use these tools to catch issues automatically:
- ESLint/Pylint: Code style and common bugs
- TypeScript: Type safety catches many AI errors
- SonarQube: Security vulnerabilities and code smells
- Snyk: Dependency vulnerabilities
- Jest/Pytest: Run tests, check coverage
But don't rely on tools alone. They catch syntax issues, not logical flaws.
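As one example, a minimal legacy-format `.eslintrc.js` sketch wiring up core ESLint rules that catch several of the smells above (the severity choices here are illustrative, not a recommended config):

```javascript
// .eslintrc.js — minimal sketch; all rule names are core ESLint rules
module.exports = {
  rules: {
    'no-magic-numbers': ['warn', { ignore: [0, 1] }], // flags hardcoded values
    'require-await': 'error', // async functions that never await
    'no-unused-vars': 'error',
    eqeqeq: 'error', // require === over ==
  },
};
```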
The 10-Minute Rule
Spend at least 10 minutes reviewing AI code, regardless of how simple it looks. Most critical bugs hide in "obviously correct" code.
Quick Reference
Review Checklist (Print this):
[ ] Layer 1: Execution (30 sec)
- Runs without errors?
- Imports exist?
- Happy path works?
[ ] Layer 2: Patterns (2 min)
- Matches project structure?
- Follows naming conventions?
- Uses existing utilities?
[ ] Layer 3: Error Handling (5 min)
- Null checks present?
- Try/catch on async ops?
- Input validation exists?
- Graceful failure handling?
[ ] Layer 4: Security (5-10 min)
- No SQL injection?
- Input sanitized?
- Auth checks present?
- Secrets in environment?
- CSRF protection?
[ ] Layer 5: Maintainability (5 min)
- Comments on complex logic?
- Functions under 50 lines?
- No magic numbers?
- Testable structure?
Fast Fail Signals:
- Hardcoded secrets → Immediate rejection
- No error handling → Must fix before accepting
- SQL string concatenation → Security risk
- Sync file operations → Performance issue
Prompt for AI to review its own code:
"Review this code for:
1. Error handling gaps
2. Security vulnerabilities
3. Edge cases not handled
4. Performance issues
5. Maintainability concerns
Be critical. What could go wrong?"