A practical SOP for engineering with Cursor and Claude Code, distilled from leading legacy system migrations and building production AI products. Not a hype piece — these are the actual patterns I use daily.
In the past 18 months, I migrated 2 outsourced C# legacy systems to Python and shipped a production Text2SQL engine. Most of the heavy lifting was done with AI tools, not in spite of them.
This repo documents what actually works, what fails silently, and the patterns I wish I'd known on day 1.
It's not a tutorial. It's an honest field manual.
| Tool | Role | Why |
|---|---|---|
| Cursor | Primary editor | Multi-file edit, in-line context, agentic mode for short tasks |
| Claude Code | Standalone agent | Deeper investigations, multi-step refactors, terminal automation |
| Claude Opus / Sonnet | Heavy reasoning | Architecture, debugging, code review |
| GPT-4o | Code generation | Faster, cheaper for boilerplate |
| pytest + ruff + mypy | Validation layer | AI generates → tests catch hallucinations |
| .cursorrules / CLAUDE.md | Project memory | Consistent context across sessions |
1. Write the test first (you, not AI)
- Forces you to define the contract
- Gives AI a target to satisfy
2. Sketch the function signature + docstring
- "def text2sql(question: str, schema: dict) -> str"
3. Cursor → Cmd-K → "implement this function"
- Or Claude Code if multi-file
4. Run tests immediately
- Don't trust AI output without execution
5. Refine via prompt → re-run tests
- Iterate, don't rewrite from scratch
Key insight: AI is a fast pair-programmer, not an architect. You decide what to build; AI helps you build it faster.
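A minimal sketch of steps 1-2 of this workflow, assuming a hypothetical `text2sql` module; the schema format, test names, and error contract are illustrative, not lifted from the original projects:

```python
# tests/test_text2sql.py: written by you, before any AI involvement.
# The tests define the contract the AI-generated implementation must satisfy.
import pytest

from text2sql import text2sql  # hypothetical module / function

SCHEMA = {"users": {"id": "INTEGER", "name": "TEXT", "created_at": "TIMESTAMP"}}

def test_simple_count_query():
    sql = text2sql("How many users signed up?", SCHEMA)
    assert "count" in sql.lower()
    assert "users" in sql.lower()

def test_unknown_table_raises():
    with pytest.raises(ValueError):
        text2sql("List all invoices", SCHEMA)  # 'invoices' is not in the schema
```

```python
# src/text2sql.py: signature + docstring sketched by you; the body is what you
# hand to Cursor / Claude Code in step 3.
def text2sql(question: str, schema: dict) -> str:
    """Translate a natural-language question into a SQL query.

    Args:
        question: the user's question in plain English.
        schema: mapping of table name -> {column name: SQL type}.

    Returns:
        A SQL string that references only tables/columns present in `schema`.

    Raises:
        ValueError: if the question refers to tables not in `schema`.
    """
    ...
```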
This is where AI tools 10x your output. Here's the playbook from migrating C# → Python:
Phase 1: Reconnaissance (1 day)
├─ Boot original system
├─ Probe every endpoint via Swagger UI / curl
├─ Capture request/response samples to JSON fixtures
└─ Note observable behavior (don't read code yet)
Phase 2: AI-Assisted Code Reading (2-3 days)
├─ Feed controller code + fixtures to Cursor
├─ Prompt: "Summarize the business logic of this endpoint
based on the code AND these observed I/O samples"
├─ Cross-validate AI summary vs your manual reading of edge cases
└─ Output: functional spec doc per endpoint
Phase 3: Spec → Tests (2 days)
├─ Generate pytest tests from the spec doc
├─ Use captured fixtures as input/expected pairs
└─ All tests should fail (no impl yet)
Phase 4: TDD Implementation (1-2 weeks)
├─ Cursor implements function-by-function
├─ Tests catch hallucinations
├─ When ambiguous, refer back to original C# (single source of truth)
└─ Output: green tests + parallel-run validation
Phase 5: Parallel Validation (1-2 weeks)
├─ Mirror production traffic to both old and new
├─ diff outputs daily
└─ Investigate every diff before cutover
My actual numbers: 3 weeks total for 2 projects that nobody else wanted to touch. Without AI: estimated 2-3 months.
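A minimal sketch of how Phase 1's fixtures feed Phase 3's tests, assuming a FastAPI rewrite with an app factory; `create_app`, the fixture directory, and the fixture field names are placeholders:

```python
# tests/test_parity.py: replay captured legacy request/response pairs against the rewrite.
import json
from pathlib import Path

import pytest
from fastapi.testclient import TestClient

from src.main import create_app  # assumed app factory; adjust to your project

FIXTURE_DIR = Path(__file__).parent / "fixtures"  # one JSON file per captured call

def load_fixtures():
    for path in sorted(FIXTURE_DIR.glob("*.json")):
        case = json.loads(path.read_text())
        yield pytest.param(case, id=path.stem)

@pytest.mark.parametrize("case", load_fixtures())
def test_matches_legacy_output(case):
    client = TestClient(create_app())
    resp = client.request(case["method"], case["path"], json=case.get("body"))
    assert resp.status_code == case["expected_status"]
    assert resp.json() == case["expected_body"]  # exact parity with the old system
```

Each fixture is assumed to be a small JSON object ({"method", "path", "body", "expected_status", "expected_body"}), i.e. exactly what Phase 1's capture step writes out.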
Before pushing:
$ git diff main...HEAD | claude "Review this diff for:
1. Security risks (SQL injection, auth)
2. Edge cases not handled
3. Performance smells
4. Style consistency with rest of codebase"
AI catches ~30% of issues a human reviewer would, in 30 seconds. Not a replacement for human review — a first-pass filter.
1. Paste error log + relevant code into Claude Code
2. Prompt: "Hypothesize 3 causes ranked by likelihood, with how to verify each"
3. Verify hypothesis 1 → if wrong, move to 2
4. Don't ask AI for the fix until you've confirmed root cause
Why: AI loves to suggest fixes for the symptom. Diagnosis first, treatment second.
After feature complete:
$ claude "Generate API docs for the routes in src/api/text2sql.py.
Include: endpoint, params, response schema, example curl,
common errors. Match the style of docs/api/auth.md."
Saves 1-2 hours per feature. Always review — AI gets parameter types wrong ~10% of the time.
- 2 outsourced C# projects
- No documentation, no original developers reachable
- Production traffic, can't take down
- Estimated 2-3 months by traditional approach
Used the 5-phase playbook above. Total elapsed: 3 weeks.
- AI was best at "reading and summarizing", not at "writing equivalent code"
- Phases 1-2 (recon + summarize) saved the most time
- Phase 4 (rewrite) still needed substantial human effort
- Stored procedures were the hardest
- SQL logic mixed with business rules, hard for AI to disentangle
- Solved by asking AI to convert SP → pseudocode first, then human → Python
- Test fixtures were gold
- 40 captured request/response pairs caught 80% of hallucinations
- Investing 1 day in fixture capture saved a week of debugging
What I'd do differently:
- Capture more diverse fixtures (we missed some edge cases)
- Run parallel validation longer (we cut over after 2 weeks; should have been 4)
- Set up shadow logging earlier
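A minimal sketch of the shadow logging idea from the last point: replay (or mirror) the same call to both systems and log any divergence before cutover. The URLs and the normalization step are placeholders:

```python
# shadow_diff.py: send the same request to old and new services, log differences.
import json
import logging

import requests

OLD_BASE = "http://legacy-internal:5000"   # placeholder URLs
NEW_BASE = "http://rewrite-internal:8000"

log = logging.getLogger("shadow")

def normalize(payload: dict) -> dict:
    """Drop fields that legitimately differ between systems (timestamps, request IDs)."""
    return {k: v for k, v in payload.items() if k not in {"timestamp", "request_id"}}

def shadow_call(method: str, path: str, body: dict | None = None) -> None:
    old = requests.request(method, OLD_BASE + path, json=body, timeout=10)
    new = requests.request(method, NEW_BASE + path, json=body, timeout=10)
    if old.status_code != new.status_code:
        log.warning("status diff %s %s: %s vs %s", method, path, old.status_code, new.status_code)
        return
    old_body, new_body = normalize(old.json()), normalize(new.json())
    if old_body != new_body:
        log.warning("body diff %s %s:\n%s\nvs\n%s", method, path,
                    json.dumps(old_body, indent=2), json.dumps(new_body, indent=2))
```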
"Walk me through how data flows from [endpoint X] to [database Y].
List every transformation, validation, and side effect in order.
Flag anything that looks fragile."
"Given this error trace and these 3 candidate files,
hypothesize the top 3 root causes ranked by likelihood.
For each, tell me what to check to confirm or rule it out."
"Refactor this function to:
1. [Specific goal: e.g., reduce nested conditionals]
2. Preserve exact behavior (current tests must pass)
3. Match the style of [reference file]"
"Generate pytest tests for this function covering:
- Happy path
- Boundary inputs (empty, max, min)
- Invalid inputs
- The edge case I described in the docstring"
"Write API docs for these routes matching the style of [reference].
For each endpoint: summary, params (with types), response schema,
example curl, common 4xx errors."
Symptom: Asking AI to write features without writing tests first. Result: Hallucinated APIs, silent failures, regressions you find in production. Fix: Test-first or test-immediately-after. No exceptions for "small" changes.
Symptom: AI confidently writes auth logic that has subtle bugs. Result: Security holes that pass review because the code "looks right". Fix: For auth/crypto/SQL — AI drafts, human re-derives from first principles.
Symptom: One 2000-word prompt with 15 requirements. Result: AI satisfies 11/15, you don't notice the 4 it skipped. Fix: Decompose. Each prompt → one outcome → verify → next prompt.
Symptom: AI imports pandas.read_excel_with_formatting() (doesn't exist).
Result: Code that looks plausible, fails at import.
Fix: Validate every import / API call against actual library docs. Run before trusting.
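One cheap way to enforce that fix is a smoke test that fails at `pytest` time on hallucinated modules or attributes; a sketch, with example symbols only:

```python
# tests/test_smoke_imports.py: fail fast if generated code references things that don't exist.
import importlib

import pytest

# (module, attribute) pairs the codebase relies on (examples only; list yours here).
REQUIRED = [
    ("pandas", "read_excel"),   # exists
    ("fastapi", "FastAPI"),     # exists
    # ("pandas", "read_excel_with_formatting"),  # would fail: hallucinated API
]

@pytest.mark.parametrize("module,attr", REQUIRED)
def test_symbol_exists(module, attr):
    mod = importlib.import_module(module)
    assert hasattr(mod, attr), f"{module}.{attr} does not exist"
```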
Symptom: "Let me clean this up" → submitting a 500-line diff with no tests. Result: Subtle behavior changes nobody notices until production breaks. Fix: Tests first. If tests don't exist, write them before refactoring.
Symptom: AI suggests patterns that conflict with your project's conventions.
Result: Inconsistent codebase, reviewer fatigue.
Fix: Use .cursorrules / CLAUDE.md to encode project conventions. Reference style files in prompts.
- AI amplifies your taste. The better you are at recognizing good code, the more value AI gives. Conversely, AI gives juniors confidence without competence.
- Speed comes from removing your bottlenecks, not from typing faster. AI helps you read code faster (huge), debug faster (medium), write boilerplate faster (small). Optimize for the first.
- The best prompts include constraints, not just goals. "Implement X" is weak. "Implement X using only the imports in this file, returning Y type, raising Z on invalid input" is strong.
- Always run before trusting. Plausible ≠ correct. If you can't run it (e.g., no test infra), AI isn't ready to help on that task.
- Multi-file context matters. Cursor's tabbed file context and Claude Code's project loading both significantly outperform pasting snippets into a chat.
- Stack-specific knowledge is uneven. AI is great at Python/JS/TS, decent at C#/Go, weak at Rust traits and complex SQL. Adjust your trust by language.
- AI is not a knowledge base. It hallucinates package versions, API signatures, and historical facts. Use docs as ground truth.
- Pair AI with linters and type checkers. ruff/mypy/eslint catch ~50% of AI hallucinations automatically. Don't skip them (a minimal gate script follows this list).
- Document AI usage in commit messages. Future-you (or your team) wants to know which parts were AI-generated for review depth.
- The meta-skill is prompt iteration, not prompt engineering. Don't perfect a prompt in your head — send a draft, see what comes back, refine. Iteration speed > prompt quality.
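The gate mentioned in the linters item above, as a small runner script; the commands are assumptions, adapt them to your stack:

```python
# scripts/validate.py: run the full validation layer after every AI-assisted change.
import subprocess
import sys

# Order matters: cheap static checks first, tests last.
COMMANDS = [
    ["ruff", "check", "src", "tests"],
    ["mypy", "src"],
    ["pytest", "tests/", "-q"],
]

def main() -> int:
    for cmd in COMMANDS:
        print(f"$ {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # stop at the first failure
    return 0

if __name__ == "__main__":
    sys.exit(main())
```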
Cursor:
- Enable Auto-import for the language
- Configure Cmd-K context to include current file + open tabs
- Add `.cursorrules` to project root

Claude Code:
- Install via `npm install -g @anthropic-ai/claude-code`
- Run `claude` in project root → it reads `CLAUDE.md` for context
- Enable `--dangerously-skip-permissions` only in throwaway environments
Example `.cursorrules`:

# Project: [Name]
# Stack: Python 3.11, FastAPI, PostgreSQL, Pydantic v2
## Conventions
- Use type hints on all function signatures
- Prefer Pydantic models over dicts for API contracts
- Async functions for I/O, sync for pure logic
- Test file structure mirrors src/ structure
## Style
- Match existing imports order: stdlib → third-party → local
- Docstrings: Google style, only for public functions
- No nested ternaries
- Max line length 100
## Don'ts
- Don't add print() — use logger
- Don't catch bare except
- Don't import *
- Don't suggest libraries not already in pyproject.toml
Example `CLAUDE.md`:

# Project Context
## What this project does
[1-2 sentences]
## Key files
- src/api/: HTTP routes (FastAPI)
- src/services/: Business logic
- src/repos/: DB access (SQLAlchemy)
- tests/: Mirror of src structure
## Running tests
pytest tests/ -v
## Common workflows
- Add new endpoint: route in src/api/, service in src/services/, test in tests/api/
- Database migration: alembic revision --autogenerate -m "..."
## Things AI commonly gets wrong here
- Don't use `from sqlalchemy import declarative_base` (deprecated, use `from sqlalchemy.orm import DeclarativeBase`)
- Don't suggest pydantic v1 syntax (we're on v2)
- Don't import from src/api/ in src/services/ (circular)

The best engineers I know aren't the ones who avoid AI tools — they're the ones who use them aggressively while keeping their judgment sharp.
If you're early in adopting these tools: don't try to "let AI do it all". Use it for reading, summarizing, boilerplate, and first drafts. Keep architecture, security, and final review in human hands.
The market is already separating people who AI-amplify into 5x engineers from people who AI-rely into juniors-with-a-co-pilot. Pick the first path.
Alex (Li-Feng Lin) — Senior Backend Engineer, AI & Cloud · Taipei, Taiwan
- Currently leading Navi 2.0 (.NET → Python / Cloud-Native) at Gaia Information
- 5+ years backend, recent focus on LLM integration (Text2SQL, RAG, multi-cloud)
- AZ-204, AZ-900 certified
- 📧 ko1314520ya@gmail.com
- 🔗 github.com/D11225687
"The best engineers don't just adapt to change — they lead it."
If this was useful, ⭐ the repo. PRs welcome — especially counter-examples where my advice fails.