A practical SOP for engineering with Cursor and Claude Code, distilled from leading legacy system migrations and building production AI products. Not a hype piece — these are the actual patterns I use daily.
In the past 18 months, I migrated 2 outsourced C# legacy systems to Python and shipped a production Text2SQL engine. Most of the heavy lifting was done with AI tools, not in spite of them.
This repo documents what actually works, what fails silently, and the patterns I wish I'd known on day 1.
It's not a tutorial. It's an honest field manual.
| Tool | Role | Why |
|---|---|---|
| Cursor | Primary editor | Multi-file edit, in-line context, agentic mode for short tasks |
| Claude Code | Standalone agent | Deeper investigations, multi-step refactors, terminal automation |
| Claude Opus / Sonnet | Heavy reasoning | Architecture, debugging, code review |
| GPT-4o | Code generation | Faster, cheaper for boilerplate |
| pytest + ruff + mypy | Validation layer | AI generates → tests catch hallucinations |
| .cursorrules / CLAUDE.md | Project memory | Consistent context across sessions |
1. Write the test first (you, not AI)
- Forces you to define the contract
- Gives AI a target to satisfy
2. Sketch the function signature + docstring
- "def text2sql(question: str, schema: dict) -> str"
3. Cursor → Cmd-K → "implement this function"
- Or Claude Code if multi-file
4. Run tests immediately
- Don't trust AI output without execution
5. Refine via prompt → re-run tests
- Iterate, don't rewrite from scratch
Key insight: AI is a fast pair-programmer, not an architect. You decide what to build; AI helps you build it faster.
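A minimal sketch of steps 1-2 of this workflow, assuming a hypothetical `text2sql` module; the schema format, test names, and error contract are illustrative, not lifted from the original projects:

```python
# tests/test_text2sql.py: written by you, before any AI involvement.
# The tests define the contract the AI-generated implementation must satisfy.
import pytest

from text2sql import text2sql  # hypothetical module / function

SCHEMA = {"users": {"id": "INTEGER", "name": "TEXT", "created_at": "TIMESTAMP"}}

def test_simple_count_query():
    sql = text2sql("How many users signed up?", SCHEMA)
    assert "count" in sql.lower()
    assert "users" in sql.lower()

def test_unknown_table_raises():
    with pytest.raises(ValueError):
        text2sql("List all invoices", SCHEMA)  # 'invoices' is not in the schema
```

```python
# src/text2sql.py: signature + docstring sketched by you; the body is what you
# hand to Cursor / Claude Code in step 3.
def text2sql(question: str, schema: dict) -> str:
    """Translate a natural-language question into a SQL query.

    Args:
        question: the user's question in plain English.
        schema: mapping of table name -> {column name: SQL type}.

    Returns:
        A SQL string that references only tables/columns present in `schema`.

    Raises:
        ValueError: if the question refers to tables not in `schema`.
    """
    ...
```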
This is where AI tools 10x your output. Here's the playbook from migrating C# → Python:
Phase 1: Reconnaissance (1 day)
├─ Boot original system
├─ Probe every endpoint via Swagger UI / curl
├─ Capture request/response samples to JSON fixtures
└─ Note observable behavior (don't read code yet)
Phase 2: AI-Assisted Code Reading (2-3 days)
├─ Feed controller code + fixtures to Cursor
├─ Prompt: "Summarize the business logic of this endpoint
based on the code AND these observed I/O samples"
├─ Cross-validate AI summary vs your manual reading of edge cases
└─ Output: functional spec doc per endpoint
Phase 3: Spec → Tests (2 days)
├─ Generate pytest tests from the spec doc
├─ Use captured fixtures as input/expected pairs
└─ All tests should fail (no impl yet)
Phase 4: TDD Implementation (1-2 weeks)
├─ Cursor implements function-by-function
├─ Tests catch hallucinations
├─ When ambiguous, refer back to original C# (single source of truth)
└─ Output: green tests + parallel-run validation
Phase 5: Parallel Validation (1-2 weeks)
├─ Mirror production traffic to both old and new
├─ diff outputs daily
└─ Investigate every diff before cutover
My actual numbers: 3 weeks total for 2 projects that nobody else wanted to touch. Without AI: estimated 2-3 months.
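A minimal sketch of how Phase 1's fixtures feed Phase 3's tests, assuming a FastAPI rewrite with an app factory; `create_app`, the fixture directory, and the fixture field names are placeholders:

```python
# tests/test_parity.py: replay captured legacy request/response pairs against the rewrite.
import json
from pathlib import Path

import pytest
from fastapi.testclient import TestClient

from src.main import create_app  # assumed app factory; adjust to your project

FIXTURE_DIR = Path(__file__).parent / "fixtures"  # one JSON file per captured call

def load_fixtures():
    for path in sorted(FIXTURE_DIR.glob("*.json")):
        case = json.loads(path.read_text())
        yield pytest.param(case, id=path.stem)

@pytest.mark.parametrize("case", load_fixtures())
def test_matches_legacy_output(case):
    client = TestClient(create_app())
    resp = client.request(case["method"], case["path"], json=case.get("body"))
    assert resp.status_code == case["expected_status"]
    assert resp.json() == case["expected_body"]  # exact parity with the old system
```

Each fixture is assumed to be a small JSON object ({"method", "path", "body", "expected_status", "expected_body"}), i.e. exactly what Phase 1's capture step writes out.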
Before pushing:
$ git diff main...HEAD | claude "Review this diff for:
1. Security risks (SQL injection, auth)
2. Edge cases not handled
3. Performance smells
4. Style consistency with rest of codebase"
AI catches ~30% of issues a human reviewer would, in 30 seconds. Not a replacement for human review — a first-pass filter.
1. Paste error log + relevant code into Claude Code
2. Prompt: "Hypothesize 3 causes ranked by likelihood, with how to verify each"
3. Verify hypothesis 1 → if wrong, move to 2
4. Don't ask AI for the fix until you've confirmed root cause
Why: AI loves to suggest fixes for the symptom. Diagnosis first, treatment second.
After feature complete:
$ claude "Generate API docs for the routes in src/api/text2sql.py.
Include: endpoint, params, response schema, example curl,
common errors. Match the style of docs/api/auth.md."
Saves 1-2 hours per feature. Always review — AI gets parameter types wrong ~10% of the time.
- 2 outsourced C# projects
- No documentation, no original developers reachable
- Production traffic, can't take down
- Estimated 2-3 months by traditional approach
Used the 5-phase playbook above. Total elapsed: 3 weeks.
- AI was best at "reading and summarizing", not at "writing equivalent code"
- Phases 1-2 (recon + summarize) saved the most time
- Phase 4 (rewrite) still needed substantial human effort
- Stored procedures were the hardest
- SQL logic mixed with business rules, hard for AI to disentangle
- Solved by asking AI to convert SP → pseudocode first, then human → Python
- Test fixtures were gold
- 40 captured request/response pairs caught 80% of hallucinations
- Investing 1 day in fixture capture saved a week of debugging
What I'd do differently:
- Capture more diverse fixtures (we missed some edge cases)
- Run parallel validation longer (we cut over after 2 weeks; should have been 4)
- Set up shadow logging earlier
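A minimal sketch of the shadow logging idea from the last point: replay (or mirror) the same call to both systems and log any divergence before cutover. The URLs and the normalization step are placeholders:

```python
# shadow_diff.py: send the same request to old and new services, log differences.
import json
import logging

import requests

OLD_BASE = "http://legacy-internal:5000"   # placeholder URLs
NEW_BASE = "http://rewrite-internal:8000"

log = logging.getLogger("shadow")

def normalize(payload: dict) -> dict:
    """Drop fields that legitimately differ between systems (timestamps, request IDs)."""
    return {k: v for k, v in payload.items() if k not in {"timestamp", "request_id"}}

def shadow_call(method: str, path: str, body: dict | None = None) -> None:
    old = requests.request(method, OLD_BASE + path, json=body, timeout=10)
    new = requests.request(method, NEW_BASE + path, json=body, timeout=10)
    if old.status_code != new.status_code:
        log.warning("status diff %s %s: %s vs %s", method, path, old.status_code, new.status_code)
        return
    old_body, new_body = normalize(old.json()), normalize(new.json())
    if old_body != new_body:
        log.warning("body diff %s %s:\n%s\nvs\n%s", method, path,
                    json.dumps(old_body, indent=2), json.dumps(new_body, indent=2))
```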
"Walk me through how data flows from [endpoint X] to [database Y].
List every transformation, validation, and side effect in order.
Flag anything that looks fragile."
"Given this error trace and these 3 candidate files,
hypothesize the top 3 root causes ranked by likelihood.
For each, tell me what to check to confirm or rule it out."
"Refactor this function to:
1. [Specific goal: e.g., reduce nested conditionals]
2. Preserve exact behavior (current tests must pass)
3. Match the style of [reference file]"
"Generate pytest tests for this function covering:
- Happy path
- Boundary inputs (empty, max, min)
- Invalid inputs
- The edge case I described in the docstring"
"Write API docs for these routes matching the style of [reference].
For each endpoint: summary, params (with types), response schema,
example curl, common 4xx errors."
Symptom: Asking AI to write features without writing tests first. Result: Hallucinated APIs, silent failures, regressions you find in production. Fix: Test-first or test-immediately-after. No exceptions for "small" changes.
Symptom: AI confidently writes auth logic that has subtle bugs. Result: Security holes that pass review because the code "looks right". Fix: For auth/crypto/SQL — AI drafts, human re-derives from first principles.
Symptom: One 2000-word prompt with 15 requirements. Result: AI satisfies 11/15, you don't notice the 4 it skipped. Fix: Decompose. Each prompt → one outcome → verify → next prompt.
Symptom: AI imports pandas.read_excel_with_formatting() (doesn't exist).
Result: Code that looks plausible, fails at import.
Fix: Validate every import / API call against actual library docs. Run before trusting.
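One cheap way to enforce that fix is a smoke test that fails at `pytest` time on hallucinated modules or attributes; a sketch, with example symbols only:

```python
# tests/test_smoke_imports.py: fail fast if generated code references things that don't exist.
import importlib

import pytest

# (module, attribute) pairs the codebase relies on (examples only; list yours here).
REQUIRED = [
    ("pandas", "read_excel"),   # exists
    ("fastapi", "FastAPI"),     # exists
    # ("pandas", "read_excel_with_formatting"),  # would fail: hallucinated API
]

@pytest.mark.parametrize("module,attr", REQUIRED)
def test_symbol_exists(module, attr):
    mod = importlib.import_module(module)
    assert hasattr(mod, attr), f"{module}.{attr} does not exist"
```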
Symptom: "Let me clean this up" → submitting a 500-line diff with no tests. Result: Subtle behavior changes nobody notices until production breaks. Fix: Tests first. If tests don't exist, write them before refactoring.
Symptom: AI suggests patterns that conflict with your project's conventions.
Result: Inconsistent codebase, reviewer fatigue.
Fix: Use .cursorrules / CLAUDE.md to encode project conventions. Reference style files in prompts.
- AI amplifies your taste. The better you are at recognizing good code, the more value AI gives. Conversely, AI gives juniors confidence without competence.
- Speed comes from removing your bottlenecks, not from typing faster. AI helps you read code faster (huge), debug faster (medium), write boilerplate faster (small). Optimize for the first.
- The best prompts include constraints, not just goals. "Implement X" is weak. "Implement X using only the imports in this file, returning Y type, raising Z on invalid input" is strong.
- Always run before trusting. Plausible ≠ correct. If you can't run it (e.g., no test infra), AI isn't ready to help on that task.
- Multi-file context matters. Cursor's tabbed file context and Claude Code's project loading both significantly outperform pasting snippets into a chat.
- Stack-specific knowledge is uneven. AI is great at Python/JS/TS, decent at C#/Go, weak at Rust traits and complex SQL. Adjust your trust by language.
- AI is not a knowledge base. It hallucinates package versions, API signatures, and historical facts. Use docs as ground truth.
- Pair AI with linters and type checkers. ruff/mypy/eslint catch ~50% of AI hallucinations automatically. Don't skip them (a minimal gate script follows this list).
- Document AI usage in commit messages. Future-you (or your team) wants to know which parts were AI-generated for review depth.
- The meta-skill is prompt iteration, not prompt engineering. Don't perfect a prompt in your head — send a draft, see what comes back, refine. Iteration speed > prompt quality.
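The gate mentioned in the linters item above, as a small runner script; the commands are assumptions, adapt them to your stack:

```python
# scripts/validate.py: run the full validation layer after every AI-assisted change.
import subprocess
import sys

# Order matters: cheap static checks first, tests last.
COMMANDS = [
    ["ruff", "check", "src", "tests"],
    ["mypy", "src"],
    ["pytest", "tests/", "-q"],
]

def main() -> int:
    for cmd in COMMANDS:
        print(f"$ {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # stop at the first failure
    return 0

if __name__ == "__main__":
    sys.exit(main())
```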
Cursor:
- Enable Auto-import for the language
- Configure Cmd-K context to include current file + open tabs
- Add `.cursorrules` to project root

Claude Code:
- Install via `npm install -g @anthropic-ai/claude-code`
- Run `claude` in project root → it reads `CLAUDE.md` for context
- Enable `--dangerously-skip-permissions` only in throwaway environments
Example `.cursorrules`:

# Project: [Name]
# Stack: Python 3.11, FastAPI, PostgreSQL, Pydantic v2
## Conventions
- Use type hints on all function signatures
- Prefer Pydantic models over dicts for API contracts
- Async functions for I/O, sync for pure logic
- Test file structure mirrors src/ structure
## Style
- Match existing imports order: stdlib → third-party → local
- Docstrings: Google style, only for public functions
- No nested ternaries
- Max line length 100
## Don'ts
- Don't add print() — use logger
- Don't catch bare except
- Don't import *
- Don't suggest libraries not already in pyproject.toml
Example `CLAUDE.md`:

# Project Context
## What this project does
[1-2 sentences]
## Key files
- src/api/: HTTP routes (FastAPI)
- src/services/: Business logic
- src/repos/: DB access (SQLAlchemy)
- tests/: Mirror of src structure
## Running tests
pytest tests/ -v
## Common workflows
- Add new endpoint: route in src/api/, service in src/services/, test in tests/api/
- Database migration: alembic revision --autogenerate -m "..."
## Things AI commonly gets wrong here
- Don't use `from sqlalchemy import declarative_base` (deprecated, use `from sqlalchemy.orm import DeclarativeBase`)
- Don't suggest pydantic v1 syntax (we're on v2)
- Don't import from src/api/ in src/services/ (circular)

The best engineers I know aren't the ones who avoid AI tools — they're the ones who use them aggressively while keeping their judgment sharp.
If you're early in adopting these tools: don't try to "let AI do it all". Use it for reading, summarizing, boilerplate, and first drafts. Keep architecture, security, and final review in human hands.
The market is already separating people who AI-amplify into 5x engineers from people who AI-rely into juniors-with-a-co-pilot. Pick the first path.
Alex (Li-Feng Lin) — Senior Backend Engineer, AI & Cloud · Taipei, Taiwan
- Currently leading Navi 2.0 (.NET → Python / Cloud-Native) at Gaia Information
- 5+ years backend, recent focus on LLM integration (Text2SQL, RAG, multi-cloud)
- AZ-204, AZ-900 certified
- 📧 ko1314520ya@gmail.com
- 🔗 github.com/D11225687
"The best engineers don't just adapt to change — they lead it."
If this was useful, ⭐ the repo. PRs welcome — especially counter-examples where my advice fails.