A data-driven field guide for AI/Data engineering careers in India, built from real job postings, Reddit threads, HackerNews discussions, and Dev.to articles.

Inspired by alexeygrigorev/ai-engineering-field-guide: an India-specific fork built on real scraped data.

Real data from 378 job postings (Naukri, Indeed, Glassdoor, Adzuna), collected in March 2026.

| Metric | Finding |
|---|---|
| #1 role in India | Data Engineer (globally, AI Engineer leads) |
| Non-negotiable skills | Python (64%) and SQL (52%) |
| Fastest growing pattern | RAG — 30.8% of postings |
| Cloud leader | AWS > Azure > GCP |
| Top city | Bengaluru — 43% of all jobs |
| GenAI adoption | 37% of roles work directly on AI/ML |

| Source | Items | Type |
|---|---|---|
| Naukri | 250 jobs | Job postings |
| Indeed | 82 jobs | Job postings |
| Glassdoor | 1 job | Job postings |
| Adzuna | 45 jobs | Job postings |
| Reddit (Arctic Shift) | 412 posts/comments | Interview experiences |
| HackerNews | 67 items | Hiring + interviews |
| Dev.to | 91 articles | Technical content |
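
The Python (64%) and SQL (52%) figures in the findings table read as the share of postings that mention each skill. A minimal sketch of that calculation, assuming each posting is a plain-text description and skills are matched as case-insensitive whole words. `SKILL_PATTERNS` and `skill_share` are hypothetical illustrations, not the repo's `skills_baseline.py`. Each posting counts once per skill, so a description that repeats "Python" five times does not inflate the share; whole-word matching also keeps "PostgreSQL" from counting as SQL.

```python
import re
from typing import Dict, List

# Hypothetical keyword patterns (assumption); the repo's real mapping may differ.
SKILL_PATTERNS: Dict[str, re.Pattern] = {
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
    "SQL": re.compile(r"\bsql\b", re.IGNORECASE),
    "RAG": re.compile(r"\b(rag|retrieval[- ]augmented)\b", re.IGNORECASE),
    "Spark": re.compile(r"\b(py)?spark\b", re.IGNORECASE),
}


def skill_share(descriptions: List[str]) -> Dict[str, float]:
    """Return the percentage of postings that mention each skill at least once."""
    total = len(descriptions)
    if total == 0:
        return {skill: 0.0 for skill in SKILL_PATTERNS}
    shares = {}
    for skill, pattern in SKILL_PATTERNS.items():
        # Count each posting once, however often the keyword repeats inside it.
        hits = sum(1 for text in descriptions if pattern.search(text))
        shares[skill] = round(100.0 * hits / total, 1)
    return shares


if __name__ == "__main__":
    postings = [
        "Data Engineer: Python, SQL, Spark on AWS",
        "GenAI Engineer building RAG pipelines in Python",
        "Data Analyst with strong SQL and Power BI",
        "ML Engineer, PyTorch, python",
    ]
    print(skill_share(postings))
```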

```
├── job-market/
│   ├── src/
│   │   ├── collect/                         # All scrapers
│   │   │   ├── collect_naukri.py
│   │   │   ├── collect_indeed.py
│   │   │   ├── collect_glassdoor.py
│   │   │   ├── collect_adzuna.py
│   │   │   ├── collect_reddit.py            # Arctic Shift API
│   │   │   ├── collect_hackernews.py        # HN Firebase + Algolia
│   │   │   ├── collect_devto.py             # Dev.to API
│   │   │   └── collect_fresher.py           # Fresher/junior roles
│   │   ├── process/
│   │   │   └── process_jobs.py              # Dedup, seniority, role classification
│   │   └── analyze/
│   │       ├── reload_cache.py              # Load JD cache → parquet
│   │       ├── skills_baseline.py           # Skill extraction
│   │       ├── enrich_jobs.py               # Post-processing enrichment
│   │       ├── fix_role_family.py           # Role classifier
│   │       ├── normalize_titles.py          # Title cleaning
│   │       ├── generate_report.py           # India market report
│   │       └── generate_job_descriptions.py # YAML from real data
│   └── app/
│       └── streamlit_app.py                 # Interactive dashboard
│
├── interview/
│   └── data/
│       ├── sources/                         # Curated resources
│       │   ├── india-sources.md             # Books, blogs, courses (India)
│       │   └── devto-articles-*.md          # Auto-collected Dev.to articles
│       ├── job-descriptions/                # Company hiring profiles
│       │   ├── india-companies-generated.yaml  # Auto-generated from scraped data
│       │   ├── india-companies-summary.md
│       │   └── india-companies.yaml         # Hand-curated: Swiggy, Flipkart, Razorpay...
│       └── research-exports/                # Raw interview data
│           ├── india-interview-patterns.md
│           ├── reddit/                      # Reddit interview experiences
│           └── hackernews/                  # HN interview experiences
│
├── data/
│   ├── raw/                                 # Raw scraped data (JSONL)
│   │   ├── naukri/
│   │   ├── indeed/
│   │   ├── glassdoor/
│   │   ├── adzuna/
│   │   ├── reddit/
│   │   ├── hackernews/
│   │   └── devto/
│   └── processed/
│       ├── jobs.parquet
│       ├── jobs_with_skills.parquet
│       └── jd_cache/                        # Fetched job descriptions
│
└── docs/
    ├── india-market-analysis.md             # Main report
    └── charts/                              # Analysis charts (14 PNG files)
```
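
`process_jobs.py` is described as handling dedup, seniority, and role classification. A minimal sketch of those three steps, assuming each job is a dict with `title`, `company`, and `city` fields and that seniority and role family are inferred from title keywords. The field names, keyword rules, and functions are hypothetical, not the repo's implementation. Keeping the first occurrence means input order decides which job board's copy survives a cross-source duplicate.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword rules (assumption), checked in order; the first match wins.
SENIORITY_RULES: List[Tuple[str, re.Pattern]] = [
    ("senior", re.compile(r"\b(senior|sr|lead|principal|staff)\b", re.I)),
    ("junior", re.compile(r"\b(junior|jr|fresher|trainee|intern|graduate)\b", re.I)),
]
ROLE_RULES: List[Tuple[str, re.Pattern]] = [
    ("GenAI Engineer", re.compile(r"\b(genai|gen ai|llm|generative ai)\b", re.I)),
    ("ML Engineer", re.compile(r"\b(ml|machine learning)\s+engineer\b", re.I)),
    ("Data Scientist", re.compile(r"\bdata scientist\b", re.I)),
    ("Data Engineer", re.compile(r"\bdata engineer\b", re.I)),
    ("Data Analyst", re.compile(r"\b(data|business) analyst\b", re.I)),
]


def normalize(text: str) -> str:
    """Lowercase and collapse everything except letters and digits to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def classify(title: str, rules: List[Tuple[str, re.Pattern]], default: str) -> str:
    """Return the label of the first rule whose pattern appears in the title."""
    for label, pattern in rules:
        if pattern.search(title):
            return label
    return default


def process_jobs(jobs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop duplicates across sources, then tag seniority and role family from the title."""
    seen = set()
    out = []
    for job in jobs:
        # Same normalized title at the same company in the same city = one posting.
        key = (normalize(job["title"]), normalize(job["company"]), normalize(job.get("city", "")))
        if key in seen:
            continue
        seen.add(key)
        out.append({
            **job,
            "seniority": classify(job["title"], SENIORITY_RULES, "mid"),
            "role_family": classify(job["title"], ROLE_RULES, "Other"),
        })
    return out
```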

Setup (commands use Windows paths; on macOS/Linux, activate with `source venv/bin/activate` and use `/` in paths):

```
python -m venv venv
venv\Scripts\activate
pip install playwright pandas pyarrow requests pyyaml matplotlib seaborn streamlit
playwright install chromium
```

Run the pipeline:

```
# 1. Collect jobs
python job-market\src\collect\collect_naukri.py
python job-market\src\collect\collect_indeed.py
python job-market\src\collect\collect_adzuna.py
python job-market\src\collect\collect_fresher.py

# 2. Collect community data (no auth needed)
python job-market\src\collect\collect_reddit.py
python job-market\src\collect\collect_hackernews.py
python job-market\src\collect\collect_devto.py

# 3. Process
python job-market\src\process\process_jobs.py
python job-market\src\analyze\reload_cache.py
python job-market\src\analyze\fix_role_family.py
python job-market\src\analyze\enrich_jobs.py

# 4. Generate outputs
python job-market\src\analyze\generate_report.py
python job-market\src\analyze\generate_job_descriptions.py

# 5. Run dashboard
streamlit run job-market\app\streamlit_app.py
```

To fetch full job descriptions:

```
python job-market\src\analyze\fetch_jd.py
```

| Skill | Data Engineer | Data Analyst | ML Engineer | Data Scientist | GenAI Engineer |
|---|---|---|---|---|---|
| Python | ✅ | ✅ | ✅ | ✅ | ✅ |
| SQL | ✅ | ✅ | ⬜ | ✅ | ⬜ |
| Apache Spark | ✅ | ⬜ | ⬜ | ⬜ | ⬜ |
| Apache Kafka | ✅ | ⬜ | ⬜ | ⬜ | ⬜ |
| AWS/Azure/GCP | ✅ | ⬜ | ✅ | ⬜ | ✅ |
| PyTorch/TF | ⬜ | ⬜ | ✅ | ✅ | ⬜ |
| RAG/LangChain | ⬜ | ⬜ | ⬜ | ⬜ | ✅ |
| Power BI/Tableau | ⬜ | ✅ | ⬜ | ⬜ | ⬜ |
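
The skills-by-role matrix is small enough to encode directly as a lookup, which makes it easy to see which core skills you are missing for each role and which roles are closest to your current skill set. A minimal sketch, assuming the ✅ cells are the complete set of core skills per role; `ROLE_SKILLS`, `skill_gaps`, and `closest_roles` are hypothetical helpers, not part of the repo. For a candidate who knows Python, SQL, and Power BI/Tableau, `closest_roles` ranks Data Analyst first (no gaps) and Data Scientist second (only PyTorch/TF missing).

```python
from typing import Dict, List, Set

# Core skills per role, transcribed from the ✅ cells of the skills-by-role table.
ROLE_SKILLS: Dict[str, Set[str]] = {
    "Data Engineer": {"Python", "SQL", "Apache Spark", "Apache Kafka", "AWS/Azure/GCP"},
    "Data Analyst": {"Python", "SQL", "Power BI/Tableau"},
    "ML Engineer": {"Python", "AWS/Azure/GCP", "PyTorch/TF"},
    "Data Scientist": {"Python", "SQL", "PyTorch/TF"},
    "GenAI Engineer": {"Python", "AWS/Azure/GCP", "RAG/LangChain"},
}


def skill_gaps(known: Set[str]) -> Dict[str, List[str]]:
    """For each role, list (sorted) the core skills from the matrix that `known` lacks."""
    return {role: sorted(required - known) for role, required in ROLE_SKILLS.items()}


def closest_roles(known: Set[str]) -> List[str]:
    """Roles ordered by fewest missing core skills, ties broken alphabetically."""
    gaps = skill_gaps(known)
    return sorted(gaps, key=lambda role: (len(gaps[role]), role))
```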

| Aspect | India | Global |
|---|---|---|
| Most in-demand role | Data Engineer | AI Engineer |
| Entry-level openings | High (63% of DA roles) | Low |
| GenAI adoption | 37% | 55%+ |
| Salary transparency | Very low | Moderate |
| Remote work | ~10% | ~35% |
| Top hiring company type | Service (TCS, Capgemini) | Product/Startup |

Hiring by company type:

- Service companies (TCS, Infosys, Capgemini): bulk of fresher hiring, structured growth
- GCCs, i.e. Global Capability Centres (Walmart, JPMorgan, Barclays): best pay, US/UK work culture, 4-8 week process
- Unicorns (Swiggy, Flipkart, Razorpay): best tech stack, ownership culture, 2-3 week process
- Startups (Krutrim, Sarvam, AryaXAI): cutting-edge AI, ESOPs, fast hiring

Hand-curated + auto-collected interview experiences from:

- Reddit: r/developersIndia, r/indianstartups (412 posts/comments)
- HackerNews: "Who is hiring?" threads + Algolia search (67 items)
- Company profiles: Swiggy, Flipkart, Razorpay, PhonePe, JPMorgan, TCS, Google India, Krutrim, Sarvam

See `interview/data/research-exports/` for full interview patterns.
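
The HackerNews items come partly from Algolia search, and the public Algolia HN Search API (`hn.algolia.com/api/v1`) needs no authentication. A minimal sketch of building a query and trimming hits into records with only the standard library; the record fields and function names are hypothetical, and this is not the repo's `collect_hackernews.py`. `fetch` needs network access; to paginate, increment `page` until it reaches the response's `nbPages`.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Algolia HN Search endpoint that returns newest results first.
ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search_by_date"


def build_search_url(query: str, tags: str = "story", hits_per_page: int = 50, page: int = 0) -> str:
    """Build a search URL for one page of results."""
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page, "page": page}
    return f"{ALGOLIA_SEARCH}?{urllib.parse.urlencode(params)}"


def parse_hits(payload: Dict) -> List[Dict[str, str]]:
    """Keep only the fields needed for an interview-experience record (hypothetical schema)."""
    records = []
    for hit in payload.get("hits", []):
        records.append({
            "id": hit.get("objectID", ""),
            "title": hit.get("title") or "",
            "created_at": hit.get("created_at", ""),
            # Stories carry story_text, comments carry comment_text; either may be null.
            "text": hit.get("story_text") or hit.get("comment_text") or "",
        })
    return records


def fetch(query: str, page: int = 0) -> List[Dict[str, str]]:
    """Network call: fetch and parse one page of search results."""
    with urllib.request.urlopen(build_search_url(query, page=page), timeout=30) as resp:
        return parse_hits(json.load(resp))
```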

Found a mistake? Know something that's changed? PRs welcome.

- Add your interview experience → `interview/data/research-exports/`
- Add a company profile → `interview/data/job-descriptions/india-companies.yaml`
- Fix a skill mapping → `job-market/src/analyze/reload_cache.py`

Roadmap:

- Role pages: ML Engineer, GenAI Engineer, Data Scientist, MLOps
- Interview prep guides per role
- GitHub Pages public site
- AmbitionBox scraper for India-specific interview reviews
- Salary data from community submissions
- Quarterly refresh automation

Data collected March 2026. Re-run the scrapers for fresh data.

Not affiliated with any job board or company mentioned.