🇮🇳 India AI Engineering Field Guide

A data-driven field guide for AI/Data engineering careers in India — built from real job postings, Reddit threads, HackerNews discussions, and Dev.to articles.

Inspired by alexeygrigorev/ai-engineering-field-guide. This is an India-specific fork built on real scraped data.

What's Inside

📊 Job Market Analysis

Real data from 378 job postings (Naukri, Indeed, Glassdoor, Adzuna), collected in March 2026.

Metric	Finding
#1 role in India	Data Engineer (unlike the global market, where AI Engineer leads)
Non-negotiable skills	Python (64%) and SQL (52%)
Fastest-growing pattern	RAG (30.8% of postings)
Cloud leader	AWS > Azure > GCP
Top city	Bengaluru (43% of all jobs)
GenAI adoption	37% of roles work directly on AI/ML
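
The percentages above are shares of postings that mention a given skill. A minimal sketch of how such a share could be computed, assuming each posting is available as a plain description string. The alias patterns and the sample postings are hypothetical and are not taken from the repo's skills_baseline.py.

```python
import re
from collections import Counter

# Hypothetical alias patterns; the repo's real skill list may differ.
SKILL_PATTERNS = {
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
    "SQL": re.compile(r"\bsql\b", re.IGNORECASE),
    "RAG": re.compile(r"\b(rag|retrieval[- ]augmented generation)\b", re.IGNORECASE),
}


def skill_share(descriptions):
    """Return {skill: percent of postings that mention it at least once}."""
    counts = Counter()
    for text in descriptions:
        for skill, pattern in SKILL_PATTERNS.items():
            if pattern.search(text):
                counts[skill] += 1  # each posting counts once per skill
    total = len(descriptions)
    if total == 0:
        return {skill: 0.0 for skill in SKILL_PATTERNS}
    return {skill: round(100 * counts[skill] / total, 1) for skill in SKILL_PATTERNS}


# Made-up sample postings, for illustration only.
sample = [
    "Data Engineer with strong Python and SQL skills",
    "GenAI Engineer: build RAG pipelines in Python",
    "Analyst role, advanced SQL and Power BI",
    "Backend developer, Java and Spring",
]
shares = skill_share(sample)
print(shares)
```

Counting each posting once per skill keeps a description that repeats "Python" five times from inflating the share.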

🗂️ Data Sources

Source	Items	Type
Naukri	250 jobs	Job postings
Indeed	82 jobs	Job postings
Glassdoor	1 job	Job postings
Adzuna	45 jobs	Job postings
Reddit (Arctic Shift)	412 posts/comments	Interview experiences
HackerNews	67 items	Hiring + interviews
Dev.to	91 articles	Technical content
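
As a quick sanity check, the four job-board counts in the table add up to the 378 postings quoted earlier. The sketch below only re-uses the numbers from the table; the variable names are illustrative.

```python
# Job-posting counts per source, copied from the table above.
job_sources = {"Naukri": 250, "Indeed": 82, "Glassdoor": 1, "Adzuna": 45}

total_jobs = sum(job_sources.values())
print(total_jobs)  # 378

# Share of postings contributed by each source, in percent.
source_share = {name: round(100 * n / total_jobs, 1) for name, n in job_sources.items()}
print(source_share)
```

Naukri contributes roughly two thirds of the postings, so findings from this dataset lean toward what is listed there.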

Repo Structure

├── job-market/
│   ├── src/
│   │   ├── collect/          # All scrapers
│   │   │   ├── collect_naukri.py
│   │   │   ├── collect_indeed.py
│   │   │   ├── collect_glassdoor.py
│   │   │   ├── collect_adzuna.py
│   │   │   ├── collect_reddit.py       # Arctic Shift API
│   │   │   ├── collect_hackernews.py   # HN Firebase + Algolia
│   │   │   ├── collect_devto.py        # Dev.to API
│   │   │   └── collect_fresher.py      # Fresher/junior roles
│   │   ├── process/
│   │   │   └── process_jobs.py         # Dedup, seniority, role classification
│   │   └── analyze/
│   │       ├── reload_cache.py         # Load JD cache → parquet
│   │       ├── skills_baseline.py      # Skill extraction
│   │       ├── enrich_jobs.py          # Post-processing enrichment
│   │       ├── fix_role_family.py      # Role classifier
│   │       ├── normalize_titles.py     # Title cleaning
│   │       ├── generate_report.py      # India market report
│   │       └── generate_job_descriptions.py  # YAML from real data
│   └── app/
│       └── streamlit_app.py            # Interactive dashboard
│
├── interview/
│   └── data/
│       ├── sources/                    # Curated resources
│       │   ├── india-sources.md        # Books, blogs, courses (India)
│       │   └── devto-articles-*.md     # Auto-collected Dev.to articles
│       ├── job-descriptions/           # Company hiring profiles
│       │   ├── india-companies-generated.yaml   # Auto-generated from scraped data
│       │   ├── india-companies-summary.md
│       │   └── india-companies.yaml    # Hand-curated: Swiggy, Flipkart, Razorpay...
│       └── research-exports/           # Raw interview data
│           ├── india-interview-patterns.md
│           ├── reddit/                 # Reddit interview experiences
│           └── hackernews/             # HN interview experiences
│
├── data/
│   ├── raw/                            # Raw scraped data (JSONL)
│   │   ├── naukri/
│   │   ├── indeed/
│   │   ├── glassdoor/
│   │   ├── adzuna/
│   │   ├── reddit/
│   │   ├── hackernews/
│   │   └── devto/
│   └── processed/
│       ├── jobs.parquet
│       ├── jobs_with_skills.parquet
│       └── jd_cache/                   # Fetched job descriptions
│
└── docs/
    ├── india-market-analysis.md        # Main report
    └── charts/                         # Analysis charts (14 PNG files)
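
The tree describes process_jobs.py as handling dedup, seniority, and role classification. The following is a rough sketch of what the dedup and seniority steps might look like, not the repo's actual implementation. The field names (title, company, city) and the keyword rules are assumptions.

```python
import re

# Hypothetical keyword rules; checked in order, first match wins.
SENIORITY_RULES = [
    ("senior", re.compile(r"\b(senior|sr|lead|principal|staff)\b", re.IGNORECASE)),
    ("junior", re.compile(r"\b(junior|jr|fresher|intern|trainee|graduate)\b", re.IGNORECASE)),
]


def classify_seniority(title):
    """Map a job title to 'senior', 'junior', or 'mid' (the default)."""
    for label, pattern in SENIORITY_RULES:
        if pattern.search(title):
            return label
    return "mid"


def dedup_key(job):
    """Normalize title, company, and city so reposts collapse to one key."""
    return tuple(
        re.sub(r"\s+", " ", str(job.get(field, "")).strip().lower())
        for field in ("title", "company", "city")
    )


def dedup_jobs(jobs):
    """Keep the first posting seen for each normalized key."""
    seen = set()
    unique = []
    for job in jobs:
        key = dedup_key(job)
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


# Made-up records, for illustration only.
jobs = [
    {"title": "Sr. Data Engineer", "company": "Acme", "city": "Bengaluru"},
    {"title": "sr.  data engineer ", "company": "ACME", "city": "bengaluru"},
    {"title": "Data Analyst Intern", "company": "Acme", "city": "Pune"},
]
unique_jobs = dedup_jobs(jobs)
print([(j["title"], classify_seniority(j["title"])) for j in unique_jobs])
```

Deduplicating on a normalized key rather than on the raw record matters here because the same role is often reposted across job boards with small differences in casing and spacing.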

Quick Start

Prerequisites

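# Create and activate a virtual environment (Windows paths shown)
python -m venv venv
venv\Scripts\activate
# On macOS/Linux, activate with: source venv/bin/activate
# streamlit is needed for the dashboard step
pip install playwright pandas pyarrow requests pyyaml matplotlib seaborn streamlit
playwright install chromium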

Run the full pipeline

# 1. Collect jobs
python job-market\src\collect\collect_naukri.py
python job-market\src\collect\collect_indeed.py
python job-market\src\collect\collect_adzuna.py
python job-market\src\collect\collect_fresher.py

# 2. Collect community data (no auth needed)
python job-market\src\collect\collect_reddit.py
python job-market\src\collect\collect_hackernews.py
python job-market\src\collect\collect_devto.py

# 3. Process
python job-market\src\process\process_jobs.py
python job-market\src\analyze\reload_cache.py
python job-market\src\analyze\fix_role_family.py
python job-market\src\analyze\enrich_jobs.py

# 4. Generate outputs
python job-market\src\analyze\generate_report.py
python job-market\src\analyze\generate_job_descriptions.py

# 5. Run dashboard
streamlit run job-market\app\streamlit_app.py
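
The commands above use Windows path separators. One way to run the same steps on any OS is a small driver script, sketched below. It assumes it is run from the repo root, and it is not part of the repo. The dashboard step is left out because `streamlit run` blocks until stopped.

```python
import subprocess
import sys
from pathlib import Path

SRC = Path("job-market") / "src"

# Script order copied from steps 1-4 above.
PIPELINE = [
    SRC / "collect" / "collect_naukri.py",
    SRC / "collect" / "collect_indeed.py",
    SRC / "collect" / "collect_adzuna.py",
    SRC / "collect" / "collect_fresher.py",
    SRC / "collect" / "collect_reddit.py",
    SRC / "collect" / "collect_hackernews.py",
    SRC / "collect" / "collect_devto.py",
    SRC / "process" / "process_jobs.py",
    SRC / "analyze" / "reload_cache.py",
    SRC / "analyze" / "fix_role_family.py",
    SRC / "analyze" / "enrich_jobs.py",
    SRC / "analyze" / "generate_report.py",
    SRC / "analyze" / "generate_job_descriptions.py",
]


def build_commands(steps=PIPELINE):
    """Return one command (argument list) per pipeline step."""
    return [[sys.executable, str(step)] for step in steps]


def run_pipeline(dry_run=True):
    """Run each step in order; stop at the first failure."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


run_pipeline(dry_run=True)  # only prints the commands
```

Call `run_pipeline(dry_run=False)` to execute the steps. Using `sys.executable` keeps every step inside the active virtual environment.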

Fetch full job descriptions (slow, ~30 mins)

python job-market\src\analyze\fetch_jd.py
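
Because this step is slow, caching each fetched description on disk avoids repeating work on later runs. The repo keeps fetched descriptions under data/processed/jd_cache/, but its file layout is not documented here, so the one-JSON-file-per-URL scheme below is an assumption. The `fetch` callable is a stand-in for whatever actually downloads the page.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_fetch(url, fetch, cache_dir):
    """Return the description for url, calling fetch(url) only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the file name is short and filesystem-safe.
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["description"]
    description = fetch(url)
    path.write_text(json.dumps({"url": url, "description": description}), encoding="utf-8")
    return description


# Demo with a stand-in fetcher that records how often it is called.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "JD text for " + url


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("https://example.com/job/1", fake_fetch, tmp)
    second = cached_fetch("https://example.com/job/1", fake_fetch, tmp)

print(len(calls))  # the second call is served from the cache
```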

Key Findings

Skills by Role (India, March 2026)

Skill	Data Engineer	Data Analyst	ML Engineer	Data Scientist	GenAI Engineer
Python	✅	✅	✅	✅	✅
SQL	✅	✅	⬜	✅	⬜
Apache Spark	✅	⬜	⬜	⬜	⬜
Apache Kafka	✅	⬜	⬜	⬜	⬜
AWS/Azure/GCP	✅	⬜	✅	⬜	✅
PyTorch/TF	⬜	⬜	✅	✅	⬜
RAG/LangChain	⬜	⬜	⬜	⬜	✅
Power BI/Tableau	⬜	✅	⬜	⬜	⬜
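
The matrix above can also be held as a small lookup structure, which makes questions like "which roles need SQL?" easy to answer in code. This is an illustrative encoding of the table, not a file that exists in the repo.

```python
ROLES = ["Data Engineer", "Data Analyst", "ML Engineer", "Data Scientist", "GenAI Engineer"]

# One row per skill, one flag per role, in the same order as ROLES.
SKILL_MATRIX = {
    "Python": [1, 1, 1, 1, 1],
    "SQL": [1, 1, 0, 1, 0],
    "Apache Spark": [1, 0, 0, 0, 0],
    "Apache Kafka": [1, 0, 0, 0, 0],
    "AWS/Azure/GCP": [1, 0, 1, 0, 1],
    "PyTorch/TF": [0, 0, 1, 1, 0],
    "RAG/LangChain": [0, 0, 0, 0, 1],
    "Power BI/Tableau": [0, 1, 0, 0, 0],
}


def skills_for(role):
    """List the skills flagged for a role."""
    idx = ROLES.index(role)
    return [skill for skill, flags in SKILL_MATRIX.items() if flags[idx]]


def roles_for(skill):
    """List the roles that flag a skill."""
    return [role for role, flag in zip(ROLES, SKILL_MATRIX[skill]) if flag]


print(skills_for("GenAI Engineer"))
print(roles_for("SQL"))
```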

India vs Global

Aspect	India	Global
Most in-demand role	Data Engineer	AI Engineer
Entry-level openings	High (63% of Data Analyst roles)	Low
GenAI adoption	37%	55%+
Salary transparency	Very low	Moderate
Remote work	~10%	~35%
Top hiring company type	Service (TCS, Capgemini)	Product/Startup

Company Types Hiring

Service companies (TCS, Infosys, Capgemini) — bulk of fresher hiring, structured growth
GCCs (Walmart, JPMorgan, Barclays) — best pay, US/UK work culture, 4-8 week process
Unicorns (Swiggy, Flipkart, Razorpay) — best tech stack, ownership culture, 2-3 week process
Startups (Krutrim, Sarvam, AryaXAI) — cutting-edge AI, ESOPs, fast hiring

Interview Data

Hand-curated + auto-collected interview experiences from:

Reddit — r/developersIndia, r/indianstartups (412 posts/comments)
HackerNews — "Who is hiring?" threads + Algolia search (67 items)
Company profiles — Swiggy, Flipkart, Razorpay, PhonePe, JPMorgan, TCS, Google India, Krutrim, Sarvam

See interview/data/research-exports/ for full interview patterns.
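
For readers who want to reproduce the HackerNews part, here is a hedged sketch of a query against the public HN Algolia search endpoint. The query string, the `tags` value, and the fields kept are illustrative choices; the repo's collect_hackernews.py may work differently. The `search` function needs network access and is not called in this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"


def build_search_url(query, tags="comment", hits_per_page=50):
    """Build a search URL for the HN Algolia API."""
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page}
    return ALGOLIA_SEARCH + "?" + urlencode(params)


def extract_items(payload):
    """Keep a few fields from each hit in an Algolia response dict."""
    return [
        {
            "id": hit.get("objectID"),
            "author": hit.get("author"),
            "text": hit.get("comment_text") or hit.get("story_text") or hit.get("title") or "",
        }
        for hit in payload.get("hits", [])
    ]


def search(query, **kwargs):
    """Perform the HTTP request (requires network access)."""
    with urlopen(build_search_url(query, **kwargs), timeout=30) as resp:
        return extract_items(json.load(resp))


url = build_search_url("data engineer interview India")
print(url)
```

Separating URL construction and response parsing from the network call makes both pieces testable offline.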

Contributing

Found a mistake? Know something that's changed? PRs welcome.

Add your interview experience → interview/data/research-exports/
Add a company profile → interview/data/job-descriptions/india-companies.yaml
Fix a skill mapping → job-market/src/analyze/skills_baseline.py

Roadmap

Role pages: ML Engineer, GenAI Engineer, Data Scientist, MLOps
Interview prep guides per role
GitHub Pages public site
AmbitionBox scraper for India-specific interview reviews
Salary data from community submissions
Quarterly refresh automation

Data collected March 2026. Re-run scrapers for fresh data.
Not affiliated with any job board or company mentioned.
