joshuvavinith/ai-engineering-field-guide-india


🇮🇳 India AI Engineering Field Guide

A data-driven field guide for AI/Data engineering careers in India — built from real job postings, Reddit threads, HackerNews discussions, and Dev.to articles.

Inspired by alexeygrigorev/ai-engineering-field-guide — India-specific fork with real scraped data.


What's Inside

📊 Job Market Analysis

Real data from 378 job postings (Naukri, Indeed, Glassdoor, Adzuna) collected March 2026.

| Metric | Finding |
|---|---|
| #1 role in India | Data Engineer (unlike global, where AI Engineer leads) |
| Non-negotiable skills | Python (64%) and SQL (52%) |
| Fastest growing pattern | RAG (30.8% of postings) |
| Cloud leader | AWS > Azure > GCP |
| Top city | Bengaluru (43% of all jobs) |
| GenAI adoption | 37% of roles work directly on AI/ML |
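
The skill figures above read as the share of postings that mention each skill. A minimal sketch of that calculation, assuming plain-text job descriptions and a small set of hypothetical regex patterns (the repo's real skill mapping is not shown here and may differ):

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; not the repo's actual mapping.
SKILL_PATTERNS = {
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
    "SQL": re.compile(r"\bsql\b", re.IGNORECASE),
    "RAG": re.compile(r"\brag\b|retrieval[- ]augmented", re.IGNORECASE),
}


def skill_share(descriptions):
    """Return {skill: percent of postings mentioning it}, rounded to 1 decimal."""
    counts = Counter()
    for text in descriptions:
        for skill, pattern in SKILL_PATTERNS.items():
            if pattern.search(text):
                counts[skill] += 1  # each posting counts at most once per skill
    total = len(descriptions)
    return {
        skill: (round(100 * counts[skill] / total, 1) if total else 0.0)
        for skill in SKILL_PATTERNS
    }


# Made-up sample descriptions, not scraped data.
sample = [
    "Data Engineer with Python, SQL and Spark",
    "GenAI Engineer: Python, RAG pipelines, LangChain",
    "Analyst skilled in SQL and Power BI",
    "Java backend developer",
]
shares = skill_share(sample)
```

Counting each posting once per skill keeps a description that repeats "Python" five times from inflating the share.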

🗂️ Data Sources

| Source | Items | Type |
|---|---|---|
| Naukri | 250 jobs | Job postings |
| Indeed | 82 jobs | Job postings |
| Glassdoor | 1 job | Job postings |
| Adzuna | 45 jobs | Job postings |
| Reddit (Arctic Shift) | 412 posts/comments | Interview experiences |
| HackerNews | 67 items | Hiring + interviews |
| Dev.to | 91 articles | Technical content |
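
To re-check these counts after a fresh scrape, one option is to count records per source folder. The sketch below assumes each source writes one JSON object per line into `*.jsonl` files under its own sub-folder of `data/raw/`. The file extension and the helper name `count_records` are assumptions, not taken from the repo.

```python
import json
from pathlib import Path


def count_records(raw_dir):
    """Count non-empty JSONL lines per source sub-folder of raw_dir."""
    counts = {}
    for source_dir in sorted(Path(raw_dir).iterdir()):
        if not source_dir.is_dir():
            continue
        n = 0
        for path in source_dir.glob("*.jsonl"):
            with path.open(encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        json.loads(line)  # raises ValueError on a malformed record
                        n += 1
        counts[source_dir.name] = n
    return counts
```

Calling `count_records("data/raw")` from the repo root would return a dict such as `{"naukri": ..., "indeed": ...}` to compare against the table.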

Repo Structure

├── job-market/
│   ├── src/
│   │   ├── collect/          # All scrapers
│   │   │   ├── collect_naukri.py
│   │   │   ├── collect_indeed.py
│   │   │   ├── collect_glassdoor.py
│   │   │   ├── collect_adzuna.py
│   │   │   ├── collect_reddit.py       # Arctic Shift API
│   │   │   ├── collect_hackernews.py   # HN Firebase + Algolia
│   │   │   ├── collect_devto.py        # Dev.to API
│   │   │   └── collect_fresher.py      # Fresher/junior roles
│   │   ├── process/
│   │   │   └── process_jobs.py         # Dedup, seniority, role classification
│   │   └── analyze/
│   │       ├── reload_cache.py         # Load JD cache → parquet
│   │       ├── skills_baseline.py      # Skill extraction
│   │       ├── enrich_jobs.py          # Post-processing enrichment
│   │       ├── fix_role_family.py      # Role classifier
│   │       ├── normalize_titles.py     # Title cleaning
│   │       ├── generate_report.py      # India market report
│   │       └── generate_job_descriptions.py  # YAML from real data
│   └── app/
│       └── streamlit_app.py            # Interactive dashboard
│
├── interview/
│   └── data/
│       ├── sources/                    # Curated resources
│       │   ├── india-sources.md        # Books, blogs, courses (India)
│       │   └── devto-articles-*.md     # Auto-collected Dev.to articles
│       ├── job-descriptions/           # Company hiring profiles
│       │   ├── india-companies-generated.yaml   # Auto-generated from scraped data
│       │   ├── india-companies-summary.md
│       │   └── india-companies.yaml    # Hand-curated: Swiggy, Flipkart, Razorpay...
│       └── research-exports/           # Raw interview data
│           ├── india-interview-patterns.md
│           ├── reddit/                 # Reddit interview experiences
│           └── hackernews/             # HN interview experiences
│
├── data/
│   ├── raw/                            # Raw scraped data (JSONL)
│   │   ├── naukri/
│   │   ├── indeed/
│   │   ├── glassdoor/
│   │   ├── adzuna/
│   │   ├── reddit/
│   │   ├── hackernews/
│   │   └── devto/
│   └── processed/
│       ├── jobs.parquet
│       ├── jobs_with_skills.parquet
│       └── jd_cache/                   # Fetched job descriptions
│
└── docs/
    ├── india-market-analysis.md        # Main report
    └── charts/                         # Analysis charts (14 PNG files)
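
The tree describes `process_jobs.py` as handling dedup and seniority classification. The following is only a rough sketch of what those two steps could look like, not the script's actual logic. The field names (`title`, `company`, `city`), the keyword rules, and the seniority labels are all assumptions.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
SENIORITY_RULES = [
    ("fresher", re.compile(r"\b(fresher|intern|trainee|graduate|junior)\b", re.I)),
    ("lead", re.compile(r"\b(lead|principal|staff|head|manager)\b", re.I)),
    ("senior", re.compile(r"\b(senior|sr)\b", re.I)),
]


def seniority(title):
    """Map a job title to a coarse seniority label, defaulting to 'mid'."""
    for label, pattern in SENIORITY_RULES:
        if pattern.search(title):
            return label
    return "mid"


def dedup(jobs):
    """Keep the first posting per normalized (title, company, city) key."""
    seen = set()
    unique = []
    for job in jobs:
        # Lowercase and collapse whitespace so cosmetic differences do not matter.
        key = tuple(
            " ".join(str(job.get(field, "")).lower().split())
            for field in ("title", "company", "city")
        )
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique
```

Cross-posted jobs often differ only in casing or spacing, which is why the key is normalized before comparison.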

Quick Start

Prerequisites

python -m venv venv
venv\Scripts\activate
pip install playwright pandas pyarrow requests pyyaml matplotlib seaborn streamlit
playwright install chromium

Run the full pipeline

# 1. Collect jobs
python job-market\src\collect\collect_naukri.py
python job-market\src\collect\collect_indeed.py
python job-market\src\collect\collect_adzuna.py
python job-market\src\collect\collect_fresher.py

# 2. Collect community data (no auth needed)
python job-market\src\collect\collect_reddit.py
python job-market\src\collect\collect_hackernews.py
python job-market\src\collect\collect_devto.py

# 3. Process
python job-market\src\process\process_jobs.py
python job-market\src\analyze\reload_cache.py
python job-market\src\analyze\fix_role_family.py
python job-market\src\analyze\enrich_jobs.py

# 4. Generate outputs
python job-market\src\analyze\generate_report.py
python job-market\src\analyze\generate_job_descriptions.py

# 5. Run dashboard
streamlit run job-market\app\streamlit_app.py
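
The commands above use Windows path separators. For other platforms, a small Python runner is one option. This sketch takes the script order from steps 1 to 4 and builds paths with `pathlib`. The names `STEPS`, `build_commands`, and `run_pipeline` are hypothetical, and the dashboard step is left out because it runs until stopped.

```python
import subprocess
import sys
from pathlib import Path

ROOT = Path("job-market") / "src"

# (folder, script) pairs in the order listed in steps 1-4 above.
STEPS = [
    ("collect", "collect_naukri.py"),
    ("collect", "collect_indeed.py"),
    ("collect", "collect_adzuna.py"),
    ("collect", "collect_fresher.py"),
    ("collect", "collect_reddit.py"),
    ("collect", "collect_hackernews.py"),
    ("collect", "collect_devto.py"),
    ("process", "process_jobs.py"),
    ("analyze", "reload_cache.py"),
    ("analyze", "fix_role_family.py"),
    ("analyze", "enrich_jobs.py"),
    ("analyze", "generate_report.py"),
    ("analyze", "generate_job_descriptions.py"),
]


def build_commands(python=sys.executable):
    """Return one argv list per pipeline script, using OS-native paths."""
    return [[python, str(ROOT / folder / script)] for folder, script in STEPS]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

From the repo root, `run_pipeline(build_commands())` would run the scripts in sequence. Stopping at the first failure avoids processing a partial scrape.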

Fetch full job descriptions (slow, ~30 mins)

python job-market\src\analyze\fetch_jd.py
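
Because this step is slow, caching fetched descriptions on disk (as the `jd_cache/` folder suggests) lets a re-run skip work already done. The sketch below shows that idea only. The cache layout (one `.txt` file per URL hash), the function name, and the injected `fetch` callable are assumptions, not the behaviour of `fetch_jd.py`.

```python
import hashlib
import time
from pathlib import Path


def fetch_with_cache(url, cache_dir, fetch, delay=0.0):
    """Return the description for url, calling fetch(url) only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the cache file name is always filesystem-safe.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt"
    path = cache_dir / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch(url)
    path.write_text(text, encoding="utf-8")
    time.sleep(delay)  # pause only after a real request, never on a cache hit
    return text
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP or browser library.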

Key Findings

Skills by Role (India, March 2026)

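| Skill | Data Engineer | Data Analyst | ML Engineer | Data Scientist | GenAI Engineer |
|---|---|---|---|---|---|
| Python | | | | | |
| SQL | | | | | |
| Apache Spark | | | | | |
| Apache Kafka | | | | | |
| AWS/Azure/GCP | | | | | |
| PyTorch/TF | | | | | |
| RAG/LangChain | | | | | |
| Power BI/Tableau | | | | | |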
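
A skill-by-role matrix like this one can be derived from processed postings. The sketch below assumes each record carries a `role_family` string and a `skills` list. Both field names are hypothetical and may not match the columns in `jobs_with_skills.parquet`.

```python
from collections import defaultdict


def skills_by_role(jobs):
    """Return {role: {skill: percent of that role's postings listing the skill}}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for job in jobs:
        role = job["role_family"]
        totals[role] += 1
        for skill in set(job["skills"]):  # set() guards against duplicate tags
            hits[role][skill] += 1
    return {
        role: {
            skill: round(100 * n / totals[role])
            for skill, n in sorted(hits[role].items())
        }
        for role in totals
    }


# Made-up records for illustration, not scraped data.
matrix = skills_by_role([
    {"role_family": "Data Engineer", "skills": ["Python", "SQL", "Apache Spark"]},
    {"role_family": "Data Engineer", "skills": ["SQL", "Python", "Python"]},
    {"role_family": "Data Analyst", "skills": ["SQL", "Power BI/Tableau"]},
])
```

Percentages are computed within each role, so a small role family is not drowned out by the much larger Data Engineer pool.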

India vs Global

| Aspect | India | Global |
|---|---|---|
| Most in-demand role | Data Engineer | AI Engineer |
| Entry-level openings | High (63% of DA roles) | Low |
| GenAI adoption | 37% | 55%+ |
| Salary transparency | Very low | Moderate |
| Remote work | ~10% | ~35% |
| Top hiring company type | Service (TCS, Capgemini) | Product/Startup |

Company Types Hiring

  • Service companies (TCS, Infosys, Capgemini) — bulk of fresher hiring, structured growth
  • GCCs (Walmart, JPMorgan, Barclays) — best pay, US/UK work culture, 4-8 week process
  • Unicorns (Swiggy, Flipkart, Razorpay) — best tech stack, ownership culture, 2-3 week process
  • Startups (Krutrim, Sarvam, AryaXAI) — cutting-edge AI, ESOPs, fast hiring

Interview Data

Hand-curated + auto-collected interview experiences from:

  • Reddit — r/developersIndia, r/indianstartups (412 posts/comments)
  • HackerNews — "Who is hiring?" threads + Algolia search (67 items)
  • Company profiles — Swiggy, Flipkart, Razorpay, PhonePe, JPMorgan, TCS, Google India, Krutrim, Sarvam

See interview/data/research-exports/ for full interview patterns.


Contributing

Found a mistake? Know something that's changed? PRs welcome.

  • Add your interview experience → interview/data/research-exports/
  • Add a company profile → interview/data/job-descriptions/india-companies.yaml
  • Fix a skill mapping → job-market/src/analyze/reload_cache.py

Roadmap

  • Role pages: ML Engineer, GenAI Engineer, Data Scientist, MLOps
  • Interview prep guides per role
  • GitHub Pages public site
  • AmbitionBox scraper for India-specific interview reviews
  • Salary data from community submissions
  • Quarterly refresh automation

Data collected March 2026. Re-run scrapers for fresh data.
Not affiliated with any job board or company mentioned.
