An AI-powered job intelligence platform that automatically discovers and enriches job postings, then lets you chat with the resulting data -- running entirely in Docker.
- Scheduled data collection -- A cron job fires every morning and calls the FastAPI backend, which queries the Google Custom Search API for new engineering/data science job postings at Google and Meta.
- Skills extraction -- For each job link, the backend scrapes the qualifications section from the company careers page (BeautifulSoup) and sends it to Gemini 2.5 Flash, which returns a concise list of required skills (e.g., ["Python", "Spark", "MLOps"]).
- Cloud storage -- Enriched job records (date, title, skills, link) are stored as dated JSON blobs in Google Cloud Storage.
- Job table dashboard -- A Streamlit page reads from GCS and renders an interactive, filterable table with company sidebar checkboxes and a date range slider.
- AI chatbot -- A second Streamlit page starts a Gemini chat session pre-loaded with all your job data so you can ask questions like "Which Google roles require Kubernetes?" or "Summarize all Meta jobs from this week."
- Resume matcher -- Upload a PDF resume and get a skill-match score against every collected job posting, with missing skills highlighted.
To add screenshots: run the app, take a screenshot of each page, save them to docs/, and uncomment the commented-out screenshot lines above.
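The skill-match score from the Resume Matcher feature could be computed roughly as follows. This is a minimal sketch, not the code in resume_matcher.py: it assumes the resume has already been reduced to a list of skill strings (PDF parsing is left out), and the function name `match_score` and the sample data are hypothetical.

```python
def match_score(resume_skills, job_skills):
    """Return (score, missing) for one job.

    score is the fraction of the job's skills found in the resume;
    missing lists the job skills the resume does not mention.
    """
    # Compare case-insensitively so "python" matches "Python".
    resume = {s.strip().lower() for s in resume_skills}
    required = [s.strip() for s in job_skills if s.strip()]
    if not required:
        return 0.0, []
    missing = [s for s in required if s.lower() not in resume]
    score = (len(required) - len(missing)) / len(required)
    return score, missing


# Hypothetical sample data in the shape of the enriched job records.
jobs = [
    {"title": "ML Engineer", "skills": ["Python", "Spark", "MLOps"]},
    {"title": "Data Engineer", "skills": ["SQL", "Python"]},
]
resume_skills = ["python", "sql", "Docker"]

for job in jobs:
    score, missing = match_score(resume_skills, job["skills"])
    print(f"{job['title']}: {score:.0%} match, missing {missing}")
```

Exact string matching is the simplest choice here. It will not treat "K8s" and "Kubernetes" as the same skill, so a real matcher may need normalization or fuzzy matching.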
Google Custom Search API
|
[FastAPI container] <-- POST /search_and_save/jobs
|
BeautifulSoup scraper <-- Google Careers & Meta Careers pages
|
Gemini 2.5 Flash <-- skills extraction
|
Google Cloud Storage <-- dated JSON blobs
|
[Cron container] <-- triggers FastAPI daily at 8 AM
|
[Streamlit container]
|-- Page 1: Job Table <-- filterable by company, deduped by link
|-- Page 2: AI Chatbot <-- context-aware Gemini chat over your job data
+-- Page 3: Resume Matcher <-- PDF upload, skill matching, gap analysis
All three containers communicate over a shared Docker bridge network.
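To make the storage step in the diagram concrete, here is a sketch of how one day's enriched records might be serialized into a dated JSON blob. The record fields (date, title, skills, link) come from the feature list. The `JobRecord` class, the `blob_name` naming scheme, and the sample values are assumptions for illustration, and the actual upload to Google Cloud Storage is omitted.

```python
import datetime
import json
from dataclasses import asdict, dataclass


@dataclass
class JobRecord:
    # Fields listed in the README: date, title, skills, link.
    date: str
    title: str
    skills: list[str]
    link: str


def blob_name(company, day):
    # Hypothetical naming scheme: one JSON blob per company per day.
    return f"{company.lower()}/{day.isoformat()}.json"


def to_blob_payload(records):
    # Serialize the day's records as a JSON array.
    return json.dumps([asdict(r) for r in records], indent=2)


records = [
    JobRecord(
        date="2025-01-15",
        title="Software Engineer",
        skills=["Python", "Spark", "MLOps"],
        link="https://example.com/jobs/1",  # placeholder link
    )
]
name = blob_name("Google", datetime.date(2025, 1, 15))
payload = to_blob_payload(records)
print(name)
print(payload)
```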
| Layer | Technology |
|---|---|
| REST API | FastAPI + Pydantic |
| Frontend | Streamlit (multi-page) |
| Container orchestration | Docker Compose |
| Scheduling | cron (Linux container) |
| Web scraping | BeautifulSoup 4 |
| LLM | Gemini 2.5 Flash (google-genai SDK) |
| Cloud storage | Google Cloud Storage |
| Job search | Google Custom Search API |
| Language | Python 3.13 |
- Docker Desktop installed and running
- A GCP project with Cloud Storage enabled
- A GCP service account JSON key file
- API keys for Google Custom Search and Gemini
1. Clone the repo:

       git clone https://github.com/lokeshmuvva/roleindex.git
       cd roleindex

2. Copy the example env file and fill in your credentials:

       cp .env.example .env

3. Build and start:

       docker compose build
       docker compose up

4. Open http://localhost in your browser.
- Table View -- filterable job table with date, title, skills, and clickable links
- Chatbot -- ask Gemini anything about your current job postings
- Resume Matcher -- upload a PDF resume and see match scores against all jobs
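The Table View filtering (company checkboxes, date range, one row per link) can be sketched in plain Python as below. This is an illustration under assumptions, not the code in dashboard.py: the `company` field on each row and the `filter_jobs` helper are hypothetical, and the Streamlit widgets are left out.

```python
def filter_jobs(rows, companies, start, end):
    """Keep rows for the selected companies and date range, one row per link."""
    seen_links = set()
    kept = []
    for row in rows:
        if row["company"] not in companies:
            continue
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if not (start <= row["date"] <= end):
            continue
        if row["link"] in seen_links:
            continue  # same posting collected on more than one day
        seen_links.add(row["link"])
        kept.append(row)
    return kept


# Hypothetical rows; the first two are the same posting seen on two days.
rows = [
    {"company": "Google", "date": "2025-01-14", "title": "Data Engineer",
     "link": "https://example.com/g/1"},
    {"company": "Google", "date": "2025-01-15", "title": "Data Engineer",
     "link": "https://example.com/g/1"},
    {"company": "Meta", "date": "2025-01-15", "title": "ML Engineer",
     "link": "https://example.com/m/7"},
]
visible = filter_jobs(rows, companies={"Google"}, start="2025-01-01", end="2025-01-31")
print(visible)
```

Deduplicating after the company and date filters means a posting outside the selected range never hides a copy of it inside the range.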
To trigger a data fetch without waiting for the 8 AM cron:
curl -X POST http://localhost:8000/search_and_save/jobs \
  -H "Content-Type: application/json" \
  -d '{"no_days_to_search": 5, "job_title": "engineer", "company_dict": {"Meta": "http://www.metacareers.com/jobs", "Google": "https://www.google.com/about/careers/applications/jobs"}}'

roleindex/
|-- docker-compose.yml # 3-service orchestration
|-- .env.example # template for secrets
|-- fastapi/
| |-- Dockerfile
| |-- environment.yml
| |-- extract_save_data.py # FastAPI app + pipeline endpoint
| |-- gemini_summarizer.py # Gemini skills extraction
| |-- google_parser.py # scraper for Google Careers
| |-- meta_parser.py # scraper for Meta Careers
| +-- user_definition.py # env var loading
|-- streamlit/
| |-- Dockerfile
| |-- environment.yml
| |-- main.py # multi-page router
| |-- dashboard.py # job table page
| |-- chatbot.py # AI chatbot page
| |-- resume_matcher.py # resume skill matching page
| +-- user_definition.py # env var loading
+-- crontab/
  |-- Dockerfile
  |-- api-cron # cron schedule (daily at 8 AM)
  +-- entrypoint.sh # startup script with health check
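The manual data fetch shown earlier as a curl command can also be sent from Python using only the standard library. The URL and JSON body are copied from that command. The helper names `build_request` and `trigger_fetch` and the 300-second timeout are assumptions, and the shape of the response is not documented here, so the sketch returns it as raw text.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/search_and_save/jobs"

# Same body as the curl example.
payload = {
    "no_days_to_search": 5,
    "job_title": "engineer",
    "company_dict": {
        "Meta": "http://www.metacareers.com/jobs",
        "Google": "https://www.google.com/about/careers/applications/jobs",
    },
}


def build_request(url=API_URL, body=payload):
    # Encode the body as JSON and mark the request as a POST.
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_fetch(timeout=300):
    # Scraping and LLM calls may take a while, hence the generous
    # timeout (an assumed value, tune as needed).
    with urllib.request.urlopen(build_request(), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# With the containers running:
# status, body = trigger_fetch()
```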