RoleIndex

An AI-powered job intelligence platform that automatically discovers, enriches, and lets you chat with job-posting data -- all running entirely in Docker.

What It Does

  1. Scheduled data collection -- A cron job fires every morning and calls the FastAPI backend, which queries the Google Custom Search API for new engineering/data science job postings at Google and Meta.
  2. Skills extraction -- For each job link, the backend scrapes the qualifications section from the company careers page (BeautifulSoup) and sends it to Gemini 2.5 Flash, which returns a concise list of required skills (e.g., ["Python", "Spark", "MLOps"]).
  3. Cloud storage -- Enriched job records (date, title, skills, link) are stored as dated JSON blobs in Google Cloud Storage.
  4. Job table dashboard -- A Streamlit page reads from GCS and renders an interactive, filterable table with company sidebar checkboxes and a date range slider.
  5. AI chatbot -- A second Streamlit page starts a Gemini chat session pre-loaded with all your job data so you can ask questions like "Which Google roles require Kubernetes?" or "Summarize all Meta jobs from this week."
  6. Resume matcher -- Upload a PDF resume and get a skill-match score against every collected job posting, with missing skills highlighted.
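The resume-matching step (6) can be illustrated with a minimal sketch. The actual scoring logic lives in resume_matcher.py and is not shown here; this assumes a simple set-overlap metric and a hypothetical match_score helper:

```python
def match_score(resume_skills, job_skills):
    """Score a resume against one job's skill list; return (percent, gaps).

    Hypothetical illustration of step 6: case-insensitive set overlap
    between resume skills and a posting's required skills.
    """
    resume = {s.lower() for s in resume_skills}
    required = {s.lower() for s in job_skills}
    if not required:
        return 100.0, []
    matched = resume & required
    missing = sorted(required - resume)  # the "missing skills" highlighted in the UI
    return round(100 * len(matched) / len(required), 1), missing


score, gaps = match_score(
    ["Python", "SQL", "Docker"],
    ["Python", "Spark", "MLOps"],
)
# One of three required skills is on the resume; "spark" and "mlops" are gaps.
```

The real app computes this against every collected posting, so each job row in the matcher page gets its own score and gap list.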

Screenshots

To add screenshots: run the app, capture each page, save the images to docs/, and link them here.

Architecture

  [Cron container]             <-- triggers the pipeline daily at 8 AM
        |
  [FastAPI container]          <-- POST /search_and_save/jobs
        |
  Google Custom Search API     <-- finds new job postings
        |
  BeautifulSoup scraper        <-- Google Careers & Meta Careers pages
        |
  Gemini 2.5 Flash             <-- skills extraction
        |
  Google Cloud Storage         <-- dated JSON blobs
        |
  [Streamlit container]
    |-- Page 1: Job Table      <-- filterable by company, deduped by link
    |-- Page 2: AI Chatbot     <-- context-aware Gemini chat over your job data
    +-- Page 3: Resume Matcher <-- PDF upload, skill matching, gap analysis

All three containers communicate over a shared Docker bridge network.
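The "dated JSON blobs" step can be sketched in a few lines. The exact blob naming and record schema are defined in the FastAPI service and not shown here, so the path convention and field layout below are assumptions for illustration:

```python
import json
from datetime import date


def blob_path(day: date) -> str:
    # Hypothetical naming convention: one JSON blob per collection day.
    return f"jobs/{day.isoformat()}.json"


# An enriched record carries the four fields listed under "What It Does".
record = {
    "date": "2025-01-15",
    "title": "Software Engineer, Data Infrastructure",
    "skills": ["Python", "Spark", "MLOps"],
    "link": "https://www.google.com/about/careers/applications/jobs/example",
}

# A day's records serialized for upload to GCS; the Streamlit pages
# read these blobs back and merge them across the selected date range.
payload = json.dumps([record])
```

Storing one blob per day keeps the dashboard's date-range slider cheap: filtering by date becomes a matter of choosing which blobs to read.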

Tech Stack

Layer                     Technology
-----                     ----------
REST API                  FastAPI + Pydantic
Frontend                  Streamlit (multi-page)
Container orchestration   Docker Compose
Scheduling                cron (Linux container)
Web scraping              BeautifulSoup 4
LLM                       Gemini 2.5 Flash (google-genai SDK)
Cloud storage             Google Cloud Storage
Job search                Google Custom Search API
Language                  Python 3.13

Prerequisites

  • Docker Desktop installed and running
  • A GCP project with Cloud Storage enabled
  • A GCP service account JSON key file
  • API keys for Google Custom Search and Gemini

Setup

  1. Clone the repo:

    git clone https://github.com/lokeshmuvva/roleindex.git
    cd roleindex
  2. Copy the example env file and fill in your credentials:

    cp .env.example .env
  3. Build and start:

    docker compose build
    docker compose up
  4. Open http://localhost in your browser.

Pages

  • Table View -- filterable job table with date, title, skills, and clickable links
  • Chatbot -- ask Gemini anything about your current job postings
  • Resume Matcher -- upload a PDF resume and see match scores against all jobs
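Because the same posting can be discovered on multiple days, the Table View deduplicates by link (as noted in the architecture diagram). A minimal sketch of that dedupe, with a hypothetical dedupe_by_link helper:

```python
def dedupe_by_link(jobs):
    """Keep the first record seen for each job link.

    Hypothetical illustration of the Table View dedupe: records are
    dicts with at least a "link" key; order of first appearance wins.
    """
    seen, unique = set(), []
    for job in jobs:
        if job["link"] not in seen:
            seen.add(job["link"])
            unique.append(job)
    return unique
```

First-seen-wins keeps the earliest collection date for a posting, so a job that lingers across several daily runs appears only once in the table.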

Manual Data Refresh

To trigger a data fetch without waiting for the 8 AM cron:

curl -X POST http://localhost:8000/search_and_save/jobs \
  -H "Content-Type: application/json" \
  -d '{"no_days_to_search": 5, "job_title": "engineer", "company_dict": {"Meta": "http://www.metacareers.com/jobs", "Google": "https://www.google.com/about/careers/applications/jobs"}}'

Project Structure

roleindex/
|-- docker-compose.yml          # 3-service orchestration
|-- .env.example                # template for secrets
|-- fastapi/
|   |-- Dockerfile
|   |-- environment.yml
|   |-- extract_save_data.py    # FastAPI app + pipeline endpoint
|   |-- gemini_summarizer.py    # Gemini skills extraction
|   |-- google_parser.py        # scraper for Google Careers
|   |-- meta_parser.py          # scraper for Meta Careers
|   +-- user_definition.py      # env var loading
|-- streamlit/
|   |-- Dockerfile
|   |-- environment.yml
|   |-- main.py                 # multi-page router
|   |-- dashboard.py            # job table page
|   |-- chatbot.py              # AI chatbot page
|   |-- resume_matcher.py       # resume skill matching page
|   +-- user_definition.py      # env var loading
+-- crontab/
    |-- Dockerfile
    |-- api-cron                # cron schedule (daily at 8 AM)
    +-- entrypoint.sh           # startup script with health check
