RoleIndex

An AI-powered job intelligence platform that automatically discovers, enriches, and lets you chat with job-posting data -- all running entirely in Docker.

What It Does

  1. Scheduled data collection -- A cron job fires every morning and calls the FastAPI backend, which queries the Google Custom Search API for new engineering/data science job postings at Google and Meta.
  2. Skills extraction -- For each job link, the backend scrapes the qualifications section from the company careers page (BeautifulSoup) and sends it to Gemini 2.5 Flash, which returns a concise list of required skills (e.g., ["Python", "Spark", "MLOps"]).
  3. Cloud storage -- Enriched job records (date, title, skills, link) are stored as dated JSON blobs in Google Cloud Storage.
  4. Job table dashboard -- A Streamlit page reads from GCS and renders an interactive, filterable table with company sidebar checkboxes and a date range slider.
  5. AI chatbot -- A second Streamlit page starts a Gemini chat session pre-loaded with all your job data so you can ask questions like "Which Google roles require Kubernetes?" or "Summarize all Meta jobs from this week."
  6. Resume matcher -- Upload a PDF resume and get a skill-match score against every collected job posting, with missing skills highlighted.
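The resume-matching step (6) can be illustrated with a minimal sketch. The actual scoring logic lives in resume_matcher.py and is not shown here; this assumes a simple set-overlap metric and a hypothetical match_score helper:

```python
def match_score(resume_skills, job_skills):
    """Score a resume against one job's skill list; return (percent, gaps).

    Hypothetical illustration of step 6: case-insensitive set overlap
    between resume skills and a posting's required skills.
    """
    resume = {s.lower() for s in resume_skills}
    required = {s.lower() for s in job_skills}
    if not required:
        return 100.0, []
    matched = resume & required
    missing = sorted(required - resume)  # the "missing skills" highlighted in the UI
    return round(100 * len(matched) / len(required), 1), missing


score, gaps = match_score(
    ["Python", "SQL", "Docker"],
    ["Python", "Spark", "MLOps"],
)
# One of three required skills is on the resume; "spark" and "mlops" are gaps.
```

The real app computes this against every collected posting, so each job row in the matcher page gets its own score and gap list.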

Screenshots

To add screenshots: run the app, capture each page, save the images to docs/, and link them here.

Architecture

  [Cron container]             <-- triggers the pipeline daily at 8 AM
        |
  [FastAPI container]          <-- POST /search_and_save/jobs
        |
  Google Custom Search API     <-- finds new job postings
        |
  BeautifulSoup scraper        <-- Google Careers & Meta Careers pages
        |
  Gemini 2.5 Flash             <-- skills extraction
        |
  Google Cloud Storage         <-- dated JSON blobs
        |
  [Streamlit container]
    |-- Page 1: Job Table      <-- filterable by company, deduped by link
    |-- Page 2: AI Chatbot     <-- context-aware Gemini chat over your job data
    +-- Page 3: Resume Matcher <-- PDF upload, skill matching, gap analysis

All three containers communicate over a shared Docker bridge network.
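The "dated JSON blobs" step can be sketched in a few lines. The exact blob naming and record schema are defined in the FastAPI service and not shown here, so the path convention and field layout below are assumptions for illustration:

```python
import json
from datetime import date


def blob_path(day: date) -> str:
    # Hypothetical naming convention: one JSON blob per collection day.
    return f"jobs/{day.isoformat()}.json"


# An enriched record carries the four fields listed under "What It Does".
record = {
    "date": "2025-01-15",
    "title": "Software Engineer, Data Infrastructure",
    "skills": ["Python", "Spark", "MLOps"],
    "link": "https://www.google.com/about/careers/applications/jobs/example",
}

# A day's records serialized for upload to GCS; the Streamlit pages
# read these blobs back and merge them across the selected date range.
payload = json.dumps([record])
```

Storing one blob per day keeps the dashboard's date-range slider cheap: filtering by date becomes a matter of choosing which blobs to read.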

Tech Stack

Layer                     Technology
-----                     ----------
REST API                  FastAPI + Pydantic
Frontend                  Streamlit (multi-page)
Container orchestration   Docker Compose
Scheduling                cron (Linux container)
Web scraping              BeautifulSoup 4
LLM                       Gemini 2.5 Flash (google-genai SDK)
Cloud storage             Google Cloud Storage
Job search                Google Custom Search API
Language                  Python 3.13

Prerequisites

  • Docker Desktop installed and running
  • A GCP project with Cloud Storage enabled
  • A GCP service account JSON key file
  • API keys for Google Custom Search and Gemini

Setup

  1. Clone the repo:

    git clone https://github.com/lokeshmuvva/roleindex.git
    cd roleindex
  2. Copy the example env file and fill in your credentials:

    cp .env.example .env
  3. Build and start:

    docker compose build
    docker compose up
  4. Open http://localhost in your browser.

Pages

  • Table View -- filterable job table with date, title, skills, and clickable links
  • Chatbot -- ask Gemini anything about your current job postings
  • Resume Matcher -- upload a PDF resume and see match scores against all jobs
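Because the same posting can be discovered on multiple days, the Table View deduplicates by link (as noted in the architecture diagram). A minimal sketch of that dedupe, with a hypothetical dedupe_by_link helper:

```python
def dedupe_by_link(jobs):
    """Keep the first record seen for each job link.

    Hypothetical illustration of the Table View dedupe: records are
    dicts with at least a "link" key; order of first appearance wins.
    """
    seen, unique = set(), []
    for job in jobs:
        if job["link"] not in seen:
            seen.add(job["link"])
            unique.append(job)
    return unique
```

First-seen-wins keeps the earliest collection date for a posting, so a job that lingers across several daily runs appears only once in the table.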

Manual Data Refresh

To trigger a data fetch without waiting for the 8 AM cron:

curl -X POST http://localhost:8000/search_and_save/jobs \
  -H "Content-Type: application/json" \
  -d '{"no_days_to_search": 5, "job_title": "engineer", "company_dict": {"Meta": "http://www.metacareers.com/jobs", "Google": "https://www.google.com/about/careers/applications/jobs"}}'

Project Structure

roleindex/
|-- docker-compose.yml          # 3-service orchestration
|-- .env.example                # template for secrets
|-- fastapi/
|   |-- Dockerfile
|   |-- environment.yml
|   |-- extract_save_data.py    # FastAPI app + pipeline endpoint
|   |-- gemini_summarizer.py    # Gemini skills extraction
|   |-- google_parser.py        # scraper for Google Careers
|   |-- meta_parser.py          # scraper for Meta Careers
|   +-- user_definition.py      # env var loading
|-- streamlit/
|   |-- Dockerfile
|   |-- environment.yml
|   |-- main.py                 # multi-page router
|   |-- dashboard.py            # job table page
|   |-- chatbot.py              # AI chatbot page
|   |-- resume_matcher.py       # resume skill matching page
|   +-- user_definition.py      # env var loading
+-- crontab/
    |-- Dockerfile
    |-- api-cron                # cron schedule (daily at 8 AM)
    +-- entrypoint.sh           # startup script with health check
