Intelligent automation tool to extract structured information from ANY job notification PDF using 100% FREE Google Gemini API.
Perfect for government jobs, private sector positions, internships, and international opportunities!
Automatically extracts key information from ANY job notification PDF - whether it's government, private sector, startup, or international positions:
- โ Organization Name - Company/Department/Commission name
- โ Post Name - Job title/designation/role
- โ Total Vacancies - Number of open positions
- โ Salary Range - Compensation/pay scale
- โ Location - Job location/remote options
- โ Qualifications - Required education/certifications
- โ Experience Required - Years/type of experience needed
- โ Application Deadline - Last date to apply
- โ Application Fee - Fee amount (if applicable)
- โ Age Limit - Age criteria (if applicable) Returns clean JSON format ready for database/API integration, automation workflows, or chatbot training.
- Automate government job notification processing
- Extract data from UPSC, SSC, Railway notifications
- Feed structured data to blog/content generators
- Build searchable job databases
- Process company job postings
- Standardize data from multiple sources
- Auto-populate job listings
- Track hiring trends
- Quick summary of lengthy PDFs
- Compare multiple opportunities
- Track application deadlines
- Build personal job database
- API integration ready
- Automation workflow component
- Training data for chatbots
- Analytics and insights
| Technology | Purpose |
|---|---|
| Python 3.13 | Core programming language |
| PyPDF2 | PDF text extraction |
| Google Gemini 2.5 Flash API | AI-powered data extraction (FREE!) |
| OpenAI SDK | API client library |
- Uses Google Gemini API - no credit card required
- 250 requests/day on free tier
- Perfect for startup/MVP stage
- Extracts data in 5-10 seconds
- Handles multi-page PDFs
- Real-time progress updates
- AI understands complex government documents
- Handles various PDF formats
- Smart "Not mentioned" handling for missing data
- JSON format for easy integration
- Structured, consistent data
- Ready for automation workflows
- All processing on-device
- No data stored externally
- API key stays local
- Python 3.10 or higher
- FREE Google Gemini API key
- Clone this repository git clone https://github.com/kdeepak2001/jobyaari-pdf-extractor.git cd jobyaari-pdf-extractor
- Install required packages pip install PyPDF2 openai
- Get FREE Gemini API key
- Go to Google AI Studio
- Sign in with Google account
- Click "Get API key" โ "Create API key in new project"
- Copy your key (starts with
AIzaSy...)
- Add your API key to the code
- Open
pdf_extractor.pyin any text editor - Find line 14:
api_key="YOUR_GEMINI_API_KEY_HERE" - Replace
YOUR_GEMINI_API_KEY_HEREwith your actual key - Save the file
- Open
-
Place your PDF in the same folder and name it
sample_job.pdf -
Run the extractor python pdf_extractor.py
-
Check results in
extracted_job_data.json
{ organization_name": "UNION PUBLIC SERVICE COMMISSION", "post_name": "Combined Medical Services Examination 2025", "total_vacancies": "150", "salary": "Rs. 56,100 - 1,77,500", "location": "All India", "qualifications": "MBBS degree from recognized university", "experience_required": "Not mentioned", "application_deadline": "15th October 2025", "application_fee": "Rs. 100", "age_limit": "21-32 years }
This project demonstrates:
- Practical use of LLMs (Large Language Models)
- Prompt engineering for structured extraction
- API integration and error handling
- Addresses real JobYaari use case
- Automates manual data entry
- Scalable solution (250 PDFs/day free)
- $0 monthly cost vs paid alternatives
- Perfect for startup MVP stage
- Sustainable for long-term use
- Clean, documented code
- Error handling and user feedback
- Modular, maintainable structure
- Manual PDF processing automation
- Quick data extraction from notifications
- JSON output for database storage
- Blog Generator: Feed extracted data to GPT for article generation
- Social Media Automation: Auto-create posts from job data
- Chatbot Training: Build FAQ database from job notifications
- Analytics Dashboard: Track trends in government hiring
- Multi-PDF Batch Processing: Process 100+ PDFs automatically
- 250 requests/day = 7,500 PDFs/month
- 250K tokens/minute = Handles large multi-page PDFs
- No time limit on free tier
- Average processing: 8-12 seconds/PDF
- Can process 30-40 PDFs/hour
- Suitable for JobYaari's daily job notification volume
-
PDF Text Extraction (PyPDF2)
- Reads all pages from PDF
- Extracts raw text content
- Handles multi-page documents
-
AI Processing (Gemini API)
- Sends text to Gemini 2.5 Flash model
- Uses structured prompt for JSON extraction
- Temperature set to 0.1 for consistency
-
Data Structuring
- Validates JSON output
- Saves to file for persistence
- Ready for database insertion
| Component | Purpose |
|---|---|
extract_pdf_text() |
Reads PDF file and extracts raw text from all pages |
extract_job_details() |
Sends text to Gemini AI and extracts structured JSON data |
main execution |
Coordinates the workflow and handles file I/O |
- OCR Support - Handle scanned/image-based PDFs
- Batch Processing - Process multiple PDFs in one run
- Web Interface - Streamlit UI for non-technical users
- API Endpoint - REST API for integration
- Auto-Categorization - Classify jobs by department/type
- Email Notifications - Alert when new jobs extracted
- Database Integration - Direct MongoDB/PostgreSQL save
Built by: K Deepak
Purpose: AI Agent Development Internship Application at JobYaari
Built in: October 2025
Time to Build: 1 day (concept to deployment)
I'm passionate about using AI to solve real-world problems and make information accessible. While researching JobYaari's AI Agent Development Internship, I identified their core challenge: automating government job notification processing. Instead of just applying with a resume, I built the exact solution they need. This demonstrates my approach: Don't just talk about skills - build working solutions.
JobYaari's mission to simplify job notification pdf for millions of aspirants aligns perfectly with my goal to build practical, scalable AI solutions for social impact. This tool processes 250 PDFs/day at zero cost - perfect for their growth stage.
This is a portfolio project, but suggestions are welcome! Found a bug? Have an idea? Open an issue!
MIT License - Free to use and modify
- Google for FREE Gemini API access
- JobYaari for the inspiration and mission
- Open source community for amazing tools
Open to: AI/ML Internships | Prompt Engineering Roles | Automation Projects
๐ก Interested in this project or want to collaborate? Feel free to reach out!
If you find this project useful, please give it a star! โญ It helps others discover this solution and supports my application to JobYaari!
Built with โค๏ธ for automating job information accessiblity