UPSC Answer Script Evaluation UI

A Streamlit app that takes a scanned PDF of UPSC answer sheets, runs OCR (Tesseract or Google Vision), parses question-answer pairs, and evaluates the answers against a rubric. It works without an LLM (heuristic baseline) or with OpenAI for improved parsing and scoring.

Features

PDF to image rendering (PyMuPDF)
OCR via:
- Tesseract (local, free)
- Google Cloud Vision (service account JSON)
Q/A parsing:
- Heuristic baseline (no LLM)
- OpenAI model (e.g., gpt-4o-mini) for structured JSON extraction
Evaluation:
- Heuristic baseline scoring
- OpenAI-based rubric scoring (Relevance, Accuracy, Depth, Structure, Language)
JSON report download
Toggle to skip the first two pages during OCR (useful to ignore cover/index pages)
Optional factual grounding via web search to produce specific, cited comments
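
As a rough illustration of the heuristic (no LLM) Q/A parsing listed above, the sketch below splits OCR text on question markers. It assumes questions start a line with a marker such as "Q1." or "Q 2)"; the function name split_qa and the regex are hypothetical and are not taken from core/evaluator.py.

```python
import re

# Assumed marker format: "Q1.", "Q 2)", "Q.3:" at the start of a line.
QUESTION_MARKER = re.compile(r"^\s*Q\.?\s*(\d+)\s*[.):]\s*", re.MULTILINE)


def split_qa(ocr_text: str) -> list[dict]:
    """Split OCR text into one block per detected question marker."""
    matches = list(QUESTION_MARKER.finditer(ocr_text))
    blocks = []
    for i, m in enumerate(matches):
        # A block runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(ocr_text)
        body = ocr_text[m.end():end].strip()
        # Assumption: the first line is the question, the rest is the answer.
        question, _, answer = body.partition("\n")
        blocks.append(
            {
                "number": int(m.group(1)),
                "question": question.strip(),
                "answer": answer.strip(),
            }
        )
    return blocks
```

Real scans rarely have markers this clean, which is one reason the OpenAI option exists for structured extraction.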

Setup (Windows / PowerShell)

Install Python 3.10+ and pip.
Create a virtual environment (optional but recommended):

py -m venv .venv
.\.venv\Scripts\Activate.ps1

Install Python dependencies:

pip install -r requirements.txt

Install Tesseract (for local OCR):

Download the Windows installer from: https://github.com/UB-Mannheim/tesseract/wiki
After installation, note the path to tesseract.exe (commonly C:\Program Files\Tesseract-OCR\tesseract.exe).
In the app sidebar, provide this path if Tesseract is not on PATH.
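
The path lookup described above can be sketched as follows. This is a minimal illustration, not the app's own code: resolve_tesseract is a hypothetical helper, and the choice to prefer the sidebar value over PATH is an assumption.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_tesseract(user_path: Optional[str] = None) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found."""
    # A path typed into the sidebar wins, as long as it points at a real file.
    if user_path and Path(user_path).is_file():
        return user_path
    # Otherwise fall back to whatever "tesseract" resolves to on PATH.
    return shutil.which("tesseract")
```

If the OCR layer uses pytesseract, the returned path would typically be assigned to pytesseract.pytesseract.tesseract_cmd before running OCR.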

Optional: Configure Google Vision OCR

Create a Google Cloud project and enable the Vision API.
Create a service account and download the JSON key (a .json file).
In the app sidebar, choose "Google Vision" as the OCR provider and upload the JSON key file when prompted. The app stores it in a temporary file for this session and uses it for OCR.
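
One way the temporary-file step could work is sketched below. The helper name store_service_account_key is hypothetical, and exposing the path through the GOOGLE_APPLICATION_CREDENTIALS environment variable is an assumption about how the key reaches the Vision client; the app may pass the path to the client directly instead.

```python
import os
import tempfile


def store_service_account_key(key_bytes: bytes) -> str:
    """Write uploaded key bytes to a temp file and return its path."""
    # delete=False keeps the file after the handle closes,
    # so the OCR client can read it later in the session.
    with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
        tmp.write(key_bytes)
        path = tmp.name
    # Assumption: credentials are picked up from the standard env variable.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

Because the file is not deleted automatically, it is worth removing it with os.remove(path) once the session ends.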

Optional: Use OpenAI for parsing/scoring

Set your OpenAI key in the sidebar, or set it as an environment variable before running:

$env:OPENAI_API_KEY = "sk-..."
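
On the Python side, choosing between the two key sources might look like the sketch below. resolve_openai_key is a hypothetical helper, and giving the sidebar value priority over the environment variable is an assumption.

```python
import os
from typing import Optional


def resolve_openai_key(sidebar_value: str = "") -> Optional[str]:
    """Prefer a key typed into the sidebar, else fall back to OPENAI_API_KEY."""
    key = sidebar_value.strip() or os.environ.get("OPENAI_API_KEY", "").strip()
    # None signals that the app should stay on the heuristic path.
    return key or None
```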

Run the app

streamlit run app.py

Then open the local URL shown (typically http://localhost:8501).

Usage Tips

Start with Tesseract OCR for a quick baseline. If it struggles with the handwriting, try Google Vision.
If you don’t have an OpenAI key, the app still works with heuristic parsing and scoring.
Adjust rubric weights in the sidebar based on the marking scheme.
If your PDF's first two pages are covers or instructions, enable "Skip first 2 pages during OCR" in the sidebar. Rendering/preview still shows all pages, but OCR starts from page 3.
Enable "Factual grounding via web search" when using OpenAI to make comments more specific and include citations. This uses DuckDuckGo search; no API key required.
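
Two of the tips above, page skipping and rubric weights, come down to simple arithmetic. The sketch below is illustrative only: the function names, the 0-based page indices, and the weighted-average formula are assumptions, not the app's actual implementation.

```python
RUBRIC = ("Relevance", "Accuracy", "Depth", "Structure", "Language")


def pages_to_ocr(total_pages: int, skip_first_two: bool) -> list[int]:
    """Return the 0-based page indices that should be sent to OCR."""
    # With the toggle on, OCR starts at index 2, i.e. page 3.
    start = 2 if skip_first_two else 0
    return list(range(start, total_pages))


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion scores into a single weighted average."""
    total_weight = sum(weights[c] for c in RUBRIC)
    if total_weight <= 0:
        raise ValueError("At least one rubric weight must be positive")
    return sum(scores[c] * weights[c] for c in RUBRIC) / total_weight
```

For a 5-page PDF with the toggle on, pages_to_ocr(5, True) returns [2, 3, 4]. With every weight set to 1.0 and scores of 8, 6, 5, 7, and 9 in rubric order, weighted_score returns 7.0; raising one weight pulls the result toward that criterion's score.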

Project Structure

app.py – Streamlit UI
core/ – Core logic package
- evaluator.py – Evaluation and parsing logic
- vlm.py – Vision Language Model integration
- ocr.py – OCR handling (Tesseract/Google Vision)
- pdf.py – PDF processing
- common.py – Shared utilities
- models.py – Data models
requirements.txt – Python dependencies
EvaluationPrototype.ipynb – Original prototype notebook

Troubleshooting

PyMuPDF or Tesseract import errors: ensure pip install -r requirements.txt was successful.
Tesseract not found: provide full path to tesseract.exe in the sidebar.
Google Vision errors: verify the service account has Vision API access and that you uploaded a valid JSON key in the sidebar.
OpenAI errors: check the model name (e.g., gpt-4o-mini) and key validity.

Error "cannot import name 'Sentinel' from 'typing_extensions'": the installed typing_extensions package is outdated.

Fix:

pip install -U typing_extensions
# or ensure prod env uses the pinned version from requirements.txt
pip install -r requirements.txt
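
To confirm the fix took effect in the environment that actually runs the app, a quick check along these lines can help. The helper name has_sentinel is hypothetical; it only tests whether the installed package exposes the Sentinel name.

```python
import importlib.util


def has_sentinel() -> bool:
    """Report whether the installed typing_extensions exposes Sentinel."""
    if importlib.util.find_spec("typing_extensions") is None:
        return False  # package is not installed at all
    import typing_extensions

    return hasattr(typing_extensions, "Sentinel")


if __name__ == "__main__":
    print("Sentinel available:", has_sentinel())
```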
