Abhaykumar9035/OCR

Uniarch OCR & Answer Assessor 📝

A Streamlit application designed to process scanned handwritten answer sheets (PDF format), extract answers using the Qwen-VL model, merge multi-page answers, and provide AI-powered assessment and chat functionalities using Google Gemini.

Placeholder Screenshot (Add a screenshot of the running application here)

Features ✨

  • PDF Upload: Upload multi-page PDF answer sheets.
  • Image Conversion: Converts PDF pages to images using PyMuPDF.
  • Advanced OCR: Utilizes the Qwen/Qwen2.5-VL-7B-Instruct model for OCR, specifically tailored to extract structured answer data (number + text) based on predefined layout rules (delimiters, number boxes).
  • JSON Output: OCR process generates structured JSON output per page.
  • Answer Merging: Intelligently merges answer text that spans multiple pages based on "Continuation" markers identified during OCR.
  • Verification Tab: Allows users to view the original image and the raw/parsed OCR output for each page.
  • AI Assessment: Uses Google Gemini (gemini-1.5-flash) to assess the quality, clarity, and coherence of the extracted answer text.
  • AI Chat Assistant: Provides a chat interface powered by Google Gemini (gemini-1.5-pro) for asking questions about the extracted content or assessments.
  • GPU Accelerated: Leverages GPU for faster Qwen-VL model inference (torch, accelerate).
  • Memory Optimization: Uses float16 precision for the Qwen model to reduce memory footprint.
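The answer-merging step can be pictured with a minimal sketch. It assumes each page's OCR JSON is a list of entries with the hypothetical fields `number` and `text`, and that a continued answer carries the number "Continuation"; the actual schema and logic in main.py may differ.

```python
from typing import Dict, List, Optional

CONTINUATION = "Continuation"  # marker name taken from the feature description


def merge_answers(pages: List[List[Dict[str, str]]]) -> Dict[str, str]:
    """Merge per-page OCR entries into one text per answer number.

    Each entry is assumed to look like {"number": "3", "text": "..."}.
    An entry whose number is "Continuation" is appended to the most
    recently seen answer, even if that answer began on an earlier page.
    """
    merged: Dict[str, str] = {}
    last_number: Optional[str] = None
    for page in pages:
        for entry in page:
            number = entry.get("number", "").strip()
            text = entry.get("text", "").strip()
            if number == CONTINUATION:
                if last_number is None:
                    continue  # orphaned continuation: nothing to attach it to
                target = last_number
            else:
                target = number
                last_number = number
            # Append to existing text, or start a new answer
            merged[target] = (merged.get(target, "") + " " + text).strip()
    return merged


pages = [
    [{"number": "1", "text": "First answer"},
     {"number": "2", "text": "Second answer starts"}],
    [{"number": "Continuation", "text": "and ends here."},
     {"number": "3", "text": "Third"}],
]
merged = merge_answers(pages)
```

Here the continuation entry on the second page is joined to answer 2 from the first page, so `merged["2"]` holds the full text.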

Setup and Installation ⚙️

Prerequisites

  • Python: 3.9+
  • pip: Package installer for Python.
  • Git: (Optional) For cloning the repository.
  • NVIDIA GPU: Required for running the Qwen-VL model efficiently.
  • CUDA Toolkit & cuDNN: Compatible versions installed for your NVIDIA driver and PyTorch.
  • Google Gemini API Key: You need an API key from Google AI Studio.
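To judge how much GPU memory the OCR model needs, a rough weights-only estimate helps. The sketch below assumes about 7 billion parameters (read off the "7B" in the model name). It ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


PARAMS = 7e9  # assumption: ~7 billion parameters, per the "7B" in the model name

fp32_gib = weight_memory_gib(PARAMS, 4)  # float32 uses 4 bytes per parameter
fp16_gib = weight_memory_gib(PARAMS, 2)  # float16 uses 2 bytes per parameter

print(f"float32 weights: ~{fp32_gib:.1f} GiB")
print(f"float16 weights: ~{fp16_gib:.1f} GiB")
```

Under this assumption, float16 halves the weight footprint from roughly 26 GiB to roughly 13 GiB.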

Installation Steps

  1. Clone the Repository (Optional):

    git clone <your-repo-url>
    cd <your-repo-directory>

    Alternatively, just place main.py and requirements.txt in a directory.

  2. Create a Virtual Environment (Recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`

  3. Install Dependencies:

    pip install -r requirements.txt

    Note: Installing PyTorch can take a while, and the correct build depends on your CUDA setup. Make sure the installed PyTorch build matches your CUDA Toolkit and driver versions.

  4. Configure API Key:

    • ⚠️ Security Warning: The current code hardcodes the Google Gemini API key. This is highly insecure for shared or deployed applications.
    • Recommended Method (Streamlit Secrets):
      • Create a directory .streamlit in your project folder.
      • Inside .streamlit, create a file named secrets.toml.
      • Add your API key to secrets.toml:
        # .streamlit/secrets.toml
        GOOGLE_API_KEY="AIzaSy..."
      • Modify main.py to load the key using st.secrets:
        # Replace the hardcoded key section in main.py
        try:
            # Load the key from Streamlit secrets
            api_key = st.secrets["GOOGLE_API_KEY"]
        except Exception:
            # Report the problem (remove any hardcoded fallback for production)
            st.error("Google API key not found in Streamlit secrets (.streamlit/secrets.toml)")
            api_key = None  # Or use a hardcoded key for local testing ONLY if necessary

        if api_key:
            try:
                genai.configure(api_key=api_key)
                # list_models() returns an iterable; pull one item so the
                # request is actually sent and a bad key fails here
                next(iter(genai.list_models()), None)
                st.session_state.api_key_configured = True
            except Exception as e:
                st.error(f"Gemini API configuration failed: {e}", icon="❌")
                st.session_state.api_key_configured = False
        else:
            st.session_state.api_key_configured = False

        # Remove the global HARDCODED_API_KEY variable and its usage
    • Alternative (Environment Variables): Set an environment variable GOOGLE_API_KEY and load it in Python using os.getenv("GOOGLE_API_KEY").
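The two key-loading options can be combined into one lookup. The helper below is a minimal sketch, not code from main.py: `resolve_api_key` is a hypothetical name, and it takes plain mappings so it can be tried without Streamlit installed.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the Gemini API key from secrets first, then the environment.

    Returns None when neither source holds a non-empty key, so the caller
    can show an error instead of falling back to a hardcoded value.
    """
    key = secrets.get("GOOGLE_API_KEY") or environ.get("GOOGLE_API_KEY")
    if key and key.strip():
        return key.strip()
    return None


# Secrets take precedence over the environment variable
key = resolve_api_key({"GOOGLE_API_KEY": "from-secrets"},
                      {"GOOGLE_API_KEY": "from-env"})
```

In main.py the `secrets` argument could be `st.secrets`, which may raise if no secrets file exists, or an empty dict when only the environment variable is used.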

Running the Application 🚀

  1. Ensure your virtual environment is activated.
  2. Make sure the API key is configured (preferably using secrets).
  3. Run the Streamlit app:

    streamlit run main.py

  4. Open your web browser and navigate to the local URL provided by Streamlit (usually http://localhost:8501).

Docker Setup (GPU Required) 🐳

You can run this application inside a Docker container, leveraging GPU acceleration via the NVIDIA Container Toolkit.

Prerequisites

  • Docker: Install Docker Desktop or Docker Engine.
  • NVIDIA Container Toolkit: Required for GPU access inside Docker containers; see NVIDIA's installation guide.

Build the Docker Image

docker build -t uniarch-ocr-assessor .

Run the Docker Container

  • Using Streamlit Secrets: Mount your .streamlit directory into the container.

    docker run --gpus all -p 8501:8501 \
      -v "$(pwd)/.streamlit:/app/.streamlit" \
      uniarch-ocr-assessor

  • Using Environment Variables: Pass the API key as an environment variable.

    docker run --gpus all -p 8501:8501 \
      -e GOOGLE_API_KEY="AIzaSy..." \
      uniarch-ocr-assessor

    (Remember to modify main.py to read the key from os.getenv("GOOGLE_API_KEY") if using this method).

Access the application at http://localhost:8501 in your browser.

Key Technologies 🛠️

  • Streamlit: Web application framework.
  • Qwen-VL (Transformers): Vision-Language Model for OCR.
  • Google Gemini (google-generativeai): AI model for assessment and chat.
  • PyTorch: Deep learning framework (backend for Transformers).
  • PyMuPDF (fitz): PDF parsing and image conversion.
  • Pillow (PIL): Image manipulation.
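As an illustration of the PDF-to-image step, here is a minimal PyMuPDF sketch. The function names and the 200 DPI default are assumptions for this example, not values taken from main.py.

```python
from typing import List


def dpi_to_zoom(dpi: int) -> float:
    """PDF user space is 72 points per inch, so the zoom factor is dpi / 72."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / 72.0


def pdf_to_png_pages(pdf_bytes: bytes, dpi: int = 200) -> List[bytes]:
    """Render every page of an in-memory PDF to PNG bytes."""
    import fitz  # PyMuPDF; imported lazily so dpi_to_zoom works without it

    zoom = dpi_to_zoom(dpi)
    pages: List[bytes] = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            # Scale both axes equally to render at the requested resolution
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            pages.append(pix.tobytes("png"))
    return pages
```

The PNG bytes can then be opened with Pillow (`Image.open(io.BytesIO(png))`) before being passed to the OCR model. A higher DPI tends to help with handwriting but costs more memory per page.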

Configuration ⚙️

  • API Keys: Google Gemini API key (handle securely!).
  • Models:
    • OCR: Qwen/Qwen2.5-VL-7B-Instruct
    • Assessment: gemini-1.5-flash
    • Chat: gemini-1.5-pro
    • These are hardcoded in main.py but could be made configurable.
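One way to make the model names configurable is to read them from environment variables and fall back to the current defaults. This is a sketch only; the variable names `OCR_MODEL`, `ASSESSMENT_MODEL`, and `CHAT_MODEL` are hypothetical and do not exist in main.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModelConfig:
    """Model identifiers, defaulting to the values listed above."""
    ocr: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    assessment: str = "gemini-1.5-flash"
    chat: str = "gemini-1.5-pro"


def load_model_config(environ: Mapping[str, str] = os.environ) -> ModelConfig:
    """Build a ModelConfig, letting environment variables override defaults."""
    defaults = ModelConfig()
    return ModelConfig(
        ocr=environ.get("OCR_MODEL", defaults.ocr),
        assessment=environ.get("ASSESSMENT_MODEL", defaults.assessment),
        chat=environ.get("CHAT_MODEL", defaults.chat),
    )


# Override only the chat model; the other two keep their defaults
config = load_model_config({"CHAT_MODEL": "gemini-1.5-flash"})
```

With Docker, such variables could be passed using `-e`, the same way as the API key.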
