Skip to content

feat(issue-2): Add project structure for 21st Century Skills VLM Model #17

Open
ravencore06 wants to merge 6 commits into
theapprenticeproject:mainfrom
ravencore06:main
Open

feat(issue-2): Add project structure for 21st Century Skills VLM Model #17
ravencore06 wants to merge 6 commits into
theapprenticeproject:mainfrom
ravencore06:main

Conversation

@ravencore06
Copy link
Copy Markdown

@ravencore06 ravencore06 commented May 9, 2026

HI @manua-glitch ,
In this PR, I've set up the initial directory structure and dependencies for the VLM Evaluation Pipeline to clearly separate it from the existing Voice AI system.

Closes #2
Moving forward, my planned contributions for this project include:

  1. Building the data preparation scripts to format the student artifacts and rubrics.
  2. Developing the LoRA fine-tuning pipeline for an open-source VLM (like Qwen-VL or LLaVA).
  3. Creating the benchmarking and cost-evaluation scripts to ensure we achieve the < ₹0.10 target.

Looking forward to your feedback on this initial setup!

@ravencore06
Copy link
Copy Markdown
Author

ravencore06 commented May 9, 2026

Changes Made for Initial contribution:

  • Documentation: Updated the main README.md to clearly define the two distinct AI initiatives within this repository.
  • Project Structure: Created the vlm_evaluation/ directory to house all VLM-related data, scripts, and model evaluations.
  • Dependencies: Added vlm_evaluation/requirements.txt containing the necessary ML packages (transformers, torch, peft, bitsandbytes, Pillow, accelerate, datasets) required for upcoming VLM fine-tuning and inference tasks.

Related to #2

Checklist

  • Code structure set up
  • Dependencies isolated and defined
  • Documentation updated

@ravencore06 ravencore06 changed the title feat: Add project structure for 21st Century Skills VLM Model feat: Add project structure for 21st Century Skills VLM Model May 14, 2026
@ravencore06 ravencore06 changed the title feat: Add project structure for 21st Century Skills VLM Model feat(issue-2): Add project structure for 21st Century Skills VLM Model May 16, 2026
@ravencore06
Copy link
Copy Markdown
Author

Summary

Added input validation to prevent processing invalid user inputs across the voice AI pipeline.

Changes

  • asr.py: Added validate_transcription() to check for None, empty, too short (<2 chars), or too long (>500 chars) inputs
  • llm.py: Added validation in generate_response() to reject empty inputs, text >1000 chars, or >200 tokens
  • main.py: Integrated validation checks before passing input to LLM with user-friendly error messages

Related to #2

Why

Prevents crashes and provides graceful error handling with clear feedback to users.

@ravencore06
Copy link
Copy Markdown
Author

VLM Evaluation Pipeline Context

What it does

Evaluates student artifacts (images) against 21st-century skills rubrics using Vision Language Models (LLaVA-1.5-7B) with 4-bit quantization for cost efficiency.

Key Components

  • dataset.py: Loads image-metadata JSON, handles relative/absolute paths
  • evaluate.py:
    • Loads LLaVA model with 4-bit quantization
    • Runs inference on images with dynamic rubrics
    • Extracts scores from SCORE: N format
    • Computes metrics (exact accuracy, within-1 accuracy, MAE, parse rate)
  • prompts.py: System prompt + evaluation prompt template with SCORE:/FEEDBACK: format
  • generate_sample_data.py: Creates 3 dummy RGB images + JSON dataset (origami, drawing, clay model)
  • run_benchmark.ps1: PowerShell wrapper to execute evaluation with parameters
  • sample_dataset.json: 3 test entries with ground truth scores (4, 3, 5)

Use Case

Automate grading of student work using open-source VLMs instead of expensive proprietary APIs.

@ravencore06
Copy link
Copy Markdown
Author

Hi @manua-glitch
I've just pushed a major update to PR #17 to directly address the < ₹0.10 cost constraint for the VLM evaluation pipeline.
Based on the excellent points raised by other contributors in this thread (specifically regarding token budget blowouts and evaluation consistency), I have completely overhauled the inference pipeline to optimize for cost and accuracy:

  1. Structured JSON Rubric Schema: I replaced the plain text rubric strings with a strict JSON schema (e.g., {"skill": "creativity", "dimension": "originality", "max_score": 5}). This standardizes the labels for the supervised fine-tuning phase. 2. Constrained Decoding & Native Parsing: I updated the prompt architecture and the evaluate.py pipeline to force the Vision Language Model to output strictly valid JSON instead of free-form text. By parsing JSON natively instead of relying on Regex, we eliminate post-processing errors and cut output token generation by ~70%, which is crucial for hitting the cost target.
    The PR is officially linked to close this issue. I would love to hear your thoughts on this updated approach when you have a moment to review! Let me know if there are any specific adjustments you'd like to see before I move on to the dataset preparation milestone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DMP 2026]: Developing a Cost-Efficient AI Model for Evaluating 21st Century Skills

1 participant