feat(issue-2): Add project structure for 21st Century Skills VLM Model by ravencore06 · Pull Request #17 · theapprenticeproject/C4GT_2026

ravencore06 · 2026-05-09T07:48:17Z

HI @manua-glitch ,
In this PR, I've set up the initial directory structure and dependencies for the VLM Evaluation Pipeline to clearly separate it from the existing Voice AI system.

Closes #2
Moving forward, my planned contributions for this project include:

Building the data preparation scripts to format the student artifacts and rubrics.
Developing the LoRA fine-tuning pipeline for an open-source VLM (like Qwen-VL or LLaVA).
Creating the benchmarking and cost-evaluation scripts to ensure we achieve the < ₹0.10 target.

Looking forward to your feedback on this initial setup!

ravencore06 · 2026-05-09T07:49:40Z

Changes Made for Initial contribution:

Documentation: Updated the main README.md to clearly define the two distinct AI initiatives within this repository.
Project Structure: Created the vlm_evaluation/ directory to house all VLM-related data, scripts, and model evaluations.
Dependencies: Added vlm_evaluation/requirements.txt containing the necessary ML packages (transformers, torch, peft, bitsandbytes, Pillow, accelerate, datasets) required for upcoming VLM fine-tuning and inference tasks.

Related to #2

Checklist

Code structure set up
Dependencies isolated and defined
Documentation updated

ravencore06 · 2026-05-21T07:48:38Z

Summary

Added input validation to prevent processing invalid user inputs across the voice AI pipeline.

Changes

asr.py: Added validate_transcription() to check for None, empty, too short (<2 chars), or too long (>500 chars) inputs
llm.py: Added validation in generate_response() to reject empty inputs, text >1000 chars, or >200 tokens
main.py: Integrated validation checks before passing input to LLM with user-friendly error messages

Related to #2

Why

Prevents crashes and provides graceful error handling with clear feedback to users.

ravencore06 · 2026-05-21T07:49:49Z

VLM Evaluation Pipeline Context

What it does

Evaluates student artifacts (images) against 21st-century skills rubrics using Vision Language Models (LLaVA-1.5-7B) with 4-bit quantization for cost efficiency.

Key Components

dataset.py: Loads image-metadata JSON, handles relative/absolute paths
evaluate.py:
- Loads LLaVA model with 4-bit quantization
- Runs inference on images with dynamic rubrics
- Extracts scores from SCORE: N format
- Computes metrics (exact accuracy, within-1 accuracy, MAE, parse rate)
prompts.py: System prompt + evaluation prompt template with SCORE:/FEEDBACK: format
generate_sample_data.py: Creates 3 dummy RGB images + JSON dataset (origami, drawing, clay model)
run_benchmark.ps1: PowerShell wrapper to execute evaluation with parameters
sample_dataset.json: 3 test entries with ground truth scores (4, 3, 5)

Use Case

Automate grading of student work using open-source VLMs instead of expensive proprietary APIs.

ravencore06 · 2026-05-25T14:59:24Z

Hi @manua-glitch
I've just pushed a major update to PR #17 to directly address the < ₹0.10 cost constraint for the VLM evaluation pipeline.
Based on the excellent points raised by other contributors in this thread (specifically regarding token budget blowouts and evaluation consistency), I have completely overhauled the inference pipeline to optimize for cost and accuracy:

Structured JSON Rubric Schema: I replaced the plain text rubric strings with a strict JSON schema (e.g., {"skill": "creativity", "dimension": "originality", "max_score": 5}). This standardizes the labels for the supervised fine-tuning phase. 2. Constrained Decoding & Native Parsing: I updated the prompt architecture and the evaluate.py pipeline to force the Vision Language Model to output strictly valid JSON instead of free-form text. By parsing JSON natively instead of relying on Regex, we eliminate post-processing errors and cut output token generation by ~70%, which is crucial for hitting the cost target.
The PR is officially linked to close this issue. I would love to hear your thoughts on this updated approach when you have a moment to review! Let me know if there are any specific adjustments you'd like to see before I move on to the dataset preparation milestone.

ravencore06 added 2 commits May 3, 2026 07:18

Basic voice AI model added

0fb42e8

Initial Contribution

12cc486

VLM Evaluation Pipeline

eb5b74e

ravencore06 changed the title ~~feat: Add project structure for 21st Century Skills VLM Model~~ feat: Add project structure for 21st Century Skills VLM Model May 14, 2026

ravencore06 changed the title ~~feat: Add project structure for 21st Century Skills VLM Model~~ feat(issue-2): Add project structure for 21st Century Skills VLM Model May 16, 2026

Add Input Validation

002561b

feat: Implement structured JSON outputs and constrained decoding

43ba3cd

ravencore06 force-pushed the main branch from 43ba3cd to 002561b Compare May 25, 2026 14:41

feat: Implement JSON rubric schema and constrained decoding

624bbe2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(issue-2): Add project structure for 21st Century Skills VLM Model #17

feat(issue-2): Add project structure for 21st Century Skills VLM Model #17
ravencore06 wants to merge 6 commits into
theapprenticeproject:mainfrom
ravencore06:main

ravencore06 commented May 9, 2026 •

edited

Loading

Uh oh!

ravencore06 commented May 9, 2026 •

edited

Loading

Uh oh!

ravencore06 commented May 21, 2026

Uh oh!

ravencore06 commented May 21, 2026

Uh oh!

ravencore06 commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ravencore06 commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ravencore06 commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

ravencore06 commented May 21, 2026

Summary

Changes

Related to #2

Why

Uh oh!

ravencore06 commented May 21, 2026

VLM Evaluation Pipeline Context

What it does

Key Components

Use Case

Uh oh!

ravencore06 commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ravencore06 commented May 9, 2026 •

edited

Loading

ravencore06 commented May 9, 2026 •

edited

Loading