Skip to content

feat: integrate Modi OCR into Vakil Friend chatbot#455

Open
nimkarprachi17 wants to merge 1 commit into
viru0909-dev:mainfrom
nimkarprachi17:feat/chatbot-ocr-integration
Open

feat: integrate Modi OCR into Vakil Friend chatbot#455
nimkarprachi17 wants to merge 1 commit into
viru0909-dev:mainfrom
nimkarprachi17:feat/chatbot-ocr-integration

Conversation

@nimkarprachi17
Copy link
Copy Markdown

Pull Request

Description

Integrated the existing Modi OCR pipeline into the Vakil Friend chatbot experience for seamless historical document analysis.

Features Added

  • Added a new OCR scan/upload button in the Vakil Friend chat UI using lucide-react
  • Added support for .jpg, .jpeg, and .png uploads
  • Integrated multipart/form-data upload flow to the /ocr/modi FastAPI endpoint
  • Added loading and graceful OCR error handling states in chat
  • Displayed transliterated OCR text directly inside the chatbot conversation
  • Added Zustand-based documentContext state management for scanned document context
  • Extended chat payloads with ocrContext
  • Injected OCR context into the backend LLM reasoning flow so follow-up questions can reference scanned historical documents naturally

Backend Changes

  • Extended ChatMessageRequest DTO with ocrContext
  • Updated Vakil Friend service prompt construction to prepend OCR document context before user queries

Notes

  • Frontend build completed successfully
  • Backend compilation completed successfully
  • Full end-to-end runtime validation was partially limited locally due to missing PostgreSQL configuration and API environment variables required by the orchestrator

Closes #381

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@vercel
Copy link
Copy Markdown

vercel Bot commented May 18, 2026

@nimkarprachi17 is attempting to deploy a commit to the CodeBlooded's projects Team on Vercel.

A member of the Team first needs to authorize it.

@nimkarprachi17
Copy link
Copy Markdown
Author

Hi @viru0909-dev , I’ve completed the implementation for the Modi OCR integration into Vakil Friend and would really appreciate a review whenever you get time.

This PR adds a full end-to-end OCR-aware conversational workflow rather than just a UI upload feature. The implementation includes:

  • OCR scan/upload integration in the chatbot UI
  • multipart image upload handling for /ocr/modi
  • loading/error states for document scanning
  • OCR text injection directly into chat history
  • Zustand-based persistent documentContext management
  • backend DTO + LLM context injection so follow-up questions can reason over scanned historical documents naturally

I also made sure the feature follows the existing styling/theme system and integrated it into the current Vakil Friend architecture instead of creating a separate scanner flow.
Frontend build and backend compilation were both validated locally. I was not able to complete full end-to-end runtime testing because my local environment did not yet have the required PostgreSQL setup and API environment variables (GROQ_API_KEY, etc.) configured for the orchestrator services, but the OCR routing, payload flow, and integration logic were all verified.

Since this involved coordinated frontend state management, API integration, backend DTO/service updates, and contextual LLM reasoning support, I hope the contribution can be considered under the higher-effort/full-stack category during evaluation. Thank you!

@vercel
Copy link
Copy Markdown

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
nyaysetu Ready Ready Preview, Comment May 18, 2026 3:54pm

@viru0909-dev
Copy link
Copy Markdown
Owner

@nimkarprachi17 add screenshot of output. Thank you.

@nimkarprachi17
Copy link
Copy Markdown
Author

image image Added screenshots demonstrating the integrated OCR workflow inside the Nyay Saarthi chatbot UI, including:
  • OCR scan/upload action in the chat input area
  • document scanning/loading state
  • graceful OCR failure handling for unclear/unreadable uploads

The frontend OCR integration, contextual chat injection flow, and backend payload handling have been completed. Full successful OCR inference could not be demonstrated locally because the orchestrator service still requires complete runtime infrastructure configuration (database + API environment setup).

@viru0909-dev
Copy link
Copy Markdown
Owner

Hi @nimkarprachi17, thank you for this contribution! The UI integration using lucide-react, Zustand state management, and the backend context injection logic all look well implemented.

However, during testing, we found an architectural issue with the OCR routing flow that needs to be fixed before this PR can be merged.

Currently, in VakilFriendPage.jsx, the frontend directly sends the OCR request like this:

const response = await axios.post(`${API_BASE_URL}/ocr/modify`, formData, ...);

Since API_BASE_URL points to the Spring Boot backend, this request goes there first. But the /ocr/modify endpoint actually exists inside the Python FastAPI OCR service, which causes runtime request failures.

To properly maintain the architecture and keep internal services isolated, please update this flow as follows:

  1. Create a new proxy endpoint in the Spring Boot backend (example: /api/vakil-friend/ocr).
  2. Forward the uploaded file internally from Spring Boot to the FastAPI OCR service.
  3. Update the frontend API call to use the new backend endpoint instead of directly calling /ocr/modify.

This keeps the frontend communicating only with the Spring Boot gateway while the backend securely handles internal service communication.

Once updated, please test the complete OCR upload flow locally and update the PR.

Great work so far!

Copy link
Copy Markdown
Owner

@viru0909-dev viru0909-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nimkarprachi17
Copy link
Copy Markdown
Author

Updated the OCR architecture flow as requested.
The frontend no longer communicates directly with the FastAPI OCR service. I added a new Spring Boot proxy endpoint (/api/vakil-friend/ocr) that forwards multipart OCR requests internally to the FastAPI /ocr/modi service using multipart forwarding via RestTemplate.
The frontend OCR upload flow has also been updated to route entirely through the Spring Boot backend gateway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEAT]: Integrate Document Scanner into AI Chatbot for Historical Document Analysis

2 participants