feat: batch multi-template filling with single LLM extraction. Closes…#241

Open
utkarshqz wants to merge 2 commits into fireform-core:main from utkarshqz:feat/batch-multi-template-filling

@utkarshqz

Description

Adds POST /forms/fill/batch — the core multi-agency filing endpoint that closes #156.

A single incident transcript now fills Police, Fire, and Ambulance PDFs simultaneously in one request. The key architectural decision is that LLM extraction runs exactly once for the entire batch. Fields from all requested templates are merged into a dynamic superset, extracted in a single Ollama call, then split back to each template deterministically using Python. No hardcoded field categories. No redundant LLM calls. Works with any PDF structure natively.

Also adds PDF path validation on template load, permanently fixing the stale path bug (#235) that caused silent 500 errors after re-cloning the repository.
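A minimal sketch of the path check described above, assuming the stored template record carries a filesystem path to the PDF. The names `TemplateFileMissing` and `validate_pdf_path` are hypothetical and not taken from this PR; in the actual route the error would presumably be translated into an HTTP 404 response.

```python
from pathlib import Path


class TemplateFileMissing(Exception):
    """Hypothetical error that a route handler could map to HTTP 404."""

    status_code = 404


def validate_pdf_path(pdf_path: str) -> Path:
    """Return the path if the template PDF exists on disk, otherwise raise."""
    path = Path(pdf_path)
    if not path.is_file():
        # A stale record (for example after re-cloning the repository) ends up
        # here instead of failing later inside the PDF filler with a 500.
        raise TemplateFileMissing(
            f"Template PDF not found on disk: {pdf_path}. "
            "Re-upload the template to refresh its stored path."
        )
    return path
```

Running the check before any extraction work means a missing file is reported without spending an LLM call.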

How the single-pass extraction works:

Step 1 — Validate all templates upfront (fail fast before any LLM work)

Step 2 — Merge ALL fields from ALL templates into one dynamic superset:
          Fire form:      {NAME, DEPT, DATE, INCIDENT_TYPE}
          Ambulance form: {PATIENT_NAME, INJURY, LOCATION}
          Police form:    {OFFICER_ID, BADGE, INCIDENT_CODE}
          ─────────────────────────────────────────────────
          Superset:       {NAME, DEPT, DATE, INCIDENT_TYPE,
                           PATIENT_NAME, INJURY, LOCATION,
                           OFFICER_ID, BADGE, INCIDENT_CODE}

Step 3 — ONE LLM call with the full superset → extracted_json

Step 4 — Python splits extracted_json per template (deterministic):
          Fire subset      → fill Fire PDF
          Ambulance subset → fill Ambulance PDF
          Police subset    → fill Police PDF

Step 5 — Save each submission to DB, return per-template download links
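Steps 2 to 4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in this PR: `merge_fields`, `split_by_template`, and `fill_batch` are hypothetical names, and `extract` stands in for whatever function performs the single Ollama call.

```python
from typing import Callable, Dict, List, Optional

Fields = Dict[str, List[str]]       # template id -> field names
Values = Dict[str, Optional[str]]   # field name -> extracted value (or None)


def merge_fields(templates: Fields) -> List[str]:
    """Step 2: union of all template fields, deduplicated, first-seen order."""
    superset: List[str] = []
    seen = set()
    for fields in templates.values():
        for name in fields:
            if name not in seen:
                seen.add(name)
                superset.append(name)
    return superset


def split_by_template(extracted: Values, templates: Fields) -> Dict[str, Values]:
    """Step 4: give each template only its own fields; missing ones become None."""
    return {
        template_id: {name: extracted.get(name) for name in fields}
        for template_id, fields in templates.items()
    }


def fill_batch(
    text: str,
    templates: Fields,
    extract: Callable[[str, List[str]], Values],
) -> Dict[str, Values]:
    """Run extraction once over the superset, then split per template."""
    superset = merge_fields(templates)
    extracted = extract(text, superset)  # Step 3: the only extraction call
    return split_by_template(extracted, templates)
```

One consequence of this design is that a field name shared by two templates is extracted once and receives the same value in both filled forms.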

Files changed:

  • api/routes/forms.py — added POST /forms/fill/batch, added PDF path validation to POST /forms/fill
  • api/schemas/forms.py — added BatchFormFill, BatchResultItem, BatchFormFillResponse schemas
  • api/db/repositories.py — added get_form and get_all_templates helpers
  • src/filler.py — added fill_form_with_data() for deterministic PDF filling from pre-extracted dict
  • tests/conftest.py — added db_session, tmp_pdf, clean_db fixtures
  • tests/test_forms.py — updated 503 test, added 404 path validation test

Fixes #156
Fixes #235
Addresses #206
Addresses #236

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

The existing test suite was extended and all tests pass locally.

New tests added:

  • test_fill_form_pdf_not_on_disk_returns_404 — verifies path validation returns clear 404
  • test_fill_form_ollama_down_returns_503 — updated to use real tmp_pdf fixture so path validation passes before Ollama mock fires

Manual testing steps:

  1. Start Ollama: ollama serve
  2. Start API: uvicorn api.main:app --reload
  3. Upload two PDF templates via POST /templates/create
  4. Send batch request:
curl -X POST http://localhost:8000/forms/fill/batch \
  -H "Content-Type: application/json" \
  -d '{
    "input_text": "John Smith, firefighter, Emergency Medical Services, phone 916-555-0147",
    "template_ids": [1, 2]
  }'
  5. Download filled PDFs from returned download_url values
  • Batch fill with 2 templates — both PDFs filled correctly
  • Partial failure — one invalid template ID returns error for that template, other succeeds
  • Stale path — deleted PDF returns clear 404 instead of 500
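The partial-failure behaviour listed above could be collected per template roughly as follows. This is a hedged sketch: `BatchResult`, `collect_results`, and every field except `download_url` are assumptions for illustration and are not the `BatchResultItem` schema added in this PR.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BatchResult:
    """Hypothetical per-template outcome; only download_url is named in the PR."""

    template_id: int
    success: bool
    download_url: Optional[str] = None
    error: Optional[str] = None


def collect_results(
    template_ids: List[int],
    fill_one: Callable[[int], str],
) -> List[BatchResult]:
    """Fill each template independently so one failure does not abort the rest."""
    results: List[BatchResult] = []
    for template_id in template_ids:
        try:
            url = fill_one(template_id)  # assumed to return a download URL
            results.append(BatchResult(template_id, True, download_url=url))
        except Exception as exc:
            # Record the failure for this template and continue with the others.
            results.append(BatchResult(template_id, False, error=str(exc)))
    return results
```

Returning one entry per requested template lets the caller see which forms were filled and which were not in a single response.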

Test Configuration:

  • OS: Windows 11
  • Python: 3.11
  • Ollama: local, model mistral
  • DB: SQLite (fireform.db)

Test results:

python -m pytest tests/ -v
53 passed, 0 errors

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

⚠️ Documentation note: SETUP.md and README.md should be updated to document the new POST /forms/fill/batch endpoint and its request/response format. This will be addressed in a follow-up documentation PR alongside the src/templates/ refactor (#236).

Development

Successfully merging this pull request may close these issues.

  • [BUG]: Stale template records cause 500 errors after re-clone or file move
  • [FEAT]: Missing Batch Endpoint for Multi-Form Filing