[FEAT]: Non-Blocking Async Extraction, Retry, and Observability for /forms/fill #152

@Acuspeedster

Description

name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Non-blocking async LLM extraction with streaming, retry, and job orchestration"
labels: enhancement
assignees: ''

📝 Description

The current implementation of POST /forms/fill performs LLM extraction synchronously using requests.post() within the FastAPI request lifecycle.

Execution chain:

forms.py → controller.fill_form() → llm.get_data() → requests.post()

Because FastAPI runs on an asyncio event loop (via Uvicorn), this synchronous HTTP call blocks the event loop thread for the entire duration of Ollama inference.

Operational impact:

  • Each field extraction may take 2–5 seconds on CPU inference.
  • With sequential main_loop() (N fields), total blocking time scales linearly.
  • A 10-field form can hold the event loop for 20–50 seconds.
  • Even main_loop_batch() still blocks for the full duration of a single inference call.

This results in:

  • Event loop starvation
  • Inability to serve concurrent requests
  • 30+ second client response latency
  • No retry mechanism for missed fields
  • No per-field confidence visibility
  • No observable progress during extraction
  • No fault tolerance for partial success scenarios
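The starvation effect described above can be reproduced with the standard library alone. In the sketch below, `time.sleep()` stands in for the synchronous `requests.post()` call inside `llm.get_data()`; both handlers are illustrative, not code from the repository:

```python
import asyncio
import time

# Stand-in for the synchronous requests.post() call in llm.get_data():
# time.sleep() blocks the entire event loop, just like a blocking HTTP call.
def blocking_inference(duration: float) -> str:
    time.sleep(duration)
    return "extracted"

async def handler_sync(duration: float) -> str:
    # Declared async, but the call inside is synchronous -> loop starvation.
    return blocking_inference(duration)

async def handler_async(duration: float) -> str:
    # Non-blocking equivalent: the loop can serve other requests meanwhile.
    await asyncio.sleep(duration)
    return "extracted"

async def measure(handler) -> float:
    start = time.perf_counter()
    await asyncio.gather(handler(0.1), handler(0.1))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Two "concurrent" requests: ~0.2 s when blocking, ~0.1 s when async.
    print(f"sync:  {asyncio.run(measure(handler_sync)):.2f}s")
    print(f"async: {asyncio.run(measure(handler_async)):.2f}s")
```

With real 2–5 second inference calls instead of 0.1 s sleeps, the serialization penalty grows to the 20–50 second range quoted above.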

💡 Rationale

PR #151 (currently open) proposes schema enforcement using Ollama’s format parameter and dynamically generated Pydantic models.

While that meaningfully improves output structural reliability, it does not address:

  • Transport-layer blocking
  • Concurrency limitations
  • Retry logic for null/missed fields
  • Event loop starvation
  • Client-side observability
  • Job orchestration

In high-impact public-sector deployments, a system must not only return structured output but also:

  • Remain responsive under load
  • Provide progressive feedback
  • Handle partial failures gracefully
  • Support operational monitoring

This issue proposes a transport and orchestration layer redesign to meet those requirements.


🛠️ Proposed Solution

1️⃣ Asynchronous Concurrent Extraction

  • Replace requests with httpx.AsyncClient
  • Introduce async_extract_all_streaming() in src/llm.py
  • Launch per-field extraction tasks via asyncio.create_task()
  • Collect results using asyncio.as_completed()

This ensures:

  • Wall-clock time is bounded by the slowest field
  • Partial results become available immediately
  • Event loop remains free to serve other requests

If 9 fields resolve in 3 seconds and 1 resolves in 8 seconds:

  • Client receives 9 results at second 3
  • Final result at second 8

Prior implementations return nothing until second 8.
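The fan-out/fan-in shape of `async_extract_all_streaming()` can be sketched with `asyncio` alone. Here `extract_field` is a stand-in for the per-field Ollama call that the real implementation would make with `httpx.AsyncClient`; the latencies simulate uneven inference times:

```python
import asyncio
import time

# Stand-in for a per-field Ollama call made via httpx.AsyncClient.
async def extract_field(field: str, latency: float) -> tuple[str, str]:
    await asyncio.sleep(latency)
    return field, f"value-of-{field}"

async def async_extract_all_streaming(fields: dict[str, float]) -> dict[str, str]:
    # One task per field, all launched concurrently.
    tasks = [asyncio.create_task(extract_field(f, lat)) for f, lat in fields.items()]
    results: dict[str, str] = {}
    # as_completed yields each result the moment its task finishes, which is
    # what lets partial results be streamed to the client immediately.
    for finished in asyncio.as_completed(tasks):
        field, value = await finished
        results[field] = value
    return results

if __name__ == "__main__":
    fields = {"name": 0.02, "dob": 0.03, "address": 0.06}
    start = time.perf_counter()
    results = asyncio.run(async_extract_all_streaming(fields))
    elapsed = time.perf_counter() - start
    # Wall-clock time tracks the slowest field (0.06), not the sum (0.11).
    print(results, f"{elapsed:.2f}s")
```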


2️⃣ Two-Pass Auto-Retry Mechanism

After Pass 1 completes:

  • Any field returning None enters Pass 2.
  • _build_targeted_prompt() constructs a focused single-field prompt.
  • Explicit instruction: return -1 if not found.
  • Retry tasks launched concurrently.

Confidence scoring:

| Confidence | Meaning                   |
|------------|---------------------------|
| high       | Extracted in Pass 1       |
| medium     | Recovered in Pass 2       |
| low        | Missing after both passes |

This provides deterministic field-level reliability reporting.


3️⃣ Non-Blocking PDF Generation

  • Introduce fill_form_with_data() in filler.py
  • Offload pdfrw operations to a ThreadPoolExecutor via loop.run_in_executor()
  • Prevent CPU-bound PDF writes from blocking event loop

Correctness improvement:

  • None values written as empty strings instead of literal "None"

4️⃣ New Client-Observable API Surfaces

POST /forms/fill/stream

  • Returns text/event-stream
  • Emits one Server-Sent Event per field as soon as it resolves
  • Event payload includes:
    • field
    • value
    • confidence
    • phase
  • Final complete event includes:
    • submission_id
    • output_pdf_path
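The wire format for this endpoint can be sketched as a plain async generator emitting SSE `data:` frames (field names, `submission_id`, and paths below are illustrative; in FastAPI the generator would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`):

```python
import asyncio
import json

# Hypothetical event stream for POST /forms/fill/stream: one SSE frame per
# resolved field, then a final "complete" frame.
async def sse_events():
    per_field = [
        {"field": "name", "value": "Jane Doe", "confidence": "high", "phase": 1},
        {"field": "dob", "value": "1990-01-01", "confidence": "medium", "phase": 2},
    ]
    for payload in per_field:
        await asyncio.sleep(0)  # stands in for awaiting the next finished task
        yield f"data: {json.dumps(payload)}\n\n"
    done = {"event": "complete", "submission_id": "sub-123",
            "output_pdf_path": "/tmp/out.pdf"}
    yield f"data: {json.dumps(done)}\n\n"

async def collect() -> list[str]:
    return [frame async for frame in sse_events()]

if __name__ == "__main__":
    for frame in asyncio.run(collect()):
        print(frame, end="")
```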

POST /forms/fill/async

  • Returns 202 with job_id
  • Full extraction pipeline runs as FastAPI BackgroundTask

GET /forms/jobs/{job_id}

Returns:

  • status (pending, running, complete, failed)
  • partial_results
  • field_confidence
  • output_pdf_path
  • error_message

5️⃣ Database Orchestration Layer

Introduce FillJob SQLModel table:

  • UUID primary key
  • template_id
  • input_text
  • status
  • output_pdf_path
  • partial_results (JSON)
  • field_confidence (JSON)
  • error_message
  • created_at

Repository functions:

  • create_job
  • get_job
  • update_job (**kwargs for partial updates)

This enables incremental persistence of extraction progress.
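A stdlib `sqlite3` stand-in for the proposed table and repository functions (the real implementation would declare a `FillJob` SQLModel class with these columns; schema and sample values here are illustrative):

```python
import json
import sqlite3
import uuid

def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE IF NOT EXISTS filljob (
        id TEXT PRIMARY KEY, template_id TEXT, input_text TEXT,
        status TEXT, output_pdf_path TEXT, partial_results TEXT,
        field_confidence TEXT, error_message TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def create_job(conn, template_id: str, input_text: str) -> str:
    job_id = str(uuid.uuid4())  # UUID primary key
    conn.execute(
        "INSERT INTO filljob (id, template_id, input_text, status) "
        "VALUES (?, ?, ?, 'pending')",
        (job_id, template_id, input_text))
    return job_id

def update_job(conn, job_id: str, **kwargs) -> None:
    # **kwargs enables partial updates, e.g. incremental partial_results.
    # Keys are trusted column names from application code, never user input.
    cols = ", ".join(f"{key} = ?" for key in kwargs)
    conn.execute(f"UPDATE filljob SET {cols} WHERE id = ?",
                 (*kwargs.values(), job_id))

def get_job(conn, job_id: str) -> dict:
    row = conn.execute("SELECT * FROM filljob WHERE id = ?",
                       (job_id,)).fetchone()
    return dict(row)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    init_db(conn)
    job_id = create_job(conn, "w9", "Jane Doe, born 1990-01-01")
    update_job(conn, job_id, status="running",
               partial_results=json.dumps({"name": "Jane Doe"}))
    print(get_job(conn, job_id)["status"])  # running
```

Each per-field result can be persisted through `update_job` as it arrives, so `GET /forms/jobs/{job_id}` reflects live progress even if the process restarts mid-run.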


✅ Acceptance Criteria

  • Event loop no longer blocked by extraction
  • Concurrent per-field extraction implemented
  • Two-pass retry operational
  • Field-level confidence scoring implemented
  • SSE streaming endpoint functional
  • Async job queue endpoint functional
  • FillJob model implemented with incremental updates
  • Original /forms/fill endpoint preserved
  • Comprehensive test coverage

📌 Additional Context

This redesign is transport- and orchestration-focused.

It is fully compatible with schema enforcement approaches such as PR #151 and does not conflict with model-level validation strategies.
