Integrate legal retrieval into document generation; add hybrid retriever and RetrievalService #9

Open
delisha02 wants to merge 4 commits into main from
codex/clarify-fact-extraction-processes-and-algorithms-tlweu8
Motivation

  • Ground generated legal drafts in retrieved legal context and preserve source provenance, to reduce hallucinations and improve auditability.
  • Centralize retriever construction so research and generation pipelines use a single configurable retrieval strategy (dense + sparse hybrid).
  • Make the retriever implementation resilient to missing optional dependencies and enable local testing of hybrid behavior.

Description

  • Added a retrieval facade, app/services/retrieval_service.py, and implemented a hybrid in-memory retriever builder, get_hybrid_retriever, with resilient fallbacks in app/agents/legal_research/retrievers.py. The builder supports dense (Chroma + embeddings) and sparse (BM25) retrieval and fuses their results.
  • Integrated retrieval into the document generation pipeline in app/api/v1/endpoints/documents.py: the endpoint builds a compact retrieval query, fetches retrieved_legal_context and retrieved_legal_sources, injects them into case_facts, and returns retrieval_sources in the generation response via the new GeneratedDocumentResponse schema in app/schemas/document.py.
  • Updated the generation prompt in app/agents/document_generator/prompt_templates.py to include a "Grounded Legal Context (Retrieved)" block and fixed logging indentation; updated app/agents/legal_research/agent.py to use RetrievalService; added documentation files (docs/*.md) describing the RAG upgrade plan.
  • Added unit tests in backend/unit_tests/test_retrievers_unit.py that mock the embedding, vector, and BM25 components to validate hybrid retriever construction and the invoke contract.
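
For readers unfamiliar with dense + sparse fusion, the idea can be sketched as follows. This is a minimal illustration, not the code in this PR: the description does not say which fusion method get_hybrid_retriever uses, so weighted reciprocal rank fusion is assumed here, and the function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    k: int = 60,
) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores weight / (k + rank) in every list it appears in;
    the scores are summed across lists. (Assumed fusion rule, see lead-in.)
    """
    if len(rankings) != len(weights):
        raise ValueError("one weight per ranking is required")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked results from a dense (embedding) and a sparse (BM25) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits], weights=[0.5, 0.5])
print(fused)
```

Documents that rank well in both lists (doc_b, doc_a) rise above documents found by only one retriever, which is the property a hybrid retriever is meant to provide.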

Testing

  • Ran the retriever unit tests with pytest backend/unit_tests/test_retrievers_unit.py; they exercise get_hybrid_retriever construction and the ensemble invoke contract, and they passed.
  • Smoke-tested the POST /generate code path locally with retrieval disabled (fallback) to confirm that the endpoint raises no runtime errors when retrieval fails and still returns the persisted document fields.
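
The fallback behaviour checked by the smoke test can be pictured roughly like this. It is a sketch under stated assumptions, not the endpoint code: augment_case_facts, the retrieve callable, and the "text"/"source" keys on each hit are hypothetical; only the case_facts, retrieved_legal_context, and retrieved_legal_sources names come from the description above.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical retriever signature: query in, list of {"text", "source"} hits out.
Retriever = Callable[[str], List[Dict[str, str]]]


def augment_case_facts(
    case_facts: Dict[str, Any], query: str, retrieve: Retriever
) -> Dict[str, Any]:
    """Return a copy of case_facts with retrieved context and sources added.

    If retrieval raises, log a warning and fall back to empty context so
    that document generation can still proceed.
    """
    enriched = dict(case_facts)
    try:
        hits = retrieve(query)
    except Exception:
        logger.warning("retrieval failed; generating without grounding", exc_info=True)
        hits = []
    enriched["retrieved_legal_context"] = "\n\n".join(h["text"] for h in hits)
    enriched["retrieved_legal_sources"] = [h["source"] for h in hits]
    return enriched


def broken_retriever(query: str) -> List[Dict[str, str]]:
    # Simulates a missing optional dependency or an unavailable index.
    raise RuntimeError("vector store unavailable")


facts = augment_case_facts({"parties": "A v. B"}, "breach of contract", broken_retriever)
print(facts["retrieved_legal_sources"])
```

Catching the failure at this boundary keeps the generation request from erroring out; the trade-off is that the resulting draft is ungrounded, so the empty sources list should be visible to the caller.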
