test: implement complete evals for the 3 AI agents #76
Merged
22 tests that evaluate ExtractorAgent, DeciderAgent, and CompilerAgent using LLM mocks; they run in CI without an API key.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member
Author
@copilot make this pass test checks please thanks
Contributor
Fixed in
Pull request overview
Adds a set of evals (tests) for the AI agents in ClutterKill (ExtractorAgent, DeciderAgent, CompilerAgent) using LLM mocks, so that they run in CI without an API key and verify parsing/validation, routing, and retry/repair for malformed output.
Changes:
- Introduces `tests/evals/test_agents.py` with scenarios for Extractor (invoice/unknown/malformed JSON), Decider (move vs quarantine), and Compiler (rule compilation).
- Includes an end-to-end pipeline test (Compiler → Extractor → Decider) with mocked outputs.
- Includes Pydantic tests for `ActionDecision`, `ExtractionResult`, `ExtractedEntity`.
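As a rough illustration of the "move vs quarantine" routing evals listed above, the sketch below feeds canned LLM replies into a decider-style function. `decide` and `make_llm` are hypothetical stand-ins written for this example, not the project's DeciderAgent or its `_make_llm` helper, and the fallback-to-quarantine rule is an assumption.

```python
import json
from unittest.mock import MagicMock


def make_llm(response_text: str) -> MagicMock:
    """Hypothetical helper: an LLM mock whose invoke() always returns the same text."""
    llm = MagicMock()
    llm.invoke.return_value = response_text
    return llm


def decide(llm, extraction: dict) -> dict:
    """Hypothetical decider: ask the LLM, quarantine on anything unexpected (assumed rule)."""
    reply = llm.invoke(json.dumps(extraction))
    try:
        decision = json.loads(reply)
    except json.JSONDecodeError:
        return {"status": "quarantine"}
    if not isinstance(decision, dict) or decision.get("status") not in {"move", "quarantine"}:
        return {"status": "quarantine"}
    return decision


# No network call and no API key: the replies are fixed strings.
moved = decide(
    make_llm('{"status": "move", "suggested_folder": "Invoices"}'),
    {"category": "invoice"},
)
quarantined = decide(make_llm("not json at all"), {"category": "unknown"})
```

Because the reply is fixed, each test pins down one routing branch deterministically, which is what makes this style of eval safe to run in CI.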
Comment on lines +40 to +43
    # DeciderAgent and CompilerAgent use chains with StrOutputParser/PydanticOutputParser,
    # so we mock __or__ to support the `prompt | llm | parser` syntax
    llm.__or__ = lambda self, other: _ChainMock(response_text, other)
    return llm
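A small sketch of how Python dispatches `|` may help when reading this mock. `MagicMock` allows magic methods to be configured by assignment, and a plain function assigned this way receives the mock as `self`. The assigned `__or__` only runs when the mock is the left operand, though; if the left operand defines its own `__or__`, that one is tried first. `FakePrompt` is a hypothetical class for illustration, and whether this matters for the PR depends on how the agents build their chains, which is not shown here.

```python
from unittest.mock import MagicMock


class FakePrompt:
    """Hypothetical left operand that defines its own __or__."""

    def __or__(self, other):
        return ("prompt-built", other)


llm = MagicMock()
# Assigning a function to a magic method on a MagicMock: it is called with the mock as `self`.
llm.__or__ = lambda self, other: ("llm-built", other)

# llm is the left operand, so the assigned __or__ runs.
right = llm | "parser"

# FakePrompt is the left operand, so its own __or__ runs and llm.__or__ is never called.
left = FakePrompt() | llm
```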
Comment on lines +46 to +63
class _ChainMock:
    """Simulates a LangChain chain: ignores the prompt and always returns the same text."""

    def __init__(self, text: str, next_step: Any) -> None:
        self._text = text
        self._next = next_step

    def __or__(self, other: Any) -> "_ChainMock":
        return _ChainMock(self._text, other)

    def invoke(self, _: Any) -> Any:
        # If next_step is a Pydantic/Str parser, invoke it with our text
        if hasattr(self._next, "parse"):
            return self._next.parse(self._text)
        if hasattr(self._next, "invoke"):
            return self._next.invoke(self._text)
        return self._text
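A short usage sketch of the helper above, with the class repeated in condensed form so the snippet runs on its own. `FakeJsonParser` is a hypothetical stand-in for an output parser; only a `parse` method is assumed.

```python
import json
from typing import Any


class _ChainMock:
    """Condensed copy of the helper from the diff, repeated so this snippet is self-contained."""

    def __init__(self, text: str, next_step: Any) -> None:
        self._text = text
        self._next = next_step

    def __or__(self, other: Any) -> "_ChainMock":
        return _ChainMock(self._text, other)

    def invoke(self, _: Any) -> Any:
        if hasattr(self._next, "parse"):
            return self._next.parse(self._text)
        if hasattr(self._next, "invoke"):
            return self._next.invoke(self._text)
        return self._text


class FakeJsonParser:
    """Hypothetical parser stand-in exposing only parse()."""

    def parse(self, text: str) -> dict:
        return json.loads(text)


# Piping into a parser: invoke() ignores its input and parses the canned text.
chain = _ChainMock('{"status": "move"}', None) | FakeJsonParser()
parsed = chain.invoke({"document": "ignored by the mock"})

# With no next step, invoke() returns the canned text unchanged.
raw = _ChainMock("plain text", None).invoke("also ignored")
```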
Comment on lines +194 to +196
    llm.invoke.side_effect = side_effect
    llm.__or__ = lambda self, other: _ChainMock(REPAIRED_JSON, other)
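The lines above wire a `side_effect` function into the mock so that successive calls can return different replies. A minimal sketch of that retry/repair pattern follows; `parse_with_repair`, the repair prompt wording, and the values of `MALFORMED_JSON` and `REPAIRED_JSON` are assumptions made for this example, not the project's code.

```python
import json
from unittest.mock import MagicMock

MALFORMED_JSON = '{"category": "invoice", "entities": ['  # assumed value: cut off mid-array
REPAIRED_JSON = '{"category": "invoice", "entities": []}'  # assumed value


def parse_with_repair(llm, document_text: str, max_attempts: int = 2) -> dict:
    """Hypothetical helper: re-prompt with the broken reply until it parses as JSON."""
    prompt = document_text
    for _ in range(max_attempts):
        reply = llm.invoke(prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError:
            prompt = f"Fix this JSON and return only JSON: {reply}"
    raise ValueError("LLM never returned valid JSON")


calls = []


def side_effect(prompt: str) -> str:
    # First call returns broken JSON, every later call returns the repaired version.
    calls.append(prompt)
    return MALFORMED_JSON if len(calls) == 1 else REPAIRED_JSON


llm = MagicMock()
llm.invoke.side_effect = side_effect
result = parse_with_repair(llm, "Invoice 123 from Enel")
```

Recording the prompts in `calls` lets a test assert not only that a retry happened, but that the second prompt actually carried the malformed reply.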
Comment on lines +341 to +346
def test_compiler_extracts_category_from_invoice_prompt() -> None:
    agent = CompilerAgent(llm=_make_llm(COMPILER_FACTURA_JSON))
    agent._chain = MagicMock()
    agent._chain.invoke = MagicMock(
        return_value=CompiledRule(
            category="factura",
Comment on lines +290 to +303
def test_action_decision_sanitizes_filename_slashes() -> None:
    d = ActionDecision(
        status="move", suggested_name="factura/enel:2023.pdf", suggested_folder="."
    )
    assert "/" not in d.suggested_name
    assert ":" not in d.suggested_name


def test_action_decision_rejects_invalid_status() -> None:
    with pytest.raises(ValidationError):
        ActionDecision(
            status="delete", suggested_name="doc.pdf", suggested_folder="Trash"
        )
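The behavior these two tests check can be sketched with a stdlib-only stand-in. This is not the project's Pydantic model: `ActionDecisionSketch` is a hypothetical dataclass, it raises `ValueError` rather than Pydantic's `ValidationError`, and both the allowed status set and the `_` replacement character are assumptions (the tests only show that `/` and `:` must be gone and that `"delete"` is rejected).

```python
import re
from dataclasses import dataclass

# Assumed set: the source shows "move" accepted, "delete" rejected, and mentions quarantine.
ALLOWED_STATUSES = {"move", "quarantine"}


@dataclass
class ActionDecisionSketch:
    """Hypothetical stand-in for ActionDecision, validating in __post_init__."""

    status: str
    suggested_name: str
    suggested_folder: str

    def __post_init__(self) -> None:
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")
        # Replace slashes, backslashes, and colons so the name is safe as a single file name.
        self.suggested_name = re.sub(r"[/\\:]", "_", self.suggested_name)


d = ActionDecisionSketch(
    status="move", suggested_name="factura/enel:2023.pdf", suggested_folder="."
)
```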