test: implement complete evals for the 3 AI agents by Rosca-Teodora · Pull Request #76 · FerTeo/ClutterKill

Rosca-Teodora · 2026-06-15T16:18:18Z

22 tests that evaluate ExtractorAgent, DeciderAgent, and CompilerAgent using LLM mocks; they run in CI without an API key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Rosca-Teodora · 2026-06-15T16:20:05Z

@copilot make this pass test checks please thanks

Copilot · 2026-06-15T16:23:46Z

@copilot make this pass test checks please thanks

Fixed in 98faee3. I formatted ai/llm_config.py so ruff format --check passes, and re-ran checks locally (ruff check, ruff format --check, pytest), all green.

Copilot

Pull request overview

Adds a set of evals (tests) for the ClutterKill AI agents (ExtractorAgent, DeciderAgent, CompilerAgent) using LLM mocks, so that they run in CI without an API key and verify parsing/validation, routing, and retry/repair for malformed output.

Changes:

Introduces tests/evals/test_agents.py with scenarios for Extractor (invoice/unknown/malformed JSON), Decider (move vs quarantine), and Compiler (rule compilation).
Includes an end-to-end pipeline test (Compiler → Extractor → Decider) with mocked outputs.
Includes Pydantic tests for ActionDecision, ExtractionResult, ExtractedEntity.
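
As a rough illustration of the approach summarized above, the sketch below tests an agent against a mocked LLM, so no API key or network access is needed. `TinyExtractor`, `make_llm`, and the JSON shape are hypothetical stand-ins, not the ClutterKill agents or helpers; only `unittest.mock.MagicMock` is real API.

```python
import json
from unittest.mock import MagicMock


class TinyExtractor:
    """Hypothetical agent: sends text to an LLM and parses its JSON reply."""

    def __init__(self, llm):
        self._llm = llm

    def extract(self, text: str) -> dict:
        raw = self._llm.invoke(f"Classify this document: {text}")
        return json.loads(raw)


def make_llm(response_text: str) -> MagicMock:
    """Build a mock LLM whose invoke() always returns the same canned text."""
    llm = MagicMock()
    llm.invoke.return_value = response_text
    return llm


llm = make_llm('{"category": "invoice", "confidence": 0.9}')
result = TinyExtractor(llm).extract("ENEL invoice, March 2023")

assert result["category"] == "invoice"
llm.invoke.assert_called_once()  # the agent called the mock exactly once
```

Because the reply is canned, a test like this checks the agent's parsing and control flow rather than the quality of any model output.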


+    # DeciderAgent and CompilerAgent use chains with StrOutputParser/PydanticOutputParser,
+    # so we mock __or__ to support the `prompt | llm | parser` syntax
+    llm.__or__ = lambda self, other: _ChainMock(response_text, other)
+    return llm
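
One detail about the `__or__` patch that may be worth keeping in mind: in `a | b`, Python normally tries the left operand's `__or__` first and only falls back to the right operand's `__ror__` when the left side does not handle it. The sketch below shows this with a `MagicMock`; `FakePrompt` is a hypothetical placeholder class. Whether the mocked `__or__` is actually reached in the real agents depends on how each chain is assembled, which this excerpt does not show.

```python
from unittest.mock import MagicMock


class FakePrompt:
    """Hypothetical left operand that defines no __or__ of its own."""


llm = MagicMock()
# Functions assigned to magic methods on a mock receive the mock as `self`.
llm.__or__ = lambda self, other: ("llm.__or__", other)
llm.__ror__ = lambda self, other: ("llm.__ror__", other)

# Mock on the left: the mock's __or__ handles the expression.
left = llm | "parser"

# Mock on the right of an object without __or__: Python falls back to __ror__.
right = FakePrompt() | llm

assert left == ("llm.__or__", "parser")
assert right[0] == "llm.__ror__"
```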


+class _ChainMock:
+    """Simulates a LangChain chain: ignores the prompt and always returns the same text."""
+
+    def __init__(self, text: str, next_step: Any) -> None:
+        self._text = text
+        self._next = next_step
+
+    def __or__(self, other: Any) -> "_ChainMock":
+        return _ChainMock(self._text, other)
+
+    def invoke(self, _: Any) -> Any:
+        # If next_step is a Pydantic/Str parser, invoke it with our text
+        if hasattr(self._next, "parse"):
+            return self._next.parse(self._text)
+        if hasattr(self._next, "invoke"):
+            return self._next.invoke(self._text)
+        return self._text
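
To make the behavior of `_ChainMock` concrete, here is a small usage sketch. The class body is copied from the hunk above; `UpperParser` and `SuffixParser` are hypothetical parsers invented for illustration. Note that each `|` replaces the stored next step, so only the last element of the chain is applied.

```python
from typing import Any


class _ChainMock:
    """Simulates a chain: ignores the prompt and always returns the same text."""

    def __init__(self, text: str, next_step: Any) -> None:
        self._text = text
        self._next = next_step

    def __or__(self, other: Any) -> "_ChainMock":
        return _ChainMock(self._text, other)

    def invoke(self, _: Any) -> Any:
        if hasattr(self._next, "parse"):
            return self._next.parse(self._text)
        if hasattr(self._next, "invoke"):
            return self._next.invoke(self._text)
        return self._text


class UpperParser:
    """Hypothetical parser exposing parse(), as an output parser might."""

    def parse(self, text: str) -> str:
        return text.upper()


class SuffixParser:
    """Hypothetical second parser, used to show which step wins."""

    def parse(self, text: str) -> str:
        return text + "!"


# No next step: the canned text comes back unchanged, whatever the input is.
plain = _ChainMock("move", None).invoke({"any": "input"})

# One parser: the canned text is passed through parse().
parsed = (_ChainMock("move", None) | UpperParser()).invoke(None)

# Two parsers: the second `|` overwrites the first, so only SuffixParser runs.
last_only = (_ChainMock("move", None) | UpperParser() | SuffixParser()).invoke(None)

assert plain == "move"
assert parsed == "MOVE"
assert last_only == "move!"
```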
+


+    llm.invoke.side_effect = side_effect
+    llm.__or__ = lambda self, other: _ChainMock(REPAIRED_JSON, other)
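
The retry/repair scenario can be sketched with `side_effect`, which lets a mock return a different value on each call. The example below is a minimal sketch under stated assumptions: `extract_with_repair`, the prompts, and the JSON payloads are hypothetical, and it uses a list for `side_effect` where the PR assigns a `side_effect` callable whose body is not shown here.

```python
import json
from unittest.mock import MagicMock

MALFORMED = '{"category": "invoice", '  # truncated, not valid JSON
REPAIRED = '{"category": "invoice"}'


def extract_with_repair(llm, text: str, max_attempts: int = 2) -> dict:
    """Hypothetical retry loop: re-ask with a repair prompt when parsing fails."""
    prompt = text
    for _ in range(max_attempts):
        raw = llm.invoke(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            prompt = f"Fix this so it is valid JSON: {raw}"
    raise ValueError("LLM output could not be repaired")


llm = MagicMock()
# First call returns broken JSON, second call returns the repaired version.
llm.invoke.side_effect = [MALFORMED, REPAIRED]

result = extract_with_repair(llm, "ENEL invoice")

assert result == {"category": "invoice"}
assert llm.invoke.call_count == 2  # one failed attempt plus one repair
```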
+


+def test_compiler_extracts_category_from_invoice_prompt() -> None:
+    agent = CompilerAgent(llm=_make_llm(COMPILER_FACTURA_JSON))
+    agent._chain = MagicMock()
+    agent._chain.invoke = MagicMock(
+        return_value=CompiledRule(
+            category="factura",


+def test_action_decision_sanitizes_filename_slashes() -> None:
+    d = ActionDecision(
+        status="move", suggested_name="factura/enel:2023.pdf", suggested_folder="."
+    )
+    assert "/" not in d.suggested_name
+    assert ":" not in d.suggested_name
+
+
+def test_action_decision_rejects_invalid_status() -> None:
+    with pytest.raises(ValidationError):
+        ActionDecision(
+            status="delete", suggested_name="doc.pdf", suggested_folder="Trash"
+        )
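
For readers without the model source at hand, the two tests above are consistent with a model along the following lines. This is a sketch only, assuming Pydantic v2; `ActionDecisionSketch`, the allowed status values, and the replacement rule are assumptions inferred from the tests and the overview, not the actual `ActionDecision` implementation.

```python
import re
from typing import Literal

from pydantic import BaseModel, ValidationError, field_validator


class ActionDecisionSketch(BaseModel):
    """Hypothetical stand-in for ActionDecision; field names follow the tests."""

    # Assumed set of statuses; anything else fails validation.
    status: Literal["move", "quarantine"]
    suggested_name: str
    suggested_folder: str

    @field_validator("suggested_name")
    @classmethod
    def _strip_path_characters(cls, value: str) -> str:
        # Replace characters that are unsafe in file names with underscores.
        return re.sub(r'[\\/:*?"<>|]', "_", value)


decision = ActionDecisionSketch(
    status="move", suggested_name="factura/enel:2023.pdf", suggested_folder="."
)
assert "/" not in decision.suggested_name
assert ":" not in decision.suggested_name

try:
    ActionDecisionSketch(
        status="delete", suggested_name="doc.pdf", suggested_folder="Trash"
    )
except ValidationError:
    rejected = True
else:
    rejected = False
assert rejected
```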
+


test: implement complete evals for the 3 AI agents

234a748


Rosca-Teodora requested a review from Copilot June 15, 2026 16:20

Copilot started work on behalf of Rosca-Teodora June 15, 2026 16:20

Copilot started reviewing on behalf of Rosca-Teodora June 15, 2026 16:23

style: format ai/llm_config.py for CI ruff check

98faee3

Copilot finished work on behalf of Rosca-Teodora June 15, 2026 16:24

Rosca-Teodora merged commit e5fc14d into master Jun 15, 2026
4 checks passed

Rosca-Teodora merged 2 commits into master from ultimeleTeste
