
test: implement complete evals for the 3 AI agents #76

Merged
Rosca-Teodora merged 2 commits into master from ultimeleTeste
Jun 15, 2026

Conversation

@Rosca-Teodora

Member

22 tests that evaluate ExtractorAgent, DeciderAgent, and CompilerAgent using mocked LLMs; they run in CI without an API key.


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Rosca-Teodora

Member Author

@copilot make this pass test checks please thanks

Copilot AI commented Jun 15, 2026

Contributor

@copilot make this pass test checks please thanks

Fixed in 98faee3. I formatted ai/llm_config.py so ruff format --check passes, and re-ran checks locally (ruff check, ruff format --check, pytest), all green.

@Rosca-Teodora Rosca-Teodora merged commit e5fc14d into master Jun 15, 2026
4 checks passed

Copilot AI left a comment


Pull request overview

Adds a set of evals (tests) for the ClutterKill AI agents (ExtractorAgent, DeciderAgent, CompilerAgent) using mocked LLMs, so that they run in CI without an API key and verify parsing/validation, routing, and retry/repair for malformed output.

Changes:

  • Introduces tests/evals/test_agents.py with scenarios for the Extractor (invoice/unknown/malformed JSON), the Decider (move vs quarantine), and the Compiler (rule compilation).
  • Includes an end-to-end pipeline test (Compiler → Extractor → Decider) with mocked outputs.
  • Includes Pydantic tests for ActionDecision, ExtractionResult, and ExtractedEntity.


Comment on lines +40 to +43
    # DeciderAgent and CompilerAgent use chains with StrOutputParser/PydanticOutputParser,
    # so we mock __or__ to support the `prompt | llm | parser` syntax
    llm.__or__ = lambda self, other: _ChainMock(response_text, other)
    return llm
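A minimal sketch of how Python dispatches `|` when `__or__` is set on a `MagicMock`, which is the mechanism the snippet above relies on. `FakePrompt` is a hypothetical stand-in, not a LangChain class. Python tries the left operand's `__or__` first, so the mocked method only runs when the mock is on the left. Whether that matters for the real agents depends on how their prompt objects compose with the mock, which this page does not show.

```python
from unittest.mock import MagicMock


class FakePrompt:
    """Hypothetical prompt-like object that defines its own `|` operator."""

    def __or__(self, other):
        return ("prompt-chain", other)


llm = MagicMock()
# A function assigned to a magic method on a MagicMock receives the mock as `self`.
llm.__or__ = lambda self, other: ("llm-chain", other)

# The mock is the left operand, so the mocked __or__ runs.
left = llm | "parser"

# FakePrompt is the left operand, so FakePrompt.__or__ runs and the mock's __or__ is not consulted.
right = FakePrompt() | llm
```

In an expression such as `prompt | llm | parser`, the first `|` is therefore resolved by the prompt object, not by the mock.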
Comment on lines +46 to +63
class _ChainMock:
    """Simulates a LangChain chain: ignores the prompt and always returns the same text."""

    def __init__(self, text: str, next_step: Any) -> None:
        self._text = text
        self._next = next_step

    def __or__(self, other: Any) -> "_ChainMock":
        return _ChainMock(self._text, other)

    def invoke(self, _: Any) -> Any:
        # If next_step is a Pydantic/Str parser, invoke it with our text
        if hasattr(self._next, "parse"):
            return self._next.parse(self._text)
        if hasattr(self._next, "invoke"):
            return self._next.invoke(self._text)
        return self._text
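A short usage sketch for the class above, under stated assumptions: `JsonParser` and `Upper` are hypothetical stand-ins for a parser and a runnable, and the class body is copied here only to keep the example self-contained. It illustrates that `invoke` ignores its input, and that each `|` replaces the stored next step instead of appending to it, so only the last step in a pipe is applied.

```python
import json
from typing import Any


class _ChainMock:
    """Copy of the chain mock from the PR, repeated so this sketch runs on its own."""

    def __init__(self, text: str, next_step: Any) -> None:
        self._text = text
        self._next = next_step

    def __or__(self, other: Any) -> "_ChainMock":
        return _ChainMock(self._text, other)

    def invoke(self, _: Any) -> Any:
        if hasattr(self._next, "parse"):
            return self._next.parse(self._text)
        if hasattr(self._next, "invoke"):
            return self._next.invoke(self._text)
        return self._text


class JsonParser:
    """Hypothetical parser exposing only `parse`."""

    def parse(self, text: str) -> dict:
        return json.loads(text)


class Upper:
    """Hypothetical step exposing only `invoke`."""

    def invoke(self, text: str) -> str:
        return text.upper()


chain = _ChainMock('{"category": "factura"}', JsonParser())
parsed = chain.invoke({"ignored": "prompt input"})  # input is discarded

# Piping again swaps JsonParser for Upper; JsonParser is no longer applied.
shouted = (chain | Upper()).invoke(None)
```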

Comment on lines +194 to +196
    llm.invoke.side_effect = side_effect
    llm.__or__ = lambda self, other: _ChainMock(REPAIRED_JSON, other)
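The lines above wire a `side_effect` into the mocked LLM for the retry/repair scenario. A minimal sketch of that testing pattern follows, assuming a hypothetical `extract_with_repair` helper and hypothetical payload values; the agent's real repair logic and the actual `side_effect` function are not shown on this page. Here `side_effect` is given a list, which `MagicMock` consumes one item per call.

```python
import json
from unittest.mock import MagicMock

MALFORMED = '{"category": "factura", '      # hypothetical truncated LLM output
REPAIRED_JSON = '{"category": "factura"}'   # hypothetical value for illustration


def extract_with_repair(llm, text, max_attempts=2):
    """Hypothetical helper: ask the LLM, and on invalid JSON ask it to repair its output."""
    prompt = text
    for _ in range(max_attempts):
        raw = llm.invoke(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            prompt = f"Fix this JSON and return only JSON:\n{raw}"
    raise ValueError("LLM did not return valid JSON")


llm = MagicMock()
# First call returns malformed output, second call returns the repaired JSON.
llm.invoke.side_effect = [MALFORMED, REPAIRED_JSON]

result = extract_with_repair(llm, "invoice text")
attempts = llm.invoke.call_count
```

Asserting on `call_count` as well as on the parsed result checks that the repair path was actually taken, not just that valid JSON came back.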

Comment on lines +341 to +346
def test_compiler_extracts_category_from_invoice_prompt() -> None:
    agent = CompilerAgent(llm=_make_llm(COMPILER_FACTURA_JSON))
    agent._chain = MagicMock()
    agent._chain.invoke = MagicMock(
        return_value=CompiledRule(
            category="factura",
Comment on lines +290 to +303
def test_action_decision_sanitizes_filename_slashes() -> None:
    d = ActionDecision(
        status="move", suggested_name="factura/enel:2023.pdf", suggested_folder="."
    )
    assert "/" not in d.suggested_name
    assert ":" not in d.suggested_name


def test_action_decision_rejects_invalid_status() -> None:
    with pytest.raises(ValidationError):
        ActionDecision(
            status="delete", suggested_name="doc.pdf", suggested_folder="Trash"
        )
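A dependency-free sketch of the behavior these two tests pin down. It uses a standard-library dataclass in place of the real Pydantic model, so it raises `ValueError` where the real model raises `ValidationError`. The class name, the allowed status set (taken from the "move vs quarantine" scenarios), and the underscore replacement are all assumptions; the actual `ActionDecision` implementation is not shown on this page.

```python
import re
from dataclasses import dataclass

# Assumption: only the two statuses mentioned in the Decider scenarios are valid.
ALLOWED_STATUSES = {"move", "quarantine"}


@dataclass
class ActionDecisionSketch:
    """Hypothetical stand-in for ActionDecision, not the project's model."""

    status: str
    suggested_name: str
    suggested_folder: str

    def __post_init__(self) -> None:
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")
        # Replace path separators and colons so the name is a single safe filename.
        self.suggested_name = re.sub(r"[\\/:]", "_", self.suggested_name)


d = ActionDecisionSketch(
    status="move", suggested_name="factura/enel:2023.pdf", suggested_folder="."
)

try:
    ActionDecisionSketch(status="delete", suggested_name="doc.pdf", suggested_folder="Trash")
    rejected = False
except ValueError:
    rejected = True
```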


3 participants