@Chibionos Chibionos commented Jan 22, 2026

Problem

The eval flow had two issues:

  1. Inconsistency in handling resume options compared to the debug flow:

    • Debug flow (cli_debug.py): Always explicitly passes UiPathExecuteOptions(resume=resume) regardless of the resume value
    • Eval flow (_runtime.py): Only passed UiPathExecuteOptions(resume=True) when resume=True, but passed no options at all when resume=False
  2. Missing validation for the unsupported scenario of running multiple evaluations with resume mode enabled

Code Comparison (Issue #1)

Before (eval flow):

if self.context.resume:
    options = UiPathExecuteOptions(resume=True)
    result = await execution_runtime.execute(input=..., options=options)
else:
    result = await execution_runtime.execute(input=...)  # ❌ No options!

Debug flow (correct pattern):

result = await debug_runtime.execute(
    ctx.get_input(),
    options=UiPathExecuteOptions(resume=resume),  # ✅ Always explicit
)

Issue #2: Missing Validation

Resume mode relies on checkpoint discovery using a consistent thread_id. When multiple evaluations run in parallel, each creates its own runtime context, making it impossible to determine which checkpoint to resume from. Nothing caught this case, which could lead to confusing runtime behavior.

Solutions

1. Consistent Options Passing

Made the eval flow consistent with the debug flow by:

  • Always passing UiPathExecuteOptions explicitly, regardless of resume value
  • Simplifying the if/else logic by separating input determination from execution
  • Adding a comment explaining the consistency rationale

After:

if self.context.resume:
    logger.info(f"Resuming evaluation {eval_item.id}")
    input = input_overrides if self.context.job_id is None else None
else:
    input = inputs_with_overrides

# Always pass UiPathExecuteOptions explicitly for consistency with debug flow
options = UiPathExecuteOptions(resume=self.context.resume)
result = await execution_runtime.execute(input=input, options=options)
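For readers who want to try the pattern outside the repository, here is a minimal, self-contained sketch of the "decide the input first, then call execute() once with explicit options" shape. `ExecuteOptions`, `StubRuntime`, and `run_once` are hypothetical stand-ins invented for illustration; they are not the real `UiPathExecuteOptions` or runtime classes, and the stub's handling of `options=None` only mirrors the default described in the commit message.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExecuteOptions:
    """Hypothetical stand-in for UiPathExecuteOptions."""
    resume: bool = False


class StubRuntime:
    """Hypothetical runtime that records what execute() receives."""

    def __init__(self) -> None:
        self.calls: list = []

    async def execute(self, input: Any = None,
                      options: Optional[ExecuteOptions] = None) -> bool:
        self.calls.append((input, options))
        # Assumed default: missing options behave like resume=False.
        return options.resume if options is not None else False


async def run_once(runtime, *, resume, job_id, input_overrides,
                   inputs_with_overrides):
    # Step 1: decide the input only.
    if resume:
        input = input_overrides if job_id is None else None
    else:
        input = inputs_with_overrides
    # Step 2: a single call site, with options always explicit.
    return await runtime.execute(input=input,
                                 options=ExecuteOptions(resume=resume))


runtime = StubRuntime()
overrides = {"a": 1}
merged = {"a": 1, "b": 2}
fresh = asyncio.run(run_once(runtime, resume=False, job_id=None,
                             input_overrides=overrides,
                             inputs_with_overrides=merged))
resumed = asyncio.run(run_once(runtime, resume=True, job_id="job-1",
                               input_overrides=overrides,
                               inputs_with_overrides=merged))
```

In this sketch the fresh run receives the merged inputs with `resume=False`, and the resumed run with a job id receives no input with `resume=True`; in both cases the recorded options object is never `None`.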

2. Resume Mode Validation

Added early validation in initiate_evaluation() to catch the unsupported scenario:

# Validate that resume mode is not used with multiple evaluations
if self.context.resume and len(evaluation_set.evaluations) > 1:
    raise ValueError(
        f"Resume mode is not supported with multiple evaluations. "
        f"Found {len(evaluation_set.evaluations)} evaluations in the set. "
        f"Please run with a single evaluation using --eval-ids to specify one evaluation."
    )

Benefits:

  • ✅ Fails fast before any expensive operations (creating spans, loading evaluators, publishing events)
  • ✅ Provides clear, actionable error message with guidance on how to fix
  • ✅ Prevents confusing runtime behavior from the architectural limitation
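The fail-fast check can be exercised in isolation with a small sketch like the one below. `EvaluationSet`, `validate_resume`, and `check` are hypothetical names used only for this illustration; the real check lives inside initiate_evaluation() and works on the project's own evaluation set model.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationSet:
    """Hypothetical stand-in for the real evaluation set model."""
    evaluations: list = field(default_factory=list)


def validate_resume(resume: bool, evaluation_set: EvaluationSet) -> None:
    """Raise before any expensive work if resume meets several evaluations."""
    count = len(evaluation_set.evaluations)
    if resume and count > 1:
        raise ValueError(
            "Resume mode is not supported with multiple evaluations. "
            f"Found {count} evaluations in the set."
        )


def check(resume: bool, evaluations: list) -> str:
    """Return the error message, or "ok" when validation passes."""
    try:
        validate_resume(resume, EvaluationSet(evaluations))
    except ValueError as exc:
        return str(exc)
    return "ok"
```

Only the combination of `resume=True` and more than one evaluation is rejected; a single evaluation with resume, or several evaluations without resume, pass through unchanged.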

Testing

  • ✅ All existing tests pass (1807 tests)
  • ✅ Added test_resume_with_multiple_evaluations_raises_error() to verify validation
  • ✅ Created multiple-evals.json test fixture with 2 evaluations
  • ✅ Functionally equivalent to previous behavior for single evaluations
  • ✅ Makes explicit what was previously implicit
  • ✅ Matches debug flow pattern

Impact

  • Low risk: No functional behavior change for valid usage patterns
  • High value:
    • Improves code consistency, maintainability, and explicitness
    • Prevents users from encountering confusing errors at runtime
    • Provides clear guidance when an unsupported scenario is attempted

@github-actions github-actions bot added the labels test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) Jan 22, 2026
@Chibionos Chibionos force-pushed the fix/eval-resume-options-consistency branch 2 times, most recently from 4bba0d8 to a61549e January 22, 2026 18:04
Chibi Vikram added 2 commits January 23, 2026 08:19
Previously, the eval flow only passed UiPathExecuteOptions when resume=True,
but passed no options at all when resume=False. This is inconsistent with
the debug flow (cli_debug.py) which always explicitly passes the options.

This change:
- Makes the eval flow always pass UiPathExecuteOptions explicitly
- Simplifies the if/else logic by separating input determination from execution
- Ensures consistency across both debug and eval commands
- Makes the resume=False intent explicit rather than relying on default behavior

While functionally equivalent (execute() accepts options=None and defaults
resume to False), this change improves code maintainability and explicitness.

Add comprehensive unit tests to verify that UiPathExecuteOptions is always
passed explicitly in the eval flow, matching the debug flow pattern.

Tests verify:
- UiPathExecuteOptions(resume=False) is passed when resume=False
- UiPathExecuteOptions(resume=True) is passed when resume=True
- Options are NEVER None, always explicit
- Behavior is consistent with debug flow

Uses mocking to directly test the execute_runtime method, ensuring the
specific code path we modified is properly tested.
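
A rough sketch of what such a mock-based check could look like, using only the standard library. `ExecuteOptions`, `execute_eval`, and `options_seen` are hypothetical stand-ins for illustration; the actual tests target the project's execute_runtime method and the real UiPathExecuteOptions class.

```python
import asyncio
from dataclasses import dataclass
from unittest.mock import AsyncMock


@dataclass
class ExecuteOptions:
    """Hypothetical stand-in for UiPathExecuteOptions."""
    resume: bool = False


async def execute_eval(runtime, resume: bool, payload: dict):
    # Code path under test: options are built and passed unconditionally.
    return await runtime.execute(input=payload,
                                 options=ExecuteOptions(resume=resume))


def options_seen(resume: bool):
    """Run the code path against a mock runtime and return the options it got."""
    runtime = AsyncMock()
    asyncio.run(execute_eval(runtime, resume, {"x": 1}))
    runtime.execute.assert_awaited_once()
    return runtime.execute.await_args.kwargs["options"]
```

Asserting on the recorded keyword argument, rather than on the return value, is what lets the test distinguish "options omitted" from "options passed with resume=False".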
@Chibionos Chibionos force-pushed the fix/eval-resume-options-consistency branch from a61549e to c336c36 January 23, 2026 16:20

@akshaylive akshaylive left a comment

How would this work when num_workers > 0? If it isn't supposed to work, we need to alert it somehow and halt execution. In other words, we may want to pass the resume to only the runtime instance that had previously suspended, and not to all runtime instances.

@smflorentino smflorentino self-requested a review January 23, 2026 16:33
@Chibionos

How would this work when num_workers > 0? If it isn't supposed to work, we need to alert it somehow and halt execution. In other words, we may want to pass the resume to only the runtime instance that had previously suspended, and not to all runtime instances.

It won't work for more than one; it's a limitation of suspend and resume. We can raise an error.

- Add ValueError when resume mode is used with multiple evaluations
- Validates early in initiate_evaluation() before expensive operations
- Provides clear error message with guidance to use --eval-ids
- Add test coverage with new multiple-evals.json fixture
- Add test_resume_with_multiple_evaluations_raises_error() test
@Chibionos Chibionos changed the title from "fix: always pass UiPathExecuteOptions in eval flow for consistency" to "fix: improve eval resume flow consistency and add validation" Jan 23, 2026

@akshaylive akshaylive left a comment

🚢
