Add sample for evaluation with inline data download #45237

Open
YoYoJa wants to merge 8 commits into main from jessli/sdk_example_download_results

YoYoJa commented Feb 18, 2026

This sample demonstrates how to create, retrieve, and list evaluations and eval runs using inline dataset content, including downloading output items with pagination.

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.


Copilot AI left a comment

Pull request overview

This PR adds a new sample file demonstrating how to create and run evaluations with inline data while downloading output items using manual pagination. The sample builds on existing inline data evaluation samples by adding explicit pagination logic and file download functionality to showcase how to retrieve large result sets page-by-page.

Changes:

  • Added sample_evaluation_builtin_with_inline_data_download_output_items.py demonstrating evaluation creation, execution, and paginated output item download to a JSONL file

Comment on lines +193 to +196
with open(download_data_file, 'w') as f:
    for item in all_output_items:
        item_dict = item.to_dict() if hasattr(item, 'to_dict') else item
        f.write(json.dumps(item_dict, default=str) + '\n')

Copilot AI Feb 18, 2026

Missing error handling for file write operations: The code writes to a file without handling potential IOError exceptions (e.g., disk full, permission denied, directory doesn't exist). Consider adding try-except blocks around the file operations to provide better error messages and graceful failure handling.
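
A minimal sketch of what this suggestion could look like, assuming the items are either plain dicts or objects with a `to_dict()` method as in the snippet above. The helper name `write_jsonl` and its boolean return value are illustrative choices, not part of the sample under review.

```python
import json
import os


def write_jsonl(path, items):
    """Write items to a JSONL file; return True on success, False on an OS-level failure."""
    try:
        # Create the parent directory if the path has one and it does not exist yet.
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            for item in items:
                record = item.to_dict() if hasattr(item, "to_dict") else item
                f.write(json.dumps(record, default=str) + "\n")
    except OSError as exc:
        # OSError covers permission errors, a full disk, and invalid paths.
        print(f"Failed to write {path}: {exc}")
        return False
    return True
```

Returning a flag keeps the sketch short; a real sample might prefer to re-raise after printing so the failure is not silently ignored.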

Comment on lines 160 to 200
if run.status == "completed" or run.status == "failed":
    print(f"Eval Run Report URL: {run.report_url}")

    # Fetch all output items with pagination
    all_output_items = []
    after = None

    while True:
        if after:
            page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100, after=after)
        else:
            page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100)

        # Convert page to dict to access properties
        page_dict = page.to_dict() if hasattr(page, 'to_dict') else page

        # Add items from this page
        page_data = page_dict.get('data', []) if isinstance(page_dict, dict) else list(page)
        all_output_items.extend(page_data)

        # Check if there are more pages
        has_more = page_dict.get('has_more', False) if isinstance(page_dict, dict) else False
        if not has_more:
            break

        # Get the cursor for next page
        after = page_dict.get('last_id') if isinstance(page_dict, dict) else None
        if not after:
            break

        print(f"Fetched {len(page_data)} items, continuing pagination...")

    # Write all output items to JSONL file
    with open(download_data_file, 'w') as f:
        for item in all_output_items:
            item_dict = item.to_dict() if hasattr(item, 'to_dict') else item
            f.write(json.dumps(item_dict, default=str) + '\n')

    print(f"All output items written to {download_data_file} ({len(all_output_items)} items)")

    break

Copilot AI Feb 18, 2026

No error handling for failed evaluation runs: When the evaluation run status is "failed" (line 160), the code still attempts to download output items and write them to a file. Consider adding specific handling for the "failed" status to inform the user about the failure and potentially skip the download, or handle it differently than completed runs.
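
One possible shape for this, sketched with stand-ins rather than the real client: the download step is passed in as a callable and only runs for a completed run. The function name `download_if_completed` and its parameters are hypothetical and exist only for illustration.

```python
def download_if_completed(status, download_items, report_url=None):
    """Run download_items() only when status is "completed".

    Returns whatever download_items() returns, or None when the download is skipped.
    """
    if status == "completed":
        return download_items()
    if status == "failed":
        # A failed run may have partial or no output items, so skip the download.
        print(f"Eval run failed; skipping download. Report URL: {report_url}")
    else:
        print(f"Eval run is not finished (status={status!r}); nothing to download yet.")
    return None
```

In the sample, `download_items` would wrap the pagination and file-writing steps, so the failed branch never touches the output file.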

Comment on lines 158 to 202
while True:
    run = client.evals.runs.retrieve(run_id=eval_run_response.id, eval_id=eval_object.id)
    if run.status == "completed" or run.status == "failed":
        print(f"Eval Run Report URL: {run.report_url}")

        # Fetch all output items with pagination
        all_output_items = []
        after = None

        while True:
            if after:
                page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100, after=after)
            else:
                page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100)

            # Convert page to dict to access properties
            page_dict = page.to_dict() if hasattr(page, 'to_dict') else page

            # Add items from this page
            page_data = page_dict.get('data', []) if isinstance(page_dict, dict) else list(page)
            all_output_items.extend(page_data)

            # Check if there are more pages
            has_more = page_dict.get('has_more', False) if isinstance(page_dict, dict) else False
            if not has_more:
                break

            # Get the cursor for next page
            after = page_dict.get('last_id') if isinstance(page_dict, dict) else None
            if not after:
                break

            print(f"Fetched {len(page_data)} items, continuing pagination...")

        # Write all output items to JSONL file
        with open(download_data_file, 'w') as f:
            for item in all_output_items:
                item_dict = item.to_dict() if hasattr(item, 'to_dict') else item
                f.write(json.dumps(item_dict, default=str) + '\n')

        print(f"All output items written to {download_data_file} ({len(all_output_items)} items)")

        break
    time.sleep(5)
    print("Waiting for eval run to complete...")

Copilot AI Feb 18, 2026

Missing timeout mechanism: The polling loop (lines 158-202) has no maximum iteration count or timeout, which could cause the sample to run indefinitely if the evaluation run never reaches a terminal state. Consider adding a timeout or maximum retry count to prevent infinite loops. Other samples in the codebase have the same pattern, but for a demonstration sample, adding a timeout would be a best practice.
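
A rough sketch of a bounded polling loop, assuming the run status can be obtained through a zero-argument callable. The name `wait_for_terminal_status`, the 600-second default, and the injectable `sleep` and `clock` parameters are assumptions made for illustration; only the 5-second interval and the "completed"/"failed" statuses come from the snippet above.

```python
import time


def wait_for_terminal_status(
    get_status,
    timeout_s=600.0,
    interval_s=5.0,
    terminal=("completed", "failed"),
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"Run did not finish within {timeout_s} seconds (last status: {status!r})"
            )
        print("Waiting for eval run to complete...")
        sleep(interval_s)
```

In the sample, `get_status` could be a small lambda that calls `client.evals.runs.retrieve(...)` and returns `run.status`. A monotonic clock is used so the deadline is unaffected by system clock adjustments.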

Comment on lines 167 to 190
while True:
    if after:
        page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100, after=after)
    else:
        page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100)

    # Convert page to dict to access properties
    page_dict = page.to_dict() if hasattr(page, 'to_dict') else page

    # Add items from this page
    page_data = page_dict.get('data', []) if isinstance(page_dict, dict) else list(page)
    all_output_items.extend(page_data)

    # Check if there are more pages
    has_more = page_dict.get('has_more', False) if isinstance(page_dict, dict) else False
    if not has_more:
        break

    # Get the cursor for next page
    after = page_dict.get('last_id') if isinstance(page_dict, dict) else None
    if not after:
        break

    print(f"Fetched {len(page_data)} items, continuing pagination...")

Copilot AI Feb 18, 2026

Potential infinite loop: If the pagination response structure doesn't match expectations, so that 'has_more' never becomes False and 'last_id' is never empty, this loop could continue indefinitely. Consider adding a maximum iteration count or total item count check as a safety measure.
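
A sketch of the kind of guard this comment describes, assuming each page is a dict with `data`, `has_more`, and `last_id` keys as in the snippet above. The `list_page` callable, the `fetch_all_items` name, and the `max_pages` default are hypothetical stand-ins for the real client call.

```python
def fetch_all_items(list_page, limit=100, max_pages=1000):
    """Collect items from a cursor-paginated listing, bounded by max_pages.

    list_page(limit=..., after=...) must return a dict with "data",
    "has_more", and "last_id" keys.
    """
    items = []
    after = None
    for _ in range(max_pages):
        page = list_page(limit=limit, after=after)
        items.extend(page.get("data", []))
        last_id = page.get("last_id")
        if not page.get("has_more") or last_id is None:
            return items
        if last_id == after:
            # The cursor did not advance, so another request would repeat this page.
            raise RuntimeError("Pagination cursor did not advance")
        after = last_id
    raise RuntimeError(f"Stopped after {max_pages} pages without reaching the end")
```

The cursor check catches a server that keeps returning the same page, while `max_pages` bounds the loop even if the cursor keeps changing.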

YoYoJa and others added 7 commits February 18, 2026 06:15
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated evaluation run status check to only consider 'completed' status.