Add sample for evaluation with inline data download #45237
Conversation
This sample demonstrates how to create, retrieve, and list evaluations and eval runs using inline dataset content, including downloading output items with pagination.
Pull request overview
This PR adds a new sample file demonstrating how to create and run evaluations with inline data while downloading output items using manual pagination. The sample builds on existing inline data evaluation samples by adding explicit pagination logic and file download functionality to showcase how to retrieve large result sets page-by-page.
Changes:
- Added `sample_evaluation_builtin_with_inline_data_download_output_items.py`, demonstrating evaluation creation, execution, and paginated output item download to a JSONL file (the creation side is sketched below)
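The quoted diffs below cover only the polling and download logic, so here is a minimal sketch of the creation side, assuming the OpenAI Evals API surface (`client.evals.create` / `client.evals.runs.create`) that the rest of the snippets use. The names, item schema, and grader configuration are illustrative assumptions, not the sample's actual values.

```python
# Illustrative sketch only: names, item schema, and the grader are assumptions,
# not the sample's actual configuration. `client` is assumed to be an
# OpenAI-compatible evals client.
eval_object = client.evals.create(
    name="inline-data-eval",
    data_source_config={
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "response": {"type": "string"},
                "ground_truth": {"type": "string"},
            },
            "required": ["query", "response", "ground_truth"],
        },
        "include_sample_schema": False,
    },
    testing_criteria=[
        {
            "type": "string_check",
            "name": "exact_match",
            "input": "{{ item.response }}",
            "reference": "{{ item.ground_truth }}",
            "operation": "eq",
        }
    ],
)

# Inline dataset content: rows travel in the request body ("file_content"),
# so no separate file upload is required.
eval_run_response = client.evals.runs.create(
    eval_id=eval_object.id,
    name="inline-data-run",
    data_source={
        "type": "jsonl",
        "source": {
            "type": "file_content",
            "content": [
                {"item": {"query": "What is 2 + 2?", "response": "4", "ground_truth": "4"}}
            ],
        },
    },
)
```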
```python
with open(download_data_file, 'w') as f:
    for item in all_output_items:
        item_dict = item.to_dict() if hasattr(item, 'to_dict') else item
        f.write(json.dumps(item_dict, default=str) + '\n')
```
Missing error handling for file write operations: The code writes to a file without handling potential IOError exceptions (e.g., disk full, permission denied, directory doesn't exist). Consider adding try-except blocks around the file operations to provide better error messages and graceful failure handling.
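A minimal hardening of that write, keeping the diff's variable names (`download_data_file`, `all_output_items`):

```python
import json

try:
    with open(download_data_file, "w") as f:
        for item in all_output_items:
            item_dict = item.to_dict() if hasattr(item, "to_dict") else item
            f.write(json.dumps(item_dict, default=str) + "\n")
except OSError as e:
    # Covers permission-denied, missing-directory, and disk-full failures
    print(f"Failed to write output items to {download_data_file}: {e}")
```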
```python
if run.status == "completed" or run.status == "failed":
    print(f"Eval Run Report URL: {run.report_url}")

    # Fetch all output items with pagination
    all_output_items = []
    after = None

    while True:
        if after:
            page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100, after=after)
        else:
            page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100)

        # Convert page to dict to access properties
        page_dict = page.to_dict() if hasattr(page, 'to_dict') else page

        # Add items from this page
        page_data = page_dict.get('data', []) if isinstance(page_dict, dict) else list(page)
        all_output_items.extend(page_data)

        # Check if there are more pages
        has_more = page_dict.get('has_more', False) if isinstance(page_dict, dict) else False
        if not has_more:
            break

        # Get the cursor for next page
        after = page_dict.get('last_id') if isinstance(page_dict, dict) else None
        if not after:
            break

        print(f"Fetched {len(page_data)} items, continuing pagination...")

    # Write all output items to JSONL file
    with open(download_data_file, 'w') as f:
        for item in all_output_items:
            item_dict = item.to_dict() if hasattr(item, 'to_dict') else item
            f.write(json.dumps(item_dict, default=str) + '\n')

    print(f"All output items written to {download_data_file} ({len(all_output_items)} items)")

    break
```
No error handling for failed evaluation runs: When the evaluation run status is "failed" (line 160), the code still attempts to download output items and write them to a file. Consider adding specific handling for the "failed" status to inform the user about the failure and potentially skip the download, or handle it differently than completed runs.
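One way to restructure the terminal-state branch so a failed run is reported rather than downloaded — a sketch against the quoted loop body:

```python
if run.status in ("completed", "failed"):
    print(f"Eval Run Report URL: {run.report_url}")
    if run.status == "failed":
        # Report the failure and skip the download; a failed run's
        # output items are likely incomplete or absent.
        print(f"Eval run {run.id} failed; skipping output item download.")
        break
    # ... pagination + JSONL download for completed runs only ...
    break
```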
```python
while True:
    run = client.evals.runs.retrieve(run_id=eval_run_response.id, eval_id=eval_object.id)
    if run.status == "completed" or run.status == "failed":
        print(f"Eval Run Report URL: {run.report_url}")

        # ... fetch all output items with pagination and write them to the
        # JSONL file, exactly as in the snippet quoted above ...

        break
    time.sleep(5)
    print("Waiting for eval run to complete...")
```
Missing timeout mechanism: The polling loop (lines 158-202) has no maximum iteration count or timeout, which could cause the sample to run indefinitely if the evaluation run never reaches a terminal state. Consider adding a timeout or maximum retry count to prevent infinite loops. Other samples in the codebase have the same pattern, but for a demonstration sample, adding a timeout would be a best practice.
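A sketch of the polling loop with a wall-clock budget, reusing the diff's `client`, `eval_run_response`, and `eval_object` names; `MAX_WAIT_SECONDS` is an assumed value, not something the sample defines:

```python
import time

POLL_INTERVAL_SECONDS = 5  # matches the sample's sleep
MAX_WAIT_SECONDS = 600     # assumed budget; tune for your runs

deadline = time.monotonic() + MAX_WAIT_SECONDS
while True:
    run = client.evals.runs.retrieve(run_id=eval_run_response.id, eval_id=eval_object.id)
    if run.status in ("completed", "failed"):
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Eval run {run.id} did not finish within {MAX_WAIT_SECONDS}s")
    print("Waiting for eval run to complete...")
    time.sleep(POLL_INTERVAL_SECONDS)
```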
```python
while True:
    if after:
        page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100, after=after)
    else:
        page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100)

    # Convert page to dict to access properties
    page_dict = page.to_dict() if hasattr(page, 'to_dict') else page

    # Add items from this page
    page_data = page_dict.get('data', []) if isinstance(page_dict, dict) else list(page)
    all_output_items.extend(page_data)

    # Check if there are more pages
    has_more = page_dict.get('has_more', False) if isinstance(page_dict, dict) else False
    if not has_more:
        break

    # Get the cursor for next page
    after = page_dict.get('last_id') if isinstance(page_dict, dict) else None
    if not after:
        break

    print(f"Fetched {len(page_data)} items, continuing pagination...")
```
Potential infinite loop: If the pagination response structure doesn't match expectations and neither 'has_more' is False nor 'last_id' is None, this loop could continue indefinitely. Consider adding a maximum iteration count or total item count check as a safety measure.
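A defensive variant of the loop with a page cap; `MAX_PAGES` is an assumed constant, and the `for`/`else` reports when the cap is hit without a clean exit:

```python
MAX_PAGES = 1000  # assumed safety cap; adjust to your expected result size

all_output_items = []
after = None
for _ in range(MAX_PAGES):
    kwargs = {"run_id": run.id, "eval_id": eval_object.id, "limit": 100}
    if after:
        kwargs["after"] = after
    page = client.evals.runs.output_items.list(**kwargs)

    page_dict = page.to_dict() if hasattr(page, "to_dict") else page
    page_data = page_dict.get("data", []) if isinstance(page_dict, dict) else list(page)
    all_output_items.extend(page_data)

    has_more = page_dict.get("has_more", False) if isinstance(page_dict, dict) else False
    after = page_dict.get("last_id") if isinstance(page_dict, dict) else None
    if not has_more or not after:
        break
else:
    # Loop exhausted without breaking: the cap was reached
    print(f"Stopped after {MAX_PAGES} pages; results may be truncated.")
```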
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated evaluation run status check to only consider 'completed' status.