Add sample for evaluation with inline data download #45237
Conversation
This sample demonstrates how to create, retrieve, and list evaluations and eval runs using inline dataset content, including downloading output items with pagination.
Pull request overview
This PR adds a new sample file demonstrating how to create and run evaluations with inline data while downloading output items using manual pagination. The sample builds on existing inline data evaluation samples by adding explicit pagination logic and file download functionality to showcase how to retrieve large result sets page-by-page.
Changes:
- Added `sample_evaluation_builtin_with_inline_data_download_output_items.py`, demonstrating evaluation creation, execution, and paginated output item download to a JSONL file (the creation side is sketched below)
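The quoted diffs below cover only the polling and download logic, so here is a minimal sketch of the creation side, assuming the OpenAI Evals API surface (`client.evals.create` / `client.evals.runs.create`) that the rest of the snippets use. The names, item schema, and grader configuration are illustrative assumptions, not the sample's actual values.

```python
# Illustrative sketch only: names, item schema, and the grader are assumptions,
# not the sample's actual configuration. `client` is assumed to be an
# OpenAI-compatible evals client.
eval_object = client.evals.create(
    name="inline-data-eval",
    data_source_config={
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "response": {"type": "string"},
                "ground_truth": {"type": "string"},
            },
            "required": ["query", "response", "ground_truth"],
        },
        "include_sample_schema": False,
    },
    testing_criteria=[
        {
            "type": "string_check",
            "name": "exact_match",
            "input": "{{ item.response }}",
            "reference": "{{ item.ground_truth }}",
            "operation": "eq",
        }
    ],
)

# Inline dataset content: rows travel in the request body ("file_content"),
# so no separate file upload is required.
eval_run_response = client.evals.runs.create(
    eval_id=eval_object.id,
    name="inline-data-run",
    data_source={
        "type": "jsonl",
        "source": {
            "type": "file_content",
            "content": [
                {"item": {"query": "What is 2 + 2?", "response": "4", "ground_truth": "4"}}
            ],
        },
    },
)
```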
```python
with open(download_data_file, 'w') as f:
    for item in all_output_items:
        item_dict = item.to_dict() if hasattr(item, 'to_dict') else item
        f.write(json.dumps(item_dict, default=str) + '\n')
```
Missing error handling for file write operations: The code writes to a file without handling potential IOError exceptions (e.g., disk full, permission denied, directory doesn't exist). Consider adding try-except blocks around the file operations to provide better error messages and graceful failure handling.
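A minimal hardening of that write, keeping the diff's variable names (`download_data_file`, `all_output_items`):

```python
import json

try:
    with open(download_data_file, "w") as f:
        for item in all_output_items:
            item_dict = item.to_dict() if hasattr(item, "to_dict") else item
            f.write(json.dumps(item_dict, default=str) + "\n")
except OSError as e:
    # Covers permission-denied, missing-directory, and disk-full failures
    print(f"Failed to write output items to {download_data_file}: {e}")
```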
```python
if run.status == "completed" or run.status == "failed":
    print(f"Eval Run Report URL: {run.report_url}")

    # Fetch all output items with pagination
    all_output_items = []
    after = None

    while True:
        if after:
            page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100, after=after)
        else:
            page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100)

        # Convert page to dict to access properties
        page_dict = page.to_dict() if hasattr(page, 'to_dict') else page

        # Add items from this page
        page_data = page_dict.get('data', []) if isinstance(page_dict, dict) else list(page)
        all_output_items.extend(page_data)

        # Check if there are more pages
        has_more = page_dict.get('has_more', False) if isinstance(page_dict, dict) else False
        if not has_more:
            break

        # Get the cursor for next page
        after = page_dict.get('last_id') if isinstance(page_dict, dict) else None
        if not after:
            break

        print(f"Fetched {len(page_data)} items, continuing pagination...")

    # Write all output items to JSONL file
    with open(download_data_file, 'w') as f:
        for item in all_output_items:
            item_dict = item.to_dict() if hasattr(item, 'to_dict') else item
            f.write(json.dumps(item_dict, default=str) + '\n')

    print(f"All output items written to {download_data_file} ({len(all_output_items)} items)")

    break
```
No error handling for failed evaluation runs: When the evaluation run status is "failed" (line 160), the code still attempts to download output items and write them to a file. Consider adding specific handling for the "failed" status to inform the user about the failure and potentially skip the download, or handle it differently than completed runs.
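One way to restructure the terminal-state branch so a failed run is reported rather than downloaded — a sketch against the quoted loop body:

```python
if run.status in ("completed", "failed"):
    print(f"Eval Run Report URL: {run.report_url}")
    if run.status == "failed":
        # Report the failure and skip the download; a failed run's
        # output items are likely incomplete or absent.
        print(f"Eval run {run.id} failed; skipping output item download.")
        break
    # ... pagination + JSONL download for completed runs only ...
    break
```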
```python
while True:
    run = client.evals.runs.retrieve(run_id=eval_run_response.id, eval_id=eval_object.id)
    if run.status == "completed" or run.status == "failed":
        print(f"Eval Run Report URL: {run.report_url}")

        # ... fetch all output items with pagination and write them to the
        # JSONL file, exactly as in the snippet quoted above ...

        break
    time.sleep(5)
    print("Waiting for eval run to complete...")
```
Missing timeout mechanism: The polling loop (lines 158-202) has no maximum iteration count or timeout, which could cause the sample to run indefinitely if the evaluation run never reaches a terminal state. Consider adding a timeout or maximum retry count to prevent infinite loops. Other samples in the codebase have the same pattern, but for a demonstration sample, adding a timeout would be a best practice.
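A sketch of the polling loop with a wall-clock budget, reusing the diff's `client`, `eval_run_response`, and `eval_object` names; `MAX_WAIT_SECONDS` is an assumed value, not something the sample defines:

```python
import time

POLL_INTERVAL_SECONDS = 5  # matches the sample's sleep
MAX_WAIT_SECONDS = 600     # assumed budget; tune for your runs

deadline = time.monotonic() + MAX_WAIT_SECONDS
while True:
    run = client.evals.runs.retrieve(run_id=eval_run_response.id, eval_id=eval_object.id)
    if run.status in ("completed", "failed"):
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Eval run {run.id} did not finish within {MAX_WAIT_SECONDS}s")
    print("Waiting for eval run to complete...")
    time.sleep(POLL_INTERVAL_SECONDS)
```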
```python
while True:
    if after:
        page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100, after=after)
    else:
        page = client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id, limit=100)

    # Convert page to dict to access properties
    page_dict = page.to_dict() if hasattr(page, 'to_dict') else page

    # Add items from this page
    page_data = page_dict.get('data', []) if isinstance(page_dict, dict) else list(page)
    all_output_items.extend(page_data)

    # Check if there are more pages
    has_more = page_dict.get('has_more', False) if isinstance(page_dict, dict) else False
    if not has_more:
        break

    # Get the cursor for next page
    after = page_dict.get('last_id') if isinstance(page_dict, dict) else None
    if not after:
        break

    print(f"Fetched {len(page_data)} items, continuing pagination...")
```
Potential infinite loop: If the pagination response structure doesn't match expectations and neither 'has_more' is False nor 'last_id' is None, this loop could continue indefinitely. Consider adding a maximum iteration count or total item count check as a safety measure.
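A defensive variant of the loop with a page cap; `MAX_PAGES` is an assumed constant, and the `for`/`else` reports when the cap is hit without a clean exit:

```python
MAX_PAGES = 1000  # assumed safety cap; adjust to your expected result size

all_output_items = []
after = None
for _ in range(MAX_PAGES):
    kwargs = {"run_id": run.id, "eval_id": eval_object.id, "limit": 100}
    if after:
        kwargs["after"] = after
    page = client.evals.runs.output_items.list(**kwargs)

    page_dict = page.to_dict() if hasattr(page, "to_dict") else page
    page_data = page_dict.get("data", []) if isinstance(page_dict, dict) else list(page)
    all_output_items.extend(page_data)

    has_more = page_dict.get("has_more", False) if isinstance(page_dict, dict) else False
    after = page_dict.get("last_id") if isinstance(page_dict, dict) else None
    if not has_more or not after:
        break
else:
    # Loop exhausted without breaking: the cap was reached
    print(f"Stopped after {MAX_PAGES} pages; results may be truncated.")
```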
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated evaluation run status check to only consider 'completed' status.