Skip to content

Fix top sample data#45214

Open
YoYoJa wants to merge 2 commits intomainfrom
jessli/FixTopSampleData
Open

Fix top sample data#45214
YoYoJa wants to merge 2 commits intomainfrom
jessli/FixTopSampleData

Conversation

@YoYoJa
Copy link
Contributor

@YoYoJa YoYoJa commented Feb 16, 2026

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@YoYoJa YoYoJa requested a review from a team as a code owner February 16, 2026 19:15
Copilot AI review requested due to automatic review settings February 16, 2026 19:15
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Feb 16, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the handling of top-level sample data in evaluation output conversion. The changes address how generated sample data is parsed and stored, and ensure token counts are properly normalized to integers instead of floats.

Changes:

  • Added parsing logic for sample.generated_sample_data field to extract and populate top-level sample information
  • Modified token count fields (prompt_tokens, completion_tokens, total_tokens) to convert float values to integers
  • Updated test data files to include new sample data fields and reflect integer token counts

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py Added logic to parse and extract generated_sample_data from input, and modified token count handling to convert values to integers
sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluation_util_convert_old_output_test.jsonl Updated test input data to include sample.generated_sample_data and output_status fields
sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluation_util_convert_expected_output.json Updated expected output to reflect integer token counts and new top-level sample data structure

Comment on lines +2370 to +2372
except Exception as e:
logger.error(
f"Failed to parse generated_sample_data as JSON for row {row_idx}, eval_id: {eval_id}, eval_run_id: {eval_run_id}. Storing as string. Error: {e}"
Copy link

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error log message may expose sensitive user content from generated_sample_data. According to the repository's security practices for log redaction, user-provided content should be redacted from log messages to prevent credentials or sensitive information from appearing in telemetry. The generated_sample_data can contain queries, responses, and other user inputs that should not be logged. Consider using logger.debug instead of logger.error, or ensure the exception message doesn't contain the actual data content.

Suggested change
except Exception as e:
logger.error(
f"Failed to parse generated_sample_data as JSON for row {row_idx}, eval_id: {eval_id}, eval_run_id: {eval_run_id}. Storing as string. Error: {e}"
except Exception:
logger.error(
"Failed to parse generated_sample_data as JSON for row %s, eval_id: %s, eval_run_id: %s. "
"Storing as string.",
row_idx,
eval_id,
eval_run_id,
exc_info=True,

Copilot uses AI. Check for mistakes.
Comment on lines +2714 to +2716
metric_dict["sample"]["usage"]["total_tokens"] = (
None if _is_none_or_nan(metric_value) else int(float(metric_value))
)
Copy link

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int(float(metric_value)) conversion can raise ValueError if metric_value is a string that cannot be converted to float. Consider adding exception handling or validation before conversion to handle edge cases gracefully. For example, if metric_value is an empty string or contains non-numeric characters, this will crash.

Copilot uses AI. Check for mistakes.
Comment on lines +2722 to +2724
metric_dict["sample"]["usage"]["prompt_tokens"] = (
None if _is_none_or_nan(metric_value) else int(float(metric_value))
)
Copy link

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int(float(metric_value)) conversion can raise ValueError if metric_value is a string that cannot be converted to float. Consider adding exception handling or validation before conversion to handle edge cases gracefully. For example, if metric_value is an empty string or contains non-numeric characters, this will crash.

Copilot uses AI. Check for mistakes.
Comment on lines +2730 to +2732
metric_dict["sample"]["usage"]["completion_tokens"] = (
None if _is_none_or_nan(metric_value) else int(float(metric_value))
)
Copy link

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int(float(metric_value)) conversion can raise ValueError if metric_value is a string that cannot be converted to float. Consider adding exception handling or validation before conversion to handle edge cases gracefully. For example, if metric_value is an empty string or contains non-numeric characters, this will crash.

Copilot uses AI. Check for mistakes.
except Exception as e:
logger.error(
f"Failed to parse generated_sample_data as JSON for row {row_idx}, eval_id: {eval_id}, eval_run_id: {eval_run_id}. Storing as string. Error: {e}"
)
Copy link

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error log message states "Storing as string" but the code doesn't actually store the generated_sample_data when JSON parsing fails. The top_sample remains an empty dict, and the generated_sample_data is not preserved in input_data either. This means the data is lost rather than stored as a string. Either remove the misleading "Storing as string" text from the error message, or actually implement fallback logic to store the raw string value.

Suggested change
)
)
# Fallback: store the raw string value in the sample when JSON parsing fails
top_sample = {"generated_sample_data": top_sample_str}
# Remove the raw string from input_data to keep behavior consistent
input_data.pop("sample.generated_sample_data", None)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Evaluation Issues related to the client library for Azure AI Evaluation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments