
Conversation

@xiaoyu-work
Collaborator

Describe your changes

This pull request introduces compatibility updates for Hugging Face Transformers 5.0 and improves handling of dynamic cache and input formats in Olive's ONNX conversion and training utilities. It also updates tests and requirements to reflect these changes and ensure robust model export and training workflows.

Transformers 5.0 Compatibility

  • Added patching and conversion utilities for DynamicLayer.lazy_initialization, past_key_values, and dynamic shapes to support the new DynamicCache format in Transformers >= 5.0. This ensures that models using a dynamic cache export correctly with torch.export.
  • Updated the _export_pytorch_model logic to apply the new patches and conversions only for Transformers >= 5.0, while keeping the legacy path for older versions (see the sketch after this list).
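
A minimal sketch of the version-gated conversion described above, not the PR's actual implementation. It assumes DynamicCache() can still be constructed empty and filled through the standard Cache.update(key, value, layer_idx) API; helper names are illustrative:

from packaging import version

import transformers
from transformers.cache_utils import DynamicCache

IS_TRANSFORMERS_5 = version.parse(transformers.__version__) >= version.parse("5.0")


def past_kv_to_dynamic_cache(past_key_values):
    """Convert legacy ((key, value), ...) tuples into a DynamicCache."""
    cache = DynamicCache()
    for layer_idx, (key, value) in enumerate(past_key_values):
        cache.update(key, value, layer_idx)
    return cache


def prepare_dummy_kwargs(dummy_kwargs: dict) -> dict:
    # Per the PR, Transformers >= 5.0 models take a DynamicCache for
    # past_key_values; older versions keep the legacy tuple format unchanged.
    past = dummy_kwargs.get("past_key_values")
    if IS_TRANSFORMERS_5 and isinstance(past, tuple):
        dummy_kwargs = {**dummy_kwargs, "past_key_values": past_kv_to_dynamic_cache(past)}
    return dummy_kwargs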

Training Argument Handling

  • Improved the filtering of training arguments in create_training_args to remove fields that are no longer valid in Transformers 5.0 and to exclude None values, allowing Transformers to use its own defaults (see the sketch below).
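
A hedged sketch of that filtering idea (the example field names are illustrative, not the actual arguments removed in the PR): keep only real TrainingArguments init fields and drop None values so Transformers falls back to its own defaults.

import dataclasses

from transformers import TrainingArguments


def create_training_args_sketch(raw_args: dict) -> TrainingArguments:
    # Keep only names that are genuine TrainingArguments init fields and skip
    # None values so Transformers applies its own defaults for them.
    valid = {f.name for f in dataclasses.fields(TrainingArguments) if f.init}
    filtered = {k: v for k, v in raw_args.items() if k in valid and v is not None}
    return TrainingArguments(**filtered)


# Example: the unknown field and the None value are dropped before construction.
args = create_training_args_sketch(
    {"output_dir": "out", "learning_rate": 2e-5, "unknown_field": 1, "warmup_steps": None}
)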

Test Suite Updates

  • Modified model loading and metadata tests to remove the trust_remote_code parameter and to update expected file counts and tokenizer types for Transformers 5.0.
  • Updated the model output comparison in the rotation tests to cast logits to float before comparison, ensuring consistency across dtypes (see the sketch after this list).
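
The dtype fix in the rotation tests boils down to something like this sketch (tensor names and tolerances are illustrative):

import torch


def assert_logits_close(expected: torch.Tensor, actual: torch.Tensor) -> None:
    # Cast both sides to float32 so fp16/bf16 and fp32 runs compare consistently.
    torch.testing.assert_close(actual.float(), expected.float(), rtol=1e-3, atol=1e-3)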

Requirements Adjustments

  • Restricted the onnxscript version to <0.6.1 and removed the Transformers version pin, reflecting confidence that the test suite is compatible with Transformers 5.0.

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

# Apply patches for DynamicCache / past_key_values compatibility
if version.parse(transformers.__version__) >= version.parse("5.0"):
    # transformers >= 5.0: DynamicCache refactored to use DynamicLayer
    from transformers.integrations.executorch import register_dynamic_cache_export_support
Contributor

register_dynamic_cache_export_support does not have the right ordering for kv caches. I would just use the same patch regardless of the transformers version.

Collaborator Author

Do you mean _patch_model_if_necessary? Transformers 5.0 updated DynamicCache, and I got the error "AttributeError: 'DynamicCache' object has no attribute 'to_legacy_cache'".

Contributor

@justinchuby Feb 10, 2026

We can update the patch code (_patch_model_if_necessary) so that it works universally. There is no need to call to_legacy_cache. The executorch integration is not reliable for our usage.
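
For reference, one version-agnostic shape such a patch could take, as a sketch only (the attribute names are assumptions: newer DynamicCache builds expose per-layer objects carrying keys/values, while older ones expose key_cache/value_cache lists):

def dynamic_cache_to_tuples(cache):
    """Extract ((key, value), ...) per layer without calling to_legacy_cache."""
    if hasattr(cache, "layers"):  # assumed transformers >= 5.0 layout (DynamicLayer objects)
        return tuple((layer.keys, layer.values) for layer in cache.layers)
    # assumed legacy layout (transformers < 5.0)
    return tuple(zip(cache.key_cache, cache.value_cache))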

Contributor

@justinchuby Feb 10, 2026

@titaiwangms suggestions on what the patch logic should be?

@titaiwangms requested a review from xadupre February 10, 2026 22:44
# transformers >= 5.0: DynamicCache refactored to use DynamicLayer
from transformers.integrations.executorch import register_dynamic_cache_export_support

register_dynamic_cache_export_support()
Contributor

@xadupre Do you have any suggestions for avoiding the executorch function? Per Justin's reminder, it produces the wrong kv cache order.

logger.debug("Patched DynamicLayer.lazy_initialization for torch.export compatibility.")


def _convert_past_key_values_to_dynamic_cache(dummy_kwargs: dict) -> dict:
Member

With transformers 5+, the update mechanism is not defined by the DynamicCache class but by the class of each layer. This code only works for a DynamicCache that uses DynamicLayer; it won't work for a DynamicCache mixing DynamicLayer and DynamicSlidingWindowLayer. The code is fine, but it is better to keep that in mind for other models that use sliding windows.
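
Building on this caveat, a defensive check could look like the sketch below (the import path and exact layer class name are assumptions based on this thread):

from transformers.cache_utils import DynamicCache, DynamicLayer  # assumed import path


def uses_only_dynamic_layers(cache: DynamicCache) -> bool:
    # The conversion assumes plain DynamicLayer entries; bail out (or fall back)
    # when a model mixes in sliding-window layers.
    return all(type(layer) is DynamicLayer for layer in cache.layers)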

soundfile
tabulate
torchvision
# Remove version pin when the tests are fixed
Member

Should you add transformers>=5, unless there are tests checking multiple versions of transformers?
