Closed
2 changes: 1 addition & 1 deletion dataflow/run-inference/requirements.txt
@@ -1,3 +1,3 @@
 apache-beam[gcp]==2.49.0
 torch==2.2.2
-transformers==4.38.0
+transformers==5.0.0

high

Upgrading to transformers v5 is a major version bump that introduces several breaking changes and potential dependency conflicts with the existing environment.

  • Dependency Compatibility: `apache-beam[gcp]==2.49.0` is an older release (mid-2023) that typically caps dependencies such as `protobuf` (often `< 4.24.0`) and `numpy`. `transformers` v5 (released 2025) likely requires much newer versions of these libraries (e.g., `protobuf >= 4.25.0`), so the mismatch is likely to cause installation failures or runtime conflicts. Consider upgrading `apache-beam` to a more recent version (e.g., 2.60.0 or later).
  • Internal Refactors: The statement `from transformers.tokenization_utils import PreTrainedTokenizer` in `main.py` (line 31) targets an internal module that has been significantly refactored in v5. Import `PreTrainedTokenizer` directly from the top-level `transformers` package instead to ensure long-term compatibility.
  • Breaking Changes: In v5, `AutoConfig.from_pretrained` no longer supports loading from URLs, and `tokenizer.decode` behavior has been unified with `batch_decode` (returning a list for 2D inputs). Ensure the pipeline's input/output handling in `main.py` is compatible with these changes, especially the string encoding step at line 138.
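
One way to make the encoding step tolerant of the `decode` change is to normalize whatever the tokenizer returns before encoding it. The sketch below is only an illustration: `decoded_to_strings` and `encode_outputs` are hypothetical helper names, and `encode_outputs` is a stand-in for the string encoding step at line 138, whose actual code is not shown in this diff.

```python
from typing import List, Sequence, Union

# Decoded output may be a single string (one sequence) or a sequence of
# strings (a batch), depending on the input shape and library version.
DecodeResult = Union[str, Sequence[str]]


def decoded_to_strings(decoded: DecodeResult) -> List[str]:
    """Return decoded tokenizer output as a list of strings.

    Hypothetical helper: it hides whether the caller received one string
    or a list of strings, so downstream code handles a single shape.
    """
    if isinstance(decoded, str):
        return [decoded]
    return [str(item) for item in decoded]


def encode_outputs(decoded: DecodeResult, encoding: str = "utf-8") -> List[bytes]:
    """Assumed stand-in for the encoding step: one bytes object per text."""
    return [text.encode(encoding) for text in decoded_to_strings(decoded)]
```

With a wrapper like this, the pipeline would emit one element per decoded sequence regardless of whether `tokenizer.decode` handed back a string or a list.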