Description
- Package Name: azureml-mlflow
- Package Version: 1.61.0.post1
- Operating System: Ubuntu 22.04.5 LTS
- Python Version: 3.12
Describe the bug
Azure ML's MLflow tracking backend filters mlflow.source.name and mlflow.source.type tags when passed to create_run(tags=...) but allows the same tags when set via set_tag() after run creation. This inconsistency breaks integrations with tools like PyTorch Lightning.
To Reproduce
Steps to reproduce the behavior:
- Create a run with source tags at creation time:
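# Assumes `client` is a standard MlflowClient pointed at the Azure ML tracking URI
from mlflow.tracking import MlflowClient
client = MlflowClient()
client.create_run(
    experiment_id="...",
    tags={
        "mlflow.source.name": "https://github.com/org/repo",
        "mlflow.source.type": "PROJECT",
    },
)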
Result: Tags are filtered out ❌
- Create a run without tags, then set the same tags afterwards:
run = client.create_run(experiment_id="...")
client.set_tag(run.info.run_id, "mlflow.source.name", "https://github.com/org/repo")
client.set_tag(run.info.run_id, "mlflow.source.type", "PROJECT")
Result: Tags are preserved ✅
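Since the second pattern preserves the tags, a possible interim workaround is to defer the source tags until after the run exists. The sketch below is only an illustration of that idea: the helper name `create_run_preserving_source_tags` and the `mlflow.source.` prefix check are my own assumptions (I have only verified the filtering for `mlflow.source.name` and `mlflow.source.type`), and `client` is expected to expose `create_run` and `set_tag` the way `MlflowClient` does.

```python
from typing import Any, Dict, Optional

# Assumption: every tag under this prefix is affected. Only
# mlflow.source.name and mlflow.source.type were actually observed.
SOURCE_TAG_PREFIX = "mlflow.source."


def create_run_preserving_source_tags(
    client: Any,
    experiment_id: str,
    tags: Optional[Dict[str, str]] = None,
):
    """Hypothetical helper: create a run, then re-apply source tags via set_tag().

    `client` is any object with MlflowClient-style create_run() and set_tag().
    """
    all_tags = dict(tags or {})

    # Tags that appear to be dropped when passed to create_run().
    deferred = {
        key: value
        for key, value in all_tags.items()
        if key.startswith(SOURCE_TAG_PREFIX)
    }
    # Everything else can go through create_run() as usual.
    immediate = {
        key: value for key, value in all_tags.items() if key not in deferred
    }

    run = client.create_run(experiment_id=experiment_id, tags=immediate)

    # Setting the tags after creation is the path that preserved them above.
    for key, value in deferred.items():
        client.set_tag(run.info.run_id, key, value)

    return run
```

With a real `MlflowClient` this would be called as `create_run_preserving_source_tags(client, experiment_id="...", tags={...})`. It does not help when a third-party logger calls `create_run()` internally, so it is a stopgap rather than a fix.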
Impact
- PyTorch Lightning MLFlowLogger broken: Lightning passes tags to create_run(), causing all source tags to be lost
- Standard MLflow patterns fail: The recommended MLflow pattern sets tags at run creation
- Confusing behavior: No documentation explains this filtering
Expected behavior
Either:
- Option A: Allow mlflow.source.* tags in both scenarios (preferred)
- Option B: Filter them in both scenarios and document why
- Option C: Document the current behavior and provide guidance
Additional context
I noticed this difference while planning a migration from a self-hosted MLflow v2.12.1 instance to MLflow in Azure ML.