lazy-load model handlers to decouple megatron deps from tinker path #700

Open: hardik-vala wants to merge 1 commit into OpenPipe:main from hardik-vala:tinker-megatron-fix

@hardik-vala

Summary

  • `from art.tinker import TinkerBackend; await model.register(backend)` previously failed with `ModuleNotFoundError: No module named 'megatron'` in environments without megatron-core installed (e.g. macOS, where the `[megatron]` extra can't install because of transformer-engine / apex / deep-ep).
  • the import chain was `tinker._get_service` → `art.dev.get_model_config` → `art.megatron.model_support.registry`, whose top-level handler imports pulled in `handlers/qwen3_5.py` and its top-level `from megatron.core...`.
  • introduces a `HandlerMeta` class in `spec.py` as the single source of truth for each registered handler's `key`, `native_vllm_lora_status`, and lazy-load address `(module, attr)`. `registry.py` builds every `ModelSupportSpec` from these metas and resolves handlers lazily via `importlib.import_module` on the first call to `get_model_support_handler*`. The handler classes also read their `key` / `native_vllm_lora_status` from the same metas so the values can't drift.
  • net effect: `import art.tinker`, `from art.dev.get_model_config import get_model_config`, and `TinkerBackend._get_service(model)` all succeed without megatron-core. Megatron-equipped paths are unchanged: same handler instances, just imported on first lookup instead of at registry load.
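
For readers who want the shape of the lazy lookup without opening the diff, here is a minimal sketch of the pattern described above. It is not the code in this PR: the field types, the `HANDLER_METAS` dict, the `resolve_handler` name, and the status strings are all assumptions, and stdlib classes stand in for the real handler modules so the snippet runs without any extras installed.

```python
from __future__ import annotations

import importlib
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class HandlerMeta:
    """Illustrative stand-in; field names follow the PR text, types are assumed."""

    key: str
    native_vllm_lora_status: str
    module: str  # dotted path of the module that defines the handler
    attr: str  # name of the handler object inside that module


# Hypothetical registry contents. Stdlib targets replace the real handler
# modules, and the status strings are placeholders.
HANDLER_METAS = {
    "ordered": HandlerMeta("ordered", "supported", "collections", "OrderedDict"),
    "fraction": HandlerMeta("fraction", "unsupported", "fractions", "Fraction"),
}


@lru_cache(maxsize=None)
def resolve_handler(key: str):
    """Import the handler's module on first lookup and cache the result."""
    meta = HANDLER_METAS[key]
    module = importlib.import_module(meta.module)  # deferred until needed
    return getattr(module, meta.attr)
```

Building `HANDLER_METAS` imports nothing heavy, so reading a handler's `key` or status stays cheap; only `resolve_handler(...)` triggers the import of the module named in the meta.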

Test plan

  • In a venv with `pip install -e '.[tinker]'` and no megatron-core:
    • `python -c "import art.tinker.backend"` succeeds
    • `get_model_config(base_model='Qwen/Qwen3-4B-Instruct-2507', ...)` returns the expected `peft_args["target_modules"]`
    • `default_target_modules_for_model('Qwen/Qwen3-30B-A3B-Instruct-2507')` returns `[..., 'experts']`
    • `await TinkerBackend()._get_service(model)` constructs `TinkerService` without importing megatron
  • `uv run prek run --all-files` is clean
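
The checks above need a venv without megatron-core. As a possible complement, the same condition can be approximated in an environment that does have it by blocking the package through `sys.modules`. This is a sketch, not part of the PR: `package_blocked` and `imports_without` are hypothetical helper names, and the helper only clears the target module itself, so it is best run in a fresh interpreter where the target's submodules have not been imported yet.

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def package_blocked(name: str):
    """Make `import name` (and its submodules) fail inside the block."""
    saved = {
        mod: sys.modules[mod]
        for mod in list(sys.modules)
        if mod == name or mod.startswith(name + ".")
    }
    for mod in saved:
        del sys.modules[mod]
    sys.modules[name] = None  # a None entry makes the import system raise
    try:
        yield
    finally:
        del sys.modules[name]
        sys.modules.update(saved)  # restore whatever was loaded before


def imports_without(blocked: str, target: str) -> bool:
    """Return True if `target` imports while `blocked` is unavailable."""
    sys.modules.pop(target, None)  # force a fresh import of the target
    with package_blocked(blocked):
        try:
            importlib.import_module(target)
        except ImportError:
            return False
        return True
```

With this helper, a regression test for this change could assert `imports_without("megatron", "art.tinker.backend")`, assuming the package under test is installed.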

The Tinker backend's call to art.dev.get_model_config transitively imported
art.megatron.model_support.registry, which eagerly imported every handler
module — including handlers/qwen3_5.py, whose top-level
`from megatron.core...` made `megatron-core` a hard install-time dependency
for Tinker users.

This change introduces a single source of truth for handler key, status,
and lazy-load address as `HandlerMeta` records in spec.py. The registry
builds specs from these metas and dynamically imports the corresponding handler
module on the first call to get_model_support_handler*. The handler classes
themselves read their `key` / `native_vllm_lora_status` from the same metas
so the values can't drift.
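
A minimal sketch of the "can't drift" idea, with a hypothetical record, handler class, and `build_spec` helper (the real names and the real status values live in the diff): both the handler class and the registry-side spec are derived from one meta record, so neither repeats the literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlerMeta:
    key: str
    native_vllm_lora_status: str


# Hypothetical record; the key and status strings are placeholders.
EXAMPLE_META = HandlerMeta(key="example_family", native_vllm_lora_status="supported")


class ExampleHandler:
    # Read from the meta instead of repeating the literals here.
    key = EXAMPLE_META.key
    native_vllm_lora_status = EXAMPLE_META.native_vllm_lora_status


def build_spec(meta: HandlerMeta) -> dict:
    """Registry side: the spec is derived from the same record."""
    return {
        "key": meta.key,
        "native_vllm_lora_status": meta.native_vllm_lora_status,
    }
```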

Result: `art.tinker` is importable and `model.register()` runs without
megatron-core installed, while megatron-equipped paths see no behavior
change.