feat(0506): Layerwise export: API-driven, env-var-free, opt-in flag #1047
Open
vbaddi wants to merge 3 commits into
…flag

Move the layerwise export+stitch+compile orchestration loop into a single internal driver gated by a new layerwise=True kwarg on .compile() and .export(). The flag is opt-in; layerwise=False remains the default and the non-layerwise compile path is unchanged byte-for-byte.

The LAYERWISE_EXPORT environment variable is removed entirely; control flows purely through the API via a process-local QEFFBaseModel._layerwise_active flag toggled by an internal context manager.

Supported architectures are allowlisted (qwen3_vl_moe, qwen3_5_moe, qwen3_moe); other model types raise NotImplementedError when layerwise=True. Wired on QEFFAutoModelForImageTextToText (dual-QPC) and QEFFAutoModelForCausalLM.

Five existing layerwise example scripts collapse from 200-330 lines to ~60 lines each. The encapsulation module is documented as provisional and emits a one-shot DeprecationWarning.

test_model_quickcheck.py: 121 -> 127 passed, 3 skipped (unchanged) with five new tests covering the windowing helpers, the supported/unsupported guard, the env-var-not-leaked invariant, and the context manager's class-flag toggle.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
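The control flow described in this commit message (opt-in kwarg, allowlist guard, class flag toggled by a context manager) can be sketched roughly as follows. This is an illustration only, not the PR's code: `BaseModelSketch`, `layerwise_mode`, `check_layerwise_supported`, and `compile_sketch` are hypothetical names, and only the three allowlisted model types, the `_layerwise_active` attribute name, and the `NotImplementedError` behavior come from the message above.

```python
from contextlib import contextmanager

# Allowlist taken from the commit message; every other name here is hypothetical.
SUPPORTED_LAYERWISE_MODEL_TYPES = frozenset(
    {"qwen3_vl_moe", "qwen3_5_moe", "qwen3_moe"}
)


class BaseModelSketch:
    """Stand-in for QEFFBaseModel; only the class-level flag is modeled."""

    _layerwise_active = False


@contextmanager
def layerwise_mode(cls=BaseModelSketch):
    """Set the class flag for the duration of the block, then restore it."""
    previous = cls._layerwise_active
    cls._layerwise_active = True
    try:
        yield
    finally:
        # Restore the flag even if export or compile raises inside the block.
        cls._layerwise_active = previous


def check_layerwise_supported(model_type):
    """Raise NotImplementedError for architectures outside the allowlist."""
    if model_type not in SUPPORTED_LAYERWISE_MODEL_TYPES:
        raise NotImplementedError(
            f"layerwise=True is not supported for model_type={model_type!r}"
        )


def compile_sketch(model_type, layerwise=False):
    """Return which path ran; the default path never touches the flag."""
    if not layerwise:
        return "standard"
    check_layerwise_supported(model_type)
    with layerwise_mode():
        # A real driver would run export, stitch, and compile here.
        return "layerwise"
```

Because the flag is restored in a `finally` clause, a failure partway through the layerwise loop should not leave later non-layerwise calls in the same process in layerwise mode, which is presumably the property the "class-flag toggle" test checks.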
Force-pushed from fc7a3d9 to f842a30.
- Slim per-window export: truncate sin_cached/cos_cached to ctx_len and
null embed_tokens / lm_head when unreached.
- Fix fp16 layerwise export: _export_layerwise synthesized
inputs_embeds via torch.rand without a dtype.
- Suppress confusing "An unexpected error occurred while dumping the
qconfig" message when compile short-circuits without producing a QPC
(e.g. layerwise per-window export). dump_qconfig now skips when
qpc_path is None and demotes real failures to logger.debug.
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
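The qconfig change in the last bullet can be pictured with a small stand-in. This is a sketch under assumptions, not the project's `dump_qconfig`: the function name `dump_qconfig_sketch`, the `qconfig.json` file name, and the choice to write next to the QPC path are all hypothetical. Only the two behaviors (skip when `qpc_path` is None, log real failures at debug level) come from the commit message.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def dump_qconfig_sketch(qpc_path, config):
    """Write config beside the QPC; return the file path, or None if skipped."""
    if qpc_path is None:
        # Compile short-circuited without producing a QPC (for example a
        # layerwise per-window export), so there is nothing to describe.
        return None
    try:
        target = Path(qpc_path).parent / "qconfig.json"  # hypothetical file name
        target.write_text(json.dumps(config, indent=2))
        return target
    except Exception as exc:
        # Real failures are demoted to debug-level logging instead of
        # surfacing a confusing message to the user.
        logger.debug("qconfig dump failed: %s", exc)
        return None
```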
Backward compatibility is not yet guaranteed for Qwen 3.5 and Qwen3-VL-MoE. PR #1043 addresses this issue for Qwen3-VL, but Qwen 3.5 will still need additional modeling changes. We skipped the Qwen 3.5 unit tests, which is why all tests are currently passing.
- Add layerwise=True to from_pretrained (VLM + CausalLM). When set, the
outer model is built on the meta device via from_config, so the caller's
load no longer pulls full checkpoint weights into RAM.
- Stop polluting transformers.modeling_utils.PreTrainedModel with class
vars. Window state lives in a module-local _LAYERWISE_STATE dict; the
patched HF hooks (shard filter, init nuller) close over it and behave
as no-ops when layerwise is inactive.
- Cache layerwise ONNX between runs: _export_layerwise short-circuits
when final_data/merged_*.onnx already exists, and the stitch step
reuses it.
- WIP: Hard-cap RoPE rows at 32K for now (was ctx_len), so changing
  ctx_len does not invalidate the export hash.
- Respect explicit low_cpu_mem_usage=True in from_pretrained for VLM and
CausalLM (was unconditionally forced False); used by the layerwise
factory for window-only weight materialization on sharded checkpoints.
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
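The ONNX caching bullet above amounts to a check-before-export pattern, sketched below. The helper names and the `run_export` callback are hypothetical, and the real `_export_layerwise` may look for the file differently; only the `final_data/merged_*.onnx` location comes from the commit message.

```python
from pathlib import Path


def find_cached_merged_onnx(export_dir):
    """Return the first final_data/merged_*.onnx under export_dir, or None."""
    final_dir = Path(export_dir) / "final_data"
    if not final_dir.is_dir():
        return None
    matches = sorted(final_dir.glob("merged_*.onnx"))
    return matches[0] if matches else None


def export_layerwise_sketch(export_dir, run_export):
    """Reuse a cached merged ONNX when present; otherwise call run_export.

    Returns (path, cache_hit). run_export is a hypothetical callback that
    performs the per-window export and stitch and returns the merged path.
    """
    cached = find_cached_merged_onnx(export_dir)
    if cached is not None:
        return cached, True
    return Path(run_export(export_dir)), False
```

A check like this is only safe when the export directory is keyed by a hash of everything that affects the graph, which is presumably why the same commit pins the RoPE row count instead of letting it vary with ctx_len.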
Summary
Encapsulate the layerwise export+stitch+compile orchestration loop (previously a 200+ line example with monkey-patches and a LAYERWISE_EXPORT env var) behind a single layerwise=True flag on .compile() / .export().
What's new
Backward compatibility
layerwise=False is the default.
pytest tests/unit_test/models/test_model_quickcheck.py -n auto → 130 passed, 3 skipped (was 121 / 3 before this PR).
Tests added
Usage: Enable layerwise
Disable (default)
Just don't pass layerwise. Behavior is identical to before this PR.
Test Plan