Add model converter module #1

Open
fuheaven wants to merge 381 commits into main from dcu

@fuheaven
Owner

No description provided.

helloyongyang and others added 30 commits November 10, 2025 17:12
Co-authored-by: Yang Yong (雍洋) <yongyang1030@163.com>
Co-authored-by: qinxinyi <qxy118045534@163.com>
Co-authored-by: yihuiwen <yihuiwen@sensetime.com>
Co-authored-by: qinxinyi <qxy118045534@163.com>
Co-authored-by: gushiqiao <975033167@qq.com>
Feature:
    1. Added MLU590 bfloat16 inference (single-GPU and multi-GPU).
    2. Added MLU590 int8 inference.
Thanks to HunyuanVideo Team and ModelTC Team.
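The commit only names int8 inference on MLU590 and does not describe the quantization scheme. As a generic illustration (not the MLU590 code path), here is a minimal sketch of symmetric per-output-channel int8 weight quantization, assuming weight-only quantization with one float scale per output row. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_per_channel_int8(
    weight: Sequence[Sequence[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric int8 quantization with one scale per output channel (row)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        amax = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: Sequence[Sequence[int]], scales: Sequence[float]) -> List[List[float]]:
    """Recover approximate float weights: w ~= q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def int8_linear(
    x: Sequence[float], q_rows: Sequence[Sequence[int]], scales: Sequence[float]
) -> List[float]:
    """y[o] = scale[o] * sum_i q[o][i] * x[i]: integer weights, one rescale per output."""
    return [s * sum(q * xi for q, xi in zip(row, x)) for row, s in zip(q_rows, scales)]
```

Keeping one scale per output channel bounds each row's rounding error by half of that row's own scale, so a single row with large weights does not cost precision in the others.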

---------

Co-authored-by: gushiqiao <975033167@qq.com>
Co-authored-by: gushiqiao <77222802+gushiqiao@users.noreply.github.com>
Co-authored-by: chendingyu <chendingyu1@sensetime.com>
Co-authored-by: XHPlus <xhplus@163.com>
Co-authored-by: wangshankun <wangshankun2011@hotmail.com>
Co-authored-by: STwangyingrui <86730325+STwangyingrui@users.noreply.github.com>
Co-authored-by: root <root@pt-80f094c20fc44a8cad096e5f3dbc962e-worker-0.pt-80f094c20fc44a8cad096e5f3dbc962e.ns-devsft-3460edd0.svc.cluster.local>
Added new model links and recommendations for lightweight autoencoders.
--linear_dtype and --linear_quant_dtype are unified into --linear_type
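One possible way to merge two flags without breaking existing launch scripts, sketched with the standard-library argparse, is to register the legacy spellings as extra option strings on the same argument. Whether LightX2V actually keeps the old flags as aliases is not stated in the commit; treat the aliases and the `build_parser` helper as hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical launcher")
    # One unified flag. The legacy spellings are registered as extra option
    # strings (a hypothetical compatibility choice) and all write to linear_type.
    parser.add_argument(
        "--linear_type",
        "--linear_dtype",
        "--linear_quant_dtype",
        dest="linear_type",
        default=None,
        help="type used for linear layers (accepted values not listed here)",
    )
    return parser
```

Dropping the aliases instead makes old scripts fail with an "unrecognized arguments" error, which is the simpler choice if backward compatibility is not needed.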
Updated README_zh.md with new features and model support.
### Single GPU
```bash
python examples/simple_launch.py
```
```python
# examples/simple_launch.py
from lightx2v import LightGenerator

generator = LightGenerator(
    model_path="/path/to/Wan2.1-T2V-1.3B",
    model_cls="wan2.1",
    task="t2v",
)

video_path = generator.generate(
    prompt="Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.",
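    # Negative prompt kept in Chinese; it lists artifacts to avoid (camera shake, oversaturation,
    # overexposure, blur, subtitles, low quality, JPEG artifacts, deformed hands/faces/limbs,
    # cluttered background, walking backwards, etc.).
    negative_prompt="镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走",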
    seed=42,
    save_result_path="output.mp4",
)
```
### Multi-GPU
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 examples/multi_launch.py
```
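torchrun launches one process per GPU (`--nproc_per_node=8`) and exports `RANK`, `LOCAL_RANK` and `WORLD_SIZE` to each of them. The contents of `examples/multi_launch.py` are not shown here, so the following is only a hypothetical sketch of how a worker might read that environment and relate its local rank to the physical GPU chosen by `CUDA_VISIBLE_DEVICES`; `read_torchrun_env` and `DistInfo` are illustrative names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistInfo:
    rank: int                    # global rank across all nodes
    local_rank: int              # rank within this node; the logical CUDA device index
    world_size: int              # total number of worker processes
    physical_gpu: Optional[str]  # CUDA_VISIBLE_DEVICES entry this rank lands on


def read_torchrun_env(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Read the variables torchrun exports to each worker (single-process defaults)."""
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    physical = None
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible:
        ids = [d.strip() for d in visible.split(",") if d.strip()]
        if local_rank < len(ids):
            physical = ids[local_rank]
    return DistInfo(rank, local_rank, world_size, physical)
```

Inside each worker, CUDA renumbers the visible devices from 0, so the device to select is `local_rank` (for example `torch.cuda.set_device(info.local_rank)` before `torch.distributed.init_process_group("nccl")`); the physical ID is mainly useful for logging.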

---------

Co-authored-by: gushiqiao <975033167@qq.com>
helloyongyang and others added 22 commits January 19, 2026 20:28
ring_attn: fp8_comm & kv_fusion

---------

Co-authored-by: root <root@pt-72be2ccd01a14fa18a4b18c6c347f823-worker-0.pt-72be2ccd01a14fa18a4b18c6c347f823.ns-devsft-3460edd0.svc.cluster.local>
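The commit title only names `fp8_comm`. Assuming it means casting the K/V blocks exchanged in ring attention to FP8 before sending them (1 byte per element instead of 2 for bf16, halving the payload), the numerics can be sketched in pure Python: scale the block so its largest magnitude maps to the E4M3 maximum of 448, round each value to the nearest E4M3 number, and send the scale alongside. This simulates the rounding only; the function names are hypothetical and this is not LightX2V's implementation.

```python
import math
from typing import List, Sequence, Tuple

E4M3_MAX = 448.0    # largest finite FP8 E4M3 value
E4M3_MIN_EXP = -6   # exponent of the smallest normal E4M3 number
MANTISSA_BITS = 3


def round_to_e4m3(v: float) -> float:
    """Simulate round-to-nearest into FP8 E4M3, saturating at +-448."""
    if v == 0.0:
        return 0.0
    _, e = math.frexp(abs(v))            # |v| = m * 2**e with m in [0.5, 1)
    exp = max(e - 1, E4M3_MIN_EXP)       # floor(log2|v|), clamped for subnormals
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values here
    q = round(v / step) * step
    return max(-E4M3_MAX, min(E4M3_MAX, q))


def fp8_pack(x: Sequence[float]) -> Tuple[List[float], float]:
    """Per-tensor scale so max|x| maps to 448, then round each value to E4M3."""
    amax = max((abs(v) for v in x), default=0.0)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in x], scale


def fp8_unpack(q: Sequence[float], scale: float) -> List[float]:
    """Receiver side: undo the per-tensor scale."""
    return [v * scale for v in q]
```

With 3 mantissa bits the relative rounding error of normal values is at most 1/16; real code would store the rounded values in an actual FP8 dtype such as PyTorch's `torch.float8_e4m3fn`.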
Rename use_kv_fusion → use_tensor_fusion

---------

Co-authored-by: root <root@pt-72be2ccd01a14fa18a4b18c6c347f823-worker-0.pt-72be2ccd01a14fa18a4b18c6c347f823.ns-devsft-3460edd0.svc.cluster.local>
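Assuming `use_tensor_fusion` (formerly `use_kv_fusion`) packs tensors such as K and V into one contiguous buffer so each ring-attention step issues one send instead of two, the toy simulation below rotates (K, V) blocks around a ring of ranks and counts messages. The helper names are hypothetical; it illustrates the idea, not the actual implementation.

```python
from typing import List, Sequence, Tuple

Block = Tuple[List[float], List[float]]  # (K, V) held by one rank


def fuse_kv(k: Sequence[float], v: Sequence[float]) -> Tuple[List[float], int]:
    """Pack K and V into one contiguous buffer and remember where K ends."""
    return list(k) + list(v), len(k)


def split_kv(buf: Sequence[float], k_len: int) -> Block:
    """Inverse of fuse_kv on the receiving side."""
    return list(buf[:k_len]), list(buf[k_len:])


def ring_rotate(blocks: List[Block], fused: bool) -> Tuple[List[List[Block]], int]:
    """Pass (K, V) blocks around a ring of len(blocks) ranks for n - 1 steps.

    Returns the blocks each rank has seen and the number of sends per rank.
    """
    n = len(blocks)
    current = list(blocks)
    seen = [[b] for b in blocks]
    sends_per_rank = 0
    for _ in range(n - 1):
        if fused:
            packed = [fuse_kv(k, v) for k, v in current]
            sends_per_rank += 1  # one message carries both K and V
            # Rank r receives from rank r - 1 and unpacks the buffer.
            current = [split_kv(*packed[(r - 1) % n]) for r in range(n)]
        else:
            sends_per_rank += 2  # separate messages for K and for V
            current = [current[(r - 1) % n] for r in range(n)]
        for r in range(n):
            seen[r].append(current[r])
    return seen, sends_per_rank
```

With n ranks a full rotation takes n - 1 steps, so fusion cuts the per-rank message count from 2(n - 1) to n - 1 while every rank still sees every block.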
# Kernel-based text encoder
Integrates optimized operators from sgl_kernel and enables flash attention:
- flash attention 3
- RMSNorm: uses sgl_kernel (`from sgl_kernel.elementwise import rmsnorm`)
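When swapping in a fused kernel such as sgl_kernel's `rmsnorm`, a plain reference implementation is handy for checking its output. The sketch below computes RMSNorm over one hidden vector, y_i = x_i / sqrt(mean(x^2) + eps) * w_i. The default `eps=1e-6` is an assumption (the encoder's actual epsilon comes from its model config), and the kernel's own call signature is not shown here.

```python
import math
from typing import List, Sequence


def rmsnorm_ref(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm: divide by the root mean square, then scale per channel."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```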

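# Service text encoder
Uses a disaggregated deployment: multiple inference processes can share the same encoder service, which handles concurrent requests.
- Triton autotuning: LIGHTLLM_TRITON_AUTOTUNE_LEVEL=1
- lightllm integrates optimized operators such as flash attention 3 and rmsnorm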
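A toy, single-process asyncio sketch of why a shared encoder service copes well with concurrent requests: callers enqueue prompts, and one worker drains whatever is queued into a batch. `SharedEncoderService` is hypothetical and not lightllm's API; the placeholder encoder just returns the prompt length.

```python
import asyncio
from typing import List, Sequence, Tuple


class SharedEncoderService:
    """Hypothetical stand-in for a disaggregated text-encoder service."""

    def __init__(self, max_batch: int = 8) -> None:
        self.max_batch = max_batch
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []

    def _encode_batch(self, prompts: List[str]) -> List[List[float]]:
        # Placeholder "encoder"; a real service would run the text encoder on a GPU.
        return [[float(len(p))] for p in prompts]

    async def encode(self, prompt: str) -> List[float]:
        """Client side: enqueue a prompt and wait for its embedding."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def worker(self) -> None:
        """Server side: block for one request, then batch whatever else is queued."""
        while True:
            batch = [await self.queue.get()]
            while len(batch) < self.max_batch and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            self.batch_sizes.append(len(batch))
            outputs = self._encode_batch([p for p, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def demo(prompts: Sequence[str]) -> Tuple[List[List[float]], List[int]]:
    service = SharedEncoderService(max_batch=4)
    worker = asyncio.create_task(service.worker())
    results = await asyncio.gather(*(service.encode(p) for p in prompts))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return list(results), service.batch_sizes
```

Usage: `results, batch_sizes = asyncio.run(demo(["a cat", "two dogs"]))`. Batching concurrent requests is what lets one encoder instance serve several inference processes.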

```
================================================================================
COMPARISON SUMMARY
================================================================================
Encoder              | Time (ms)   | Speedup  | Cosine Sim   | End-to-end accuracy
--------------------------------------------------------------------------------
Baseline (HF)        | 92.17       | 1.00x    | 1.0000       | PASS
Kernel (Flash-2)     | 81.23       | 1.13x    | 0.9900       | PASS
Service (Optimized)  | 71.21       | 1.29x    | 0.9492       | PASS
================================================================================
```

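> The table above compares pure inference time only; service mode must also account for network communication overhead (about 5 ms), service router overhead, etc.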
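The Speedup column is the baseline time divided by each encoder's time; the Cosine Sim column presumably compares each encoder's output embeddings with the HF baseline (an assumption, since the PR does not say which tensors were compared). The arithmetic below reproduces the speedups and applies the note: adding the ~5 ms network overhead lowers the service-mode speedup to about 1.21x, still ahead of kernel mode, with router overhead not included.

```python
import math
from typing import Sequence


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup relative to the baseline: >1 means faster than the baseline."""
    return baseline_ms / candidate_ms


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


BASELINE_MS = 92.17  # Baseline (HF) row of the table
NETWORK_MS = 5.0     # approximate network overhead from the note

print(f"{speedup(BASELINE_MS, 81.23):.2f}x")               # kernel mode: 1.13x
print(f"{speedup(BASELINE_MS, 71.21):.2f}x")               # service mode, compute only: 1.29x
print(f"{speedup(BASELINE_MS, 71.21 + NETWORK_MS):.2f}x")  # service mode + network: 1.21x
```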
Co-authored-by: gushiqiao <975033167>
Co-authored-by: gushiqiao <975033167>
ModelTC#810)

…e tiling

---------

Co-authored-by: gushiqiao <975033167>
…TC#776)

`input_info.return_result_tensor` was ignored for all image generation.
Outputting the tensor can be useful for post-processing (such as NSFW
checking), without reloading the file from disk.

I noticed that video models do not return the tensor directly;
they return a dict of the form {"video": tensor}
[here](https://github.com/ModelTC/LightX2V/blob/38f9ac0513d0a097df1dd49e95ec4cc73ec426cb/lightx2v/models/runners/default_runner.py#L445).
I believe this is for compatibility with ComfyUI. If that's the case, we
should only return the tensor and move ComfyUI-specific patterns to the
ComfyUI wrapper codebase. What do you think?
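A minimal sketch of the behavior proposed above: the runner returns the raw result tensor whenever `input_info.return_result_tensor` is set, for image and video models alike, and the `{"video": tensor}` shaping moves into the ComfyUI wrapper. Apart from `return_result_tensor`, every name here (`InputInfo` as a dataclass, `runner_output`, `comfyui_wrap`) is hypothetical, and a plain list stands in for the tensor.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class InputInfo:
    # Only return_result_tensor is named in the PR; this dataclass is a stand-in.
    return_result_tensor: bool = False


def runner_output(result: Any, input_info: InputInfo) -> Optional[Any]:
    """Return the raw result tensor for every task (image or video) when requested.

    Saving the result to disk is assumed to happen elsewhere and is not shown.
    """
    if input_info.return_result_tensor:
        return result
    return None


def comfyui_wrap(result: Any) -> Dict[str, Any]:
    """ComfyUI-specific shaping, moved out of the runner into the wrapper."""
    return {"video": result}
```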
Co-authored-by: yihuiwen <yihuiwen@sensetime.com>