
feat(examples/hunyuanimage): add HunyuanImage3.0-80B inference&finetune#1432

Open
Dong1017 wants to merge 23 commits into mindspore-lab:master from Dong1017:hunyuan_image_3

Conversation

Contributor

@Dong1017 Dong1017 commented Nov 20, 2025

What does this PR do?

Adds

  1. Hunyuan-Image-3 models and the corresponding text-to-image pipeline.
  2. A LoRA finetuning script for Hunyuan-Image-3 text-to-image, using the lambdalabs/pokemon-blip-captions dataset.

Usage

  1. Text-to-image inference
```shell
#!/bin/bash
export TOKENIZERS_PARALLELISM=False
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Distributed inference configuration
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
MASTER_PORT=${MASTER_PORT:-$(shuf -i 20001-29999 -n 1)}
NPROC_PER_NODE=${WORLD_SIZE:-8}

# Model configuration
MODEL_ID=${MODEL_ID:-"HunyuanImage-3/"}  # Local path or HuggingFace model ID

# Inference entry point
entry_file="run_image_gen.py"

# Input arguments (to be filled)
image_path="image_repro.png"
prompt="A brown and white dog is running on the grass"
seed=0
verbose=1
enable_amp="True"
image_size="832x1216"

# Launch inference
msrun --worker_num=${NPROC_PER_NODE} \
    --local_worker_num=${NPROC_PER_NODE} \
    --master_addr=${MASTER_ADDR} \
    --master_port=${MASTER_PORT} \
    --log_dir="logs/infer" \
    --join=True \
    ${entry_file} \
    --model-id "${MODEL_ID}" \
    --save "${image_path}" \
    --prompt "${prompt}" \
    --seed "${seed}" \
    --verbose "${verbose}" \
    --enable-ms-amp "${enable_amp}" \
    --image-size "${image_size}" \
    --reproduce \
    --bf16
```
  2. Finetune
```shell
#!/bin/bash
export TOKENIZERS_PARALLELISM=False
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Distributed training configuration
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
MASTER_PORT=${MASTER_PORT:-$(shuf -i 20001-29999 -n 1)}
NPROC_PER_NODE=${WORLD_SIZE:-8}

# Model configuration
MODEL_ID=${MODEL_ID:-"HunyuanImage-3/"}  # Local path or HuggingFace model ID

# Training entry point
entry_file="run_image_train.py"

# Output configuration
output_dir="output/train"

# Input arguments (to be filled)
dataset_path="datasets/pokemon-blip-captions"
deepspeed="scripts/zero3.json"
learning_rate=1e-5
num_train_epochs=1
seed=0
save_strategy="no"
do_eval="False"

# Launch finetuning
msrun --worker_num=${NPROC_PER_NODE} \
    --local_worker_num=${NPROC_PER_NODE} \
    --master_addr=${MASTER_ADDR} \
    --master_port=${MASTER_PORT} \
    --log_dir="logs/train" \
    --join=True \
    ${entry_file} \
    --dataset_path "${dataset_path}" \
    --deepspeed "${deepspeed}" \
    --model_path "${MODEL_ID}" \
    --output_dir "${output_dir}" \
    --num_train_epochs "${num_train_epochs}" \
    --learning_rate "${learning_rate}" \
    --seed "${seed}" \
    --save_strategy "${save_strategy}" \
    --do_eval "${do_eval}" \
    --bf16
```
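For orientation, the flags passed in the launch scripts above map onto a small command-line surface. Below is a minimal `argparse` sketch of the inference side, reconstructed purely from the flags in the script — the actual parser in `run_image_gen.py` may differ in names, defaults, and types:

```python
import argparse

# Hypothetical reconstruction of run_image_gen.py's CLI, inferred from the
# launch script's flags; the real parser may differ.
def build_parser():
    p = argparse.ArgumentParser(description="HunyuanImage-3 text-to-image inference")
    p.add_argument("--model-id", type=str, required=True)      # local path or HF model ID
    p.add_argument("--save", type=str, default="image.png")    # output image path
    p.add_argument("--prompt", type=str, required=True)
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--verbose", type=int, default=0)
    p.add_argument("--enable-ms-amp", type=str, default="True")
    p.add_argument("--image-size", type=str, default="832x1216")  # "WxH"
    p.add_argument("--reproduce", action="store_true")
    p.add_argument("--bf16", action="store_true")
    return p

args = build_parser().parse_args([
    "--model-id", "HunyuanImage-3/",
    "--prompt", "A brown and white dog is running on the grass",
    "--image-size", "832x1216", "--reproduce", "--bf16",
])
# Split the "WxH" image-size string into integer dimensions.
width, height = (int(v) for v in args.image_size.split("x"))
```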

More information is available in the README.md
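The finetune launch passes `--deepspeed scripts/zero3.json`; that file is not reproduced in this description. For orientation only, a typical ZeRO stage-3 configuration uses standard DeepSpeed-style fields like these (an illustrative sketch, not necessarily the exact file in this PR):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": {
    "enabled": "auto"
  }
}
```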

Performance

Inference experiments are tested on Ascend Atlas 800T A2 machines with MindSpore 2.7.1, using 8 NPUs.

| Type | Weight loading time | Mode | Speed (s/it) |
|---|---|---|---|
| Inference | 6m48s | pynative | 28.20 |

Finetune experiments are tested on Ascend Atlas 800T A2 machines with MindSpore 2.7.0, using 8 NPUs.

| Type | Mode | Trainable ratio | Speed for one step in an epoch (s/it) |
|---|---|---|---|
| Finetune | pynative | 0.073% | 54.41 |
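The trainable ratio above is simply LoRA adapter parameters divided by total parameters. A minimal sketch of that bookkeeping — the ~58.4M LoRA-parameter figure below is an illustrative assumption chosen to reproduce the reported ratio, not a number measured from this PR:

```python
def trainable_ratio(params):
    """params: iterable of (num_params, requires_grad) pairs."""
    total = sum(n for n, _ in params)
    trainable = sum(n for n, grad in params if grad)
    return trainable / total

# Illustrative only: ~80B frozen base weights plus a hypothetical
# ~58.4M-parameter LoRA adapter yields roughly the 0.073% reported above.
ratio = trainable_ratio([(80_000_000_000, False), (58_400_000, True)])
```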

Option

Use #1422 to accelerate model weight loading.

Limitations

  1. MindSpore version: inference supports ms >= 2.7.0; LoRA finetuning supports ms 2.7.0.
  2. The app server is not supported yet. Model inference relies on msrun for distributed execution, which conflicts with how Gradio is currently integrated: each NPU node spawns an independent process that loads the model weights and attempts to start the server on a designated port. Although each process can be assigned its own port, the core issue is Gradio's launch method, which blocks the main thread by default (prevent_thread_lock=False); this blocking behavior deadlocks the distributed processes in a multi-NPU environment. Setting prevent_thread_lock=True lets the server start briefly, but it terminates immediately and fails to stay active. A potential fix is to replace the Gradio-based server with a lightweight web framework such as Flask.
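A non-blocking server along the lines suggested above can be sketched with the standard library alone (a Flask version would look similar): the HTTP server runs on a background thread so the main distributed loop is never blocked, and in a real deployment only rank 0 would bind the port. Everything here (handler shape, endpoint name, response format) is an assumption for illustration, not code from this PR:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class GenHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # A real server would broadcast the prompt to all ranks and run the
        # distributed pipeline here (hypothetical placeholder).
        body = json.dumps({"status": "ok", "prompt": payload.get("prompt", "")}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence default request logging on stderr

def start_server(port=0):
    """Bind (port=0 picks a free port) and serve on a daemon thread."""
    server = HTTPServer(("127.0.0.1", port), GenHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server  # server.server_address[1] is the bound port
```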

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@Dong1017 Dong1017 marked this pull request as ready for review December 22, 2025 06:47