feat: add JoyImage edit plus #14032

Open: tangyanf wants to merge 5 commits into huggingface:main from tangyanf:add-joyimage-edit-plus

tangyanf commented Jun 22, 2026

Description

We are the JoyAI Team, and this is the Diffusers implementation for the JoyAI-Image-Edit-Plus model.

GitHub Repository: [https://github.com/jd-opensource/JoyAI-Image]
Hugging Face Model: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers]
Original opensource weights: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus]
Fixes #14049

Model Overview

JoyAI-Image-Edit-Plus extends JoyAI-Image-Edit with multi-image editing capabilities. While JoyAI-Image-Edit operates on a single reference image, Edit-Plus accepts multiple reference images as input and performs instruction-guided editing across them, enabling tasks such as subject composition, style transfer from multiple sources, and multi-view consistent editing.

It combines an 8B Multimodal Large Language Model (MLLM) with a 16B Multimodal Diffusion Transformer (MMDiT), supporting variable-resolution reference images that are independently encoded and jointly denoised.

Key Features

  • Multi-Image Input: Accepts multiple reference images with different resolutions, enabling complex editing scenarios that require information from multiple visual sources.
  • Subject Composition: Combine elements from separate images into a coherent output guided by text instructions (e.g., "Let the person lovingly play with the dog" given separate person
    and dog images).
  • Cross-Image Style Transfer: Apply style or attributes from one reference image to subjects in another.
  • Variable Resolution Support: Each reference image is independently resized and encoded at its optimal resolution, preserving fine-grained details regardless of input size.
  • Instruction-Guided Generation: Natural language prompts control how multiple reference images are composed and edited in the final output.

github-actions bot added the models, pipelines, and size/L (PR with diff > 200 LOC) labels Jun 22, 2026
github-actions bot commented

Hi @tangyanf, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.

yiyixuxu added the no-issue-needed label (for PRs that do not require link to an issue) Jun 22, 2026

sergereview bot left a comment

🤗 Serge says:

This PR adds the JoyImage Edit Plus model and pipeline. There are several blocking issues that need to be addressed before merging.

Blocking — Debug artifacts left in production code

Multiple torch.save() calls, a print() statement, and a commented-out exit(0) are left in pipeline_joyimage_edit_plus.py. These will write files to the user's working directory and print to stdout during every inference call.

Blocking — einops dependency

Per .ai/models.md: "No new mandatory dependency without discussion (e.g. einops). Optional deps guarded with is_X_available() and a dummy in utils/dummy_*.py." The pipeline directly imports `from einops import rearrange`, which is the only non-comment usage of einops in src/diffusers/. The rearrange calls should be rewritten with native PyTorch (reshape, permute, unflatten).
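
As an illustration of the rewrite, here is a minimal sketch of replacing a patchify-style rearrange with reshape and permute. The pattern `"b c (h p) (w q) -> b (h w) (c p q)"` and the helper names `patchify` / `unpatchify` are assumptions chosen for the example; the actual patterns used in the pipeline may differ.

```python
import torch


def patchify(latents: torch.Tensor, p: int) -> torch.Tensor:
    """Native equivalent of the (assumed) einops pattern
    rearrange(latents, "b c (h p) (w q) -> b (h w) (c p q)", p=p, q=p).
    """
    b, c, height, width = latents.shape
    h, w = height // p, width // p
    x = latents.reshape(b, c, h, p, w, p)  # split H into (h, p) and W into (w, q)
    x = x.permute(0, 2, 4, 1, 3, 5)        # reorder to (b, h, w, c, p, q)
    return x.reshape(b, h * w, c * p * p)  # merge into (b, tokens, features)


def unpatchify(tokens: torch.Tensor, c: int, h: int, w: int, p: int) -> torch.Tensor:
    """Inverse of patchify: (b, h*w, c*p*p) -> (b, c, h*p, w*p)."""
    b = tokens.shape[0]
    x = tokens.reshape(b, h, w, c, p, p)   # (b, h, w, c, p, q)
    x = x.permute(0, 3, 1, 4, 2, 5)        # reorder to (b, c, h, p, w, q)
    return x.reshape(b, c, h * p, w * p)
```

The key point is that each parenthesised group in the einops pattern becomes one reshape, and the axis reordering between the two sides of the arrow becomes one permute.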

Blocking — sglang integration code in model forward

The transformer's forward method contains sglang-specific code: list-unwrapping for "SglangXvideo CFG branches" (lines 272-276) and a try: from sglang... fallback (lines 279-287). Per .ai/AGENTS.md: "No defensive code, unused code paths, or legacy stubs — do not add fallback paths, safety checks, or configuration options 'just in case'." This code doesn't belong in the diffusers model — the pipeline always passes the required arguments.

Blocking — Missing dummy objects

JoyImageEditPlusTransformer3DModel, JoyImageEditPlusPipeline, and JoyImageEditPlusPipelineOutput are not registered in dummy_pt_objects.py / dummy_torch_and_transformers_objects.py. This will cause ImportError when torch/transformers are not installed.
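
For context, the sketch below shows the general shape of the dummy-object pattern in a simplified, self-contained form. It is not the diffusers implementation: `requires_backends` and `DummyObject` are re-implemented here in reduced form, and the backend name is a deliberately non-existent placeholder so the example behaves the same everywhere. In the repository these entries are, to my knowledge, generated by tooling rather than written by hand.

```python
import importlib.util


def requires_backends(obj, backends):
    """Raise ImportError if any backend in `backends` cannot be imported."""
    name = obj.__name__ if isinstance(obj, type) else type(obj).__name__
    missing = [b for b in backends if importlib.util.find_spec(b) is None]
    if missing:
        raise ImportError(f"{name} requires: {', '.join(missing)}")


class DummyObject(type):
    """Metaclass: any unknown public class attribute triggers the backend check."""

    def __getattr__(cls, key):
        if key.startswith("_"):
            raise AttributeError(key)
        requires_backends(cls, cls._backends)


class JoyImageEditPlusPipeline(metaclass=DummyObject):
    # Hypothetical backend name (assumption), guaranteed not to be installed.
    _backends = ["backend_that_is_not_installed"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, self._backends)

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, cls._backends)
```

With a placeholder like this registered, importing the name succeeds and the user only sees a descriptive ImportError when they try to instantiate or load the class.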

Blocking — Missing tests

No test files were added for the new model or pipeline.

Blocking — Hardcoded device_type="cuda" in torch.autocast

torch.autocast(device_type="cuda", ...) is hardcoded in two places in the pipeline. This will fail on MPS, XPU, and other non-CUDA devices.
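
A minimal sketch of the device-agnostic form, assuming the device can be read from an input tensor. The function name `matmul_autocast` is hypothetical, and whether autocast supports a given backend and dtype depends on the installed PyTorch version.

```python
import torch


def matmul_autocast(x: torch.Tensor, weight: torch.Tensor, dtype=torch.bfloat16) -> torch.Tensor:
    """Run a matmul under autocast on whatever device the input lives on."""
    device = x.device
    # device.type is a string such as "cuda", "mps", "xpu", or "cpu",
    # so nothing here is tied to CUDA.
    with torch.autocast(device_type=device.type, dtype=dtype):
        return x @ weight
```

In a pipeline the same idea would apply to whichever device the pipeline already resolves for execution, instead of reading it off a tensor.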

Non-blocking — Inlined scheduler sigma math

Per .ai/pipelines.md gotcha #3, the pipeline manually computes shifted sigmas and temporarily overrides self.scheduler.shift — this is exactly what FlowMatchEulerDiscreteScheduler does with its shift config. The scheduler should own this logic.
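
For reference, a small pure-Python sketch of the static time shift in question. The formula `shift * sigma / (1 + (shift - 1) * sigma)` is, as far as I know, the one FlowMatchEulerDiscreteScheduler applies when dynamic shifting is disabled; the linear sigma schedule and the function name `shift_sigmas` are assumptions for illustration.

```python
def shift_sigmas(sigmas, shift):
    """Apply the static flow-matching time shift to each sigma in [0, 1]."""
    return [shift * s / (1 + (shift - 1) * s) for s in sigmas]


num_steps = 4
# Assumed linear schedule from 1.0 down to 1/num_steps: 1.0, 0.75, 0.5, 0.25
sigmas = [1 - i / num_steps for i in range(num_steps)]
shifted = shift_sigmas(sigmas, shift=3.0)
# shift > 1 pushes intermediate sigmas toward 1.0; shift == 1 is the identity.
```

Because the scheduler already performs this transform from its own shift config, the pipeline only needs to pass the desired number of steps (or custom sigmas) through retrieve_timesteps rather than recomputing it.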

Non-blocking — Unused imports and parameters

  • import inspect in transformer_joyimage_edit_plus.py is unused.
  • enable_denormalization parameter is declared in prepare_latents and __call__ but never read.
  • retrieve_timesteps is duplicated from the existing pipeline without a # Copied from annotation.

serge v0.1.0 · model: claude-opus-4-6 · 29 LLM turns · 50 tool calls · 190.2s · 1602502 in / 7369 out tokens

Comment threads (all outdated):
  • src/diffusers/pipelines/joyimage/pipeline_joyimage_edit_plus.py (8 threads)
  • src/diffusers/models/transformers/transformer_joyimage_edit_plus.py (1 thread)
  • src/diffusers/pipelines/joyimage/pipeline_output.py (1 thread)
tangyanfei.8 added 2 commits June 23, 2026 02:18
    - Remove einops dependency: replace rearrange with reshape/permute
    - Remove sglang-specific code from transformer forward
    - Remove unused import inspect from transformer
    - Fix hardcoded device_type="cuda" to use device.type
    - Simplify scheduler sigma math: delegate to retrieve_timesteps
    - Remove unused enable_denormalization parameter
    - Fix callback latents variable binding
    - Fix output_type="pt" to return stacked tensor
    - Set return_dict default to True in transformer forward
    - Add dummy objects for JoyImageEditPlus classes
    - Add transformer and pipeline test files

Labels

models, no-issue-needed (for PRs that do not require link to an issue), pipelines, size/L (PR with diff > 200 LOC), tests, utils

Development

Successfully merging this pull request may close these issues.

Add JoyAI-Image Edit Plus pipeline and model

2 participants