
feat: add Res2s HQ sampler for LTX2 dev pipeline #1632

Draft
Gunther-Schulz wants to merge 22 commits into deepbeepmeep:main from Gunther-Schulz:feature/ltx2-hq-pipeline


Gunther-Schulz commented Mar 22, 2026

🤖 Generated with Claude Code

Summary

Ports the official TI2VidTwoStagesHQPipeline components from Lightricks' LTX-2 reference implementation, adding a Res2s (HQ) sampler option for the LTX2 dev pipeline.

  • Res2sDiffusionStep: Second-order sampler with SDE noise injection and variance-preserving transitions
  • res2s_audio_video_denoising_loop: Two-stage Runge-Kutta denoising loop with bong iteration for anchor stabilization
  • MultiModalGuider: Unified guidance combining CFG, STG, modality guidance, and rescaling in one pass
  • Resolution-dependent sigma scheduling: Passes latent shape to scheduler for correct sigma shifting (HQ path only, Euler path unchanged)
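The phi functions and two-stage coefficients in the summary above (the contents of `res2s.py`) can be illustrated with a minimal scalar sketch. It assumes a generic RES-style exponential Runge-Kutta step written in the plain sigma parameterization, with step size h = log(σ/σ_next) and midpoint c2 = 0.5. The helper names are hypothetical. SDE noise injection and bong iteration are left out, and this is not the PR's code.

```python
import math
from typing import Callable


def phi(k: int, z: float) -> float:
    """phi_0(z) = e^z and phi_k(z) = (phi_{k-1}(z) - 1/(k-1)!) / z, with phi_k(0) = 1/k!.

    Recursive form: fine for a sketch, but it loses precision for tiny |z|.
    """
    if k == 0:
        return math.exp(z)
    if abs(z) < 1e-8:
        return 1.0 / math.factorial(k)
    return (phi(k - 1, z) - 1.0 / math.factorial(k - 1)) / z


def res2s_coefficients(h: float, c2: float = 0.5) -> tuple[float, float, float]:
    """Two-stage exponential RK coefficients (hypothetical helper)."""
    a21 = c2 * phi(1, -h * c2)   # weight used to reach the midpoint stage
    b2 = phi(2, -h) / c2         # weight of the midpoint evaluation
    b1 = phi(1, -h) - b2         # so that b1 + b2 = phi_1(-h), the first-order weight
    return a21, b1, b2


def res2s_step(x: float, sigma: float, sigma_next: float,
               denoise: Callable[[float, float], float], c2: float = 0.5) -> float:
    """One deterministic two-stage step from sigma to sigma_next (must be > 0)."""
    h = math.log(sigma / sigma_next)
    a21, b1, b2 = res2s_coefficients(h, c2)
    eps1 = denoise(x, sigma) - x
    x_mid = x + h * a21 * eps1
    sigma_mid = sigma * math.exp(-h * c2)
    eps2 = denoise(x_mid, sigma_mid) - x
    return x + h * (b1 * eps1 + b2 * eps2)


# With a constant denoiser the step is exact: x moves (1 - sigma_next/sigma) of the way toward it.
print(res2s_step(1.0, 1.0, 0.5, lambda x, s: 2.0))
```

The second stage corrects the first-order update using a denoiser evaluation at the midpoint sigma. This is why each res2s step costs two model calls.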

UI

  • Sampler dropdown in Quality tab: Euler (Standard) / Res2s (HQ)
  • Rescale Scale slider (0-1, 0.45 recommended)
  • Info note with recommended settings
  • Visible for dev pipeline only (not distilled)

Official HQ defaults

  • 15 steps, CFG 3.0, Rescale 0.45, Distilled LoRA at 0.25 strength
  • Audio rescale hardcoded to 1.0 (per the official pipeline)

What's unchanged

  • Euler path is completely untouched — selecting Euler gives identical behavior to before this PR
  • All existing Quality tab settings (perturbation, APG, CFG Star, Self Refiner) work as before with Euler
  • Distilled pipeline unaffected

Known limitations

  • No per-stage LoRA strength (the official pipeline uses 0.25 for stage 1 and 0.5 for stage 2 for the distilled LoRA)
  • Self Refiner not integrated with the Res2s loop (only works with Euler)
  • Image CRF preprocessing not implemented (the official pipeline pre-compresses conditioning images at CRF 33)

Files changed

  • models/ltx2/ltx_core/components/diffusion_steps.py — Res2sDiffusionStep
  • models/ltx2/ltx_core/components/guiders.py — MultiModalGuider + MultiModalGuiderParams
  • models/ltx2/ltx_pipelines/utils/res2s.py — phi functions and RK coefficients (new file)
  • models/ltx2/ltx_pipelines/utils/helpers.py — res2s loop, multi_modal_guider_denoising_func, SDE noise helpers
  • models/ltx2/ltx_pipelines/ti2vid_two_stages.py — HQ path branching, sigma schedule, stepper/guider selection
  • models/ltx2/ltx2.py — forward hq_sampler/rescale_scale params
  • models/ltx2/ltx2_handler.py — hq_sampler capability flag for dev models
  • wgp.py — UI controls, param threading

Test plan

  • Verify Euler (Standard) produces identical output to main branch
  • Verify Res2s (HQ) completes generation without errors (both stages)
  • Test with distilled LoRA at 0.25 strength, 15 steps, CFG 3, rescale 0.45
  • Test without distilled LoRA at 20+ steps
  • Test distilled pipeline still works (HQ option not visible)
  • Verify settings save/load correctly with hq_sampler and rescale_scale


Gunther-Schulz and others added 22 commits March 22, 2026 15:58
Port the official TI2VidTwoStagesHQPipeline components from Lightricks:

- Res2sDiffusionStep: second-order sampler with SDE noise injection
- res2s.py: phi functions and Runge-Kutta coefficient computation
- MultiModalGuider + MultiModalGuiderParams: unified guidance combining
  CFG, STG, modality guidance, and rescaling in one pass
- multi_modal_guider_denoising_func: denoising function for MultiModalGuider
- res2s_audio_video_denoising_loop: two-stage RK loop with bong iteration
  and SDE noise at substep and step levels

UI: Sampler dropdown (Euler Standard / Res2s HQ) with Rescale Scale slider
in the Quality tab. Visible for dev pipeline only.

When Res2s HQ is selected, both Stage 1 and Stage 2 use the second-order
sampler. The existing Euler path with all Quality tab settings (perturbation,
APG, CFG Star, Self Refiner) remains unchanged when Euler is selected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
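The "SDE noise at substep and step levels" can be illustrated with one common variance-preserving renoising rule for rectified-flow latents, x = (1 − σ)·x0 + σ·ε. The rule has two parts. First, step deterministically to a lower σ_down. Then rescale the sample and add fresh noise so that the total noise level is exactly σ_next. This is a hedged sketch of that general technique only. The `eta` parameter and both function names are assumptions, not the PR's SDE helpers.

```python
import math
import random


def sigma_down_for(sigma: float, sigma_next: float, eta: float) -> float:
    """Deterministic target at or below sigma_next; eta = 0 means no noise injection."""
    downstep_ratio = 1.0 + (sigma_next / sigma - 1.0) * eta
    return sigma_next * downstep_ratio


def sde_renoise(x_down: float, sigma_down: float, sigma_next: float, noise: float) -> float:
    """Lift a sample at sigma_down to noise level sigma_next using fresh noise.

    Assumes rectified-flow latents x = (1 - sigma) * x0 + sigma * eps and
    0 <= sigma_down <= sigma_next < 1.
    """
    scale = (1.0 - sigma_next) / (1.0 - sigma_down)   # clean-signal weight becomes 1 - sigma_next
    kept_noise = sigma_down * scale                    # std of the noise already in x_down, after scaling
    fresh_std = math.sqrt(max(sigma_next**2 - kept_noise**2, 0.0))
    return scale * x_down + fresh_std * noise


rng = random.Random(0)
sd = sigma_down_for(0.8, 0.6, eta=0.5)
x_next = sde_renoise(0.3, sd, 0.6, rng.gauss(0.0, 1.0))
```

Because the kept and fresh noise are independent, their variances add up to σ_next², so the sample lands exactly on the schedule's noise level.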
- Add _prewarm/_cleanup to multi_modal_guider_denoising_func for proper
  conditioning context preparation (fixes CPU tensor in timestep embedding)
- Pass v_context_n/a_context_n for negative context preparation
- Add step_index and sigma_schedule to modality_from_latent_state calls
- Cast SDE substep sigmas to float32 to keep float64 (double) dtype from propagating
- Fix denoise_fn substep call to include proper zero sigma for padding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A1: Pass latent shape to LTX2Scheduler for resolution-dependent sigma
    shifting. Was using MAX_SHIFT_ANCHOR=4096 tokens instead of actual
    token count (e.g. 273 for 480p). Sigma schedule was completely wrong.

A2: Add final denoise step after res2s loop. When sigmas[-1]==0, the
    official pipeline does one clean denoise to fully remove residual
    noise. We were missing this.

A3: Fix substep sigma padding. Was passing [sub_sigma, 0] (2 elements)
    instead of [sub_sigma] (1 element) like the official. Extra zero
    affected sigma_schedule in modality.

A4: Don't pass noise_seed to res2s loop. Official defaults to -1.
    We were passing the user seed, causing different noise patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
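The A2 fix follows from the step size. In a log-sigma step, h = log(σ/σ_next) is undefined when σ_next = 0, so the loop stops one interval early and finishes with a clean denoise instead. The sketch below is minimal and makes several assumptions. A first-order exponential step stands in for the real two-stage res2s step. The exact definition of `n_full_steps` is a guess. All names are hypothetical.

```python
import math
from typing import Callable, Sequence

Denoiser = Callable[[float, float], float]


def exp_step(x: float, sigma: float, sigma_next: float, denoise: Denoiser) -> float:
    """First-order stand-in for the res2s step, in log-sigma time."""
    h = math.log(sigma / sigma_next)   # raises ZeroDivisionError when sigma_next == 0
    x0 = denoise(x, sigma)
    return x0 + math.exp(-h) * (x - x0)


def denoising_loop(x: float, sigmas: Sequence[float], denoise: Denoiser) -> float:
    ends_at_zero = sigmas[-1] == 0.0
    # Assumption: the interval into sigma = 0 is never stepped; a denoise replaces it.
    n_full_steps = len(sigmas) - 2 if ends_at_zero else len(sigmas) - 1
    for i in range(n_full_steps):
        x = exp_step(x, sigmas[i], sigmas[i + 1], denoise)
    if ends_at_zero:
        # Final clean denoise at step_index == n_full_steps removes residual noise.
        x = denoise(x, sigmas[n_full_steps])
    return x


print(denoising_loop(1.0, [1.0, 0.5, 0.25, 0.0], lambda x, s: 2.0))  # 2.0
```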
The standard Euler pipeline intentionally uses MAX_SHIFT_ANCHOR=4096
without passing latent to the scheduler. Only the official HQ pipeline
uses resolution-dependent sigma shifting. Revert A1 for Euler path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
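The difference between the two paths can be sketched as a sigma schedule whose shift depends on the latent's token count. In this sketch only MAX_SHIFT_ANCHOR = 4096 comes from this PR. The other anchor and the shift constants are hypothetical placeholders. The token count is assumed to be the product of the frame, height and width dims of a (B, C, F, H, W) latent. Any final stretch or terminal adjustment the real LTX2Scheduler may apply is not modeled.

```python
import math
from typing import Optional, Sequence

# Hypothetical constants: only MAX_SHIFT_ANCHOR = 4096 is stated in this PR.
BASE_SHIFT_ANCHOR = 1024
MAX_SHIFT_ANCHOR = 4096
BASE_SHIFT = 0.95
MAX_SHIFT = 2.05


def shift_for_tokens(tokens: int) -> float:
    """Linear map from token count to shift, through the two anchor points."""
    slope = (MAX_SHIFT - BASE_SHIFT) / (MAX_SHIFT_ANCHOR - BASE_SHIFT_ANCHOR)
    return BASE_SHIFT + slope * (tokens - BASE_SHIFT_ANCHOR)


def shifted_sigmas(steps: int, latent_shape: Optional[Sequence[int]] = None) -> list[float]:
    """Linear timesteps from 1 to 0, warped by a token-count-dependent shift."""
    if latent_shape is None:
        tokens = MAX_SHIFT_ANCHOR                 # Euler path: fixed anchor, no latent passed
    else:
        tokens = math.prod(latent_shape[2:])      # HQ path: assumed (B, C, F, H, W) latent
    mu = shift_for_tokens(tokens)
    sigmas = []
    for i in range(steps + 1):
        t = 1.0 - i / steps
        if t == 0.0:
            sigmas.append(0.0)
        else:
            sigmas.append(math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0)))
    return sigmas


euler = shifted_sigmas(4)
hq = shifted_sigmas(4, (1, 128, 3, 7, 13))   # hypothetical shape with 3 * 7 * 13 = 273 tokens
```

Under these assumed constants, fewer tokens give a smaller shift, so the intermediate sigmas fall faster. A 273-token latent scheduled as if it had 4096 tokens therefore stays noisier for longer than the HQ path expects.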
The substep evaluation was reusing the cached base_timestep computed
for the main step's sigma, giving wrong timestep embeddings. Now
creates a fresh LatentStateRuntimeCache for mid states so timestep
bases are recomputed correctly for the substep sigma.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rescale Scale and the HQ recommended settings note are only shown
when Res2s (HQ) sampler is selected. Euler users see just the
sampler dropdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add hq_sampler_options to extra_inputs so the rescale slider and
info note visibility updates when loading settings from a generated
video that used Res2s HQ.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The slider label already says 0.45 is recommended but the default was
0.0. New users selecting Res2s HQ would get no rescaling unless they
manually set it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The official HQ pipeline uses different distilled LoRA strengths per
stage (0.25 for stage 1, 0.5 for stage 2). The existing semicolon
notation already supports this — update the info note to show the
correct 0.25;0.5 syntax so users match the official defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Rescale Scale slider (Res2s-only) and Guidance Rescale slider
(Euler-only) perform identical math. Remove the separate Res2s slider
and reuse Guidance Rescale for both modes. In Res2s mode, alt_scale
now feeds into MultiModalGuiderParams.rescale_scale.

Updated the Res2s HQ note with all recommended official values:
15 steps, distilled LoRA 0.25;0.5, CFG 3, Guidance Rescale 0.45,
Audio Guidance 7, Modality Guidance 3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
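Both sliders apply the same rescale, so the combination step can be sketched once. The sketch makes two assumptions. Each guidance term is added as a scaled difference from the conditional prediction. Rescaling follows the usual CFG-rescale rule: match the std of the conditional prediction, then blend by `rescale_scale`. The function name and flat-list inputs are illustrative, and the exact weighting inside `MultiModalGuider` may differ.

```python
from statistics import pstdev
from typing import Sequence


def multimodal_guide(cond: Sequence[float], uncond: Sequence[float],
                     perturbed: Sequence[float], isolated: Sequence[float],
                     cfg_scale: float = 3.0, stg_scale: float = 0.0,
                     modality_scale: float = 3.0, rescale_scale: float = 0.45) -> list[float]:
    """Combine CFG, STG and modality guidance in one pass, then apply guidance rescale."""
    pred = [
        c
        + (cfg_scale - 1.0) * (c - u)        # classifier-free guidance
        + stg_scale * (c - p)                # perturbation (STG), off at 0.0
        + (modality_scale - 1.0) * (c - m)   # modality guidance
        for c, u, p, m in zip(cond, uncond, perturbed, isolated)
    ]
    if rescale_scale == 0.0:
        return pred
    std_pred = pstdev(pred)
    if std_pred == 0.0:
        return pred
    # Blend between the raw guided prediction and one rescaled to cond's std.
    factor = rescale_scale * (pstdev(cond) / std_pred) + (1.0 - rescale_scale)
    return [v * factor for v in pred]
```

At `rescale_scale = 1.0` the output has the same spread as the conditional prediction. At 0.0 it is the raw guided prediction. The recommended 0.45 sits between the two.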
The res2s denoising loop performs an extra final denoise at
step_index == n_full_steps when sigmas[-1] == 0. The LoRA schedule
was only expanded for n_full_steps entries, causing an IndexError
when using per-stage strengths (semicolon notation). Add +1 to
account for this extra step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
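A small sketch of expanding the semicolon notation into a per-step schedule for one stage shows the off-by-one. The parsing rule here is an assumption for illustration: one value per stage, with the last value reused for later stages. The helper name is also hypothetical, and neither reflects WanGP's actual LoRA-multiplier code.

```python
def stage_lora_schedule(spec: str, stage: int, n_full_steps: int,
                        ends_at_zero: bool) -> list[float]:
    """Per-step LoRA strengths for one (0-based) stage from notation like '0.25;0.5'."""
    strengths = [float(part) for part in spec.split(";") if part.strip()]
    strength = strengths[min(stage, len(strengths) - 1)]
    # The res2s loop also reads step_index == n_full_steps for its final clean
    # denoise when sigmas[-1] == 0, so the schedule needs one extra entry.
    n_entries = n_full_steps + (1 if ends_at_zero else 0)
    return [strength] * n_entries


schedule = stage_lora_schedule("0.25;0.5", stage=1, n_full_steps=15, ends_at_zero=True)
print(len(schedule), schedule[15])  # 16 0.5
```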
The hq_sampler_options visibility was gated on update_form which is
False on fresh loads, hiding the note even when Res2s was the saved
default. Remove the update_form guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Perturbation (STG) now works in Res2s mode via MultiModalGuider's
built-in stg_scale/stg_blocks support. Default is OFF (stg_scale=0.0)
matching the official HQ pipeline — user can enable via the
Perturbation dropdown.

APG, CFG Star, and Self-Refiner are hidden when Res2s is selected
since they are incompatible with the MultiModalGuider architecture
and only work in the Euler path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
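The visibility rules in the commits above reduce to a pure mapping from the selected sampler to which Quality-tab controls are shown. In this sketch the sampler strings and the `perturbation` key are hypothetical. The column names `apg_col`, `cfg_free_guidance_col`, `self_refiner_col` and `hq_sampler_options` are taken from the commit messages.

```python
def quality_tab_visibility(sampler: str) -> dict[str, bool]:
    """Which Quality-tab controls to show ('euler' or 'res2s' are hypothetical values)."""
    is_hq = sampler == "res2s"
    return {
        "hq_sampler_options": is_hq,         # HQ note; no update_form guard, so fresh loads show it too
        "perturbation": True,                # STG works in both paths (off by default in Res2s)
        "apg_col": not is_hq,                # Euler only
        "cfg_free_guidance_col": not is_hq,  # hidden together with APG and Self Refiner
        "self_refiner_col": not is_hq,       # Euler only
    }
```

In a Gradio change handler, each entry would become a `gr.update(visible=...)` for the matching component. Keeping the rule in one function makes fresh loads and settings restores behave the same way.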
Rescale is now unified with Guidance Rescale slider, no need to
show it separately in the video info section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gr.State passes a State object rather than its value in some Gradio
contexts, causing TypeError when float() is called on it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The handler referenced apg_col, cfg_free_guidance_col, self_refiner_col
before they were defined. Move the registration to after all columns
exist within the Quality tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>