Welcome to the collection of research on diffusion guidance methods!
This repo curates cutting-edge papers on diffusion model guidance 🌟
Your stars fuel the updates! ⭐ This list is actively maintained, with new papers and features added regularly.
| Title | Code | Date | Publication | Summary |
|---|---|---|---|---|
| Diffusion Models Beat GANs on Image Synthesis | Code | 2021.12 | NeurIPS 2021 | Introduces classifier guidance using gradients to enhance sample quality in diffusion models, surpassing GANs on image synthesis. |
| Classifier-Free Diffusion Guidance | N/A | 2021.12 | NeurIPS 2021 Workshop | Enables guidance without classifiers by jointly training on conditional and unconditional objectives. |
| Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models | Code | 2022.11 | ICML 2023 | Introduces Discriminator Guidance, which improves the sample quality of pre-trained diffusion models by training a discriminator to assess the realism of denoising paths, then adding an auxiliary term during generation that corrects the score toward the data distribution, without GAN-like joint training. |
| Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis | Code | 2022.12 | ICLR 2023 | Improves the compositional skills of T2I models, specifically more accurate attribute binding and better image composition, by incorporating linguistic structures into the diffusion guidance process. |
| Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models | N/A | 2023.01 | ACM Transactions on Graphics | Introduces Attend-and-Excite, an attention-based semantic guidance method that iteratively refines cross-attention maps to ensure text-to-image diffusion models faithfully generate all subjects in the prompt (requires new attention processor per model). |
| Diffusion Self-Guidance for Controllable Image Generation | N/A | 2023.06 | NeurIPS 2023 | Introduces self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. |
| Universal Guidance for Diffusion Models | Code | 2023.06 | CVPRW2023 | Provides a training-free method to incorporate arbitrary conditions into pre-trained unguided diffusion models using energy functions. |
| Improving Sample Quality of Diffusion Models Using Self-Attention Guidance | Code | 2023.10 | ICCV2023 | Uses self-attention maps to guide diffusion models away from degraded regions, improving generated image quality without extra training. |
| Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale | Code | 2023.12 | ICML 2024 | Applies non-linear corrections based on the Fokker-Planck equation to handle over-saturation artifacts at large guidance scales in diffusion models. |
| Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models | N/A | 2023.12 | AAAI 2025 | Skips network evaluations once predictions converge, using policies found via neural architecture search, yielding training-free acceleration that reduces the number of function evaluations (NFEs). |
| ProtoDiffusion: Classifier-Free Diffusion Guidance with Prototype Learning | Code | 2024.02 | ACML 2024 | Integrates prototype learning into classifier-free guidance to speed up sampling and enhance quality using class-specific prototypes. |
| Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance | Code | 2024.03 | ECCV 2024 | Proposes Perturbed-Attention Guidance (PAG) to improve sample quality in diffusion models by adding random perturbations to attention maps, enabling self-rectification in both unconditional and conditional generation. |
| Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models | N/A | 2024.03 | ICML 2024 | Provides a theoretical study of the influence of guidance on diffusion models in the context of Gaussian mixture models. |
| Analysis of Classifier-Free Guidance Weight Schedulers | N/A | 2024.04 | TMLR | Investigates varying classifier-free guidance weights during diffusion, finding that simple, monotonically increasing schedulers improve performance with minimal code, while complex parametrized schedulers offer gains but lack generalizability across models and tasks. |
| Inner Classifier-Free Guidance and Its Taylor Expansion for Diffusion Models | N/A | 2024.04 | TMLR | Uses a single model to jointly optimize the conditional and unconditional score predictors, eliminating the need for additional classifiers. Inner classifier-free guidance (ICFG) offers an alternative perspective on CFG when the condition has a specific structure, showing that CFG is a first-order case of ICFG. |
| Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models | Code | 2024.04 | NeurIPS 2024 | Restricts classifier-free guidance to a limited noise level interval during sampling, improving inference speed and result quality. |
| CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models | Code | 2024.06 | ICLR 2025 | Constrains guidance to data manifold using orthogonal projections for better sample quality and invertibility. |
| Guiding a Diffusion Model with a Bad Version of Itself | Code | 2024.06 | NeurIPS 2024 | Uses a poorly trained model version for guidance to control trade-offs between sample quality and diversity. |
| No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models | N/A | 2024.07 | ICLR 2025 | Introduces independent condition guidance (ICG) and time-step guidance (TSG), which perturb the conditioning and time-step encodings respectively, enabling training-free, invertible sampling that also applies to unconditional models. |
| Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention | Code | 2024.08 | NeurIPS 2024 | Presents Smoothed Energy Guidance (SEG), a method that smooths energy curvature in attention maps to enhance unconditional image generation quality while minimizing artifacts (requires new attention processor per model). |
| Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models | Code | 2024.10 | ICLR 2025 | Decomposes the CFG update term into components parallel and orthogonal to the conditional model prediction, observing that the parallel component primarily causes oversaturation while the orthogonal component enhances image quality. |
| Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling | Code | 2024.11 | CVPR 2025 | Develops Spatiotemporal Skip Guidance (STG), a training-free technique that skips specific layers in video diffusion transformers to boost sample quality without sacrificing diversity or motion dynamics. |
| TFG: Unified Training-Free Guidance for Diffusion Models | Code | 2024.09 | NeurIPS 2024 | Unifies training-free guidance methods with efficient hyper-parameter optimization for various conditional generation tasks. |
| TFG-Flow: Training-free Guidance in Multimodal Generative Flow | N/A | 2025.01 | ICLR 2025 | Introduces a novel training-free guidance method for multimodal generative flow, uniquely addressing the curse-of-dimensionality while maintaining unbiased sampling for guiding discrete variables in applications like molecular design. |
| REG: Rectified Gradient Guidance for Conditional Diffusion Models | N/A | 2025.01 | ICML 2025 | Proposes a rectified gradient guidance method that enhances conditional diffusion models by aligning practical implementations with a theoretically valid scaled joint distribution objective, improving performance over existing guidance techniques. |
| Visual Generation Without Guidance | Code | 2025.01 | ICML 2025 | Uses only a single model by reformulating the training loss as a linear interpolation between conditional and unconditional components with a randomly sampled pseudo-temperature and a stopping gradient for stability, halving inference costs while matching quality and diversity. |
| Variational Control for Guidance in Diffusion Models | Code | 2025.02 | ArXiv | Employs variational inference for terminal cost guidance in pre-trained models, unifying methods for training-free inverse problems. |
| Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms | N/A | 2025.02 | ArXiv | Uniquely demonstrates that Classifier-Free Guidance accurately reproduces the target distribution in high and infinite dimensions, extending to a family of non-linear generalizations with improved robustness, sample fidelity, and diversity. |
| Classifier-free Guidance with Adaptive Scaling | N/A | 2025.02 | ArXiv | β-CFG introduces a novel adaptive scaling method using gradient-based normalization and time-dependent β-distribution curves to dynamically balance prompt matching and image quality during the diffusion denoising process. |
| Diffusion Models without Classifier-free Guidance | Code | 2025.02 | ArXiv | Integrates condition posteriors into objectives for accelerated processes matching CFG quality without unconditional models. |
| Improving Discriminator Guidance in Diffusion Models | N/A | 2025.03 | ArXiv | Proposes a training objective for discriminator guidance in diffusion models that minimizes Kullback-Leibler divergence, improving sample quality over the conventional method using Cross-Entropy loss. |
| Guidance Free Image Editing via Explicit Conditioning | N/A | 2025.03 | ArXiv | Proposes a technique that models noise distributions on input modalities to guide conditional diffusion models, significantly reducing computational costs and improving inference time compared to CFG in image editing tasks. |
| TCFG: Tangential Damping Classifier-free Guidance | Code | 2025.03 | CVPR 2025 | Geometrically filters tangential misalignments using SVD for better manifold alignment and quality with minimal overhead. |
| CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models | Code | 2025.03 | ArXiv | Optimizes guidance scales for flow matching to correct early velocities and zero initial steps, outperforming CFG in multimedia generation. |
| Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model | N/A | 2025.04 | ICLR 2025 | Introduces a simple transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain, sharing a formulation similar to classifier-free guidance for improved domain alignment and generation quality. |
| Entropy Rectifying Guidance for Diffusion and Flow Models | N/A | 2025.04 | ArXiv | Entropy Rectifying Guidance (ERG) is a simple and effective guidance mechanism that improves image quality, diversity, and prompt consistency in diffusion and flow models by modifying the attention mechanism during inference, extending to unconditional sampling and combining seamlessly with other guidance methods. |
| Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization | N/A | 2025.05 | ArXiv | Proposes optimizing semantic embeddings guided by attribute classifiers to steer text-to-image diffusion models for desired edits, eliminating the need for text prompts and model training or fine-tuning. |
| Diffusion Models with Double Guidance: Generate with aggregated datasets | N/A | 2025.05 | ArXiv | Enables precise conditional generation by maintaining control over multiple conditions without requiring joint annotations, even when training samples lack all conditions simultaneously. |
| Adaptive Diffusion Guidance via Stochastic Optimal Control | N/A | 2025.05 | ArXiv | Introduces a stochastic optimal control framework that dynamically adjusts guidance strength in diffusion models based on time, current sample, and conditioning class, offering a principled approach to guidance scheduling. |
| Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance | Code | 2025.05 | ArXiv | Samples tilted distributions with Rényi divergence for low-noise corrections, fixing CFG's diversity issues. |
| Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model | N/A | 2025.05 | NeurIPS 2025 | Introduces a training-free, universal negative guidance method for diffusion models that uses extrapolation in attention space with L1-based normalization, generalizing across architectures, sampling regimes, and modalities while maintaining fidelity. |
| Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation | Code | 2025.06 | ICML 2025 | Angle Domain Guidance (ADG) mitigates color distortions in text-to-image latent diffusion models by constraining magnitude variations and optimizing angular alignment, preserving enhanced text-image alignment at higher guidance weights. |
| Feedback Guidance of Diffusion Models | Code | 2025.06 | ArXiv | Self-regulates coefficients based on predicted informativeness, adapting for complex prompts and balancing diversity-quality. |
| How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models | N/A | 2025.06 | ArXiv | Proposes Step AG, a simple, universally applicable adaptive guidance strategy that restricts classifier-free guidance to the first several denoising steps, achieving high-quality, well-conditioned images with a 20% to 30% speedup. |
| Token Perturbation Guidance for Diffusion Models | Code | 2025.06 | NeurIPS 2025 | Shuffles tokens for training-free, condition-agnostic guidance improving unconditional generation. |
| Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales | Code | 2025.06 | ICLR 2026 | Analyzes CFG in the frequency domain and proposes frequency-decoupled guidance (FDG): separate scales on low- vs. high-frequency parts of the CFG update (structure and conditioning vs. detail), improving fidelity at low guidance while reducing high-CFG oversaturation and diversity loss. |
| S²-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models | Code | 2025.08 | ArXiv | Introduces a training-free self-guidance method using stochastic block-dropping to create sub-networks that refine suboptimal predictions, enhancing sample quality and prompt adherence beyond traditional CFG. |
| Dynamic Classifier-Free Diffusion Guidance via Online Feedback | N/A | 2025.09 | ArXiv | Introduces a dynamic CFG framework using online feedback from latent-space evaluators (e.g., CLIP, discriminators) to adaptively adjust guidance scales per timestep, achieving up to 55.5% human preference win-rate for prompt-specific image generation. |
| Rectified-CFG++ for Flow Based Models | Code | 2025.10 | NeurIPS 2025 | An adaptive predictor-corrector guidance method for rectified flow models that prevents off-manifold drift, ensuring stable, high-quality, and artifact-free image generation across a wide range of guidance scales. |
| Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations | Code | 2025.10 | NeurIPS 2025 (Spotlight) | Introduces Foresight Guidance (FSG), a unified framework that reframes CFG and its variants as fixed point iterations seeking a golden path. FSG prioritizes longer-interval subproblems in early diffusion stages with increased iterations, achieving state-of-the-art performance in both image quality and computational efficiency. |
| CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance | Code | 2026.03 | CVPR 2026 | Reframes CFG as control on the generative flow and proposes SMC-CFG (sliding mode control) with nonlinear feedback to reduce instability and overshooting at large guidance scales; the code release includes FLUX, Qwen-Image, SD3/SD3.5, and Wan video pipelines. |
| Improving Diffusion Generalization with Weak-to-Strong Segmented Guidance | Code | 2026.03 | CVPR 2026 | Introduces the Weak-to-Strong (W2S) guidance principle and Segmented Guidance (SGG), a framework that combines condition-dependent and condition-agnostic guidance to improve the generalization of diffusion models. |
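Most entries above build on the classifier-free guidance (CFG) update, which extrapolates from an unconditional toward a conditional noise prediction. A minimal NumPy sketch (the names `eps_uncond`/`eps_cond` are illustrative stand-ins for a model's two predictions, not taken from any particular codebase):

```python
import numpy as np

def cfg(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: eps = eps_u + w * (eps_c - eps_u).

    w = 0 recovers the unconditional prediction, w = 1 the conditional
    one, and w > 1 amplifies the conditional direction -- the regime
    where the over-saturation artifacts discussed above appear.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy stand-in predictions: guidance extrapolates past the conditional one.
eps_u = np.zeros(4)
eps_c = np.ones(4)
print(cfg(eps_u, eps_c, 3.0))  # [3. 3. 3. 3.]
```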
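The interval-limited guidance idea ("Applying Guidance in a Limited Interval...", above) can be sketched on top of the standard CFG rule: apply the guidance weight only inside a chosen noise-level band, and fall back to plain conditional sampling elsewhere. The band endpoints `sigma_lo`/`sigma_hi` below are arbitrary illustrative values, not the paper's tuned interval:

```python
import numpy as np

def interval_cfg(eps_uncond, eps_cond, w, sigma, sigma_lo=0.3, sigma_hi=5.0):
    """Apply CFG weight w only when the current noise level sigma lies in
    [sigma_lo, sigma_hi]; outside the band, w_eff = 1 gives the plain
    conditional prediction (no guidance)."""
    w_eff = w if sigma_lo <= sigma <= sigma_hi else 1.0
    return eps_uncond + w_eff * (eps_cond - eps_uncond)

eps_u, eps_c = np.zeros(3), np.ones(3)
print(interval_cfg(eps_u, eps_c, w=4.0, sigma=1.0))   # inside band:  [4. 4. 4.]
print(interval_cfg(eps_u, eps_c, w=4.0, sigma=20.0))  # outside band: [1. 1. 1.]
```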
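The parallel/orthogonal decomposition summarized for "Eliminating Oversaturation and Artifacts of High Guidance Scales" can be sketched as projecting the CFG update onto the conditional prediction and re-weighting the two components separately. This is a simplification of the paper's method; the `beta` parameter and helper names are illustrative:

```python
import numpy as np

def decomposed_cfg(eps_uncond, eps_cond, w, beta=0.0):
    """Split the CFG update d = eps_cond - eps_uncond into components
    parallel and orthogonal to eps_cond, then rescale the parallel part
    by beta: beta = 1 recovers standard CFG, beta < 1 down-weights the
    component associated with over-saturation."""
    d = (eps_cond - eps_uncond).ravel()
    c = eps_cond.ravel()
    par = (d @ c) / (c @ c) * c      # projection of d onto eps_cond
    orth = d - par                   # remainder, orthogonal to eps_cond
    out = c + (w - 1.0) * (beta * par + orth)
    return out.reshape(eps_cond.shape)

eps_u = np.array([0.0, 0.0])
eps_c = np.array([1.0, 1.0])
# Here d is parallel to eps_cond, so beta=0 removes the whole update:
print(decomposed_cfg(eps_u, eps_c, w=5.0, beta=0.0))  # [1. 1.]
print(decomposed_cfg(eps_u, eps_c, w=5.0, beta=1.0))  # [5. 5.] (standard CFG)
```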
If you have any relevant papers to add, please follow the guidelines below:
- Fork the repository.
- Create a new branch for your changes.
- Add your paper details in the paper section.
- Commit and push your changes.
- Open a pull request for review.
The papers in this repository will be organized into categories in a future update.