introduce figcaption

Yan Lin 2026-02-06 09:11:05 +01:00
parent 05dea86964
commit d8ea74211f
14 changed files with 81 additions and 63 deletions


@@ -15,7 +15,7 @@ Most diffusion models work by coupling a forward diffusion process and a reverse
![](diffusion-process.webp)
> The two processes in a typical diffusion model. *Source: Ho, Jain, and Abbeel, "Denoising Diffusion Probabilistic Models."*
{% cap() %}The two processes in a typical diffusion model. *Source: Ho, Jain, and Abbeel, "Denoising Diffusion Probabilistic Models."*{% end %}
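To make the forward process concrete, here is a minimal sketch of DDPM-style closed-form noising; the linear beta schedule, `T = 1000`, and all helper names are illustrative assumptions, not anything specified in the post.

```python
import numpy as np

# Assumed DDPM-style linear beta schedule with T = 1000 steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

def forward_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))              # toy "clean" data
xt, eps = forward_sample(x0, t=500, rng=rng)  # halfway to pure noise
```

The reverse process then trains a network to predict `eps` from `xt` and `t`, undoing this corruption step by step.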
### Understanding DMs
@@ -23,7 +23,7 @@ There are many ways to understand how Diffusion Models (DMs) work. One of the mo
![](ode-sde-flow.webp)
> Illustrated ODE and SDE flow of a diffusion model on 1-dimensional data. *Source: Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations."*
{% cap() %}Illustrated ODE and SDE flow of a diffusion model on 1-dimensional data. *Source: Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations."*{% end %}
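As a rough sketch of the two views, the same learned score field can be integrated deterministically (probability-flow ODE) or stochastically (reverse-time SDE). The variance-preserving schedule and the `score(x, t)` callable below are assumptions for illustration.

```python
import numpy as np

def beta(t):
    # Assumed variance-preserving noise schedule on t in [0, 1].
    return 0.1 + 19.9 * t

def ode_step(x, t, dt, score):
    """Euler step of the probability-flow ODE: deterministic trajectories."""
    drift = -0.5 * beta(t) * (x + score(x, t))
    return x + drift * dt  # dt < 0 when integrating from t = 1 toward 0

def sde_step(x, t, dt, score, rng):
    """Euler-Maruyama step of the reverse-time SDE: same marginals, noisy paths."""
    drift = -0.5 * beta(t) * x - beta(t) * score(x, t)
    diffusion = np.sqrt(beta(t) * abs(dt)) * rng.standard_normal(x.shape)
    return x + drift * dt + diffusion
```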
### DMs Scale Poorly with Few Steps
@@ -37,13 +37,13 @@ Nevertheless, it is observed that their performance typically suffers catastroph
![](few-steps-results.webp)
> Images generated by conventional DMs with only a few reverse process steps. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*
{% cap() %}Images generated by conventional DMs with only a few reverse process steps. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*{% end %}
To understand why DMs scale poorly with few reverse process steps, we can return to the vector field perspective of DMs. When the target data distribution is complex, the vector field typically contains numerous intersections. When a given $x_t$ at time $t$ sits at one of these intersections, the vector points in the averaged direction of all candidate trajectories. This causes the generated data to approach the mean of the training data when only a few reverse process steps are used. Another explanation is that the learned vector field is highly curved: using only a few reverse process steps amounts to approximating these curves with polylines, which is inherently inaccurate.
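The polyline intuition can be checked on a toy vector field whose exact flow is known; the rotation field below stands in for a curved learned field, with step counts chosen purely for illustration.

```python
import numpy as np

def v(x):
    # Toy curved vector field: pure rotation, whose exact flow is a circular arc.
    return np.array([-x[1], x[0]])

def euler_flow(x0, n_steps, total_time=np.pi / 2):
    """Approximate the curved flow with a polyline of n_steps Euler segments."""
    x, dt = np.array(x0, dtype=float), total_time / n_steps
    for _ in range(n_steps):
        x = x + dt * v(x)
    return x

exact = np.array([0.0, 1.0])  # rotating (1, 0) by 90 degrees
for n in (2, 8, 100):
    err = np.linalg.norm(euler_flow([1.0, 0.0], n) - exact)
    print(f"{n:>3} steps -> endpoint error {err:.3f}")
```

With two segments the endpoint misses badly; only with many segments does the polyline track the curve, which mirrors why few-step sampling degrades.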
![](dm-scale-poorly.webp)
> Illustration of why DMs scale poorly with few reverse process steps. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*
{% cap() %}Illustration of why DMs scale poorly with few reverse process steps. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*{% end %}
We will introduce two branches of methods that aim to scale DMs down to a few or even a single reverse process step: **distillation-based**, which distills a pre-trained DM into a one-step model; and **end-to-end-based**, which trains a one-step DM from scratch.
@@ -73,7 +73,7 @@ This procedure produces increasingly straight flows that can be simulated with v
![](reflow-iterations.webp)
> Illustrations of vector fields after different numbers of reflow iterations. *Source: Liu, Gong, and Liu, "Flow Straight and Fast."*
{% cap() %}Illustrations of vector fields after different numbers of reflow iterations. *Source: Liu, Gong, and Liu, "Flow Straight and Fast."*{% end %}
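A pseudocode-level sketch of one reflow round under the rectified-flow convention ($x_0$ data, $x_1$ noise, straight-line interpolants); `v_theta`, `simulate`, and the step counts are illustrative assumptions rather than the paper's exact recipe.

```python
import torch

def flow_matching_loss(v_theta, x0, x1):
    """Regress the velocity field on straight-line interpolants:
    x_t = (1 - t) * x0 + t * x1, whose constant velocity is x1 - x0."""
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    return ((v_theta(xt, t) - (x1 - x0)) ** 2).mean()

@torch.no_grad()
def simulate(v_theta, x1, n_steps=100):
    """Integrate the learned ODE from noise (t = 1) back to data (t = 0)."""
    x, dt = x1.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0], 1), 1.0 - i * dt)
        x = x - dt * v_theta(x, t)
    return x

def reflow_pairs(v_theta, x1):
    """Rewire couplings: pair each noise sample with its own ODE endpoint;
    retraining flow_matching_loss on these pairs straightens the field."""
    return simulate(v_theta, x1), x1
```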
In practice, distillation-based methods are usually trained in two stages: first train a normal DM, and later distill one-step capabilities into it. This introduces additional computational overhead and complexity.
@@ -93,7 +93,7 @@ In theory, without altering the fundamental formulation of DMs, the learnable de
![](consistency-model.webp)
> A consistency model that learns to map any point on the ODE trajectory to the clean sample. *Source: Song et al., "Consistency Models."*
{% cap() %}A consistency model that learns to map any point on the ODE trajectory to the clean sample. *Source: Song et al., "Consistency Models."*{% end %}
Formally, CMs learn a function $f_\theta(x_t,t)$ that maps noisy data $x_t$ at time $t$ directly to the clean data $x_0$, satisfying:
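$$f_\theta(x_t, t) = f_\theta(x_{t'}, t') \quad \text{for all } t, t' \in [\epsilon, T]$$

whenever $x_t$ and $x_{t'}$ lie on the same ODE trajectory, together with the boundary condition $f_\theta(x_\epsilon, \epsilon) = x_\epsilon$ (this is the self-consistency property as defined in Song et al., "Consistency Models").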
@@ -135,6 +135,6 @@ Based on this insight, on top of $x_t$ and $t$, shortcut models additionally inc
![](shortcut-training.webp)
> Illustration of the training process of shortcut models. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*
{% cap() %}Illustration of the training process of shortcut models. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*{% end %}
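A hedged sketch of this training objective, using the convention that $t = 0$ is noise and $t = 1$ is data; `s_theta` takes the step size $d$ as an extra input, and details of the full recipe in Frans et al. (such as how $d$ is sampled) are omitted here.

```python
import torch

def shortcut_loss(s_theta, x_data, x_noise, d):
    """One shortcut-model training term: flow matching at step size 0 plus
    self-consistency tying one jump of size 2d to two jumps of size d."""
    t = torch.rand(x_data.shape[0], 1)
    xt = (1 - t) * x_noise + t * x_data  # convention: t = 0 noise, t = 1 data

    # Flow-matching term: at d = 0 the model matches the instantaneous velocity.
    fm = ((s_theta(xt, t, torch.zeros_like(t)) - (x_data - x_noise)) ** 2).mean()

    # Self-consistency term; the two-small-jumps target is stop-gradiented.
    d_ = torch.full_like(t, d)
    with torch.no_grad():
        v1 = s_theta(xt, t, d_)
        v2 = s_theta(xt + d * v1, t + d, d_)
        target = (v1 + v2) / 2
    sc = ((s_theta(xt, t, 2 * d_) - target) ** 2).mean()
    return fm + sc
```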
Both consistency models and shortcut models can be seamlessly scaled between one-step and multi-step generation to balance quality and efficiency.
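For instance, a single sampler can cover the whole spectrum just by choosing the jump size; a minimal sketch with the same assumed `s_theta` as above:

```python
import torch

@torch.no_grad()
def sample(s_theta, shape, n_steps):
    """Generate with any budget: n_steps = 1 is one-step generation,
    larger n_steps trades compute for quality."""
    x = torch.randn(shape)  # start from pure noise at t = 0
    d = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((shape[0], 1), i * d)
        x = x + d * s_theta(x, t, torch.full_like(t, d))
    return x

# one_shot = sample(s_theta, (16, 2), n_steps=1)
# refined  = sample(s_theta, (16, 2), n_steps=8)
```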