compress images into webp
This commit is contained in:
parent
50459f199d
commit
ee7245f82f
70 changed files with 67 additions and 67 deletions
@@ -13,7 +13,7 @@ Diffusion models (DMs), or more broadly speaking, score-matching generative mode
Most diffusion models work by coupling a forward diffusion process and a reverse denoising diffusion process. The forward diffusion process gradually adds noise to the ground-truth clean data $X_0$ until it reaches noisy data $X_T$ that follows a relatively simple distribution. The reverse denoising process starts from the noisy data $X_T$ and removes the noise component step by step until clean generated data $X_0$ is recovered. The reverse process is essentially a sequential Markov chain: each step depends on the output of the previous one, so it cannot be parallelized within a single generation, which becomes inefficient when the number of steps is large.


> The two processes in a typical diffusion model. *Source: Ho, Jain, and Abbeel, "Denoising Diffusion Probabilistic Models."*
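To make the two processes concrete, here is a minimal NumPy sketch of DDPM-style sampling. It assumes a standard linear $\beta_t$ schedule with illustrative values, and `eps_model(x, t)` is a hypothetical placeholder for a trained noise-prediction network; this is an illustration of the mechanics, not the exact implementation of any particular paper.

```python
import numpy as np

# Linear noise schedule and its cumulative products, as in DDPM (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_noise(x0, t, rng):
    """Forward process: sample x_t ~ q(x_t | x_0) in closed form at an arbitrary step t."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def reverse_sample(eps_model, shape, rng):
    """Reverse process: start from pure noise x_T and denoise one step at a time.
    eps_model(x, t) stands in for the trained noise-prediction network."""
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):                 # strictly sequential steps
        eps_hat = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise       # x_{t-1}
    return x                                       # approximately clean x_0
```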
@@ -21,7 +21,7 @@ Most diffusion models work by coupling a forward diffusion process and a reverse
There are many ways to understand how DMs work. One of the most common and intuitive views is that a DM learns an ordinary differential equation (ODE) or a stochastic differential equation (SDE) that transforms noise into data. Imagine a vector field between the noise $X_T$ and the clean data $X_0$. By training on a sufficiently large number of timesteps $t\in [0,T]$, a DM learns the vector (tangent) pointing toward the cleaner data $X_{t-\Delta t}$, given any specific timestep $t$ and the corresponding noisy data $X_t$. This idea is easy to illustrate in a simplified 1-dimensional data scenario.


> Illustrated ODE and SDE flow of a diffusion model on 1-dimensional data. *Source: Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations."*
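As a loose sketch of this ODE view, the snippet below integrates a hypothetical learned velocity field `velocity_model(x, t)` with plain Euler steps from noise back to data. Time is normalized here so that $t=1$ is pure noise and $t=0$ is clean data; both the callable and the convention are assumptions for illustration.

```python
import numpy as np

def sample_ode(velocity_model, n_steps, rng, dim=1):
    """Generate by integrating the learned ODE dx/dt = v(x, t) with plain Euler steps.
    Time is normalized so t = 1 is pure noise and t = 0 is clean data;
    velocity_model(x, t) stands in for the trained network."""
    x = rng.standard_normal(dim)            # start from noise x_T (t = 1)
    dt = 1.0 / n_steps
    t = 1.0
    for _ in range(n_steps):
        x = x - velocity_model(x, t) * dt   # one Euler step toward cleaner data
        t -= dt
    return x                                # approximation of clean data x_0
```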
@@ -35,13 +35,13 @@ Vanilla DDPM, which is essentially a discrete-timestep DM, can only perform the
Nevertheless, their performance is typically observed to degrade catastrophically when the number of reverse process steps is reduced to single digits.


> Images generated by conventional DMs with only a few reverse process steps. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*
To understand why DMs scale poorly with few reverse process steps, we can return to the vector field perspective. When the target data distribution is complex, the vector field typically contains numerous intersections. When a given $X_t$ at timestep $t$ lies at one of these intersections, the learned vector points in the averaged direction of all candidate targets. This causes the generated data to approach the mean of the training data when only a few reverse process steps are used. Another explanation is that the learned vector field is highly curved: using only a few reverse process steps amounts to approximating these curves with polylines, which is inherently inaccurate.


> Illustration of why DMs scale poorly with few reverse process steps. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*
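The averaging effect can be checked numerically. In the toy example below, the training data has two modes at $-1$ and $+1$, and the ideal (MMSE) denoiser evaluated at the ambiguous point $x_t = 0$ returns the mean of the two modes rather than either mode, which is exactly the collapse described above. The noise level is an arbitrary illustrative value.

```python
import numpy as np

# Toy 1-D training set with two equally likely modes.
x0_candidates = np.array([-1.0, 1.0])
alpha_bar_t = 0.5                      # an arbitrary illustrative noise level

def ideal_denoiser(x_t):
    """MMSE-optimal prediction E[x_0 | x_t] under the Gaussian forward kernel
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, 1 - alpha_bar_t)."""
    log_lik = -(x_t - np.sqrt(alpha_bar_t) * x0_candidates) ** 2 / (2 * (1 - alpha_bar_t))
    weights = np.exp(log_lik)
    weights /= weights.sum()
    return float(np.sum(weights * x0_candidates))

print(ideal_denoiser(0.0))   # 0.0: the mean of the two modes, not a valid sample
print(ideal_denoiser(0.6))   # closer to +1, since x_t already leans toward that mode
```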
@@ -71,7 +71,7 @@ $$
This procedure produces increasingly straight flows that can be simulated with very few steps, ideally a single step after several reflow iterations.


> Illustration of the vector field after different numbers of reflow iterations. *Source: Liu, Gong, and Liu, "Flow Straight and Fast."*
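A rough sketch of one reflow iteration follows, under the same normalized-time convention as above and with a hypothetical `velocity_model(x, t)`: the current flow is simulated to couple each noise sample with the data point it generates, and those couplings are then reused to train the next, straighter flow.

```python
import numpy as np

def reflow_pairs(velocity_model, n_pairs, n_steps, rng, dim):
    """One reflow iteration, roughly: simulate the current flow so that every noise
    sample z is coupled with the data point x it maps to. The next flow is then
    trained on straight-line interpolations between these (z, x) pairs, which is
    what straightens the field. velocity_model(x, t) is a placeholder network."""
    pairs = []
    for _ in range(n_pairs):
        z = rng.standard_normal(dim)         # starting noise
        x, t, dt = z.copy(), 1.0, 1.0 / n_steps
        for _ in range(n_steps):             # integrate the current, possibly curved flow
            x = x - velocity_model(x, t) * dt
            t -= dt
        pairs.append((z, x))                 # deterministic (noise, sample) coupling
    return pairs

# Training the next flow (not shown): for random t in [0, 1], regress
# velocity_model(x_t, t) onto (z - x) along the straight path x_t = (1 - t) * x + t * z.
```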
@@ -91,7 +91,7 @@ x_t = \sqrt{\bar{\alpha}_t} \, x_0 + \sqrt{1 - \bar{\alpha}_t} \, \epsilon_t
In theory, without altering the fundamental formulation of DMs, the learnable denoiser network can be designed to predict any of these three components. Consistency models (CMs) follow this principle by training the denoiser to specifically predict the clean sample $x_0$. The benefit of this approach is that CMs can naturally scale to perform the reverse process with few steps or even a single step.


> A consistency model that learns to map any point on the ODE trajectory to the clean sample. *Source: Song et al., "Consistency Models."*
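The sketch below shows one way the consistency idea can be expressed as a loss, with placeholder callables `f_theta` (the trainable network) and `f_ema` (a slowly updated copy used as the target): predictions of the clean sample at two nearby noise levels of the same trajectory are regressed toward each other. The schedule and the trajectory pairing are simplified assumptions, not the exact recipe of the paper.

```python
import numpy as np

def consistency_loss(f_theta, f_ema, x0, t, t_next, noise_schedule, rng):
    """Rough consistency-training loss: the denoiser f(x_t, t) predicts the clean
    sample directly, and its predictions at two nearby noise levels built from the
    same noise are pulled together. f_theta is the trainable network; f_ema is a
    slowly updated copy whose output is treated as a fixed target."""
    a_t, s_t = noise_schedule(t)       # e.g. (sqrt(alpha_bar_t), sqrt(1 - alpha_bar_t))
    a_n, s_n = noise_schedule(t_next)
    eps = rng.standard_normal(x0.shape)
    x_t = a_t * x0 + s_t * eps         # two noisy views of the same clean sample
    x_next = a_n * x0 + s_n * eps
    pred = f_theta(x_t, t)             # both should map back to the same x_0
    target = f_ema(x_next, t_next)     # no gradient flows through the target
    return float(np.mean((pred - target) ** 2))
```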
@@ -133,7 +133,7 @@ Based on this insight, on top of $x_t$ and $t$, shortcut models additionally inc
\mathbf{s}_{\text{target}} = s_\theta(x_t, t, d)/2 + s_\theta(x'_{t+d}, t + d, d)/2 \quad \text{and} \quad x'_{t+d} = x_t + s_\theta(x_t, t, d)d
{% end %}


> Illustration of the training process of shortcut models. *Source: Frans et al., "One Step Diffusion via Shortcut Models."*
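Translating the target above into a rough NumPy sketch, with `s_theta(x, t, d)` as a placeholder for the step-size-conditioned network: two consecutive jumps of size $d$ are averaged to form the regression target for a single jump of size $2d$.

```python
import numpy as np

def shortcut_loss(s_theta, x_t, t, d):
    """Self-consistency regression from the target above: two consecutive jumps of
    size d should match one jump of size 2d. s_theta(x, t, d) is a placeholder for
    the network conditioned on the step size d."""
    s_first = s_theta(x_t, t, d)                 # first small jump
    x_mid = x_t + s_first * d                    # x'_{t+d}
    s_second = s_theta(x_mid, t + d, d)          # second small jump
    s_target = 0.5 * s_first + 0.5 * s_second    # averaged direction, held fixed as target
    pred = s_theta(x_t, t, 2 * d)                # model queried with the doubled step size
    return float(np.mean((pred - s_target) ** 2))
```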