The ideal approach to generating images would be to define a probability distribution over all image content and then generate images through a random process based on weighted probabilities, as LLMs do over tokens. But this is a huge task, and at the moment it is not feasible directly.
Diffusion models propose a way to achieve image generation through a similar method: develop a neural network-based model that is able to gradually remove Gaussian noise from an image. Starting from pure noise sampled from a Gaussian distribution $\mathcal{N}(0, 1)$, the model converges to the image space after a finite number of denoising steps.
Noising Process
To train this type of model, the first step is to develop an algorithm that takes any image and converts it to white noise in a finite number of steps. This acts as the starting point for our model, which we will train to perform the reverse process.
Initial approach
One method that could be considered is to simply add white noise step by step:

$$
x_t = x_{t-1} + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, I)
$$
However, the variances add up at every step, $\mathrm{Var}(x_t) = \mathrm{Var}(x_0) + t$, so the signal is drowned out and the variance grows without bound instead of converging to 1. This phenomenon is called "variance explosion".
Noising process example on an image over 500 steps. To test whether each step's distribution was likely $\mathcal{N}(0, 1)$, the Kolmogorov–Smirnov test was applied.
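The variance explosion can be reproduced numerically. A minimal NumPy sketch on synthetic pixel values (the 500-step count follows the figure; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in "image": 10,000 pixel values normalized to [-1, 1].
x = rng.uniform(-1.0, 1.0, size=10_000)

# Naive noising: add unit Gaussian noise at every step.
# Variances add up, so Var(x_t) = Var(x_0) + t and the standard
# deviation grows like sqrt(t) instead of converging to 1.
for _ in range(500):
    x = x + rng.standard_normal(x.shape)

print(x.std())  # roughly sqrt(500), nowhere near 1
```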
Diffusion method
An alternative method is proposed in the original paper: scale the signal down at each step before adding noise, so that the variance stays bounded:

$$
x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, I)
$$

where $\beta_t \in (0, 1)$ is a variance schedule.
Diffusion process example on an image over 500 steps. To test whether each step's distribution was likely $\mathcal{N}(0, 1)$, the Kolmogorov–Smirnov test was applied.
It can be shown that this process converges to a standard normal distribution; in fact, $x_t$ conditioned on $x_0$ has a closed form. Defining $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$:

$$
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right)
$$

As $t \rightarrow T$, $\bar{\alpha}_t \rightarrow 0$, so $q(x_T \mid x_0)$ approaches $\mathcal{N}(0, I)$.
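As a sanity check, the scaled forward process can be simulated iteratively and compared against sampling from its closed form; both should land close to $\mathcal{N}(0, 1)$. A NumPy sketch, with an illustrative linear variance schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear variance schedule over T steps.
T = 500
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# A stand-in "image": 10,000 pixel values normalized to [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=10_000)

# Iterative forward process: x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps.
x = x0.copy()
for t in range(T):
    x = np.sqrt(1.0 - betas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)

# Closed form: q(x_T | x_0) = N(sqrt(alpha_bar_T) x0, (1 - alpha_bar_T) I).
x_closed = (np.sqrt(alpha_bars[-1]) * x0
            + np.sqrt(1.0 - alpha_bars[-1]) * rng.standard_normal(x0.shape))

# Both samples should now have mean ~0 and standard deviation ~1.
print(x.mean(), x.std())
print(x_closed.mean(), x_closed.std())
```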
We have thus defined a forward method that converges to a standard normal distribution.
Markov chain notation
The forward process or diffusion process is defined as a Markov chain:

$$
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)
$$
$\mathcal{N}(x; \mu, \sigma^2)$ refers to the probability density function of a normal distribution evaluated at $x$. To obtain a probability, it must be integrated over a region of the image space.
The reverse process is defined as the denoising process that converts a noise sample $x_T$ into a functional image. The whole process can also be defined as a Markov chain:

$$
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)
$$
$\theta$ refers to the parameters of the neural network. Any quantity carrying a $\theta$ subscript is computed by the network from its parameters.
By definition, $p_\theta$ and $q$ are inverses of each other. It can be shown that, for small enough $\beta_t$, the reverse of a Gaussian diffusion step is also Gaussian (reference to demonstration), so:

$$
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
$$
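A single reverse (ancestral sampling) step can be sketched as follows. Here `predict_noise` is a hypothetical placeholder for the trained network $\epsilon_\theta(x_t, t)$ (a real implementation would call the neural network), and fixing the step variance to $\beta_t$ is one common choice, not the only one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear variance schedule over T steps.
T = 500
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    # Placeholder for the trained network eps_theta(x_t, t).
    # Returns zeros only so the sampling loop is runnable.
    return np.zeros_like(x_t)

def reverse_step(x_t, t):
    """One ancestral sampling step from p_theta(x_{t-1} | x_t)."""
    eps = predict_noise(x_t, t)
    # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z  # fixed variance sigma_t^2 = beta_t

# Start from pure noise x_T ~ N(0, I) and denoise step by step.
x = rng.standard_normal(16)
for t in reversed(range(T)):
    x = reverse_step(x, t)
```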
Now that we have defined the stochastic process, it is time to define the loss function. Since the model defines a probability density over images, the higher the probability it assigns to real images, the better the model. The loss can therefore be defined as the negative log-likelihood:

$$
\mathcal{L} = \mathbb{E}\left[-\log p_\theta(x_0)\right]
$$

This likelihood is intractable, but it can be upper-bounded by a variational bound that decomposes into per-step terms $\mathcal{L}_T + \sum_{t>1} \mathcal{L}_{t-1} + \mathcal{L}_0$.
Since $x_T$ is sampled from a standard normal distribution, the network weights make no contribution to the computation of this term, and the $\beta_t$ are treated as hyperparameters; therefore, in terms of model optimization, $\mathcal{L}_T$ is constant with respect to $\theta$.
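This can also be checked numerically: the per-pixel KL divergence from $q(x_T \mid x_0)$ to the prior $\mathcal{N}(0, 1)$ involves no network parameters and is already tiny. A small sketch (the beta schedule is illustrative):

```python
import numpy as np

# Illustrative linear variance schedule over T steps.
T = 500
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def kl_to_standard_normal(mean, var):
    # KL( N(mean, var) || N(0, 1) ), in nats.
    return 0.5 * (var + mean**2 - 1.0 - np.log(var))

# Per pixel, q(x_T | x_0) = N(sqrt(alpha_bar_T) * x0, 1 - alpha_bar_T).
x0 = 1.0  # worst case: a pixel at the edge of [-1, 1]
kl = kl_to_standard_normal(np.sqrt(alpha_bars[-1]) * x0, 1.0 - alpha_bars[-1])
print(kl)  # a few thousandths of a nat: L_T is negligible and has no theta
```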
With the goal of avoiding overfitting the model, it is considered
Denoised image contribution
If we suppose that we are working with uint8 images, each pixel contains an integer between 0 and 255. We are going to work with these values normalized into the interval $[-1, 1]$. To treat a discrete pixel value as part of a continuous space, we map it to an interval:
$$
x \rightarrow [x - \frac{1}{255}, x + \frac{1}{255}]
$$
This transformation maps discrete pixel values onto bins of the continuous space. To cover the whole real line, the two edge values use half-open intervals: $(-\infty, x + \frac{1}{255}]$ for the lowest value and $[x - \frac{1}{255}, +\infty)$ for the highest.
So the resulting conditional probability is the product over the $D$ pixels (and channels) of the integral of the Gaussian density over each pixel's bin:

$$
p_\theta(x_0 \mid x_1) = \prod_{i=1}^{D} \int_{\delta_-(x_0^i)}^{\delta_+(x_0^i)} \mathcal{N}\!\left(x;\ \mu_\theta^i(x_1, 1),\ \sigma_1^2\right) dx
$$

where $\delta_\pm$ denote the bin edges defined above.
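Each per-pixel integral reduces to a difference of Gaussian CDFs. A minimal Python sketch using only the standard library (the pixel values, means, and standard deviation below are made up for illustration):

```python
import math

def normal_cdf(x, mu, sigma):
    # CDF of N(mu, sigma^2), via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pixel_log_prob(x, mu, sigma):
    """Log-probability of a discrete pixel x in [-1, 1] under N(mu, sigma^2),
    integrating the density over the bin [x - 1/255, x + 1/255];
    the two edge bins extend to -infinity / +infinity."""
    p_hi = 1.0 if x >= 1.0 else normal_cdf(x + 1.0 / 255.0, mu, sigma)
    p_lo = 0.0 if x <= -1.0 else normal_cdf(x - 1.0 / 255.0, mu, sigma)
    return math.log(p_hi - p_lo)

# Toy "image" of three pixels already scaled to [-1, 1], and the
# per-pixel means a model might predict (values are made up).
pixels = [-1.0, 0.0, 0.5]
means = [-0.9, 0.1, 0.4]

# Total log-likelihood is the sum over the D pixels/channels.
log_p = sum(pixel_log_prob(x, mu, 0.1) for x, mu in zip(pixels, means))
print(log_p)
```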