Summary Diffusion Model

$$
\begin{aligned}
& x\,(\text{data}) \overset{q(z_1|x)}{\rightharpoonup} z_1 \overset{q(z_2|z_1)}{\rightharpoonup} z_2 … z_{T-1} \overset{q(z_{T}|z_{T-1})}{\rightharpoonup} z_T \,(N(0,I))\\
& x\,(\text{data}) \underset{P_\theta(x|z_1)}{\leftharpoondown} z_1 \underset{P_\theta(z_1|z_2)}{\leftharpoondown} z_2 … z_{T-1} \underset{P_\theta(z_{T-1}|z_T)}{\leftharpoondown} z_T \,(N(0,I))
\end{aligned}
$$

We do not know how to compute: $q(z_{t-1}|z_t)$
We only know the version conditioned on the data: $q(z_{t-1}|z_t, x)$

We want to minimize the KL:
$$
KL(q(z_{t-1}|z_t,x)||P_\theta(z_{t-1}|z_t))
$$

We computed:
$$
q(z_{t-1}|z_t,x) = N(\mu_q(z_t,x,t), \sigma_q^2(t) I)
$$

We parameterized
$$
P_\theta(z_{t-1}|z_t) = N(\mu_\theta(z_t,t), \sigma_q^2(t) I)
$$

Plugging in $P_\theta$ (both Gaussians share the covariance $\sigma_q^2(t) I$, so only the means matter):
$$
KL(q(z_{t-1}|z_t,x)||P_\theta(z_{t-1}|z_t)) = \frac{1}{2 \sigma_q^2(t)}|\mu_q(z_t,x,t)-\mu_\theta(z_t,t)|^2
$$
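A quick numerical check of this step, as a minimal sketch: the general KL formula for diagonal Gaussians collapses to the scaled squared distance between the means when both variances are equal. The dimension, the variance value, and the helper name `gaussian_kl_diag` are arbitrary choices for illustration, not part of the notes.

```python
import numpy as np

def gaussian_kl_diag(mu_a, var_a, mu_b, var_b):
    """KL( N(mu_a, diag(var_a)) || N(mu_b, diag(var_b)) ) for diagonal covariances."""
    return 0.5 * np.sum(
        np.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0
    )

rng = np.random.default_rng(0)
d = 4
sigma2 = 0.3                      # shared variance sigma_q^2(t), arbitrary value
mu_q = rng.normal(size=d)         # stand-in for mu_q(z_t, x, t)
mu_theta = rng.normal(size=d)     # stand-in for mu_theta(z_t, t)

var = np.full(d, sigma2)
kl_general = gaussian_kl_diag(mu_q, var, mu_theta, var)
kl_shortcut = np.sum((mu_q - mu_theta) ** 2) / (2.0 * sigma2)

print(np.isclose(kl_general, kl_shortcut))
```

With equal variances the log-ratio term vanishes and the variance term cancels against the $-1$, which is why only the mean difference survives.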

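Add parameterization 1:

Estimate $x$ directly with a network $f_\theta(z_t, t)$.
Emulate the left part of the KL, i.e. reuse the form of $\mu_q$ with $x$ replaced by the estimate: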
$$
\mu_\theta(z_t,t) = \mu_q(z_t, f_\theta(z_t, t), t)
$$

Add parameterization 2 (builds on parameterization 1):

Estimate the noise added to $x$ to make $z_t$.
Again, emulate the left part of the KL:
$$
\begin{aligned}
q(z_t|x) &= N(\sqrt{\bar{\alpha}_t}x, (1-\bar{\alpha}_t)I) \\
z_t &\overset{d}{=} \sqrt{\bar{\alpha}_t}x + \sqrt{1-\bar{\alpha}_t}\,w, \quad w \sim N(0,I) \\
x &= \frac{1}{\sqrt{\bar{\alpha}_t}} \left[z_t-\sqrt{1-\bar{\alpha}_t}\,w\right]
\end{aligned}
$$
$w$ is not known, so we estimate it with a network $f_\theta(z_t, t)$.
Then, as before, plug this expression for $x$ into $\mu_q$:
$$
\begin{aligned}
\mu_q(z_t,x,t) &= \mu_q(z_t, \frac{1}{\sqrt{\bar{\alpha}_t}} [z_t-\sqrt{1-\bar{\alpha}_t}\,w] , t) \\
\mu_\theta(z_t,t) &= \mu_q(z_t, \frac{1}{\sqrt{\bar{\alpha}_t}} [z_t-\sqrt{1-\bar{\alpha}_t}\,f_\theta(z_t,t)] , t)
\end{aligned}
$$
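The forward noising step and its inversion can be sketched in a few lines. This assumes the usual definition $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$ (not spelled out in these notes) and a made-up linear schedule; `noise_to` and `x_from_noise` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10
alphas = np.linspace(0.99, 0.90, T)   # hypothetical schedule alpha_1..alpha_T
alpha_bar = np.cumprod(alphas)        # alpha_bar[t-1] = prod_{s<=t} alpha_s (assumed definition)

def noise_to(x, t, w):
    """Sample z_t ~ q(z_t | x) via z_t = sqrt(abar_t) x + sqrt(1 - abar_t) w."""
    ab = alpha_bar[t - 1]
    return np.sqrt(ab) * x + np.sqrt(1.0 - ab) * w

def x_from_noise(z_t, t, w_hat):
    """Invert the relation: x = (z_t - sqrt(1 - abar_t) w_hat) / sqrt(abar_t)."""
    ab = alpha_bar[t - 1]
    return (z_t - np.sqrt(1.0 - ab) * w_hat) / np.sqrt(ab)

x = rng.normal(size=3)
w = rng.normal(size=3)
t = 7
z_t = noise_to(x, t, w)
x_rec = x_from_noise(z_t, t, w)       # with the true noise, x is recovered exactly

print(np.allclose(x_rec, x))
```

In training the true $w$ is replaced by the network output $f_\theta(z_t, t)$, so the recovered $x$ is only as good as the noise estimate.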

Together

We want to optimize:
$$
KL(q(z_{t-1}|z_t,x)||P_\theta(z_{t-1}|z_t)) = \frac{1}{2 \sigma_q^2(t)}|\mu_q(z_t,x,t)-\mu_\theta(z_t,t)|^2
$$

$$
\underset{q(z_t|x)}{\mathbb{E}} \left[ KL(q(z_{t-1}|z_t,x)||P_\theta(z_{t-1}|z_t)) \right]= \frac{1-\alpha_t}{2 (1-\bar{\alpha}_{t-1}) \alpha_t} \underset{w \sim N(0, I)}{\mathbb{E}} |w-f_\theta(\sqrt{\bar{\alpha}_t}x+\sqrt{1-\bar{\alpha}_t}w,t)|^2
$$
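The scaling factor in front of the noise-prediction error can be checked numerically. The sketch below assumes the standard DDPM closed forms for the posterior mean and variance of $q(z_{t-1}|z_t,x)$, which these notes use but do not write out, plus a made-up schedule; `w_hat` is a random stand-in for $f_\theta(z_t,t)$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10
alphas = np.linspace(0.99, 0.90, T)     # hypothetical schedule
alpha_bar = np.cumprod(alphas)

t = 6                                   # any t >= 2
a_t = alphas[t - 1]
ab_t = alpha_bar[t - 1]
ab_prev = alpha_bar[t - 2]

def mu_q(z_t, x):
    """Standard DDPM posterior mean of q(z_{t-1} | z_t, x) (assumed, not derived here)."""
    return (np.sqrt(a_t) * (1 - ab_prev) * z_t
            + np.sqrt(ab_prev) * (1 - a_t) * x) / (1 - ab_t)

sigma2_q = (1 - a_t) * (1 - ab_prev) / (1 - ab_t)   # standard posterior variance (assumed)

x = rng.normal(size=5)
w = rng.normal(size=5)
w_hat = rng.normal(size=5)              # stand-in for f_theta(z_t, t)
z_t = np.sqrt(ab_t) * x + np.sqrt(1 - ab_t) * w
x_hat = (z_t - np.sqrt(1 - ab_t) * w_hat) / np.sqrt(ab_t)

# Left side: KL between the two Gaussians; right side: weighted noise error.
kl = np.sum((mu_q(z_t, x) - mu_q(z_t, x_hat)) ** 2) / (2 * sigma2_q)
weighted = (1 - a_t) / (2 * (1 - ab_prev) * a_t) * np.sum((w - w_hat) ** 2)

print(np.isclose(kl, weighted))
```

The identity holds pointwise for every $(x, w)$, so it also holds after taking the expectation over $w$.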

Remember that (writing $z_0 = x$):
$$
\begin{aligned}
\log P_\theta(x) &\geq \underset{q(z_{1:T}|x)}{\mathbb{E}} \log \frac{P_\theta(x, z_{1:T})}{q(z_{1:T}|x)} \\
&= \underset{q(z_1|x)}{\mathbb{E}} \log P_\theta(x|z_1)-\underset{q(z_{T-1}|x)}{\mathbb{E}} KL(q(z_T|z_{T-1})||P_\theta(z_T))-\sum_{t=1}^{T-1} \underset{q(z_{t-1}, z_t, z_{t+1}|x)}{\mathbb{E}} \left[ \log \frac{q(z_t|z_{t-1})}{P_\theta(z_t|z_{t+1})}\right] \\
&= \underset{q(z_1|x)}{\mathbb{E}} \log P_\theta(x|z_1)-KL(q(z_T|x)||P_\theta(z_T))-\sum_{t=2}^{T} \underset{q(z_{t-1}, z_t|x)}{\mathbb{E}} \left[ \log \frac{q(z_{t-1}|z_t,x)}{P_\theta(z_{t-1}|z_t)}\right] \\
&= \underset{q(z_1|x)}{\mathbb{E}} \log P_\theta(x|z_1)-KL(q(z_T|x)||P_\theta(z_T))-\sum_{t=2}^{T} \underset{q(z_t|x)}{\mathbb{E}} KL(q(z_{t-1}|z_t,x)||P_\theta(z_{t-1}|z_t)) \\
&= \underset{q(z_1|x)}{\mathbb{E}} \log P_\theta(x|z_1)-KL(q(z_T|x)||P_\theta(z_T))-\sum_{t=2}^{T} \frac{1-\alpha_t}{2 (1-\bar{\alpha}_{t-1}) \alpha_t} \underset{w \sim N(0, I)}{\mathbb{E}} |w-f_\theta(\sqrt{\bar{\alpha}_t}x+\sqrt{1-\bar{\alpha}_t}w,t)|^2
\end{aligned}
$$

Since $q(z_T|x) \approx N(0,I)$ and $P_\theta(z_T) = N(0,I)$, the second term is approximately zero and has no trainable parameters, so we can ignore it.
If $T$ is large enough, the first term, $\underset{q(z_1|x)}{\mathbb{E}} \log P_\theta(x|z_1)$, can also be ignored.

If we do not ignore the first term, we can model it:
$$P_\theta(x|z_1) = N(\mu_0(z_1, 1), \sigma_1^2I)$$
What do we pick for $\mu_0(z_1, 1)$?
We can just pick the same model as in parameterization 2, i.e. the mean of $P_\theta(z_{t-1}|z_t)$ from above evaluated at $t=1$.
This gives a term of the same form as the others.

Summing it all up
$$
\underset{q(z_{1:T}|x)}{\mathbb{E}} \log \frac{P_\theta(x, z_{1:T})}{q(z_{1:T}|x)} = -\sum_{t=1}^{T} \frac{1-\alpha_t}{2 \beta_t \alpha_t} \underset{w_t \sim N(0, I)}{\mathbb{E}} |w_t-f_\theta(\sqrt{\bar{\alpha}_t}x+\sqrt{1-\bar{\alpha}_t}w_t,t)|^2 + \text{const}
$$

Here $\beta_t = \sigma_1^2$ if $t=1$
and $\beta_t = 1-\bar{\alpha}_{t-1}$ if $t \geq 2$.
The equation above is for a single $x$, and it is a lower bound: it is $\leq \log P_\theta(x)$.

Now take the expectation with respect to the data distribution $P(\cdot)$:
$$
\underset{x\sim P(\cdot)}{\mathbb{E}} [\log P_\theta(x)] \geq -\sum_{t=1}^{T} \psi_t \underset{(x,w_t) \sim (P(x),N(0, I))}{\mathbb{E}} |w_t-f_\theta(\sqrt{\bar{\alpha}_t}x+\sqrt{1-\bar{\alpha}_t}w_t,t)|^2 + \text{const}
$$

$\psi_t$ is the scaling factor from before, i.e. the fraction $\frac{…}{…}$ in front of each expectation.

Consider the finite sample!!!

Start with $x_1,…,x_n \sim P(\cdot)$ and define the empirical loss to be minimized:
$$
\hat{L}_n(\theta) = \frac{1}{n} \sum_{i=1}^n \sum_{t=1}^{T} \psi_t \underset{w_t \sim N(0, I)}{\mathbb{E}} |w_t-f_\theta(\sqrt{\bar{\alpha}_t}x_i+\sqrt{1-\bar{\alpha}_t}w_t,t)|^2
$$
Draw $w_t^i \sim N(0, I)$ and replace each expectation with a single sample:
$$
\hat{L}_n(\theta) = \frac{1}{n} \sum_{i=1}^n \left[ \sum_{t=1}^{T} \psi_t |w_t^i-f_\theta(\sqrt{\bar{\alpha}_t}x_i+\sqrt{1-\bar{\alpha}_t}w_t^i,t)|^2 \right]
$$

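Reduce computation!!

$\rho$ is a distribution over $\{1,…,T\}$
with $\rho(t) \propto \psi_t$.
Draw $t_i \sim \rho(\cdot)$ and $w^i \sim N(0,I)$ for $i = 1,…,n$. In expectation this matches the sum over all $t$ up to the constant factor $\sum_t \psi_t$, but each example now needs only one evaluation of $f_\theta$: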
$$
\hat{L}_n(\theta) = \frac{1}{n} \sum_{i=1}^n |w^i-f_\theta(\sqrt{\bar{\alpha}_{t_i}}x_i+\sqrt{1-\bar{\alpha}_{t_i}}w^i,t_i)|^2
$$
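A minimal sketch of this sampled loss, under several assumptions: the schedule and the value of $\sigma_1^2$ are made up, $\bar{\alpha}_t$ is taken to be the cumulative product of the $\alpha_s$, and `f_theta` is a hypothetical placeholder that always predicts zero noise (a real model would be a trained network).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10
alphas = np.linspace(0.99, 0.90, T)           # hypothetical schedule
alpha_bar = np.cumprod(alphas)

# psi_t = (1 - alpha_t) / (2 beta_t alpha_t), with beta_1 = sigma_1^2 and
# beta_t = 1 - alpha_bar_{t-1} for t >= 2, as in the summed bound.
sigma1_sq = 0.01                              # hypothetical value for sigma_1^2
beta = np.concatenate(([sigma1_sq], 1.0 - alpha_bar[:-1]))
psi = (1.0 - alphas) / (2.0 * beta * alphas)
rho = psi / psi.sum()                         # rho(t) proportional to psi_t

def f_theta(z, t):
    """Hypothetical stand-in for the noise-prediction network (predicts zero noise)."""
    return np.zeros_like(z)

def sampled_loss(xs, rng):
    """One timestep per example: Monte Carlo estimate of the training loss."""
    n = xs.shape[0]
    ts = rng.choice(np.arange(1, T + 1), size=n, p=rho)   # t_i ~ rho
    ws = rng.normal(size=xs.shape)                        # w^i ~ N(0, I)
    ab = alpha_bar[ts - 1][:, None]                       # abar_{t_i}, one per row
    zs = np.sqrt(ab) * xs + np.sqrt(1.0 - ab) * ws
    preds = np.stack([f_theta(z, t) for z, t in zip(zs, ts)])
    return np.mean(np.sum((ws - preds) ** 2, axis=1))

xs = rng.normal(size=(8, 3))                  # toy data x_1..x_n
loss = sampled_loss(xs, rng)
print(loss)
```

Passing a uniform `p` to `rng.choice` instead of `rho` corresponds to treating all timesteps equally.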

Dirty secret

In practice, set $\psi_t = 1$, so $\rho$ is uniform over $\{1,…,T\}$. The professor said this works fine.

Real meaning

We just draw a sample from a normal distribution, pass it through the trained model, and get new data!!

Sampling

Start from $z_T \sim N(0,I)$ and iterate:

$$
z_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( z_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}} f_\theta (z_t, t) \right) + \sqrt{\frac{(1-\bar{\alpha}_{t-1})(1-\alpha_t)}{1-\bar{\alpha}_t}} \cdot w_t \quad, t \in \{T,…,1\}
$$

$w_t \sim N(0, I)$
$w_{t_1} \perp w_{t_2}$ for $t_1 \neq t_2$

Return $z_0$ as the sample!!
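The sampling loop can be sketched as follows. Assumptions: the step draws $z_{t-1}$ from $N(\mu_\theta(z_t,t), \sigma_q^2(t) I)$, so the noise is added with standard deviation $\sigma_q(t)$ after the $1/\sqrt{\alpha_t}$ scaling of the mean; $\bar{\alpha}_0 = 1$ by convention, which makes the last step noise-free; the schedule is made up; and `zero_net` is a hypothetical stand-in for the trained $f_\theta$.

```python
import numpy as np

def sample(f_theta, alphas, shape, rng):
    """Ancestral sampling sketch: start at z_T ~ N(0, I), step down to z_0."""
    T = len(alphas)
    alpha_bar = np.cumprod(alphas)
    z = rng.normal(size=shape)                        # z_T
    for t in range(T, 0, -1):
        a_t = alphas[t - 1]
        ab_t = alpha_bar[t - 1]
        ab_prev = alpha_bar[t - 2] if t > 1 else 1.0  # alpha_bar_0 = 1 (assumed convention)
        mean = (z - (1 - a_t) / np.sqrt(1 - ab_t) * f_theta(z, t)) / np.sqrt(a_t)
        sigma = np.sqrt((1 - ab_prev) * (1 - a_t) / (1 - ab_t))
        z = mean + sigma * rng.normal(size=shape)     # fresh, independent w_t each step
    return z                                          # z_0

rng = np.random.default_rng(0)
alphas = np.linspace(0.99, 0.90, 10)                  # hypothetical schedule

def zero_net(z, t):
    """Hypothetical stand-in for the trained model: always predicts zero noise."""
    return np.zeros_like(z)

z0 = sample(zero_net, alphas, shape=(4, 2), rng=rng)
print(z0.shape)
```

With a real trained network in place of `zero_net`, each row of `z0` would be one generated data point.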