Denoising Score Matching
Idea
Add Gaussian noise to the original distribution, then run score matching (SM) on the noised data.
Key trick
Rewrite the SM loss on the noised data so that it does not involve the divergence.
Notation
Let $\mathbb{D}$ denote the original data distribution on $\mathbb{R}^d$.
Let $P_\sigma$ denote the noised distribution:
$$P_\sigma = \mathrm{Law}(\tilde{X}), \quad \tilde{X} = X+\sigma w$$
$$X \sim \mathbb{D}, \quad w \sim N(0, I_d) \text{ independent of } X, \quad \sigma \gt 0$$
- $X$ is the original data
- Nice property: for any $\sigma \gt 0$, $P_\sigma$ has a density on $\mathbb{R}^d$, no matter what $\mathbb{D}$ is. We write this density as $P_\sigma(\tilde{x})$.
Therefore, we can run SM on $P_\sigma$.
Population level:
$$L[\theta] = \underset{\tilde{X} \sim P_\sigma}{\mathbb{E}} \left[|S_\theta(\tilde{X})|^2+2 \, \nabla \cdot S_\theta(\tilde{X})\right]$$
Sample level:
$$\hat{L}[\theta] = \frac{1}{n} \sum_{i=1}^n \left[|S_\theta(\tilde{x}_i)|^2+2 \, \nabla \cdot S_\theta(\tilde{x}_i)\right]$$
- $x_1, \dots, x_n \sim \mathbb{D}$
- $w_1, \dots, w_n \sim N(0, I_d)$ (standard normal noise)
- $\tilde{x}_i = x_i+\sigma w_i$
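The sample-level objective can be sketched on a toy problem. The block below is only an illustration under stated assumptions: $d = 1$, $\mathbb{D} = N(0, 1)$ as a stand-in for real data, and a hypothetical linear model $S_\theta(x) = -\theta x$, whose divergence is simply $-\theta$. The names `make_noised_samples` and `sm_loss` are made up for this sketch.

```python
import random

def make_noised_samples(n, sigma, seed=0):
    """Draw x_i ~ N(0, 1) (toy stand-in for D) and w_i ~ N(0, 1); return (x, x_tilde)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_tildes = [x + sigma * w for x, w in zip(xs, ws)]
    return xs, x_tildes

def sm_loss(theta, x_tildes):
    """Sample-level SM loss for the hypothetical model S_theta(x) = -theta * x.

    In 1-D the divergence of S_theta is dS/dx = -theta.
    """
    n = len(x_tildes)
    return sum((theta * xt) ** 2 + 2.0 * (-theta) for xt in x_tildes) / n

sigma = 0.5
_, x_tildes = make_noised_samples(20000, sigma)

# The loss is quadratic in theta: theta^2 * mean(xt^2) - 2 * theta,
# so its minimizer is 1 / mean(xt^2).
theta_hat = 1.0 / (sum(xt * xt for xt in x_tildes) / len(x_tildes))
print(theta_hat, 1.0 / (1.0 + sigma ** 2))  # the two values should be close
```

For this toy choice $P_\sigma = N(0, 1+\sigma^2)$, whose score is $-\tilde{x}/(1+\sigma^2)$, so the fitted $\theta$ should land near $1/(1+\sigma^2)$.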
Gaussian structure
$$
\begin{aligned}
P_\sigma(\tilde{x}) &= \underset{X \sim \mathbb{D}}{\mathbb{E}}[q_\sigma(\tilde{x}|X)] \\
&=\int q_\sigma(\tilde{x}|x) \, p(x) \, dx
\end{aligned}
$$
- $q_\sigma(\tilde{x}|x)$ is the density of $N(x, \sigma^2I_d)$
- the integral form assumes $\mathbb{D}$ has a density $p(x)$
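The mixture representation can be checked numerically. This is a rough sketch assuming $d = 1$ and $\mathbb{D} = N(0, 1)$, in which case $P_\sigma$ is exactly $N(0, 1+\sigma^2)$; the helper names `normal_pdf` and `p_sigma_monte_carlo` are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (1-D)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def p_sigma_monte_carlo(x_tilde, sigma, n=50000, seed=0):
    """Estimate P_sigma(x_tilde) = E_{X ~ D}[q_sigma(x_tilde | X)] by averaging over draws of X."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # X ~ D, assumed N(0, 1) here
        total += normal_pdf(x_tilde, x, sigma ** 2)  # q_sigma(x_tilde | x)
    return total / n

sigma = 0.5
estimate = p_sigma_monte_carlo(0.7, sigma)
exact = normal_pdf(0.7, 0.0, 1.0 + sigma ** 2)  # N(0, 1) convolved with N(0, sigma^2)
print(estimate, exact)  # should agree up to Monte Carlo error
```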
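Below, $S_\sigma$ denotes the score model $S_\theta$ fitted at noise level $\sigma$. Expand the explicit score matching loss:
$$
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|S_\sigma(\tilde{X})-\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2 = \\
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|S_\sigma(\tilde{X})|^2+\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2-2\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}\langle S_\sigma(\tilde{X}), \nabla_{\tilde{x}} \log P_\sigma(\tilde{X})\rangle
$$
Expand the cross term (the last term) without using integration by parts (IBP):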
$$
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}\langle S_\sigma(\tilde{X}), \nabla_{\tilde{x}} \log P_\sigma(\tilde{X})\rangle \\
= \int \langle S_\sigma(\tilde{x}), \nabla_{\tilde{x}} \log P_\sigma(\tilde{x})\rangle \cdot P_\sigma(\tilde{x}) \, d \tilde{x}
$$
Small trick (log-derivative identity):
$$
\nabla_{\tilde{x}} \log P_\sigma(\tilde{x}) = \frac{\nabla_{\tilde{x}} P_\sigma(\tilde{x})}{P_\sigma(\tilde{x})} \\
P_\sigma(\tilde{x}) \cdot \nabla_{\tilde{x}} \log P_\sigma(\tilde{x}) = \nabla_{\tilde{x}} P_\sigma(\tilde{x})
$$
This lets us take the next step without IBP, whereas plain SM needed IBP at this point.
$$
\begin{aligned}
&\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}\langle S_\sigma(\tilde{X}), \nabla_{\tilde{x}} \log P_\sigma(\tilde{X})\rangle \\
&= \int \langle S_\sigma(\tilde{x}), \nabla_{\tilde{x}} P_\sigma(\tilde{x})\rangle \, d \tilde{x}\\
&=\int \langle S_\sigma(\tilde{x}), \nabla_{\tilde{x}} \underset{X \sim \mathbb{D}}{\mathbb{E}}[q_\sigma(\tilde{x}|X)]\rangle \, d \tilde{x} \\
&=\int \langle S_\sigma(\tilde{x}), \underset{X \sim \mathbb{D}}{\mathbb{E}}[\nabla_{\tilde{x}} q_\sigma(\tilde{x}|X)]\rangle \, d \tilde{x} \\
&= \cdots \quad \text{(continued below)}
\end{aligned}
$$
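Small trick (the gradient is linear, so it passes through sums; this is what moved $\nabla_{\tilde{x}}$ inside the expectation in the last step above):
$$
\nabla_{\tilde{x}}[h_1(\tilde{x})+h_2(\tilde{x})] = \nabla_{\tilde{x}}h_1(\tilde{x}) + \nabla_{\tilde{x}}h_2(\tilde{x}) \\
\nabla_{\tilde{x}}\left[\sum_{k=1}^H h_k(\tilde{x})\right]=\sum_{k=1}^H\left[\nabla_{\tilde{x}}h_k(\tilde{x})\right]
$$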
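$$
\begin{aligned}
& \text{continuing from above} \\
&= \underset{X \sim \mathbb{D}}{\mathbb{E}} \int \langle S_\sigma(\tilde{x}), \nabla_{\tilde{x}} q_\sigma(\tilde{x}|X)\rangle \, d \tilde{x}
\end{aligned}
$$
Small trick:
this step swaps the expectation over $X$ with the integral over $\tilde{x}$. By Fubini's theorem, the swap is valid whenever the integrand is absolutely integrable, i.e. $\underset{X \sim \mathbb{D}}{\mathbb{E}} \int |\langle S_\sigma(\tilde{x}), \nabla_{\tilde{x}} q_\sigma(\tilde{x}|X)\rangle| \, d \tilde{x} \lt \infty$. This is the regularity condition we need.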
$$
\begin{aligned}
& \text{continuing from above, using } \nabla_{\tilde{x}} q_\sigma = q_\sigma \cdot \nabla_{\tilde{x}} \log q_\sigma \\
&= \underset{X \sim \mathbb{D}}{\mathbb{E}} \int \langle S_\sigma(\tilde{x}), \nabla_{\tilde{x}} \log q_\sigma(\tilde{x}|X)\rangle \, q_\sigma(\tilde{x}|X) \, d \tilde{x} \\
&= \underset{X \sim \mathbb{D}}{\mathbb{E}} \; \underset{\tilde{X}|X}{\mathbb{E}}\langle S_\sigma(\tilde{X}), \nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)\rangle \\
&= \underset{(\tilde{X},X)}{\mathbb{E}}\langle S_\sigma(\tilde{X}), \nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)\rangle
\end{aligned}
$$
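The cross-term identity $\mathbb{E}\langle S, \nabla \log P_\sigma\rangle = \mathbb{E}\langle S, \nabla \log q_\sigma\rangle$ can be sanity-checked by Monte Carlo. The sketch below assumes $d = 1$ and $\mathbb{D} = N(0, 1)$, so that $\nabla \log P_\sigma(\tilde{x}) = -\tilde{x}/(1+\sigma^2)$ is known in closed form, and uses an arbitrary test function $S(\tilde{x}) = -\tanh(\tilde{x})$. The function name `cross_terms` is hypothetical.

```python
import math
import random

def cross_terms(sigma, n=200000, seed=0):
    """Monte Carlo estimates of both sides of the cross-term identity (1-D)."""
    rng = random.Random(seed)
    var_noised = 1.0 + sigma ** 2        # P_sigma = N(0, 1 + sigma^2) for N(0, 1) data
    lhs = rhs = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        x_tilde = x + sigma * rng.gauss(0.0, 1.0)
        s = -math.tanh(x_tilde)                      # arbitrary test function S(x_tilde)
        lhs += s * (-x_tilde / var_noised)           # <S, grad log P_sigma(x_tilde)>
        rhs += s * (-(x_tilde - x) / sigma ** 2)     # <S, grad log q_sigma(x_tilde | x)>
    return lhs / n, rhs / n

lhs, rhs = cross_terms(sigma=0.5)
print(lhs, rhs)  # equal in expectation; the right-hand side is the noisier estimate
```

Note that the right-hand side never evaluates $\nabla \log P_\sigma$, which is the whole point: it only needs the pair $(x, \tilde{x})$.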
Going back to the very beginning:
$$
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|S_\sigma(\tilde{X})-\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2 = \\
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|S_\sigma(\tilde{X})|^2+\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2-2\underset{(\tilde{X},X)}{\mathbb{E}}\langle S_\sigma(\tilde{X}), \nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)\rangle
$$
Another starting point (expand the squared distance to the conditional score):
$$
\underset{(\tilde{X},X)}{\mathbb{E}}|S_\sigma(\tilde{X}) - \nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)|^2 = \\
\underset{(\tilde{X},X)}{\mathbb{E}}\left[|S_\sigma(\tilde{X})|^2 + |\nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)|^2 -2\langle S_\sigma(\tilde{X}), \nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)\rangle\right]
$$
Combine these two equations (subtract the second from the first, so the $|S_\sigma|^2$ terms and the cross terms cancel):
$$
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|S_\sigma(\tilde{X})-\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2 = \\
\underset{(\tilde{X},X)}{\mathbb{E}}|S_\sigma(\tilde{X}) - \nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)|^2 - \underset{(\tilde{X},X)}{\mathbb{E}}|\nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)|^2 + \underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2
$$
The last two terms do not depend on the model $S_\sigma$ (that is, on $\theta$), so they are constants with respect to the optimization. Only the first term matters for training.
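That the two losses differ only by a model-independent constant can be checked numerically. The following is a sketch under assumptions: $d = 1$, $\mathbb{D} = N(0, 1)$, and a hypothetical model family $S(\tilde{x}) = -a\tilde{x}$. For this toy case the constant is $\mathbb{E}|\nabla \log q_\sigma|^2 - \mathbb{E}|\nabla \log P_\sigma|^2 = 1/\sigma^2 - 1/(1+\sigma^2)$.

```python
import random

def esm_and_dsm(a, sigma, n=200000, seed=0):
    """Monte Carlo estimates of the explicit and denoising losses for S(x) = -a * x (1-D)."""
    rng = random.Random(seed)
    var_noised = 1.0 + sigma ** 2
    esm = dsm = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        x_tilde = x + sigma * rng.gauss(0.0, 1.0)
        s = -a * x_tilde
        esm += (s - (-x_tilde / var_noised)) ** 2          # |S - grad log P_sigma|^2
        dsm += (s - (-(x_tilde - x) / sigma ** 2)) ** 2    # |S - grad log q_sigma|^2
    return esm / n, dsm / n

sigma = 0.5
gaps = []
for a in (0.2, 0.8, 1.5):
    esm, dsm = esm_and_dsm(a, sigma)
    gaps.append(dsm - esm)

predicted_gap = 1.0 / sigma ** 2 - 1.0 / (1.0 + sigma ** 2)
print(gaps, predicted_gap)  # every gap should be close to the same constant
```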
What is $\log q_\sigma(\tilde{X}|X)$?
Since $q_\sigma(\cdot|X)$ is the density of $N(X, \sigma^2 I_d)$:
$\log q_\sigma(\tilde{X}|X) = -\frac{1}{2\sigma^2}|\tilde{X}-X|^2+c(\sigma)$, where $c(\sigma) = -\frac{d}{2}\log(2\pi\sigma^2)$ does not depend on $\tilde{X}$.
Thus:
$\nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X) = -\frac{1}{\sigma^2}(\tilde{X}-X)$
Hence:
$$
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|S_\sigma(\tilde{X})-\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2 = \\
\underset{(\tilde{X},X)}{\mathbb{E}}\left|S_\sigma(\tilde{X}) + \frac{1}{\sigma^2}(\tilde{X}-X)\right|^2 + \text{const}
$$
The above is an instance of Stein's lemma (Gaussian integration by parts).\
Thus the sample-level denoising score matching loss is:
$$\hat{L}[\theta] = \frac{1}{n} \sum_{i=1}^n \left|S_\theta(\tilde{x}_i) +\frac{1}{\sigma^2}(\tilde{x}_i-x_i)\right|^2$$
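The denoising loss needs no divergence, only pairs $(x_i, \tilde{x}_i)$. Below is a minimal sketch, assuming $d = 1$, $\mathbb{D} = N(0, 1)$ and a hypothetical linear model $S_a(\tilde{x}) = -a\tilde{x}$, for which the loss is a one-parameter least-squares problem with a closed-form minimizer. The names `dsm_loss` and `fit_linear_score` are illustrative.

```python
import random

def dsm_loss(a, pairs, sigma):
    """Sample-level DSM loss for the hypothetical model S_a(x) = -a * x."""
    return sum((-a * xt + (xt - x) / sigma ** 2) ** 2 for x, xt in pairs) / len(pairs)

def fit_linear_score(pairs, sigma):
    """Closed-form minimizer of dsm_loss over a (set the derivative in a to zero)."""
    num = sum(xt * (xt - x) for x, xt in pairs)
    den = sigma ** 2 * sum(xt * xt for _, xt in pairs)
    return num / den

rng = random.Random(0)
sigma = 0.5
pairs = []
for _ in range(50000):
    x = rng.gauss(0.0, 1.0)                              # x_i ~ D, assumed N(0, 1)
    pairs.append((x, x + sigma * rng.gauss(0.0, 1.0)))   # (x_i, x_tilde_i)

a_hat = fit_linear_score(pairs, sigma)
print(a_hat, 1.0 / (1.0 + sigma ** 2))  # fitted slope vs. slope of the true noised score
```

With a neural network in place of the linear model, the same loss would be minimized by stochastic gradient descent instead of a closed form.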
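Tweedie’s formula:
Consider the generic least-squares regression problem (here $X$ and $Y$ are any two random variables, not the data above):
$$
\min_{f} \; \mathbb{E}|f(X)-Y|^2
$$
Optimal solution:
- $f(X) = \mathbb{E}[Y|X]$, the conditional expectation of $Y$ given $X$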
Given:
$$
\underset{\tilde{X} \sim P_\sigma}{\mathbb{E}}|S_\sigma(\tilde{X})-\nabla_{\tilde{x}} \log P_\sigma(\tilde{X})|^2 = \\
\underset{(\tilde{X},X)}{\mathbb{E}}|S_\sigma(\tilde{X}) - \nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)|^2 + \text{const}
$$
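The left-hand side is minimized by $S_\sigma(\tilde{X}) = \nabla_{\tilde{x}} \log P_\sigma(\tilde{X})$.
The right-hand side is a regression of the target $\nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)$ on $\tilde{X}$. Since $S_\sigma$ sees only $\tilde{X}$ and not $X$, it cannot match the target pointwise; its minimizer is the conditional expectation $S_\sigma(\tilde{X}) = \mathbb{E}[\nabla_{\tilde{x}} \log q_\sigma(\tilde{X}|X)|\tilde{X}]$,
which is $\mathbb{E}[-\frac{1}{\sigma^2}(\tilde{X}-X)|\tilde{X}]$,
equal to $-\frac{1}{\sigma^2}(\tilde{X}-\mathbb{E}[X|\tilde{X}])$.\
The two sides differ only by a constant, so they share the same minimizer. Thus, $\nabla_{\tilde{x}} \log P_\sigma(\tilde{X}) = \frac{1}{\sigma^2}(\mathbb{E}[X|\tilde{X}] - \tilde{X})$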
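Read as a denoiser, Tweedie's formula says $\mathbb{E}[X|\tilde{X}] = \tilde{X} + \sigma^2 \nabla_{\tilde{x}} \log P_\sigma(\tilde{X})$. The sketch below illustrates this under the assumptions $d = 1$ and $\mathbb{D} = N(0, 1)$, where the noised score $-\tilde{x}/(1+\sigma^2)$ is known exactly; `tweedie_denoise` is a hypothetical helper name.

```python
import random

def tweedie_denoise(x_tilde, score, sigma):
    """Posterior-mean estimate E[X | x_tilde] = x_tilde + sigma^2 * score(x_tilde)."""
    return x_tilde + sigma ** 2 * score(x_tilde)

sigma = 0.5
var_noised = 1.0 + sigma ** 2

def exact_score(xt):
    """Score of N(0, 1 + sigma^2); valid only for the assumed N(0, 1) data."""
    return -xt / var_noised

rng = random.Random(0)
n = 50000
mse_noisy = mse_denoised = 0.0
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    x_tilde = x + sigma * rng.gauss(0.0, 1.0)
    mse_noisy += (x_tilde - x) ** 2
    mse_denoised += (tweedie_denoise(x_tilde, exact_score, sigma) - x) ** 2
mse_noisy /= n
mse_denoised /= n
print(mse_noisy, mse_denoised)  # denoised error should be the smaller of the two
```

In this Gaussian toy case the errors should be near $\sigma^2$ and $\sigma^2/(1+\sigma^2)$ respectively. In practice a learned $S_\theta$ would replace `exact_score`.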
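Another point (noise-prediction form of the score):
$\mathbb{E}[-\frac{1}{\sigma^2}(\tilde{X}-X)|\tilde{X}] = \mathbb{E}[-\frac{1}{\sigma^2}(\sigma w)|\tilde{X}] = -\frac{1}{\sigma} \mathbb{E}[w|\tilde{X}]$,
since $\tilde{X}-X = \sigma w$. So learning the score is equivalent to predicting the noise $w$ from $\tilde{X}$.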
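The noise-prediction form can be tried on the same kind of toy problem. This sketch assumes $d = 1$, $\mathbb{D} = N(0, 1)$, and a hypothetical linear noise predictor $\hat{\epsilon}(\tilde{x}) = b\tilde{x}$ fitted by least squares; the score is then recovered as $-\hat{\epsilon}(\tilde{x})/\sigma$.

```python
import random

rng = random.Random(0)
sigma = 0.5
sum_xw = sum_xx = 0.0
for _ in range(50000):
    w = rng.gauss(0.0, 1.0)                      # noise to be predicted
    x_tilde = rng.gauss(0.0, 1.0) + sigma * w    # x ~ N(0, 1) (assumed), x_tilde = x + sigma * w
    sum_xw += x_tilde * w
    sum_xx += x_tilde * x_tilde

b_hat = sum_xw / sum_xx        # least-squares fit of w on x_tilde: eps_hat(x) = b_hat * x
score_slope = -b_hat / sigma   # implied score S(x) = -eps_hat(x) / sigma = score_slope * x
print(score_slope, -1.0 / (1.0 + sigma ** 2))  # should match the slope of the true noised score
```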