Diffusion Models and Normalizing Flows: A Unified View

1. Two paradigms, one transport perspective

  • Normalizing flows: learn invertible map T_θ such that x = T_θ(z), z ~ p₀, with exact likelihood via change of variables.
  • Diffusion models: define forward noising process and learn reverse-time denoising dynamics.

Both can be seen as transporting a simple base distribution to data.

2. Normalizing flow objective

For invertible T_θ,

log p_θ(x) = log p₀(T_θ^{-1}(x)) + log|det J_{T_θ^{-1}}(x)|

Training maximizes exact log-likelihood.
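To make the change-of-variables formula concrete, here is a minimal sketch for a 1-D affine flow x = T(z) = a·z + b with base p₀ = N(0, 1); the names a, b, log_p_flow are illustrative, not from any library. The log-determinant of the inverse map's Jacobian is just -log|a|, and the result matches the closed-form density N(b, a²).

```python
import numpy as np

# Illustrative affine flow: x = T(z) = a*z + b, base p0 = N(0, 1)
a, b = 2.0, 1.0

def log_p0(z):
    # log density of the standard normal base distribution
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_p_flow(x):
    z = (x - b) / a               # T^{-1}(x)
    log_det = -np.log(abs(a))     # log|det J_{T^{-1}}(x)| for the affine map
    return log_p0(z) + log_det

# Cross-check against the closed-form density N(b, a^2)
x = 0.3
ref = -0.5 * ((x - b) / a) ** 2 - np.log(a) - 0.5 * np.log(2 * np.pi)
assert np.isclose(log_p_flow(x), ref)
```

The same structure scales to deep flows: each layer contributes its own log-determinant term, and the terms sum along the composition.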

For continuous normalizing flows (CNFs), with ODE

dx_t/dt = v_θ(x_t, t)

the log-density evolves as

(d/dt) log p_t(x_t) = -∇·v_θ(x_t, t)
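A quick numerical sanity check of the instantaneous change of variables, using a linear velocity field v(x, t) = λx (λ is an illustrative constant, not from the text) whose divergence in 1-D is just λ. Integrating the ODE and the log-density jointly with Euler steps should land close to the exact pushforward marginal N(0, e^{2λT}).

```python
import numpy as np

# Linear CNF velocity v(x, t) = lam * x; in 1-D, div v = lam everywhere
lam, T, steps = 0.5, 1.0, 1000
dt = T / steps

z = 0.7                                    # a sample from base p0 = N(0, 1)
x = z
logp = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
for _ in range(steps):
    x += lam * x * dt                      # Euler step of dx/dt = v(x, t)
    logp += -lam * dt                      # d/dt log p_t = -div v = -lam

# Analytic marginal: x_T = e^{lam*T} z, so p_T = N(0, e^{2*lam*T})
sigma2 = np.exp(2 * lam * T)
ref = -0.5 * x**2 / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
```

In higher dimensions the divergence is no longer a constant, which is why practical CNFs estimate it with Hutchinson-style trace estimators.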

3. Diffusion ELBO and score matching

Forward SDE:

dx_t = f(t)x_t dt + g(t)dW_t

Reverse-time SDE:

dx_t = [f(t)x_t - g(t)²∇_x log p_t(x_t)]dt + g(t)dW̄_t

A score model s_θ(x, t) ≈ ∇_x log p_t(x) is trained with the denoising score matching (DSM) objective

L_DSM(θ) = E_{t,x₀,ε}[w(t)||s_θ(x_t, t) - ∇_{x_t} log p(x_t|x₀)||²]

where x_t is sampled from the Gaussian transition kernel p(x_t|x₀) and w(t) > 0 is a weighting function.
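The DSM target is tractable because the transition kernel is Gaussian: if x_t = α·x₀ + σ·ε with ε ~ N(0, I), then ∇_{x_t} log p(x_t|x₀) = -ε/σ. A minimal sketch at a single fixed noise level (dsm_loss, α, σ are illustrative names, assuming 1-D data x₀ ~ N(0, 1)):

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, x0, alpha, sigma, w=1.0):
    # Perturb data through the Gaussian kernel p(x_t|x0) = N(alpha*x0, sigma^2)
    eps = rng.standard_normal(x0.shape)
    x_t = alpha * x0 + sigma * eps
    target = -eps / sigma          # tractable conditional score grad log p(x_t|x0)
    return w * np.mean((score_fn(x_t) - target) ** 2)

# For x0 ~ N(0,1) and this kernel, the marginal is p_t = N(0, alpha^2 + sigma^2),
# so the true marginal score is s(x) = -x / (alpha^2 + sigma^2).
alpha, sigma = 0.8, 0.6
true_score = lambda x: -x / (alpha**2 + sigma**2)
x0 = rng.standard_normal(100_000)
loss = dsm_loss(true_score, x0, alpha, sigma)
```

Note the loss is not zero even at the optimum: DSM regresses a noisy target whose conditional mean is the marginal score, so the minimizer is s_θ = ∇ log p_t up to an irreducible variance floor.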

4. Theorem: probability flow ODE equivalence

For the forward SDE above, define the probability flow ODE

dx_t/dt = f(t)x_t - (1/2)g(t)²∇_x log p_t(x_t)

Theorem 1. The ODE has the same marginal densities p_t as the SDE.

Proof. The SDE marginals satisfy a Fokker-Planck PDE:

∂_t p_t = -∇·(fx p_t) + (1/2)g²Δ p_t

For the deterministic ODE with velocity

v_t(x) = f(t)x - (1/2)g(t)²∇log p_t(x)

the continuity equation gives

∂_t p_t = -∇·(v_t p_t) = -∇·(fx p_t) + (1/2)g²∇·(p_t∇log p_t)

Since p_t∇log p_t = ∇p_t, the last term equals (1/2)g²Δ p_t. Thus the two PDEs match, and the marginals coincide. □
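The theorem can be checked numerically on an Ornstein-Uhlenbeck example where the score is known in closed form. Take dx = -0.5x dt + dW (f = -0.5, g = 1) started from N(2, 0.25); the marginals stay Gaussian with mean m_t = 2e^{-t/2} and variance v_t = 0.25e^{-t} + (1 - e^{-t}). All names below are illustrative; the sketch integrates an SDE ensemble and an ODE ensemble and compares moments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, steps = 100_000, 1.0, 400
dt = T / steps

m = lambda t: 2 * np.exp(-t / 2)                  # analytic marginal mean
v = lambda t: 0.25 * np.exp(-t) + (1 - np.exp(-t))  # analytic marginal variance

x_sde = 2 + 0.5 * rng.standard_normal(n)
x_ode = 2 + 0.5 * rng.standard_normal(n)
for k in range(steps):
    t = k * dt
    score = -(x_ode - m(t)) / v(t)                # exact Gaussian score
    x_ode += (-0.5 * x_ode - 0.5 * score) * dt    # probability flow ODE step
    x_sde += -0.5 * x_sde * dt + np.sqrt(dt) * rng.standard_normal(n)

# Both ensembles should share the analytic marginal N(m_T, v_T)
print(x_sde.mean(), x_ode.mean(), m(T))   # all ≈ 1.21
print(x_sde.var(), x_ode.var(), v(T))     # all ≈ 0.72
```

The ODE trajectories are deterministic given their initial draw, yet the ensemble's marginals track the stochastic process — exactly the content of Theorem 1.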

5. Relation to flows

Probability-flow ODE induces a continuous flow with instantaneous change-of-variables

(d/dt) log p_t(x_t) = -∇·v_t(x_t)

which is exactly CNF machinery, but with vector field tied to score dynamics. Therefore diffusion can be interpreted as learning a flow field through score estimation rather than direct Jacobian-parameterized transport.

6. Likelihoods, speed, and inductive bias

  • Flows: exact likelihood, invertibility constraints, sometimes less expressive per FLOP.
  • Diffusion: flexible, strong sample quality, expensive sampling unless accelerated.
  • Hybrid methods: flow matching, rectified flows, consistency distillation reduce sampling steps while preserving transport interpretation.
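Of the hybrid methods above, flow matching is the simplest to sketch. With the linear (rectified-flow) path x_t = (1-t)z + t·x₁, the conditional target velocity is just x₁ - z, and the conditional flow matching loss regresses a velocity field onto it. The sketch below uses illustrative names (cfm_loss, v_candidate) and a toy 1-D dataset; it shows the loss computation, not a full training loop.

```python
import numpy as np

rng = np.random.default_rng(2)

def cfm_loss(v_fn, x1):
    # Linear interpolation path between base noise z and data x1
    z = rng.standard_normal(x1.shape)
    t = rng.uniform(size=x1.shape)
    x_t = (1 - t) * z + t * x1
    target = x1 - z                 # conditional velocity along the linear path
    return np.mean((v_fn(x_t, t) - target) ** 2)

# Toy data x1 ~ N(3, 1); a hypothetical constant candidate field for illustration
x1 = 3 + rng.standard_normal(50_000)
v_candidate = lambda x, t: 3 * np.ones_like(x)
loss = cfm_loss(v_candidate, x1)
```

Nothing here requires simulating an SDE or estimating a score: the regression target is available in closed form per sample, which is the practical appeal of the flow-matching family.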

7. Convergence and discretization error

If reverse dynamics are integrated with step size h, weak error of Euler-Maruyama is O(h) under regularity assumptions; higher-order solvers can improve to O(h^p) but require smoother score fields and accurate Jacobian-vector products.
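The O(h) weak-error rate is easy to observe on an OU process dx = -x dt + dW, where Euler-Maruyama gives the exact mean recursion E[x_{k+1}] = (1 - h)E[x_k], so E[x_T] = x₀(1-h)^{T/h} versus the true x₀e^{-T} — no sampling needed. Halving h should roughly halve the error:

```python
import numpy as np

x0, T = 1.0, 1.0
exact = x0 * np.exp(-T)   # true mean of the OU process at time T

def em_mean_error(h):
    # Euler-Maruyama mean after T/h steps versus the exact mean
    return abs(x0 * (1 - h) ** round(T / h) - exact)

e1, e2 = em_mean_error(0.01), em_mean_error(0.005)
print(e2 / e1)   # ≈ 0.5, consistent with first-order weak accuracy
```

In real diffusion samplers the score is learned rather than exact, so discretization error compounds with score-approximation error; higher-order solvers only pay off when the learned field is smooth enough for their Taylor expansions to hold.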

Key connections:

  • Diffusion models can be viewed as learning continuous normalizing flows through score matching
  • Probability flow ODEs provide deterministic sampling paths with same marginals as stochastic SDEs
  • Hybrid methods combine the strengths of both paradigms for efficient high-quality generation