English (unofficial) translations of posts at kexue.fm

Deriving the Continuity Equation and Fokker-Planck Equation via the Test Function Method

Translated by Gemini Flash 3.0 Preview. Translations can be inaccurate; please refer to the original post for anything important.

In the article "Generative Diffusion Model Chat (6): ODEs in the General Framework", we derived the Fokker-Planck equation for SDEs, while in "Generative Diffusion Model Chat (12): 'Hard-Core' Diffusion ODEs", we separately derived the continuity equation for ODEs. Both equations describe how the distribution of a random variable changes as it evolves along an SDE/ODE, with the continuity equation being a special case of the Fokker-Planck equation. When deriving the Fokker-Planck equation previously, we applied a Taylor expansion directly to the Dirac delta function; although the result was correct, the approach was somewhat unconventional. In deriving the continuity equation, we combined the Jacobian determinant with a Taylor expansion; while that method is more standard, it does not generalize easily to the Fokker-Planck equation.

In this article, we introduce the "test function method," one of the standard techniques for deriving the continuity equation and the Fokker-Planck equation. Its derivation is systematic, and it applies to a broad range of settings.

Integration by Parts

Before the formal derivation, we first introduce a key result that will be used later: the high-dimensional generalization of integration by parts.

General tutorials usually limit the introduction of integration by parts to the one-dimensional case, namely: \begin{equation} \int_a^b uv'dx = uv|_a^b - \int_a^b vu'dx \end{equation} where u, v are functions of x, and ' denotes the derivative with respect to x. We need a high-dimensional version of this. To that end, let us first review the derivation of one-dimensional integration by parts, which relies on the product rule for differentiation: \begin{equation} (uv)' = uv' + vu' \end{equation} Integrating both sides with respect to x and rearranging terms yields the integration by parts formula. For the high-dimensional case, we consider a similar formula: \begin{equation} \nabla\cdot(u\boldsymbol{v}) = \boldsymbol{v}\cdot\nabla u + u\nabla \cdot\boldsymbol{v} \end{equation} where u is a scalar function of \boldsymbol{x}, \boldsymbol{v} is a vector function of \boldsymbol{x} (of the same dimension as \boldsymbol{x}), and \nabla denotes the gradient with respect to \boldsymbol{x}. Now we integrate both sides over a region \Omega: \begin{equation} \int_{\Omega}\nabla\cdot(u\boldsymbol{v})d\boldsymbol{x} = \int_{\Omega}\boldsymbol{v}\cdot\nabla u d\boldsymbol{x} + \int_{\Omega}u\nabla \cdot\boldsymbol{v} d\boldsymbol{x} \end{equation} According to the Gauss Divergence Theorem, the left side equals \int_{\partial\Omega}u\boldsymbol{v}\cdot\hat{\boldsymbol{n}}dS, where \partial\Omega is the boundary of \Omega, \hat{\boldsymbol{n}} is the outward unit normal vector of the boundary, and dS is the area element. Therefore, after rearranging, we have: \begin{equation} \int_{\Omega}\boldsymbol{v}\cdot\nabla u d\boldsymbol{x} = \int_{\partial\Omega}u\boldsymbol{v}\cdot\hat{\boldsymbol{n}}dS - \int_{\Omega}u\nabla \cdot\boldsymbol{v} d\boldsymbol{x} \label{eq:int-by-parts} \end{equation} This is the high-dimensional integration by parts formula we seek. 
In particular, for a probability density function p, which is non-negative and integrates to 1, we assume p \to 0 and \nabla p \to \boldsymbol{0} at infinity. Thus, if \Omega is chosen as the entire space (where the integration region is not specifically noted, it defaults to the entire space), the boundary term \int_{\partial\Omega}u\boldsymbol{v}\cdot\hat{\boldsymbol{n}}dS vanishes. Substituting u=p into equation [eq:int-by-parts] yields the first identity below, and substituting \boldsymbol{v}=\nabla p yields the second: \begin{align} \int\boldsymbol{v}\cdot\nabla p d\boldsymbol{x} =&\, - \int p\nabla \cdot\boldsymbol{v} d\boldsymbol{x} \label{eq:int-by-parts-p} \\ \int u\nabla \cdot\nabla p d\boldsymbol{x} = &\,-\int\nabla p\cdot\nabla u d\boldsymbol{x} \label{eq:int-by-parts-gp} \end{align} To make these conclusions fully rigorous, one could assume that p has compact support. However, this is purely for mathematical rigor; for a general understanding, it suffices to assume p \to 0 and \nabla p \to \boldsymbol{0} at infinity.
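Equation [eq:int-by-parts-p] is easy to sanity-check numerically. The sketch below is our illustration (not from the original post): p is taken to be a standard two-dimensional Gaussian, \boldsymbol{v} an arbitrary smooth vector field, and both sides are approximated by sums on a grid.

```python
import numpy as np

# Numerical check of  ∫ v·∇p dx = -∫ p ∇·v dx  (eq:int-by-parts-p)
# for a density p vanishing at infinity.  Both p and v here are
# arbitrary choices made for illustration.
n = 401
xs = np.linspace(-6.0, 6.0, n)
h = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")

p = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)     # standard 2-D Gaussian density
v1 = np.sin(X) * np.cos(Y)                       # components of a smooth field v
v2 = X * np.exp(-Y**2)

dp_dx, dp_dy = np.gradient(p, h, h)              # ∇p on the grid
div_v = np.gradient(v1, h, axis=0) + np.gradient(v2, h, axis=1)  # ∇·v

lhs = np.sum(v1 * dp_dx + v2 * dp_dy) * h**2     # ∫ v·∇p dx
rhs = -np.sum(p * div_v) * h**2                  # -∫ p ∇·v dx
print(lhs, rhs)                                  # the two sides agree closely
```

Because the central-difference operator is skew-adjoint under the plain grid sum (up to boundary terms, which the decay of p suppresses), the two numbers match far more tightly than the O(h^2) truncation error alone would suggest.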

ODE Evolution

The principle of the test function method is that if for any function \phi(\boldsymbol{x}), it holds that: \begin{equation} \int f(\boldsymbol{x})\phi(\boldsymbol{x})d\boldsymbol{x} = \int g(\boldsymbol{x})\phi(\boldsymbol{x})d\boldsymbol{x} \end{equation} then f(\boldsymbol{x})=g(\boldsymbol{x}) holds. Here, \phi(\boldsymbol{x}) is called the test function. A more rigorous definition would require declaring the space from which \phi(\boldsymbol{x}) is chosen and the specific meaning of the equality (such as strict equality, equality almost everywhere, or equality in probability); we will not delve into these details here.
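As a heuristic for why this principle holds (setting aside the regularity issues just mentioned): suppose the equality holds for every \phi, and that f-g is itself admissible as a test function. Choosing \phi = f-g then gives \begin{equation} \int \big(f(\boldsymbol{x})-g(\boldsymbol{x})\big)^2 d\boldsymbol{x} = 0 \end{equation} which forces f(\boldsymbol{x})=g(\boldsymbol{x}) almost everywhere. In rigorous treatments, \phi ranges over smooth compactly supported functions, and an approximation argument replaces this direct choice.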

For the ODE: \begin{equation} \frac{d\boldsymbol{x}_t}{dt}=\boldsymbol{f}_t(\boldsymbol{x}_t) \label{eq:ode} \end{equation} we discretize it as: \begin{equation} \boldsymbol{x}_{t+\Delta t} = \boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t)\Delta t \label{eq:ode-diff} \end{equation} Then we have: \begin{equation} \phi(\boldsymbol{x}_{t+\Delta t}) = \phi(\boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t)\Delta t)\approx \phi(\boldsymbol{x}_t) + \Delta t\,\,\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) \end{equation} Taking the expectation on both sides, we get: \begin{equation} \int p_{t+\Delta t}(\boldsymbol{x}_{t+\Delta t})\phi(\boldsymbol{x}_{t+\Delta t}) d\boldsymbol{x}_{t+\Delta t}\approx \int p_t(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \Delta t\int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t \end{equation} Since the value of an integral does not depend on the name of the integration variable, we may replace \boldsymbol{x}_{t+\Delta t} with \boldsymbol{x}_t on the left side: \begin{equation} \int p_{t+\Delta t}(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t\approx \int p_t(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \Delta t\int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t \label{eq:change-var} \end{equation} Moving the first term on the right to the left, dividing both sides by \Delta t, and taking the limit \Delta t\to 0, we obtain: \begin{equation} \int \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = \int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t \label{eq:dt-0} \end{equation} Integrating by parts on the right side via formula [eq:int-by-parts] with u=\phi and \boldsymbol{v}=p_t\boldsymbol{f}_t (the boundary term vanishes because p_t \to 0 at infinity), we get: \begin{equation} \int
\frac{\partial p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = -\int \Big[\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big)\Big]\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t \end{equation} According to the equality principle of the test function method, we have: \begin{equation} \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t} = -\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big) \end{equation} This is known as the "continuity equation."
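To make the result concrete, the continuity equation can be checked on a toy ODE whose density is known in closed form. The example below is ours (not from the post): for \frac{dx_t}{dt} = -x_t in one dimension with x_0 \sim \mathcal{N}(0,1), we have x_t = x_0 e^{-t} and hence p_t = \mathcal{N}(0, e^{-2t}), so both sides of the equation can be compared by finite differences.

```python
import numpy as np

# Check ∂p/∂t = -∂(p f)/∂x for the 1-D ODE dx/dt = -x, i.e. f(x) = -x.
# With x_0 ~ N(0, 1), the exact density is p_t = N(0, e^{-2t}).
def p(x, t):
    s2 = np.exp(-2.0 * t)                       # variance of x_t
    return np.exp(-x**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

x = np.linspace(-5.0, 5.0, 2001)
h, dt, t = x[1] - x[0], 1e-5, 0.5

lhs = (p(x, t + dt) - p(x, t - dt)) / (2 * dt)  # ∂p/∂t by central difference
rhs = -np.gradient(p(x, t) * (-x), h)           # -∂(p f)/∂x
err = np.max(np.abs(lhs - rhs))
print(err)                                      # small (finite-difference error)
```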

SDE Evolution

For the SDE: \begin{equation} d\boldsymbol{x}_t = \boldsymbol{f}_t(\boldsymbol{x}_t) dt + g_t d\boldsymbol{w} \label{eq:sde} \end{equation} we discretize it as: \begin{equation} \boldsymbol{x}_{t+\Delta t} = \boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t) \Delta t + g_t \sqrt{\Delta t}\boldsymbol{\varepsilon},\quad \boldsymbol{\varepsilon}\sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}) \label{eq:sde-diff} \end{equation} Then: \begin{equation} \begin{aligned} \phi(\boldsymbol{x}_{t+\Delta t}) =&\, \phi(\boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t) \Delta t + g_t \sqrt{\Delta t}\boldsymbol{\varepsilon}) \\ \approx&\, \phi(\boldsymbol{x}_t) + \left(\boldsymbol{f}_t(\boldsymbol{x}_t) \Delta t + g_t \sqrt{\Delta t}\boldsymbol{\varepsilon}\right)\cdot \nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) + \frac{1}{2} \left(g_t\sqrt{\Delta t}\boldsymbol{\varepsilon}\cdot \nabla_{\boldsymbol{x}_t}\right)^2\phi(\boldsymbol{x}_t) \end{aligned} \end{equation} Taking the expectation on both sides, note that the right side must be averaged over both \boldsymbol{x}_t and \boldsymbol{\varepsilon}. 
The expectation over \boldsymbol{\varepsilon} can be calculated first, yielding: \begin{equation} \phi(\boldsymbol{x}_t) + \Delta t\,\,\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot \nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) + \frac{1}{2} \Delta t\,g_t^2\nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) \end{equation} Thus: \begin{equation} \begin{aligned} &\,\int p_{t+\Delta t}(\boldsymbol{x}_{t+\Delta t})\phi(\boldsymbol{x}_{t+\Delta t}) d\boldsymbol{x}_{t+\Delta t}\\ \approx&\, \int p_t(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \Delta t\int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \int\frac{1}{2} \Delta t\,g_t^2 p_t(\boldsymbol{x}_t)\nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t \end{aligned} \end{equation} As in equations [eq:change-var] and [eq:dt-0], renaming the integration variable, dividing by \Delta t, and taking the limit \Delta t\to 0 gives: \begin{equation} \int \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = \int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \int\frac{1}{2} \,g_t^2 p_t(\boldsymbol{x}_t)\nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t \end{equation} Integrating the first term on the right by parts (formula [eq:int-by-parts] with u=\phi, \boldsymbol{v}=p_t\boldsymbol{f}_t), and applying equation [eq:int-by-parts-p] followed by [eq:int-by-parts-gp] to the second term on the right (all boundary terms vanish since p_t \to 0 and \nabla p_t \to \boldsymbol{0} at infinity), we get: \begin{equation} \int \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = \int \left[-\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big)+\frac{1}{2}g_t^2 \nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}p_t(\boldsymbol{x}_t)\right]\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t \end{equation} According to the equality principle of the test
function method: \begin{equation} \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t} = -\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big)+\frac{1}{2}g_t^2 \nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}p_t(\boldsymbol{x}_t) \end{equation} This is the "Fokker-Planck equation."
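As with the continuity equation, the Fokker-Planck equation can be checked against a case with a closed-form density. The sketch below is our illustration (not from the post): the one-dimensional Ornstein-Uhlenbeck SDE dx_t = -x_t dt + g\,dw keeps a Gaussian density whose variance is s_t^2 = s_0^2 e^{-2t} + \frac{g^2}{2}(1 - e^{-2t}), so both sides of the equation can be compared by finite differences.

```python
import numpy as np

# Check the 1-D Fokker-Planck equation
#   ∂p/∂t = -∂(p f)/∂x + (g²/2) ∂²p/∂x²
# for the OU process dx = -x dt + g dw (f(x) = -x, constant g), whose
# density stays Gaussian: x_t ~ N(0, s_t²) with
#   s_t² = s_0² e^{-2t} + (g²/2)(1 - e^{-2t}).
g, s0sq = 0.8, 2.0

def p(x, t):
    s2 = s0sq * np.exp(-2 * t) + 0.5 * g**2 * (1 - np.exp(-2 * t))
    return np.exp(-x**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

x = np.linspace(-8.0, 8.0, 4001)
h, dt, t = x[1] - x[0], 1e-5, 0.3

lhs = (p(x, t + dt) - p(x, t - dt)) / (2 * dt)                    # ∂p/∂t
drift = -np.gradient(p(x, t) * (-x), h)                           # -∂(p f)/∂x
diffusion = 0.5 * g**2 * np.gradient(np.gradient(p(x, t), h), h)  # (g²/2) ∂²p/∂x²
err = np.max(np.abs(lhs - (drift + diffusion)))
print(err)                                                        # small
```

Setting g = 0 recovers the continuity-equation check, mirroring how the continuity equation is the g_t \equiv 0 special case of the Fokker-Planck equation.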

Summary

This article introduced the test function method for deriving equations that govern the evolution of probability densities. The main content included the high-dimensional generalization of integration by parts, the derivation of the continuity equation for ODEs, and the derivation of the Fokker-Planck equation for SDEs.

Original Address: https://kexue.fm/archives/9461

For more details on reprinting, please refer to: "Scientific Space FAQ"