Markov Chain Monte Carlo

One of the goals of my current study of statistics is understanding Markov chain Monte Carlo (MCMC).

I would like to write about one of the most popular MCMC algorithms, the Metropolis–Hastings algorithm. Like rejection sampling, it works by accepting or rejecting proposed samples.

Let  P(x) be the probability density of the target distribution  P that you would like to draw samples from.

Let  g(x_{n+1} | x_n) be the probability density of a proposal distribution  G that you already know how to draw samples from.

  1. Draw a sample  x_{n+1} from the proposal distribution  G, i.e. from  g(x_{n+1} | x_n) (the previous sample is  x_n)

  2. Accept the sample with probability  \min\left(1,\frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}\right). Otherwise, reject it and keep  x_n as the next state.

    This means that you always accept the sample when  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1, and you accept it with probability  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} otherwise.

  3. Repeat steps 1 and 2 until the distribution of the samples is close enough to the target.
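The steps above can be sketched in Python. This is a minimal sketch, not from the original post: it assumes a standard normal target (known only up to a normalizing constant) and a symmetric Gaussian random-walk proposal, for which the  g terms in the acceptance ratio cancel.

```python
import math
import random

def target_density(x):
    # Unnormalized standard normal density P(x); the normalizing
    # constant cancels in the acceptance ratio.
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples, step=1.0, x0=0.0, seed=0):
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Step 1: draw a proposal x_{n+1} from g(. | x_n),
        # here a Gaussian centered at the current state.
        proposal = x + rng.gauss(0.0, step)
        # Step 2: accept with probability min(1, P(x_{n+1}) / P(x_n));
        # the proposal is symmetric, so the g terms cancel.
        ratio = target_density(proposal) / target_density(x)
        if rng.random() < min(1.0, ratio):
            x = proposal  # accept
        # otherwise reject: keep x_n as the next state
        samples.append(x)
    return samples

# Step 3: run the chain long enough, then check the sample moments.
samples = metropolis_hastings(50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With enough iterations the sample mean and variance should be close to 0 and 1, the moments of the standard normal target.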

Let’s think about why this algorithm works.

The reversible Markov chain that I explained in the previous post plays an important role in this algorithm.

Let’s assume that samples from the target distribution are generated by a reversible Markov chain whose stationary distribution is  \pi(x) = P(x).

Since the Markov chain is reversible, each transition ( x_n \to x_{n+1}) of the Markov chain fulfils the following detailed balance condition:

 P(x_n | x_{n+1})P(x_{n+1}) = P(x_{n+1} | x_n)P(x_n)

When  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1 is true, we always accept a sample.

 \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1 \Leftrightarrow P(x_{n+1})g(x_n | x_{n+1}) \geq P(x_n)g(x_{n+1} | x_n)

If the proposal matched the target's transition exactly, we would have  P(x_{n+1})g(x_n | x_{n+1}) = P(x_n)g(x_{n+1} | x_n), but here the left-hand side is larger. It means the proposal distribution  g(x_{n+1} | x_n) causes the chain to move from  x_n to  x_{n+1} too rarely and from  x_{n+1} to  x_n too often. To compensate, whenever  G does propose  x_{n+1} in this case, we always accept it.

On the other hand, when  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} < 1, we accept the sample with the probability  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}.

 \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} < 1 \Leftrightarrow P(x_{n+1})g(x_n | x_{n+1}) <  P(x_n)g(x_{n+1} | x_n)

It means the proposal distribution  g(x_{n+1} | x_n) causes the chain to move from  x_n to  x_{n+1} too often and from  x_{n+1} to  x_n too rarely.

We need to adjust the transition by using an acceptance probability  A(x_{n+1} | x_n) such that:

 P(x_{n+1})g(x_n | x_{n+1}) = A(x_{n+1} | x_n) P(x_n)g(x_{n+1} | x_n)

 A(x_{n+1} | x_n) = \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}
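We can check this adjustment numerically. The density values below are arbitrary made-up numbers (my assumption, not from the derivation) chosen so that the chain would move from  x_n to  x_{n+1} too often; multiplying by  A restores detailed balance exactly.

```python
# Arbitrary example values (assumptions for illustration) where
# P(x_{n+1}) g(x_n | x_{n+1}) < P(x_n) g(x_{n+1} | x_n).
p_next, p_curr = 0.2, 0.5   # P(x_{n+1}), P(x_n)
g_rev, g_fwd = 0.3, 0.4     # g(x_n | x_{n+1}), g(x_{n+1} | x_n)

# Acceptance probability from the equation above.
A = (p_next / p_curr) * (g_rev / g_fwd)

# Detailed balance after the adjustment:
lhs = p_next * g_rev        # P(x_{n+1}) g(x_n | x_{n+1})
rhs = A * p_curr * g_fwd    # A(x_{n+1} | x_n) P(x_n) g(x_{n+1} | x_n)
assert abs(lhs - rhs) < 1e-12
assert A < 1                # a valid probability in this case
```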

This is my intuitive understanding of the Metropolis–Hastings algorithm.

A more formal explanation, following Wikipedia, is as follows:

 P(x_n | x_{n+1})P(x_{n+1}) = P(x_{n+1} | x_n)P(x_n) can be re-written as

 \frac{P(x_{n+1} | x_n)}{P(x_n | x_{n+1})} = \frac{P(x_{n+1})}{P(x_n)}

We draw a sample from the proposal distribution G and accept it with a certain probability  A(x_{n+1} | x_n) to simulate the target distribution P.

 P(x_{n+1} | x_n) = A(x_{n+1} | x_n) g(x_{n+1} | x_n)

Let’s insert this into the previous equation.

 \frac{A(x_{n+1} | x_n) g(x_{n+1} | x_n)}{A(x_n | x_{n+1}) g(x_n | x_{n+1})} = \frac{P(x_{n+1})}{P(x_n)}

  \Leftrightarrow  \frac{A(x_{n+1} | x_n) }{A(x_n | x_{n+1}) } = \frac{P(x_{n+1})g(x_n | x_{n+1})}{P(x_n)g(x_{n+1} | x_n)}

Then we need to choose an acceptance probability  A(x_{n+1} | x_n) which fulfils this condition. The choice used in the Metropolis–Hastings algorithm is as follows:

 A(x_{n+1} | x_n) =  \min\left(1,\frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}\right)

This fulfils the previous condition because

When   \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1, then  A(x_{n+1} | x_n) = 1 and  A(x_n | x_{n+1}) = \frac{P(x_n)}{P(x_{n+1})}\frac{g(x_{n+1} | x_n)}{g(x_n | x_{n+1})}.

When   \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} < 1  \Leftrightarrow \frac{P(x_n)}{P(x_{n+1})}\frac{g(x_{n+1} | x_n)}{g(x_n | x_{n+1})} > 1, then  A(x_n | x_{n+1}) = 1 and  A(x_{n+1} | x_n) = \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}.

In both cases, the ratio  \frac{A(x_{n+1} | x_n)}{A(x_n | x_{n+1})} = \frac{P(x_{n+1})g(x_n | x_{n+1})}{P(x_n)g(x_{n+1} | x_n)}, which is exactly the condition above.
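We can also verify that this choice of  A satisfies the ratio condition numerically. The density values below are arbitrary numbers made up for illustration.

```python
def acceptance(p_to, p_from, g_back, g_fwd):
    # Metropolis–Hastings acceptance probability
    # A(x_{n+1} | x_n) = min(1, P(x_{n+1}) g(x_n | x_{n+1})
    #                          / (P(x_n) g(x_{n+1} | x_n)))
    return min(1.0, (p_to * g_back) / (p_from * g_fwd))

# Arbitrary densities for the two states and the proposal (assumed values).
p_curr, p_next = 0.5, 0.2
g_fwd, g_rev = 0.4, 0.3   # g(x_{n+1} | x_n), g(x_n | x_{n+1})

a_fwd = acceptance(p_next, p_curr, g_rev, g_fwd)  # A(x_{n+1} | x_n)
a_rev = acceptance(p_curr, p_next, g_fwd, g_rev)  # A(x_n | x_{n+1})

# The ratio condition from the derivation: exactly one of the two
# acceptance probabilities is 1, and their ratio equals the target ratio.
assert abs(a_fwd / a_rev - (p_next * g_rev) / (p_curr * g_fwd)) < 1e-12
```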