Multiple-Column Indexes on MySQL

MySQL can use a multiple-column index for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on.

MySQL :: MySQL 5.7 Reference Manual :: 9.3.5 Multiple-Column Indexes

The following index

INDEX(columnA, columnB, columnC)

can serve the same lookups as the following three indexes:

INDEX(columnA, columnB, columnC)
INDEX(columnA, columnB)
INDEX(columnA)

See the key, key_len, and type columns of the following execution plans:

mysql> CREATE TABLE test (
    ->     id         INT NOT NULL,
    ->     last_name  CHAR(30) NOT NULL,
    ->     first_name CHAR(30) NOT NULL,
    ->     PRIMARY KEY (id),
    ->     INDEX name (last_name,first_name)
    -> );
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT test (id, last_name, first_name) VALUES (1, 'last', 'first'), (2, 'last2','first2');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test where last_name = 'last' and first_name = 'first';
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
|  1 | last      | first      |
+----+-----------+------------+
1 row in set (0.00 sec)

mysql> explain select * from test where last_name = 'last' and first_name = 'first';
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | name          | name | 180     | const,const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from test where last_name = 'last' ;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | name          | name | 90      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from test where first_name = 'first';
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | index | NULL          | name | 180     | NULL |    2 |    50.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)


Reversible Markov Chain

In the previous post, I introduced some important Markov chain properties.

In this post, I would like to write about reversibility, a Markov chain property that is used in Markov Chain Monte Carlo (MCMC).

Reversibility

Let  X_n  be a random variable of a Markov chain. If there is a probability distribution  \pi  such that, for all states  i  and  j :

 \displaystyle \pi_i P(X_{n+1} = j \mid X_{n} = i) = \pi_j P(X_{n+1} = i \mid X_{n} = j)

Then the Markov chain is said to be reversible.

A trivial example of a reversible Markov chain is as follows:

f:id:nakaly:20170221030823p:plain:w300

A reversible Markov chain has a stationary distribution

When defining  p_{ij}  as the transition probability from state  i  to state  j :

 \displaystyle p_{ij} = P(X_{n+1} = j \mid X_{n} = i)

The equation of reversibility can be written as follows:

 \displaystyle \pi_i p_{ij} = \pi_j p_{ji} for all states i,j of a Markov chain

Since the equation holds for all  i  and  j , we can sum both sides over  i  and get:

 \displaystyle \sum_i \pi_i p_{ij} = \sum_i \pi_j p_{ji} = \pi_j \sum_i p_{ji} = \pi_j

The last step uses  \sum_i p_{ji} = 1 , since the transition probabilities out of state  j  sum to one. This means that we get the following equation:

 \displaystyle \pi_j = \sum_i \pi_i p_{ij}

This is the definition of a stationary distribution  \pi.
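To make the argument concrete, here is a minimal numeric sketch in Python; the 3-state transition matrix and the distribution  \pi  below are illustrative values I made up, not taken from the post. It checks detailed balance and then confirms that the same  \pi  satisfies  \pi_j = \sum_i \pi_i p_{ij} .

import numpy as np

# Transition matrix of a small reversible chain (rows sum to 1).
# The values are illustrative, chosen so that detailed balance holds.
P = np.array([
    [0.5,  0.5,  0.0],
    [0.25, 0.5,  0.25],
    [0.0,  0.5,  0.5],
])

# Candidate distribution pi.
pi = np.array([0.25, 0.5, 0.25])

# Detailed balance: pi_i * p_ij == pi_j * p_ji for all i, j,
# i.e. the matrix (pi_i * p_ij) is symmetric.
balance = pi[:, None] * P
print(np.allclose(balance, balance.T))  # True: the chain is reversible

# Reversibility implies stationarity: summing pi_i * p_ij over i gives pi_j.
print(np.allclose(pi @ P, pi))          # True: pi is stationary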

Markov Chain Properties

In the previous post, I gave the definition of a Markov chain, and as you can see on the page I introduced there, a Markov chain can be represented by a directed graph: each node represents a state, and each edge represents the transition probability from one state to another.

f:id:nakaly:20170218030945p:plain:w200

There are many possible directed graphs, which means there are many possible Markov chains. To understand Markov chains more deeply, it helps to categorize them by their statistical characteristics.

One of the most important such properties is ergodicity. Before understanding ergodicity, there are some other properties that you need to understand.

Reducibility

A Markov chain is said to be irreducible if it is possible to get from any state to any other state.

f:id:nakaly:20170218032246p:plain:w700

Transience and recurrence

Given a state of a Markov chain, if there is a non-zero probability that you will never return to that state, the state is said to be transient.

If a state is not transient, it’s said to be recurrent.

f:id:nakaly:20170218040341p:plain:w200

The state A is transient even though it has a self-loop. There is some probability that you will return to A when you start from A, but you will never return to A once you move to the state B. The states B, C and D are recurrent.

If the expected return time of a state is finite, the state is said to be positive recurrent.

To understand the expected return time, we first need to define the first return time of a state  i :

 \displaystyle T_i =  \inf \left\{ n\ge1: X_n = i \mid X_0 = i \right\}

And we define the probability that the first return happens after exactly  n  steps as follows:

 \displaystyle f_{ii}^{(n)} = P(T_i = n)

Then we can define the expected return time  M_i as follows:

 \displaystyle M_i = \sum_{n=1}^{\infty} n\cdot f_{ii}^{(n)}
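As a quick sanity check, here is a small simulation sketch in Python that estimates  M_i  as the average first return time; the two-state chain below is an illustrative example I made up, not from the post.

import random

# Estimate the expected return time M_0 of state 0 by simulation.
# The two-state transition probabilities are illustrative.
P = {0: [0.9, 0.1], 1: [0.5, 0.5]}  # P[s] = [P(next=0 | s), P(next=1 | s)]

def first_return_time(start=0):
    # Length of the first excursion that comes back to `start`.
    state = start
    n = 0
    while True:
        state = random.choices([0, 1], weights=P[state])[0]
        n += 1
        if state == start:
            return n

trials = 100_000
m0 = sum(first_return_time() for _ in range(trials)) / trials
print(m0)  # approx 1.2 for this chain, so state 0 is positive recurrent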

Periodicity

Given a state of a Markov chain, returns to the state can happen only in numbers of steps that are multiples of some integer  k ; the largest such  k  (the greatest common divisor of the possible return times) is said to be the period of the state.

f:id:nakaly:20170218042209p:plain:w500

If a state has period 1, the state is said to be aperiodic.
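As an illustration, the period of a state  i  can be computed as the greatest common divisor of the step counts  n  for which  (P^n)_{ii} > 0 . The Python sketch below does this for a 3-cycle, an illustrative chain I made up where every state has period 3.

import numpy as np
from math import gcd
from functools import reduce

# An illustrative 3-cycle: A -> B -> C -> A, so every state has period 3.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

def period(P, i, max_steps=50):
    # Collect all n <= max_steps with (P^n)_{ii} > 0, then take their gcd.
    steps = []
    Pn = np.eye(len(P))
    for n in range(1, max_steps + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            steps.append(n)
    return reduce(gcd, steps)

print(period(P, 0))  # 3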

Ergodicity

If a state is aperiodic and positive recurrent, the state is said to be ergodic. If all states in an irreducible Markov chain are ergodic, the Markov chain is said to be ergodic.

Stationarity

One of the remarkable characteristics of an ergodic Markov chain is that it has a stationary distribution.

This means that, regardless of the initial state, the proportion of time the chain spends in a state converges to a finite value.

Let's give a formal definition of a stationary distribution of a Markov chain.

Let  p_{ij}  be the probability of a transition from a state  i  to another state  j .

A Markov chain can be represented by a transition matrix.

  P = (p_{ij}) =

\begin{equation} \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{bmatrix} \end{equation}

If a Markov chain is ergodic, there is a vector  \pi  such that:

 \pi = \pi P

 \pi is said to be the stationary distribution of the Markov chain.

 \pi_i , which is the proportion of time the chain spends in a state  i , is related to the expected return time  M_i :

 \pi_i = \frac{C}{M_i} ( C  is the normalizing constant.)
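As a sketch of how a stationary distribution can be found in practice, the Python below iterates  \pi \leftarrow \pi P  from an arbitrary starting distribution; the two-state matrix is the same illustrative example used above for expected return times, not from the post. For an ergodic chain the iteration converges to the stationary distribution, and  1/\pi_0  matches the expected return time  M_0 .

import numpy as np

# An illustrative ergodic transition matrix (rows sum to 1).
P = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
])

# Iterate pi <- pi P from an arbitrary initial distribution.
pi = np.array([1.0, 0.0])
for _ in range(1000):
    pi = pi @ P

print(pi)                       # approx [0.8333, 0.1667]
print(np.allclose(pi @ P, pi))  # True: pi = pi P
print(1 / pi[0])                # approx 1.2, the expected return time M_0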

Markov Chain

A Markov chain is a sequence of random variables ( X_1, X_2, X_3, \cdots ) with the property that the probability of moving to the next state depends only on the current state, not on the previous states.

The probability of moving to the next state,

 \displaystyle
    P(X_{n+1}=x \mid X_1=x_1, X_2=x_2, \ldots, X_n=x_n)

depends only on the current state  X_n = x_n :

 \displaystyle
   =  P(X_{n+1}=x \mid X_n = x_n)

You can find a visual explanation on this page.
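As a tiny illustration of the Markov property, here is a sketch that simulates a two-state chain in Python; the states and transition probabilities are made up for the example.

import random

# Illustrative two-state chain; transition[s] is P(next state = 1 | s).
transition = {0: 0.3, 1: 0.6}

def step(state):
    # The next state depends only on the current state, not on the path
    # taken to reach it (the Markov property).
    return 1 if random.random() < transition[state] else 0

state = 0
path = [state]
for _ in range(10):
    state = step(state)
    path.append(state)
print(path)  # e.g. [0, 0, 1, 1, 0, ...]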

Bayes' theorem

When I learned Bayesian inference, it reminded me of the proof of Bayes' theorem.

It's one of the simplest proofs that derives an important theorem.

Let  \displaystyle A, B  be events. The following formulas are the definitions of conditional probability.

 \displaystyle P(A\mid B)=\frac{P(A \cap B)}{P(B)}, \text{ if } P(B) \neq 0

 \displaystyle P(B\mid A) = \frac{P(A \cap B)}{P(A)}, \text{ if } P(A) \neq 0

A simple rearrangement gives:

 \displaystyle P(A \cap B) = P(A\mid B)\, P(B) = P(B\mid A)\, P(A)

Dividing both sides by  P(B) , we get Bayes' theorem:

 \displaystyle P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)}, \text{ if } P(B) \neq 0
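As a quick numeric check of the derivation, here is a sketch in Python; the probabilities are made-up illustrative values, kept as exact fractions.

from fractions import Fraction

# Made-up joint distribution for two events A and B.
p_a_and_b = Fraction(1, 10)
p_a = Fraction(3, 10)
p_b = Fraction(2, 10)

# The two definitions of conditional probability.
p_a_given_b = p_a_and_b / p_b   # 1/2
p_b_given_a = p_a_and_b / p_a   # 1/3

# Bayes' theorem: P(A|B) == P(B|A) P(A) / P(B)
print(p_a_given_b == p_b_given_a * p_a / p_b)  # True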

Bayesian Inference

Bayesian Inference is a method for estimating a true probability distribution from samples of the distribution.

When I read the article on Wikipedia, I didn't get the point.

But I found a better article that gave me a more intuitive understanding.

My understanding of Bayesian Inference consists of the following steps:

  1. Come up with a hypothesis about the target distribution, which is called the prior probability distribution. Use a uniform distribution as the prior if you have no hypotheses.
  2. Update the hypothesis based on samples (which are called evidence) from the target distribution. The updated hypothesis is called the posterior probability.

When updating the hypothesis, we use Bayes' theorem.

\displaystyle P(H\mid E) = \frac{P(E\mid H) \cdot P(H)}{P(E)} = \frac{P(E\mid H)}{P(E)} \cdot P(H)

 \displaystyle P(H) : the prior probability

 \displaystyle P(H\mid E) : the posterior probability

We can get the posterior probability  \displaystyle P(H\mid E) by multiplying the prior probability  \displaystyle P(H) by the value  \displaystyle \frac{P(E\mid H)}{P(E)}

The difficult part of Bayesian Inference is calculating  \displaystyle P(E).

It can be calculated as follows:

 \displaystyle P(E) = \sum_{H}P(E \mid H)\, P(H)  when the target distribution is discrete.

 \displaystyle P(E) = \int p(E \mid H)\, p(H)\, dH  when the target distribution is continuous.

MCMC (Markov Chain Monte Carlo) is sometimes used when this calculation is intractable.
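To make the two steps above concrete, here is a minimal discrete sketch in Python: inferring a coin's bias from a few flips. The grid of hypotheses, the uniform prior, and the data are all illustrative choices of mine, not from the articles mentioned above.

import numpy as np

# Candidate hypotheses: a grid of possible values of P(heads).
hypotheses = np.linspace(0.01, 0.99, 99)
# Step 1: a uniform prior over the hypotheses.
posterior = np.full(len(hypotheses), 1.0 / len(hypotheses))

# Step 2: update on each piece of evidence (1 = heads, 0 = tails).
data = [1, 1, 0, 1]
for flip in data:
    likelihood = hypotheses if flip == 1 else 1 - hypotheses  # P(E|H)
    unnormalized = likelihood * posterior                     # P(E|H) P(H)
    posterior = unnormalized / unnormalized.sum()             # divide by P(E)

print(hypotheses[np.argmax(posterior)])  # posterior mode, 0.75 here

Note that the denominator  P(E)  appears as unnormalized.sum(), the discrete sum  \sum_{H}P(E \mid H)\, P(H)  from above.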

Implementing Box-Muller Transform

After writing the previous post, I think I understand the Box-Muller transform.

I’m implementing it in this post.

The original form of the Box-Muller transform is as follows:

 U_1, U_2 : independent random variables from the uniform distribution on  (0,1)
 y_1 \, = \, \sqrt{-2 \ln U_1} \cos(2\pi U_2)
 y_2 \, = \, \sqrt{-2 \ln U_1} \sin(2\pi U_2)

However, it involves a square root, a logarithm, and trigonometric functions (sin/cos), which are costly to compute and may introduce numerical errors. J. Bell proposed a more robust and faster way of sampling, which is called the polar form.

Unlike the original form, the polar form uses rejection sampling:

  1. Draw samples  x_1, x_2  from the uniform distribution on  (-1, 1) .
  2. Calculate  s = x_1^2 + x_2^2 .
  3. If  0 \lt s \lt 1 , go to 4; otherwise go back to 1.
  4. Get samples  y_1, y_2  by calculating  y_1 = x_1 \, \sqrt{\frac{-2 \ln s}{s}}  and  y_2 = x_2 \, \sqrt{\frac{-2 \ln s}{s}} .

Python implementation

import random as rd
import math as ma
import seaborn as sns
import matplotlib.pyplot as plt

data = []
for i in range(500):
    # Rejection step: draw a point in the square until it falls strictly
    # inside the unit circle (s == 0 would break the logarithm below).
    while True:
        x1 = rd.uniform(-1, 1)
        x2 = rd.uniform(-1, 1)
        s = x1 ** 2 + x2 ** 2
        if 0.0 < s < 1.0:
            break
    # Transform the accepted point into two independent standard
    # normal samples.
    w = ma.sqrt((-2.0 * ma.log(s)) / s)
    y1 = x1 * w
    y2 = x2 * w
    data.append(y1)
    data.append(y2)

sns.distplot(data)
plt.show()

The result:

f:id:nakaly:20170213050259p:plain:w500