Start learning Rust

I don’t really know why, but I want to do something different from what I usually do, and I want a familiar programming language that does not have garbage collection (GC). There are two candidates for me: Swift and Rust.

I think Swift is more popular and will keep getting more popular, but I started to learn Rust because its memory management mechanism interests me. The official web site says the following:

  • guaranteed memory safety
  • threads without data races

Even when we use a language with GC such as Java, we still need to care about memory management, especially in the operation phase. I think it’s more difficult to write an application in Rust than in other modern programming languages that have GC, but once it’s written, an application in Rust is easier to operate.

I think I will feel like I’m using a sharper knife once I get used to Rust.

Cast a List type object without iteration

There are three choices as follows:

        List<String> list = Arrays.asList("1","2");

        // Choice 1: raw-type cast (unchecked, no copy)
        List<Object> objectList = (List) list;

        // Choice 2: cast via a wildcard (unchecked, no copy)
        List<Object> objectList2 = (List<Object>)(List<?>) list;

        // Choice 3: copy the elements into a new list
        List<Object> objectList3 = new ArrayList<Object>(list);

But choices 1 and 2 seem dangerous: they don’t copy the list, they only cast the reference, so the compiler can no longer guarantee the element type, and a ClassCastException can be thrown later when an item is fetched through a reference whose element type does not match the actual list.
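
To see why, here is a minimal sketch I wrote for this post (the class name is my own): after an unchecked cast like choice 2, a non-String can be added through the List<Object> view, and fetching it back through the original List<String> reference throws a ClassCastException at runtime.

import java.util.ArrayList;
import java.util.List;

public class UncheckedCastDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        strings.add("1");

        // Like choice 2: an unchecked cast, so both variables point to the same list.
        @SuppressWarnings("unchecked")
        List<Object> objects = (List<Object>) (List<?>) strings;

        // Compiles and runs, but it puts a non-String into the List<String>.
        objects.add(Integer.valueOf(42));

        // Throws ClassCastException: Integer cannot be cast to String.
        String s = strings.get(1);
        System.out.println(s);
    }
}

Choice 3 avoids this because the copy really is a List<Object>, independent of the original list.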

HttpClient in Java

Both Apache HttpClient and OkHttpClient are thread-safe.

Apache HttpClient

HttpClient - HttpClient Performance Optimization Guide

Chapter 2. Connection management

OkHttpClient

Concurrency · square/okhttp Wiki · GitHub

OkHttp3 does not use a global singleton thread pool. Each OkHttpClient object can have its own private connection pool.
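
For illustration, a minimal sketch of sharing one OkHttpClient (and therefore one connection pool) across requests; the class name and URL are placeholders of my own, not taken from the OkHttp docs.

import java.io.IOException;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class OkHttpSharedClient {
    // One OkHttpClient for the whole application: it owns its own connection pool.
    private static final OkHttpClient CLIENT = new OkHttpClient();

    static String fetch(String url) throws IOException {
        Request request = new Request.Builder().url(url).build();
        // Response is Closeable; closing it returns the connection to the pool.
        try (Response response = CLIENT.newCall(request).execute()) {
            return response.body().string();
        }
    }

    public static void main(String[] args) throws IOException {
        // The same client (and its pool) is reused across calls and across threads.
        System.out.println(fetch("https://example.com/"));
    }
}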

okhttp/CHANGELOG.md at master · square/okhttp · GitHub

Markov Chain Monte Carlo

One of the goals of my current study of statistics is to understand Markov Chain Monte Carlo (MCMC).

I would like to write about one of the popular MCMC algorithms, the Metropolis–Hastings algorithm. It’s a kind of rejection sampling algorithm.

Let  P(x) be the probability distribution of a target distribution  P that you would like to draw samples from.

Let  g(x_{n+1} | x_n) be the probability distribution of a proposal distribution  G that you already know how to draw samples from, conditioned on the previous sample.

  1. Draw a candidate sample  x_{n+1} from the proposal distribution  G (the previous sample is  x_n)

  2. Accept the sample with a probability  \min\left(1,\frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}\right). Otherwise, reject it.

    It means that you always accept the sample when  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1. And you accept the sample with the probability  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} when it’s not.

  3. Repeat 1 and 2 until the distribution of the samples is close enough to the target (a minimal code sketch of these steps follows below).
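
Here is a minimal sketch of these steps in Java, under assumptions of my own for illustration: a standard normal target density and a symmetric Gaussian proposal, so the proposal ratio  \frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} cancels to 1.

public class MetropolisHastingsSketch {

    // Unnormalized target density P(x); here a standard normal as an example.
    static double targetDensity(double x) {
        return Math.exp(-0.5 * x * x);
    }

    public static void main(String[] args) {
        java.util.Random rng = new java.util.Random(42);
        int numSamples = 10000;
        double[] samples = new double[numSamples];
        double current = 0.0; // initial state x_0

        for (int n = 0; n < numSamples; n++) {
            // Step 1: draw a candidate x_{n+1} from the proposal G,
            // here a Gaussian centered at the current state x_n.
            double candidate = current + rng.nextGaussian();

            // Step 2: accept with probability min(1, P(x_{n+1}) / P(x_n)).
            // The proposal ratio g(x_n | x_{n+1}) / g(x_{n+1} | x_n) is 1
            // because the Gaussian proposal is symmetric.
            double acceptance = Math.min(1.0, targetDensity(candidate) / targetDensity(current));
            if (rng.nextDouble() < acceptance) {
                current = candidate; // accept; otherwise the chain stays at x_n
            }
            samples[n] = current;
        }

        // Step 3: after enough iterations the samples approximate the target.
        double mean = 0.0;
        for (double s : samples) {
            mean += s;
        }
        System.out.println("sample mean: " + (mean / numSamples));
    }
}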

Let’s think about why this algorithm works.

The reversible Markov chain that I explained in the previous post plays an important role in this algorithm.

Let’s assume that samples from the target probability distribution are generated by a reversible Markov chain which has a stationary distribution  \pi(x) = P(x).

Since the Markov chain is reversible, each transition ( x_n \to x_{n+1}) of the Markov chain fulfils the following condition:

 P(x_n | x_{n+1})P(x_{n+1}) = P(x_{n+1} | x_n)P(x_n)

When  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1 is true, we always accept a sample.

 \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1 \Leftrightarrow P(x_{n+1})g(x_n | x_{n+1}) \geq P(x_n)g(x_{n+1} | x_n)

If the proposal transition itself satisfied detailed balance with respect to  P, we would have  P(x_{n+1})g(x_n | x_{n+1}) = P(x_n)g(x_{n+1} | x_n), but in general it does not. The inequality means that the proposal distribution  g(x_{n+1} | x_n) causes the chain to move from  x_n to  x_{n+1} too rarely and from  x_{n+1} to  x_n too often. So whenever we draw such a candidate  x_{n+1} from  G, we accept it.

On the other hand, when  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} < 1, we accept the sample with the probability  \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}.

 \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} < 1 \Leftrightarrow P(x_{n+1})g(x_n | x_{n+1}) < P(x_n)g(x_{n+1} | x_n)

It means the proposal distribution  g(x_{n+1} | x_n) causes the chain to move from  x_n to  x_{n+1} too often and from  x_{n+1} to  x_n too rarely.

We need to adjust the transition by using an acceptance distribution  A(x_{n+1} | x_n) such that:

 P(x_{n+1})g(x_n | x_{n+1}) =A(x_{n+1} | x_n)  P(x_n)g(x_{n+1} | x_n)

 A(x_{n+1} | x_n) = \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}

This is my intuitive understanding of Metropolis–Hastings algorithm.

A more formal explanation, following Wikipedia, is as follows:

 P(x_n | x_{n+1})P(x_{n+1}) = P(x_{n+1} | x_n)P(x_n) can be re-written as

 \frac{P(x_{n+1} | x_n)}{P(x_n | x_{n+1})} = \frac{P(x_{n+1})}{P(x_n)}

We draw a sample from the proposal distribution G and accept it with a certain probability  A(x_{n+1} | x_n) to simulate the target distribution P.

 P(x_{n+1} | x_n) = A(x_{n+1} | x_n) g(x_{n+1} | x_n)

Let’s insert this into the previous equation.

 \frac{A(x_{n+1} | x_n) g(x_{n+1} | x_n)}{A(x_n | x_{n+1}) g(x_n | x_{n+1})} = \frac{P(x_{n+1})}{P(x_n)}

  \Leftrightarrow  \frac{A(x_{n+1} | x_n) }{A(x_n | x_{n+1}) } = \frac{P(x_{n+1})g(x_n | x_{n+1})}{P(x_n)g(x_{n+1} | x_n)}

Then we need to choose an acceptance distribution  A(x_{n+1} | x_n) which fulfils this condition. The choice used in the Metropolis–Hastings algorithm is as follows:

 A(x_{n+1} | x_n) =  \min\left(1,\frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)}\right)

This fulfils the previous condition because

When   \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} \geq 1, then  A(x_{n+1} | x_n) = 1.

When   \frac{P(x_{n+1})}{P(x_n)}\frac{g(x_n | x_{n+1})}{g(x_{n+1} | x_n)} < 1  \Leftrightarrow \frac{P(x_n)}{P(x_{n+1})}\frac{g(x_{n+1} | x_n)}{g(x_n | x_{n+1})} > 1, then  A(x_n | x_{n+1}) = 1.
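
To spell out the check: in the first case  A(x_{n+1} | x_n) = 1 and  A(x_n | x_{n+1}) = \frac{P(x_n)}{P(x_{n+1})}\frac{g(x_{n+1} | x_n)}{g(x_n | x_{n+1})}; in the second case  A(x_{n+1} | x_n) is the ratio itself and  A(x_n | x_{n+1}) = 1. In both cases

 \frac{A(x_{n+1} | x_n)}{A(x_n | x_{n+1})} = \frac{P(x_{n+1})g(x_n | x_{n+1})}{P(x_n)g(x_{n+1} | x_n)}

which is exactly the condition derived above.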

Multiple-Column Indexes on MySQL

MySQL can use multiple-column indexes for queries that test all the columns in the index, or just the first column, the first two columns, the first three columns, and so on.

MySQL :: MySQL 5.7 Reference Manual :: 9.3.5 Multiple-Column Indexes

The following index

INDEX(columnA, columnB, columnC)

is equivalent to having the following indexes:

INDEX(columnA, columnB, columnC)
INDEX(columnA, columnB)
INDEX(columnA)

See the following execution plans (note the key, key_len and rows columns):

mysql> CREATE TABLE test (
    ->     id         INT NOT NULL,
    ->     last_name  CHAR(30) NOT NULL,
    ->     first_name CHAR(30) NOT NULL,
    ->     PRIMARY KEY (id),
    ->     INDEX name (last_name,first_name)
    -> );
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT test (id, last_name, first_name) VALUES (1, 'last', 'first'), (2, 'last2','first2');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test where last_name = 'last' and first_name = 'first';
+----+-----------+------------+
| id | last_name | first_name |
+----+-----------+------------+
|  1 | last      | first      |
+----+-----------+------------+
1 row in set (0.00 sec)

mysql> explain select * from test where last_name = 'last' and first_name = 'first';
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | name          | name | 180     | const,const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from test where last_name = 'last' ;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | name          | name | 90      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from test where first_name = 'first';
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | index | NULL          | name | 180     | NULL |    2 |    50.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql>

Reversible Markov Chain

In the previous post, I introduced some important Markov chain properties.

I would like to write about a Markov chain property that is used in Markov Chain Monte Carlo (MCMC): reversibility.

Reversibility

Let  X_n be a random variable of a Markov chain. If there is a probability distribution  \pi such that:

 \displaystyle \pi_i P(X_{n+1} = j \mid X_{n} = i) = \pi_j P(X_{n+1} = i \mid X_{n} = j)

Then the Markov chain is said to be reversible.

One trivial example of a reversible Markov chain is as follows:

f:id:nakaly:20170221030823p:plain:w300

A reversible Markov chain has a stationary distribution

When defining  p_{ij} as follows:

 \displaystyle p_{ij} = P(X_{n+1} = j \mid X_{n} = i)

The equation of reversibility can be written as follows:

 \displaystyle \pi_i p_{ij} = \pi_j p_{ji} for all states i,j of a Markov chain

Since the equation is true for all  i and  j, we can sum up the left- and right-hand sides of the equation over  i.

We get the following:

 \displaystyle \sum_i \pi_i p_{ij} = \sum_i \pi_j p_{ji} = \pi_j \sum_i p_{ji} = \pi_j

It means that we get the following equation.

 \displaystyle \pi_j = \sum_i \pi_i p_{ij}

This is the definition of a stationary distribution  \pi.
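
As a concrete example (with numbers of my own choosing, not taken from the figure above), consider a two-state chain with  p_{12} = 0.2,  p_{21} = 0.4 (so  p_{11} = 0.8,  p_{22} = 0.6) and  \pi = (\frac{2}{3}, \frac{1}{3}). Detailed balance holds:

 \displaystyle \pi_1 p_{12} = \frac{2}{3} \cdot 0.2 = \frac{2}{15} = \frac{1}{3} \cdot 0.4 = \pi_2 p_{21}

and  \pi is indeed stationary:

 \displaystyle \sum_i \pi_i p_{i1} = \frac{2}{3} \cdot 0.8 + \frac{1}{3} \cdot 0.4 = \frac{2}{3} = \pi_1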