Sampling

While learning about machine learning (especially deep learning), I sometimes came across the word “sampling,” as in Gibbs sampling or MCMC.

However, I don’t think I fully understand sampling, so I started learning about it.

When I think of sampling, the first thing I recall is taking a subset from a set of a large number of items.

One simple strategy to get the subset is picking items from the original set until the subset’s statistical characteristics are almost the same as those of the original set.

In this case, I take the statistical characteristic to be the probability distribution.

Let’s imagine the results of flipping a coin.

The original set:

{HEADS, TAILS, TAILS, TAILS, HEADS, HEADS, HEADS, HEADS, HEADS, HEADS, TAILS, TAILS, HEADS, HEADS, HEADS, TAILS …}

P(HEADS) = 3/4

P(TAILS) = 1/4

The probability of HEADS is not the same as that of TAILS because of the shape of the coin.

Assuming that I pick items from the top, after getting the first six items the subset is below:

{HEADS, TAILS, TAILS, TAILS, HEADS, HEADS}

P(HEADS) = 3/6

P(TAILS) = 3/6

The probability distribution of this subset is completely different from the original one.

This means the subset is not large enough.
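
For concreteness, here is a minimal Python snippet that computes these empirical probabilities from a subset (the variable names are my own):

```python
from collections import Counter

subset = ["HEADS", "TAILS", "TAILS", "TAILS", "HEADS", "HEADS"]

# Empirical probability = count of each outcome / subset size
counts = Counter(subset)
for outcome in ("HEADS", "TAILS"):
    print(outcome, counts[outcome] / len(subset))  # 0.5 and 0.5
```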

After getting the first 10 items:

{HEADS, TAILS, TAILS, TAILS, HEADS, HEADS, HEADS, HEADS, HEADS, HEADS}

P(HEADS) = 7/10

P(TAILS) = 3/10

The probability distribution of this subset is closer to the original one.

As you get more and more items, the probability distribution converges to the original one.
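
This is the law of large numbers in action. Here is a small simulation sketch that illustrates it, assuming a biased coin with P(HEADS) = 3/4 (the exact numbers printed will vary from run to run):

```python
import random

def flip():
    """Simulate one flip of the biased coin: P(HEADS) = 3/4."""
    return "HEADS" if random.random() < 3 / 4 else "TAILS"

heads = 0
for n in range(1, 100_001):
    if flip() == "HEADS":
        heads += 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        # The empirical frequency drifts toward the true value 0.75
        print(f"n={n:6d}  P(HEADS) ~ {heads / n:.3f}")
```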

But how can I get samples when I don’t have the original set, only its probability distribution (P(HEADS) = 3/4, P(TAILS) = 1/4)?

The most naive method is below (a code sketch follows the steps):

  1. Generate HEADS or TAILS uniformly at random

  2. If the generated value is HEADS, accept it as a sample with probability P(HEADS) = 3/4; if the value is TAILS, accept it with probability P(TAILS) = 1/4.

  3. Repeat 1 and 2 until the accepted samples’ probability distribution is close enough to the original.
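
Here is a minimal Python sketch of this naive method (the function name naive_sample and the sample count are my own choices; the procedure is essentially rejection sampling with a uniform proposal):

```python
import random

P = {"HEADS": 3 / 4, "TAILS": 1 / 4}  # target distribution

def naive_sample():
    """Draw one sample using the naive accept/reject loop."""
    while True:
        candidate = random.choice(["HEADS", "TAILS"])  # step 1: propose uniformly
        if random.random() < P[candidate]:             # step 2: accept with probability P(candidate)
            return candidate

samples = [naive_sample() for _ in range(10_000)]      # step 3: repeat
print(samples.count("HEADS") / len(samples))           # close to 0.75
```

Note that this wastes work: on average only half of the proposals are accepted here, which is one reason better methods exist.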

I will write about more effective sampling methods next time.