BOOTSTRAPPING

How does bootstrapping work?

In my CS109 class, the idea of bootstrapping took a while to sink in, so here's an explorable explanation that tries to illustrate some of the intuition behind the beautiful and wildly popular statistical technique known as bootstrapping!

By Swee Kiat (SK), Lim
MSCS 1st year, Stanford

Consider two rivers filled with trout of different sizes.

Small fingerlings and big giants swimming in rivers of life, rivers so abundant that all you could see were frothing waters and shiny scales that shimmered just below the surface.

Two fishermen each went to a river and fished for a day. Being scientific people of an inquisitive nature, they each weighed their catches and sketched the histograms below.

Bonus Mode!


Mean: {{ formatDp(meanA) }}
Mean: {{ formatDp(meanB) }}

Difference: {{ formatDp(diffMean) }}

Ah! There's a difference of {{ formatDp(diffMean) }} between the averages of the two catches.

But could this difference be by chance? Could both rivers actually be the same?


Is there really a difference?

We have a difference of {{ formatDp(diffMean) }} between the averages of the two catches. It could be that the rivers are the same (a.k.a. the null hypothesis) and we just happened to get this difference of {{ formatDp(diffMean) }}.

What bootstrapping tries to find out:

What is the probability that we just so happened to get this difference of {{ formatDp(diffMean) }}, assuming the rivers are the same?

Or in more technical terms:

What is the probability of observing a difference at least as large as the one we saw, given that the null hypothesis is true (a.k.a. the p-value)?

If this probability is high, then a difference this large could easily arise by chance even if the rivers are the same. If this probability is low, then the two rivers are probably significantly different.
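
In symbols, this question can be written roughly as follows. The notation is an assumption made for illustration and does not come from the explorable itself: $\bar{x}_A$ and $\bar{x}_B$ are the observed mean weights of the two catches, $\bar{X}_A^*$ and $\bar{X}_B^*$ are the means of two catches drawn under the null hypothesis $H_0$, and "at least as large" is read as a comparison of absolute differences.

$$
p = P\left( \left| \bar{X}_A^* - \bar{X}_B^* \right| \ge \left| \bar{x}_A - \bar{x}_B \right| \;\middle|\; H_0 \right)
$$

If only one direction of difference matters, the absolute values would be dropped and the signed differences compared instead.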

So, let us assume for a minute that both rivers are actually the same and part of a giant long river. Then we can combine both catches, treating them as a single sample from this large river.

Think of this combined distribution as an approximation of the large river's distribution.

We want to estimate the probability that a difference of at least {{ formatDp(diffMean) }} happens when we randomly sample two catches from this combined distribution.

Let us run a thought experiment. Suppose both fishermen fished from this combined distribution. What would their two catches look like? (Assume each of them catches the same number of fish as in their original catch.)
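
One round of this thought experiment might look like the sketch below. It is not the code behind this explorable: the helper names and the fish weights are hypothetical, and it assumes the fish are drawn with replacement from the combined catch, which is the usual bootstrap convention.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def resample_once(catch_a, catch_b, rng):
    """Draw two new catches from the combined catch (with replacement).

    Each new catch has the same number of fish as the original one.
    """
    combined = list(catch_a) + list(catch_b)
    sample_a = rng.choices(combined, k=len(catch_a))
    sample_b = rng.choices(combined, k=len(catch_b))
    return sample_a, sample_b


# Hypothetical fish weights, made up for illustration only.
catch_a = [1.2, 0.8, 2.5, 1.9, 3.1]
catch_b = [2.2, 3.4, 1.7, 4.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_a, sample_b = resample_once(catch_a, catch_b, rng)
diff_mean_sample = mean(sample_a) - mean(sample_b)
print("Difference in sample means:", round(diff_mean_sample, 2))
```

Because both new catches come from the same pool, any difference between their means is due to chance alone.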

Mean: {{ formatDp(meanSampleA) }}
Mean: {{ formatDp(meanSampleB) }}

Difference: {{ formatDp(diffMeanSample) }}

Rinse and Repeat

We can run this many times and record the difference in sample means for each experiment. Then, we can estimate the probability that the difference of {{ formatDp(diffMean) }} (or more) arose out of pure chance!

If the chance of getting the difference from this combined sampling is high, then the difference is probably not significant. Typically, we define high as more than 5%.

Probability of getting a difference of {{ formatDp(diffMean) }} or more if we assume that both rivers have the same distribution:

{{ pValue }}%

{{ pValueComment }}

Summary

And that, in short, is bootstrapping! We want to test if a difference between two sample distributions could happen purely by chance. So...

Step 1
Assume that the samples both came from the same distribution and combine both samples to get an estimate of this combined distribution.
Step 2
Draw randomly from the combined distribution to get two new samples, then calculate the difference between sample means.
Step 3
Repeat Step 2 lots of times and calculate the probability that the original difference (or more) could have happened by chance, i.e., the p-value.
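
The three steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the implementation used by this explorable: the function name `bootstrap_p_value`, its parameters, and the example weights are hypothetical, draws are made with replacement, and "or more" is interpreted as a comparison of absolute differences.

```python
import random
from statistics import mean


def bootstrap_p_value(catch_a, catch_b, n_iterations=10_000, seed=0):
    """Estimate how often chance alone produces a difference in means
    at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(catch_a) - mean(catch_b))

    # Step 1: assume one shared distribution and combine both samples.
    combined = list(catch_a) + list(catch_b)

    at_least_as_large = 0
    for _ in range(n_iterations):
        # Step 2: draw two new samples (with replacement, an assumption)
        # and compute the difference between their means.
        sample_a = rng.choices(combined, k=len(catch_a))
        sample_b = rng.choices(combined, k=len(catch_b))
        if abs(mean(sample_a) - mean(sample_b)) >= observed:
            at_least_as_large += 1

    # Step 3: the fraction of repeats that match or exceed the
    # observed difference is the estimated p-value.
    return at_least_as_large / n_iterations


# Hypothetical fish weights, made up for illustration only.
catch_a = [1.2, 0.8, 2.5, 1.9, 3.1, 2.0]
catch_b = [2.2, 3.4, 1.7, 4.0, 3.8]
p_value = bootstrap_p_value(catch_a, catch_b, n_iterations=2000)
print(f"Estimated p-value: {p_value:.3f}")
```

More iterations give a steadier estimate at the cost of run time. The returned value is a fraction between 0 and 1, so multiply by 100 to compare it with a percentage threshold such as 5%.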

As thanks for checking this out, here's a bonus version of this explorable that allows you to change the input distributions! Just click the button below and head back up to the first chart.

Some things to try!

What if there is no difference between the means of the input distributions? What if each input distribution has only 1 element? What if the input distributions have different sizes?