How many Monte Carlo are enough?

Here an example of the The birthday problem solution via Monte Carlo. Suppose you’re in a classroom with 22 people. If we assume this is a randomly selected group, what is the chance that at least two people have the same birthday?

This is a problem of discrete probability.

All right, first, note that birthdays can be represented as numbers between 1 and 365.
For some calculations, 10,000 experiments might not be computationally feasible, and it might be more than we need. In practice, we won’t know what the answer is, so we don’t know if our Monte Carlo estimate is accurate. One practical approach we will describe here is to check for the stability of the estimate.

b <- 10^seq(1, 5, len = 100)
compute_prob <- function(b, n=22){
  some_day <- replicate(b, {
  bdays <- sample(1:365, n, replace=TRUE)
  any(duplicated(bdays))
  })
mean(some_day)
}

So we’re going to run a simulation where we compute or estimate the probability of two people having a certain birthday using different sizes of the Monte Carlo simulations. We compute the simulation, and now we look at the values that we get for each simulation.

prob <- sapply(b, compute_prob) # compute compute_prob b number of type
plot(log10(b), prob, type = "l")

What we can see from the graph is that it’s wiggling up and down. That’s because the estimate is not stable yet. It’s not such a great estimate. But as b gets bigger and bigger, eventually it starts to stabilize.