My question is related to this question.
I am wondering if we have a large crowd of $2000$ people: what is the expected minimum number days of the year we have to pick in order for at least half of the birthdays of the crowd to be in one of those days?
To be clear: with $2000$ people the average number of birthdays for any day is between $5$ and $6$, but there surely will be days where $10$ or even $12$ people have their birthday. So, knowing the birthdays of all people, I want to pick as many of those days, so that with as few days as possible I do cover half of the birthdays. That should be a good bit below $183$ I would think.
In fact, my intuition is that for a crowd of $2000$, this would be $100$ or so, but I could be way off, so I would like to know.
I also have the feeling that with a crowd of this size, we can make a fairly precise prediction as to how many days are needed. That is, if the expected number of days we need to cover at least $1000$ of the $2000$ birthdays is $100$, I would guess that the actual number of days with an actual crowd will be fairly close that that expected number. That is, I would like to have a result that says something like: There is a $95$% chance that the actual number of days lies within the interval $[X-Y,X+Y]$.
I am just guessing: $X=100$ and $Y=5$
How far am I off?
I welcome any kind of mathematical analysis, but I also welcome computer simulated approximations for this. And of course, while I am particularly interested in a crowd size of $2000$, feel free to provide a general answer in terms of $n$.
Answer
From computer simulation it would seem that 124-125 days is the expected number of days required.
My Python 2.7 code is here on pastebin. If you want to use it you can change the number of people with birthdays from 2000 to something else. Also you can change the sample size but 1000 samples seems to give a narrow confidence interval suggesting that the simulation is accurate.
I have repeated this test for a range of number of birthdays up to 10000. I graphed how many days are required to cover at least half of all birthdays (Note the logarithmic scale).
For a small number of birthdays the average number of days required is pretty much half of the number of birthdays.
For "medium" number of birthdays it appears that there is relationship like $\text{[Average Days Required]} = a + b\log(\text{[Number Birthdays]})$
However, this relationship can't hold for larger numbers of birthdays because the average days required can't go above 183.
I don't show the variances in the graph but one interesting observation is that the variance peaks at around 183 birthdays
No comments:
Post a Comment