My question is related to this question.
I am wondering if we have a large crowd of 2000 people: what is the expected minimum number days of the year we have to pick in order for at least half of the birthdays of the crowd to be in one of those days?
To be clear: with 2000 people the average number of birthdays for any day is between 5 and 6, but there surely will be days where 10 or even 12 people have their birthday. So, knowing the birthdays of all people, I want to pick as many of those days, so that with as few days as possible I do cover half of the birthdays. That should be a good bit below 183 I would think.
In fact, my intuition is that for a crowd of 2000, this would be 100 or so, but I could be way off, so I would like to know.
I also have the feeling that with a crowd of this size, we can make a fairly precise prediction as to how many days are needed. That is, if the expected number of days we need to cover at least 1000 of the 2000 birthdays is 100, I would guess that the actual number of days with an actual crowd will be fairly close that that expected number. That is, I would like to have a result that says something like: There is a 95% chance that the actual number of days lies within the interval [X−Y,X+Y].
I am just guessing: X=100 and Y=5
How far am I off?
I welcome any kind of mathematical analysis, but I also welcome computer simulated approximations for this. And of course, while I am particularly interested in a crowd size of 2000, feel free to provide a general answer in terms of n.
Answer
From computer simulation it would seem that 124-125 days is the expected number of days required.
My Python 2.7 code is here on pastebin. If you want to use it you can change the number of people with birthdays from 2000 to something else. Also you can change the sample size but 1000 samples seems to give a narrow confidence interval suggesting that the simulation is accurate.
I have repeated this test for a range of number of birthdays up to 10000. I graphed how many days are required to cover at least half of all birthdays (Note the logarithmic scale).
For a small number of birthdays the average number of days required is pretty much half of the number of birthdays.
For "medium" number of birthdays it appears that there is relationship like [Average Days Required]=a+blog([Number Birthdays])
However, this relationship can't hold for larger numbers of birthdays because the average days required can't go above 183.
I don't show the variances in the graph but one interesting observation is that the variance peaks at around 183 birthdays
No comments:
Post a Comment