Thursday, 22 August 2013

The probability that x birthdays lie within n days of each other



This is a question that has bugged me for quite some time: what is the chance that x people happen to have their birthdays within n days of each other?



A bit more specific, since this is how a colleague one phrased it: what is the probability that 5 people have their birthdays within 40 days?



Both the birthdays and the "distance" are supposed to be random: there is no fixed time span (e.g., April 1 to May 10) that the birthdays are to lie within. The birthdays should be such, that two birthdays are always within 40 days of each other.




The thing that bugs me, is that it seems to be some kind of recursive calculation, and that I can't find a way to put it into a straightforward mathematical formulation.



To explain that, consider 2 people: the first person is free to have his or her birthday $b_1$ any day of the year, and the second person has 81 days to pick a birthday date $b_2$ from (the 40 day timespan is inclusive, so up to 40 days before $b_1$, plus up to days after $b_1$, plus one on $b_1$ itself. This may be more logically phrased as 41 days for some; I don't know what is best, so please be clear about it in your answer).



Now, for the third person, the number of birthdays he or she can have, is limited by the second person's birthday: if $b_2 = b_1$, then $b_3$ can be among 81 days, but if $b_2 = b_1 + 1$ or $b_2 = b_1 - 1$, there are only 80 days for each option, and 79 for $\|b_1 - b_2\| = 2$, etc.



For the fourth person, the limitation is given by person 2 and 3, complicating things; the fifth person makes things even more complicated.



I've also tried to go the "exclusion" way (what is the chance that 5 people do not share their birthdays within 40 days of each other), but I didn't get anywhere that way.




But perhaps I'm going entirely the wrong way about this.



By now, I've computed it in various way, and I'm quite confident of the answer, but I'm still looking for the mathematical formulation of the general (x birthdays, n days) problem.



The answer I've got, btw, is $7.581428 \cdot 10^{-4}$, or $\frac{13456201}{365^4}$.



NB: this obviously assumes no leap years.
NB2: Extension of the Birthday Problem appears related, though I can't readily see if I can use any of that formulation here.


Answer



Let $B_1,\dots,B_x$ be the birthdays of persons $1,\dots,x$ and consider for all $m \in [0,364]$ the event "all birthdays happen between days $m$ and $m+n$, and $m$ is one of the birthdays", that is

$$
E_m = \{\forall i,\; B_i\in [m, m + n]\}\cap \{\exists i,\; B_i = m\}.
$$
(interpret $m$ as some kind of "minimal" birthday)



If $n < 365/2$, the events $E_m$ are mutually exclusive: at most one of them can happen. On the other hand, the probability that all $x$ birthdays are contained in a block of $n+1$ consecutive days is the probability that at least one of these events $E_m$ happens. This probability is therefore
$$
\sum_{m=0}^{364} \Pr(E_m) = \sum_{m=0}^{364} \left[\left(\frac{n+1}{365}\right)^x- \left(\frac{n}{365}\right)^x\right] = \frac{(n+1)^x - n^x}{365^{x-1}}.
$$




Indeed, $\Pr(E_m)$ is obtained by a simple inclusion-exclusion counting: it is the probability that the birthdays are contained in the block $[m,m+n]$ of size $n+1$ but not all of them in the subblock $[m+1,m+n]$ of size $n$.






What can be said when $n \geq \frac{365}{2}$? The events $E_m$ are not mutually exclusive anymore so we should use the Inclusion–exclusion formula:
$$
\sum_m \Pr(E_m) - \sum_{m_1 < m_2} \Pr(E_{m_1}\cap E_{m_2})+ \dots + (-1)^n \sum_{m_1 < \dots < m_n} \Pr(E_{m_1}\cap \dots\cap E_{m_n}).
$$


No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...