I've seen a lot of questions on here that answer the birthday paradox, e.g. the probability $n$ people in a room share the same birthday, where a year is 365 days. So I won't rehash the proof that the formula is
$$p(n) = 364/365 * 363/365 * ... * (365-(n-1)) / 365$$
But I was wondering about the exact probability $p(n)$ that two people either share the same birthday or have a birthday one day apart.
I was able to figure things roughly. Each day you choose most likely wipes out two other days, so my initial guess q(n) was
$$q(n) =~ 362/365 * 359/365 * ... * (365-3(n-1)) / 365$$
The problem with this is that person $A$ may have a birthday on January 3, and if person B has one on January 1, then the 2nd term of $q(n) is 360/365 and not 359/365.
Now if we just want to see the number of people to make p(n) be about .5, this won't matter much. I can do a bit of hand waving to show that if $Z$ is the number such that $p(Z) < .5 < p(Z-1)$, then $Z'$ so that $q(Z') < .5 < q(Z'-1)$ is roughly $Z'/sqrt(3)$.
So if we write the formula for $q'(n)$ as
$$q'(n) = \prod_{i=0}^n r(n)$$
where $r(n)$ is the probability that person $n$ is the first one with an adjacent birthday to the previous $n-1$, there are a few observations.
$r(n)$ must be close to $\frac{368-3n}{365}$ and it clearly must be more than $\frac{369-2n}{365}$ because, except for December 31, each date takes out the next day...or there's been a date ahead of it that takes out the next and previous days.
For the purposes of finding $Z'$, $r(n)$ seems much closer to $\frac{368-3n}{365}$ as the possibility of day X being 2 apart from any of days $1...X-1$ overlap at step $Z'$ is quite low for the purposes of finding the halfway point, less than $\frac{3Z'-3}{365}$, in fact.
But in the general case, the estimation doesn't work so well, and I am wondering about an exact value. Because I got lost in the calculations. Any help? Thanks!
Answer
Comment: Here are simulations of the original (matching) birthday problem and this 'adjacent-birthday' variation. I will choose to simulate using $n = 30$.
Method of Simulation. Successful simulation requires an 'automated' way to count relevant events.
If $X$ is the number of birthday matches, then it can found in R as follows.
The function unique
removes "redundant" birthdays. (If three people share
a birthday, we count that as two "matches".) Here is a feasibility
demonstration.
b = sample(1:365, 30, rep=T); x = 30 - length(unique(b)); sort(b); x
[1] 4 14 16 18 34 47 53 70 74 107 123 124 125 159 159 164 198
[18] 218 220 221 250 254 258 270 272 273 299 326 336 344
[1] 1 # One match: day 159
b = sample(1:365, 30, rep=T); x = 30 - length(unique(b)); sort(b); x
[1] 21 33 54 60 62 64 77 79 105 116 131 144 145 164 182 188 191
[18] 199 211 212 214 231 234 236 271 275 278 307 311 360
[1] 0 # No matches: all 30 unique
b = sample(1:365, 30, rep=T); x = 30 - length(unique(b)); sort(b); x
[1] 4 4 22 48 51 67 68 75 104 115 122 149 158 179 190 197 216
[18] 281 282 282 302 331 335 338 344 351 355 359 363 365
[1] 2 # Two matches: Days 4 and 282
If $Y$ is the number of matches-within-a-day, then it can be found by sorting
and taking the difference of sorted birthdays and looking for differences of 1 or less. Feasibility demonstration:
b = sample(1:365, 30, rep=T); y = sum((diff(sort(b))) <= 1); sort(b); diff(sort(b)); y
[1] 17 40 45 50 73 79 80 100 125 165 168 175 197 208 208 214 224
[18] 231 234 239 243 247 263 269 290 308 311 317 345 363
[1] 23 5 5 23 6 1 20 25 40 3 7 22 11 0 6 10 7 3 5 4 4 16 6
[24] 21 18 3 6 28 18
[1] 2 # One exact match (day 208), and one adjacency (days 79 and 80)
Simulation for rooms of 30 people. Then here is a simulation of distributions of $X$ and $Y$ in 100,000 rooms, each with $n = 30$ people. With 100,000 iterations, results should be accurate to
a couple of decimal places--perhaps enough to check the validity of an analytic solution.
m = 10^5; x = y = numeric(m)
days = 1:365; n = 30
for (i in 1:m) {
b = sort(sample(days, n, rep=T))
x[i] = n - length(unique(b))
y[i] = sum(diff(b) <= 1) }
mean(x >= 1); mean(y >= 1)
## 0.70647 # aprx P(X >= 1)
## 0.97811 # aprx P(Y >= 1)
table(x)/m
x
0 1 2 3 4 5 6 7
0.29353 0.38061 0.22437 0.07931 0.01818 0.00360 0.00037 0.00003
table(y)/m
y
0 1 2 3 4 5 6 7 8
0.02189 0.09649 0.19679 0.24642 0.21068 0.13070 0.06335 0.02413 0.00743
9 10 11
0.00170 0.00034 0.00008
Results for smaller rooms. For rooms of $n = 20$ probabilities of one or more desired events are:
mean(x >= 1); mean(y >= 1)
## 0.41067
## 0.80315
And for $n = 15:$
mean(x >= 1); mean(y >= 1)
## 0.25264
## 0.58777
mean(x==1); mean(y==1) # exactly one event
## 0.22333
## 0.38811
Notes: (1) A Poisson approximation for the traditional birthday problem (essentially ignoring triads) works
reasonably well in certain cases; it would be interesting to know
whether there are cases where $Y$ is approximately Poisson.
(2) By simulation of the traditional problem, it is known that $P(X = 0)$ changes only by about one
digit in the second decimal place if the distribution of birthdays throughout
the year varies by no more than is typical in the US (very roughly, up to 8% higher in summer,
and 8% lower in winter). I have not tried such simulations for $Y$.
No comments:
Post a Comment