Suppose I have 10^9 distinct elements, and an equal probability of picking each one in a given trial. How many trials must be conducted for the probability of having picked every element at least once over the course of the trials to be $\geq$90%?
I've already done some preliminary work on this and found that the exact answer for the probability of picking all of $k$ elements in $n$ trials is $P(n,k) = k!S(n,k)/n^k$, where $S$ is the Stirling number of the second kind. However, this is impossible to calculate exactly for such a large $k$, and, more direly, none of the asymptotic approximations I've found are valid when $n$ is within a few orders of magnitude of $k$, which is exactly the regime I'm looking at. Hence, the best answer will be given to anyone who can give a numerical approximation for $n$ when $k$ = 10^9 and $P(n,k)$ = 90% that's correct within 10% or so, and explain their reasoning. If you have to write code or Mathematica equations to solve the problem, please post what you wrote in your answer.
Answer
For large $k$ and $n$, getting at least one of one element is almost independent of getting getting at least one of another element.
If we assume as an approximation that they are actually independent, then the probability of getting at least one of every element is $$\left(1-\left(1-\frac1k\right)^n\right)^k \approx \left(1-\exp\left(-\frac{n}{k}\right)\right)^k\approx \exp\left(-k\exp\left(-\frac{n}{k}\right)\right).$$
If this probability is $p$ then this suggests $$n \approx k(\log_e(k) -\log_e(-\log_e(p))$$ similar to a result from Erdős and Rényi.
In your example with $10^9$ elements to choose and a target probability of $0.9$, it suggests about $22973633164$ attempts, almost $23$ times as many.
No comments:
Post a Comment