Say I sample with replacement from a set of N unique elements, where each draw selects every element with equal probability. If I draw M times from this set, what is the exact probability P(x) that I have observed at least x unique elements?
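To pin down the quantity being asked for, here is a minimal sketch that computes P(x) exactly for given N and M. It assumes the uniform, independent draws described above and tracks the distribution of the distinct count with a simple recurrence: each draw either repeats one of the k elements already seen (probability k/N) or hits a new one (probability (N − k)/N). The function names are hypothetical, chosen for illustration. The same distribution also has the known closed form P(exactly k distinct) = C(N, k) · k! · S(M, k) / N^M, where S(M, k) is a Stirling number of the second kind.

```python
from fractions import Fraction


def distinct_count_distribution(N: int, M: int) -> list[Fraction]:
    """Exact distribution of the number of distinct elements seen after
    M uniform draws with replacement from N elements.

    Returns p where p[k] = P(exactly k distinct elements), k = 0..N.
    """
    p = [Fraction(0)] * (N + 1)
    p[0] = Fraction(1)  # before any draw, no element has been seen
    for _ in range(M):
        nxt = [Fraction(0)] * (N + 1)
        for k, pk in enumerate(p):
            if pk == 0:
                continue
            # the draw repeats one of the k elements already seen
            nxt[k] += pk * Fraction(k, N)
            # the draw hits one of the N - k elements not yet seen
            if k < N:
                nxt[k + 1] += pk * Fraction(N - k, N)
        p = nxt
    return p


def prob_at_least(N: int, M: int, x: int) -> Fraction:
    """Exact P(at least x distinct elements after M draws)."""
    p = distinct_count_distribution(N, M)
    return sum(p[max(x, 0):], Fraction(0))
```

For example, `prob_at_least(3, 3, 3)` evaluates to `Fraction(2, 9)`, i.e. 3!/3³. The recurrence costs O(N·M) exact-arithmetic steps, which is fine for moderate N and M.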
I believe variants of this question have been asked on this site, but I haven't seen one that asks for an explicit probability P(x).
For example, Ross Rogers asks a variant of this question here: "probability distribution of coverage of a set after `X` independently, randomly selected members of the set". There, Henry calculates the mean and variance of the number of unique elements x covered in a set of N elements after sampling with replacement M times (I have switched M and x relative to that post to fit the notation used here).
Reproducing Henry's derivation here:
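$$\mathrm{Mean}[x] = N\left(1-\left(1-\frac{1}{N}\right)^{M}\right)$$
$$\mathrm{Var}[x] = N\left(1-\frac{1}{N}\right)^{M} + N^{2}\left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)^{M} - N^{2}\left(1-\frac{1}{N}\right)^{2M}$$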
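As a sanity check on the two formulas, here is a small sketch that evaluates them with exact fractions and compares them against brute-force enumeration of all N^M equally likely draw sequences. It is only practical for tiny N and M, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean_distinct(N: int, M: int) -> Fraction:
    # Mean[x] = N (1 - (1 - 1/N)^M)
    q1 = 1 - Fraction(1, N)
    return N * (1 - q1**M)


def var_distinct(N: int, M: int) -> Fraction:
    # Var[x] = N (1-1/N)^M + N^2 (1-1/N)(1-2/N)^M - N^2 (1-1/N)^(2M)
    q1 = 1 - Fraction(1, N)
    q2 = 1 - Fraction(2, N)
    return N * q1**M + N**2 * q1 * q2**M - N**2 * q1 ** (2 * M)


def brute_force_moments(N: int, M: int) -> tuple[Fraction, Fraction]:
    # Enumerate every length-M draw sequence (all equally likely) and
    # record how many distinct elements each one contains.
    counts = [len(set(seq)) for seq in product(range(N), repeat=M)]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean**2
    return mean, var
```

For N = 3 and M = 3, both approaches give a mean of 19/9 and a variance of 26/81.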
(I'll note that I don't quite understand the derivation for Var[x]...)
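One standard way to arrive at Var[x] is via indicator variables. The sketch below is my own reconstruction and may not match Henry's argument step for step. Let I_j indicate that element j is never drawn, so the number of unseen elements is N − x and Var[x] = Var[N − x].

```latex
\begin{aligned}
&I_j = \mathbf{1}[\text{element } j \text{ is never drawn}], \qquad
 x = N - \sum_{j=1}^{N} I_j,\\
&E[I_j] = \left(1-\tfrac{1}{N}\right)^{M}, \qquad
 E[I_j I_k] = \left(1-\tfrac{2}{N}\right)^{M} \quad (j \neq k),\\
&\mathrm{Mean}[x] = N - N\left(1-\tfrac{1}{N}\right)^{M},\\
&\mathrm{Var}[x] = \mathrm{Var}\Big[\sum_j I_j\Big]
 = \sum_j \mathrm{Var}[I_j] + \sum_{j \neq k} \mathrm{Cov}[I_j, I_k]\\
&\quad = N\left[\left(1-\tfrac{1}{N}\right)^{M} - \left(1-\tfrac{1}{N}\right)^{2M}\right]
 + N(N-1)\left[\left(1-\tfrac{2}{N}\right)^{M} - \left(1-\tfrac{1}{N}\right)^{2M}\right]\\
&\quad = N\left(1-\tfrac{1}{N}\right)^{M}
 + N^{2}\left(1-\tfrac{1}{N}\right)\left(1-\tfrac{2}{N}\right)^{M}
 - N^{2}\left(1-\tfrac{1}{N}\right)^{2M}
\end{aligned}
```

The key facts are that each draw avoids two given elements with probability (N − 2)/N, that Var[I_j] = E[I_j] − E[I_j]² because I_j² = I_j, and that N(N − 1) = N²(1 − 1/N) in the last step.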
How can we translate this variance into our P(x)?
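A mean and a variance alone do not determine P(x) exactly, so on their own they can only feed an approximation. One common choice, assumed here purely for illustration and not taken from Henry's answer, is a normal approximation with a continuity correction. Its accuracy for small N or M, or for x far into the tails, is not guaranteed; the exact recurrence or the Stirling-number formula should be preferred when feasible. The function name is hypothetical.

```python
import math


def approx_prob_at_least(N: int, M: int, x: int) -> float:
    """Normal approximation to P(at least x distinct elements after M
    draws), using the mean and variance formulas quoted above.
    This is an approximation, not the exact P(x)."""
    q1 = 1 - 1 / N
    q2 = 1 - 2 / N
    mean = N * (1 - q1**M)
    var = N * q1**M + N**2 * q1 * q2**M - N**2 * q1 ** (2 * M)
    if var <= 0:
        # Degenerate (or numerically zero) variance: treat x as constant.
        return 1.0 if mean >= x - 0.5 else 0.0
    # Continuity correction: P(X >= x) is approximated by P(Y >= x - 0.5)
    # for Y ~ Normal(mean, var).
    z = (x - 0.5 - mean) / math.sqrt(var)
    # Upper tail of the standard normal: P(Z >= z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Because the variance formula subtracts nearly equal terms, floating-point cancellation can matter for large N; evaluating it with exact fractions, as in the check above, avoids that.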