Monday, 18 May 2015

The exact probability of observing $x$ unique elements after sampling with replacement from a set with $N$ elements

Say I sample with replacement from a set of $N$ unique elements s.t. elements are selected with uniform probability. If I sample with replacement $M$ times from this set, what is the exact probability $P(x)$ that I have observed at least $x$ unique elements?



I believe different variants of this question have been asked on this site, however, I haven't seen a form that asks for an explicit probability $P(x)$?




For example, Ross Rogers asks a variant of this question here: probability distribution of coverage of a set after `X` independently, randomly selected members of the set, and Henry calculates the mean number of unique elements, $x$, and variance for the coverage of a set of $N$ elements after sampling with replacement $M$ times (we switch $M$ and $x$ here to fit with our variable specification).



Reproducing Henry's derivation here:



Mean[x] = $N * (1 - (1 - \frac{1}{N})^M)$



Var[x] = $N\left(1-\dfrac{1}{N}\right)^M + N^2 \left(1-\dfrac{1}{N}\right)\left(1-\dfrac{2}{N}\right)^M - N^2\left(1-\dfrac{1}{N}\right)^{2M}$



(I'll note that I don't quite understand the derivation for Var[x]...)




How can we translate this variance into our $P(x)$?

No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...