Thursday, 6 October 2016

Probability distribution for sampling an element $k$ times after $x$ independent random samplings with replacement



In an earlier question ( probability distribution of coverage of a set after `X` independently, randomly selected members of the set ), Ross Rogers asked for the probability distribution for the coverage of a set of $n$ elements after sampling with replacement $x$ times, with uniform probability for every element in the set. Henry provided a very nice solution.




My question is a slight extension of this earlier one: What is the probability distribution (mean and variance) for the number of elements that have been sampled at least $k$ times after sampling a set of $n$ elements with replacement, and with uniform probability, $x$ times?


Answer



What is not as apparent as it could be in the solution on the page you refer to is that the probability distributions in this kind of problems are often a mess while the expectations and variances can be much nicer. (By the way, you might be confusing probability distributions on the one hand, and expectations and variances on the other hand.)



To see this in the present case, note that the number of elements sampled at least $k$ times is
$$
N=\sum\limits_{i=1}^n\mathbf 1_{A_i},
$$
where $A_i$ is the event that element $i$ is sampled at least $k$ times.
Now, $p_x=\mathbb P(A_i)$ does not depend on $i$ and is the probability that one gets at least $k$ heads in a game of $x$ heads-and-tails such that probability of heads is $u=\frac1n$. Thus,

$$
p_x=\sum\limits_{s=k}^x{x\choose s}u^s(1-u)^{x-s}.
$$
This yields
$$
\mathbb E(N)=np_x.
$$
Likewise, $r_x=\mathbb P(A_i\cap A_j)$ does not depend on $i\ne j$ and is the probability that one gets at least $k$ times result A and $k$ times result B in $x$ games where each game yields the results A, B and C with respective probabilities $u$, $u$ and $1-2u$. Thus, $r_x$ can be written down using multinomial distributions instead of the binomial distributions involved in $p_x$.



This yields

$\mathbb E(N^2)=np_x+n(n-1)r_x$, hence $\mathrm{var}(N)=\mathbb E(N^2)-(np_x)^2$ is
$$
\mathrm{var}(N)=np_x+n(n-1)r_x-n^2p_x^2=np_x(1-p_x)+n(n-1)(r_x-p_x^2).
$$


No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...