Sunday 12 January 2014

Coupon Collector Problem (Almost)



Here's a variation of the coupon collector problem from Ross that I'm stuck on:



There are $n$ types of coupons. Each newly obtained coupon is, independently, type $i$ with probability $p_i, i = 1,...,n$. Find the expected number and the variance of the number of distinct types obtained in a collection of k coupons.



I thought maybe I would start with an indicator:




Let $X$ be the total number of distinct coupons and $X_i$ the probability that $i^{th}$ is a new coupon. Then $\sum_{i=1}^{n} X_i = X$. Each indicator is 1 w.p. $x/k$. I don't think this is correct though.


Answer



The solution for the expectation has already been given in the comments. The calculation is slightly simplified by considering the number $Y=n-X$ of coupon types not collected, with $\mathbb E[X]=N-\mathbb E[Y]$ and $\operatorname{Var}(X)=\operatorname{Var}(Y)$. Let $Y_i$ denote the indicator variable of the event that coupon type $i$ is not obtained. Its expectation is the probability $(1-p_i)^k$ of not obtaining a coupon of type $i$, so the expected number of coupons not obtained is



$$\mathbb E[Y]=\mathbb E\left[\sum_iY_i\right]=\sum_i(1-p_i)^k\;.$$



The variance is calculated analgously by expressing it in terms of expectations:



\begin{align}

\operatorname{Var(Y)}&=\mathbb E\left[\left(\sum_iY_i\right)^2\right]-\mathbb E\left[\sum_iY_i\right]^2\\
&=\sum_{i,j}\mathbb E[Y_iY_j]-\mathbb E[Y_i]\mathbb E[Y_j]\\
&=\sum_i(1-p_i)^k\left(1-(1-p_i)^k\right)+\sum_{i\ne j}\left((1-p_i-p_j)^k-(1-p_i)^k(1-p_j)^k\right)\;.
\end{align}


No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...