Thursday, 14 August 2014

probability - Variation of coupon collector problem with distinct coupons in each type



I am interested in yet another variation of the coupon collector problem. Specifically, we have an urn with n_i "distinct" coupons of type i, where i=1,\ldots,K (for example, a type can be thought of as a color, but within that color the coupons are numbered). The goal is to sample with replacement until the collection contains at least m_i distinct coupons of type i for each i=1,\ldots,K (any m_i out of the n_i different coupons within that type would work). Note that this is different from the variation in:



N. Shank and H. Yang, "Coupon Collector Problem for Non-Uniform Coupons and Random Quotas," Electronic J. of Combinatorics 20(2) 2013




since the coupons here are numbered and the analysis needs to ensure the m_i coupons of type i are distinct. So the question is:



For a given number of samples n, what is the probability that we achieve the requirement above (i.e., obtain m_i distinct coupons of each type i=1,\ldots,K)? Upper and lower bounds would also be useful. Also, what is the expected number of samples \mathbb{E}[N] needed to achieve this requirement?
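Before any analysis, the quantities asked about can be estimated by Monte Carlo simulation. The sketch below is my own illustration (the helper name `sample_T` and the parameters are not from any reference): it draws with replacement from the flattened urn and counts draws until every type's quota of distinct coupons is met.

```python
import random

def sample_T(x, m, rng=random.Random(0)):
    """Draw uniformly with replacement from an urn holding x[i] distinct
    coupons of type i; return the number of draws needed until at least
    m[i] distinct type-i coupons have been seen for every i."""
    # Flatten the urn: coupon (i, j) is the j-th numbered coupon of type i.
    urn = [(i, j) for i, xi in enumerate(x) for j in range(xi)]
    seen = [set() for _ in x]
    draws = 0
    while any(len(seen[i]) < m[i] for i in range(len(x))):
        i, j = rng.choice(urn)
        seen[i].add(j)
        draws += 1
    return draws

# Illustrative parameters: two types with 3 and 4 coupons, quotas 2 and 3.
est = sum(sample_T([3, 4], [2, 3]) for _ in range(20000)) / 20000
```

Averaging `sample_T` over many runs gives an empirical estimate of \mathbb{E}[N] to compare against any closed-form answer.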


Answer



I use the same notation as the cited article (so that the results can be directly compared), so consider an urn which for i\in\{1,\ldots,n\} contains x_i\geq 1 distinguishable coupons of type i, altogether
X_n:=x_1+\ldots+x_n coupons.



Coupons are drawn with replacement from this urn until, for each i, m_i (where 1\leq m_i\leq x_i) mutually different coupons of type i have been drawn.



Let m:=(m_1,\ldots,m_n), let T(m) be the random time at which this happens, and let p_{x_i}(m_i,t):=\sum_{k=0}^{m_i-1}{x_i \choose k}\,t^k.



Let Y_1(k),\ldots,Y_n(k) be the random variables Y_i(k):= number of different coupons of type i that have been drawn at "time" k.



I use generating functions and start from the following basic



Proposition

The generating function of (the joint distribution of)
Y_1(k),\ldots,Y_n(k) is given by:

\begin{equation*} \mathbb{E}\, t_1^{Y_1(k)}\ldots t_n^{Y_n(k)}=\frac{k!} {X_n^k}[t^k] (1+(e^t-1)\,t_1)^{x_1}\cdot\ldots\cdot(1+(e^t-1)\,t_n)^{x_n} \end{equation*}
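For small instances the proposition can be checked mechanically, e.g. with sympy: extract the coefficient of t^k t_1^{j_1}\cdots t_n^{j_n} from the stated generating function and compare it with brute-force enumeration of all equally likely draw sequences. This is a sketch; the function names and the test case x=(2,1), k=2 are my own choices.

```python
import itertools
from fractions import Fraction
import sympy as sp

def pmf_via_gf(x, k, js):
    """P(Y_1(k)=j_1, ..., Y_n(k)=j_n) read off the proposition's g.f."""
    t = sp.symbols('t')
    ts = sp.symbols(f't1:{len(x) + 1}')  # marker variables t_1, ..., t_n
    gf = sp.prod((1 + (sp.exp(t) - 1) * ti)**xi for ti, xi in zip(ts, x))
    # [t^k] of the truncated series, then [t_1^{j_1}] ... [t_n^{j_n}].
    c = sp.series(sp.expand(gf), t, 0, k + 1).removeO().coeff(t, k)
    for ti, ji in zip(ts, js):
        c = c.coeff(ti, ji)
    return sp.factorial(k) * c / sp.Integer(sum(x))**k

def pmf_via_enumeration(x, k, js):
    """The same probability by listing all X_n^k draw sequences."""
    urn = [(i, j) for i, xi in enumerate(x) for j in range(xi)]
    hits = sum(
        1 for seq in itertools.product(urn, repeat=k)
        if all(len({c for (i, c) in seq if i == ty}) == js[ty]
               for ty in range(len(x))))
    return Fraction(hits, len(urn)**k)
```

For x=(2,1), k=2, both routes give \mathbb{P}(Y_1(2)=1, Y_2(2)=1)=4/9.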



Proof Let j_1+\ldots +j_n\leq k.
Clearly
\mathbb{P}(Y_1(k)=j_1,\ldots,Y_n(k)=j_n)=\frac{1}{X_n^k}\cdot {x_1 \choose j_1}\cdots {x_n \choose j_n}\cdot Sur(k,j_1+\ldots+j_n)
where Sur(k,r) denotes the number of surjective mappings from \{1,\ldots,k\} onto \{1,\ldots,r\}. It is known that

Sur(k,r)=k!\,[t^k]\,(e^t-1)^r (since such a surjective mapping corresponds uniquely to an ordered partition of
\{1,\ldots,k\} into r non-empty subsets, and e^t-1 is the exponential generating function for non-empty sets). The assertion
about the g.f. follows. End of proof.
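The surjection count used in the proof can be sanity-checked numerically: reading k!\,[t^k]\,(e^t-1)^r off a truncated series must agree with the standard inclusion-exclusion formula Sur(k,r)=\sum_{j=0}^{r}(-1)^j{r\choose j}(r-j)^k. A sketch (the function names are mine):

```python
from math import comb, factorial
import sympy as sp

def sur_inclusion_exclusion(k, r):
    """Surjections {1..k} -> {1..r} counted by inclusion-exclusion."""
    return sum((-1)**j * comb(r, j) * (r - j)**k for j in range(r + 1))

def sur_egf(k, r):
    """The same count read off as k! [t^k] (e^t - 1)^r."""
    t = sp.symbols('t')
    coeff = sp.series((sp.exp(t) - 1)**r, t, 0, k + 1).removeO().coeff(t, k)
    return int(factorial(k) * coeff)

# e.g. Sur(4, 2) = 2^4 - 2 = 14
```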



(I) The distribution of T(m)



Since \{\,T(m)\leq k\}=\{\,Y_1(k)\geq m_1,\ldots,Y_n(k)\geq m_n\,\} the above gives
\begin{equation*} \mathbb{P}(T(m)\leq k) =\frac{k!} {X_n^k}[t^k] \left(e^{tx_1} -p_{x_1}(m_1,e^t-1)\right)\cdot\ldots\cdot\left(e^{tx_n} -p_{x_n}(m_n,e^t-1)\right) \end{equation*}
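This formula lends itself to direct symbolic evaluation in small cases. A sketch with sympy (the function name and test parameters are my own): for a single type with x_1=2 and m_1=2, hand enumeration gives \mathbb{P}(T\leq 2)=1/2, which the coefficient extraction reproduces.

```python
import sympy as sp

def prob_T_le_k(x, m, k):
    """P(T(m) <= k) = (k!/X_n^k) [t^k] prod_i (e^{x_i t} - p_{x_i}(m_i, e^t - 1))."""
    t = sp.symbols('t')
    prod = sp.Integer(1)
    for xi, mi in zip(x, m):
        # p_{x_i}(m_i, e^t - 1) = sum_{j=0}^{m_i - 1} C(x_i, j) (e^t - 1)^j
        p = sum(sp.binomial(xi, j) * (sp.exp(t) - 1)**j for j in range(mi))
        prod *= sp.exp(xi * t) - p
    c = sp.series(sp.expand(prod), t, 0, k + 1).removeO().coeff(t, k)
    return sp.factorial(k) * c / sp.Integer(sum(x))**k
```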




From g.f. to probability:
we have to compute \begin{align*}\mathbb{P}(\,Y_1(k)\geq m_1,\ldots,Y_n(k)\geq m_n\,) &=\sum_{j_1\geq m_1,\ldots, j_n\geq m_n} \mathbb{P}(Y_1(k)=j_1,\ldots,Y_n(k)=j_n)\\ &=\sum_{j_1\geq m_1,\ldots, j_n\geq m_n} [t_1^{j_1}\ldots t_n^{j_n}]\mathbb{E}\, t_1^{Y_1(k)}\ldots t_n^{Y_n(k)} \end{align*}
Since the function after [t^k] is a product of factors, each containing only one of the t_i variables, we can treat these factors individually.
E.g., to account for \mathbb{P}(Y_1(k)\geq m_1,\ldots) we have to sum up all the coefficients [t_1^j] with j\geq m_1 of the first factor.
We may equivalently sum up all coefficients (i.e. put t_1=1) and subtract the sum of the first m_1 coefficients. Doing that
"inside" (and leaving the t^k extraction "outside"; coefficients may be extracted in arbitrary order) we arrive at the factor e^{tx_1}-p_{x_1}(m_1,e^t-1), etc.




(II) The expectation of T(m)



Finally, using \mathbb{E} T(m)=\sum_{k\geq 0} \mathbb{P}(T(m)>k) and writing {k! \over X_n^k} =X_n\,\int_0^\infty s^k e^{-X_ns}\,ds leads to



\mathbb{E}(T(m))=X_n\int_0^\infty \big(1-\prod_{i=1}^n \left(1-p_{x_i}(m_i,e^s-1)\,e^{-x_i s}\right)\big)\,ds
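In small cases sympy evaluates this integral in closed form. As a sanity check (the code and test cases are my own), a single type with x_1=2 and m_1=2 is the classical two-coupon collector, whose expectation 2H_2=3 the integral reproduces.

```python
import sympy as sp

def expected_T(x, m):
    """E[T(m)] = X_n * Integral_0^oo (1 - prod_i (1 - p_{x_i}(m_i, e^s - 1) e^{-x_i s})) ds."""
    s = sp.symbols('s', positive=True)
    prod = sp.Integer(1)
    for xi, mi in zip(x, m):
        # p_{x_i}(m_i, e^s - 1) as in the definition above
        p = sum(sp.binomial(xi, j) * (sp.exp(s) - 1)**j for j in range(mi))
        prod *= 1 - p * sp.exp(-xi * s)
    return sp.integrate(sum(x) * (1 - sp.expand(prod)), (s, 0, sp.oo))
```

The case of two types with one coupon each and quotas (1,1) is the same two-coupon problem in disguise, so it also yields 3.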


