I am interested in yet another variation of the coupon collector problem. Specifically, we have an urn with n_i "distinct" coupons of type i, where i=1,…,K (for example, a type can be thought of as a color, but within that color the coupons are numbered). The goal is to sample with replacement until the collection contains at least m_i distinct coupons of type i for each i=1,…,K (any m_i of the n_i different coupons within that type would work). Note that this is different from the variation in:
N. Shank and H. Yang, "Coupon Collector Problem for Non-Uniform Coupons and Random Quotas," Electronic Journal of Combinatorics 20(2), 2013,
since the coupons here are numbered and the analysis needs to ensure the m_i coupons of type i are distinct. So the question is:
For a given number of samples n, what is the probability that we achieve the requirement above (i.e., obtain at least m_i distinct coupons of type i for every i=1,…,K)? Upper and lower bounds would also be useful. Also, what is the expected number of samples E[N] needed to achieve this requirement?
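For concreteness, the target probability can be estimated by direct simulation. Here is a minimal Monte Carlo sketch in Python; the counts (3,2) and quotas (2,1) are made-up values purely for illustration:

```python
import random

def meets_quota(n_draws, counts, quotas, trials=100_000):
    """Estimate P(n_draws samples with replacement yield at least
    quotas[i] distinct coupons of type i, for every type i)."""
    # The urn: coupon (i, j) is the j-th numbered coupon of type i.
    urn = [(i, j) for i, c in enumerate(counts) for j in range(c)]
    hits = 0
    for _ in range(trials):
        seen = [set() for _ in counts]
        for _ in range(n_draws):
            i, j = random.choice(urn)
            seen[i].add(j)
        if all(len(s) >= q for s, q in zip(seen, quotas)):
            hits += 1
    return hits / trials

# K = 2 types with n_1 = 3, n_2 = 2 coupons; quotas m_1 = 2, m_2 = 1
print(meets_quota(6, counts=[3, 2], quotas=[2, 1]))
```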
Answer
I use the same notation as the cited article (so that the results can be directly compared), so
consider an urn which, for i\in\{1,\ldots,n\}, contains x_i\geq 1 distinguishable coupons of type i, altogether
X_n:=x_1+\ldots+x_n coupons.
Coupons are drawn with replacement from this urn until (for each i) m_i (where 1\leq m_i\leq x_i) mutually different coupons
of type i have been drawn.
Let m:=(m1,…,mn) and T(m) the random time at which this happens, and let p_{x_i}(m_i,t):=\sum_{k=0}^{m_i-1}{x_i \choose k}\,t^k.
Let Y_1(k),\ldots,Y_n(k) be the random variables Y_i(k):= ``number of different coupons of type i that have been drawn
at time k''.
I use generating functions and start from the following basic
Proposition
The generating function of (the joint distribution of)
Y_1(k),\ldots,Y_n(k) is given by:
\begin{equation*} \mathbb{E}\, t_1^{Y_1(k)}\ldots t_n^{Y_n(k)}=\frac{k!} {X_n^k}[t^k] (1+(e^t-1)\,t_1)^{x_1}\cdot\ldots\cdot(1+(e^t-1)\,t_n)^{x_n} \end{equation*}
Proof Let j_1+\ldots+j_n\leq k.
Clearly
\mathbb{P}(Y_1(k)=j_1,\ldots,Y_n(k)=j_n)=\frac{1}{X_n^k}\cdot {x_1 \choose j_1}\cdots {x_n \choose j_n}\cdot Sur(k,j_1+\ldots+j_n)
where Sur(k,r) denotes the number of surjective mappings from \{1,\ldots,k\} onto \{1,\ldots,r\}. It is known that
Sur(k,r)=k!\,[t^k]\,(e^t-1)^r (since such a surjective mapping corresponds uniquely to an ordered partition of
\{1,\ldots,k\} into r non-empty subsets, and e^t-1 is the exponential generating function for non-empty sets). The assertion
about the g.f. follows. End of proof.
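The Proposition can be checked mechanically on a toy instance. The following sketch (using sympy; the instance x=(2,2), k=3 is arbitrary) extracts [t^k] from the generating function and compares the resulting joint distribution with a brute-force enumeration of all X_n^k equally likely draw sequences:

```python
from itertools import product
from collections import Counter
from sympy import symbols, exp, series, factorial, expand, Rational

t, t1, t2 = symbols('t t1 t2')
x1, x2, k = 2, 2, 3          # toy instance: n = 2 types, two coupons each
Xn = x1 + x2

# k!/Xn^k * [t^k] of the bivariate generating function from the Proposition
g = (1 + (exp(t) - 1)*t1)**x1 * (1 + (exp(t) - 1)*t2)**x2
coeff_tk = series(g, t, 0, k + 1).removeO().coeff(t, k)
gf = expand(factorial(k) * coeff_tk / Rational(Xn)**k)

# Brute force over all Xn^k equally likely draw sequences
coupons = [(0, a) for a in range(x1)] + [(1, a) for a in range(x2)]
dist = Counter()
for seq in product(coupons, repeat=k):
    y1 = len({a for (i, a) in seq if i == 0})
    y2 = len({a for (i, a) in seq if i == 1})
    dist[(y1, y2)] += 1

for (j1, j2), c in sorted(dist.items()):
    assert gf.coeff(t1, j1).coeff(t2, j2) == Rational(c, Xn**k)
print("generating function matches brute force")
```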
(I) The distribution of T(m)
Since \{\,T(m)\leq k\}=\{\,Y_1(k)\geq m_1,\ldots,Y_n(k)\geq m_n\,\} the above gives
\begin{equation*} \mathbb{P}(T(m)\leq k) =\frac{k!} {X_n^k}[t^k] \left(e^{tx_1} -p_{x_1}(m_1,e^t-1)\right)\cdot\ldots\cdot\left(e^{tx_n} -p_{x_n}(m_n,e^t-1)\right) \end{equation*}
From g.f. to probability:
we have to compute \begin{align*}\mathbb{P}(\,Y_1(k)\geq m_1,\ldots,Y_n(k)\geq m_n\,) &=\sum_{j_1\geq m_1,\ldots, j_n\geq m_n} \mathbb{P}(Y_1(k)=j_1,\ldots,Y_n(k)=j_n)\\ &=\sum_{j_1\geq m_1,\ldots, j_n\geq m_n} [t_1^{j_1}\ldots t_n^{j_n}]\mathbb{E}\, t_1^{Y_1(k)}\ldots t_n^{Y_n(k)} \end{align*}
Since the function after [t^k] is a product of factors each containing only one of the t_i variables we can treat these factors individually.
E.g. to account for \mathbb{P}(Y_1(k)\geq m_1,...) we have to sum up all the coefficients [t_1^j] with j\geq m_1 of the first factor.
We may equivalently sum up all coefficients (i.e. put t_1=1) and subtract the sum of the first m_1 coefficients. Doing that
"inside", while leaving the t^k extraction "outside" (coefficients may be extracted in arbitrary order), we arrive at the factor e^{tx_1}-p_{x_1}(m_1,e^t-1), and likewise for the other factors.
(II) The expectation of T(m)
Finally, using \mathbb{E}\, T(m)=\sum_{k\geq 0} \mathbb{P}(T(m)>k) and writing \frac{k!}{X_n^k} =X_n\,\int_0^\infty s^k e^{-X_ns}\,ds leads to
\begin{equation*} \mathbb{E}(T(m))=X_n\int_0^\infty \Big(1-\prod_{i=1}^n \left(1-p_{x_i}(m_i,e^s-1)\,e^{-x_i s}\right)\Big)\,ds \end{equation*}
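This integral is easy to evaluate numerically; a sketch with scipy follows. The algebraic rewriting of p_{x_i}(m_i,e^s-1)\,e^{-x_i s} inside is only to avoid overflow of e^s for large s, and the last line is a sanity check: for x_i=m_i=1 the formula must reduce to the classical coupon collector mean n H_n.

```python
from math import comb, exp
from scipy.integrate import quad

def expected_T(xs, ms):
    """E[T(m)] = X_n * int_0^oo (1 - prod_i (1 - p_{x_i}(m_i, e^s-1) e^{-x_i s})) ds."""
    Xn = sum(xs)
    def tail_factor(x_i, m_i, s):
        # p_{x_i}(m_i, e^s - 1) * e^{-x_i s}, rewritten term by term as
        # sum_{k < m_i} C(x_i, k) (1 - e^{-s})^k e^{-(x_i - k) s},
        # so no e^{+s} is ever formed (numerically stable for large s)
        return sum(comb(x_i, k) * (1.0 - exp(-s))**k * exp(-(x_i - k) * s)
                   for k in range(m_i))
    def integrand(s):
        prod = 1.0
        for x_i, m_i in zip(xs, ms):
            prod *= 1.0 - tail_factor(x_i, m_i, s)
        return 1.0 - prod
    val, _ = quad(integrand, 0.0, float('inf'))
    return Xn * val

print(expected_T([3, 2], [2, 1]))      # the toy instance used above
print(expected_T([1] * 5, [1] * 5))    # classical case: 5 * H_5 = 11.4166...
```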