Tuesday, 26 April 2016

probability - Coupon Collector Problem with multiple copies and X amount of coupons already collected

I have a variant of the coupon collector problem where there are $n$ different coupons being collected, which are being drawn with equal probability and with replacement, but 2 copies of each coupon is needed. Now say I have a total of $X$ number of relevant coupons collected of the $2n$ total number of coupons for a full set, and want to know how many extra unneeded coupons you should have. What would be an equation for that?
I have found on Wikipedia an equation for the number of draws needed to get a full set of one of each coupon, and number of draws needed to get a full set of multiple copies of each coupon.



$E(T) = n\log{n} + γn + {\frac{1}{2}} + O({\frac{1}{n}})$, where $\gamma \approx 0.5772156649$



$E(Tm) = n\log{n} + (m-1)n\log\log{n} + O(n)$, as $n→∞$




Where $m$ is the number of copies of each coupon to be collected, so 2 in this case, and $Tm$ is the first time $m$ copies of each coupon are collected.



I also found this from a previous question. Coupon Collector's Problem with X amount of coupons already collected.




The probability $p_i$ of selecting a new coupon thus equals $\frac{n-i+1}{n}$, and the expected number of draws needed to draw a new coupon equals $\frac{1}{p_i} = \frac{n}{n-i+1}$. As such, the expected value for the time needed to draw all $n$ coupons can be calculated as:



$$E[T] = \frac{n}{n} + \frac{n}{n-1} + \ldots + \frac{n}{1} = n \sum_{k=1}^{n}{\frac{1}{k}}$$



In this case, however, we have already drawn $X$ unique coupons. As such, the estimated number of draws needed to find all $n$ coupons equals:




$$E[T] = E[t_{X+1}] + E[t_{X+2}] + \ldots + E[t_n] = n \sum_{k=1}^{n-X} \frac{1}{k}$$




So to find the total number of drawn coupons upon collecting the $X^{th}$ unique coupon, the equation would be
$$n \sum_{k=1}^{n}{\frac{1}{k}}-n \sum_{k=1}^{n-X} \frac{1}{k} $$



The total number of just unneeded duplicate coupons would be
$$n \sum_{k=1}^{n}{\frac{1}{k}}- n \sum_{k=1}^{n-X} \frac{1}{k} - X $$




I'm not sure how to combine these two equations, $E(Tm) = n\log{n} + (m-1)n\log\log{n} + O(n)$, as $n→∞$ and $n \sum_{k=1}^{n}{\frac{1}{k}}-n \sum_{k=1}^{n-X} \frac{1}{k}$, to get the total number of coupons drawn upon having X number of relevant coupons collected towards a full collection of 2 of each coupon. Then with that equation just subtract X (the number of relevant coupons collected) from the total number to get the total unneeded coupons. I will admit I’m not in a math profession or have done any higher level math lately, but if I understand correctly the first equation is more of an approximation of the value, while the second equation is more exact. So I'm not sure how to combine them or if they can be easily combined.



Also I somewhat understand $O(n)$ and its use, I’m not sure how to input it with the rest of the equation into wolframalpha or even excel, the end goal of where I want to us this equation. The maximum number of coupons would be about 100 if that helps. If it would be easier it, the total number of coupons that have only 1 copy and number of coupons that have 2 copies collected could be used as an input instead of the total number of relevant coupons collected.

No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...