What is the probability distribution for the number of N-sided die rolls needed to get M unique results?

Saturday, 20 December 2014

What is the probability distribution for the number of N-sided die rolls needed to get M unique results?

Suppose you have a fair $N$ -sided die. You decide to roll it until $M$ unique values have been produced (i.e. you re-roll all previously rolled values). How many times will you roll the die? (Given $2 <= M <= N$ .)

I know that for the special case of $M=2$ it's simply a matter of how many times you have to re-roll your attempt for the second value, so the distribution is: $P^N_2(X=u) = \left(\frac{1}{N}\right)^{u-2}\left(1-\frac{1}{N}\right) = \frac{N-1}{N^{u-1}}$

And that for any $M$ the probability of the lowest possible outcome $X=M$ (i.e. no re-rolls): $P^N_M(X=M) = \prod_{i=0}^{M-1}\left(1-\frac{i}{N}\right) = \frac{1}{N^M}\prod_{i=0}^{M-1}\left(N-i\right) = \frac{N!}{N^M(N-M)!}$

The final clue I've got is that the probability distributions for subsequent values of $M$ can be defined using the probability distribution of the previous, like so:

$P^N_{M}(X=u) = \sum_{i=1}^{u-M+1}\left(P^N_{M-1}(X=u-i)\left(\frac{M-1}{N}\right)^{i-1}\left(1-\frac{M-1}{N}\right)\right)$

With that I can determine the probability distribution for any value of $M$ I want, for instance $M=3$ :

$P^N_3(X=u) = \sum_{i=1}^{u-3+1}\left(P^N_2(X=u-i)\left(\frac{3-1}{N}\right)^{i-1}\left(1-\frac{3-1}{N}\right)\right)$

$= \sum_{i=1}^{u-2}\left(\left(\frac{N-1}{N^{u-i-1}}\right)\left(\frac{2}{N}\right)^{i-1}\left(1-\frac{2}{N}\right)\right)$

$= \sum_{i=1}^{u-2}\left(\left(\frac{N-1}{N^{u-1}}\right)N^i\left(\frac{N}{2}\right)\left(\frac{2}{N}\right)^i\left(\frac{N-2}{N}\right)\right)$

$= \frac{(N-1)(N-2)}{2 \cdot N^{u-1}}\sum_{i=1}^{u-2}\left(2^i\right)$

$= \frac{(N-1)(N-2)}{N^{u-1}}\sum_{i=0}^{u-1}\left(2^i\right)$

$= \left(\frac{(N-1)(N-2)}{N^{u-1}}\right)\left(\frac{1-2^u}{1-2}\right)$
$= \frac{(N-1)(N-2)(2^u-1)}{N^{u-1}}$

However, I have no idea how to turn this into a generic formula that will allow me to calculate the probability for any $N$ , $M$ , and $u$ without going through the process of figuring out the PMF of every value of $M$ leading up to the one I want.

Blog

Saturday, 20 December 2014