Saturday, 20 December 2014

What is the probability distribution for the number of N-sided die rolls needed to get M unique results?

Suppose you have a fair N-sided die. You decide to roll it until M unique values have been produced (i.e. you re-roll all previously rolled values). How many times will you roll the die? (Given 2<=M<=N.)



I know that for the special case of M=2 it's simply a matter of how many times you have to re-roll your attempt for the second value, so the distribution is: PN2(X=u)=(1N)u2(11N)=N1Nu1




And that for any M the probability of the lowest possible outcome X=M (i.e. no re-rolls): PNM(X=M)=M1i=0(1iN)=1NMM1i=0(Ni)=N!NM(NM)!



The final clue I've got is that the probability distributions for subsequent values of M can be defined using the probability distribution of the previous, like so:



PNM(X=u)=uM+1i=1(PNM1(X=ui)(M1N)i1(1M1N))



With that I can determine the probability distribution for any value of M I want, for instance M=3:



PN3(X=u)=u3+1i=1(PN2(X=ui)(31N)i1(131N))




=u2i=1((N1Nui1)(2N)i1(12N))



=u2i=1((N1Nu1)Ni(N2)(2N)i(N2N))



=(N1)(N2)2Nu1u2i=1(2i)



=(N1)(N2)Nu1u1i=0(2i)



=((N1)(N2)Nu1)(12u12)
=(N1)(N2)(2u1)Nu1




However, I have no idea how to turn this into a generic formula that will allow me to calculate the probability for any N, M, and u without going through the process of figuring out the PMF of every value of M leading up to the one I want.

No comments:

Post a Comment

real analysis - How to find limhrightarrow0fracsin(ha)h

How to find lim without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...