Thursday 9 April 2015

A different solution to a probability problem from a textbook




It is Problem 1.2.3 of Karlin's book An Introduction to Stochastic Modeling:




A population having $N$ distinct elements is sampled with replacement. Because of repetitions, a sample of size $r$ can have fewer than $r$ different elements. Let $S_r$ be the sample size necessary to have $r$ different elements. Show that



$$\Bbb E[S_r]=N\left(\frac1N+\frac{1}{N-1}+...+\frac{1}{N-r+1}\right)$$




But I tried a different approach. I computed




$$\Pr[S_r=r]=\frac NN\frac{N-1}{N}\dots\frac{N-r+1}{N}=\frac{(N)_r}{N^r}$$



$$\Pr[S_r=r+1]=\frac{(N)_r}{N^r}\frac rN$$



And in general



$$\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\left(\frac rN\right)^k$$



Now




$$\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]$$



Doing some algebra, I get



$$\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\frac{(N-r+1)Nr}{(N-r)^2}$$



This expression doesn't seem equivalent to the textbook solution. Can someone show me my mistake, or the way to get the solution in the book? Thank you in advance.
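To see that the two expressions really disagree, one can evaluate both exactly for small values, say $N=3$ and $r=2$ (the true value is easy to check by hand: one throw, then a geometric wait with success probability $2/3$, giving $1+3/2=5/2$). A small sketch with exact rational arithmetic:

```python
from fractions import Fraction

N, r = 3, 2  # illustrative small values

# Falling factorial (N)_r = N (N-1) ... (N-r+1)
falling = 1
for i in range(r):
    falling *= N - i

# Textbook answer: N (1/N + 1/(N-1) + ... + 1/(N-r+1))
textbook = N * sum(Fraction(1, N - d) for d in range(r))

# My expression: (N)_r / N^r * (N-r+1) N r / (N-r)^2
mine = Fraction(falling, N**r) * Fraction((N - r + 1) * N * r, (N - r) ** 2)

print(textbook, mine)  # 5/2 versus 8 -- the expressions are not equivalent
```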







EDIT: I see a weird mistake. I assumed that when $S_r>r$, only the draws after the first $r$ can be repetitions... but this is not true; the repetitions can occur anywhere among the draws. Anyway, this doesn't seem to be going in a good direction, because fixing it complicates the expression a lot. Still, I will see what I get after fixing that.
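In fact, a quick sanity check confirms the assumed probabilities cannot form a valid distribution: summing the geometric series gives $\sum_{k\ge0}\frac{(N)_r}{N^r}\left(\frac rN\right)^k=\frac{(N)_r}{N^r}\cdot\frac{N}{N-r}$, which is not $1$ in general. A sketch with the illustrative values $N=3$, $r=2$:

```python
from fractions import Fraction

N, r = 3, 2  # illustrative small values

# Falling factorial (N)_r = N (N-1) ... (N-r+1)
falling = 1
for i in range(r):
    falling *= N - i

# Sum of the assumed pmf over all k >= 0: a geometric series
# (N)_r / N^r * 1 / (1 - r/N)
total = Fraction(falling, N**r) / (1 - Fraction(r, N))

print(total)  # 2, not 1 -- so the assumed pmf cannot be right
```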


Answer



Re-thinking the problem, and viewing the "population" and the "sample" as a die with $N$ sides and $S_r$ throws respectively, I get the correct answer.



The reformulated problem is then: what is the expected number of throws needed to get $r$ different results with a die of $N$ sides?



This is a kind of Markov chain that I solved some time ago. We throw a die an undetermined number of times (the exact count doesn't matter) and we have seen $d$ different faces of the die, so on the next throw the probability of seeing a new face (the probability of moving from state $d$ to state $d+1$) is $p=(N-d)/N$, and the probability that this doesn't happen is $q=1-p$.



Hence the expected number of throws needed to see a new face of the die (i.e. to change from state $d$ to state $d+1$) is the mean of a geometric distribution with success probability $p$:




$$\Bbb E[d\to d+1]=\sum_{k\ge 1} kq^{k-1}p=p\sum_{k\ge 1}kq^{k-1}=\\=p\frac{\rm d}{{\rm d}q}\sum_{k\ge 0}q^k=p\frac{\rm d}{{\rm d}q}\frac{1}{1-q}=\frac{p}{(1-q)^2}=\frac1p=\frac{N}{N-d}$$



Then the expected number of throws, i.e. $\Bbb E[S_r]$, is the sum of the expected numbers of throws for each consecutive change of state:



$$\Bbb E[S_r]=\sum_{d=0}^{r-1}\frac{N}{N-d}=N\left(\frac1N+...+\frac{1}{N-r+1}\right)=N(H_N-H_{N-r})$$



where $H_k$ is the $k$-th harmonic number.
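The formula is easy to confirm by simulating the die directly. A minimal Monte Carlo sketch (the function names and the choice $N=6$, $r=4$ are illustrative):

```python
import random

def throws_to_see_r_faces(N, r, rng=random):
    """Throw an N-sided die until r distinct faces have appeared;
    return the number of throws used."""
    seen = set()
    throws = 0
    while len(seen) < r:
        throws += 1
        seen.add(rng.randrange(N))
    return throws

def expected_Sr(N, r):
    # N (1/N + 1/(N-1) + ... + 1/(N-r+1)) = N (H_N - H_{N-r})
    return N * sum(1 / (N - d) for d in range(r))

random.seed(0)
N, r, trials = 6, 4, 100_000
estimate = sum(throws_to_see_r_faces(N, r) for _ in range(trials)) / trials

# Both values should be close to 6(1/6 + 1/5 + 1/4 + 1/3) = 5.7
print(estimate, expected_Sr(N, r))
```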

