It is the problem 1.2.3 of Karlin's book An introduction to stochastic modeling:
A population having N distinct elements is sampled with replacement. Because of repetitions a sample of size r can have fewer than r different elements. Let Sr be the sample size necessary to have r different elements. Show that
E[Sr]=N(1N+1N−1+...+1N−r+1)
But I have a different solution. I did
Pr
\Pr[S_r=r+1]=\frac{(N)_r}{N^r}\frac rN
And in general
\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\left(\frac rN\right)^k
Now
\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]
Doing some algebra I get the solution
\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\frac{(N-r+1)Nr}{(N-r)^2}
This expression dont seem equivalent to the textbook solution. Can someone show me my mistake or the way to get the solution of the book? Thank you in advance.
EDITION: I see a weird mistake. I assumed that for S_r>r the others values may repeat... but this is not true, they can be different too. Anyway this dont seem going in the good direction cause it complicate a lot the expression. Anyway I will see what I get fixing that.
Answer
Re-thinking the problem and seeing the "population" and "sample" as a dice of N sides and S_r throws respectively I get the correct answer.
Then the reformulated problem is: what is the expected number of throws to get r different results for a dice of N sides?
This is some kind of Markov chain that I solved some time ago. We thrown a die a undetermined number of times (the quantity doesnt matter) and we have seen d different faces of the die, so for the next throw the probability to see a new face (the probability to change from state d to state d+1) is p=(N-d)/N, and the probability that this doesnt happen is q=1-p.
Hence the expected value of the number of throws to see a new face of the die (i.e. to change from state d to state d+1) is
\Bbb E[d\to d+1]=\sum_{k\ge 1} kq^{k-1}p=p\sum_{k\ge 1}kq^{k-1}=\\=p\frac{\rm d}{{\rm d}k}\sum_{k\ge 0}q^k=p\frac{\rm d}{{\rm d}k}\frac{1}{1-q}=\frac{p}{(1-q)^2}=\frac1p=\frac{N}{N-d}
Then the expected number of throws, i.e. \Bbb E[S_r], will be the sum of the expected number of throws for each consecutive change of state
\Bbb E[S_r]=\sum_{d=0}^{r-1}\frac{N}{N-d}=N\left(\frac1N+...+\frac{1}{N-r+1}\right)=N(H_N-H_{N-r})
being H_k the k-th harmonic number.
No comments:
Post a Comment