Thursday 9 April 2015

A different solution to a probability problem from a textbook




It is Problem 1.2.3 of Karlin's book An Introduction to Stochastic Modeling:




A population having $N$ distinct elements is sampled with replacement. Because of repetitions, a sample of size $r$ can have fewer than $r$ different elements. Let $S_r$ be the sample size necessary to have $r$ different elements. Show that



$$\Bbb E[S_r]=N\left(\frac1N+\frac{1}{N-1}+...+\frac{1}{N-r+1}\right)$$




But I tried a different approach. I computed




$$\Pr[S_r=r]=\frac NN\frac{N-1}{N}\dots\frac{N-r+1}{N}=\frac{(N)_r}{N^r}$$



$$\Pr[S_r=r+1]=\frac{(N)_r}{N^r}\frac rN$$



And in general



$$\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\left(\frac rN\right)^k$$



Now




$$\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]$$



Doing some algebra, I get



$$\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\frac{(N-r+1)Nr}{(N-r)^2}$$



This expression doesn't seem equivalent to the textbook solution. Can someone show me my mistake, or the way to get the solution in the book? Thank you in advance.
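To see that the two expressions really disagree, one can evaluate both exactly for small values, say $N=3$ and $r=2$ (the true value is easy to check by hand: one throw, then a geometric wait with success probability $2/3$, giving $1+3/2=5/2$). A small sketch with exact rational arithmetic:

```python
from fractions import Fraction

N, r = 3, 2  # illustrative small values

# Falling factorial (N)_r = N (N-1) ... (N-r+1)
falling = 1
for i in range(r):
    falling *= N - i

# Textbook answer: N (1/N + 1/(N-1) + ... + 1/(N-r+1))
textbook = N * sum(Fraction(1, N - d) for d in range(r))

# My expression: (N)_r / N^r * (N-r+1) N r / (N-r)^2
mine = Fraction(falling, N**r) * Fraction((N - r + 1) * N * r, (N - r) ** 2)

print(textbook, mine)  # 5/2 versus 8 -- the expressions are not equivalent
```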







EDIT: I see a weird mistake. I assumed that when $S_r>r$, only the draws after the first $r$ can be repetitions... but this is not true; the repetitions can occur anywhere among the draws. Anyway, this doesn't seem to be going in a good direction, because fixing it complicates the expression a lot. Still, I will see what I get after fixing that.
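In fact, a quick sanity check confirms the assumed probabilities cannot form a valid distribution: summing the geometric series gives $\sum_{k\ge0}\frac{(N)_r}{N^r}\left(\frac rN\right)^k=\frac{(N)_r}{N^r}\cdot\frac{N}{N-r}$, which is not $1$ in general. A sketch with the illustrative values $N=3$, $r=2$:

```python
from fractions import Fraction

N, r = 3, 2  # illustrative small values

# Falling factorial (N)_r = N (N-1) ... (N-r+1)
falling = 1
for i in range(r):
    falling *= N - i

# Sum of the assumed pmf over all k >= 0: a geometric series
# (N)_r / N^r * 1 / (1 - r/N)
total = Fraction(falling, N**r) / (1 - Fraction(r, N))

print(total)  # 2, not 1 -- so the assumed pmf cannot be right
```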


Answer



Re-thinking the problem, and viewing the "population" and the "sample" as a die with $N$ sides and $S_r$ throws respectively, I get the correct answer.



The reformulated problem is then: what is the expected number of throws needed to get $r$ different results with a die of $N$ sides?



This is a kind of Markov chain that I solved some time ago. We throw a die an undetermined number of times (the exact count doesn't matter) and we have seen $d$ different faces of the die, so on the next throw the probability of seeing a new face (the probability of moving from state $d$ to state $d+1$) is $p=(N-d)/N$, and the probability that this doesn't happen is $q=1-p$.



Hence the expected number of throws needed to see a new face of the die (i.e. to change from state $d$ to state $d+1$) is the mean of a geometric distribution with success probability $p$:




$$\Bbb E[d\to d+1]=\sum_{k\ge 1} kq^{k-1}p=p\sum_{k\ge 1}kq^{k-1}=\\=p\frac{\rm d}{{\rm d}q}\sum_{k\ge 0}q^k=p\frac{\rm d}{{\rm d}q}\frac{1}{1-q}=\frac{p}{(1-q)^2}=\frac1p=\frac{N}{N-d}$$



Then the expected number of throws, i.e. $\Bbb E[S_r]$, is the sum of the expected numbers of throws for each consecutive change of state:



$$\Bbb E[S_r]=\sum_{d=0}^{r-1}\frac{N}{N-d}=N\left(\frac1N+...+\frac{1}{N-r+1}\right)=N(H_N-H_{N-r})$$



where $H_k$ is the $k$-th harmonic number.
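The formula is easy to confirm by simulating the die directly. A minimal Monte Carlo sketch (the function names and the choice $N=6$, $r=4$ are illustrative):

```python
import random

def throws_to_see_r_faces(N, r, rng=random):
    """Throw an N-sided die until r distinct faces have appeared;
    return the number of throws used."""
    seen = set()
    throws = 0
    while len(seen) < r:
        throws += 1
        seen.add(rng.randrange(N))
    return throws

def expected_Sr(N, r):
    # N (1/N + 1/(N-1) + ... + 1/(N-r+1)) = N (H_N - H_{N-r})
    return N * sum(1 / (N - d) for d in range(r))

random.seed(0)
N, r, trials = 6, 4, 100_000
estimate = sum(throws_to_see_r_faces(N, r) for _ in range(trials)) / trials

# Both values should be close to 6(1/6 + 1/5 + 1/4 + 1/3) = 5.7
print(estimate, expected_Sr(N, r))
```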

