Sunday, 26 August 2018

probability - Making 400k random choices from 400k samples seems to always end up with 63% distinct choices, why?



I have a very simple simulation program, the sequence is:




  • Create an array of 400k elements


  • Use a PRNG to pick an index, and mark the element (repeat 400k times)

  • Count number of marked elements.



An element may be picked more than once, but counted as only one "marked element".



The PRNG is properly seeded. No matter how many times I run the simulation, I always end up getting around 63% (252k) marked elements.



What is the math behind this? Or was there a fault in my PRNG?


Answer




No, your program is correct. The probability that a particular element is not marked
at all, is about $\frac{1}{e}$. This comes from the poisson-distribution which is
a very well approximation for large samples (400k is very large). So $1-\frac{1}{e}$
is the fraction of marked elements.


No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...