Friday 10 April 2015

probability - Reciprocal of a normal variable with non-zero mean and small variance




$X$ is a normal random variable:



$$X \sim \mathcal{N}(\mu,\sigma^2)$$



Then $Y = 1/X$ has the following probability density function (see Wikipedia):



$$f(y) = \frac{1}{y^2\sqrt{2\sigma^2\pi}}\, \exp\left(-\frac{(\frac{1}{y} - \mu)^2}{2 \sigma^2}\right)$$
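As a quick sanity check of this change-of-variables density, here is a minimal NumPy sketch (the bin $[0.9, 1.1]$, the seed, and the sample size are arbitrary choices of mine): the empirical probability that simulated values of $Y = 1/X$ land in a bin should match the density integrated over that bin.

```python
import numpy as np

mu, sigma = 1.0, 0.1
rng = np.random.default_rng(0)
y = 1.0 / rng.normal(mu, sigma, 1_000_000)  # samples of Y = 1/X

def f(y):
    # Density of Y = 1/X, obtained by the change of variables y = 1/x
    return np.exp(-(1.0 / y - mu) ** 2 / (2 * sigma**2)) / (
        y**2 * np.sqrt(2 * np.pi * sigma**2)
    )

lo, hi = 0.9, 1.1
grid = np.linspace(lo, hi, 2001)
vals = f(grid)
# Trapezoidal sum of f over the bin vs. the empirical bin probability
analytic = np.sum((vals[:-1] + vals[1:]) / 2) * (grid[1] - grid[0])
empirical = np.mean((y > lo) & (y < hi))
print(f"analytic={analytic:.4f}  empirical={empirical:.4f}")  # both ~0.68
```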



This distribution of $Y$ has no finite moments, since (see Stack Exchange):




$$\int_{-\infty}^{+\infty}|y|\,f(y)\,dy = \infty$$



An intuitive explanation is that the distribution's tails are too heavy, so the law of large numbers fails: the more samples that are drawn and averaged, the less stable the average becomes. Since the density of $X$ is strictly positive at $X = 0$, values of $X$ arbitrarily close to zero occur with appreciable probability, producing arbitrarily large values of $Y = 1/X$; this is why $Y$ has no finite moments.
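The failure of the law of large numbers is easiest to see in a simulation where $X$ is frequently near zero; here is a minimal sketch (my choice of $\mu = 0$ and seed, purely for illustration) showing that the running average of $Y = 1/X$ never settles down:

```python
import numpy as np

rng = np.random.default_rng(42)

# With X ~ N(0, 1), the density of X is large near 0, so Y = 1/X has very
# heavy tails and no finite mean: the running average keeps jumping around.
x = rng.normal(0.0, 1.0, 1_000_000)
running_mean = np.cumsum(1.0 / x) / np.arange(1, x.size + 1)
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, running_mean[n - 1])  # does not converge as n grows
```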



However, in simulation with a non-zero mean and small variance, $X \sim \mathcal{N}(1, 0.1^2)$, the distribution of $Y$ is seemingly well behaved, with sample moments $E[Y] \approx 1.01$ and $\text{Var}(Y) \approx 0.0109$, even though theoretically these moments are not finite. As the variance of $X$ increases, the sample mean and variance of $Y$ become both larger and increasingly unstable, and as the variance of $X$ continues to increase, the sample moments of $Y$ become increasingly unstable as the number of samples of $Y$ increases. This is contrary to the typical expectation that the variance of a mean estimate should decrease as the number of samples increases.
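A sketch of this experiment (sample sizes, seed, and the larger $\sigma$ values are my choices): for $\sigma = 0.1$ the sample moments look stable, while for larger $\sigma$, where $X$ lands near zero far more often, they jump around even as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
for sigma in (0.1, 0.3, 0.5):
    for n in (10**4, 10**6):
        y = 1.0 / rng.normal(1.0, sigma, n)  # Y = 1/X with X ~ N(1, sigma^2)
        print(f"sigma={sigma:.1f} n={n:>7}: "
              f"mean={y.mean():+10.4f} var={y.var():.4g}")
```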



Could you offer any insight into this strange behaviour? Why did the moments of $Y$ appear stable for $X \sim \mathcal{N}(1, 0.1^2)$?
[Figure: histogram of the simulated samples of $Y$]


Answer




This behaviour is an example of conditional convergence. A perhaps simpler example to start with is $$ \int_{-1}^1 \frac{\mathrm{d}x}{x} \text{.}$$ The integrand goes to infinity as it approaches $0$ from the right and to negative infinity as it approaches $0$ from the left. If we successively approximate this integral as $$ \int_{-1}^{-M} \frac{\mathrm{d}x}{x} + \int_{M}^{1} \frac{\mathrm{d}x}{x} \text{,} $$ we have arranged to carefully cancel the positive and negative contributions: no matter what positive value we pick for $M$, these two integrals are finite and their values cancel exactly. But this is not how we evaluate improper integrals. (An integral that only gives useful answers if you sneak up on its infinities in just the right way is too treacherous to trust. Taking this very special way of sneaking up on the infinities gives the Cauchy principal value.) Instead, we require that the two endpoints can squeeze in towards zero independently and that, regardless of how they do so, we get the same answer. Contrast the above with $$ \int_{-1}^{-M} \frac{\mathrm{d}x}{x} + \int_{M^2}^{1} \frac{\mathrm{d}x}{x} \text{,} $$ which is always positive (because we include more of the positive part than of the negative part) and equals $-\ln M$, so it diverges to infinity as $M \to 0^+$. We can arrange to diverge to negative infinity instead by moving the "${}^2$" to the other integral.
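A quick numeric illustration of this path dependence, using the exact antiderivative $\ln|x|$ so that floating-point quadrature error plays no role (the helper names are mine):

```python
import math

def left(a, b):   # integral of dx/x over [a, b] with a < b < 0
    return math.log(abs(b)) - math.log(abs(a))

def right(a, b):  # integral of dx/x over [a, b] with 0 < a < b
    return math.log(b) - math.log(a)

for M in (1e-2, 1e-4, 1e-8):
    balanced = left(-1, -M) + right(M, 1)      # always 0: the tails cancel
    lopsided = left(-1, -M) + right(M**2, 1)   # equals -ln(M): diverges
    print(f"M={M:g}  balanced={balanced:+.3f}  lopsided={lopsided:+.3f}")
```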



The same thing is happening in your integral for the mean, $$
\int_{-\infty}^{\infty} y f(y) \,\mathrm{d}y \text{,}
$$ because for very large positive $y$ we have $1/y - 1 \approx -1$ (taking $\mu = 1$), so the integrand $y f(y)$ is asymptotically proportional to $1/y$; similarly, for very large negative $y$ the integrand is again proportional to $1/y$. If you restrict your integral to be symmetric around $1$, $\int_{1-M}^{1+M} \dots $, these two tails will cancel (not exactly, but the cancellation improves the farther away from $1$ you go). However, if you use $\int_{1-M}^{1+M^2} \dots $, they won't cancel, and the uncancelled part will grow asymptotically like $\log M$.



This slow growth of the discrepancy has consequences for numerical experiments. You would have to take many samples at very large values of $y$ and sum them very carefully, so that the contributions from large $y$s aren't swamped by rounding in the values from small $y$s. Your plot shows that the integrand for the mean, $y f(y)$, is about $4$ at $y = 1$; it is about $10^{-23}$ at $y \approx 100$. This means that to retain the necessary precision just over the range $y \in [-100,100]$, you would have to integrate numerically with at least $23$ digits of precision (plus several guard digits). Even more numerically difficult, because of the logarithmic rate of divergence of the right-hand and left-hand integrals, you would have to integrate over stupendous ranges of $y$ for any not-carefully-balanced cancellation to accumulate measurably. For instance, \begin{align}
\int_{-10}^{-9} y f(y) \,\mathrm{d}y + \int_{9}^{10} y f(y) \,\mathrm{d}y &\approx 1.84\dots \times 10^{-18} \\
\int_{-100}^{-9} y f(y) \,\mathrm{d}y + \int_{9}^{100} y f(y) \,\mathrm{d}y &\approx 3.13\dots \times 10^{-18} \text{.}
\end{align}

(I've ignored the contribution from $-9$ to $9$ in both integrals so that the result isn't swamped by the $18$ orders of magnitude larger contribution from that region.) These will always be positive because the contribution from the positive $y$s is closer to the peak at $y=1$ than is the contribution from the negative $y$s.
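These values can be reproduced with arbitrary-precision quadrature. The numbers above came from Maple or Mathematica; here is a sketch of the same computation using Python's mpmath instead (my substitution, not the author's tool):

```python
from mpmath import mp, mpf, exp, sqrt, pi, quad

mp.dps = 50  # far more working precision than double, as argued above
mu, sigma = mpf(1), mpf("0.1")

def yf(y):
    # Integrand y * f(y) for the mean of Y
    return exp(-(1 / y - mu) ** 2 / (2 * sigma**2)) / (y * sqrt(2 * pi * sigma**2))

print(quad(yf, [-10, -9]) + quad(yf, [9, 10]))    # should be ~1.84e-18
print(quad(yf, [-100, -9]) + quad(yf, [9, 100]))  # should be ~3.13e-18
```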



So, in your simulations, were you sampling over truly stupendous ranges of $y$ so that you might observe the divergence? And, while doing so, were you keeping truly stupendous precision so that the very slow divergence wasn't swamped by rounding errors in the contributions from $y$s near $1$? Doing both of these things is computationally very challenging. Once you have that ironed out, you will need to take the limits to infinity in different ways, to avoid calculating some sort of principal value, and when you do this you will discover that the integrals do not converge and your estimates are not stable. For instance, \begin{align}
\int_{1-2 \cdot 10}^{1-9} y f(y) \,\mathrm{d}y + \int_{1+9}^{1+10} y f(y) \,\mathrm{d}y &\approx 6.644\dots \times 10^{-19} \\
\int_{1-2 \cdot 100}^{1-9} y f(y) \,\mathrm{d}y + \int_{1+9}^{1+100} y f(y) \,\mathrm{d}y &\approx 1.292\dots \times 10^{-18} \text{,}
\end{align} and each time we increment the power of $10$ used for the outermost endpoints, the discrepancy approximately doubles. Not until we reach $10^{53}$-ish does the discrepancy finally become as large as the contribution from $y \in [1-9,1+9]$.
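The asymmetric truncations can be checked the same way (the same mpmath sketch as before, with the endpoints from the displayed integrals):

```python
from mpmath import mp, mpf, exp, sqrt, pi, quad

mp.dps = 50
mu, sigma = mpf(1), mpf("0.1")
yf = lambda y: exp(-(1 / y - mu) ** 2 / (2 * sigma**2)) / (y * sqrt(2 * pi * sigma**2))

for M in (10, 100):
    # Endpoints 1 - 2M and 1 + M, as in the displayed integrals
    print(M, quad(yf, [1 - 2 * M, 1 - 9]) + quad(yf, [1 + 9, 1 + M]))
```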



To sum up, numerically detecting the divergence of this integral requires some very careful programming and numerical analysis. (Alternatively, you can do what I did for the estimates above and use a big symbolic and numerics application like Maple or Mathematica.)


