Monday 29 July 2019

probability - What is the rationale behind the evaluation of the Expectation operator?



We know that the expectation operator is defined for a random variable $x$ as follows:




$$
\mathbb{E} \left\{x\right\} = \int_{-\infty}^{\infty} x \: p_x(x) \; \mathrm{d}x
$$



where $p_x(x)$ is the PDF of the random variable $x$.



If there is an arbitrary(?) function $f$ acting on the random variable $x$, then the expected value of this function can also be written as:



$$
\mathbb{E}\left\{f(x) \right\} = \int_{-\infty}^{\infty} f(x) \: p_x(x) \: \mathrm{d}x
$$



My questions are: In many of the algorithms I study (statistical in nature), one often finds oneself taking the expected value of some entity that is a function of the random variable $x$. In the reverse case, one can also find oneself poking around and manipulating the probability density function of $x$, and then 'taking it back' into an expression using the expectation operator.



Upon evaluating the expected value of $x$, however ($\mathbb{E}\left\{x\right\}$), I often come across this estimation formula:



$$
\mathbb{E}\left\{x\right\} \approx \frac{1}{N}\sum_{n=1}^{N} x[n]
$$




and similarly,



$$
\mathbb{E}\left\{f(x)\right\} \approx \frac{1}{N}\sum_{n=1}^{N} f(x[n])
$$
Where each $x[n]$ is an individual realization of the random variable $x$.
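
For concreteness, here is a minimal Python sketch of how I usually see these approximations applied, using a standard normal as a stand-in distribution (my choice, purely for illustration) so that the exact values $\mathbb{E}\left\{x\right\} = 0$ and $\mathbb{E}\left\{x^2\right\} = 1$ are known:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000
x = rng.standard_normal(N)      # N realizations x[n] of the random variable

# E{x} ~= (1/N) * sum of x[n]
e_x = x.mean()

# E{f(x)} ~= (1/N) * sum of f(x[n]), here with f(x) = x**2
e_fx = np.mean(x ** 2)

print(e_x)    # close to the exact value 0
print(e_fx)   # close to the exact value 1
```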



My question is, why is this formula true, and how did it come about? Every book I read seems to just include it as if it fell from the sky one day and no explanation is given as to why it is true.



Could someone please give an intuitive and mathematical explanation for why, and more importantly how, this happens to be true? What is the history/rationale behind it?




Many thanks.


Answer



As you are aware, the expected value of a random variable $X$ is computed as:



$\mathbb{E}\{X\} = \int_{-\infty}^{\infty}z f_{X}(z)dz$, where $f_X$ is the probability density function (PDF) of $X$.



Note that the above integral may not converge (e.g. if $X$ is a Cauchy random variable), and the PDF may not exist either. But let's assume we are not running into such problems, which is indeed the case in most practical settings.
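
(As a quick numerical aside, the Cauchy caveat can be made concrete with a short Python sketch, with an arbitrarily chosen seed: the sample average of Cauchy draws keeps jumping around no matter how many draws are taken, because the defining integral does not converge.)

```python
import numpy as np

rng = np.random.default_rng(1)

# The Cauchy distribution has no mean: the sample average does not
# settle down, no matter how many draws we take.
for N in (100, 10_000, 1_000_000):
    print(N, rng.standard_cauchy(N).mean())
```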



Often, we will not know the PDF, or even its functional form. In such a case, one resorts to estimating the PDF from realizations of the random variable. That is, assume we have $N$ realizations of $X$, namely $\{x_i\}_{i=1}^N$. Let us consider the following estimate:




$\hat{f}_X(z) = \frac{1}{N}\sum_{i=1}^N \delta(z-x_i)$, where $\delta(\cdot)$ is the Dirac delta function.



So essentially, we are treating the random variable as a uniform discrete random variable, and putting a mass of $1/N$ over each of the observed values. The estimate of the expectation of $X$ becomes:



$\hat{\mathbb{E}}_N\{X\} = \int_{-\infty}^{\infty}z \frac{1}{N}\sum_{i=1}^N \delta(z-x_i)dz$



$= \frac{1}{N}\sum_{i=1}^N \int_{-\infty}^{\infty}z \delta(z-x_i)dz$



By the sifting property of the delta function, $\int_{-\infty}^{\infty} g(z) \, \delta(z-x_i) \, \mathrm{d}z = g(x_i)$, each integral evaluates to $x_i$ (taking $g(z) = z$). Hence, we arrive at the following expression for the sample expectation:




$\hat{\mathbb{E}}_N\{X\} = \frac{1}{N}\sum_{i=1}^N x_i$
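
As a quick numerical sanity check (a sketch with arbitrarily simulated draws, not part of the derivation itself), placing a mass of $1/N$ at each observed value and taking the expectation of the resulting discrete distribution reproduces the sample mean exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=1_000)   # stand-in draws of X

# Empirical "PDF": a point mass of 1/N at each observed value x_i.
weights = np.full(samples.size, 1.0 / samples.size)

# Expectation under this discrete distribution: sum of value * mass.
e_hat = np.sum(weights * samples)

print(e_hat)            # plug-in estimate
print(samples.mean())   # identical to the sample mean
```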



Note that the $x_i$'s are themselves random, since another set of draws from $X$ is likely to give a different set of realizations. Thus the above estimate of the expected value is itself a random variable, dependent on the draws $\{x_i\}_{i=1}^N$ and on the number of draws $N$. Since all the $x_i$'s are identically distributed (with PDF $f_X$) and are independent draws, the Laws of Large Numbers tell us that $\hat{\mathbb{E}}_N\{X\} \rightarrow \mathbb{E}\{X\}$ with probability $1$ (almost surely), and in probability, as $N \rightarrow \infty$.
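
To see the Laws of Large Numbers at work numerically, here is a short sketch (using an exponential distribution chosen only because its mean is known in closed form) showing the estimate tightening around the true expectation as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean = 2.0                     # exponential with scale 2 has mean 2

for N in (10, 100, 10_000, 1_000_000):
    draws = rng.exponential(scale=true_mean, size=N)
    print(N, draws.mean())          # approaches 2.0 as N grows
```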

