probability - A multinomial problem (balls, bins, etc.)

Sunday, 28 February 2016

probability - A multinomial problem (balls, bins, etc.)

Consider the well known multinomial setting: there are L balls, thrown at random at n bins so that the probability that a ball falls in bin i is $p_i$ , independent of the other balls (the $p_i$ ’s are all positive and sum to 1).

Let $X_i$ be the number of balls in the ith bin. It is straightforward to show that for any k distinct indices $i_1, i_2,\ldots, i_k$ , the events $\lbrace X_{i_1} > 0\rbrace, \lbrace X_{i_2} > 0\rbrace,\ldots, \lbrace X_{i_k} > 0\rbrace$ are negatively dependent in the sense that

$P(X_{i_1} > 0, X_{i_2} > 0,\ldots, X_{i_k} > 0) < P(X_{i_1} > 0)P(X_{i_2} > 0)\ldots P(X_{i_k} > 0)$

The proof is by induction over k: for k = 2, the probability on the LHS is $1 – [(1 - p_{i_1})^L + (1 - p_{i_2})^L - (1 - p_{i_1} - p_{i_2})^L]$ , and on the RHS it’s $[1 - (1 – p_{i_1})^L] [1 - (1 – p_{i_2})^L]$ . Elementary manipulations show that the inequality holds, and the induction step is easily proven using the multiplication rule $P(A \cap B) = P(A)P(B | A)$ .

This negative dependence is also intuitively clear: if we know that there is at least one ball in bin i, there is at least one ball that will surely not fall into bin j, so the probability of bin j being non-empty decreases.

So that was easy. However, for proving a certain bound in my research, I need to do something similar, but for random k indices: suppose that after the balls were thrown into the bins, somebody comes and chooses randomly k distinct bins (so that each set of k bins has the same probability to be chosen, namely, 1/(n choose k)). Let $i_1, i_2, \ldots, i_k$ be the chosen indices, and again, I need to show that the inequality (below the second paragraph above) holds.

I thought to repeat the outline of the previous proof, but had a problem with proving the k = 2 case (the induction step is no problem). To compute each of the sides of the inequality I conditioned on the choice of the two indices (law of total probability), but even though many terms got cancelled, I couldn’t prove the inequality.

I am quite confident that the inequality holds, both by intuition (basically the same argument as in the non-random indices case) and because of numerical calculations.

Any ideas what to do here? I’d be grateful for any help.

Blog

Sunday, 28 February 2016