Sunday, 26 May 2013

probability - Estimating the critical probabilities $mathrm{P_{c1}}$ and $mathrm{P_{c2}}$ mathematically for the infinite system case



Suppose I have an $\mathrm{N\times N}$ square matrix consisting of only $0$'s and $1$'s. (I'm considering that $0$ represents "white" and $1$ represents "black".) The probability of a certain element being $1$ is $\mathrm{p}$. At that certain probability $\mathrm{p}$, we count the number the number of white clusters and number of black clusters in the matrix. Any two elements who share a side (i.e. any one of them lies to the left, right, top or bottom of the other) or share a vertex are considered to belong to the same cluster. BUT, there's one special case i.e. if anywhere in the matrix there occurs a situation like this:



0 | 1



1 | 0



OR




1 | 0



0 | 1



That is two $1$'s share a vertex (diagonally connected) and two $0$'s also share a vertex as shown, the $50\%$ of the times, one should consider the $1$'s to belong to same cluster and other $50\%$ of the times one should consider the $0$'s to belong to the same cluster.



Suppose I am gradually increasing $\mathrm{p}$ from $0$ to $100$ in steps of $0.001$ and counting the number of black and white clusters for each $\mathrm{p}$. My aim is to find that critical probability $\mathrm{P_{c1}}$ at which number of white clusters increases from $1$ to a number greater than $1$ as $\mathrm{N}\to\infty$. Also I need to find $\mathrm{P_{c2}}$ at which number of black clusters decreases from a number greater than $1$ to $1$.





To be more clear,



Let $\mathcal{C}_0, \mathcal{C}_1$ denote the clusters of 0's and 1's
in the matrix (or graph), respectively.



We are defining $\mathrm{P_{c1}}$ as



$$ \mathcal{C}_{0} = \begin{cases} 1 & \qquad p < \mathrm{P_{c1}} \\
> 1 & \qquad p > \mathrm{P_{c1}} \end{cases} $$




and $\mathrm{P_{c2}}$ as



$$ \mathcal{C}_{1} = \begin{cases}
> 1 & \qquad p < \mathrm{P_{c2}} \\ 1 & \qquad p > \mathrm{P_{c2}} \end{cases} $$



for $N\to\infty$.




I wrote a program for the $1000\times 1000$ case, and averaged the critical probabilities over $10$ iterations. (The program basically had a loop which ran from $\mathrm{p}=0$ to $\mathrm{p}=100$ and I outputted those $\mathrm{p}$'s at which the number of white clusters increased from $1$ and number of black clusters decreased to $1$) The value I got for $\mathrm{P_{c1}}$ was $0.05$ and for $\mathrm{P_{c2}}$ it was $0.96$. However, I don't know if any purely mathematical technique exists to find the convergence value of these two critical probabilities for the infinite matrix case (i.e. $\mathrm{N}\to \infty$). Around $3$ decimal places of accuracy will be enough for my purpose. Any help will be appreciated.


Answer




At the moment it is not 100% clear how your critical probabilities are defined. In the below I consider the following definitions.



Let $\mathcal{C}_0, \mathcal{C}_1$ denote the clusters of 0's and 1's in your matrix (or graph), respectively. For a fixed $p \in [0,1]$ we can define



$$C_0(p) = \lim_{N \rightarrow \infty} \mathbf P_{p,N}[ |\mathcal C_0| > 1]$$
which is the limit of the probability that there is more than one clusters of $0$s (in your wording, the probability of more than one white cluster).



We define a critical probability $p_c$ such that
$$
C_0(p) =

\begin{cases}
0 & \qquad p < p_c \\
> 0 & \qquad p > p_c
\end{cases}
$$
That is: when $p < p_c$, in the limit $N \rightarrow \infty$ the probability of having more than $1$ cluster is $0$. Roughly speaking this would mean there is at most $1$ cluster of $0$s in the limit. Conversely when $p > p_c$ there is a positive chance (in the limit) of seeing more than 1 cluster.



For this definition of $p_c$, I claim that $p_c = 0$.



To see this, suppose that $p > 0$ then the probability that a given $3 \times 3$ sub-matrix is given by

$$ B =
\left( \begin{matrix}
1 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 1
\end{matrix}
\right)$$
The probability of this configuration occuring is $q = p^8 (1-p) > 0$. Then suppose that we have $N = 3n$, then the matrix can be seen as $n^2$ blocks like the above. Each of these $n^2$ blocks are independent, and have probability $q$ (which does not grow with $n$) of being of the form $B$. The number of the $n^2$ blocks which equals $B$ is given by a Binomial variable $\text{Bin}(n^2, q)$; in particular, the probability that more than two such blocks exist is
$$\mathbf{P}[ \text{Bin}(n^2, q) >= 2] =1 - (1-q)^{n^2} - n^2q(1-q)^{n^2 - 1},$$




The probability of there being at least 2 clusters of 1s is greater than the probability that at least two blocks like the above exist (since this is a very special case of having two clusters), so that is



\begin{align*}
\lim_{n\rightarrow \infty} \mathbf P_{p,3n}[ |\mathcal C_1| > 1] & \geq
\lim_{n \rightarrow \infty} \mathbf{P}[ \text{Bin}(n^2, q) >= 2] \\
& = \lim_{n \rightarrow \infty} 1 - (1-q)^{n^2} - n^2q(1-q)^{n^2 - 1} \\
& = 1.
\end{align*}



That is, for any $p > 0$ we have $C(p) = 1$. Clearly $C(0) = 0$, and so it follows that $p_c =0$.




The decision to use $N = 3n$ is really not needed, but makes the proof a bit simpler. Further, with a bit more probabilistic machinery you can argue via Kolmogorov's Zero-One law that in the limit the block $B$ appears infinitely often: which ensures that in fact for any $p > 0$ the expected number of clusters is infinite.






A simplified argument.



The above argument is useful as it allows us to show that there must in fact be infinitely many clusters in the limit. In this example I use a much simpler argument which you can check computationally.



If the matrix you have is denoted $A_{i,j}$ with $1 \leq i,j \leq N$ consider just the $2 \times 2$ sub-matrix in the top left corner. If this takes the specific form




$$
C =
\left(
\begin{array}{cc}
0 & 1 \\
1 & 1
\end{array}
\right)
$$




and moreover if there is at least one more $0$ elsewhere in the matrix, then this would imply that you have two clusters of $1$s.



For any $N \geq 3$, and fixed $p > 0$ the probability that the top corner is equal to $C$ is given by
$$(1-p) p^3$$
Of the remaining $N^2 - 4$ vertices, the number of $0$s to occur is distributed according to a $\text{Bin}(N^2 - 4, (1-p))$ variable, therefore the probability that there is at least one $0$ is
$$ 1 - \mathbf P[ \text{Bin}(N^2 - 4, (1-p) ) = 0] = 1 - p^{N^2 - 4}.$$
So that means the probability of seeing the corner equal to $C$ and also there being at least one more $0$ is given by
$$q = p^3 (1-p) ( 1 - p^{N^2 - 4}).$$
Note that this is a very special case of there being at least two clusters, but for any $p > 0$ the probability $q > 0$.




So we have that with probability $q$ a given sample will have this property. Then we can ask how many samples would we need to take before seeing this event. Whilst we cannot say for certain how many, on average one would expect to have to take $1/q$ samples (this is derived from a Geometric Distribution).



For $N$ large, and $p$ small we can approximate $q \sim p^{3}$, so that $1/q \approx p^{-3}$. So for example when $p = 0.01$, you would expect to need around $10^6$ samples to see this event.



Whilst this is an approximation, it is an approximation to a very specific example of having more than $1$ cluster; so in fact I would expect you to require fewer to observe a scenario with at least two clusters.



Of course when you take into account the fact that there are four corners (and not just the top left considered) then the probability rises again, and approximately (since I assume that each of the four possible corner events are independent, which they're not) the probability at least one of the four occuring is
$$q' = 1 - (1-q)^4 \sim 1 - (1-p^3)^4,$$
which in turn means that on average it would take $1/q'$ samples before observing such a corner event. And noting that

$$ \frac1{q'} \sim \frac{1}{1 - (1-p^3)^4} \sim \frac1{4p^3} + O(1)$$
we see that actually for $p = 0.01$ we need only on the order of $250,000$ samples to see such a corner event.


No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...