Moran's I

Short description: Measure of spatial autocorrelation


The white and black squares are perfectly dispersed, so Moran's I would be −1 using a rook neighbors definition. If the white squares were stacked on one half of the board and the black squares on the other, Moran's I would approach +1 as N increases. A random arrangement of square colors would give Moran's I a value close to 0.

In statistics, Moran's I is a measure of spatial autocorrelation developed by Patrick Alfred Pierce Moran.[1][2] Spatial autocorrelation is characterized by a correlation in a signal among nearby locations in space. Spatial autocorrelation is more complex than one-dimensional autocorrelation because spatial correlation is multi-dimensional (i.e. 2 or 3 dimensions of space) and multi-directional.

Global Moran's I

Global Moran's I is a measure of the overall clustering of the spatial data. It is defined as

[math]\displaystyle{ I = \frac N W \frac {\sum_{i=1}^N \sum_{j=1}^N w_{ij}(x_i-\bar x) (x_j-\bar x)} {\sum_{i=1}^N (x_i-\bar x)^2} }[/math]

where

  • [math]\displaystyle{ N }[/math] is the number of spatial units indexed by [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math];
  • [math]\displaystyle{ x }[/math] is the variable of interest;
  • [math]\displaystyle{ \bar x }[/math] is the mean of [math]\displaystyle{ x }[/math];
  • [math]\displaystyle{ w_{ij} }[/math] are the elements of a matrix of spatial weights with zeroes on the diagonal (i.e., [math]\displaystyle{ w_{ii} = 0 }[/math]);
  • and [math]\displaystyle{ W }[/math] is the sum of all [math]\displaystyle{ w_{ij} }[/math] (i.e. [math]\displaystyle{ W = \sum_{i=1}^N \sum_{j=1}^N {w_{ij}} }[/math]).
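
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `morans_i` and `rook_weights` are hypothetical, and the weights matrix is assumed to be a square list of lists with zeroes on the diagonal. The example scores a checkerboard under binary rook contiguity (squares sharing an edge are neighbors).

```python
def morans_i(x, w):
    """Global Moran's I for values x and a square weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                     # deviations from the mean
    total_w = sum(sum(row) for row in w)          # W, the sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(v * v for v in z)


def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a rows x cols grid (assumed layout)."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[r * cols + c][rr * cols + cc] = 1
    return w


# 8 x 8 checkerboard: 1 for black squares, 0 for white squares
board = [(r + c) % 2 for r in range(8) for c in range(8)]
print(morans_i(board, rook_weights(8, 8)))
```

Because every rook neighbor of a square has the opposite color, each cross product is negative and the statistic sits at its dispersed extreme.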

A hand map with different spatial patterns. Note: p is the probability of the q-statistic; * denotes statistical significance at level 0.05, ** at 0.001, and *** below 10⁻³; (D) subscripts 1, 2, 3 of q and p denote the strata Z1+Z2 with Z3, Z1 with Z2+Z3, and Z1, Z2 and Z3 individually, respectively; (E) subscripts 1 and 2 of q and p denote the strata Z1+Z2 with Z3+Z4, and Z1+Z3 with Z2+Z4, respectively.

Defining weights matrices

The value of [math]\displaystyle{ I }[/math] can depend strongly on the assumptions built into the spatial weights matrix [math]\displaystyle{ w_{ij} }[/math]. The matrix is required because, in order to address spatial autocorrelation and also model spatial interaction, we need to impose a structure to constrain the number of neighbors to be considered. This is related to Tobler's first law of geography, which states that "everything depends on everything else, but closer things more so". In other words, the law implies a spatial distance decay function, such that even though all observations have an influence on all other observations, after some distance threshold that influence can be neglected.

The idea is to construct a matrix that accurately reflects your assumptions about the particular spatial phenomenon in question. A common approach is to give a weight of 1 if two zones are neighbors, and 0 otherwise, though the definition of 'neighbors' can vary. Another common approach is to give a weight of 1 to the [math]\displaystyle{ k }[/math] nearest neighbors, and 0 otherwise. An alternative is to use a distance decay function for assigning weights. Sometimes the length of a shared edge is used for assigning different weights to neighbors. The selection of a spatial weights matrix should be guided by theory about the phenomenon in question. Because [math]\displaystyle{ I }[/math] is quite sensitive to the weights, this choice can change the conclusions drawn about a phenomenon, especially when distance-based weights are used.
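
Two of the schemes described above, k-nearest-neighbor weights and distance decay weights, can be sketched as follows. The function names, the inverse-power form of the decay, and the `cutoff` parameter are illustrative assumptions rather than a standard API; point locations are assumed to be coordinate tuples.

```python
import math


def knn_weights(coords, k):
    """Weight 1 for each of the k nearest neighbors of a point, 0 otherwise."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:k]:
            w[i][j] = 1.0
    return w


def distance_decay_weights(coords, power=1.0, cutoff=math.inf):
    """Inverse-distance weights d**-power, set to 0 beyond the cutoff (assumed form)."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:                      # keep zeroes on the diagonal
                d = math.dist(coords[i], coords[j])
                if 0 < d <= cutoff:
                    w[i][j] = d ** -power
    return w


coords = [(0, 0), (1, 0), (3, 0), (3, 4)]   # made-up point locations
print(knn_weights(coords, 1))
print(distance_decay_weights(coords, power=1.0, cutoff=2.0))
```

Note that k-nearest-neighbor weights are generally not symmetric: in this example the last point counts the third as its nearest neighbor, but not the other way round.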

Expected value

The expected value of Moran's I under the null hypothesis of no spatial autocorrelation is

[math]\displaystyle{ E(I) = \frac{-1} {N-1} }[/math]

Under this null distribution, the values of [math]\displaystyle{ x }[/math] are reassigned to the spatial units by a permutation [math]\displaystyle{ \pi }[/math] drawn uniformly at random, and the expectation is taken over the choice of permutation.

At large sample sizes (i.e., as N approaches infinity), the expected value approaches zero.
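
The permutation null can be checked directly on a small example by enumerating every permutation of the data and averaging the resulting values of I. The sketch below assumes six made-up observations on a line, with binary weights between adjacent positions; `morans_i` is a hypothetical helper implementing the global definition.

```python
from itertools import permutations


def morans_i(x, w):
    """Global Moran's I for values x and a square weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    total_w = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(v * v for v in z)


x = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]          # made-up data
n = len(x)
# Adjacent positions on a line are neighbors (illustrative choice of weights)
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

# Average I over all N! equally likely reassignments of the values
values = [morans_i(p, w) for p in permutations(x)]
mean_i = sum(values) / len(values)
print(mean_i, -1 / (n - 1))
```

The two printed numbers should agree up to floating-point rounding, whatever data and zero-diagonal weights are used.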

Its variance equals

[math]\displaystyle{ \operatorname{Var}(I) = \frac{NS_4-S_3S_5} {(N-1)(N-2)(N-3)W^2} - (E(I))^2 }[/math]

where

[math]\displaystyle{ S_1 = \frac 1 2 \sum_i \sum_j (w_{ij}+w_{ji})^2 }[/math]
[math]\displaystyle{ S_2 = \sum_i \left( \sum_j w_{ij} + \sum_j w_{ji}\right)^2 }[/math]
[math]\displaystyle{ S_3 = \frac {N^{-1} \sum_i (x_i - \bar x)^4} {(N^{-1} \sum_i (x_i - \bar x)^2)^2} }[/math]
[math]\displaystyle{ S_4 = (N^2-3N+3)S_1 - NS_2 + 3W^2 }[/math]
[math]\displaystyle{ S_5 = (N^2-N) S_1 - 2NS_2 + 6W^2 }[/math][3]

Values of I usually range from −1 to +1. Values significantly below [math]\displaystyle{ -1/(N-1) }[/math] indicate negative spatial autocorrelation and values significantly above [math]\displaystyle{ -1/(N-1) }[/math] indicate positive spatial autocorrelation. For statistical hypothesis testing, Moran's I values can be transformed to z-scores by subtracting the expected value and dividing by the square root of the variance.
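
The variance formula and the z-score transformation can be sketched as below. The helper names are hypothetical, the code follows the [math]\displaystyle{ S_1 }[/math] to [math]\displaystyle{ S_5 }[/math] terms term by term, and it assumes at least four spatial units so that the denominator is non-zero. The data and the line-shaped neighborhood are made up for illustration.

```python
import math


def morans_i(x, w):
    """Global Moran's I for values x and a square weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    total_w = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(v * v for v in z)


def morans_i_moments(x, w):
    """Expected value and variance of I from the S1..S5 formula (needs N >= 4)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    total_w = sum(sum(row) for row in w)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    # Row sum plus column sum for each unit, squared
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    m2 = sum(v ** 2 for v in z) / n
    m4 = sum(v ** 4 for v in z) / n
    s3 = m4 / m2 ** 2                         # sample kurtosis of x
    s4 = (n * n - 3 * n + 3) * s1 - n * s2 + 3 * total_w ** 2
    s5 = (n * n - n) * s1 - 2 * n * s2 + 6 * total_w ** 2
    expected = -1.0 / (n - 1)
    variance = ((n * s4 - s3 * s5)
                / ((n - 1) * (n - 2) * (n - 3) * total_w ** 2)) - expected ** 2
    return expected, variance


def z_score(x, w):
    """Standardize the observed I by its expected value and variance."""
    expected, variance = morans_i_moments(x, w)
    return (morans_i(x, w) - expected) / math.sqrt(variance)


x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]           # low values then high values
n = len(x)
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
print(morans_i(x, w), z_score(x, w))
```

A positive z-score here reflects that similar values sit next to each other; how it is turned into a p-value depends on the reference distribution chosen for the test.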

Moran's I is inversely related to Geary's C, but it is not identical. Moran's I is a measure of global spatial autocorrelation, while Geary's C is more sensitive to local spatial autocorrelation.

Local Moran's I

Global spatial autocorrelation analysis yields only one statistic to summarize the whole study area. In other words, the global analysis assumes homogeneity. If that assumption does not hold, then having only one statistic does not make sense as the statistic should differ over space.

Moreover, even if there is no global autocorrelation or no clustering, we can still find clusters at a local level using local spatial autocorrelation analysis. The fact that Moran's I is a summation of individual cross products is exploited by the "local indicators of spatial association" (LISA) to evaluate the clustering in those individual units by calculating Local Moran's I for each spatial unit and evaluating the statistical significance of each [math]\displaystyle{ I_i }[/math]. From the equation of Global Moran's I, we can obtain:

[math]\displaystyle{ I_i = \frac{x_i-\bar x}{m_2} \sum_{j=1}^N w_{ij} (x_j-\bar x) }[/math]

where:

[math]\displaystyle{ m_2= \frac{\sum_{i=1}^N (x_i-\bar x)^2 }{N} }[/math]

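then, when the weights are scaled so that [math]\displaystyle{ W = N }[/math] (for example, with row-standardized weights; in general the divisor is [math]\displaystyle{ W }[/math] rather than [math]\displaystyle{ N }[/math]),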

[math]\displaystyle{ I= \sum_{i=1}^N \frac{I_i}{N} }[/math]

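[math]\displaystyle{ I }[/math] is the Global Moran's I measuring global autocorrelation, [math]\displaystyle{ I_i }[/math] is the Local Moran's I of spatial unit [math]\displaystyle{ i }[/math], and [math]\displaystyle{ N }[/math] is the number of analysis units on the map.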
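
The relation between the local and global statistics can be illustrated numerically. In the sketch below the helper names are hypothetical and the data are made up; the weights link adjacent positions on a line and are then row-standardized, so that the weights sum to N and the average of the local values reproduces the global statistic. With unscaled weights the sum of local values is divided by W instead.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a square weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    total_w = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(v * v for v in z)


def local_morans_i(x, w):
    """Local Moran's I_i for every spatial unit."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n            # second moment of x
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]


x = [1.0, 2.0, 3.0, 7.0, 8.0, 2.0]           # made-up data
n = len(x)
w_raw = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
# Row-standardize: each row sums to 1, so the total weight W equals N
w_rs = [[v / sum(row) for v in row] for row in w_raw]

local_rs = local_morans_i(x, w_rs)
local_raw = local_morans_i(x, w_raw)
total_w = sum(sum(row) for row in w_raw)

print(sum(local_rs) / n, morans_i(x, w_rs))            # average of I_i vs global I
print(sum(local_raw) / total_w, morans_i(x, w_raw))    # unscaled weights: divide by W
```

Units with a positive local value sit among neighbors that deviate from the mean in the same direction; a negative value marks a unit that contrasts with its neighbors.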

LISAs can for example be calculated in GeoDa, which uses the Local Moran's I,[4] proposed by Luc Anselin in 1995.[5]

Uses

Moran's I is widely used in the fields of geography and geographic information science. Some examples include:

  • The analysis of geographic differences in health variables.[6]
  • Characterising the impact of lithium concentrations in public water on mental health.[7]
  • In dialectology to measure the significance of regional language variation.[8]
  • Defining an objective function for meaningful terrain segmentation for geomorphological studies.[9]

References

  1. Moran, P. A. P. (1950). "Notes on Continuous Stochastic Phenomena". Biometrika 37 (1): 17–23. doi:10.2307/2332142. PMID 15420245. 
  2. Li, Hongfei; Calder, Catherine A.; Cressie, Noel (2007). "Beyond Moran's I: Testing for Spatial Dependence Based on the Spatial Autoregressive Model". Geographical Analysis 39 (4): 357–375. doi:10.1111/j.1538-4632.2007.00708.x. 
  3. Cliff and Ord (1981), Spatial Processes, London
  4. Anselin, Luc (2005). "Exploring Spatial Data with GeoDa™: A Workbook". Spatial Analysis Laboratory. p. 138. https://www.geos.ed.ac.uk/~gisteac/fspat/geodaworkbook.pdf. 
  5. Anselin, Luc (1995). "Local Indicators of Spatial Association—LISA". Geographical Analysis 27 (2): 93–115. doi:10.1111/j.1538-4632.1995.tb00338.x. 
  6. Getis, Arthur (3 Sep 2010). "The Analysis of Spatial Association by Use of Distance Statistics". Geographical Analysis 24 (3): 189–206. doi:10.1111/j.1538-4632.1992.tb00261.x. 
  7. Helbich, M; Leitner, M; Kapusta, ND (2012). "Geospatial examination of lithium in drinking water and suicide mortality". Int J Health Geogr 11 (1): 19. doi:10.1186/1476-072X-11-19. PMID 22695110. 
  8. Grieve, Jack (2011). "A regional analysis of contraction rate in written Standard American English". International Journal of Corpus Linguistics 16 (4): 514–546. doi:10.1075/ijcl.16.4.04gri. https://lirias.kuleuven.be/bitstream/123456789/328993/1/GrieveIJCLAug312011.doc. 
  9. Alvioli, M.; Marchesini, I.; Reichenbach, P.; Rossi, M.; Ardizzone, F.; Fiorucci, F.; Guzzetti, F. (2016). "Automatic delineation of geomorphological slope units with r.slopeunits v1.0 and their optimization for landslide susceptibility modeling". Geoscientific Model Development 9: 3975–3991. doi:10.5194/gmd-9-3975-2016.