Modifiable areal unit problem: Difference between revisions
imported>NBrush change |
url |
||
| Line 1: | Line 1: | ||
{{Short description|Source of statistical bias}} | {{Short description|Source of statistical bias}} | ||
[[File:Maup rate numbers.png|alt=MAUP distortion example|thumb|298x298px|An example of the modifiable areal unit problem and the distortion of rate calculations]] | [[File:Maup rate numbers.png|alt=MAUP distortion example|thumb|298x298px|An example of the modifiable areal unit problem and the distortion of rate calculations.]] | ||
__NOTOC__ | __NOTOC__ | ||
The '''modifiable areal unit problem''' ('''MAUP''') is a source of statistical bias that can significantly impact the results of statistical hypothesis | The '''modifiable areal unit problem''' ('''MAUP''') is a source of statistical bias that can significantly impact the results of [[Statistical hypothesis test|statistical hypothesis test]]s. The MAUP affects results when point-based measures of spatial phenomena are [[Aggregate data|aggregated]] into spatial partitions or '''''areal units''''' (such as regions or districts) as in, for example, [[Earth:Population density|population density]] or [[Social:Illness rate|illness rate]]s.<ref name=Openshaw1>{{cite book |last1=Openshaw |first1=Stan |title=The Modifiable Areal Unit Problem |date=1983 |publisher=Geo Books |isbn=0-86094-134-5 |url=https://alexsingleton.files.wordpress.com/2014/09/38-maup-openshaw.pdf}}</ref><ref name=Chen1>{{cite journal |last1=Chen |first1=Xiang |last2=Ye |first2=Xinyue |last3=Widener |first3=Michael J. |last4=Delmelle |first4=Eric |last5=Kwan |first5=Mei-Po |last6=Shannon |first6=Jerry |last7=Racine |first7=Racine F. |last8=Adams |first8=Aaron |last9=Liang |first9=Lu |last10=Peng |first10=Jia |title=A systematic review of the modifiable areal unit problem (MAUP) in community food environmental research |journal=Urban Informatics |date=27 December 2022 |volume=1 |issue=1 |page=22 |doi=10.1007/s44212-022-00021-1 |s2cid=255206315 |doi-access=free |bibcode=2022UrbIn...1...22C }}</ref> The resulting summary values (e.g., totals, rates, proportions, densities) are influenced by both the shape and [[Earth:Scale (geography)|scale]] of the aggregation unit.<ref>{{Cite web|url=http://support.esri.com/other-resources/gis-dictionary/term/MAUP|title=MAUP {{!}} Definition – Esri Support GIS Dictionary|website=support.esri.com|access-date=2017-03-09}}</ref> | ||
For example, census data may be aggregated into county districts, census tracts, postcode areas, police precincts, or any other arbitrary spatial partition. Thus the results of data aggregation are dependent on the mapmaker's choice of which "modifiable areal unit" to use in their analysis. A census [[Choropleth map|choropleth map]] calculating population density using state boundaries will yield radically different results | For example, census data may be aggregated into county districts, census tracts, postcode areas, police precincts, or any other arbitrary spatial partition. Thus, the results of data aggregation are dependent on the mapmaker's choice of which "modifiable areal unit" to use in their analysis. A census [[Choropleth map|choropleth map]] calculating population density using state boundaries will yield radically different results from a map that calculates density based on county boundaries. Furthermore, census district boundaries are also subject to change over time,<ref>{{Cite web|url=https://www.census.gov/geo/reference/boundary-changes.html|title=Geographic Boundary Change Notes|last=Geography|first=US Census Bureau|website=www.census.gov|language=EN-US|access-date=2017-02-24}}</ref> meaning the MAUP must be considered when comparing past to current data. | ||
== Background == | == Background == | ||
This issue was first recognized by Gehlke and Biehl in 1934,<ref>{{Harvnb|Gehlke|Biehl|1934}}</ref> and later described in detail in an entry in the Concepts and Techniques in Modern Geography (CATMOG) series by Stan Openshaw (1984) as well as in the book by Giuseppe Arbia (1988). In particular, Openshaw (1984) observed that "the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating."<ref name = openshaw3>{{Harvnb|Openshaw|1984|p=3}}</ref> The problem is especially apparent when the aggregate data are used for cluster analysis for [[Medicine:Spatial epidemiology|spatial epidemiology]], [[Spatial statistics|spatial statistics]] or [[Choropleth map|choropleth map]]ping, in which misinterpretations can easily be made without realizing it. Many fields of science, especially [[Social:Human geography|human geography]] are prone to disregard the MAUP when drawing inferences from statistics based on aggregated data.<ref name=Chen1/> MAUP is closely related to the topic of [[Philosophy:Ecological fallacy|ecological fallacy]] and ecological bias (Arbia, 1988). Stan Openshaw's work on this topic has led to Michael F. Goodchild suggesting it be referred to as the "Openshaw effect."<ref name="Goodchild2022">{{cite journal |last1=Goodchild |first1=Michael F. |title=The Openshaw effect |journal=International Journal of Geographical Information Science |date=2022 |volume=36 |issue=9 |pages=1697–1698 |doi=10.1080/13658816.2022.2102637 |bibcode=2022IJGIS..36.1697G |url=https://www.tandfonline.com/doi/full/10.1080/13658816.2022.2102637 |access-date=24 January 2024}}</ref> | |||
Ecological bias caused by MAUP has been documented as two separate effects that usually occur simultaneously during the analysis of aggregated data. First, the scale effect causes variation in statistical results between different levels of aggregation (radial distance). Therefore, the association between variables depends on the size of areal units for which data are reported. Generally, correlation increases as areal unit size increases. The | Ecological bias caused by MAUP has been documented as two separate effects that usually occur simultaneously during the analysis of aggregated data. First, the [[Earth:Scale (geography)|scale effect]] causes variation in statistical results between different levels of aggregation (radial distance). Therefore, the association between variables depends on the size of areal units for which data are reported. Generally, correlation increases as areal unit size increases. The zoning effect describes variation in correlation statistics caused by the regrouping of data into different configurations at the same scale (areal shape).<ref>{{Cite book |last1=Fotheringham |first1=A. S. |last2=Rogerson |first2=P. A |year=2008 |title=The SAGE handbook of spatial analysis |chapter=The Modifiable Areal Unit Problem (MAUP) |publisher=Sage |pages=105–124 |isbn=978-1-4129-1082-8}}</ref> | ||
Since the 1930s, research has found extra variation in statistical results because of the MAUP. The standard methods of calculating within-group and between-group variance do not account for the extra variance seen in MAUP studies as the groupings change. MAUP can be used as a methodology to calculate upper and lower limits as well as average regression parameters for multiple sets of spatial groupings. The MAUP is a critical source of error in spatial studies, whether observational or experimental. As such, unit consistency, particularly in a time-series cross-sectional (TSCS) context, is essential. | Since the 1930s, research has found extra variation in statistical results because of the MAUP. The standard methods of calculating within-group and between-group variance do not account for the extra variance seen in MAUP studies as the groupings change. The MAUP can be used as a methodology to calculate upper and lower limits as well as average regression parameters for multiple sets of spatial groupings. The MAUP is a critical source of error in spatial studies, whether observational or experimental. As such, unit consistency, particularly in a time-series cross-sectional (TSCS) context, is essential. Furthermore, robustness checks of unit sensitivity to alternative spatial aggregation should be routinely performed to mitigate associated biases on resulting statistical estimates. | ||
[[File:Q-fig2.jpg|frame|center|A hand map with different spatial patterns. Note: ''p'' is the probability of ''q''-statistic; * denotes statistical significant at level 0.05, ** for 0.001, *** for smaller than 10<sup>−3</sup>;(D) subscripts 1, 2, 3 of ''q'' and ''p'' denotes the strata Z1+Z2 with Z3, Z1 with Z2+Z3, and Z1 and Z2 and Z3 individually, respectively; (E) subscripts 1 and 2 of ''q'' and ''p'' denotes the strata Z1+Z2 with Z3+Z4, and Z1+Z3 with Z2+Z4, respectively.]] | |||
== Suggested solutions == | == Suggested solutions == | ||
Several suggestions have been made in literature to reduce aggregation bias during [[Regression analysis|regression analysis]]. A researcher might correct the variance-covariance matrix using samples from individual-level data.<ref>Holt D, Steel D, Tranmer M, Wrigley N. (1996). “Aggregation and ecological effects in geographically based data.” “Geographical Analysis” 28:244–261</ref> Alternatively, one might focus on local spatial regression rather than global regression. A researcher might also attempt to design areal units to maximize a particular statistical result.<ref name = openshaw3/> Others have argued that it may be difficult to construct a single set of optimal aggregation units for multiple variables, each of which may exhibit non-stationarity and spatial autocorrelation across space in different ways. Others have suggested developing statistics that change across scales in a predictable way, perhaps using fractal dimension as a scale-independent measure of spatial relationships. Others have suggested Bayesian hierarchical models as a general methodology for combining aggregated and individual-level data for ecological inference. | Several suggestions have been made in the literature to reduce aggregation bias during [[Regression analysis|regression analysis]]. A researcher might correct the variance-covariance matrix using samples from individual-level data.<ref>Holt D, Steel D, Tranmer M, Wrigley N. (1996). “Aggregation and ecological effects in geographically based data.” “Geographical Analysis” 28:244–261</ref> Alternatively, one might focus on local spatial regression rather than global regression. A researcher might also attempt to design areal units to maximize a particular statistical result.<ref name = openshaw3/> Others have argued that it may be difficult to construct a single set of optimal aggregation units for multiple variables, each of which may exhibit non-stationarity and spatial autocorrelation across space in different ways. 
Others have suggested developing statistics that change across scales in a predictable way, perhaps using fractal dimension as a scale-independent measure of spatial relationships. Others have suggested Bayesian hierarchical models as a general methodology for combining aggregated and individual-level data for ecological inference. | ||
Studies of the MAUP based on empirical data can only provide limited insight due to an inability to control relationships between multiple spatial variables. Data simulation is necessary to have control over various properties of individual-level data. Simulation studies have demonstrated that the spatial support of variables can affect the magnitude of ecological bias caused by spatial data aggregation.<ref name=swift2008 /> | Studies of the MAUP based on empirical data can only provide limited insight due to an inability to control relationships between multiple spatial variables. Data simulation is necessary to have control over various properties of individual-level data. Simulation studies have demonstrated that the spatial support of variables can affect the magnitude of ecological bias caused by spatial data aggregation.<ref name=swift2008 /> | ||
== MAUP sensitivity analysis == | == MAUP sensitivity analysis == | ||
{{Primary sources|section|date=August 2018}} | |||
Using simulations for univariate data, Larsen advocated the use of a Variance Ratio to investigate the effect of spatial configuration, spatial association, and data aggregation.<ref>Larsen, J. (2000). "The Modifiable Areal Unit Problem: A problem or a source of spatial information?" PhD thesis, Ohio State University.</ref> A detailed description of the variation of statistics due to MAUP is presented by Reynolds, who demonstrates the importance of the spatial arrangement and spatial autocorrelation of data values.<ref>Reynolds, H. (1998). "The Modifiable Area Unit Problem: Empirical Analysis By Statistical Simulation." PhD thesis, Department of Geography University of Toronto, http://www.badpets.net/Thesis</ref> Reynolds's simulation experiments were expanded by Swift, whose series of nine exercises began with simulated regression analysis and spatial trend and then focused on the MAUP in the context of spatial epidemiology. A method of MAUP sensitivity analysis is presented that demonstrates that the MAUP is not entirely a problem.<ref name=swift2008>Swift, A., Liu, L., and Uber, J. (2008) "Reducing MAUP bias of correlation statistics between water quality and GI illness." Computers, Environment and Urban Systems 32, 134–148</ref> MAUP can be used as an analytical tool to help understand spatial heterogeneity and spatial autocorrelation. | Using simulations for univariate data, Larsen advocated the use of a Variance Ratio to investigate the effect of spatial configuration, spatial association, and data aggregation.<ref>Larsen, J. (2000). "The Modifiable Areal Unit Problem: A problem or a source of spatial information?" PhD thesis, Ohio State University.</ref> A detailed description of the variation of statistics due to MAUP is presented by Reynolds, who demonstrates the importance of the spatial arrangement and spatial autocorrelation of data values.<ref>Reynolds, H. (1998). 
"The Modifiable Area Unit Problem: Empirical Analysis By Statistical Simulation." PhD thesis, Department of Geography University of Toronto, http://www.badpets.net/Thesis</ref> Reynolds's simulation experiments were expanded by Swift, whose series of nine exercises began with simulated regression analysis and spatial trend and then focused on the MAUP in the context of spatial epidemiology. A method of MAUP sensitivity analysis is presented that demonstrates that the MAUP is not entirely a problem.<ref name=swift2008>Swift, A., Liu, L., and Uber, J. (2008) "Reducing MAUP bias of correlation statistics between water quality and GI illness." Computers, Environment and Urban Systems 32, 134–148</ref> MAUP can be used as an analytical tool to help understand spatial heterogeneity and spatial autocorrelation. | ||
This topic is of particular importance because in some cases data aggregation can obscure a strong [[Correlation|correlation]] between variables, making the relationship appear weak or even negative. Conversely, MAUP can cause random variables to appear as if there is a significant association where there is not. Multivariate regression parameters are more sensitive to MAUP than correlation coefficients. Until a more analytical solution to MAUP is discovered, spatial sensitivity analysis using a variety of areal units is recommended as a methodology to estimate the uncertainty of correlation and regression coefficients due to ecological bias. An example of data simulation and re-aggregation using the ArcPy library is available.<ref>Swift, A. (2017). "Crime mapping data simulation", https://app.box.com/s/a84w16x7hffljjvkhtlr72eisj4qiene</ref> | This topic is of particular importance because in some cases data aggregation can obscure a strong [[Correlation|correlation]] between variables, making the relationship appear weak or even negative. Conversely, MAUP can cause random variables to appear as if there is a significant association where there is not. Multivariate regression parameters are more sensitive to MAUP than correlation coefficients. Until a more analytical solution to MAUP is discovered, spatial sensitivity analysis using a variety of areal units is recommended as a methodology to estimate the uncertainty of correlation and regression coefficients due to ecological bias. An example of data simulation and re-aggregation using the ArcPy library is available.<ref>Swift, A. (2017). "Crime mapping data simulation", https://app.box.com/s/a84w16x7hffljjvkhtlr72eisj4qiene</ref> | ||
<ref name="Silva">{{cite journal |last1=Viegas |first1=José Manuel |last2=Martinez |first2=L. Miguel |last3=Silva |first3=Elisabete A. |title=Effects of the Modifiable Areal Unit Problem on the Delineation of Traffic Analysis Zones |journal=Environment and Planning B: Planning and Design |date=January 2009 |volume=36 |issue=4 |pages=625–643 |doi=10.1068/b34033|s2cid=54840846 }}</ref> | <ref name="Silva">{{cite journal |last1=Viegas |first1=José Manuel |last2=Martinez |first2=L. Miguel |last3=Silva |first3=Elisabete A. |title=Effects of the Modifiable Areal Unit Problem on the Delineation of Traffic Analysis Zones |journal=Environment and Planning B: Planning and Design |date=January 2009 |volume=36 |issue=4 |pages=625–643 |doi=10.1068/b34033|bibcode=2009EnPlB..36..625V |s2cid=54840846 }}</ref> | ||
In transport planning, MAUP is associated with Traffic Analysis Zoning (TAZ). A major point of departure in understanding problems in transportation analysis is the recognition that spatial analysis has some limitations associated with the discretization of space. Among them, modifiable areal units and boundary problems are directly or indirectly related to transportation planning and analysis through the design of traffic analysis zones; most transport studies require, directly or indirectly, the definition of TAZs. The modifiable boundary and the scale issues should all be given specific attention during the specification of a TAZ because of the effects these factors exert on statistical and mathematical properties of spatial patterns (i.e., the modifiable areal unit problem, MAUP). In the studies of Viegas, Martinez and Silva (2009, 2009b)<ref name="Silva" /> the authors propose a method where the results obtained from the study of spatial data are not independent of the scale, and the aggregation effects are implicit in the choice of zonal boundaries. The delineation of zonal boundaries of TAZs has a direct impact on the realism and accuracy of the results obtained from transportation forecasting models. In that work, the MAUP effects on the TAZ definition and the transportation demand models are measured and analyzed using different grids (in size and in origin location). This analysis was developed by building an application integrated in commercial GIS software and by using a case study (Lisbon Metropolitan Area) to test its implementability and performance. The results reveal the conflict between statistical and geographic precision, and their relationship with the loss of information in the traffic assignment step of the transportation planning models.<ref name="Silva" /> | In transport planning, MAUP is associated with Traffic Analysis Zoning (TAZ). 
A major point of departure in understanding problems in transportation analysis is the recognition that spatial analysis has some limitations associated with the discretization of space. Among them, modifiable areal units and boundary problems are directly or indirectly related to transportation planning and analysis through the design of traffic analysis zones; most transport studies require, directly or indirectly, the definition of TAZs. The modifiable boundary and the scale issues should all be given specific attention during the specification of a TAZ because of the effects these factors exert on statistical and mathematical properties of spatial patterns (i.e., the modifiable areal unit problem, MAUP). In the studies of Viegas, Martinez and Silva (2009, 2009b)<ref name="Silva" /> the authors propose a method where the results obtained from the study of spatial data are not independent of the scale, and the aggregation effects are implicit in the choice of zonal boundaries. The delineation of zonal boundaries of TAZs has a direct impact on the realism and accuracy of the results obtained from transportation forecasting models. In that work, the MAUP effects on the TAZ definition and the transportation demand models are measured and analyzed using different grids (in size and in origin location). This analysis was developed by building an application integrated in commercial GIS software and by using a case study (Lisbon Metropolitan Area) to test its implementability and performance. The results reveal the conflict between statistical and geographic precision, and their relationship with the loss of information in the traffic assignment step of the transportation planning models.<ref name="Silva" /> | ||
Research has also identified the modifiable areal unit problem (MAUP) to be a factor in climate action and governance by affecting coordination between national and local actors. Data scaling issues associated with MAUP may result in mismatches in climate priorities and create inequities in the outcomes of climate action, potentially undermining the effectiveness of policies designed to address climate change at different governance levels.<ref>{{Cite journal |last=Sudmant |first=Andrew |date=2024-09-01 |title=Data Scaling: Implications for Climate Action and Governance in the UK |journal=Environmental Management |language=en |volume=74 |issue=3 |pages=414–424 |doi=10.1007/s00267-024-01991-5 |issn=1432-1009 |pmc=11306386 |pmid=38811434|bibcode=2024EnMan.tmp...83S }}</ref> | |||
==See also== | ==See also== | ||
* [[Philosophy:Arbia's law of geography|Arbia's law of geography]] | * [[Philosophy:Arbia's law of geography|Arbia's law of geography]] | ||
* | * Boundary problem (in spatial analysis) | ||
* [[Earth:Modifiable temporal unit problem|Modifiable temporal unit problem]] | |||
* [[Earth:Neighborhood effect averaging problem|Neighborhood effect averaging problem]] | |||
* [[Representation theory]] | * [[Representation theory]] | ||
* [[Spatial analysis]] | * [[Spatial analysis]] | ||
* [[Earth:Uncertain geographic context problem|Uncertain geographic context problem]] | |||
* [[Reference class problem]] | |||
;Applications | |||
* [[Social:Gerrymandering|Gerrymandering]] | * [[Social:Gerrymandering|Gerrymandering]] | ||
* Red states and blue states | |||
* [[Spatial econometrics]] | * [[Spatial econometrics]] | ||
* [[Medicine:Spatial epidemiology|Spatial epidemiology]] | * [[Medicine:Spatial epidemiology|Spatial epidemiology]] | ||
| Line 45: | Line 53: | ||
==Sources== | ==Sources== | ||
*{{cite book |last=Arbia |first=Giuseppe |year=1988 |title=Spatial data configuration in statistical analysis of regional economic and related problems|location=Dordrecht |publisher=Kluwer Academic Publishers}} | *{{cite book |last=Arbia |first=Giuseppe |year=1988 |title=Spatial data configuration in statistical analysis of regional economic and related problems|location=Dordrecht |publisher=Kluwer Academic Publishers}} | ||
* | * This article contains quotations from [http://wiki.gis.com/wiki/index.php/Modifiable_areal_unit_problem Modifiable areal unit problem] at the GIS Wiki, which is available under the [https://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported (CC BY 3.0)] license. | ||
*{{cite journal |last1=Gehlke |first1=C. E. |last2=Biehl |first2=Katherine |date=March 1934 |title=Certain effects of grouping upon the size of the correlation coefficient in census tract material |journal=Journal of the American Statistical Association |volume=29 |issue=185A |pages=169–170 |doi=10.2307/2277827 |jstor=2277827 }} | *{{cite journal |last1=Gehlke |first1=C. E. |last2=Biehl |first2=Katherine |date=March 1934 |title=Certain effects of grouping upon the size of the correlation coefficient in census tract material |journal=Journal of the American Statistical Association |volume=29 |issue=185A |pages=169–170 |doi=10.2307/2277827 |jstor=2277827 }} | ||
*{{cite book |last=Openshaw |first=Stan |year=1984 |title=The modifiable areal unit problem |location=Norwich |publisher=Geo Books |isbn=0860941345 |oclc=12052482 }} | *{{cite book |last=Openshaw |first=Stan |year=1984 |title=The modifiable areal unit problem |location=Norwich |publisher=Geo Books |isbn=0860941345 |oclc=12052482 }} | ||
| Line 55: | Line 63: | ||
==Further reading== | ==Further reading== | ||
*{{cite journal |last=Cressie |first=Noel A |year=1996 |title=Change of support and the modifiable areal unit problem |journal=Geographical Systems |volume=3 |issue=2–3 |pages=159–180 }} | *{{cite journal |last=Cressie |first=Noel A |year=1996 |title=Change of support and the modifiable areal unit problem |journal=Geographical Systems |volume=3 |issue=2–3 |pages=159–180 }} | ||
*{{cite journal |last1=Holt |first1=David |last2=Steel |first2=David |last3=Tranmer |first3=Mark |last4=Wrigley |first4=Neil |date=July 1996 |title=Aggregation and ecological effects in geographically based data |journal=Geographical Analysis |volume=28 |issue=3 |pages=244–261 |doi=10.1111/j.1538-4632.1996.tb00933.x |url=https://www.researchgate.net/publication/229876830 |doi-access=free }} | *{{cite journal |last1=Holt |first1=David |last2=Steel |first2=David |last3=Tranmer |first3=Mark |last4=Wrigley |first4=Neil |date=July 1996 |title=Aggregation and ecological effects in geographically based data |journal=Geographical Analysis |volume=28 |issue=3 |pages=244–261 |doi=10.1111/j.1538-4632.1996.tb00933.x |url=https://www.researchgate.net/publication/229876830 |doi-access=free |bibcode=1996GeoAn..28..244H }} | ||
* {{cite journal |last1=Horner |first1=Mark W. |last2=Murray |first2=Alan T. |date=January 2002 |title=Excess commuting and the modifiable areal unit problem |journal=Urban Studies |volume=39 |issue=1 |pages=131–139 |doi=10.1080/00420980220099113 |s2cid=56418131 |url=http://sustainability.water.ca.gov/documents/3380372/3384417/Excess+Commuting+and+the+Modifiable+Areal+Unit+Problem.pdf |access-date=2015-07-05 |archive-url=https://web.archive.org/web/20170422022227/https://sustainability.water.ca.gov/documents/3380372/3384417/Excess+Commuting+and+the+Modifiable+Areal+Unit+Problem.pdf |archive-date=2017-04-22 |url-status=dead }} | * {{cite journal |last1=Horner |first1=Mark W. |last2=Murray |first2=Alan T. |date=January 2002 |title=Excess commuting and the modifiable areal unit problem |journal=Urban Studies |volume=39 |issue=1 |pages=131–139 |doi=10.1080/00420980220099113 |bibcode=2002UrbSt..39..131H |s2cid=56418131 |url=http://sustainability.water.ca.gov/documents/3380372/3384417/Excess+Commuting+and+the+Modifiable+Areal+Unit+Problem.pdf |access-date=2015-07-05 |archive-url=https://web.archive.org/web/20170422022227/https://sustainability.water.ca.gov/documents/3380372/3384417/Excess+Commuting+and+the+Modifiable+Areal+Unit+Problem.pdf |archive-date=2017-04-22 |url-status=dead }} | ||
*{{cite journal |last=Kwan |first=Mei-Po |year=2012 |title=The uncertain geographic context problem |journal=Annals of the Association of American Geographers |volume=102 |issue=5 |pages=958–968 |doi=10.1080/00045608.2012.687349 |s2cid=52024592 |url=http://meipokwan.org/Paper/Kwan_UGCoP_2012.pdf }} | *{{cite journal |last=Kwan |first=Mei-Po |year=2012 |title=The uncertain geographic context problem |journal=Annals of the Association of American Geographers |volume=102 |issue=5 |pages=958–968 |doi=10.1080/00045608.2012.687349 |bibcode=2012AAAG..102..958K |s2cid=52024592 |url=http://meipokwan.org/Paper/Kwan_UGCoP_2012.pdf }} | ||
* {{cite journal |last=Menon |first=Carlo |date=March 2012 |title=The bright side of MAUP: defining new measures of industrial agglomeration |journal=[[Earth:Papers in Regional Science|Papers in Regional Science]] |volume=91 |issue=1 |pages=3–28 |doi=10.1111/j.1435-5957.2011.00350.x |url= | * {{cite journal |last=Menon |first=Carlo |date=March 2012 |title=The bright side of MAUP: defining new measures of industrial agglomeration |journal=[[Earth:Papers in Regional Science|Papers in Regional Science]] |volume=91 |issue=1 |pages=3–28 |doi=10.1111/j.1435-5957.2011.00350.x |bibcode=2012PRegS..91....3M |url=https://core.ac.uk/download/pdf/6234148.pdf }} | ||
*{{cite journal |last=Unwin |first=David J |date=December 1996 |title=GIS, spatial analysis and spatial statistics |journal=[[Earth:Progress in Human Geography|Progress in Human Geography]] |volume=20 |issue=4 |pages=540–551 |doi=10.1177/030913259602000408 |s2cid=129487607 |url=https://www.researchgate.net/publication/237238258 }} | *{{cite journal |last=Unwin |first=David J |date=December 1996 |title=GIS, spatial analysis and spatial statistics |journal=[[Earth:Progress in Human Geography|Progress in Human Geography]] |volume=20 |issue=4 |pages=540–551 |doi=10.1177/030913259602000408 |s2cid=129487607 |url=https://www.researchgate.net/publication/237238258 }} | ||
*{{cite book |last=Wong |first=David |year=2009 |chapter=The modifiable areal unit problem (MAUP) |editor1-last=Fotheringham |editor1-first=A Stewart |editor2-last=Rogerson |editor2-first=Peter |title=The SAGE handbook of spatial analysis |location=Los Angeles |publisher=Sage |pages=105–124 |isbn=9781412910828 |oclc=85898184 |chapter-url=https://books.google.com/books?id=phEgXfbCU_YC&pg=PA105 }} | *{{cite book |last=Wong |first=David |year=2009 |chapter=The modifiable areal unit problem (MAUP) |editor1-last=Fotheringham |editor1-first=A Stewart |editor2-last=Rogerson |editor2-first=Peter |title=The SAGE handbook of spatial analysis |location=Los Angeles |publisher=Sage |pages=105–124 |isbn=9781412910828 |oclc=85898184 |chapter-url=https://books.google.com/books?id=phEgXfbCU_YC&pg=PA105 }} | ||
| Line 66: | Line 74: | ||
{{DEFAULTSORT:Modifiable areal unit problem}} | {{DEFAULTSORT:Modifiable areal unit problem}} | ||
[[Category:Geographic information systems]] | [[Category:Geographic information systems]] | ||
{{Sourceattribution|Modifiable areal unit problem | {{Sourceattribution|Modifiable areal unit problem}} | ||
Latest revision as of 17:29, 25 May 2026

The modifiable areal unit problem (MAUP) is a source of statistical bias that can significantly impact the results of statistical hypothesis tests. The MAUP affects results when point-based measures of spatial phenomena are aggregated into spatial partitions or areal units (such as regions or districts) as in, for example, population density or illness rates.[1][2] The resulting summary values (e.g., totals, rates, proportions, densities) are influenced by both the shape and scale of the aggregation unit.[3]
For example, census data may be aggregated into county districts, census tracts, postcode areas, police precincts, or any other arbitrary spatial partition. Thus, the results of data aggregation are dependent on the mapmaker's choice of which "modifiable areal unit" to use in their analysis. A census choropleth map calculating population density using state boundaries will yield radically different results from a map that calculates density based on county boundaries. Furthermore, census district boundaries are also subject to change over time,[4] meaning the MAUP must be considered when comparing past to current data.
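The dependence on the chosen areal unit can be illustrated with a minimal sketch. The 4 km × 4 km grid, the resident counts, and the `density_by_zone` helper below are all hypothetical and are not taken from any cited source; they only show how the same point counts yield different density maps under different partitions.

```python
from collections import defaultdict

# Hypothetical residents per 1 km x 1 km cell; everyone lives along the western edge.
COUNTS = [
    [40, 0, 0, 0],
    [40, 0, 0, 0],
    [40, 0, 0, 0],
    [40, 0, 0, 0],
]


def density_by_zone(counts, zone_of):
    """Return {zone: residents per km^2}; zone_of(row, col) labels each cell."""
    people = defaultdict(int)
    area = defaultdict(int)
    for r, row in enumerate(counts):
        for c, n in enumerate(row):
            zone = zone_of(r, c)
            people[zone] += n
            area[zone] += 1  # each cell covers 1 km^2
    return {zone: people[zone] / area[zone] for zone in sorted(people)}


# Three partitions of the same data: four north-south strips,
# four east-west strips, and a single zone covering everything.
columns = density_by_zone(COUNTS, lambda r, c: f"col{c}")
rows = density_by_zone(COUNTS, lambda r, c: f"row{r}")
whole = density_by_zone(COUNTS, lambda r, c: "all")

print(columns)  # one dense strip and three empty ones
print(rows)     # four strips with identical density
print(whole)    # a single average for the whole area
```

The two strip partitions use zones of the same size (4 km² each), yet one map shows a single dense strip next to three empty ones while the other shows a uniform density everywhere.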
Background
This issue was first recognized by Gehlke and Biehl in 1934,[5] and later described in detail in an entry in the Concepts and Techniques in Modern Geography (CATMOG) series by Stan Openshaw (1984) as well as in the book by Giuseppe Arbia (1988). In particular, Openshaw (1984) observed that "the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating."[6] The problem is especially apparent when the aggregate data are used for cluster analysis for spatial epidemiology, spatial statistics or choropleth mapping, in which misinterpretations can easily go unnoticed. Many fields of science, especially human geography, are prone to disregard the MAUP when drawing inferences from statistics based on aggregated data.[2] MAUP is closely related to the topic of ecological fallacy and ecological bias (Arbia, 1988). Stan Openshaw's work on this topic has led to Michael F. Goodchild suggesting it be referred to as the "Openshaw effect."[7]
Ecological bias caused by MAUP has been documented as two separate effects that usually occur simultaneously during the analysis of aggregated data. First, the scale effect causes variation in statistical results between different levels of aggregation (radial distance), so the association between variables depends on the size of the areal units for which data are reported. Generally, correlation increases as areal unit size increases. Second, the zoning effect describes variation in correlation statistics caused by regrouping data into different configurations at the same scale (areal shape).[8]
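Both effects can be reproduced with a small simulation. The following is a minimal sketch on assumed synthetic data, not a method from the cited literature: two variables share a spatial trend along a one-dimensional transect but have independent noise, and zones are runs of consecutive locations. The helper names `pearson` and `zone_means`, the seed, and all parameter values are illustrative choices.

```python
import random
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def zone_means(xs, ys, size, offset=0):
    """Average x and y within zones of `size` consecutive locations.

    `offset` shifts the zone boundaries without changing the zone size.
    """
    zones = defaultdict(list)
    for i, (x, y) in enumerate(zip(xs, ys)):
        zones[(i + offset) // size].append((x, y))
    mean_x = [sum(x for x, _ in m) / len(m) for m in zones.values()]
    mean_y = [sum(y for _, y in m) / len(m) for m in zones.values()]
    return mean_x, mean_y

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 1200
trend = [2.0 * i / n for i in range(n)]      # shared spatial trend
xs = [t + rng.gauss(0, 1) for t in trend]    # independent noise on x
ys = [t + rng.gauss(0, 1) for t in trend]    # independent noise on y

# Scale effect: same data, increasingly large zones.
r_individual = pearson(xs, ys)
r_by_scale = {size: pearson(*zone_means(xs, ys, size)) for size in (10, 50, 100)}

# Zoning effect: same zone size, boundaries shifted by half a zone.
r_shifted = pearson(*zone_means(xs, ys, 100, offset=50))

print(f"individual level: r = {r_individual:.3f}")
for size, r in r_by_scale.items():
    print(f"zones of {size:>3}: r = {r:.3f}")
print(f"zones of 100, shifted boundaries: r = {r_shifted:.3f}")
```

In this setup averaging cancels much of the independent noise while the shared trend survives, which is why the correlation tends to rise with zone size. Shifting the boundaries changes the coefficient even though the scale is unchanged.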
Since the 1930s, research has found extra variation in statistical results because of the MAUP. The standard methods of calculating within-group and between-group variance do not account for the extra variance seen in MAUP studies as the groupings change. The MAUP can be used as a methodology to calculate upper and lower limits as well as average regression parameters for multiple sets of spatial groupings. The MAUP is a critical source of error in spatial studies, whether observational or experimental. As such, unit consistency, particularly in a time-series cross-sectional (TSCS) context, is essential. Furthermore, robustness checks of unit sensitivity to alternative spatial aggregation should be routinely performed to mitigate associated biases on resulting statistical estimates.
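One way to read the idea of upper and lower limits is to refit the same regression over many alternative groupings and summarize the spread of the estimates. The sketch below does this for an ordinary least squares slope on assumed synthetic data; the zoning scheme (runs of consecutive locations with varying size and offset), the helper names, and the parameter values are all illustrative assumptions rather than a procedure taken from the literature cited here.

```python
import random
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def zone_means(xs, ys, size, offset):
    """Mean of x and y per zone; a zone is a run of `size` locations."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i, (x, y) in enumerate(zip(xs, ys)):
        s = sums[(i + offset) // size]
        s[0] += x
        s[1] += y
        s[2] += 1
    zx = [s[0] / s[2] for s in sums.values()]
    zy = [s[1] / s[2] for s in sums.values()]
    return zx, zy

rng = random.Random(7)  # fixed seed so the run is repeatable
n = 600
xs = [3.0 * i / n + rng.gauss(0, 1) for i in range(n)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]  # assumed individual-level slope of 0.5

# Refit the regression for several zone sizes and boundary offsets.
slopes = []
for size in (10, 20, 30, 60):
    for offset in range(0, size, size // 5):
        zx, zy = zone_means(xs, ys, size, offset)
        slopes.append(ols_slope(zx, zy))

summary = {
    "lower": min(slopes),
    "upper": max(slopes),
    "mean": sum(slopes) / len(slopes),
}
print(summary)
```

The range between `lower` and `upper` gives a rough sense of how much the estimate depends on the grouping alone, which is the kind of robustness check on unit sensitivity described above.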

Suggested solutions
Several suggestions have been made in the literature to reduce aggregation bias during regression analysis. A researcher might correct the variance-covariance matrix using samples from individual-level data.[9] Alternatively, one might focus on local spatial regression rather than global regression. A researcher might also attempt to design areal units to maximize a particular statistical result.[6] Others have argued that it may be difficult to construct a single set of optimal aggregation units for multiple variables, each of which may exhibit non-stationarity and spatial autocorrelation across space in different ways. Others have suggested developing statistics that change across scales in a predictable way, perhaps using fractal dimension as a scale-independent measure of spatial relationships. Others have suggested Bayesian hierarchical models as a general methodology for combining aggregated and individual-level data for ecological inference.
Studies of the MAUP based on empirical data can only provide limited insight due to an inability to control relationships between multiple spatial variables. Data simulation is necessary to have control over various properties of individual-level data. Simulation studies have demonstrated that the spatial support of variables can affect the magnitude of ecological bias caused by spatial data aggregation.[10]
MAUP sensitivity analysis
Using simulations for univariate data, Larsen advocated the use of a Variance Ratio to investigate the effect of spatial configuration, spatial association, and data aggregation.[11] A detailed description of the variation of statistics due to MAUP is presented by Reynolds, who demonstrates the importance of the spatial arrangement and spatial autocorrelation of data values.[12] Reynolds's simulation experiments were expanded by Swift in a series of nine exercises that began with simulated regression analysis and spatial trend, then focused on MAUP in the context of spatial epidemiology. A method of MAUP sensitivity analysis is presented that demonstrates that the MAUP is not entirely a problem.[10] MAUP can be used as an analytical tool to help understand spatial heterogeneity and spatial autocorrelation.
This topic is of particular importance because in some cases data aggregation can obscure a strong correlation between variables, making the relationship appear weak or even negative. Conversely, MAUP can make unrelated random variables appear to be significantly associated. Multivariate regression parameters are more sensitive to MAUP than correlation coefficients. Until a more analytical solution to MAUP is discovered, spatial sensitivity analysis using a variety of areal units is recommended as a methodology to estimate the uncertainty of correlation and regression coefficients due to ecological bias. An example of data simulation and re-aggregation using the ArcPy library is available.[13][14]
In transport planning, MAUP is associated with the delineation of traffic analysis zones (TAZs). A major point of departure in understanding problems in transportation analysis is the recognition that spatial analysis has limitations associated with the discretization of space. Among them, modifiable areal units and boundary problems are directly or indirectly related to transportation planning and analysis through the design of traffic analysis zones, since most transport studies require, directly or indirectly, the definition of TAZs. The modifiable boundary and scale issues should both be given specific attention during the specification of a TAZ because of the effects these factors exert on the statistical and mathematical properties of spatial patterns (i.e., the MAUP). Viegas, Martinez and Silva (2009a, 2009b)[14] note that the results obtained from the study of spatial data are not independent of scale, and that aggregation effects are implicit in the choice of zonal boundaries. The delineation of zonal boundaries of TAZs has a direct impact on the realism and accuracy of the results obtained from transportation forecasting models. In that work, the MAUP effects on the TAZ definition and on transportation demand models are measured and analyzed using different grids (varying in size and in origin location). The analysis was developed by building an application integrated in commercial GIS software and by using a case study (Lisbon Metropolitan Area) to test its implementability and performance. The results reveal the conflict between statistical and geographic precision, and their relationship with the loss of information in the traffic assignment step of transportation planning models.[14]
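The idea of comparing grids that differ in cell size and in origin location can be sketched in a few lines. This is not the GIS application described in the cited study; it is a minimal illustration with invented trip origins, and the function name `grid_counts` and all coordinates are assumptions.

```python
from collections import Counter
from math import floor

def grid_counts(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square grid cell for a given cell size and grid origin."""
    ox, oy = origin
    return Counter(
        (floor((x - ox) / cell_size), floor((y - oy) / cell_size))
        for x, y in points
    )

# Hypothetical trip origins (coordinates in km); values are illustrative only.
trips = [(0.5, 0.5), (0.9, 0.9), (1.1, 1.1), (1.5, 0.5), (2.5, 2.5), (2.9, 0.1)]

coarse = grid_counts(trips, cell_size=2.0)                         # 2 km cells
fine = grid_counts(trips, cell_size=1.0)                           # 1 km cells
shifted = grid_counts(trips, cell_size=2.0, origin=(-1.0, -1.0))   # 2 km cells, moved origin

print("coarse :", dict(coarse))
print("fine   :", dict(fine))
print("shifted:", dict(shifted))
```

With these six points the 2 km grid puts four trips in a single zone, the 1 km grid puts at most two in any zone, and moving the origin of the 2 km grid by 1 km spreads the trips evenly at two per zone. Zone-level trip totals fed into a demand model would therefore differ in all three cases.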
Research has also identified the MAUP as a factor in climate action and governance, where it affects coordination between national and local actors. Data scaling issues associated with MAUP may result in mismatches in climate priorities and create inequities in the outcomes of climate action, potentially undermining the effectiveness of policies designed to address climate change at different governance levels.[15]
See also
- Arbia's law of geography
- Boundary problem (in spatial analysis)
- Modifiable temporal unit problem
- Neighborhood effect averaging problem
- Representation theory
- Spatial analysis
- Uncertain geographic context problem
- Reference class problem
- Applications
- Gerrymandering
- Red states and blue states
- Spatial econometrics
- Spatial epidemiology
References
- ↑ Openshaw, Stan (1983). The Modifiable Areal Unit Problem. Geo Books. ISBN 0-86094-134-5. https://alexsingleton.files.wordpress.com/2014/09/38-maup-openshaw.pdf.
- ↑ 2.0 2.1 Chen, Xiang; Ye, Xinyue; Widener, Michael J.; Delmelle, Eric; Kwan, Mei-Po; Shannon, Jerry; Racine, Racine F.; Adams, Aaron et al. (27 December 2022). "A systematic review of the modifiable areal unit problem (MAUP) in community food environmental research". Urban Informatics 1 (1): 22. doi:10.1007/s44212-022-00021-1. Bibcode: 2022UrbIn...1...22C.
- ↑ "MAUP | Definition – Esri Support GIS Dictionary". http://support.esri.com/other-resources/gis-dictionary/term/MAUP.
- ↑ Geography, US Census Bureau. "Geographic Boundary Change Notes" (in EN-US). https://www.census.gov/geo/reference/boundary-changes.html.
- ↑ Gehlke & Biehl 1934
- ↑ 6.0 6.1 Openshaw 1984, p. 3
- ↑ Goodchild, Michael F. (2022). "The Openshaw effect". International Journal of Geographical Information Science 36 (9): 1697–1698. doi:10.1080/13658816.2022.2102637. Bibcode: 2022IJGIS..36.1697G. https://www.tandfonline.com/doi/full/10.1080/13658816.2022.2102637. Retrieved 24 January 2024.
- ↑ Fotheringham, A. S.; Rogerson, P. A (2008). "The Modifiable Areal Unit Problem (MAUP)". The SAGE handbook of spatial analysis. Sage. pp. 105–124. ISBN 978-1-4129-1082-8.
- ↑ Holt D, Steel D, Tranmer M, Wrigley N. (1996). "Aggregation and ecological effects in geographically based data." Geographical Analysis 28: 244–261.
- ↑ 10.0 10.1 Swift, A., Liu, L., and Uber, J. (2008) "Reducing MAUP bias of correlation statistics between water quality and GI illness." Computers, Environment and Urban Systems 32, 134–148
- ↑ Larsen, J. (2000). "The Modifiable Areal Unit Problem: A problem or a source of spatial information?" PhD thesis, Ohio State University.
- ↑ Reynolds, H. (1998). "The Modifiable Area Unit Problem: Empirical Analysis By Statistical Simulation." PhD thesis, Department of Geography University of Toronto, http://www.badpets.net/Thesis
- ↑ Swift, A. (2017). "Crime mapping data simulation", https://app.box.com/s/a84w16x7hffljjvkhtlr72eisj4qiene
- ↑ 14.0 14.1 14.2 Viegas, José Manuel; Martinez, L. Miguel; Silva, Elisabete A. (January 2009). "Effects of the Modifiable Areal Unit Problem on the Delineation of Traffic Analysis Zones". Environment and Planning B: Planning and Design 36 (4): 625–643. doi:10.1068/b34033. Bibcode: 2009EnPlB..36..625V.
- ↑ Sudmant, Andrew (2024-09-01). "Data Scaling: Implications for Climate Action and Governance in the UK" (in en). Environmental Management 74 (3): 414–424. doi:10.1007/s00267-024-01991-5. ISSN 1432-1009. PMID 38811434. Bibcode: 2024EnMan.tmp...83S.
Sources
- Arbia, Giuseppe (1988). Spatial data configuration in statistical analysis of regional economic and related problems. Dordrecht: Kluwer Academic Publishers.
- This article contains quotations from Modifiable areal unit problem at the GIS Wiki, which is available under the Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.
- Gehlke, C. E.; Biehl, Katherine (March 1934). "Certain effects of grouping upon the size of the correlation coefficient in census tract material". Journal of the American Statistical Association 29 (185A): 169–170. doi:10.2307/2277827.
- Openshaw, Stan (1984). The modifiable areal unit problem. Norwich: Geo Books. ISBN 0860941345. OCLC 12052482.
- Unwin, D. J. (1996). "GIS, spatial analysis and spatial statistics." Progress in Human Geography. 20: 540–551.
- Cressie, N. (1996). "Change of Support and the Modifiable Areal Unit Problem." Geographical Systems 3: 159–180.
- Viegas, J., E.A. Silva, L. Martinez (2009a). "Effects of the Modifiable Areal Unit Problem on the Delineation of Traffic Analysis Zones." Environment and Planning B: Planning and Design 36 (4): 625–643.
- Viegas, J., E.A. Silva, L. Martinez (2009b). "A traffic analysis zone definition: a new methodology and algorithm." Transportation 36 (5): 6 .
Further reading
- Cressie, Noel A (1996). "Change of support and the modifiable areal unit problem". Geographical Systems 3 (2–3): 159–180.
- Holt, David; Steel, David; Tranmer, Mark; Wrigley, Neil (July 1996). "Aggregation and ecological effects in geographically based data". Geographical Analysis 28 (3): 244–261. doi:10.1111/j.1538-4632.1996.tb00933.x. Bibcode: 1996GeoAn..28..244H. https://www.researchgate.net/publication/229876830.
- Horner, Mark W.; Murray, Alan T. (January 2002). "Excess commuting and the modifiable areal unit problem". Urban Studies 39 (1): 131–139. doi:10.1080/00420980220099113. Bibcode: 2002UrbSt..39..131H. http://sustainability.water.ca.gov/documents/3380372/3384417/Excess+Commuting+and+the+Modifiable+Areal+Unit+Problem.pdf. Retrieved 2015-07-05.
- Kwan, Mei-Po (2012). "The uncertain geographic context problem". Annals of the Association of American Geographers 102 (5): 958–968. doi:10.1080/00045608.2012.687349. Bibcode: 2012AAAG..102..958K. http://meipokwan.org/Paper/Kwan_UGCoP_2012.pdf.
- Menon, Carlo (March 2012). "The bright side of MAUP: defining new measures of industrial agglomeration". Papers in Regional Science 91 (1): 3–28. doi:10.1111/j.1435-5957.2011.00350.x. Bibcode: 2012PRegS..91....3M. https://core.ac.uk/download/pdf/6234148.pdf.
- Unwin, David J (December 1996). "GIS, spatial analysis and spatial statistics". Progress in Human Geography 20 (4): 540–551. doi:10.1177/030913259602000408. https://www.researchgate.net/publication/237238258.
- Wong, David (2009). "The modifiable areal unit problem (MAUP)". in Fotheringham, A Stewart; Rogerson, Peter. The SAGE handbook of spatial analysis. Los Angeles: Sage. pp. 105–124. ISBN 9781412910828. OCLC 85898184. https://books.google.com/books?id=phEgXfbCU_YC&pg=PA105.
- Wrigley, Neil (1995). "Revisiting the modifiable areal unit problem and the ecological fallacy". in Cliff, Andrew D. Diffusing geography: essays for Peter Haggett. The Institute of British Geographers special publications series. 31. Oxford; Cambridge, Massachusetts: Blackwell. pp. 123–181. ISBN 0631195343. OCLC 30895028.
- Zhang, Ming; Kukadia, Nishant (January 2005). "Metrics of urban form and the modifiable areal unit problem". Transportation Research Record: Journal of the Transportation Research Board 1902: 71–79. doi:10.3141/1902-09.
