Scan statistic

From HandWiki

In statistics, a scan statistic or window statistic is a problem relating to the clustering of randomly positioned points. Typical examples are the maximum number of points that fall within a window of fixed length on a line, or the largest number of successes recorded by a window of fixed length moving along a sequence of trials.[1] Joseph Naus first published on the problem in the 1960s[2] and has been called the "father of the scan statistic" in honour of his early contributions.[3] The results can be applied in epidemiology, public health and astronomy to find unusual clusters of events.[4]
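The two classic quantities mentioned above can be computed directly. The following is a minimal sketch, assuming points are given as a list of real numbers and trial outcomes as a list of 0/1 values; the function names `scan_statistic_points` and `scan_statistic_trials` are hypothetical and do not come from any particular library. It only computes the observed statistic, not its null distribution, which is what Naus's approximations address.

```python
from bisect import bisect_right


def scan_statistic_points(points, w):
    """Largest number of points in any closed window [t, t + w] on the line."""
    xs = sorted(points)
    best = 0
    for i, start in enumerate(xs):
        # Some maximal window can be slid right until its left end sits on a point,
        # so it suffices to try windows that start at each observed point.
        j = bisect_right(xs, start + w)  # index just past the last point <= start + w
        best = max(best, j - i)
    return best


def scan_statistic_trials(outcomes, m):
    """Largest number of successes (1s) in any m consecutive trials."""
    window = sum(outcomes[:m])  # successes in the first window
    best = window
    for k in range(m, len(outcomes)):
        # Slide the window one step: add trial k, drop trial k - m.
        window += outcomes[k] - outcomes[k - m]
        best = max(best, window)
    return best


print(scan_statistic_points([1, 2, 2.5, 9, 10, 10.5, 11], w=2))  # 4: the window [9, 11]
print(scan_statistic_trials([1, 0, 1, 1, 0, 1, 1, 1, 0, 0], m=4))  # 3
```

Sorting plus binary search keeps the point version at O(n log n), and the running sum makes the trial version linear in the number of trials.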

Martin Kulldorff extended the scan statistic to multidimensional settings and varying window sizes in a 1997 paper,[5] which, as of October 2015, was the most cited article in its journal, Communications in Statistics – Theory and Methods.[6] This work led to the creation of the software SaTScan, a program trademarked by Martin Kulldorff that applies his methods to data.
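To illustrate scanning with windows of varying size, the sketch below simplifies Kulldorff's setting to one dimension: regions are assumed to be ordered along a line, and every contiguous run of regions holding at most half of the total population (an assumed cap) is a candidate window. The spatial version instead moves circular windows of varying radius over a map. Each window is scored with the Poisson log-likelihood ratio c·log(c/e) + (C − c)·log((C − c)/(C − e)), taken as zero unless the observed count c exceeds the expected count e, where C is the total number of cases. Significance of the maximum is judged by Monte Carlo replication under the null of cases distributed in proportion to population. All function names are hypothetical, and this is not SaTScan's implementation.

```python
import math
import random


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one window; 0 unless c exceeds its expectation e."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total - c > 0:  # the term (C - c) log(...) tends to 0 when c == C
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def max_window_llr(cases, pop, max_frac=0.5):
    """Scan every contiguous run of regions holding at most max_frac of the population."""
    total_cases, total_pop = sum(cases), sum(pop)  # populations assumed positive
    best, best_window = 0.0, None
    for i in range(len(cases)):
        c = p = 0
        for j in range(i, len(cases)):
            c += cases[j]
            p += pop[j]
            if p > max_frac * total_pop:
                break  # windows only grow from here, so stop extending
            llr = poisson_llr(c, total_cases * p / total_pop, total_cases)
            if llr > best:
                best, best_window = llr, (i, j)
    return best, best_window


def monte_carlo_p(cases, pop, n_sim=999, seed=0):
    """Rank the observed maximum among maxima from data simulated under the null."""
    rng = random.Random(seed)
    observed, window = max_window_llr(cases, pop)
    total, n = sum(cases), len(cases)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * n
        # Null model: each case falls in a region with probability proportional to population.
        for k in rng.choices(range(n), weights=pop, k=total):
            sim[k] += 1
        if max_window_llr(sim, pop)[0] >= observed:
            exceed += 1
    return window, observed, (exceed + 1) / (n_sim + 1)


window, llr, p = monte_carlo_p([2, 3, 2, 15, 14, 3, 2, 2, 3, 2], [100] * 10)
print(window, round(llr, 2), p)  # window (3, 4) covers the two high-count regions
```

Because the maximum is taken over many overlapping windows, its null distribution is not a standard chi-square, which is why the p-value comes from replication rather than a closed form.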

Recent results have shown that using scale-dependent critical values for the scan statistic makes it possible to attain asymptotically optimal detection simultaneously for all signal lengths, thereby improving on the traditional scan; however, this procedure has been criticized for losing too much power for short signals. Walther and Perry (2022) considered the problem of detecting an elevated mean on an interval with unknown location and length in the univariate Gaussian sequence model.[7] They explain this discrepancy between asymptotic optimality and poor short-signal power by showing that such asymptotic optimality results are necessarily too imprecise to discern the performance of scan statistics in a practically relevant way, even for large samples. Instead, they propose assessing performance with a new finite-sample criterion, and they present three new calibration techniques for scan statistics that perform well across a range of relevant signal lengths while improving performance for short signals.
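The difference between a single critical value and a scale-dependent one can be sketched in the Gaussian sequence model, assuming unit noise variance and data X_i = μ·1{i ∈ I} + Z_i. The traditional scan maximizes the standardized sum S_I / √|I| over all intervals I. As an assumed example of scale-dependent calibration, the penalized variant subtracts √(2 log(e·n/|I|)) from each interval's score, so short intervals must clear a higher bar; this penalty is one commonly cited choice and is not one of the three calibrations proposed by Walther and Perry. Critical values are estimated by simulation under the null. Function names are hypothetical.

```python
import math
import random


def scan(x, penalized=False):
    """Maximize the standardized sum S_I / sqrt(|I|) over all intervals I = [i, j)."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    best, best_iv = -math.inf, None
    for i in range(n):
        for j in range(i + 1, n + 1):
            length = j - i
            stat = (prefix[j] - prefix[i]) / math.sqrt(length)
            if penalized:
                # Assumed scale-dependent penalty: larger for shorter intervals.
                stat -= math.sqrt(2.0 * math.log(math.e * n / length))
            if stat > best:
                best, best_iv = stat, (i, j)
    return best, best_iv


def null_critical_value(n, alpha=0.05, n_sim=200, penalized=False, seed=0):
    """Monte Carlo (1 - alpha) quantile of the scan statistic when every mean is zero."""
    rng = random.Random(seed)
    maxima = sorted(
        scan([rng.gauss(0.0, 1.0) for _ in range(n)], penalized)[0]
        for _ in range(n_sim)
    )
    return maxima[min(n_sim - 1, int((1 - alpha) * n_sim))]


x = [0.0] * 40 + [3.0] * 20 + [0.0] * 40  # elevated mean on indices 40..59
print(scan(x))                  # interval (40, 60) with score 3 * sqrt(20)
print(scan(x, penalized=True))  # same interval, lower score after the penalty
```

The double loop is O(n²) and is meant only for small examples; it makes the trade-off visible, since the penalty lowers the bar for long intervals relative to short ones, which is exactly where the criticism about short signals comes from.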

Scan-statistic-based methods have also been developed specifically to detect rare-variant associations in the noncoding genome, especially in intergenic regions. Compared with fixed-size sliding-window analysis, these methods scan the genome continuously with dynamic windows of data-adaptive size, increasing power by flexibly selecting the locations and sizes of signal regions.[8] Examples include Q-SCAN,[9] SCANG[10] and WGScan.[11]
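As a rough illustration of scanning with windows of varying size and keeping several signal regions, the sketch below assumes each variant has already been reduced to a score that is independent standard normal under the null (a strong simplification; real analyses must account for linkage disequilibrium and covariates). Every window of between `l_min` and `l_max` consecutive variants is scored by its sum of squared scores, standardized by the chi-square mean L and variance 2L, and non-overlapping windows above a threshold are kept greedily in order of score. This is not the algorithm of Q-SCAN, SCANG or WGScan; the names and the greedy rule are hypothetical, and in practice the threshold would be calibrated, for example by simulation under the null.

```python
import math


def window_stats(z, l_min, l_max):
    """Standardized chi-square score for every window of l_min..l_max consecutive variants."""
    n = len(z)
    prefix = [0.0]
    for v in z:
        prefix.append(prefix[-1] + v * v)  # running sum of squared scores
    out = []
    for length in range(l_min, min(l_max, n) + 1):
        for i in range(n - length + 1):
            q = prefix[i + length] - prefix[i]
            # Under the null, q ~ chi-square(length): mean length, variance 2 * length.
            out.append(((q - length) / math.sqrt(2 * length), i, i + length))
    return out


def detect_regions(z, l_min, l_max, threshold):
    """Greedily keep the highest-scoring non-overlapping windows [i, j) above threshold."""
    chosen = []
    for s, i, j in sorted(window_stats(z, l_min, l_max), reverse=True):
        if s <= threshold:
            break  # remaining windows score no higher
        if all(j <= a or i >= b for _, a, b in chosen):
            chosen.append((s, i, j))
    return sorted(chosen, key=lambda t: t[1])


z = [0.0] * 50
z[10:15] = [4.0] * 5  # two hypothetical signal regions
z[30:35] = [4.0] * 5
for s, i, j in detect_regions(z, l_min=3, l_max=10, threshold=5.0):
    print(i, j, round(s, 2))  # windows [10, 15) and [30, 35)
```

Letting the window length range over `l_min..l_max` is what distinguishes this from a fixed-size sliding window: the scan can match each region's extent instead of splitting or diluting it.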


References

  1. Naus, J. I. (1982). "Approximations for Distributions of Scan Statistics". Journal of the American Statistical Association 77 (377): 177–183. doi:10.1080/01621459.1982.10477783. 
  2. Naus, Joseph Irwin (1964). Clustering of random points in line and plane (PhD thesis). Retrieved 6 January 2014.
  3. Wallenstein, S. (2009). "Joseph Naus: Father of the Scan Statistic". Scan Statistics. pp. 1–25. doi:10.1007/978-0-8176-4749-0_1. ISBN 978-0-8176-4748-3. 
  4. Glaz, J.; Naus, J.; Wallenstein, S. (2001). "Introduction". Scan Statistics. Springer Series in Statistics. pp. 3–9. doi:10.1007/978-1-4757-3460-7_1. ISBN 978-1-4419-3167-2. 
  5. Kulldorff, Martin (1997). "A spatial scan statistic". Communications in Statistics – Theory and Methods 26 (6): 1481–1496. doi:10.1080/03610929708831995. http://www.satscan.org/papers/k-cstm1997.pdf. 
  6. "Most Cited Articles". Communications in Statistics – Theory and Methods. http://www.tandfonline.com/action/showMostCitedArticles?journalCode=lsta20&. Retrieved 11 October 2015. 
  7. Walther, Guenther; Perry, Andrew (November 2022). "Calibrating the scan statistic: Finite sample performance versus asymptotics". Journal of the Royal Statistical Society: Series B (Statistical Methodology) 84 (5): 1608–1639. doi:10.1111/rssb.12549. ISSN 1369-7412. https://onlinelibrary.wiley.com/doi/10.1111/rssb.12549. 
  8. Li, Zilin; Li, Xihao; Zhou, Hufeng; Gaynor, Sheila M.; Selvaraj, Margaret Sunitha; Arapoglou, Theodore; Quick, Corbin; Liu, Yaowu et al. (2022). "A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies". Nature Methods 19 (12): 1599–1611. doi:10.1038/s41592-022-01640-x. PMID 36303018. 
  9. Li, Zilin; Liu, Yaowu; Lin, Xihong (2022). "Simultaneous Detection of Signal Regions Using Quadratic Scan Statistics With Applications to Whole Genome Association Studies". Journal of the American Statistical Association 117 (538): 823–834. doi:10.1080/01621459.2020.1822849. PMID 35845434. 
  10. Li, Zilin; Li, Xihao; Liu, Yaowu; Shen, Jincheng; Chen, Han; Zhou, Hufeng; Morrison, Alanna C.; Boerwinkle, Eric et al. (2019). "Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole-Genome Sequencing Studies". American Journal of Human Genetics 104 (5): 802–814. doi:10.1016/j.ajhg.2019.03.002. PMID 30982610. 
  11. He, Zihuai; Xu, Bin; Buxbaum, Joseph; Ionita-Laza, Iuliana (2019). "A genome-wide scan statistic framework for whole-genome sequence data analysis". Nature Communications 10 (1): 3018. doi:10.1038/s41467-019-11023-0. PMID 31289270. 

External links

  • SaTScan free software for the spatial, temporal and space-time scan statistics