Anomaly detection

From HandWiki

In various domains such as statistics, signal processing, finance, econometrics, manufacturing, networking and data mining, anomaly detection (also outlier detection[1]) is the identification of rare items, events or observations that raise suspicion by differing significantly from the majority of the data.[1] Typically, the anomalous items indicate some kind of problem, such as bank fraud, a structural defect, a medical problem or an error in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.[2]

In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular, unsupervised methods) will fail on such data unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.[3]
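The role of aggregation can be illustrated with a minimal sketch, assuming that events (for example, login attempts) arrive as timestamps in seconds. No individual event is rare, so a per-event detector sees nothing unusual; after counting events per fixed time window, a burst stands out against the typical per-window volume. The function name `bursty_windows`, the 60-second window and the "5 times the median count" rule are illustrative assumptions, not a method from the cited work.

```python
from collections import Counter
from statistics import median


def bursty_windows(timestamps, window=60, factor=5.0):
    """Hypothetical helper: return start times of windows whose event count
    exceeds `factor` times the median per-window count."""
    if not timestamps:
        return []
    # Aggregate raw events into fixed-width time buckets.
    counts = Counter(int(t // window) for t in timestamps)
    first, last = min(counts), max(counts)
    # Include empty buckets so quiet periods contribute to the baseline.
    series = [counts.get(b, 0) for b in range(first, last + 1)]
    baseline = median(series)
    # A burst is a bucket far above the typical per-window volume
    # (floor of 1 so a mostly empty series does not flag every event).
    return [
        (first + i) * window
        for i, c in enumerate(series)
        if c > factor * max(baseline, 1)
    ]
```

For ten minutes of steady traffic (one event every 10 seconds) plus 100 extra events between seconds 300 and 350, only the window starting at second 300 is reported.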

Three broad categories of anomaly detection techniques exist.[4] Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit least well with the remainder of the data set. Supervised anomaly detection techniques require a data set whose instances have been labeled as "normal" or "abnormal", and involve training a classifier; the key difference from many other statistical classification problems is the inherently imbalanced nature of outlier detection. Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the learned model.
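The semi-supervised approach can be sketched as follows, under the simplifying assumption that each feature of normal data is independently Gaussian. The model is fitted only on normal training rows, and a test instance is flagged when its log-likelihood under that model falls below the lowest log-likelihood seen in training. The class name `GaussianNormalModel`, the diagonal-Gaussian model and this threshold rule are illustrative choices; real systems use many other density models and thresholds.

```python
import math
from statistics import fmean, pstdev


class GaussianNormalModel:
    """Hypothetical semi-supervised detector: a diagonal Gaussian fitted to normal-only data."""

    def fit(self, normal_rows):
        columns = list(zip(*normal_rows))
        self.means = [fmean(col) for col in columns]
        # Small floor avoids division by zero for constant features.
        self.stds = [max(pstdev(col), 1e-9) for col in columns]
        # Illustrative threshold: the lowest log-likelihood among the normal training rows.
        self.threshold = min(self.log_likelihood(r) for r in normal_rows)
        return self

    def log_likelihood(self, row):
        # Sum of per-feature Gaussian log-densities (independence assumption).
        return sum(
            -0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
            for x, m, s in zip(row, self.means, self.stds)
        )

    def is_anomaly(self, row):
        # Anomalous if the learned model of normal behavior finds the row less likely
        # than anything it was trained on.
        return self.log_likelihood(row) < self.threshold
```

After `GaussianNormalModel().fit(normal_rows)`, points near the bulk of the training data pass, while a point such as `(10.0, 10.0)` far from a cluster around the origin is flagged.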


Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances. It is also often used in preprocessing to remove anomalous data from a data set. In supervised learning, removing anomalous instances from the training data often results in a statistically significant increase in accuracy.[5][6]
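One such preprocessing filter can be sketched as a single pass of Wilson-style edited nearest-neighbor editing, the family of rules examined in [5]: a training instance is discarded when its label disagrees with the majority label of its k nearest neighbors. The function name `edited_nearest_neighbor`, the Euclidean distance, k = 3 and the brute-force neighbor search are illustrative assumptions, not the exact procedure of the cited papers.

```python
import math
from collections import Counter


def edited_nearest_neighbor(X, y, k=3):
    """Hypothetical filter: keep only instances whose label agrees with the
    majority label of their k nearest neighbors (Euclidean distance)."""
    keep = []
    for i, xi in enumerate(X):
        # Brute-force search over all other instances: O(n^2) overall.
        neighbors = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in neighbors)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

Applied to two well-separated classes where one point labeled `"b"` sits inside the `"a"` cluster, the filter drops that point and keeps the rest; a classifier is then trained on the edited set. An odd k avoids voting ties in the two-class case.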

Popular techniques

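Several anomaly detection techniques have been proposed in the literature.[7] Some of the popular techniques are:

  • Distance-based techniques, such as k-nearest-neighbor outlier detection.[8][9][10]
  • Density-based techniques, such as the local outlier factor[11] and generalized notions of locality.[14]
  • Isolation forests.[12][13]
  • Outlier detection in axis-parallel[15] and arbitrarily oriented[16] subspaces of high-dimensional data.[18]
  • Tensor-based anomaly detection.[17]
  • One-class support vector machines.[19]
  • Neural-network approaches: replicator neural networks,[20] variational autoencoders[21] and long short-term memory networks.[22]
  • Cluster analysis-based outlier detection.[23][24]
  • Ensemble techniques, using feature bagging,[25][26] score interpretation and normalization,[27][28] and data perturbation.[29][30]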

The performance of different methods depends heavily on the data set and parameters, and methods show little systematic advantage over one another when compared across many data sets and parameters.[31][32]
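Parameter sensitivity can be seen even in a simple distance-based score. The sketch below scores each point by the distance to its k-th nearest neighbor, the score used by Ramaswamy et al.[9], with a brute-force search. On a grid of normal points plus one lone point and a tight group of three points, k = 1 ranks the lone point as most outlying and gives the group the lowest scores of all, while k = 3 makes the group dominate. The function name `knn_outlier_scores` and the toy data are illustrative assumptions.

```python
import math


def knn_outlier_scores(points, k):
    """Hypothetical helper: score each point by the distance to its k-th nearest
    neighbor (larger score = more outlying). Brute force, O(n^2 log n)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data: a 5x5 grid of normal points, a tight micro-cluster, and one lone point.
grid = [(x, y) for x in range(5) for y in range(5)]
micro = [(20.0, 20.0), (20.0, 20.1), (20.1, 20.0)]
lone = [(7.0, 7.0)]
points = grid + micro + lone
```

With `k=1`, each member of `micro` has a neighbor only 0.1 away and looks perfectly normal; with `k=3`, their third neighbor lies far away and they receive the three highest scores.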

Application to data security

Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.[33] Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning.[34] Types of statistics proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations.[35] The counterpart of anomaly detection in intrusion detection is misuse detection, which looks for known patterns of attack rather than for deviations from normal behavior.
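The threshold-and-statistics approach can be sketched as a per-subject profile in the spirit of the mean and standard deviation model described by Denning[33]: past observations of one metric (for example, logins per day for a user) define a mean and standard deviation, and a new observation is flagged when it lies more than d standard deviations from the mean. The class name `ProfileThreshold`, the default d = 3 and the single-metric profile are illustrative assumptions.

```python
from statistics import fmean, stdev


class ProfileThreshold:
    """Hypothetical per-subject profile: flag an observation lying more than
    d standard deviations from the subject's historical mean."""

    def __init__(self, d=3.0):
        self.d = d
        self.history = {}  # subject -> list of past (normal) observations

    def observe(self, subject, value):
        """Record a value as part of this subject's normal history."""
        self.history.setdefault(subject, []).append(value)

    def is_anomalous(self, subject, value):
        past = self.history.get(subject, [])
        if len(past) < 2:
            return False  # too little history to estimate a spread
        mean, sd = fmean(past), stdev(past)
        return abs(value - mean) > self.d * sd
```

A user whose daily login count has hovered between 9 and 13 is not flagged at 12 but is flagged at 40; a subject with no history is never flagged, which is one of the design choices a real IDS would need to revisit.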

Software

  • ELKI is an open-source Java data mining toolkit that contains several anomaly detection algorithms, as well as index acceleration for them.


References


  1. Zimek, Arthur; Schubert, Erich (2017). "Outlier Detection". Springer New York. pp. 1–5. doi:10.1007/978-1-4899-7993-3_80719-1. ISBN 9781489979933.
  2. Hodge, V. J.; Austin, J. (2004). "A Survey of Outlier Detection Methodologies". Artificial Intelligence Review 22 (2): 85–126. doi:10.1007/s10462-004-4304-y. 
  3. Dokas, Paul; Ertoz, Levent; Kumar, Vipin; Lazarevic, Aleksandar; Srivastava, Jaideep; Tan, Pang-Ning (2002). "Data mining for network intrusion detection". Proceedings NSF Workshop on Next Generation Data Mining. 
  4. Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM Computing Surveys 41 (3): 1–58. doi:10.1145/1541880.1541882. 
  5. Tomek, Ivan (1976). "An Experiment with the Edited Nearest-Neighbor Rule". IEEE Transactions on Systems, Man, and Cybernetics 6 (6): 448–452. doi:10.1109/TSMC.1976.4309523. 
  6. Smith, M. R.; Martinez, T. (2011). "Improving classification accuracy by identifying and removing instances that should be misclassified". The 2011 International Joint Conference on Neural Networks. pp. 2690. doi:10.1109/IJCNN.2011.6033571. ISBN 978-1-4244-9635-8. 
  7. Zimek, Arthur; Filzmoser, Peter (2018). "There and back again: Outlier detection between statistical reasoning and data mining algorithms". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (6): e1280. doi:10.1002/widm.1280. ISSN 1942-4787. 
  8. Knorr, E. M.; Ng, R. T.; Tucakov, V. (2000). "Distance-based outliers: Algorithms and applications". The VLDB Journal the International Journal on Very Large Data Bases 8 (3–4): 237–253. doi:10.1007/s007780050006. 
  9. Ramaswamy, S.; Rastogi, R.; Shim, K. (2000). "Efficient algorithms for mining outliers from large data sets". Proceedings of the 2000 ACM SIGMOD international conference on Management of data – SIGMOD '00. pp. 427. doi:10.1145/342009.335437. ISBN 1-58113-217-4. 
  10. Angiulli, F.; Pizzuti, C. (2002). "Fast Outlier Detection in High Dimensional Spaces". Principles of Data Mining and Knowledge Discovery. 2431. pp. 15. doi:10.1007/3-540-45681-3_2. ISBN 978-3-540-44037-6. 
  11. Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). "LOF: Identifying Density-based Local Outliers". pp. 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4. 
  12. Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (December 2008). "Isolation Forest". 2008 Eighth IEEE International Conference on Data Mining. pp. 413–422. doi:10.1109/ICDM.2008.17. ISBN 9780769535029.
  13. Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (March 2012). "Isolation-Based Anomaly Detection". ACM Transactions on Knowledge Discovery from Data 6 (1): 1–39. doi:10.1145/2133360.2133363.
  14. Schubert, E.; Zimek, A.; Kriegel, H.-P. (2012). "Local outlier detection reconsidered: A generalized view on locality with applications to spatial, video, and network outlier detection". Data Mining and Knowledge Discovery 28: 190–237. doi:10.1007/s10618-012-0300-z.
  15. Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2009). "Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data". Advances in Knowledge Discovery and Data Mining. 5476. pp. 831. doi:10.1007/978-3-642-01307-2_86. ISBN 978-3-642-01306-5. 
  16. Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2012). "Outlier Detection in Arbitrarily Oriented Subspaces". 2012 IEEE 12th International Conference on Data Mining. pp. 379. doi:10.1109/ICDM.2012.21. ISBN 978-1-4673-4649-8.
  17. Fanaee-T, H.; Gama, J. (2016). "Tensor-based anomaly detection: An interdisciplinary survey". Knowledge-Based Systems 98: 130–147. doi:10.1016/j.knosys.2016.01.027. 
  18. Zimek, A.; Schubert, E.; Kriegel, H.-P. (2012). "A survey on unsupervised outlier detection in high-dimensional numerical data". Statistical Analysis and Data Mining 5 (5): 363–387. doi:10.1002/sam.11161. 
  19. Schölkopf, B.; Platt, J. C.; Shawe-Taylor, J.; Smola, A. J.; Williamson, R. C. (2001). "Estimating the Support of a High-Dimensional Distribution". Neural Computation 13 (7): 1443–71. doi:10.1162/089976601750264965. PMID 11440593. 
  20. Hawkins, Simon; He, Hongxing; Williams, Graham; Baxter, Rohan (2002). "Outlier Detection Using Replicator Neural Networks". Data Warehousing and Knowledge Discovery. Lecture Notes in Computer Science. 2454. pp. 170–180. doi:10.1007/3-540-46145-0_17. ISBN 978-3-540-44123-6.
  21. J. An and S. Cho, "Variational autoencoder based anomaly detection using reconstruction probability", 2015.
  22. Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (22–24 April 2015). "Long Short Term Memory Networks for Anomaly Detection in Time Series". European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium).
  23. He, Z.; Xu, X.; Deng, S. (2003). "Discovering cluster-based local outliers". Pattern Recognition Letters 24 (9–10): 1641–1650. doi:10.1016/S0167-8655(03)00003-5. 
  24. Campello, R. J. G. B.; Moulavi, D. (2015). "Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection". ACM Transactions on Knowledge Discovery from Data 10 (1): 5:1–51. doi:10.1145/2733381. 
  25. Lazarevic, A.; Kumar, V. (2005). "Feature bagging for outlier detection". pp. 157–166. doi:10.1145/1081870.1081891. ISBN 978-1-59593-135-1.
  26. Nguyen, H. V.; Ang, H. H.; Gopalkrishnan, V. (2010). "Mining Outliers with Ensemble of Heterogeneous Detectors on Random Subspaces". Database Systems for Advanced Applications. 5981. pp. 368. doi:10.1007/978-3-642-12026-8_29. ISBN 978-3-642-12025-1. 
  27. Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2011). "Interpreting and Unifying Outlier Scores". Proceedings of the 2011 SIAM International Conference on Data Mining. pp. 13–24. doi:10.1137/1.9781611972818.2. ISBN 978-0-89871-992-5. 
  28. Schubert, E.; Wojdanowski, R.; Zimek, A.; Kriegel, H. P. (2012). "On Evaluation of Outlier Rankings and Outlier Scores". Proceedings of the 2012 SIAM International Conference on Data Mining. pp. 1047–1058. doi:10.1137/1.9781611972825.90. ISBN 978-1-61197-232-0. 
  29. Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). "Ensembles for unsupervised outlier detection". ACM SIGKDD Explorations Newsletter 15: 11–22. doi:10.1145/2594473.2594476. 
  30. Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). "Data perturbation for outlier detection ensembles". Proceedings of the 26th International Conference on Scientific and Statistical Database Management – SSDBM '14. pp. 1. doi:10.1145/2618243.2618257. ISBN 978-1-4503-2722-0. 
  31. Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). "On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study". Data Mining and Knowledge Discovery 30 (4): 891. doi:10.1007/s10618-015-0444-8. ISSN 1384-5810. 
  32. Anomaly detection benchmark data repository of the Ludwig-Maximilians-Universität München; Mirror at University of São Paulo.
  33. Denning, D. E. (1987). "An Intrusion-Detection Model". IEEE Transactions on Software Engineering SE-13 (2): 222–232. doi:10.1109/TSE.1987.232894. 
  34. Teng, H. S.; Chen, K.; Lu, S. C. (1990). "Adaptive real-time anomaly detection using inductively generated sequential patterns". pp. 278–284. doi:10.1109/RISP.1990.63857. ISBN 978-0-8186-2060-7.
  35. Jones, Anita K.; Sielken, Robert S. (1999). "Computer System Intrusion Detection: A Survey". Technical Report, Department of Computer Science, University of Virginia, Charlottesville, VA. 
