Hopkins statistic

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.^[1] It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.^[2] If individuals are aggregated, its value approaches 1; if they are randomly distributed, the value tends to 0.5.^[3]

Preliminaries

A typical formulation of the Hopkins statistic follows.^[2]

Let $X$ be the set of $n$ data points.

Generate a random sample $\tilde{X}$ of $m \ll n$ data points sampled without replacement from $X$.

Generate a set $Y$ of $m$ uniformly randomly distributed data points.

Define two distance measures:

$u_{i}$, the minimum distance (given some suitable metric) of $y_{i} \in Y$ to its nearest neighbour in $X$, and

$w_{i}$, the minimum distance of $\tilde{x}_{i} \in \tilde{X} \subseteq X$ to its nearest neighbour $x_{j} \in X$, $\tilde{x}_{i} \neq x_{j}$.
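The sampling steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `nearest_distance` and `hopkins_distances` are hypothetical, the metric is assumed to be Euclidean, and the uniform points $Y$ are assumed to be drawn from the axis-aligned bounding box of $X$ (the text does not fix the sampling window).

```python
import math
import random

def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest element of `others`."""
    return min(math.dist(point, o) for o in others)

def hopkins_distances(X, m, rng):
    """Return the lists (u, w) for a sample of size m drawn from the data set X."""
    n, d = len(X), len(X[0])
    # Assumption: the sampling window for Y is the bounding box of the data.
    lo = [min(p[k] for p in X) for k in range(d)]
    hi = [max(p[k] for p in X) for k in range(d)]
    # X-tilde: m data points sampled without replacement (tracked by index).
    sample_idx = rng.sample(range(n), m)
    # Y: m points drawn uniformly from the bounding box.
    Y = [tuple(rng.uniform(lo[k], hi[k]) for k in range(d)) for _ in range(m)]
    # u_i: distance from each uniform point to its nearest data point.
    u = [nearest_distance(y, X) for y in Y]
    # w_i: distance from each sampled data point to its nearest *other* data
    # point. Exclusion is by index, so an exact duplicate gives distance 0.
    w = [nearest_distance(X[i], [X[j] for j in range(n) if j != i])
         for i in sample_idx]
    return u, w

rng = random.Random(0)  # fixed seed, for reproducibility only
X = [(rng.random(), rng.random()) for _ in range(200)]
u, w = hopkins_distances(X, m=10, rng=rng)
```

The brute-force neighbour search costs on the order of $m \cdot n$ distance evaluations, which is acceptable for a sketch; a spatial index would be the usual choice for large $n$.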

Definition

With the above notation, if the data are $d$-dimensional, then the Hopkins statistic is defined as:^[4]

$H = \frac{\sum_{i = 1}^{m} u_{i}^{d}}{\sum_{i = 1}^{m} u_{i}^{d} + \sum_{i = 1}^{m} w_{i}^{d}}$
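As a small worked example of the formula, the sketch below evaluates $H$ for a handful of made-up distances. The function name `hopkins_statistic` and the numeric values are purely illustrative assumptions, not data from any of the cited sources.

```python
def hopkins_statistic(u, w, d):
    """H = sum(u_i^d) / (sum(u_i^d) + sum(w_i^d)) for d-dimensional data."""
    if len(u) != len(w):
        raise ValueError("u and w must both contain m distances")
    su = sum(ui ** d for ui in u)
    sw = sum(wi ** d for wi in w)
    return su / (su + sw)

# Hypothetical distances for m = 3 sample points in d = 2 dimensions.
u = [0.4, 0.5, 0.3]  # uniform points to their nearest data point
w = [0.1, 0.2, 0.1]  # sampled data points to their nearest other data point
H = hopkins_statistic(u, w, d=2)
print(round(H, 3))  # 0.5 / 0.56, about 0.893
```

Here the data points sit much closer to one another than the uniform points sit to the data, so the $w_{i}$ terms are small and $H$ lands well above 0.5, the direction the statistic moves for aggregated data.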

Under the null hypothesis, this statistic follows a Beta($m$, $m$) distribution.
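One way to use this null distribution is to turn an observed value of $H$ into a one-sided p-value. The sketch below assumes that large values of $H$ are taken as evidence of clustering, which is a choice of alternative not spelled out in the text, and the function name `beta_mm_upper_tail` is hypothetical. It relies on the standard identity that, for integer shape parameters, the Beta($m$, $m$) cumulative distribution function at $h$ equals the probability that a Binomial($2m - 1$, $h$) variable is at least $m$.

```python
import math

def beta_mm_upper_tail(h, m):
    """P(H >= h) when H ~ Beta(m, m), for a positive integer m."""
    n = 2 * m - 1
    # The Beta(m, m) CDF at h equals P(Binomial(n, h) >= m), so the upper
    # tail is P(Binomial(n, h) <= m - 1), a finite sum over j = 0..m-1.
    return sum(math.comb(n, j) * h**j * (1 - h)**(n - j) for j in range(m))

# By symmetry of Beta(m, m), an observed H of 0.5 gives a tail probability of 0.5.
p_null_like = beta_mm_upper_tail(0.5, m=10)
# An observed H close to 1 gives a very small tail probability.
p_clustered = beta_mm_upper_tail(0.9, m=10)
```

A small p-value would then suggest rejecting the hypothesis of uniformly random data, at whatever significance level the analyst has fixed in advance.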

Notes and references

[1] Hopkins, Brian; Skellam, J.G. (1954). "A new method for determining the type of distribution of plant individuals". Annals of Botany (Annals Botany Co) 18 (2): 213–227. doi:10.1093/oxfordjournals.aob.a083391.
[2] Banerjee, A. (2004). "Validating clusters using the Hopkins statistic". 2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542). 1. pp. 149–153. doi:10.1109/FUZZY.2004.1375706. ISBN 0-7803-8353-2.
[3] Aggarwal, Charu C. (2015). Data Mining. Cham: Springer International Publishing. p. 158. doi:10.1007/978-3-319-14142-8. ISBN 978-3-319-14141-1. https://link.springer.com/10.1007/978-3-319-14142-8.
[4] Cross, G.R.; Jain, A.K. (1982). "Measurement of clustering tendency". Measurement of clustering tendency. pp. 315–320. doi:10.1016/B978-0-08-027618-2.50054-1. ISBN 978-0-08-027618-2.

External links

https://www.sthda.com/english/wiki/assessing-clustering-tendency-a-vital-issue-unsupervised-machine-learning

Original source: https://en.wikipedia.org/wiki/Hopkins statistic.
