Biweight midcorrelation

From HandWiki
Revision as of 21:39, 6 February 2024 by AstroAI (talk | contribs) (over-write)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based, rather than mean-based, thus is less sensitive to outliers, and can be a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.[1]

Derivation

Here we find the biweight midcorrelation of two vectors [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], with [math]\displaystyle{ i=1,2, \ldots,m }[/math] items, representing each item in the vector as [math]\displaystyle{ x_1, x_2, \ldots, x_m }[/math] and [math]\displaystyle{ y_1, y_2, \ldots, y_m }[/math]. First, we define [math]\displaystyle{ \operatorname{med}(x) }[/math] as the median of a vector [math]\displaystyle{ x }[/math] and [math]\displaystyle{ \operatorname{mad}(x) }[/math] as the median absolute deviation (MAD), then define [math]\displaystyle{ u_i }[/math] and [math]\displaystyle{ v_i }[/math] as,

[math]\displaystyle{ \begin{align} u_i &= \frac{x_i - \operatorname{med}(x)}{9 \operatorname{mad}(x)},\\ v_i &= \frac{y_i - \operatorname{med}(y)}{9 \operatorname{mad}(y)}. \end{align} }[/math]

Now we define the weights [math]\displaystyle{ w_i^{(x)} }[/math] and [math]\displaystyle{ w_i^{(y)} }[/math] as,

[math]\displaystyle{ \begin{align} w_i^{(x)} &= \left(1-u_i^2\right)^2 I\left(1-|u_i|\right)\\ w_i^{(y)} &= \left(1-v_i^2\right)^2 I\left(1-|v_i|\right) \end{align} }[/math]

where [math]\displaystyle{ I }[/math] is the identity function where,

[math]\displaystyle{ I(x) = \begin{cases}1, & \text{if } x \gt 0\\ 0, & \text{otherwise}\end{cases} }[/math]

Then we normalize so that the sum of the weights is 1:

[math]\displaystyle{ \begin{align} \tilde{x}_i &= \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^m \left[(x_j -\operatorname{med}(x)) w_j^{(x)}\right]^2}}\\ \tilde{y}_i &= \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^m \left[(y_j -\operatorname{med}(y)) w_j^{(y)}\right]^2}}. \end{align} }[/math]

Finally, we define biweight midcorrelation as,

[math]\displaystyle{ \mathrm{bicor}\left(x, y\right) = \sum_{i=1}^m \tilde{x}_i \tilde{y}_i }[/math]

Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,[2] and is often used for weighted correlation network analysis.

Implementations

Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor as part of the WGCNA package[3]

Also implemented in the Raku programming language as the function bi_cor_coef as part of the Statistics module.[4]

References

  1. Wilcox, Rand (January 12, 2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press. p. 455. ISBN 978-0123869838. 
  2. Song, Lin (9 December 2012). "Comparison of co-expression measures: mutual information, correlation, and model based indices". BMC Bioinformatics 13 (328): 328. doi:10.1186/1471-2105-13-328. PMID 23217028. 
  3. Langfelder, Peter. "WGCNA: Weighted Correlation Network Analysis (an R package)". https://cran.r-project.org/package=WGCNA. Retrieved 2018-04-06. 
  4. Khanal, Suman. "Statistics: Raku module for doing statistics". https://github.com/sumanstats/Statistics. Retrieved 2022-03-11.