Wilcoxon signed-rank test
The Wilcoxon signed-rank test is a nonparametric statistical hypothesis test used either to test the location of a population based on a sample of data, or to compare the locations of two populations using two matched samples.^{[1]} The one-sample version serves a purpose similar to that of the one-sample Student's t-test.^{[2]} For two matched samples, it is a paired difference test like the paired Student's t-test (also known as the "t-test for matched pairs" or "t-test for dependent samples"). The Wilcoxon test can be a good alternative to the t-test when population means are not of interest; for example, when one wishes to test whether a population's median is nonzero, or whether there is a better than 50% chance that a sample from one population is greater than a sample from another population.
History
The test is named after Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples.^{[3]} The test was popularized by Sidney Siegel (1956) in his influential textbook on nonparametric statistics.^{[4]} Siegel used the symbol T for the test statistic, and consequently, the test is sometimes referred to as the Wilcoxon T-test.
Test procedure
There are two variants of the signed-rank test. From a theoretical point of view, the one-sample test is more fundamental because the paired sample test is performed by converting the data to the situation of the one-sample test. However, most practical applications of the signed-rank test arise from paired data.
For a paired sample test, the data consists of samples [math]\displaystyle{ (X_1, Y_1), \dots, (X_n, Y_n) }[/math]. Each sample is a pair of measurements. In the simplest case, the measurements are on an interval scale. Then they may be converted to real numbers, and the paired sample test is converted to a one-sample test by replacing each pair of numbers [math]\displaystyle{ (X_i, Y_i) }[/math] by its difference [math]\displaystyle{ X_i - Y_i }[/math].^{[5]} In general, it must be possible to rank the differences between the pairs. This requires that the data be on an ordered metric scale, a type of scale that carries more information than an ordinal scale but may have less than an interval scale.^{[6]}
The data for a one-sample test is a set of real number samples [math]\displaystyle{ X_1, \dots, X_n }[/math]. Assume for simplicity that the samples have distinct absolute values and that no sample equals zero. (Zeros and ties introduce several complications; see below.) The test is performed as follows:^{[7]}^{[8]}
 Compute [math]\displaystyle{ |X_1|, \dots, |X_n| }[/math].
 Sort [math]\displaystyle{ |X_1|, \dots, |X_n| }[/math], and use this sorted list to assign ranks [math]\displaystyle{ R_1, \dots, R_n }[/math]: The rank of the smallest observation is one, the rank of the next smallest is two, and so on.
 Let [math]\displaystyle{ \sgn }[/math] denote the sign function: [math]\displaystyle{ \sgn(x) = 1 }[/math] if [math]\displaystyle{ x \gt 0 }[/math] and [math]\displaystyle{ \sgn(x) = -1 }[/math] if [math]\displaystyle{ x \lt 0 }[/math]. The test statistic is the signed-rank sum [math]\displaystyle{ T }[/math]: [math]\displaystyle{ T = \sum_{i=1}^n \sgn(X_i)R_i. }[/math]
 Produce a [math]\displaystyle{ p }[/math]-value by comparing [math]\displaystyle{ T }[/math] to its distribution under the null hypothesis.
The ranks are defined so that [math]\displaystyle{ R_i }[/math] is the number of [math]\displaystyle{ j }[/math] for which [math]\displaystyle{ |X_j| \le |X_i| }[/math]. Additionally, if [math]\displaystyle{ \sigma \colon \{1, \dots, n\} \to \{1, \dots, n\} }[/math] is such that [math]\displaystyle{ |X_{\sigma(1)}| \lt \dots \lt |X_{\sigma(n)}| }[/math], then [math]\displaystyle{ R_{\sigma(i)} = i }[/math] for all [math]\displaystyle{ i }[/math].
The signed-rank sum [math]\displaystyle{ T }[/math] is closely related to two other test statistics. The positive-rank sum [math]\displaystyle{ T^+ }[/math] and the negative-rank sum [math]\displaystyle{ T^- }[/math] are defined by^{[9]} [math]\displaystyle{ \begin{align} T^+ &= \sum_{1 \le i \le n,\ X_i \gt 0} R_i, \\ T^- &= \sum_{1 \le i \le n,\ X_i \lt 0} R_i. \end{align} }[/math] Because [math]\displaystyle{ T^+ + T^- }[/math] equals the sum of all the ranks, which is [math]\displaystyle{ 1 + 2 + \dots + n = n(n + 1)/2 }[/math], these three statistics are related by:^{[10]} [math]\displaystyle{ \begin{align} T^+ &= \frac{n(n + 1)}{2} - T^- = \frac{n(n + 1)}{4} + \frac{T}{2}, \\ T^- &= \frac{n(n + 1)}{2} - T^+ = \frac{n(n + 1)}{4} - \frac{T}{2}, \\ T &= T^+ - T^- = 2T^+ - \frac{n(n + 1)}{2} = \frac{n(n + 1)}{2} - 2T^-. \end{align} }[/math] Because [math]\displaystyle{ T }[/math], [math]\displaystyle{ T^+ }[/math], and [math]\displaystyle{ T^- }[/math] carry the same information, any of them may be used as the test statistic.
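The procedure and the relations among the three statistics can be illustrated with a short Python sketch. It is a minimal illustration, not a reference implementation: the function name signed_rank_statistics and the sample values are made up, and the code assumes a sample with no zeros and no tied absolute values.

```python
def signed_rank_statistics(xs):
    """Return (T, T_plus, T_minus) for a sample without zeros or tied |x|."""
    if any(x == 0 for x in xs) or len({abs(x) for x in xs}) != len(xs):
        raise ValueError("this sketch assumes no zeros and no tied absolute values")
    # Rank the absolute values: the smallest |x| receives rank 1.
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    ranks = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    t_plus = sum(r for x, r in zip(xs, ranks) if x > 0)    # positive-rank sum
    t_minus = sum(r for x, r in zip(xs, ranks) if x < 0)   # negative-rank sum
    return t_plus - t_minus, t_plus, t_minus

sample = [1.5, -0.4, 2.3, -3.1, 0.9]   # made-up data for illustration
T, T_plus, T_minus = signed_rank_statistics(sample)
n = len(sample)
print(T, T_plus, T_minus)              # 3 9 6
```

For this sample the ranks of the absolute values are 3, 1, 4, 5, 2, so the positive ranks sum to 9, the negative ranks sum to 6, and the two sums add up to n(n + 1)/2 = 15.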
The positive-rank sum and negative-rank sum have alternative interpretations that are useful for the theory behind the test. Define the Walsh average [math]\displaystyle{ W_{ij} }[/math] to be [math]\displaystyle{ \tfrac12(X_i + X_j) }[/math]. Then:^{[11]} [math]\displaystyle{ \begin{align} T^+ &= \#\{W_{ij} \gt 0 \colon 1 \le i \le j \le n\}, \\ T^- &= \#\{W_{ij} \lt 0 \colon 1 \le i \le j \le n\}. \end{align} }[/math]
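The Walsh-average description can be checked numerically. The sketch below is illustrative only: the function names and data are hypothetical, and it assumes no zeros, no tied absolute values and no Walsh average that is exactly zero. It counts positive and negative Walsh averages over all pairs with i ≤ j and compares the counts with the rank sums.

```python
from itertools import combinations_with_replacement

def walsh_counts(xs):
    """Count positive and negative Walsh averages (x_i + x_j) / 2 over pairs i <= j."""
    positive = negative = 0
    for a, b in combinations_with_replacement(xs, 2):
        average = (a + b) / 2
        positive += average > 0
        negative += average < 0
    return positive, negative

def rank_sums(xs):
    """Positive- and negative-rank sums from the ranks of |x|."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    t_plus = sum(r for r, i in enumerate(order, start=1) if xs[i] > 0)
    t_minus = sum(r for r, i in enumerate(order, start=1) if xs[i] < 0)
    return t_plus, t_minus

sample = [1.5, -0.4, 2.3, -3.1, 0.9]   # made-up data for illustration
print(walsh_counts(sample), rank_sums(sample))   # (9, 6) (9, 6)
```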
Null and alternative hypotheses
One-sample test
The one-sample Wilcoxon signed-rank test can be used to test whether data comes from a symmetric population with a specified median.^{[12]} If the population median is known, then it can be used to test whether data is symmetric about its center.^{[13]}
To explain the null and alternative hypotheses formally, assume that the data consists of independent and identically distributed samples from a distribution [math]\displaystyle{ F }[/math]. If [math]\displaystyle{ X_1 }[/math] and [math]\displaystyle{ X_2 }[/math] are IID [math]\displaystyle{ F }[/math]-distributed random variables, define [math]\displaystyle{ F^{(2)} }[/math] to be the cumulative distribution function of [math]\displaystyle{ \tfrac12(X_1 + X_2) }[/math]. Set [math]\displaystyle{ p_2 = \Pr(\tfrac12(X_1 + X_2) \gt 0) = 1 - F^{(2)}(0). }[/math] Assume that [math]\displaystyle{ F }[/math] is continuous. The one-sample Wilcoxon signed-rank sum test is a test for the following null hypothesis against one of the following alternative hypotheses:^{[14]}
 Null hypothesis H_{0}
 [math]\displaystyle{ p_2 = \tfrac12 }[/math]
 One-sided alternative hypothesis H_{1}
 [math]\displaystyle{ p_2 \gt \tfrac12 }[/math].
 One-sided alternative hypothesis H_{2}
 [math]\displaystyle{ p_2 \lt \tfrac12 }[/math].
 Two-sided alternative hypothesis H_{3}
 [math]\displaystyle{ p_2 \neq \tfrac12 }[/math].
The alternative hypothesis being tested depends on whether the test statistic is used to compute a one-sided or two-sided p-value (and if one-sided, which side). If [math]\displaystyle{ \mu }[/math] is a fixed, predetermined quantity, then the test can also be used as a test for the value of [math]\displaystyle{ \Pr(\tfrac12(X_1 + X_2) \gt \mu) }[/math] by subtracting [math]\displaystyle{ \mu }[/math] from every data point.
The above null and alternative hypotheses are derived from the fact that [math]\displaystyle{ 2T^+ / n^2 }[/math] is a consistent estimator of [math]\displaystyle{ p_2 }[/math].^{[15]} They can also be derived from the description of [math]\displaystyle{ T^+ }[/math] and [math]\displaystyle{ T^- }[/math] in terms of Walsh averages, since that description shows that the Wilcoxon test is the same as the sign test applied to the set of Walsh averages.^{[16]}
Restricting the distributions of interest can lead to more interpretable null and alternative hypotheses. One mildly restrictive assumption is that [math]\displaystyle{ F^{(2)} }[/math] has a unique median. This median is called the pseudomedian of [math]\displaystyle{ F }[/math]; in general it is different from the mean and the median, even when all three exist. If the existence of a unique pseudomedian can be assumed true under both the null and alternative hypotheses, then these hypotheses can be restated as:
 Null hypothesis H_{0}
 The pseudomedian of [math]\displaystyle{ F }[/math] is located at zero.
 One-sided alternative hypothesis H_{1}
 The pseudomedian of [math]\displaystyle{ F }[/math] is located at [math]\displaystyle{ \mu \lt 0 }[/math].
 One-sided alternative hypothesis H_{2}
 The pseudomedian of [math]\displaystyle{ F }[/math] is located at [math]\displaystyle{ \mu \gt 0 }[/math].
 Two-sided alternative hypothesis H_{3}
 The pseudomedian of [math]\displaystyle{ F }[/math] is located at [math]\displaystyle{ \mu \neq 0 }[/math].
Most often, the null and alternative hypotheses are stated under the assumption of symmetry. Fix a real number [math]\displaystyle{ \mu }[/math]. Define [math]\displaystyle{ F }[/math] to be symmetric about [math]\displaystyle{ \mu }[/math] if a random variable [math]\displaystyle{ X }[/math] with distribution [math]\displaystyle{ F }[/math] satisfies [math]\displaystyle{ \Pr(X \le \mu - x) = \Pr(X \ge \mu + x) }[/math] for all [math]\displaystyle{ x }[/math]. If [math]\displaystyle{ F }[/math] has a density function [math]\displaystyle{ f }[/math], then [math]\displaystyle{ F }[/math] is symmetric about [math]\displaystyle{ \mu }[/math] if and only if [math]\displaystyle{ f(\mu + x) = f(\mu - x) }[/math] for every [math]\displaystyle{ x }[/math].^{[17]}
If the null and alternative distributions of [math]\displaystyle{ F }[/math] can be assumed symmetric, then the null and alternative hypotheses simplify to the following:^{[18]}
 Null hypothesis H_{0}
 [math]\displaystyle{ F }[/math] is symmetric about [math]\displaystyle{ \mu = 0 }[/math].
 One-sided alternative hypothesis H_{1}
 [math]\displaystyle{ F }[/math] is symmetric about [math]\displaystyle{ \mu \lt 0 }[/math].
 One-sided alternative hypothesis H_{2}
 [math]\displaystyle{ F }[/math] is symmetric about [math]\displaystyle{ \mu \gt 0 }[/math].
 Two-sided alternative hypothesis H_{3}
 [math]\displaystyle{ F }[/math] is symmetric about [math]\displaystyle{ \mu \neq 0 }[/math].
If in addition [math]\displaystyle{ \Pr(X = \mu) = 0 }[/math], then [math]\displaystyle{ \mu }[/math] is a median of [math]\displaystyle{ F }[/math]. If this median is unique, then the Wilcoxon signed-rank sum test becomes a test for the location of the median.^{[19]} When the mean of [math]\displaystyle{ F }[/math] is defined, then the mean is [math]\displaystyle{ \mu }[/math], and the test is also a test for the location of the mean.^{[20]}
The restriction that the alternative distribution is symmetric is highly restrictive, but for one-sided tests it can be weakened. Say that [math]\displaystyle{ F }[/math] is stochastically smaller than a distribution symmetric about zero if an [math]\displaystyle{ F }[/math]-distributed random variable [math]\displaystyle{ X }[/math] satisfies [math]\displaystyle{ \Pr(X \lt -x) \ge \Pr(X \gt x) }[/math] for all [math]\displaystyle{ x \ge 0 }[/math]. Similarly, [math]\displaystyle{ F }[/math] is stochastically larger than a distribution symmetric about zero if [math]\displaystyle{ \Pr(X \lt -x) \le \Pr(X \gt x) }[/math] for all [math]\displaystyle{ x \ge 0 }[/math]. Then the Wilcoxon signed-rank sum test can also be used for the following null and alternative hypotheses:^{[21]}^{[22]}
 Null hypothesis H_{0}
 [math]\displaystyle{ F }[/math] is symmetric about [math]\displaystyle{ \mu = 0 }[/math].
 One-sided alternative hypothesis H_{1}
 [math]\displaystyle{ F }[/math] is stochastically smaller than a distribution symmetric about zero.
 One-sided alternative hypothesis H_{2}
 [math]\displaystyle{ F }[/math] is stochastically larger than a distribution symmetric about zero.
The hypothesis that the data are IID can be weakened. Each data point may be taken from a different distribution, as long as all the distributions are assumed to be continuous and symmetric about a common point [math]\displaystyle{ \mu_0 }[/math]. The data points are not required to be independent as long as the conditional distribution of each observation given the others is symmetric about [math]\displaystyle{ \mu_0 }[/math].^{[23]}
Paired data test
Because the paired data test arises from taking paired differences, its null and alternative hypotheses can be derived from those of the one-sample test. In each case, they become assertions about the behavior of the differences [math]\displaystyle{ X_i - Y_i }[/math].
Let [math]\displaystyle{ F(x, y) }[/math] be the joint cumulative distribution of the pairs [math]\displaystyle{ (X_i, Y_i) }[/math]. If [math]\displaystyle{ F }[/math] is continuous, then the most general null and alternative hypotheses are expressed in terms of [math]\displaystyle{ p_2 = \Pr(\tfrac12(X_i - Y_i + X_j - Y_j) \gt 0) }[/math] and are identical to the one-sample case:
 Null hypothesis H_{0}
 [math]\displaystyle{ p_2 = \tfrac12 }[/math]
 One-sided alternative hypothesis H_{1}
 [math]\displaystyle{ p_2 \gt \tfrac12 }[/math].
 One-sided alternative hypothesis H_{2}
 [math]\displaystyle{ p_2 \lt \tfrac12 }[/math].
 Two-sided alternative hypothesis H_{3}
 [math]\displaystyle{ p_2 \neq \tfrac12 }[/math].
Like the one-sample case, under some restrictions the test can be interpreted as a test for whether the pseudomedian of the differences is located at zero.
A common restriction is to symmetric distributions of differences. In this case, the null and alternative hypotheses are:^{[24]}^{[25]}
 Null hypothesis H_{0}
 The observations [math]\displaystyle{ X_i - Y_i }[/math] are symmetric about [math]\displaystyle{ \mu = 0 }[/math].
 One-sided alternative hypothesis H_{1}
 The observations [math]\displaystyle{ X_i - Y_i }[/math] are symmetric about [math]\displaystyle{ \mu \lt 0 }[/math].
 One-sided alternative hypothesis H_{2}
 The observations [math]\displaystyle{ X_i - Y_i }[/math] are symmetric about [math]\displaystyle{ \mu \gt 0 }[/math].
 Two-sided alternative hypothesis H_{3}
 The observations [math]\displaystyle{ X_i - Y_i }[/math] are symmetric about [math]\displaystyle{ \mu \neq 0 }[/math].
These can also be expressed more directly in terms of the original pairs:^{[26]}
 Null hypothesis H_{0}
 The observations [math]\displaystyle{ (X_i, Y_i) }[/math] are exchangeable, meaning that [math]\displaystyle{ (X_i, Y_i) }[/math] and [math]\displaystyle{ (Y_i, X_i) }[/math] have the same distribution. Equivalently, [math]\displaystyle{ F(x, y) = F(y, x) }[/math].
 One-sided alternative hypothesis H_{1}
 For some [math]\displaystyle{ \mu \lt 0 }[/math], the pairs [math]\displaystyle{ (X_i, Y_i) }[/math] and [math]\displaystyle{ (Y_i + \mu, X_i - \mu) }[/math] have the same distribution.
 One-sided alternative hypothesis H_{2}
 For some [math]\displaystyle{ \mu \gt 0 }[/math], the pairs [math]\displaystyle{ (X_i, Y_i) }[/math] and [math]\displaystyle{ (Y_i + \mu, X_i - \mu) }[/math] have the same distribution.
 Two-sided alternative hypothesis H_{3}
 For some [math]\displaystyle{ \mu \neq 0 }[/math], the pairs [math]\displaystyle{ (X_i, Y_i) }[/math] and [math]\displaystyle{ (Y_i + \mu, X_i - \mu) }[/math] have the same distribution.
The null hypothesis of exchangeability can arise from a matched pair experiment with a treatment group and a control group. Randomizing the treatment and control within each pair makes the observations exchangeable. For an exchangeable distribution, [math]\displaystyle{ X_i - Y_i }[/math] has the same distribution as [math]\displaystyle{ Y_i - X_i }[/math], and therefore, under the null hypothesis, the distribution is symmetric about zero.^{[27]}
Because the one-sample test can be used as a one-sided test for stochastic dominance, the paired difference Wilcoxon test can be used to compare the following hypotheses:^{[28]}
 Null hypothesis H_{0}
 The observations [math]\displaystyle{ (X_i, Y_i) }[/math] are exchangeable.
 One-sided alternative hypothesis H_{1}
 The differences [math]\displaystyle{ X_i - Y_i }[/math] are stochastically smaller than a distribution symmetric about zero, that is, for every [math]\displaystyle{ x \ge 0 }[/math], [math]\displaystyle{ \Pr(X_i \lt Y_i - x) \ge \Pr(X_i \gt Y_i + x) }[/math].
 One-sided alternative hypothesis H_{2}
 The differences [math]\displaystyle{ X_i - Y_i }[/math] are stochastically larger than a distribution symmetric about zero, that is, for every [math]\displaystyle{ x \ge 0 }[/math], [math]\displaystyle{ \Pr(X_i \lt Y_i - x) \le \Pr(X_i \gt Y_i + x) }[/math].
Zeros and ties
In real data, it sometimes happens that there is a sample [math]\displaystyle{ X_i }[/math] which equals zero or a pair [math]\displaystyle{ (X_i, Y_i) }[/math] with [math]\displaystyle{ X_i = Y_i }[/math]. It can also happen that there are tied samples. This means that for some [math]\displaystyle{ i \neq j }[/math], we have [math]\displaystyle{ X_i = X_j }[/math] (in the one-sample case) or [math]\displaystyle{ X_i - Y_i = X_j - Y_j }[/math] (in the paired sample case). This is particularly common for discrete data. When this happens, the test procedure defined above is usually undefined because there is no way to uniquely rank the data. (The sole exception is if there is a single sample [math]\displaystyle{ X_i }[/math] which is zero and no other zeros or ties.) Because of this, the test statistic needs to be modified.
Zeros
Wilcoxon's original paper did not address the question of observations (or, in the paired sample case, differences) that equal zero. However, in later surveys, he recommended removing zeros from the sample.^{[29]} Then the standard signed-rank test could be applied to the resulting data, as long as there were no ties. This is now called the reduced sample procedure.
Pratt^{[30]} observed that the reduced sample procedure can lead to paradoxical behavior. He gives the following example. Suppose that we are in the onesample situation and have the following thirteen observations:
 0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, −18.
The reduced sample procedure removes the zero. To the remaining data, it assigns the signed ranks:
 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, −12.
The observed negative-rank sum of 12 falls just outside the one-sided rejection region of size [math]\displaystyle{ 55/2^{12} }[/math], and therefore the sample is not significantly positive at the significance level [math]\displaystyle{ \alpha = 55/2^{12} \approx 0.0134 }[/math] or at any smaller level. Pratt argues that one would expect that decreasing the observations should certainly not make the data appear more positive. However, if the zero observation is decreased by an amount less than 2, or if all observations are decreased by an amount less than 1, then the signed ranks become:
 −1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, −13.
This has a one-sided p-value of [math]\displaystyle{ 109/2^{13} }[/math]. Therefore the sample would be judged significantly positive at any significance level [math]\displaystyle{ \alpha \gt 109/2^{13} \approx 0.0133 }[/math]. The paradox is that, if [math]\displaystyle{ \alpha }[/math] is between [math]\displaystyle{ 109/2^{13} }[/math] and [math]\displaystyle{ 55/2^{12} }[/math], then decreasing an insignificant sample causes it to appear significantly positive.
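The counts behind these levels can be checked by enumeration. The following Python sketch is illustrative only: the helper name lower_tail_count is hypothetical, and it assumes that all sign assignments are equally likely under the null hypothesis and that small negative-rank sums count as evidence of positivity.

```python
def lower_tail_count(ranks, bound):
    """Number of sign assignments whose negative-rank sum is at most `bound`.

    Assumes integer ranks; counts[s] holds the number of subsets of the
    ranks (the negatively signed observations) that sum to s.
    """
    ranks = list(ranks)
    total = sum(ranks)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks:
        for s in range(total, r - 1, -1):   # descending, so each rank is used once
            counts[s] += counts[s - r]
    return sum(counts[: bound + 1])

below = lower_tail_count(range(1, 13), 11)     # 55 of 2**12
reduced = lower_tail_count(range(1, 13), 12)   # 70 of 2**12
shifted = lower_tail_count(range(1, 14), 14)   # 109 of 2**13
print(below, reduced, shifted)                 # 55 70 109
```

In the reduced sample, 55 of the 2^12 sign assignments have a negative-rank sum of at most 11 and 70 have a sum of at most the observed value 12, so the sample lies just outside the rejection region of size 55/2^12. In the decreased sample, 109 of the 2^13 assignments have a negative-rank sum of at most the observed value 1 + 13 = 14.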
Pratt therefore proposed the signed-rank zero procedure. This procedure includes the zeros when ranking the samples. However, it excludes them from the test statistic, or equivalently it defines [math]\displaystyle{ \sgn(0) = 0 }[/math]. Pratt proved that the signed-rank zero procedure has several desirable behaviors not shared by the reduced sample procedure:^{[31]}
 Increasing the observed values does not make a significantly positive sample insignificant, and it does not make an insignificant sample significantly negative.
 If the distribution of the observations is symmetric, then the values of [math]\displaystyle{ \mu }[/math] which the test does not reject form an interval.
 A sample is significantly positive, not significant, or significantly negative, if and only if it is so when the zeros are assigned arbitrary nonzero signs, if and only if it is so when the zeros are replaced with nonzero values which are smaller in absolute value than any nonzero observation.
 For a fixed significance threshold [math]\displaystyle{ \alpha }[/math], and for a test which is randomized to have level exactly [math]\displaystyle{ \alpha }[/math], the probability of calling a set of observations significantly positive (respectively, significantly negative) is a nondecreasing (respectively, nonincreasing) function of the observations.
Pratt remarks that, when the signed-rank zero procedure is combined with the average rank procedure for resolving ties, the resulting test is a consistent test against the alternative hypothesis that, for all [math]\displaystyle{ i \neq j }[/math], [math]\displaystyle{ \Pr(X_i + X_j \gt 0) }[/math] and [math]\displaystyle{ \Pr(X_i + X_j \lt 0) }[/math] differ by at least a fixed constant that is independent of [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math].^{[32]}
The signed-rank zero procedure has the disadvantage that, when zeros occur, the null distribution of the test statistic changes, so tables of p-values can no longer be used.
When the data is on a Likert scale with equally spaced categories, the signed-rank zero procedure is more likely to maintain the Type I error rate than the reduced sample procedure.^{[33]}
From the viewpoint of statistical efficiency, there is no perfect rule for handling zeros. Conover found examples of null and alternative hypotheses that show that neither Wilcoxon's nor Pratt's method is uniformly better than the other. When comparing a discrete uniform distribution to a distribution where probabilities linearly increase from left to right, Pratt's method outperforms Wilcoxon's. When testing a binomial distribution centered at zero to see whether the parameter of each Bernoulli trial is [math]\displaystyle{ \tfrac12 }[/math], Wilcoxon's method outperforms Pratt's.^{[34]}
Ties
When the data does not have ties, the ranks [math]\displaystyle{ R_i }[/math] are used to calculate the test statistic. In the presence of ties, the ranks are not defined. There are two main approaches to resolving this.
The most common procedure for handling ties, and the one originally recommended by Wilcoxon, is called the average rank or midrank procedure. This procedure assigns numbers between 1 and n to the observations, with two observations getting the same number if and only if they have the same absolute value. These numbers are conventionally called ranks even though the set of these numbers is not equal to [math]\displaystyle{ \{1, \dots, n\} }[/math] (except when there are no ties). The rank assigned to an observation is the average of the possible ranks it would have if the ties were broken in all possible ways. Once the ranks are assigned, the test statistic is computed in the same way as usual.^{[35]}^{[36]}
For example, suppose that the observations satisfy [math]\displaystyle{ |X_3| \lt |X_2| = |X_5| \lt |X_6| \lt |X_1| = |X_4| = |X_7|. }[/math] In this case, [math]\displaystyle{ X_3 }[/math] is assigned rank 1, [math]\displaystyle{ X_2 }[/math] and [math]\displaystyle{ X_5 }[/math] are assigned rank [math]\displaystyle{ (2 + 3) / 2 = 2.5 }[/math], [math]\displaystyle{ X_6 }[/math] is assigned rank 4, and [math]\displaystyle{ X_1 }[/math], [math]\displaystyle{ X_4 }[/math], and [math]\displaystyle{ X_7 }[/math] are assigned rank [math]\displaystyle{ (5 + 6 + 7) / 3 = 6 }[/math]. Formally, suppose that there is a set of observations all having the same absolute value [math]\displaystyle{ v }[/math], that [math]\displaystyle{ k - 1 }[/math] observations have absolute value less than [math]\displaystyle{ v }[/math], and that [math]\displaystyle{ \ell }[/math] observations have absolute value less than or equal to [math]\displaystyle{ v }[/math]. If the ties among the observations with absolute value [math]\displaystyle{ v }[/math] were broken, then these observations would occupy ranks [math]\displaystyle{ k }[/math] through [math]\displaystyle{ \ell }[/math]. The average rank procedure therefore assigns them the rank [math]\displaystyle{ (k + \ell) / 2 }[/math].
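The formal rule translates directly into code. The sketch below is a minimal illustration in Python; the function name midranks and the data are made up, with the values chosen so that their absolute values follow the ordering in the example above.

```python
def midranks(xs):
    """Average ranks of |x|: a tie group occupying ranks k..l gets (k + l) / 2."""
    abs_vals = [abs(x) for x in xs]
    ranks = []
    for v in abs_vals:
        k = sum(a < v for a in abs_vals) + 1   # first rank the tie group would occupy
        ell = sum(a <= v for a in abs_vals)    # last rank the tie group would occupy
        ranks.append((k + ell) / 2)
    return ranks

# Made-up values with |X3| < |X2| = |X5| < |X6| < |X1| = |X4| = |X7|
xs = [9, -4, 1, -9, 4, 7, 9]
example_ranks = midranks(xs)
print(example_ranks)   # [6.0, 2.5, 1.0, 6.0, 2.5, 4.0, 6.0]
```

This quadratic-time version favors clarity; sorting the absolute values once would be the usual choice for large samples.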
Under the average rank procedure, the null distribution is different in the presence of ties.^{[37]}^{[38]} The average rank procedure also has some disadvantages that are similar to those of the reduced sample procedure for zeros. It is possible that a sample can be judged significantly positive by the average rank procedure; but increasing some of the values so as to break the ties, or breaking the ties in any way whatsoever, results in a sample that the test judges to be not significant.^{[39]}^{[40]} However, increasing all the observed values by the same amount cannot turn a significantly positive result into an insignificant one, nor an insignificant one into a significantly negative one. Furthermore, if the observations are distributed symmetrically, then the values of [math]\displaystyle{ \mu }[/math] which the test does not reject form an interval.^{[41]}^{[42]}
The other common option for handling ties is a tie-breaking procedure. In a tie-breaking procedure, the observations are assigned distinct ranks in the set [math]\displaystyle{ \{1, \dots, n\} }[/math]. The rank assigned to an observation depends on its absolute value and the tie-breaking rule. Observations with smaller absolute values are always given smaller ranks, just as in the standard rank-sum test. The tie-breaking rule is used to assign ranks to observations with the same absolute value. One advantage of tie-breaking rules is that they allow the use of standard tables for computing p-values.^{[43]}
Random tie-breaking breaks the ties at random. Under random tie-breaking, the null distribution is the same as when there are no ties, but the result of the test depends not only on the data but on additional random choices. Averaging the ranks over the possible random choices results in the average rank procedure.^{[44]} One could also report the probability of rejection over all random choices.^{[45]} Random tie-breaking has the advantage that the probability that a sample is judged significantly positive does not decrease when some observations are increased.^{[46]}
Conservative tie-breaking breaks the ties in favor of the null hypothesis. When performing a one-sided test in which negative values of [math]\displaystyle{ T }[/math] tend to be more significant, ties are broken by assigning lower ranks to negative observations and higher ranks to positive ones. When the test makes positive values of [math]\displaystyle{ T }[/math] significant, ties are broken the other way, and when large absolute values of [math]\displaystyle{ T }[/math] are significant, ties are broken so as to make [math]\displaystyle{ |T| }[/math] as small as possible. Pratt observes that when ties are likely, the conservative tie-breaking procedure "presumably has low power, since it amounts to breaking all ties in favor of the null hypothesis."^{[47]}
The average rank procedure can disagree with tie-breaking procedures. Pratt gives the following example.^{[48]} Suppose that the observations are:
 1, 1, 1, 1, 2, 3, −4.
The average rank procedure assigns these the signed ranks
 2.5, 2.5, 2.5, 2.5, 5, 6, −7.
This sample is significantly positive at the one-sided level [math]\displaystyle{ \alpha = 14 / 2^7 }[/math]. On the other hand, any tie-breaking rule will assign the ranks
 1, 2, 3, 4, 5, 6, −7.
At the same one-sided level [math]\displaystyle{ \alpha = 14 / 2^7 }[/math], this is not significant.
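Both conclusions can be reproduced by brute force. The Python sketch below is illustrative; the function name one_sided_p is hypothetical, and it assumes that the one-sided p-value is the null probability of a negative-rank sum no larger than the observed one, with all sign patterns equally likely.

```python
from fractions import Fraction
from itertools import product

def one_sided_p(signed_ranks):
    """Exact Pr(T^- <= observed T^-) over all equally likely sign patterns."""
    ranks = [abs(r) for r in signed_ranks]
    observed = sum(-r for r in signed_ranks if r < 0)
    hits = 0
    for negative_flags in product((False, True), repeat=len(ranks)):
        t_minus = sum(r for r, negative in zip(ranks, negative_flags) if negative)
        hits += t_minus <= observed
    return Fraction(hits, 2 ** len(ranks))

p_average = one_sided_p([2.5, 2.5, 2.5, 2.5, 5, 6, -7])   # average ranks
p_broken = one_sided_p([1, 2, 3, 4, 5, 6, -7])            # any tie-breaking rule
print(p_average, p_broken)                                # 7/64 19/128
```

The average rank procedure gives 14/2^7 (printed in lowest terms as 7/64), which meets the stated level, while the tie-broken ranks give 19/2^7, which exceeds it.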
Two other options for handling ties are based around averaging the results of tie-breaking. In the average statistic method, the test statistic [math]\displaystyle{ T }[/math] is computed for every possible way of breaking ties, and the final statistic is the mean of the tie-broken statistics. In the average probability method, the p-value is computed for every possible way of breaking ties, and the final p-value is the mean of the tie-broken p-values.^{[49]}
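The average statistic method can be sketched by enumerating every way of breaking the ties. The Python code below is a rough illustration with hypothetical function names and made-up data without zeros; it is practical only for small tie groups, because the number of tie-breakings grows factorially with the group sizes.

```python
from fractions import Fraction
from itertools import permutations, product

def tie_broken_rankings(xs):
    """Yield every assignment of distinct ranks 1..n consistent with the order of |x|."""
    groups = {}
    for i, x in enumerate(xs):
        groups.setdefault(abs(x), []).append(i)
    blocks, start = [], 1
    for v in sorted(groups):
        members = groups[v]
        slots = range(start, start + len(members))   # ranks this tie group occupies
        blocks.append([dict(zip(members, p)) for p in permutations(slots)])
        start += len(members)
    for choice in product(*blocks):
        ranks = {}
        for part in choice:
            ranks.update(part)
        yield [ranks[i] for i in range(len(xs))]

def signed_rank_sum(xs, ranks):
    return sum(r if x > 0 else -r for x, r in zip(xs, ranks))

xs = [1, -1, 2, -2, 2, 3]   # made-up data with tied absolute values
stats = [signed_rank_sum(xs, r) for r in tie_broken_rankings(xs)]
average_statistic = Fraction(sum(stats), len(stats))
print(len(stats), average_statistic)   # 12 10
```

In this example the mean over the 12 tie-breakings is 10, which equals the signed-rank sum computed from the average ranks 1.5, 1.5, 4, 4, 4, 6. The average probability method would instead compute a p-value for each of the 12 rankings and average those.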
Computing the null distribution
Computing p-values requires knowing the distribution of [math]\displaystyle{ T }[/math] under the null hypothesis. There is no closed formula for this distribution.^{[50]} However, for small values of [math]\displaystyle{ n }[/math], the distribution may be computed exactly. Under the null hypothesis that the data is symmetric about zero, each [math]\displaystyle{ X_i }[/math] is exactly as likely to be positive as it is negative. Therefore the probability that [math]\displaystyle{ T = t }[/math] under the null hypothesis is equal to the number of sign combinations that yield [math]\displaystyle{ T = t }[/math] divided by the number of possible sign combinations [math]\displaystyle{ 2^n }[/math]. This can be used to compute the exact distribution of [math]\displaystyle{ T }[/math] under the null hypothesis.^{[51]}
Computing the distribution of [math]\displaystyle{ T }[/math] by considering all possibilities requires computing [math]\displaystyle{ 2^n }[/math] sums, which is intractable for all but the smallest [math]\displaystyle{ n }[/math]. However, there is an efficient recursion for the distribution of [math]\displaystyle{ T^+ }[/math].^{[52]}^{[53]} Define [math]\displaystyle{ u_n(t^+) }[/math] to be the number of sign combinations for which [math]\displaystyle{ T^+ = t^+ }[/math]. This is equal to the number of subsets of [math]\displaystyle{ \{1, \dots, n\} }[/math] which sum to [math]\displaystyle{ t^+ }[/math]. The base cases of the recursion are [math]\displaystyle{ u_0(0) = 1 }[/math], [math]\displaystyle{ u_0(t^+) = 0 }[/math] for all [math]\displaystyle{ t^+ \neq 0 }[/math], and [math]\displaystyle{ u_n(t^+) = 0 }[/math] for all [math]\displaystyle{ t^+ \lt 0 }[/math] or [math]\displaystyle{ t^+ \gt n(n + 1)/2 }[/math]. The recursive formula is [math]\displaystyle{ u_n(t^+) = u_{n - 1}(t^+) + u_{n - 1}(t^+ - n). }[/math] The formula is true because every subset of [math]\displaystyle{ \{1, \dots, n\} }[/math] which sums to [math]\displaystyle{ t^+ }[/math] either does not contain [math]\displaystyle{ n }[/math], in which case it is also a subset of [math]\displaystyle{ \{1, \dots, n - 1\} }[/math], or it does contain [math]\displaystyle{ n }[/math], in which case removing [math]\displaystyle{ n }[/math] from the subset produces a subset of [math]\displaystyle{ \{1, \dots, n - 1\} }[/math] which sums to [math]\displaystyle{ t^+ - n }[/math]. Under the null hypothesis, the probability mass function of [math]\displaystyle{ T^+ }[/math] satisfies [math]\displaystyle{ \Pr(T^+ = t^+) = u_n(t^+) / 2^n }[/math]. The function [math]\displaystyle{ u_n }[/math] is closely related to the integer partition function.^{[54]}
If [math]\displaystyle{ p_n(t^+) }[/math] is the probability that [math]\displaystyle{ T^+ = t^+ }[/math] under the null hypothesis when there are [math]\displaystyle{ n }[/math] samples, then [math]\displaystyle{ p_n(t^+) }[/math] satisfies a similar recursion:^{[55]} [math]\displaystyle{ 2p_n(t^+) = p_{n - 1}(t^+) + p_{n - 1}(t^+ - n) }[/math] with similar boundary conditions. There is also a recursive formula for the cumulative distribution function [math]\displaystyle{ \Pr(T^+ \le t^+) }[/math].^{[56]}
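The recursion for the counts can be implemented as a short dynamic program. The Python sketch below is one possible way to do it, assuming no zeros or ties; the function names are hypothetical.

```python
from fractions import Fraction

def sign_combination_counts(n):
    """Return [u_n(0), ..., u_n(n(n+1)/2)] using u_m(t) = u_{m-1}(t) + u_{m-1}(t - m)."""
    counts = [1]                        # base case: u_0(0) = 1
    for m in range(1, n + 1):
        new = counts + [0] * m          # the support grows to m(m+1)/2
        for t in range(m, len(new)):
            new[t] += counts[t - m]     # subsets that contain m
        counts = new
    return counts

def t_plus_pmf(n):
    """Null probability mass function of T+ as exact fractions."""
    return [Fraction(u, 2 ** n) for u in sign_combination_counts(n)]

print(sign_combination_counts(3))       # [1, 1, 1, 2, 1, 1, 1]
```

For n = 3 the only repeated sum is 3, which arises from the subsets {3} and {1, 2}. The table for a given n needs on the order of n^3 additions, compared with 2^n sums for direct enumeration.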
For very large [math]\displaystyle{ n }[/math], even the above recursion is too slow. In this case, the null distribution can be approximated. The null distributions of [math]\displaystyle{ T }[/math], [math]\displaystyle{ T^+ }[/math], and [math]\displaystyle{ T^- }[/math] are asymptotically normal with means and variances:^{[57]} [math]\displaystyle{ \begin{align} \mathbf{E}[T^+] &= \mathbf{E}[T^-] = \frac{n(n + 1)}{4}, \\ \mathbf{E}[T] &= 0, \\ \operatorname{Var}(T^+) &= \operatorname{Var}(T^-) = \frac{n(n + 1)(2n + 1)}{24}, \\ \operatorname{Var}(T) &= \frac{n(n + 1)(2n + 1)}{6}. \end{align} }[/math]
Better approximations can be produced using Edgeworth expansions. Using a fourth-order Edgeworth expansion shows that:^{[58]}^{[59]} [math]\displaystyle{ \Pr(T^+ \le k) \approx \Phi(t) + \phi(t)\Big(\frac{3n^2 + 3n - 1}{10n(n + 1)(2n + 1)}\Big)(t^3 - 3t), }[/math] where [math]\displaystyle{ t = \frac{k + \tfrac12 - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}}. }[/math] The technical underpinnings of these expansions are rather involved, because conventional Edgeworth expansions apply to sums of IID continuous random variables, while [math]\displaystyle{ T^+ }[/math] is a sum of non-identically distributed discrete random variables. The final result, however, is that the above expansion has an error of [math]\displaystyle{ O(n^{-3/2}) }[/math], just like a conventional fourth-order Edgeworth expansion.^{[58]}
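The two approximations can be compared with the exact distribution for moderate sample sizes. The Python sketch below is illustrative and uses hypothetical function names; it assumes no zeros or ties and applies the continuity correction of one half from the formula above to the plain normal approximation as well.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def standardized(k, n):
    """Continuity-corrected, standardized value of k for T+ with n observations."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (k + 0.5 - mean) / math.sqrt(variance)

def cdf_normal(k, n):
    return normal_cdf(standardized(k, n))

def cdf_edgeworth(k, n):
    t = standardized(k, n)
    coeff = (3 * n * n + 3 * n - 1) / (10 * n * (n + 1) * (2 * n + 1))
    return normal_cdf(t) + normal_pdf(t) * coeff * (t ** 3 - 3 * t)

def cdf_exact(k, n):
    """Exact Pr(T+ <= k) by counting subsets of {1, ..., n} with sum at most k."""
    counts = [1]
    for m in range(1, n + 1):
        new = counts + [0] * m
        for s in range(m, len(new)):
            new[s] += counts[s - m]
        counts = new
    return sum(counts[: k + 1]) / 2 ** n

print(cdf_exact(60, 20), cdf_normal(60, 20), cdf_edgeworth(60, 20))
```

The printed values allow a direct comparison at a chosen point; how much the Edgeworth term helps depends on n and on how far into the tail k lies.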
The moment generating function of [math]\displaystyle{ T^+ }[/math] has the exact formula:^{[60]} [math]\displaystyle{ M(t) = \frac{1}{2^n}\prod_{j=1}^n (1 + e^{jt}). }[/math]
When zeros are present and the signed-rank zero procedure is used, or when ties are present and the average rank procedure is used, the null distribution of [math]\displaystyle{ T }[/math] changes. Cureton derived a normal approximation for this situation.^{[61]}^{[62]} Suppose that the original number of observations was [math]\displaystyle{ n }[/math] and the number of zeros was [math]\displaystyle{ z }[/math]. The tie correction is [math]\displaystyle{ c = \sum (t^3 - t), }[/math] where the sum is over all the sizes [math]\displaystyle{ t }[/math] of each group of tied observations. The expectation of [math]\displaystyle{ T }[/math] is still zero, while the expectation of [math]\displaystyle{ T^+ }[/math] is [math]\displaystyle{ \mathbf{E}[T^+] = \frac{n(n + 1)}{4} - \frac{z(z + 1)}{4}. }[/math] If [math]\displaystyle{ \sigma^2 = \frac{n(n + 1)(2n + 1) - z(z + 1)(2z + 1) - c/2}{6}, }[/math] then [math]\displaystyle{ \begin{align} \operatorname{Var}(T) &= \sigma^2, \\ \operatorname{Var}(T^+) &= \sigma^2 / 4. \end{align} }[/math]
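The corrected moments can be computed from the absolute differences alone. The following Python sketch makes two assumptions that the text does not spell out: tie groups are formed only among the nonzero absolute differences (the zeros enter through z), and tied values compare exactly equal. The function name is hypothetical.

```python
from collections import Counter


def corrected_moments(abs_diffs):
    """Return (E[T+], Var(T), Var(T+)) under the null hypothesis,
    with the zero and tie corrections described in the text.

    abs_diffs: absolute differences, including any zeros.
    """
    n = len(abs_diffs)
    z = sum(1 for d in abs_diffs if d == 0)
    # Sizes of the groups of tied nonzero absolute differences
    # (assumption: zeros are not counted as a tie group).
    group_sizes = Counter(d for d in abs_diffs if d != 0).values()
    c = sum(t ** 3 - t for t in group_sizes)
    mean_t_plus = n * (n + 1) / 4 - z * (z + 1) / 4
    sigma2 = (n * (n + 1) * (2 * n + 1) - z * (z + 1) * (2 * z + 1) - c / 2) / 6
    return mean_t_plus, sigma2, sigma2 / 4
```

With no zeros and no ties, both corrections vanish and the result reduces to the uncorrected moments n(n + 1)/4 and n(n + 1)(2n + 1)/24 for T+.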
Alternative statistics
Wilcoxon^{[63]} originally defined the Wilcoxon signed-rank statistic to be [math]\displaystyle{ \min(T^+, T^-) }[/math]. Early authors such as Siegel^{[64]} followed Wilcoxon. This is appropriate for two-sided hypothesis tests, but it cannot be used for one-sided tests.
Instead of assigning ranks between 1 and n, it is also possible to assign ranks between 0 and [math]\displaystyle{ n - 1 }[/math]. These are called modified ranks.^{[65]} The modified signed-rank sum [math]\displaystyle{ T_0 }[/math], the modified positive-rank sum [math]\displaystyle{ T_0^+ }[/math], and the modified negative-rank sum [math]\displaystyle{ T_0^- }[/math] are defined analogously to [math]\displaystyle{ T }[/math], [math]\displaystyle{ T^+ }[/math], and [math]\displaystyle{ T^- }[/math] but with the modified ranks in place of the ordinary ranks. The probability that the sum of two independent [math]\displaystyle{ F }[/math]-distributed random variables is positive can be estimated as [math]\displaystyle{ 2T_0^+/(n(n - 1)) }[/math].^{[66]} When consideration is restricted to continuous distributions, this is a minimum variance unbiased estimator of [math]\displaystyle{ p_2 }[/math].^{[67]}
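A small Python sketch may make the modified-rank estimate concrete. It assumes there are no zeros and no ties in absolute value, and all function names are invented for this illustration. Under those assumptions the estimate coincides with the fraction of pairs of distinct observations whose sum is positive, which the last function computes directly as a cross-check.

```python
from itertools import combinations


def modified_positive_rank_sum(xs):
    """T_0^+: sum of the modified ranks (0 to n - 1, ordered by absolute
    value) of the positive observations. Assumes no zeros and no ties."""
    ordered = sorted(xs, key=abs)
    return sum(rank for rank, x in enumerate(ordered) if x > 0)


def estimate_p2(xs):
    """Estimate of the probability that the sum of two independent
    observations is positive: 2 * T_0^+ / (n * (n - 1))."""
    n = len(xs)
    return 2 * modified_positive_rank_sum(xs) / (n * (n - 1))


def pairwise_positive_fraction(xs):
    """Direct count: fraction of pairs i < j with xs[i] + xs[j] > 0."""
    pairs = list(combinations(xs, 2))
    return sum(1 for a, b in pairs if a + b > 0) / len(pairs)
```

For the made-up sample `[1.0, -2.0, 3.0, -0.5]`, the positive observations have modified ranks 1 and 3, so the modified positive-rank sum is 4 and the estimate is 8/12, matching the 4 out of 6 pairs with a positive sum.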
Example

(Data tables not reproduced here: the paired observations, followed by the same pairs ordered by absolute difference.)

[math]\displaystyle{ \sgn }[/math] is the sign function, [math]\displaystyle{ \text{abs} }[/math] is the absolute value, and [math]\displaystyle{ R_i }[/math] is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5.
 [math]\displaystyle{ W = 1.5+1.5-3-4-5-6+7+8+9=9 }[/math]
 [math]\displaystyle{ W \lt W_{\operatorname{crit}(\alpha = 0.05,\ 9 \text{, two-sided})} = 15 }[/math]
 [math]\displaystyle{ \therefore \text{failed to reject } H_0 }[/math] that the median of pairwise differences is zero.
 The [math]\displaystyle{ p }[/math]-value for this result is [math]\displaystyle{ 0.6113 }[/math]
Effect size
To compute an effect size for the signed-rank test, one can use the rank-biserial correlation.
If the test statistic T is reported, the rank correlation r is equal to the test statistic T divided by the total rank sum S, or r = T/S.^{[68]} Using the above example, the test statistic is T = 9. The sample size of 9 has a total rank sum of S = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45. Hence, the rank correlation is 9/45, so r = 0.20.
If the test statistic T is reported as the smaller of the two rank sums, an equivalent way to compute the rank correlation is with the difference in proportion between the two rank sums, which is the Kerby (2014) simple difference formula.^{[68]} To continue with the current example, the sample size is 9, so the total rank sum is 45. T is the smaller of the two rank sums, so T is 3 + 4 + 5 + 6 = 18. From this information alone, the remaining rank sum can be computed, because it is the total sum S minus T, or in this case 45 − 18 = 27. Next, the two rank-sum proportions are 27/45 = 60% and 18/45 = 40%. Finally, the rank correlation is the difference between the two proportions (.60 minus .40), hence r = .20.
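Both routes to the rank correlation can be checked with a short Python sketch. The signed ranks below are the ones used in the example (two tied ranks of 1.5, negative ranks 3 to 6, positive ranks 7 to 9); the function name is made up for this illustration.

```python
def rank_biserial(signed_ranks):
    """Return the rank-biserial correlation computed two ways:
    as T / S, and as the difference between the rank-sum proportions."""
    positive_sum = sum(r for r in signed_ranks if r > 0)
    negative_sum = -sum(r for r in signed_ranks if r < 0)
    total = positive_sum + negative_sum       # S, the total rank sum
    signed_sum = positive_sum - negative_sum  # T, the signed-rank sum
    r_from_t = signed_sum / total
    r_from_proportions = positive_sum / total - negative_sum / total
    return r_from_t, r_from_proportions


signed_ranks = [1.5, 1.5, -3, -4, -5, -6, 7, 8, 9]
r1, r2 = rank_biserial(signed_ranks)
print(round(r1, 2), round(r2, 2))  # 0.2 0.2
```

The two results agree because (positive sum − negative sum)/S is the same quantity as the difference of the two proportions.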
Software implementations
 R includes an implementation of the test as wilcox.test(x, y, paired=TRUE), where x and y are vectors of equal length.^{[69]}
 ALGLIB includes an implementation of the Wilcoxon signed-rank test in C++, C#, Delphi, Visual Basic, etc.
 GNU Octave implements various one-tailed and two-tailed versions of the test in the wilcoxon_test function.
 SciPy includes an implementation of the Wilcoxon signed-rank test in Python.
 Accord.NET includes an implementation of the Wilcoxon signed-rank test in C# for .NET applications.
 MATLAB implements this test as the signrank function. The call [p,h] = signrank(x,y) returns the p-value p together with a logical value h indicating the test decision. The result h = 1 indicates a rejection of the null hypothesis, and h = 0 indicates a failure to reject the null hypothesis at the 5% significance level.
 Julia HypothesisTests package includes the Wilcoxon signed-rank test as "pvalue(SignedRankTest(x, y))".
 SAS PROC UNIVARIATE includes the Wilcoxon Signed-Rank Test in the frame titled "Tests for Location" as "Signed Rank". Even though this procedure calculates an S-Statistic rather than a W-Statistic, the resulting p-value can still be used for this test.^{[70]}
See also
References
 ↑ Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). John Wiley & Sons, Inc. ISBN 0471160687, p. 350
 ↑ "Wilcoxon signed-rank test - Handbook of Biological Statistics". http://www.biostathandbook.com/wilcoxonsignedrank.html.
 ↑ Wilcoxon, Frank (Dec 1945). "Individual comparisons by ranking methods". Biometrics Bulletin 1 (6): 80–83. doi:10.2307/3001968. http://sci2s.ugr.es/keel/pdf/algorithm/articulo/wilcoxon1945.pdf.
 ↑ Siegel, Sidney (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill. pp. 75–83. ISBN 9780070573482. https://books.google.com/books?id=ebfRAAAAMAAJ&q=Wilcoxon.
 ↑ Conover, p. 352
 ↑ Siegel, p. 76
 ↑ Conover, p. 353
 ↑ Pratt, John W.; Gibbons, Jean D. (1981). Concepts of Nonparametric Theory. Springer-Verlag. ISBN 9781461259336, p. 148
 ↑ Pratt and Gibbons, p. 148
 ↑ Pratt and Gibbons, p. 148
 ↑ Pratt and Gibbons, p. 150
 ↑ Conover, pp. 352–357
 ↑ Hettmansperger, Thomas P. (1984). Statistical Inference Based on Ranks. John Wiley & Sons. ISBN 047188474X, pp. 32, 50
 ↑ Pratt and Gibbons, p. 153
 ↑ Pratt and Gibbons, pp. 153–154
 ↑ Hettmansperger, pp. 38–39
 ↑ Pratt and Gibbons, pp. 146–147
 ↑ Pratt and Gibbons, pp. 146–147
 ↑ Hettmansperger, pp. 30–31
 ↑ Conover, p. 353
 ↑ Pratt and Gibbons, pp. 155–156
 ↑ Hettmansperger, pp. 49–50
 ↑ Pratt and Gibbons, p. 155
 ↑ Conover, p. 354
 ↑ Hollander, Myles; Wolfe, Douglas A.; Chicken, Eric (2014). Nonparametric Statistical Methods (Third ed.). John Wiley & Sons, Inc. ISBN 9780470387375, pp. 39–41
 ↑ Pratt and Gibbons, p. 147
 ↑ Pratt and Gibbons, p. 147
 ↑ Hettmansperger, pp. 49–50
 ↑ Wilcoxon, Frank (1949). Some Rapid Approximate Statistical Procedures. American Cyanamid Co.
 ↑ Pratt, J. (1959). "Remarks on zeros and ties in the Wilcoxon signed rank procedures". Journal of the American Statistical Association 54 (287): 655–667. doi:10.1080/01621459.1959.10501526.
 ↑ Pratt, p. 659
 ↑ Pratt, p. 663
 ↑ Derrick, B; White, P (2017). "Comparing Two Samples from an Individual Likert Question". International Journal of Mathematics and Statistics 18 (3): 1–13.
 ↑ Conover, William Jay (1973). "On Methods of Handling Ties in the Wilcoxon Signed-Rank Test". Journal of the American Statistical Association 68 (344): 985–988. doi:10.1080/01621459.1973.10481460.
 ↑ Pratt and Gibbons, p. 162
 ↑ Conover, pp. 352–353
 ↑ Pratt and Gibbons, p. 164
 ↑ Conover, pp. 358–359
 ↑ Pratt, p. 660
 ↑ Pratt and Gibbons, pp. 168–169
 ↑ Pratt, pp. 661–662
 ↑ Pratt and Gibbons, p. 170
 ↑ Pratt and Gibbons, pp. 163, 166
 ↑ Pratt, p. 660
 ↑ Pratt and Gibbons, p. 166
 ↑ Pratt and Gibbons, p. 171
 ↑ Pratt, p. 661
 ↑ Pratt, p. 660
 ↑ Gibbons, Jean D.; Chakraborti, Subhabrata (2011). Nonparametric Statistical Inference (Fifth ed.). Chapman & Hall/CRC. ISBN 9781420077629, p. 194
 ↑ Hettmansperger, p. 34
 ↑ Pratt and Gibbons, pp. 148–149
 ↑ Pratt and Gibbons, pp. 148–149, pp. 186–187
 ↑ Hettmansperger, p. 171
 ↑ Pratt and Gibbons, p. 187
 ↑ Pratt and Gibbons, p. 187
 ↑ Pratt and Gibbons, p. 187
 ↑ Pratt and Gibbons, p. 149
 ↑ ^{58.0} ^{58.1} Kolassa, John E. (1995). "Edgeworth approximations for rank sum test statistics". Statistics and Probability Letters 24 (2): 169–171. doi:10.1016/0167-7152(95)00164-H.
 ↑ Hettmansperger, p. 37
 ↑ Hettmansperger, p. 35
 ↑ Cureton, Edward E. (1967). "The normal approximation to the signed-rank sampling distribution when zero differences are present". Journal of the American Statistical Association 62 (319): 1068–1069. doi:10.1080/01621459.1967.10500917.
 ↑ Pratt and Gibbons, p. 193
 ↑ Wilcoxon, p. 82
 ↑ Siegel, p. 76
 ↑ Pratt and Gibbons, p. 158
 ↑ Pratt and Gibbons, p. 159
 ↑ Pratt and Gibbons, p. 191
 ↑ ^{68.0} ^{68.1} Kerby, Dave S. (2014), "The simple difference formula: An approach to teaching nonparametric correlation.", Comprehensive Psychology 3: 11.IT.3.1, doi:10.2466/11.IT.3.1
 ↑ Dalgaard, Peter (2008). Introductory Statistics with R. Springer Science & Business Media. pp. 99–100. ISBN 9780387790534. https://books.google.com/books?id=YI0kT8cuiVUC&pg=PA99.
 ↑ "Wilcox signed-rank test: SAS instruction". https://www.stat.purdue.edu/~tqin/system101/method/method_wilcoxon_signed_rank_sas.htm.
External links
 Wilcoxon Signed-Rank Test in R
 Example of using the Wilcoxon signed-rank test
 An online version of the test
 A table of critical values for the Wilcoxon signed-rank test
 Brief guide by experimental psychologist Karl L. Wuensch: Nonparametric effect size estimators (Copyright 2015 by Karl L. Wuensch)
 Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, volume 3, article 1. doi:10.2466/11.IT.3.1.
Original source: https://en.wikipedia.org/wiki/Wilcoxon signed-rank test.