Rank correlation

In statistics, a rank correlation is the relationship between different rankings of the same set of items. A rank correlation coefficient measures the degree of similarity between two rankings and can be used to assess the significance of the relationship between them.

Correlation coefficients

Some of the more popular rank correlation statistics include:

Spearman's $\rho$
Kendall's $\tau$

A larger rank correlation coefficient indicates greater agreement between the rankings. The coefficient lies in the interval [−1, 1] and takes the value:

−1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the other.
0 if there is no association between the rankings; for two independent random rankings the coefficient is 0 on average.
1 if the agreement between the two rankings is perfect; the two rankings are the same.

Following Diaconis (1988), a ranking can be regarded as a permutation of a set of objects. Observed rankings can thus be treated as data whose sample space is (identified with) a symmetric group. Introducing a metric turns the symmetric group into a metric space, and different metrics correspond to different rank correlations.
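
To make the metric-space view concrete, here is a minimal Python sketch of two distances between rankings, each stored as a rank vector (object k receives rank sigma[k]). The function names are hypothetical and ties are assumed absent. Kendall's distance counts the pairs of objects that the two rankings put in opposite order; the second is the ordinary Euclidean distance between rank vectors.

```python
from itertools import combinations
from math import sqrt

def kendall_distance(sigma, pi):
    """Number of pairs of objects that the two rankings put in opposite order.

    sigma[k] and pi[k] are the ranks of object k under each ranking
    (assumed to be permutations of 1..n, i.e. no ties).
    """
    return sum(
        1
        for i, j in combinations(range(len(sigma)), 2)
        if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
    )

def euclidean_rank_distance(sigma, pi):
    """Euclidean (L2) distance between the two rank vectors."""
    return sqrt(sum((s - p) ** 2 for s, p in zip(sigma, pi)))

identity = [1, 2, 3, 4]
reverse = [4, 3, 2, 1]
print(kendall_distance(identity, reverse))          # 6: every pair is reversed
print(euclidean_rank_distance(identity, identity))  # 0.0
```

Without ties, these distances map onto the two coefficients discussed below: $\tau = 1 - \frac{4 d_K}{n(n-1)}$ for Kendall's distance $d_K$, and $\rho = 1 - \frac{6 d^2}{n^3 - n}$ for the Euclidean distance $d$.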

General correlation coefficient

Kendall (1944) showed that a general correlation coefficient can be defined, of which his $\tau$ and Spearman's $\rho$ are particular cases.

Suppose we have a set of $n$ objects, which are being considered in relation to two properties, represented by $x$ and $y$, forming the sets of values $\{x_i\}_{i\le n}$ and $\{y_i\}_{i\le n}$. To any pair of individuals, say the $i$-th and the $j$-th, we assign an $x$-score, denoted by $a_{ij}$, and a $y$-score, denoted by $b_{ij}$. The only requirement on these scores is antisymmetry, so $a_{ij}=-a_{ji}$ and $b_{ij}=-b_{ji}$ (in particular $a_{ii} = b_{ii} = 0$). Then the generalised correlation coefficient $\Gamma$ is defined by

$\Gamma = \frac{\sum_{i,j = 1}^n a_{ij}b_{ij}}{\sqrt{\sum_{i,j = 1}^n a_{ij}^2 \sum_{i,j = 1}^n b_{ij}^2}}$
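
A direct transcription of this definition can be sketched in Python. The pair scores are passed in as functions `a(values, i, j)` and `b(values, i, j)`; this calling convention and the name `general_gamma` are assumptions made for the example. The sums run over all ordered pairs, including $i = j$, where antisymmetric scores vanish.

```python
from math import sqrt

def general_gamma(x, y, a, b):
    """Kendall's generalised correlation coefficient Gamma (illustrative sketch).

    a(x, i, j) and b(y, i, j) are assumed antisymmetric pair scores,
    i.e. a(x, i, j) == -a(x, j, i); sums run over all ordered pairs (i, j).
    """
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    num = sum(a(x, i, j) * b(y, i, j) for i, j in pairs)
    sum_a2 = sum(a(x, i, j) ** 2 for i, j in pairs)
    sum_b2 = sum(b(y, i, j) ** 2 for i, j in pairs)
    return num / sqrt(sum_a2 * sum_b2)

def diff(v, i, j):
    """Hypothetical score: difference between the j-th and i-th values."""
    return v[j] - v[i]

print(general_gamma([1, 2, 3], [1, 3, 2], diff, diff))  # 0.5
```

With raw differences as scores, $\Gamma$ reduces to Pearson's product-moment correlation of the values. The sketch raises `ZeroDivisionError` if either set of scores is identically zero.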

Kendall's $\tau$ as a particular case

If $r_i$ is the rank of the $i$-th member according to the $x$-quality, we can define

$a_{ij} = \operatorname{sgn}(r_j-r_i)$

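and similarly for $b_{ij}$, using the ranks according to the $y$-quality. Summed over all ordered pairs, $\sum a_{ij}b_{ij} = 2(n_c - n_d)$, twice the difference between the number of concordant pairs $n_c$ and the number of discordant pairs $n_d$ (see Kendall tau rank correlation coefficient). Assuming there are no ties, $\sum a_{ij}^2$ is just the number of nonzero terms $a_{ij}$, that is, the $n(n-1)$ ordered pairs with $i \ne j$, and likewise for $\sum b_{ij}^2$. It follows that $\Gamma = \frac{2(n_c - n_d)}{n(n-1)}$, which is Kendall's $\tau$ coefficient.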
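
The claim that sign scores turn $\Gamma$ into Kendall's $\tau$ can be checked numerically. The sketch below uses hypothetical function names and assumes rankings without ties; it evaluates $\Gamma$ with $a_{ij} = \operatorname{sgn}(r_j - r_i)$ over all ordered pairs and compares the result with $\tau$ computed directly from concordant and discordant pair counts.

```python
from itertools import combinations
from math import sqrt

def sgn(v):
    """Sign of v as an int: -1, 0 or 1."""
    return (v > 0) - (v < 0)

def gamma_sign_scores(r, s):
    """Gamma with a_ij = sgn(r_j - r_i), b_ij = sgn(s_j - s_i), over all ordered pairs."""
    n = len(r)
    num = sum_a2 = sum_b2 = 0
    for i in range(n):
        for j in range(n):
            a = sgn(r[j] - r[i])
            b = sgn(s[j] - s[i])
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / sqrt(sum_a2 * sum_b2)

def kendall_tau(r, s):
    """(concordant - discordant) / (n(n-1)/2), assuming no ties."""
    n = len(r)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (r[j] - r[i]) * (s[j] - s[i])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

r = [1, 2, 3]
s = [1, 3, 2]
print(gamma_sign_scores(r, s), kendall_tau(r, s))  # both are 1/3 here
```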

Spearman's $\rho$ as a particular case

If $r_i$ and $s_i$ are the ranks of the $i$-th member according to the $x$- and the $y$-quality respectively, we can simply define

$a_{ij} = r_j-r_i$

$b_{ij} = s_j-s_i$

Assuming there are no ties, both $\{r_i\}$ and $\{s_i\}$ are permutations of $1, \dots, n$, so the sums $\sum a_{ij}^2$ and $\sum b_{ij}^2$ are equal. Then we have:

$\Gamma = \frac{\sum (r_j-r_i)(s_j-s_i)}{\sum(r_j-r_i)^2}$

Now

$\sum_{i,j = 1}^n (r_j-r_i)(s_j-s_i)= \sum_{i=1}^n \sum_{j=1}^n r_is_i + \sum_{i=1}^n \sum_{j=1}^n r_js_j - \sum_{i=1}^n \sum_{j=1}^n (r_is_j+r_js_i)$

$=2n\sum_{i=1}^n r_is_i - 2 \sum_{i=1}^n r_i \sum_{j=1}^n s_j$

$=2n\sum_{i=1}^n r_is_i - \frac12 n^2(n+1)^2$

since $\sum r_i$ and $\sum s_j$ are both equal to the sum of the first $n$ natural numbers, namely $\frac12n(n+1)$ .

Since $\sum s_i^2 = \sum r_i^2$ (both are the sum of the squares of $1, \dots, n$), we also have

$S = \sum_{i=1}^n (r_i-s_i)^2 = 2 \sum r_i^2 - 2\sum r_is_i$

and hence, substituting $\sum r_is_i = \sum r_i^2 - \frac12 S$ into the previous result,

$\sum(r_j-r_i)(s_j-s_i) = 2n\sum r_i^2 - \frac12n^2(n+1)^2 - nS$

Since $\sum r_i^2$ is the sum of the squares of the first $n$ natural numbers, namely $\frac16 n(n+1)(2n+1)$, the last equation reduces to

$\sum(r_j-r_i)(s_j-s_i) = \frac16n^2(n^2-1) - nS$

Further

$\sum(r_j-r_i)^2 = 2n\sum r_i^2-2\sum r_ir_j$

$= 2n\sum r_i^2-2(\sum r_i)^2 = \frac16n^2(n^2-1)$
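
As a small illustrative check of these two identities (the example is not from the source), take $n = 3$ with $r = (1, 2, 3)$ and $s = (1, 3, 2)$, so that $S = 0 + 1 + 1 = 2$ and $\frac16 n^2(n^2-1) = 12$. Each sum over ordered pairs is twice the sum over the pairs with $i < j$, namely $(1,2)$, $(1,3)$ and $(2,3)$:

$\sum(r_j-r_i)(s_j-s_i) = 2\left[(1)(2) + (2)(1) + (1)(-1)\right] = 6 = 12 - 3\cdot 2$

$\sum(r_j-r_i)^2 = 2\left[1^2 + 2^2 + 1^2\right] = 12$

so that $\Gamma = 6/12 = \frac12$ for this pair of rankings.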

Substituting these results into the expression for $\Gamma$ above gives

$\Gamma = \frac{\frac16n^2(n^2-1) - nS}{\frac16n^2(n^2-1)} = 1-\frac{6 S}{n(n^2-1)} = 1-\frac{6 S}{n^3-n}$

which is exactly Spearman's rank correlation coefficient $\rho$.
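
A similar numerical check, again a sketch with hypothetical function names and tie-free rankings, evaluates $\Gamma$ with difference scores $a_{ij} = r_j - r_i$ and compares it with the closed form $1 - 6S/(n^3 - n)$.

```python
from math import sqrt

def gamma_difference_scores(r, s):
    """Gamma with a_ij = r_j - r_i, b_ij = s_j - s_i, over all ordered pairs."""
    n = len(r)
    num = sum_a2 = sum_b2 = 0
    for i in range(n):
        for j in range(n):
            a = r[j] - r[i]
            b = s[j] - s[i]
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / sqrt(sum_a2 * sum_b2)

def spearman_rho(r, s):
    """Closed form 1 - 6S/(n^3 - n), S = sum of squared rank differences (no ties)."""
    n = len(r)
    S = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - 6 * S / (n ** 3 - n)

r = [1, 2, 3, 4, 5]
s = [2, 1, 4, 3, 5]
print(gamma_difference_scores(r, s), spearman_rho(r, s))  # both approximately 0.8
```

The two functions are computed independently, so their agreement on arbitrary permutations is a useful sanity check on the algebra. With tied (mid-)ranks the closed form no longer applies, because $\sum r_i^2$ then differs from the sum of the squares of $1, \dots, n$.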

References

Everitt, B. S. (2002), The Cambridge Dictionary of Statistics, Cambridge: Cambridge University Press, ISBN 0-521-81099-X
Diaconis, P. (1988), Group Representations in Probability and Statistics, Lecture Notes-Monograph Series, Hayward, CA: Institute of Mathematical Statistics, ISBN 0-940600-14-5
Kendall, M. G. (1970), Rank Correlation Methods, London: Griffin, ISBN 0-85264-199-0
