Rank correlation
In statistics, a rank correlation is the relationship between different rankings of the same set of items. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.
Correlation coefficients
Some of the more popular rank correlation statistics include Spearman's ρ and Kendall's τ.
An increasing rank correlation coefficient implies increasing agreement between rankings. The coefficient lies in the interval [−1, 1] and takes the value:
- −1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the other.
- 0 if the rankings are completely independent.
- 1 if the agreement between the two rankings is perfect; the two rankings are the same.
Following Diaconis (1988), a ranking can be seen as a permutation of a set of objects. Thus we can look at observed rankings as data obtained when the sample space is (identified with) a symmetric group. We can then introduce a metric, making the symmetric group into a metric space. Different metrics will correspond to different rank correlations.
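As a rough illustration of treating rankings as permutations with a distance between them, the sketch below computes two distances on rank vectors: the number of discordant pairs (the quantity behind Kendall's coefficient) and the Euclidean distance between rank vectors (the quantity behind Spearman's coefficient). The text above does not name specific metrics, so the choice of these two, and the function names `kendall_distance` and `euclidean_rank_distance`, are assumptions made for illustration.

```python
from math import sqrt

def kendall_distance(p, q):
    """Number of pairs of items that the two rankings order differently."""
    n = len(p)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )

def euclidean_rank_distance(p, q):
    """Euclidean distance between the two rank vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

identity = [1, 2, 3, 4]
reverse = [4, 3, 2, 1]

d_same = kendall_distance(identity, identity)        # no pair is swapped
d_kendall = kendall_distance(identity, reverse)      # every pair is swapped
d_euclid = euclidean_rank_distance(identity, reverse)
```

Both functions return zero only when the rankings coincide and are largest when one ranking is the reverse of the other, which mirrors the values 1 and −1 of the corresponding coefficients.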
General Correlation Coefficient
Kendall (1944) showed that his tau and Spearman's rho are particular cases of a general correlation coefficient.
Suppose we have a set of $n$ objects, which are being considered in relation to two properties, represented by $x$ and $y$, forming the sets of values $\{x_i\}_{i \le n}$ and $\{y_i\}_{i \le n}$. To any pair of individuals, say the $i$-th and the $j$-th, we assign an $x$-score, denoted by $a_{ij}$, and a $y$-score, denoted by $b_{ij}$. The only requirement on these functions is anti-symmetry, so $a_{ij} = -a_{ji}$ and $b_{ij} = -b_{ji}$. Then the generalised correlation coefficient $\Gamma$ is defined by

$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}}$$
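A minimal sketch of the generalised coefficient follows: the sum of products of the two pair scores over all ordered pairs, divided by the square root of the product of the two sums of squared scores. The function name `general_correlation` and the idea of passing the pair score as a callable are illustrative choices, not part of the source; the only property assumed of the score is anti-symmetry.

```python
from math import sqrt

def general_correlation(x, y, score):
    """Generalised correlation coefficient for an anti-symmetric pair score.

    score(u, v) must satisfy score(u, v) == -score(v, u).
    """
    n = len(x)
    num = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            a = score(x[i], x[j])  # x-score of the pair (i, j)
            b = score(y[i], y[j])  # y-score of the pair (i, j)
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / sqrt(sum_a2 * sum_b2)

def difference(u, v):
    # Anti-symmetric by construction: difference(u, v) == -difference(v, u)
    return v - u

ranks = [1, 2, 3, 4]
same = general_correlation(ranks, ranks, difference)
opposite = general_correlation(ranks, ranks[::-1], difference)
```

With identical rankings the coefficient is 1, and with one ranking reversed it is −1, matching the extreme values listed earlier. The sketch does not guard against a zero denominator, which occurs if every pair score is zero.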
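Kendall's τ as a particular case
If $r_i$ is the rank of the $i$-th member according to the $x$-quality, we can define

$$a_{ij} = \operatorname{sgn}(r_j - r_i)$$

and similarly for $b_{ij}$. The sum $\sum a_{ij} b_{ij}$ is twice the difference between the number of concordant pairs and the number of discordant pairs (see Kendall tau rank correlation coefficient). The sum $\sum a_{ij}^2$ is just the number of terms $a_{ij}$ with $i \neq j$, equal to $n(n-1)$, and likewise for $\sum b_{ij}^2$. It follows that $\Gamma$ is equal to Kendall's $\tau$ coefficient.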
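The following sketch checks this numerically on a small example without ties: it evaluates the general coefficient with sign scores and compares it with the direct count of concordant and discordant pairs. The function names and the sample rankings are made up for illustration.

```python
from math import sqrt

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_tau_general(r, s):
    """General coefficient with a_ij = sgn(r_j - r_i), b_ij = sgn(s_j - s_i)."""
    n = len(r)
    num = sum_a2 = sum_b2 = 0
    for i in range(n):
        for j in range(n):
            a = sign(r[j] - r[i])
            b = sign(s[j] - s[i])
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / sqrt(sum_a2 * sum_b2)

def kendall_tau_pairs(r, s):
    """(concordant - discordant) / (n(n-1)/2), counting each pair once."""
    n = len(r)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (r[j] - r[i]) * (s[j] - s[i])
            if prod > 0:
                concordant += 1
            elif prod < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

r = [1, 2, 3, 4, 5]
s = [2, 1, 4, 3, 5]  # two adjacent swaps relative to r

tau_general = kendall_tau_general(r, s)
tau_pairs = kendall_tau_pairs(r, s)
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so both computations give 0.6.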
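Spearman's ρ as a particular case
If $r_i$, $s_i$ are the ranks of the $i$-th member according to the $x$-quality and the $y$-quality respectively, we can simply define

$$a_{ij} = r_j - r_i, \qquad b_{ij} = s_j - s_i$$

The sums $\sum a_{ij}^2$ and $\sum b_{ij}^2$ are equal, since both $r_i$ and $s_i$ range from $1$ to $n$. Then we have:

$$\Gamma = \frac{\sum_{i,j=1}^{n} (r_j - r_i)(s_j - s_i)}{\sum_{i,j=1}^{n} (r_j - r_i)^2}$$

Now

$$\sum_{i,j=1}^{n} (r_j - r_i)(s_j - s_i) = \sum_{i=1}^{n}\sum_{j=1}^{n} r_i s_i + \sum_{i=1}^{n}\sum_{j=1}^{n} r_j s_j - \sum_{i=1}^{n}\sum_{j=1}^{n} (r_i s_j + r_j s_i)$$

$$= 2n \sum_{i=1}^{n} r_i s_i - 2 \sum_{i=1}^{n} r_i \sum_{j=1}^{n} s_j$$

$$= 2n \sum_{i=1}^{n} r_i s_i - \tfrac{1}{2} n^2 (n+1)^2$$

since $\sum r_i$ and $\sum s_j$ are both equal to the sum of the first $n$ natural numbers, namely $\tfrac{1}{2} n (n+1)$.
We also have

$$S = \sum_{i=1}^{n} (r_i - s_i)^2 = 2 \sum_{i=1}^{n} r_i^2 - 2 \sum_{i=1}^{n} r_i s_i$$

and hence

$$\sum_{i,j=1}^{n} (r_j - r_i)(s_j - s_i) = 2n \sum_{i=1}^{n} r_i^2 - \tfrac{1}{2} n^2 (n+1)^2 - nS$$

Since $\sum r_i^2$ is the sum of the squares of the first $n$ natural numbers, it equals $\tfrac{1}{6} n (n+1)(2n+1)$, and the last equation reduces to

$$\sum_{i,j=1}^{n} (r_j - r_i)(s_j - s_i) = \tfrac{1}{6} n^2 (n^2 - 1) - nS$$

Further

$$\sum_{i,j=1}^{n} (r_j - r_i)^2 = 2n \sum_{i=1}^{n} r_i^2 - 2 \sum_{i=1}^{n} r_i \sum_{j=1}^{n} r_j = \tfrac{1}{6} n^2 (n^2 - 1)$$

and thus, substituting these results into the original formula, we get

$$\Gamma = 1 - \frac{6S}{n^3 - n}$$

which is exactly Spearman's rank correlation coefficient $\rho$.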
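As a numerical check of this derivation, the sketch below evaluates the general coefficient with rank-difference scores and compares it with the closed form 1 − 6S/(n³ − n), where S is the sum of squared differences between the two ranks of each item. It assumes rankings without ties; the function names and sample data are illustrative only.

```python
from math import sqrt

def spearman_general(r, s):
    """General coefficient with a_ij = r_j - r_i and b_ij = s_j - s_i."""
    n = len(r)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    num = sum((r[j] - r[i]) * (s[j] - s[i]) for i, j in pairs)
    sum_a2 = sum((r[j] - r[i]) ** 2 for i, j in pairs)
    sum_b2 = sum((s[j] - s[i]) ** 2 for i, j in pairs)
    return num / sqrt(sum_a2 * sum_b2)

def spearman_closed_form(r, s):
    """1 - 6 S / (n^3 - n), with S the sum of squared rank differences."""
    n = len(r)
    S = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - 6 * S / (n ** 3 - n)

r = [1, 2, 3, 4, 5]
s = [2, 1, 4, 3, 5]  # S = 1 + 1 + 1 + 1 + 0 = 4

rho_general = spearman_general(r, s)
rho_closed = spearman_closed_form(r, s)
```

For this example n = 5 and S = 4, so the closed form gives 1 − 24/120 = 0.8; the double sum gives 80/100, the same value.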
References
- Everitt, B. S. (2002), The Cambridge Dictionary of Statistics, Cambridge: Cambridge University Press, ISBN 0-521-81099-X
- Diaconis, P. (1988), Group Representations in Probability and Statistics, Lecture Notes-Monograph Series, Hayward, CA: Institute of Mathematical Statistics, ISBN 0-940600-14-5
- Kendall, M. G. (1970), Rank Correlation Methods, London: Griffin, ISBN 0-85264-199-0