Gini coefficient

Gini-coefficient world map

The Gini coefficient is a measure of statistical dispersion developed by the Italian statistician Corrado Gini and published in his 1912 paper "Variability and Mutability" (Italian: Variabilità e mutabilità). It is commonly used as a measure of inequality of income or wealth. It has, however, also found application in the study of inequalities in disciplines as diverse as health science, ecology, and chemistry.

Definition

Graphical representation of the Gini coefficient.

The graph shows that the Gini coefficient equals the area marked 'A' divided by the sum of the areas marked 'A' and 'B' (that is, Gini = A/(A+B)). It also equals 2A: because both axes run from 0 to 1, the whole graph has area 1, and the triangle under the line of equality has area A+B = 0.5.

The Gini coefficient is usually defined mathematically based on the Lorenz curve, which plots the proportion of the total income of the population (y axis) that is cumulatively earned by the bottom x% of the population (see diagram). The line at 45 degrees thus represents perfect equality of incomes. The Gini coefficient can then be thought of as the ratio of the area that lies between the line of equality and the Lorenz curve (marked 'A' in the diagram) over the total area under the line of equality (marked 'A' and 'B' in the diagram); i.e., G=A/(A+B).
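
As an illustration of the definition above, the following Python sketch builds the Lorenz curve for a small list of incomes and evaluates G = A/(A+B), with the area B obtained by summing trapezoids under the curve. The function names `lorenz_points` and `gini_from_areas` and the sample incomes are illustrative assumptions, not part of any standard library.

```python
def lorenz_points(incomes):
    """Return the Lorenz curve as two lists (xs, ys), starting at (0, 0)."""
    values = sorted(incomes)          # poorest first
    n = len(values)
    total = sum(values)
    xs, ys = [0.0], [0.0]
    running = 0.0
    for i, value in enumerate(values, start=1):
        running += value
        xs.append(i / n)              # cumulative share of the population
        ys.append(running / total)    # cumulative share of the income
    return xs, ys


def gini_from_areas(incomes):
    """Gini coefficient as A / (A + B), with B found by the trapezoid rule."""
    xs, ys = lorenz_points(incomes)
    area_b = sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
        for k in range(1, len(xs))
    )
    area_a = 0.5 - area_b             # the triangle under the 45 degree line has area 0.5
    return area_a / (area_a + area_b)


incomes = [10, 20, 30, 40]            # hypothetical incomes
print(lorenz_points(incomes))
print(gini_from_areas(incomes))
```

For a finite population of n people in which one person holds everything, this construction gives 1 - 1/n rather than exactly 1; the value approaches 1 as n grows.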

The Gini coefficient can range from 0 to 1; it is sometimes multiplied by 100 to range between 0 and 100. A low Gini coefficient indicates a more equal distribution, with 0 corresponding to complete equality, while higher Gini coefficients indicate more unequal distribution, with 1 corresponding to complete inequality. For the coefficient to be validly computed, none of the quantities being distributed may be negative. Thus, if the Gini coefficient is being used to describe household income inequality, then no household can have a negative income. When used as a measure of income inequality, the most unequal society is one in which a single person receives 100% of the total income and the remaining people receive none (G=1 in the limit of a large population), and the most equal society is one in which every person receives the same percentage of the total income (G=0).

Some find it more intuitive to think of the Gini coefficient as half of the relative mean difference, which is mathematically equivalent. The mean difference is the average absolute difference between two items selected randomly from a population, and the relative mean difference is the mean difference divided by the average, to normalize for scale.
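
The mean-difference view can be sketched in Python as follows. The function name `gini_half_rmd` is a hypothetical choice, and the sketch assumes the two items are drawn independently with replacement, so that pairs of an item with itself are counted; this is the convention under which the result agrees with the population formulas based on the Lorenz curve.

```python
def gini_half_rmd(values):
    """Gini coefficient as half of the relative mean difference."""
    n = len(values)
    mean = sum(values) / n
    # Average absolute difference over all ordered pairs (a, b),
    # including pairs where the same item is drawn twice.
    total_difference = sum(abs(a - b) for a in values for b in values)
    mean_difference = total_difference / (n * n)
    relative_mean_difference = mean_difference / mean
    return relative_mean_difference / 2.0


print(gini_half_rmd([10, 20, 30, 40]))   # hypothetical incomes
```

The double loop costs on the order of n squared operations, so this form is mainly useful for checking other formulas on small data sets.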

Worldwide, Gini coefficients for income range from approximately 0.25 (Denmark) to 0.70 (Namibia) although not every country has been assessed. As a mathematical measure of inequality, the Gini coefficient does not necessarily entail any value judgement, i.e. the "rightness" or "wrongness" of a particular level of equality.

Different uses

Although the Gini coefficient is most popular in economics, it can in theory be applied in any field of science that studies a distribution. For example, in ecology the Gini coefficient has been used as a measure of biodiversity, where the cumulative proportion of species is plotted against cumulative proportion of individuals[1]. In health, it has been used as a measure of the inequality of health related quality of life in a population[2]. In education, it has been used as a measure of the inequality of universities[3]. In chemistry it has been used to express the selectivity of protein kinase inhibitors against a panel of kinases[4]. In statistics, when building decision trees, a related measure (Gini impurity) is used to measure the purity of possible child nodes, with the aim of maximising the average purity of two child nodes when splitting.

Calculation

The Gini index is defined as a ratio of the areas on the Lorenz curve diagram. If the area between the line of perfect equality and the Lorenz curve is A, and the area under the Lorenz curve is B, then the Gini index is A/(A+B). Since A+B = 0.5, the Gini index, G = A/(0.5) = 2A = 1-2B. If the Lorenz curve is represented by the function Y = L(X), the value of B can be found with integration and:

G = 1 - 2\,\int_0^1 L(X) dX.
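
As a worked example of the integral form, assume a hypothetical Lorenz curve L(X) = X^2. This curve is chosen only for illustration; it satisfies L(0) = 0 and L(1) = 1 and lies below the line of equality.

```latex
B = \int_0^1 X^2 \, dX = \frac{1}{3}, \qquad
G = 1 - 2B = 1 - \frac{2}{3} = \frac{1}{3}, \qquad
A = \frac{G}{2} = \frac{1}{6}.
```

The values are consistent with A + B = 1/6 + 1/3 = 0.5.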

In some cases, this equation can be applied to calculate the Gini coefficient without direct reference to the Lorenz curve. For example, for a population with values y_i, i = 1 to n, indexed in non-decreasing order (y_i \le y_{i+1}):

G = \frac{1}{n}\left ( n+1 - 2 \left ( \frac{\sum_{i=1}^n \; (n+1-i)y_i}{\sum_{i=1}^n y_i} \right ) \right )
This may be simplified to:
G = \frac{2 \sum_{i=1}^n \; i y_i}{n \sum_{i=1}^n y_i} -\frac{n+1}{n}
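
The two population formulas above can be checked against each other with a short Python sketch. The function names are hypothetical, and the code sorts its input because the formulas assume the values y_i are indexed in non-decreasing order.

```python
def gini_population(values):
    """G = (1/n) * (n + 1 - 2 * sum((n+1-i) * y_i) / sum(y_i))."""
    y = sorted(values)                # the formula needs non-decreasing order
    n = len(y)
    total = sum(y)
    weighted = sum((n + 1 - i) * y_i for i, y_i in enumerate(y, start=1))
    return (n + 1 - 2.0 * weighted / total) / n


def gini_population_simplified(values):
    """G = 2 * sum(i * y_i) / (n * sum(y_i)) - (n + 1) / n."""
    y = sorted(values)
    n = len(y)
    total = sum(y)
    ranked = sum(i * y_i for i, y_i in enumerate(y, start=1))
    return 2.0 * ranked / (n * total) - (n + 1) / n


data = [40, 10, 30, 20]               # hypothetical incomes, deliberately unsorted
print(gini_population(data), gini_population_simplified(data))
```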
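
For a discrete probability function f(y), where y_i, i = 1 to n, are the points with nonzero probabilities, indexed in increasing order (y_i < y_{i+1}):
G = 1 - \frac{\sum_{i=1}^n \; f(y_i)(S_{i-1}+S_i)}{S_n}
where
S_i = \sum_{j=1}^i \; f(y_j)\,y_j\, and S_0 = 0\,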
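
The formula in terms of f(y_i) and the partial sums S_i can be evaluated in a single pass, as in the following Python sketch. The function name `gini_discrete` and the example distributions are assumptions made for illustration; the points are sorted first because the formula assumes they are indexed in increasing order.

```python
def gini_discrete(points, probabilities):
    """G = 1 - sum(f(y_i) * (S_{i-1} + S_i)) / S_n, with S_i = sum_{j<=i} f(y_j) * y_j."""
    pairs = sorted(zip(points, probabilities))   # increasing order of y_i
    s_previous = 0.0                             # S_0 = 0
    accumulated = 0.0
    for y, f in pairs:
        s_current = s_previous + f * y           # S_i
        accumulated += f * (s_previous + s_current)
        s_previous = s_current
    return 1.0 - accumulated / s_previous        # s_previous now holds S_n


# Four equally likely incomes, and a two-point distribution.
print(gini_discrete([10, 20, 30, 40], [0.25, 0.25, 0.25, 0.25]))
print(gini_discrete([1, 3], [0.5, 0.5]))
```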
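
For a cumulative distribution function F(y) that is piecewise differentiable, has a mean \mu, and is zero for all negative values of y:
G = 1 - \frac{1}{\mu}\int_0^\infty (1-F(y))^2\,dy = \frac{1}{\mu}\int_0^\infty F(y)(1-F(y))\,dy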
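
As a worked example of the formula in terms of F(y), assume incomes follow an exponential distribution with rate \lambda. The distribution is a hypothetical choice made only because its integrals are easy to evaluate.

```latex
F(y) = 1 - e^{-\lambda y}, \qquad \mu = \frac{1}{\lambda}, \qquad
\int_0^\infty (1-F(y))^2 \, dy = \int_0^\infty e^{-2\lambda y} \, dy = \frac{1}{2\lambda}, \qquad
G = 1 - \lambda \cdot \frac{1}{2\lambda} = \frac{1}{2}.
```

The second form gives the same value, since the integral of F(y)(1-F(y)) is 1/\lambda - 1/(2\lambda) = 1/(2\lambda).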
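
Since the Gini coefficient is half the relative mean difference, it can also be calculated using formulas for the relative mean difference. For a random sample S consisting of values y_i, i = 1 to n, indexed in non-decreasing order (y_i \le y_{i+1}), the statistic
G(S) = \frac{1}{n-1}\left (n+1 - 2 \left ( \frac{\sum_{i=1}^n \; (n+1-i)y_i}{\sum_{i=1}^n y_i}\right ) \right )
is a consistent estimator of the population Gini coefficient, but is not, in general, unbiased. Like G, G(S) has a simpler form:
G(S) = 1 - \frac{2}{n-1}\left ( n - \frac{\sum_{i=1}^n \; iy_i}{\sum_{i=1}^n y_i}\right ) .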
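
A minimal Python sketch of the simpler form of G(S) follows. The function name `gini_sample` is hypothetical, and the input is sorted because the formula assumes the y_i are in non-decreasing order.

```python
def gini_sample(values):
    """G(S) = 1 - (2 / (n - 1)) * (n - sum(i * y_i) / sum(y_i))."""
    y = sorted(values)
    n = len(y)
    if n < 2:
        raise ValueError("the sample statistic needs at least two values")
    total = sum(y)
    ranked = sum(i * y_i for i, y_i in enumerate(y, start=1))
    return 1.0 - (2.0 / (n - 1)) * (n - ranked / total)


print(gini_sample([10, 20, 30, 40]))   # hypothetical sample
```

Comparing the two definitions term by term shows that G(S) equals n/(n-1) times the population formula G applied to the same values, so the two differ noticeably only for small samples.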

As with the relative mean difference, there does not exist a sample statistic that is in general an unbiased estimator of the population Gini coefficient.

Sometimes the entire Lorenz curve is not known, and only values at certain intervals are given. In that case, the Gini coefficient can be approximated by using various techniques for interpolating the missing values of the Lorenz curve. Let (X_k, Y_k), k = 0, ..., n, be the known points on the Lorenz curve, with the X_k indexed in increasing order (X_{k-1} < X_k), where X_k is the cumulative proportion of the population and Y_k is the cumulative proportion of income, so that X_0 = Y_0 = 0 and X_n = Y_n = 1.

If the Lorenz curve is approximated on each interval as a line between consecutive points, then the area B can be approximated with trapezoids and:

G_1 = 1 - \sum_{k=1}^{n} (X_{k} - X_{k-1}) (Y_{k} + Y_{k-1})

is the resulting approximation for G. More accurate results can be obtained using other methods to approximate the area B, such as approximating the Lorenz curve with a quadratic function across pairs of intervals, or building an appropriately smooth approximation to the underlying distribution function that matches the known data. If the population mean and boundary values for each interval are also known, these can also often be used to improve the accuracy of the approximation.
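
The trapezoid approximation G_1 can be sketched in Python as below. The function name `gini_trapezoid` and the quintile shares are hypothetical; the sketch assumes the point lists start at (0, 0) and end at (1, 1), with the X values in increasing order.

```python
def gini_trapezoid(xs, ys):
    """G_1 = 1 - sum((X_k - X_{k-1}) * (Y_k + Y_{k-1})) over k = 1..n."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need matching lists with at least two points")
    return 1.0 - sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1])
        for k in range(1, len(xs))
    )


# Hypothetical cumulative income shares at population quintiles.
population_share = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
income_share = [0.0, 0.05, 0.15, 0.30, 0.55, 1.0]
print(gini_trapezoid(population_share, income_share))
```

Because straight line segments lie on or above a convex Lorenz curve, this approximation tends to understate the Gini coefficient when only a few points are known.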

The Gini coefficient calculated from a sample is a statistic and its standard error, or confidence intervals for the population Gini coefficient, should be reported. These can be calculated using bootstrap techniques but those proposed have been mathematically complicated and computationally onerous even in an era of fast computers. Ogwang (2000) made the process more efficient by setting up a “trick regression model” in which the incomes in the sample are ranked with the lowest income being allocated rank 1. The model then expresses the rank (dependent variable) as the sum of a constant A and a normal error term whose variance is inversely proportional to y_k:

k = A + N(0, s^{2}/y_k)

Ogwang showed that G can be expressed as a function of the weighted least squares estimate of the constant A and that this can be used to speed up the calculation of the jackknife estimate for the standard error. Giles (2004) argued that the standard error of the estimate of A can be used to derive that of the estimate of G directly without using a jackknife at all. This method only requires the use of ordinary least squares regression after ordering the sample data. The results compare favorably with the estimates from the jackknife with agreement improving with increasing sample size. The paper describing this method can be found here: http://web.uvic.ca/econ/ewp0202.pdf
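
The link between the regression constant and G can be illustrated with a short Python sketch. It is not a reproduction of the procedure in the cited papers: it only combines the stated model with the simplified population formula G = 2 Σ i y_i / (n Σ y_i) - (n+1)/n. Under the assumption that the error variance is s^2/y_k, weighted least squares uses weights proportional to y_k, so the estimate of the constant A is the income-weighted mean rank. The function names are hypothetical.

```python
def weighted_mean_rank(values):
    """Weighted least squares estimate of A in k = A + error, with weights y_k."""
    y = sorted(values)                      # rank 1 goes to the lowest income
    return sum(k * y_k for k, y_k in enumerate(y, start=1)) / sum(y)


def gini_from_regression_constant(values):
    """G expressed through the estimate of A: G = 2 * A / n - (n + 1) / n."""
    n = len(values)
    a_hat = weighted_mean_rank(values)
    return 2.0 * a_hat / n - (n + 1) / n


data = [10, 20, 30, 40]                     # hypothetical incomes
print(weighted_mean_rank(data), gini_from_regression_constant(data))
```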

However, it has since been argued that this is dependent on the model’s assumptions about the error distributions (Ogwang 2004) and the independence of error terms (Modarres & Gastwirth 2006) and that these assumptions are often not valid for real data sets. It may therefore be better to stick with jackknife methods such as those proposed by Yitzhaki (1991) and Karagiannis and Kovacevic (2000). The debate continues.
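
For orientation, a brute-force jackknife standard error can be sketched in Python as follows. This is the textbook leave-one-out jackknife, recomputing the coefficient n times, and is not the efficient algorithm of any of the papers cited above. The function names are hypothetical, and the sketch assumes non-negative incomes with a positive total in every leave-one-out sample.

```python
import math


def gini(values):
    """Population formula G = 2 * sum(i * y_i) / (n * sum(y_i)) - (n + 1) / n."""
    y = sorted(values)
    n = len(y)
    ranked = sum(i * y_i for i, y_i in enumerate(y, start=1))
    return 2.0 * ranked / (n * sum(y)) - (n + 1) / n


def jackknife_standard_error(values):
    """Leave-one-out jackknife estimate of the standard error of the Gini coefficient."""
    data = list(values)
    n = len(data)
    # Recompute the coefficient with each observation removed in turn.
    leave_one_out = [gini(data[:i] + data[i + 1:]) for i in range(n)]
    mean_estimate = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((g - mean_estimate) ** 2 for g in leave_one_out)
    return math.sqrt(variance)


sample = [10, 20, 30, 40]                   # hypothetical sample
print(gini(sample), jackknife_standard_error(sample))
```

The cost grows roughly with n squared times the cost of sorting, which is why the literature cited above looks for faster routes to the same quantity.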

The Gini coefficient can be calculated if you know the mean of a distribution, the number of people (or percentiles), and the income of each person (or percentile). Princeton development economist Angus Deaton (1997, 139) simplified the Gini calculation to one easy formula:

G = \frac{N+1}{N-1}-\frac{2}{N(N-1)u}\left(\sum_{i=1}^N \; P_iX_i\right)

where u is the mean income of the population and P_i is the income rank of person i, who has income X_i, such that the richest person receives a rank of 1 and the poorest a rank of N. This effectively gives higher weight to poorer people in the income distribution, which allows the Gini to meet the Transfer Principle.
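
The rank-based formula above can be sketched in Python as follows. The function name `gini_deaton` is a hypothetical label; incomes are sorted in descending order so that the richest person receives rank 1 and the poorest rank N.

```python
def gini_deaton(incomes):
    """G = (N + 1) / (N - 1) - 2 / (N * (N - 1) * u) * sum(P_i * X_i)."""
    n = len(incomes)
    if n < 2:
        raise ValueError("the formula needs at least two incomes")
    mean_income = sum(incomes) / n                  # u in the formula
    richest_first = sorted(incomes, reverse=True)   # rank 1 = richest, rank N = poorest
    rank_weighted = sum(
        rank * income for rank, income in enumerate(richest_first, start=1)
    )
    return (n + 1) / (n - 1) - 2.0 * rank_weighted / (n * (n - 1) * mean_income)


print(gini_deaton([10, 20, 30, 40]))                # hypothetical incomes
```

Substituting P_i = N + 1 - i for incomes indexed in non-decreasing order shows that this expression coincides with the sample statistic G(S) given earlier in this section.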

Income Gini indices in the world

While developed European nations and Canada tend to have Gini indices between 24 and 36, the United States' and Mexico's Gini indices are both above 40, indicating that the United States and Mexico have greater inequality. Using the Gini can help quantify differences in welfare and compensation policies and philosophies. However it should be borne in mind that the Gini coefficient can be misleading when used to make political comparisons between large and small countries (see criticisms section).

The Gini index for the entire world has been estimated by various parties to be between 56 and 66.[5][6]

The change in Gini indices has differed across countries. Some countries have changed little over time, such as Belgium, Canada, Germany, Japan, and Sweden. Brazil has oscillated around a steady value. France, Italy, Mexico, and Norway have shown marked declines. China and the US have increased steadily. Australia grew to moderate levels before dropping. India sank before rising again. The UK and Poland stayed at very low levels before rising. Bulgaria increased in fits and starts.

US income Gini indices over time

Gini indices for the United States at various times, according to the US Census Bureau:[7][8]

EU Gini index

In 2005 the Gini index for the EU was estimated at 31.[10]

Advantages of Gini coefficient as a measure of inequality

Disadvantages of Gini coefficient as a measure of inequality

Problems in using the Gini coefficient

General problems of measurement

As one result of this criticism, entropy measures (e.g. the Theil index and the Atkinson index) are frequently used in addition to, or in competition with, the Gini coefficient. These measures attempt to compare the distribution of resources by intelligent agents in the market with a maximum entropy random distribution, which would occur if these agents acted like non-intelligent particles in a closed system following the laws of statistical physics.

Credit risk

The Gini coefficient is also commonly used for the measurement of the discriminatory power of rating systems in credit risk management.

The discriminatory power refers to a credit risk model's ability to differentiate between defaulting and non-defaulting clients. The above formula G_1 may be used for the final model and also at individual model factor level, to quantify the discriminatory power of individual factors. This is as a result of too many non-defaulting clients falling into the lower end of the points scale; for example, a factor has a 10-point scale and 30% of non-defaulting clients are assigned the lowest points available (e.g. 0 or negative points). This indicates that the factor is behaving in a counter-intuitive manner and would require further investigation at the model development stage.[16]

See also

References

  1. ^ Wittebolle, Lieven; et al (2009). "Initial community evenness favours functionality under selective stress". Nature 458: pp. 623–626. 
  2. ^ Asada, Yukiko (2005). "Assessment of the health of Americans: the average health-related quality of life and its inequality across individuals and groups". Population Health Metrics 3: p. 7. doi:10.1186/1478-7954-3-7. 
  3. ^ Halffman, Willem (2010). "Is Inequality Among Universities Increasing? Gini Coefficients and the Elusive Rise of Elite Universities". Minerva 48: pp. 55–72. doi:10.1007/s11024-010-9141-3. 
  4. ^ Graczyk, Piotr (2007). "Gini Coefficient: A New Way To Express Selectivity of Kinase Inhibitors against a Family of Kinases". Journal of Medicinal Chemistry 50: pp. 5773–5779. doi:10.1021/jm070562u. 
  5. ^ Bob Sutcliffe (April 2007), Postscript to the article ‘World inequality and globalization’ (Oxford Review of Economic Policy, Spring 2004), http://siteresources.worldbank.org/INTDECINEQ/Resources/PSBSutcliffe.pdf, retrieved 2007-12-13 
  6. ^ United Nations Development Programme
  7. ^ "Gini Ratios for Households, by Race and Hispanic Origin of Householder: 1967 to 2007". Historical Income Tables - Households. United States Census Bureau. http://www.census.gov/hhes/www/income/histinc/h04.html. 
  8. ^ "Table 3. Income Distribution Measures Using Money Income and Equivalence-Adjusted Income: 2007 and 2008". Income, Poverty, and Health Insurance Coverage in the United States: 2008. United States Census Bureau. p. 17. http://www.census.gov/prod/2009pubs/p60-236.pdf. 
  9. ^ Note that the calculation of the index for the United States was changed in 1992, resulting in an upwards shift of about 2.
  10. ^ http://www.eurofound.europa.eu/areas/qualityoflife/eurlife/index.php?template=3&radioindic=158&idDomain=3
  11. ^ Ray, Debraj. Development Economics. Princeton University Press, 1998. p. 188.
  12. ^ Friedman, David D.
  13. ^ (Data from the Statistics Sweden.)
  14. ^ N. Blomquist, "A comparison of distributions of annual and lifetime income: Sweden around 1970", Review of Income and Wealth, Volume 27 Issue 3, pp. 243–264.
  15. ^ "Politics, work, and daily life in the USSR", James R. Millar, 1987, p.193
  16. ^ The Analytics of risk model validation

Further reading

  • Amiel, Y.; Cowell, F.A. (1999). Thinking about Inequality. Cambridge. 
  • Anand, Sudhir (1983). Inequality and Poverty in Malaysia. New York: Oxford University Press. 
  • Brown, Malcolm (1994). "Using Gini-Style Indices to Evaluate the Spatial Patterns of Health Practitioners: Theoretical Considerations and an Application Based on Alberta Data". Social Science Medicine 38: 1243–1256. doi:10.1016/0277-9536(94)90189-9. 
  • Chakravarty, S. R. (1990). Ethical Social Index Numbers. New York: Springer-Verlag. 
  • Deaton, Angus (1997). Analysis of Household Surveys. Baltimore MD: Johns Hopkins University Press. 
  • Dixon, PM, Weiner J., Mitchell-Olds T, Woodley R. (1987). "Bootstrapping the Gini coefficient of inequality". Ecology 68: 1548–1551. doi:10.2307/1939238. 
  • Dorfman, Robert (1979). "A Formula for the Gini Coefficient". The Review of Economics and Statistics 61: 146–149. doi:10.2307/1924845. 
  • Gastwirth, Joseph L. (1972). "The Estimation of the Lorenz Curve and Gini Index". The Review of Economics and Statistics 54: 306–316. doi:10.2307/1937992. 
  • Giles, David (2004). "Calculating a Standard Error for the Gini Coefficient: Some Further Results". Oxford Bulletin of Economics and Statistics 66: 425–433. doi:10.1111/j.1468-0084.2004.00086.x. 
  • Gini, Corrado (1912). "Variabilità e mutabilità" Reprinted in Memorie di metodologica statistica (Ed. Pizetti E, Salvemini, T). Rome: Libreria Eredi Virgilio Veschi (1955).
  • Gini, Corrado (1921). "Measurement of Inequality of Incomes". The Economic Journal 31: 124–126. doi:10.2307/2223319. 
  • Karagiannis, E. and Kovacevic, M. (2000). "A Method to Calculate the Jackknife Variance Estimator for the Gini Coefficient". Oxford Bulletin of Economics and Statistics 62: 119–122. doi:10.1111/1468-0084.00163. 
  • Mills, Jeffrey A.; Zandvakili, Sourushe (1997). "Statistical Inference via Bootstrapping for Measures of Inequality". Journal of Applied Econometrics 12: 133–150. doi:10.1002/(SICI)1099-1255(199703)12:2<133::AID-JAE433>3.0.CO;2-H. 
  • Modarres, Reza and Gastwirth, Joseph L. (2006). "A Cautionary Note on Estimating the Standard Error of the Gini Index of Inequality". Oxford Bulletin of Economics and Statistics 68: 385–390. doi:10.1111/j.1468-0084.2006.00167.x. 
  • Morgan, James (1962). "The Anatomy of Income Distribution". The Review of Economics and Statistics 44: 270–283. doi:10.2307/1926398. 
  • Ogwang, Tomson (2000). "A Convenient Method of Computing the Gini Index and its Standard Error". Oxford Bulletin of Economics and Statistics 62: 123–129. doi:10.1111/1468-0084.00164. 
  • Ogwang, Tomson (2004). "Calculating a Standard Error for the Gini Coefficient: Some Further Results: Reply". Oxford Bulletin of Economics and Statistics 66: 435–437. doi:10.1111/j.1468-0084.2004.00087.x. 
  • Xu, Kuan (January 2004). How Has the Literature on Gini's Index Evolved in the Past 80 Years?. Department of Economics, Dalhousie University. http://economics.dal.ca/RePEc/dal/wparch/howgini.pdf. Retrieved 2006-06-01.  The Chinese version of this paper appears in Xu, Kuan (2003). "How Has the Literature on Gini's Index Evolved in the Past 80 Years?". China Economic Quarterly 2: 757–778. 
  • Yitzhaki, S. (1991). "Calculating Jackknife Variance Estimators for Parameters of the Gini Method". Journal of Business and Economic Statistics 9: 235–239. doi:10.2307/1391792. 
