Bayesian probability

Bayesian probability is an interpretation of the probability calculus which holds that the concept of probability can be defined as the degree to which a person (or community) believes that a proposition is true. Bayesian theory also suggests that Bayes' theorem can be used as a rule to infer or update the degree of belief in light of new information.

History

Thomas Bayes. (The correct identification of this portrait has been questioned.)

Bayesian theory and Bayesian probability are named after Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. The term Bayesian, however, came into use only around 1950, and it is not clear that Bayes would have endorsed the very broad interpretation of probability that is associated with his name. Laplace proved a more general version of Bayes' theorem and used it to solve problems in celestial mechanics, medical statistics and, by some accounts, even jurisprudence. Laplace, however, did not consider this general theorem to be important for probability theory; he instead adhered to the classical definition of probability, a special case of the Bayesian definition.

The subjective theory of probability, which interprets 'probability' as 'subjective degree of belief in a proposition', was proposed independently and at about the same time by Bruno de Finetti in Italy in Fondamenti Logici del Ragionamento Probabilistico (1930) and Frank Ramsey in Cambridge in The Foundations of Mathematics (1931).[1] It was devised to solve the problems of the classical definition of probability and to replace it. L. J. Savage expanded the idea in The Foundations of Statistics (1954).

Formal attempts have been made to define and apply the intuitive notion of a "degree of belief". The most common interpretation is based on betting: a degree of belief is reflected in the odds and stakes that the subject is willing to bet on the proposition at hand. However, spatio-temporally universal hypotheses of the kind fundamental to science, such as Newton's law of inertia or his law of universal gravitation, pose a problem for the betting definition of 'degree of belief': the only fair betting quotient[2] on such a universal hypothesis is zero, since a bet on its truth can never be won because its truth can never be decided.[3] This problem for the Bayesian philosophy of probability becomes a fundamental problem for the Bayesian philosophy of science, which holds that scientific reasoning is subjective Bayesian probabilistic reasoning and thereby seeks to reduce scientific method to gambling; some, however, regard the problem as solvable.[4] It is also noteworthy that by 1981 de Finetti himself had come to reject the betting conception of probability.[5] Note that the problem is easily solved for this type of hypothesis by only allowing bets on two or more alternative theories:[citation needed] the likelihood each theory gives to the observations can be used to estimate the expected return on the bet while taking into account measurement errors,[citation needed] which is analogous to the way Bayesian calculations are made in practice.[citation needed]

On the Bayesian interpretation, the theorems of probability relate to the rationality of partial belief in the way that the theorems of logic are traditionally seen to relate to the rationality of full belief.

The Bayesian approach has been explored by Harold Jeffreys, Richard T. Cox, Edwin Jaynes and I. J. Good. Other well-known proponents of Bayesian probability have included John Maynard Keynes and B.O. Koopman, and many philosophers of the 20th century.

Recently, it has been shown that Bayes' Rule and the Principle of Maximum Entropy (MaxEnt) are completely compatible and can be seen as special cases of the Method of Maximum (relative) Entropy (ME). This method reproduces every aspect of orthodox Bayesian inference. In addition, it opens the door to tackling problems that could not be addressed by either the MaxEnt or orthodox Bayesian methods individually.[6]
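
As an illustration of the ordinary MaxEnt principle that the ME method generalises, the following is a minimal sketch (assuming NumPy and SciPy; the die and the mean constraint are hypothetical choices for illustration): among all distributions over the faces of a die with a prescribed mean roll of 4.5, it finds the one with the largest Shannon entropy.

    # Maximum-entropy distribution over die faces 1..6 under a mean constraint,
    # found by minimising negative Shannon entropy.
    import numpy as np
    from scipy.optimize import minimize

    faces = np.arange(1, 7)

    def neg_entropy(p):
        p = np.clip(p, 1e-12, 1.0)  # guard against log(0)
        return np.sum(p * np.log(p))

    constraints = [
        {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},         # probabilities sum to 1
        {"type": "eq", "fun": lambda p: np.dot(p, faces) - 4.5},  # mean roll is 4.5
    ]
    p0 = np.full(6, 1.0 / 6.0)  # start from the uniform distribution
    result = minimize(neg_entropy, p0, bounds=[(0, 1)] * 6, constraints=constraints)
    print(result.x)  # weights grow exponentially with face value, as MaxEnt predicts

Bayes' Rule handles updating on observed data, while constraints of this kind are the traditional province of MaxEnt; the claim of [6] is that the ME method treats both within a single framework.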

Varieties

The terms subjective probability, personal probability, epistemic probability and logical probability describe some of the schools of thought which are customarily called "Bayesian". These overlap but there are differences of emphasis. Some of the people mentioned here would not call themselves Bayesians.

Subjective Bayesian probability interprets 'probability' as 'the degree of belief (or strength of belief) an individual has in the truth of a proposition', and is in that respect subjective. Some people who call themselves Bayesians do not accept this subjectivity. The chief exponents of this objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. Perhaps the main objectivist Bayesian now living is James Berger of Duke University. Jose Bernardo and others accept some degree of subjectivity but believe a need exists for "reference priors" in many practical situations.

Advocates of logical (or objective epistemic) probability, such as Harold Jeffreys, Rudolf Carnap, Richard Threlkeld Cox and E. T. Jaynes, hope to codify techniques whereby any two persons having the same information relevant to the truth of an uncertain proposition would calculate the same probability. Such probabilities are not relative to the person but to the epistemic situation, and thus lie somewhere between subjective and objective. The methods proposed are not without controversy. Critics challenge the claim that there are grounds for preferring one degree of belief over another in the absence of information about the facts to which those beliefs refer, though these criticisms often dissolve once the question being asked is made precise. As noted above, it has been shown that the Principle of Maximum Entropy and Bayes' Rule are completely compatible and can be seen as special cases of the Method of Maximum (relative) Entropy (ME); see http://en.wikipedia.org/wiki/Principle_of_maximum_entropy.

Controversy between Bayesian and frequentist probability

Bayesian probability, sometimes called credence (i.e. degree of belief), contrasts with frequency probability, in which probability is derived from observed frequencies in defined distributions or proportions in populations.

The theory of statistics and probability using frequency probability was developed by R.A. Fisher, Egon Pearson and Jerzy Neyman during the first half of the 20th century. A. N. Kolmogorov also used frequency probability to lay the mathematical foundation of probability in measure theory via the Lebesgue integral in Foundations of the Theory of Probability (1933). Savage, Koopman, Abraham Wald and others have developed Bayesian probability since 1950.

The difference between Bayesian and Frequentist interpretations of probability has important consequences in statistical practice. For example, when comparing two hypotheses using the same data, the theory of hypothesis tests, which is based on the frequency interpretation of probability, allows the rejection or non-rejection of one model/hypothesis (the 'null' hypothesis) based on the probability of mistakenly inferring that the data support the other model/hypothesis more. The probability of making such a mistake, called a Type I error, requires the consideration of hypothetical data sets derived from the same data source that are more extreme than the data actually observed. This approach allows the inference that 'either the two hypotheses are different or the observed data are a misleading set'. In contrast, Bayesian methods condition on the data actually observed, and are therefore able to assign posterior probabilities to any number of hypotheses directly. The requirement to assign probabilities to the parameters of models representing each hypothesis is the cost of this more direct approach.
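
The contrast can be made concrete with a small numerical sketch (hypothetical data; assumes SciPy 1.7 or later for binomtest). The frequentist computation asks how probable data at least as extreme as those observed would be if the null hypothesis were true; the Bayesian computation conditions on the observed counts and assigns a posterior probability to the hypothesis itself.

    # Hypothetical coin example: 9 heads in 12 flips.
    from scipy import stats

    heads, flips = 9, 12

    # Frequentist: two-sided exact binomial test of H0: theta = 0.5.
    p_value = stats.binomtest(heads, flips, p=0.5).pvalue

    # Bayesian: uniform Beta(1, 1) prior gives a Beta(1 + heads, 1 + tails)
    # posterior; report the posterior probability that the coin favours heads.
    posterior = stats.beta(1 + heads, 1 + flips - heads)
    prob_biased = posterior.sf(0.5)  # P(theta > 0.5 | data)

    print(p_value, prob_biased)

The p-value refers to unobserved, more extreme data sets; the posterior probability refers directly to the hypothesis, at the cost of the uniform prior assumed for theta.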

Although there is no reason why different interpretations (senses) of a word cannot be used in different contexts, there is a history of antagonism between Bayesians and frequentists, with the latter often rejecting the Bayesian interpretation as ill-grounded. The groups have also disagreed about which of the two senses reflects what is commonly meant by the term 'probable'. More importantly, the groups have agreed that Bayesian and Frequentist analyses answer genuinely different questions, but disagreed about which class of question it is more important to answer in scientific and engineering contexts.

Applications

Since the 1950s, Bayesian theory and Bayesian probability have been widely applied through Cox's theorem, Jaynes' principle of maximum entropy and the Dutch book argument. In many applications, Bayesian methods are more general and appear to give better results than frequency probability. Bayes factors have also been applied with Occam's Razor. See Bayesian inference and Bayes' theorem for mathematical applications.

Some regard scientific method as an application of Bayesian inference because, they claim, Bayes' theorem is explicitly or implicitly used to update the strength of prior scientific beliefs in the truth of hypotheses in the light of new information from observation or experiment. The posterior probability is calculated from that evidence by Bayes' theorem, as justified by the Principle of Conditionalisation, P'(h) = P(h|e), where P'(h) is the posterior probability of the hypothesis h in the light of the evidence e; this principle is, however, denied by some.[7] Adjusting original beliefs could mean (coming closer to) accepting or rejecting the original hypotheses.
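
A minimal worked example of conditionalisation, with hypothetical numbers: a hypothesis h has prior probability 0.3, and evidence e is observed that is far more probable if h is true than if it is false.

    # Bayes' theorem as an updating rule: P'(h) = P(h | e).
    prior = 0.3        # P(h): prior degree of belief in the hypothesis
    like_h = 0.9       # P(e | h): probability of the evidence given h
    like_not_h = 0.2   # P(e | not-h): probability of the evidence otherwise

    evidence = like_h * prior + like_not_h * (1 - prior)  # P(e), total probability
    posterior = like_h * prior / evidence                 # P(h | e), Bayes' theorem
    print(posterior)  # ~0.66: the evidence raises the degree of belief in h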

Bayesian techniques have recently been applied to filter spam e-mail. A Bayesian spam filter uses a reference set of e-mails to define what is originally believed to be spam. After the reference has been defined, the filter then uses the characteristics in the reference to define new messages as either spam or legitimate e-mail. New e-mail messages act as new information, and if mistakes in the definitions of spam and legitimate e-mail are identified by the user, this new information updates the information in the original reference set of e-mails with the hope that future definitions are more accurate. See Bayesian inference and Bayesian filtering.
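
The sketch below shows the idea in miniature (every message, word and count is invented for illustration): per-word counts from the reference sets give smoothed likelihoods, which Bayes' theorem combines, under a naive independence assumption, into a spam probability for a new message.

    # Toy Bayesian spam filter over a tiny reference set.
    import math
    from collections import Counter

    spam_ref = ["win money now", "free money offer"]
    ham_ref = ["meeting at noon", "project status report"]

    def word_counts(msgs):
        return Counter(w for m in msgs for w in m.split())

    spam_counts, ham_counts = word_counts(spam_ref), word_counts(ham_ref)
    vocab = set(spam_counts) | set(ham_counts)

    def log_likelihood(msg, counts):
        # Laplace smoothing so unseen words never zero out the product.
        total = sum(counts.values())
        return sum(math.log((counts[w] + 1) / (total + len(vocab)))
                   for w in msg.split())

    def p_spam(msg, prior=0.5):
        ls = math.log(prior) + log_likelihood(msg, spam_counts)
        lh = math.log(1 - prior) + log_likelihood(msg, ham_counts)
        return 1.0 / (1.0 + math.exp(lh - ls))

    print(p_spam("free money"))  # well above 0.5: classified as spam

When the user corrects a misclassification, the message's words are added to the appropriate reference counts, which is exactly the updating step described above.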

Probabilities of probabilities

One criticism levelled at the Bayesian probability interpretation has been that a single probability assignment cannot convey how well grounded the belief is—i.e., how much evidence one has. Consider the following situations:

  1. You have a box with white and black balls, but no knowledge as to the quantities
  2. You have a box from which you have drawn n balls, half black and the rest white
  3. You have a box and you know that there are the same number of white and black balls

The Bayesian probability of the next ball drawn being black is 0.5 in all three cases. Keynes called this the problem of the "weight of evidence". One approach is to reflect differences in evidential support by assigning probabilities to these probabilities (so-called metaprobabilities), as in the three cases below; a numerical sketch follows the list:

1. You have a box with white and black balls, but no knowledge as to the quantities
Letting θ = p represent the statement that the probability of the next ball being black is p, a Bayesian might assign a uniform Beta prior distribution:
\forall\,\theta \in [0,1]:\quad P(\theta) = \mathrm{Beta}(\alpha_B=1,\ \alpha_W=1) = \frac{\Gamma(\alpha_B + \alpha_W)}{\Gamma(\alpha_B)\,\Gamma(\alpha_W)}\,\theta^{\alpha_B-1}(1-\theta)^{\alpha_W-1} = \frac{\Gamma(2)}{\Gamma(1)\,\Gamma(1)}\,\theta^{0}(1-\theta)^{0} = 1.
Assuming that the ball drawing is modelled as a binomial sampling distribution, the posterior distribution, P(θ | m,n), after drawing m additional black balls and n white balls is still a Beta distribution, with parameters αB = 1 + m, αW = 1 + n. An intuitive interpretation of the parameters of a Beta distribution is that of imagined counts for the two events. For more information, see Beta distribution.
2. You have a box from which you have drawn N balls, half black and the rest white
Letting θ = p represent the statement that the probability of the next ball being black is p, a Bayesian might assign a Beta prior distribution, Beta(N/2 + 1, N/2 + 1). The mean of this distribution, which is the probability that the next ball drawn is black, is \frac{N/2+1}{N+2} = \frac{1}{2}, in accordance with Laplace's rule of succession.
3. You have a box and you know that there are the same number of white and black balls
In this case a Bayesian would define the prior probability P\left(\theta\right)=\delta\left(\theta - \frac{1}{2}\right).
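
The following is the numerical sketch promised above (hypothetical draw counts; requires SciPy). It shows that the width of the posterior Beta distribution, not its mean alone, carries the weight of evidence.

    # Case 1: uniform Beta(1, 1) metaprobability over theta, updated after
    # observing m black and n white draws under a binomial sampling model.
    from scipy import stats

    m, n = 3, 1                      # hypothetical draws: 3 black, 1 white
    posterior = stats.beta(1 + m, 1 + n)

    print(posterior.mean())          # P(next ball black) = (m + 1)/(m + n + 2) = 2/3
    print(posterior.interval(0.95))  # wide credible interval: weakly grounded belief

With equal counts m = n = N/2 the mean returns to 0.5 for any N, but the credible interval narrows as N grows, distinguishing case 2 from case 1 even though both assign probability 0.5 to the next ball; case 3 corresponds to the limit of a point mass at 0.5.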

Other Bayesians have argued that probabilities need not be precise numbers.

Because there is no room for metaprobabilities on the frequency interpretation, frequentists have had to find different ways of representing difference of evidential support. Cedric Smith and Arthur Dempster each developed a theory of upper and lower probabilities. Glenn Shafer developed Dempster's theory further, and it is now known as Dempster-Shafer theory.

Footnotes

  1. ^ See pp. 50–51, Gillies 2000: "The subjective theory of probability was discovered independently and at about the same time by Frank Ramsey in Cambridge and Bruno de Finetti in Italy." See Gillies' discussion for its explanation of how the wrong impression came about that Ramsey proposed it first.
  2. ^ 'A betting quotient is the quantity p = k/(1+k), where k is the odds on a hypothesis that you believe fair and which will therefore be taken as your degree of belief that it is true. p is called the betting quotient associated with the odds k. Odds can be recovered uniquely from betting quotients by means of the reverse transformation k = p/(1-p).' Paraphrase of p. 76, Howson & Urbach 1993.
  3. ^ e.g. see Gillies 2000, p. 55: "My own view is that betting does give a reasonable measure of the strength of a belief in many cases, but not in all. In particular, betting cannot be used to measure the strength of someone's belief in a universal scientific law or theory."
  4. ^ See Gillies, 'Induction and Probability', in Parkinson (ed.), An Encyclopedia of Philosophy, 1988; pp. 263–264, Howson & Urbach 1989.
  5. ^ He said "...betting strictly speaking does not pertain to probability but to the Theory of Games". See "The role of 'Dutch Books' and 'Proper Scoring Rules'", British Journal for the Philosophy of Science 32 (1981), pp. 55–56.
  6. ^ See Giffin and Caticha 2007, "Updating Probabilities with Data and Moments" (http://arxiv.org/abs/0708.1593).
  7. ^ See "Updating Belief", Chapter 6 of Howson & Urbach 1993, pp. 99–114, and its references to the discussions of Bayesian Conditionalisation by Hacking 1967, Kyburg, Skyrms 1987, Jeffrey 1965, etc.
