# Bayes & Base Rates

A Bayesian approach is an important mental model for a probabilistic exercise like investing — we shared some thoughts about this framework back in 2011. A well-tuned application must be complemented by an understanding of relevant base rates.

To understand how this works with a more practical example, imagine a drug test for cyclists that detects doping 98 percent of the time, with a 3 percent false-positive rate. We may initially assume that if a rider tests positive, there is only a 3 percent chance that they are not doping. But Bayes’s theorem can be used to calculate the probability that a cyclist is doping based on how pervasive we believe doping is in the field. Bayesian inference starts with a gut feeling (say, that 5 percent of cyclists are doping) and, as more evidence is obtained (the results of actual drug tests), the probability becomes increasingly objective. If we believe that only 5 percent of the field is doping, it turns out that a positive test corresponds to roughly a 63 percent chance that the cyclist is doping. This may seem counterintuitive, but it follows from our assumption about how prevalent doping is: there are so many more non-doping cyclists who could produce a false positive than doping cyclists who could produce a true positive. As we conduct more tests, we increase the accuracy of our guess about how much of the field is doping, as the output of each subsequent round of calculation reflects the data more and more and our initial assumptions less and less.
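The arithmetic behind the cyclist example can be checked in a few lines. This is a minimal sketch using the figures from the paragraph above (98 percent detection rate, 3 percent false-positive rate, 5 percent assumed prevalence); the function name `posterior_doping` and its parameter names are illustrative, not part of any library.

```python
def posterior_doping(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(doping | positive test) using Bayes's theorem."""
    # Share of the whole field that is doping AND tests positive.
    true_positives = sensitivity * prior
    # Share of the whole field that is clean AND still tests positive.
    false_positives = false_positive_rate * (1.0 - prior)
    # Of everyone who tests positive, the fraction who are actually doping.
    return true_positives / (true_positives + false_positives)


p = posterior_doping(prior=0.05, sensitivity=0.98, false_positive_rate=0.03)
print(f"{p:.3f}")  # 0.632
```

With these inputs, 4.9 percent of the field are dopers who test positive and 2.85 percent are clean riders who test positive, so about 63 percent of positive tests belong to dopers. Raising the assumed prevalence raises that figure, which is why the base rate matters so much.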

Though people start from different assumptions, beliefs (such as how common doping is) should converge with additional evidence, becoming less subjective and more objective. In this way, Bayesian thinking provides a guide for how to think — or more specifically, how to turn our subjective beliefs into increasingly objective statements about the world. Using Bayes’s theorem to update our beliefs based on evidence, we arrive ever closer to the same conclusions that others with the same evidence arrive at, slowly moving away from our starting assumptions.
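The convergence described above can be illustrated with a small sketch. It rests on assumptions that are not in the text: each observer's belief about the share of dopers is modelled as a Beta distribution, each rider's true status is observed directly (as if the test were perfect), and the evidence counts are invented for illustration. The names `posterior_mean`, `priors`, and `evidence` are hypothetical.

```python
def posterior_mean(a: float, b: float, dopers: int, riders: int) -> float:
    """Mean of a Beta(a, b) prior after seeing `dopers` among `riders` (conjugate update)."""
    return (a + dopers) / (a + b + riders)


# Two observers with very different starting beliefs about prevalence.
priors = {
    "skeptic": (1.0, 19.0),   # prior mean 0.05
    "cynic": (10.0, 10.0),    # prior mean 0.50
}

# Illustrative cumulative evidence: (dopers found, riders checked).
evidence = [(0, 0), (1, 10), (8, 100), (80, 1000)]

gaps = []
for dopers, riders in evidence:
    means = {name: posterior_mean(a, b, dopers, riders) for name, (a, b) in priors.items()}
    gap = abs(means["cynic"] - means["skeptic"])
    gaps.append(gap)
    print(f"n={riders:4d}  skeptic={means['skeptic']:.3f}  cynic={means['cynic']:.3f}  gap={gap:.3f}")
```

Both observers see the same data, and the gap between their estimates shrinks as the number of riders checked grows. The starting assumptions never vanish entirely, but their weight relative to the evidence keeps falling.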

* * * * *

To appreciate how revolutionary Bayesian probability is, it is helpful to understand the kind of statistics it is supplanting: frequentism, the dominant mode of statistical analysis and scientific inference in the 20th century. This approach defines probability as the long-run frequency of an outcome in a system, with statisticians designing experiments to gather evidence to prove or disprove a proposed claim about what that long-run frequency is. (Though it should be noted that there are many debates internal to frequentism about exactly what is proved or disproved in an experiment.) This initial claim does not change during the analysis; it is only judged, once the evidence is in, to be disproven or not. For example, a statistician might claim that a flipped coin will land heads 50 percent of the time and hold this claim static while they gather evidence (i.e., flip the coin). Their prediction addresses not a particular coin flip but rather what is expected to happen over a series of flips, based on some theory.

The upside of this approach to probability is its apparent objectivity: Probability merely represents the expected frequency of a physical system — or something that can be imagined as analogous to one. What one believes about the next coin flip does not matter; the long-run frequency of heads will be the basis for evaluating whether the hypothesis about its probability is accurate (i.e. whether the flips we observe correspond with how we believe the system operates). In this way, scientific analysis proceeds by setting a hypothesis at the beginning, and, only after all of the data is gathered, evaluating whether that hypothesis objectively corresponds with the data.

While this approach seems sensible, it is also limiting. It prevents statisticians from directly assigning a probability to a single event, which has no long-run frequency. On this view, actual events do not have a probability: They happen or they do not. It rains tomorrow or it doesn’t; a car crashes or it doesn’t; a scientific hypothesis is true or not. A coin lands on heads or tails; it does not half-land on heads. So for a frequentist, a claim like “there is a 70 percent chance of rain tomorrow” requires imagining a series of similar days, what statisticians refer to as a “reference class,” that could function like a series of coin flips. Moreover, this means that under a traditional frequentist interpretation, a hypothesis cannot be given a probability since it is not a frequency (e.g., either a coin is fair, or it isn’t)…

Bayesianism in machine learning allows computers to find patterns in large data sets by creating a very large set of hypotheses that can be given probabilities directly and constantly updated as new data is processed. In this way, each possible category for a given thing can be given a probability, and the category with the highest probability is the one used (at least until the probabilities update when new data is gathered). Thus no complex theory is needed to set up a hypothesis or an experiment; instead a whole field of hypotheses can be automatically generated and evaluated. With cheap computing and massive data stores, there is little cost to evaluating these myriad hypotheses; the most probable can be extracted from a data set and acted upon immediately to produce profit, whether by selling ads, moving money to new investments, or introducing two users to each other.
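As a toy illustration of giving a whole field of hypotheses probabilities and updating them as data arrives, the sketch below generates 101 candidate values for a coin's probability of heads, starts them all at equal probability, and reweights them after every flip. The grid, the uniform starting weights, and the sequence of flips are assumptions chosen for illustration; real machine-learning systems work with far larger hypothesis spaces and different machinery.

```python
# Automatically generated field of hypotheses: P(heads) = 0.00, 0.01, ..., 1.00.
hypotheses = [i / 100 for i in range(101)]
# Assumed starting point: every hypothesis is equally probable.
posterior = [1.0 / len(hypotheses)] * len(hypotheses)


def update(weights: list[float], heads: bool) -> list[float]:
    """Reweight every hypothesis by how well it predicted the latest flip."""
    unnormalized = [
        w * (h if heads else 1.0 - h) for w, h in zip(weights, hypotheses)
    ]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


# Illustrative data stream: 8 heads (1) and 2 tails (0).
observations = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

for flip in observations:
    posterior = update(posterior, bool(flip))
    # Pick the currently most probable hypothesis; it may change with the next flip.
    best_index = max(range(len(hypotheses)), key=posterior.__getitem__)
    best = hypotheses[best_index]
    print(f"flip={flip}  most probable P(heads)={best:.2f}")
```

No hypothesis is fixed in advance and tested at the end; each one carries a probability at every step, and the current favorite can be acted on immediately and revised when the next observation arrives.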