C&B Notes

Beer and Statistics

Student’s t-distribution is symmetrical and bell-shaped like a normal distribution but with heavier tails. It arises when estimating the mean of a small sample whose population standard deviation is unknown, and it is robust against departures from normality. We believe the assumption of normality that holds so often in nature is not necessarily safe when working with financial market data. Using a Student’s t-distribution in statistical analysis better protects capital than the widely used risk models that assume normality. So, how did Student’s t get its name?
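To make the heavier-tails point concrete, here is a minimal sketch in Python (assuming SciPy; the choice of three degrees of freedom is our illustrative assumption, not something from the original note) comparing the chance of a “three sigma” move under each distribution:

```python
# Compare tail probabilities: a standard normal versus a Student's t with
# few degrees of freedom.  The t assigns far more mass to extreme moves.
from scipy.stats import norm, t

threshold = 3.0  # a "three sigma" move
df = 3           # illustrative assumption: a heavy-tailed t

p_normal = norm.sf(threshold)    # P(X > 3) under normality, ~0.13%
p_student = t.sf(threshold, df)  # P(X > 3) under Student's t, ~2.9%

print(f"normal tail:    {p_normal:.5f}")
print(f"student-t tail: {p_student:.5f}")  # roughly 20x the normal tail
```

A risk model built on the normal curve treats such a move as a once-in-a-career event; the heavy-tailed model treats it as routine, which is the sense in which the t-distribution better protects capital.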

After a year spent on sabbatical at Pearson’s lab, Gosset had worked out the math behind a “law of errors” for small samples. Today, we know his discovery as the “Student’s t-distribution”. It is the primary way to understand the likely error of an estimate given your sample size, and it remains heavily relied upon in academia and industry. It is among the pillars of modern statistics and among the first things learned in introductory statistics courses. It is the source of the concept of “statistical significance.” But why is it the “Student’s” t-distribution rather than “Gosset’s”?
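As a rough illustration of how the likely error of an estimate depends on sample size (a sketch of our own, not drawn from Gosset’s paper), the critical value used to build a 95% confidence interval is much larger for tiny samples and converges to the normal distribution’s familiar 1.96 as observations accumulate:

```python
# The t critical value for a 95% two-sided interval shrinks toward the
# normal's 1.96 as the sample grows.
from scipy.stats import norm, t

for n in (2, 3, 5, 10, 30, 1000):
    df = n - 1  # degrees of freedom when estimating a mean
    print(f"n={n:4d}: t critical = {t.ppf(0.975, df):6.3f}")

print(f"normal:  z critical = {norm.ppf(0.975):6.3f}")  # 1.960
```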

Upon completing his work on the t-distribution, Gosset was eager to make his work public. It was an important finding, and one he wanted to share with the wider world. The managers of Guinness were not so keen on this. They realized they had an advantage over the competition by using this method and were not excited about relinquishing that leg up. If Gosset were to publish the paper, other breweries would be on to them.  So they came to a compromise. Guinness agreed to allow Gosset to publish the finding, as long as he used a pseudonym. This way, competitors would not be able to realize that someone on Guinness’s payroll was doing such research and figure out that the company’s scientifically enlightened approach was key to their success.  So Gosset published his article introducing the t-distribution, “The Probable Error of the Mean”, under the name “Student.” “The Probable Error of the Mean” is a relatively dry piece of work, mostly made up of mathematical derivations and a Monte Carlo simulation to demonstrate the accuracy of his method.

Though Gosset’s paper was, at the outset, mostly ignored by statistical researchers, a young mathematician named R.A. Fisher read the paper and was exhilarated by Gosset’s results and approach. Fisher was especially taken by Gosset’s idea that his distribution table could be used to get a sense of how likely a certain result would be compared to random chance, and that if the chances were low, we might consider the result “significant.” Fisher’s response to Gosset’s work would have major ramifications for modern science.

* * * * *

When Gosset began working at Guinness, it was already the world’s largest brewery. Even compared to modern companies, Guinness was unusually focused on using science to improve its products. They hired the “brightest young men they could find” as scientists and gave them liberal license to innovate and implement their findings. Perhaps the equivalent of being a computer scientist at Bell Labs in the 1970s or an artificial intelligence researcher at Google today, it was a wonderful job for the inquisitive and practical-minded Gosset. At that time, Guinness’s primary focus was maintaining the quality of its beer while increasing quantity and decreasing costs. Between 1887 and 1914, the output of the brewery doubled, reaching almost one billion pints. How could the company increase production while keeping its beer tasting as consumers expected? Gosset was assigned to the team that would answer that question.

Like most beers, Guinness is flavored with the flowers of the plant Humulus lupulus, also known as “hops”. The brewery used nearly five million pounds of the stuff in 1898. They determined which hops to use based on qualitative measures such as “looks and fragrance.” At the scale at which Guinness was brewing, the “looks and fragrance” method was neither economical nor accurate. The scientific brewing team, of which Gosset was a part, would improve this selection process.

Gosset’s first boss, the “scientific brewer” Thomas B. Case, believed that the best way to determine the quality of hops was by calculating the proportion of “soft” resins to “hard” resins in a batch (resins are a semisolid substance secreted by the glands of the hops). Case decided to take a small number of samples from different batches of hops from Kent, England, and calculate the percentage of soft resins to hard resins. He found an average of 8.1% soft resins in one batch of eleven samples and 8.4% in another batch of fourteen. What did these numbers mean for the consistency of hops across batches? Case didn’t really know. He looked at the data and couldn’t “support” any particular conclusion, but he knew the brewery would need a way to analyze such data in the future.
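In modern terms, Case’s question is a two-sample t-test on small samples, precisely the machinery Gosset would later supply. A minimal sketch of how it would be framed today (the means and sample sizes come from the article; the standard deviations are hypothetical placeholders, since the underlying data is not given):

```python
# Two-sample t-test from summary statistics.  Means and sample sizes are
# from the article; the standard deviations are hypothetical assumptions.
from scipy.stats import ttest_ind_from_stats

result = ttest_ind_from_stats(
    mean1=8.1, std1=0.5, nobs1=11,  # first batch: eleven samples
    mean2=8.4, std2=0.5, nobs2=14,  # second batch: fourteen samples
    equal_var=True,
)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
# A large p-value would suggest the batches are consistent with each other.
```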

And so he turned to Gosset. The historian Joan Fisher Box explains that Gosset was called upon because he had studied a bit of math at Oxford and was “less scared” of this kind of problem than the other brewers. For a quantitative researcher working today, it is almost unfathomable to imagine, but at that time, a theory of making inferences from small samples did not exist. Of course, people periodically used small samples as evidence for conclusions, but they had no way of measuring the likely accuracy of their estimates. All methods for extrapolating from a sample relied on the idea that you had a large sample, well over 30 observations, and could therefore use the “standard normal distribution.” While this was true for most academic studies of the day, in many industrial settings it was not possible to get such a large sample. Even a “scientifically minded” company like Guinness was limited in the amount of its product it could dedicate to testing.

So Gosset set to work. His goal was to understand just how much less representative a sample is when the sample is small. In more technical terms, how much wider is the error distribution of an estimate when you only have a sample of two or ten, compared to when you have a sample of a thousand? Gosset’s first problem involved figuring out exactly how many observations of malt extract, a substance used in beer making, were necessary to be confident the “degrees saccharine” of the extract was within 0.5 degrees of a targeted goal of 133 degrees. His initial approach was simply to simulate a whole bunch of data. He had an extract for which he had a very large number of samples and could be relatively confident of the exact degrees saccharine. He then took many different two-observation samples from the extract in order to test the accuracy of such a small sample. He found that about 80% of the time, the measurement from just two observations was within 0.5 degrees of the true number. He then tried the same thing with three measurements. This time, there was an approximate 87.5% chance of getting within 0.5 degrees. With four measurements, he found a better than 92% chance. With 82 measurements, the likelihood of getting within 0.5 degrees was “practically infinite”.
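Gosset’s brute-force approach translates directly into a modern resampling simulation. A sketch of our own reconstruction (assuming the population of measurements is roughly normal around 133 degrees; the spread of 0.55 is chosen only to roughly reproduce the article’s figures and is not from the source):

```python
# Reconstruct Gosset's experiment: repeatedly draw tiny samples from a
# well-characterized population and ask how often the sample mean lands
# within 0.5 degrees of the true value.
import numpy as np

rng = np.random.default_rng(0)
true_mean = 133.0  # target "degrees saccharine" from the article
spread = 0.55      # illustrative assumption; the true spread isn't given
trials = 100_000

for n in (2, 3, 4, 82):
    samples = rng.normal(true_mean, spread, size=(trials, n))
    hits = np.abs(samples.mean(axis=1) - true_mean) <= 0.5
    print(f"n={n:3d}: within 0.5 degrees {hits.mean():.1%} of the time")
```

With these assumptions the simulation lands near Gosset’s reported figures: about 80% for two observations, 88% for three, 93% for four, and effectively 100% for 82.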

His bosses at Guinness were thrilled with the findings. This would allow them to make intelligent decisions about which materials to use for their beer, in a way that no other business could.  Yet Gosset was not satisfied with his approximated method. He wanted to uncover the exact mathematics behind inference from small samples. He told Guinness that he wanted to consult with “some mathematical physicist” on the matter. The company obliged and sent Gosset to Karl Pearson’s lab at the University College London. Pearson was one of the leading scientific figures of his time and the man later credited with establishing the field of statistics.
