C&B Notes

Chocolate Promotes Weight Loss…

…well, not really, unfortunately!  The quest for statistical significance as the arbiter of meaningfulness creates all sorts of conscious and subconscious incentives for researchers to torture data and to troll for results.  A journalist recently set a trap that highlights how such findings get published and disseminated.

“Slim by Chocolate!” the headlines blared.  A team of German researchers had found that people on a low-carb diet lost weight 10 percent faster if they ate a chocolate bar every day.  It made the front page of Bild, Europe’s largest daily newspaper, just beneath their update about the Germanwings crash.  From there, it ricocheted around the internet and beyond, making news in more than 20 countries and half a dozen languages.  It was discussed on television news shows.  It appeared in glossy print, most recently in the June issue of Shape magazine (“Why You Must Eat Chocolate Daily,” page 128).  Not only does chocolate accelerate weight loss, the study found, but it leads to healthier cholesterol levels and overall increased well-being.  The Bild story quotes the study’s lead author, Johannes Bohannon, Ph.D., research director of the Institute of Diet and Health: “The best part is you can buy chocolate everywhere.”

I am Johannes Bohannon, Ph.D.  Well, actually my name is John, and I’m a journalist.  I do have a Ph.D., but it’s in the molecular biology of bacteria, not humans.  The Institute of Diet and Health?  That’s nothing more than a website.

Other than those fibs, the study was 100 percent authentic.  My colleagues and I recruited actual human subjects in Germany.  We ran an actual clinical trial, with subjects randomly assigned to different diet regimes.  And the statistically significant benefits of chocolate that we reported are based on the actual data.  It was, in fact, a fairly typical study for the field of diet research.  Which is to say: It was terrible science.  The results are meaningless, and the health claims that the media blasted out to millions of people around the world are utterly unfounded.

* * * * *

I know what you’re thinking.  The study did show accelerated weight loss in the chocolate group — shouldn’t we trust it?  Isn’t that how science works?

Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result.  Our study included 18 different measurements — weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc. — from 15 people.  (One subject was dropped.)  That study design is a recipe for false positives.
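To make that recipe concrete, here is a minimal simulation sketch of a null study with the same shape: 18 measurements on 15 subjects, with nothing real going on.  The split into groups of 8 and 7 and the use of simple t-tests are illustrative assumptions, not the study’s actual design, which had three small diet groups.

```python
# Sketch: how often does a pure-noise study of this shape yield a headline?
# Assumptions for illustration: two groups (8 vs. 7 subjects), one t-test
# per measurement.  The real study split subjects into three diet groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_outcomes = 10_000, 18
hits = 0
for _ in range(n_studies):
    # Every measurement (weight, cholesterol, sleep quality, ...) is noise.
    group_a = rng.normal(size=(8, n_outcomes))
    group_b = rng.normal(size=(7, n_outcomes))
    # One t-test per column, i.e., per measurement.
    pvals = stats.ttest_ind(group_a, group_b).pvalue
    if (pvals < 0.05).any():
        hits += 1  # at least one "significant" finding to spin

print(f"Null studies with a publishable 'result': {hits / n_studies:.0%}")
# Prints roughly 60% -- with no real effect anywhere in the data.
```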

Think of the measurements as lottery tickets.  Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media.  The more tickets you buy, the more likely you are to win.  We didn’t know exactly what would pan out — the headline could have been that chocolate improves sleep or lowers blood pressure — but we knew our chances of getting at least one “statistically significant” result were pretty good.  Whenever you hear that phrase, it means that some result has a small p value.  The letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data.  The conventional cutoff for being “significant” is 0.05, which means that if there were no real effect, a result this extreme would turn up by chance only 5 percent of the time.  The more lottery tickets, the better your chances of getting a false positive.  So how many tickets do you need to buy?

With our 18 measurements, we had a 60% chance of getting some “significant” result with p < 0.05.  (The measurements weren’t independent, so it could be even higher.)  The game was stacked in our favor.  It’s called p-hacking — fiddling with your experimental design and data to push p under 0.05 — and it’s a big problem.  Most scientists are honest and do it unconsciously.  They get negative results, convince themselves they goofed, and repeat the experiment until it “works.”  Or they drop “outlier” data points.
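That 60% figure is just the lottery-ticket arithmetic made explicit, under the simplifying assumption that the 18 tests are independent:

```latex
P(\text{at least one } p < 0.05) = 1 - (1 - 0.05)^{18} = 1 - 0.95^{18} \approx 1 - 0.40 = 0.60
```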

But even if we had been careful to avoid p-hacking, our study was doomed by the tiny number of subjects, which amplifies the effects of uncontrolled factors.  Just to take one example: A woman’s weight can fluctuate as much as 5 pounds over the course of her menstrual cycle, far greater than the weight difference between our chocolate and low-carb groups.  Which is why you need to use a large number of people, and balance age and gender across treatment groups.  (We didn’t bother.)
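A quick sketch of that noise problem.  The group size of five and the modest one-pound true effect below are illustrative assumptions; the article supplies only the five-pound fluctuation figure.

```python
# Sketch: with ~5 subjects per arm and ~5 lb of natural weight fluctuation,
# the noise in a group-mean difference dwarfs a plausible diet effect.
# Group size (5) and true effect (1 lb) are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_per_group, noise_sd, true_effect = 5, 5.0, 1.0

diffs = []
for _ in range(10_000):
    chocolate = rng.normal(true_effect, noise_sd, n_per_group)
    control = rng.normal(0.0, noise_sd, n_per_group)
    diffs.append(chocolate.mean() - control.mean())

print(f"True effect: {true_effect} lb")
print(f"Typical random swing in the measured difference: +/- {np.std(diffs):.1f} lb")
# The swing (~3 lb) is several times the effect supposedly being detected.
```

Until the groups are large enough to average that swing away, any observed difference between arms is indistinguishable from luck.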

You might as well read tea leaves as try to interpret our results.  Chocolate may be a weight loss accelerator, or it could be the opposite.  You can’t even trust the weight loss that our non-chocolate low-carb group experienced versus the control group.  Who knows what the handful of people in the control group were eating?  We didn’t even ask them.  Luckily, scientists are getting wise to these problems.  Some journals are trying to phase out p-value significance testing altogether to nudge scientists into better habits.  And almost no one takes studies with fewer than 30 subjects seriously anymore.  Editors of reputable journals reject them out of hand before sending them to peer reviewers.  But there are plenty of journals that care more about money than reputation.