
When Data Journalism Goes Wrong


One of the great challenges in any discussion of what people consume is gauging the distribution. There is no “average American who plays videogames” or “eats sushi,” and therefore the aggregate consumption of sushi or hours spent playing videogames is of limited value; some do, some do a lot, some don’t at all.

Which is why it was interesting to see the following, recent headline on the Washington Post’s Wonkblog: “Think you drink a lot? This chart will tell you.”

The chart, reproduced below, breaks down the distribution of drinkers into deciles, and ends with the startling conclusion that 24 million American adults—10 percent of the adult population over 18—consume a staggering 74 drinks a week.

The source for this figure is “Paying the Tab,” by Philip J. Cook, which was published in 2007. If we look at the section where he arrives at this calculation, and go to the footnote, we find that he used 2001-2002 data from NESARC, the National Epidemiologic Survey on Alcohol and Related Conditions, run by the National Institute on Alcohol Abuse and Alcoholism, which had a representative sample of 43,093 adults over the age of 18. But following this footnote, we find that Cook corrected these data for under-reporting by multiplying the number of drinks each respondent claimed to have drunk by 1.97 in order to comport with the previous year’s sales data for alcohol in the US. Why? It turns out that alcohol sales in the US in 2000 were double what NESARC’s respondents—a nationally representative sample, remember—claimed to have drunk.
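
To make the mechanics of that correction concrete, here is a minimal sketch in Python. Everything about the respondents is invented for illustration; the only number taken from Cook’s footnote is the 1.97 multiplier.

```python
# A minimal sketch of the uniform correction Cook describes: every
# respondent's self-reported intake is scaled by the same factor so
# that the survey total matches alcohol sales. The respondent figures
# are invented; only the 1.97 multiplier comes from the text.

reported_weekly_drinks = [0, 0, 1, 2, 3, 5, 8, 12, 20, 35]  # hypothetical sample

multiplier = 1.97  # sales ran roughly double what respondents admitted
corrected = [drinks * multiplier for drinks in reported_weekly_drinks]

print("corrected:", [round(d, 1) for d in corrected])
# The heaviest reported drinker (35/week) becomes ~69/week. Applied to
# NESARC, this is the mechanism that produces a top decile at 74 drinks.
```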

While the mills of US dietary research rely on the great National Health and Nutrition Examination Survey to digest our diets and come up with numbers, we know, thanks to the recent work of Edward Archer, that recall-based survey data are highly unreliable: we misremember what we ate, we misjudge by how much; we lie. Were we to live on what we tell academics we eat, life for almost two thirds of Americans would be biologically implausible.

But Cook, who is trying to show that distribution is uneven, ends up trying to solve an apparent recall problem by creating an aggregate multiplier to plug the sales data gap. And the problem is that this requires us to believe that every drinker misremembered by a factor of almost two. This might not be much of a stretch for moderate drinkers; but did everyone who drank, say, four or eight drinks per week systematically forget that they actually had eight or sixteen? That seems unlikely.
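
A toy comparison makes the objection concrete. Both adjustments below reconcile the survey total with the sales total, yet they say very different things about individual drinkers. All figures are invented, and the skewed allocation is an assumption of mine, not anything Cook proposes.

```python
# Two ways to close the same gap between reported drinks and sales data.
reported = [1, 2, 3, 5, 8, 12, 20, 35]   # hypothetical weekly drinks
target_total = 1.97 * sum(reported)      # aggregate implied by sales

# Cook-style: everyone misremembered by a factor of ~2.
uniform = [d * 1.97 for d in reported]

# Skewed alternative (my assumption): light drinkers reported accurately,
# and the entire shortfall belongs to the two heaviest drinkers.
shortfall = target_total - sum(reported)
skewed = reported[:-2] + [d + shortfall / 2 for d in reported[-2:]]

print(f"totals match sales: {sum(uniform):.1f} vs {sum(skewed):.1f}")
print(f"moderate drinker (5/week reported): uniform says {uniform[3]:.1f}")
# Same aggregate, very different individual stories -- the sales data
# alone cannot tell us which correction is right.
```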

We are also required to believe that just as those who drank consumed significantly more than they were willing to admit, those who claimed to be consistently teetotal never touched a drop. And we must forget those who aren’t supposed to be drinking at all because they are younger than 18; their absence from Cook’s data may well constitute a greater error. A recent study by Fairfax County in Virginia, for instance, found that while underage drinking was declining, one in five 12th graders had binged (five or more drinks in a row) in the past two weeks. We also know that waste is a huge issue with food, with estimates running from 30 to 40 percent of calories produced; we do not know how much alcohol is, if you’ll forgive the pun, wasted.

While it is important to consider how ultra consumption or insatiability can skew our understanding of what people eat, drink, and do (a 2010 disaggregation of NHANES data, for instance, reveals the top quintile of teenage boys as the ne plus ultra of sugar-sweetened beverage drinkers), Cook’s multiplier effect is at variance with other surveys of alcohol consumption.

This year, Gallup found that 9 percent of respondents said they had drunk between eight and 19 drinks in the past week, while 5 percent said they had drunk 20 or more. The Substance Abuse and Mental Health Services Administration’s (SAMHSA) 2012 National Survey on Drug Use and Health (NSDUH) found that 6.5 percent of those aged 12 and over engaged in “heavy drinking” in 2012, defined as “drinking five or more drinks on the same occasion on each of five or more days in the past 30 days.” And an analysis of the troubled NHANES survey data for 2009-2010 found that 3.3 percent of men consumed eight or more drinks on a given day, and 1.3 percent of women six or more.

All converge on a similar proportion; none come remotely close to Cook’s estimate; none are mentioned in Wonkblog. The reporter simply double-checks Cook’s extraordinary estimate with Cook himself, who tells him the number is hard to imagine but is not implausible. That settles it, I suppose.

To perceive alcohol as a problem of ultra consumption is not without political consequence: it makes it easier to advocate taxation as an intervention, as it can be argued that a tax increase would only have a substantial impact on the alcoholic, and that any impact on the abstemious would be an acceptable price for fixing an immense social ill. Cook, it turns out, is just such an advocate for high taxes on alcohol, and the Wonkblog piece ends with his claim that a policy curbing the avidity of those in the tenth decile such that they’d consume no more than the ninth would cause alcohol sales to drop by 60 percent.
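
That 60 percent figure is easy to sanity-check, because it follows mechanically from the shape of the distribution. In the sketch below, only the 74-drinks-per-week top decile comes from the article; the other decile means are placeholders standing in for the chart, which is not reproduced here.

```python
# Back-of-envelope check on Cook's claim that capping the top decile at
# the ninth decile's level would cut alcohol sales by roughly 60 percent.
# Only the 74 drinks/week figure is from the article; the rest are
# placeholder values chosen to mimic a heavily skewed distribution.

decile_means = [0, 0, 0, 0.1, 0.2, 0.6, 2, 6, 15, 74]  # drinks/week

total = sum(decile_means)
capped = decile_means[:-1] + [decile_means[-2]]  # top decile drinks like the ninth
drop = 1 - sum(capped) / total

print(f"implied fall in consumption: {drop:.0%}")  # ~60%
# With most drinks concentrated in the tenth decile, the 60 percent
# figure follows arithmetically from the distribution's skew.
```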

There is a fascinating—or, rather, ultra wonky—piece to be written on the impact of an alcohol tax on a consumption spread as envisaged by Cook versus that suggested by the more temperate data; for now, one can but note that there is evidence that the heavier the drinker the less responsive they are to changes in price (thereby illuminating the sin of sin taxes that tax the sinner who keeps on sinning; or, the St. Augustine effect—make them good, just not yet).

Oddly—or perhaps not—tax advocacy provides another, even more striking, example of Wonkblog’s failure to follow the footnotes and check out the data. In “Why the U.S. should start taxing soda like cigarettes and alcohol” the Post reported that “America’s love affair with soda has occurred alongside an eerily similar climb in the country’s sugar intake.” In evidence, the Post displayed the following graph, assembled by the Union of Concerned Scientists (UCS), a group that has been challenged over whether its commitment to science is more ideological than research-based.

Neither the version on Wonkblog nor the one in the original UCS paper indicates where the “per capita” consumption data come from. It seems likely that they come from the US Department of Agriculture (USDA), given the discussion under the graph about the methodology for calculating consumption, and the fact that such consumption data generally come from the USDA’s Economic Research Service. If so, however, the UCS and Wonkblog graph doesn’t correspond to the USDA data.

It starts off on a similar footing. The current loss-adjusted per capita figure for sugar in 1970 is 17.7 teaspoons per person per day, but it then declines steadily to 10.4 teaspoons in 1986 before fluctuating back up to 11.8 teaspoons in 2013. So sugar consumption is inching up, but only after a massive drop.

But this is just cane and beet sugar. You have to look at other calorific sweeteners too, such as high fructose corn syrup (HFCS)—the most commonly used sweetener in calorific beverages—and honey and dextrose. In 1970, per capita consumption of HFCS was 0.1 teaspoon; by 1999, it was 11.1 teaspoons; and by 2013, it had slid to 7.6 teaspoons. Now add the remaining sweeteners—3 teaspoons per person in 1970, 2.9 teaspoons in 2013.

If you put all these data together, you can see the following trend: in 1970, total calorific sweetener consumption per person was 20.8 teaspoons a day; in 1980, it was 21 teaspoons; in 1990, 23.1; in 2000, 26; in 2010, 22.9; and in 2013, 22.3.
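
As a quick check, the 1970 and 2013 totals can be rebuilt from the component figures quoted above; the intermediate years’ totals are as reported, since their components are not given in the text.

```python
# Rebuilding the total calorific sweetener figures (teaspoons per person
# per day) from the USDA components quoted above. Only 1970 and 2013
# have all components in the text.

components = {
    # year: (cane & beet sugar, HFCS, other calorific sweeteners)
    1970: (17.7, 0.1, 3.0),
    2013: (11.8, 7.6, 2.9),
}

for year, parts in components.items():
    print(year, round(sum(parts), 1), "tsp/day")  # 20.8 and 22.3, as above
```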

This raises several interesting questions. Was the obesity crisis really born in the late 1980s and 1990s from a gradual aggregate increase of about five teaspoons of calorific sweetener a day (80 calories)? And if so, why have we not seen a corresponding decline in weight over the past decade given that we are returning to levels of consumption not seen since the 1970s?

But answering these questions is not the point of this piece; the point is that data drives analysis, and that by outsourcing some fairly basic data analysis to an out-of-date graph, and by not checking the numbers against the original source, Wonkblog has produced an inaccurate account of sugar consumption in the US.

One wonders whether the pressure to churn out material means that Wonkblog is more blog than wonk, and, more broadly, whether the entire field of “explainer” or “wonk” journalism is at risk of undermining itself by being thought of as a distinctive kind of beat or even as a news operation. Surely data and data analysis are just something journalists should do if the story requires it, not something that should be partitioned off like “Sunday Styles” or “How to Spend It.”

There is an obvious parallel with “Big Data,” which last February peaked on Gartner’s “Hype Cycle,” the research firm’s hallowed tool for mapping the impact of new technology. Big Data was something new and seemed to hold so much promise that it could not but ascend the “peak of inflated expectations,” before plummeting into a “trough of disillusionment.” The hype cycle for Big Data will end—is ending—with insight and integration: data analysis of the “Big” variety will simply be what you do when you are in business; it won’t be this special thing you do to make lots of money.

Time will tell whether wonk journalism too reached a peak of inflated expectations with the launch of 538 and Vox; whether it is in or about to enter a trough of disillusionment (a recent report on USA Today suggests yes); and whether it will be just something journalists end up doing as part of doing journalism.

But writing about numbers is immensely challenging: at a certain point, they cannot be reported as something else—something more interesting to someone who is not really interested in numbers. The risk for wonk journalism is that you either lose in audience as you expand in analysis, or you dumb down and end up dumb. The rub is, you can’t tell good data from bad without doing analysis.