Statistics, lies and the virus: 5 lessons from a pandemic

In an age of disinformation, the value of rigorous data has never been more evident

Data from testing people for the Covid-19 virus can yield life-saving insights but there is also concern over how officials, swayed by emotions, politics and, above all, preconceptions, interpret the numbers. PHOTO: EPA-EFE
FINANCIAL TIMES - Will this year be 1954 all over again? Forgive me, I have become obsessed with 1954, not because it offers another example of a pandemic (that was 1957) or an economic disaster (there was a mild US downturn in 1953), but for more parochial reasons.

Nineteen fifty-four saw the appearance of two contrasting visions for the world of statistics - visions that have shaped our politics, our media and our health.

This year confronts us with a similar choice.

The first of these visions was presented in How To Lie With Statistics, a book by a US journalist named Darrell Huff. Brisk, intelligent and witty, it is a little marvel of numerical communication.

The book received rave reviews at the time, has been praised by many statisticians over the years and is said to be the best-selling work on the subject ever published.

It is also an exercise in scorn. Read it and you may be disinclined to believe a number-based claim ever again.

There are good reasons for scepticism today. David Spiegelhalter, author of last year's The Art Of Statistics: How To Learn From Data, criticises some of the British government's coronavirus graphs and testing targets as "number theatre", with "dreadful, awful" deployment of numbers as a political performance.

"There is great damage done to the integrity and trustworthiness of statistics when they're under the control of the spin doctors," Spiegelhalter says. He is right.

But we geeks must be careful because the damage can come from our own side too.

For Huff and his followers, the reason to learn statistics is to catch the liars at their tricks. That sceptical mindset took Huff to a very unpleasant place. Once the cynicism sets in, it becomes hard to imagine that statistics could ever serve a useful purpose.

But they can - and back in 1954, the alternative perspective was embodied in the publication of an academic paper by British epidemiologists Richard Doll and Austin Bradford Hill. They marshalled some of the first compelling evidence that smoking cigarettes dramatically increases the risk of lung cancer. The data they assembled persuaded both men to quit smoking and helped save tens of millions of lives by prompting others to do likewise. This was no statistical trickery but a contribution to public health that is almost impossible to exaggerate.

You can appreciate, I hope, my obsession with these two contrasting accounts of statistics: one as a trick, one as a tool. Doll and Hill's painstaking approach illuminates the world and saves lives into the bargain.

Huff's alternative seems clever but is the easy path: seductive, addictive and corrosive. Scepticism has its place but easily curdles into cynicism and can be weaponised into something even more poisonous than that.

The two world views soon began to collide. Huff's How To Lie With Statistics seemed to be the perfect illustration of why ordinary, honest folk should not pay too much attention to the slippery experts and their dubious data.

Such ideas were quickly picked up by the tobacco industry, with its darkly brilliant strategy of manufacturing doubt in the face of evidence such as that provided by epidemiologists Doll and Hill. As described in books such as Merchants Of Doubt by Erik Conway and Naomi Oreskes, this industry perfected the tactics of spreading uncertainty: calling for more research, emphasising doubt and the need to avoid drastic steps, highlighting disagreements between experts and funding alternative lines of inquiry. These tactics, and sometimes even the same personnel, were later used to cast doubt on climate science.

These tactics are powerful in part because they echo the ideals of science. It is a short step from the Royal Society's motto, "nullius in verba" (take nobody's word for it), to the corrosive nihilism of "nobody knows anything".

So will 2020 be another 1954? From the point of view of statistics, we seem to be standing at another fork in the road. The disinformation is still out there, as the public understanding of Covid-19 has been muddied by conspiracy theorists, trolls and government spin doctors.

Yet the information is out there too. The value of gathering and rigorously analysing data has rarely been more evident. Faced with a complete mystery at the start of the year, statisticians, scientists and epidemiologists have been working miracles. I hope that we choose the right fork, because the pandemic has lessons to teach us about statistics if we are willing to learn.


Without good data, for example, we would have no idea that this infection is 10,000 times deadlier for a 90-year-old than it is for a nine-year-old - even though we are far more likely to read about the deaths of young people than the elderly, simply because those deaths are surprising. It takes a statistical perspective to make it clear who is at risk and who is not.

Good statistics, too, can tell us about the prevalence of the virus - and identify hot spots for further activity.


But while we can use statistics to calculate risks and highlight dangers, it is all too easy to fail to ask the question: "Where do these numbers come from?" By that, I do not mean the now-standard request to cite sources; I mean the deeper origin of the data.

For all his faults, Huff did not fail to ask the question.

He retells a cautionary tale that has become known as "Stamp's Law" after economist Josiah Stamp - warning that no matter how much a government may enjoy amassing statistics, "raise them to the nth power, take the cube root and prepare wonderful diagrams", it was all too easy to forget that the underlying numbers would always come from a local official, "who just puts down what he damn pleases".

The cynicism is palpable but there is insight here too. Statistics are not simply downloaded from an Internet database or pasted from a scientific report. Ultimately, they came from somewhere. Somebody counted or measured something, ideally systematically and with care. These efforts at systematic counting and measurement require money and expertise - they are not to be taken for granted.

In my new book, How To Make The World Add Up, I introduce the idea of "statistical bedrock" - data sources such as the census and the national income accounts that are the results of painstaking data collection and analysis, often by official statisticians who get little thanks for their pains and are all too frequently the target of threats, smears or persecution.

In Argentina, for example, long-serving statistician Graciela Bevacqua was ordered to "round down" inflation figures, then demoted in 2007 for producing a number that was too high. She was later fined US$250,000 for false advertising - her crime being to have helped produce an independent estimate of inflation.

In 2011, Mr Andreas Georgiou was brought in to head Greece's statistical agency at a time when it was regarded as being about as trustworthy as the country's giant wooden horses. When he started producing estimates of Greece's deficit that international observers finally found credible, he was prosecuted for his "crimes" and threatened with life imprisonment.

The foundations of our statistical understanding of the world are often gathered in response to a crisis. We take it for granted now that there is such a thing as an "unemployment rate" but 100 years ago nobody could have told you how many people were searching for work. Severe recessions made the question politically pertinent so governments began to collect data.

So it is with the Sars-CoV-2 virus. At first, we had little more than a few data points from Wuhan, showing an alarmingly high death rate of 15 per cent - six deaths in 41 cases. Quickly, epidemiologists started sorting through the data, trying to establish how exaggerated that case fatality rate was by the fact that the confirmed cases were mostly people in intensive care.
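The arithmetic behind that early figure, and the reason it overstated the danger, can be sketched in a few lines. This is an illustration only: the six deaths and 41 cases come from the text above, but the function names and the values tried for `infections_per_confirmed_case` are hypothetical, chosen to show the direction of the bias rather than to estimate it.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Deaths divided by confirmed cases."""
    return deaths / confirmed_cases


def adjusted_fatality_rate(cfr: float, infections_per_confirmed_case: float) -> float:
    """Fatality rate if each confirmed case stands for several infections.

    Assumes (hypothetically) that no deaths were missed and that only
    the number of infections was undercounted.
    """
    return cfr / infections_per_confirmed_case


early_cfr = case_fatality_rate(6, 41)  # figures quoted in the text
print(f"{early_cfr:.1%}")  # 14.6%

# Hypothetical undercounting factors, for illustration only.
for factor in (1, 10, 30):
    print(factor, f"{adjusted_fatality_rate(early_cfr, factor):.1%}")
```

If confirmed cases are mostly the sickest patients, the true number of infections is larger than the denominator used here, and the rate falls accordingly.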

Quirks of circumstance - such as the Diamond Princess cruise ship, in which almost everyone was tested - provided more insight.

Johns Hopkins University in the US launched a dashboard of data resources, as did the Covid Tracking Project, an initiative from The Atlantic magazine. An elusive and mysterious threat became legible through the power of this data.

That is not to say that all is well. Nature recently reported on "a coronavirus data crisis" in the US, in which "political meddling, disorganisation and years of neglect of public-health data management mean the country is flying blind".

Nor is the US alone. Spain simply stopped reporting certain Covid-19 deaths in early June, making its figures unusable.

Several countries, particularly in East Asia, provide accessible, usable data about recent infections to allow people to avoid hot spots.

These things do not happen by accident: They require us to invest in the infrastructure to collect and analyse the data.


Jonas Olofsson, a psychologist who studies our perceptions of smell, once told me of a classic experiment in the field. Researchers gave people a whiff of scent and asked them for their reactions to it. In some cases, the experimental subjects were told: "This is the aroma of a gourmet cheese." Others were told: "This is the smell of armpits."

In truth, the scent was both: an aromatic molecule present both in runny cheese and in bodily crevices. But the reactions of delight or disgust were shaped dramatically by what people expected.

But while solid data offers us insights we cannot gain in any other way, the numbers never speak for themselves. They, too, are shaped by our emotions, our politics and, perhaps above all, our preconceptions.

A striking example is the decision, on March 23, to introduce a lockdown in Britain. In hindsight, that was too late.

"Locking down a week earlier would have saved thousands of lives," says Kit Yates, author of The Maths Of Life And Death - a view now shared by influential epidemiologist Neil Ferguson, and by David King, chair of the "Independent Sage" group of scientists.

The logic is straightforward enough: At the time, cases were doubling every three to four days. If a lockdown had stopped that process in its tracks a week earlier, it would have prevented two doublings and saved three-quarters of the 65,000 people who died in the first wave of the epidemic, as measured by the excess death toll.
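That back-of-envelope logic can be written out as a short sketch. It assumes, as the argument above does, that the eventual death toll scales in proportion to the number of infections at the moment growth is halted; the function name is hypothetical.

```python
def fraction_averted(doublings_prevented: float) -> float:
    """Share of the toll avoided if growth stops this many doublings earlier.

    Each prevented doubling halves the epidemic's size at lockdown.
    """
    return 1 - 0.5 ** doublings_prevented


first_wave_deaths = 65_000  # excess-death figure quoted in the text

share = fraction_averted(2)  # a week at a doubling time of three to four days
print(share)  # 0.75
print(round(first_wave_deaths * share))  # 48750
```

Two prevented doublings leave an epidemic one-quarter the size, which is where the three-quarters figure comes from.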

Why, then, was the lockdown so late? No doubt there were political dimensions to that decision, but senior scientific advisers to the government seemed to believe that Britain still had plenty of time. On March 12, Prime Minister Boris Johnson was flanked by the government's chief medical adviser Chris Whitty, and chief scientific adviser Patrick Vallance, in the first big set-piece press conference. Italy had just suffered its 1,000th Covid-19 death and Dr Vallance noted that Britain was about four weeks behind Italy on the epidemic curve.

With hindsight, this was wrong: Now that late-registered deaths have been tallied, we know that Britain passed the same landmark on lockdown day, March 23, just 11 days later. It seems that in early March the government did not realise how little time it had. As late as March 16, Mr Johnson declared that infections were doubling every five to six days.

The trouble, says Yates, is that British data on cases and deaths suggested that things were moving much faster than that, doubling every three or four days - a huge difference. What exactly went wrong is unclear - but my bet is that it was a cheese-or-armpit problem.
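To see why the gap between those doubling times matters so much, here is a rough sketch of how far an epidemic grows over the same period under each one. The 12-day horizon is an arbitrary choice for illustration, the function name is hypothetical, and the calculation assumes steady exponential growth.

```python
def growth_factor(days: float, doubling_time_days: float) -> float:
    """How many times larger the epidemic is after `days` of steady doubling."""
    return 2 ** (days / doubling_time_days)


horizon_days = 12  # arbitrary illustrative horizon, not from the text

# Doubling times mentioned in the text: three or four days versus five to six.
for doubling_time in (3, 4, 5, 6):
    factor = growth_factor(horizon_days, doubling_time)
    print(doubling_time, f"{factor:.1f}")
# 3 16.0
# 4 8.0
# 5 5.3
# 6 4.0
```

Over less than a fortnight, the faster estimates imply an epidemic two to four times larger than the slower ones.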

Some influential epidemiologists had produced sophisticated models suggesting that a doubling time of five to six days seemed the best estimate, based on data from the early weeks of the epidemic in China.

Yates argues that the epidemiological models that influenced the government's thinking about doubling times were sufficiently detailed and convincing that when the patchy, ambiguous, early British data contradicted them, it was hard to readjust.

We all see what we expect to see.

The data is invaluable but, unless we can overcome our own cognitive filters, the data is not enough.


The expert who made the biggest impression on me during this crisis was infectious-disease specialist Nathalie MacDermott, at King's College London, who in mid-February calmly debunked the more lurid public fears about how deadly the new coronavirus was.

Then, with equal calm, she explained to me that the virus was very likely to become a pandemic, that barring extraordinary measures, we could expect it to infect more than half the world's population, and that the true fatality rate was uncertain but seemed to be something between 0.5 and 1 per cent. In hindsight, she was broadly right about everything that mattered.

Her educated guesses pierced through the fog of complex modelling and data-poor speculation.

I was curious as to how she did it, so I asked her. "People who have spent a lot of their time really closely studying the data sometimes struggle to pull their head out and look at what's happening around them," she said. "I trust data as well, but sometimes when we don't have the data, we need to look around and interpret what's happening."

She worked in Liberia in 2014 on the front line of an Ebola outbreak that killed more than 11,000 people. At the time, international organisations were sanguine about the risks, while the local authorities were in crisis. When she arrived in Liberia, the treatment centres were overwhelmed, with patients lying on the floor, bleeding freely from multiple areas and dying by the hour.

The horrendous experience has shaped her assessment of subsequent risks: On the one hand, Sars-CoV-2 is far less deadly than Ebola; on the other, she has seen the experts move too slowly while waiting for definitive proof of a risk. "From my background working with Ebola, I'd rather be overprepared than underprepared because I'm in a position of denial," she said.

There is a broader lesson here. We can try to understand the world through statistics, which at their best provide a broad and representative overview that encompasses far more than we could personally perceive.

Or we can try to understand the world up close, through individual experience. Both perspectives have their advantages and disadvantages.

Professor Muhammad Yunus, a microfinance pioneer and Nobel laureate, has praised the "worm's eye view" over the "bird's eye view". But birds see a lot too.

Ideally, we want both the rich detail of personal experience and the broader, low-resolution view that comes from the spreadsheet. Insight comes when we can combine the two - which is what Dr MacDermott did.


When I reported on the numbers behind the Brexit referendum, the vote on Scottish independence, several general elections and the rise of Mr Donald Trump, there was poison in the air. Many claims were made in bad faith, indifferent to the truth or even embracing the most palpable lies in an effort to divert attention from the issues. Fact-checking in an environment where people did not care about the facts, only whether their side was winning, was a thankless experience.

For a while, one of the consolations of doing data-driven journalism during the pandemic was that it felt blessedly free of such political tribalism.

That did not last. America polarised quickly, with mask-wearing becoming a badge of political identity - and more generally the Democrats seeking to underline the threat posed by the virus, with Republicans following President Trump in dismissing it as overblown. The prominent infectious-disease expert Anthony Fauci does not strike me as a partisan figure but the US electorate thinks otherwise.

He is trusted by 32 per cent of Republicans and 78 per cent of Democrats.

Rather than bringing some kind of consensus, more years of education simply seem to provide people with the cognitive tools they require to reach the politically convenient conclusion. From climate change to gun control to certain vaccines, there are questions for which the answer is not a matter of evidence but a matter of group identity.

In this context, the strategy that the tobacco industry pioneered in the 1950s is especially powerful. Emphasise uncertainty, expert disagreement and doubt and you will find a willing audience. If nobody really knows the truth, then people can believe whatever they want.

All of which brings us back to Huff. While his incisive criticism of statistical trickery has made him a hero to many of my fellow nerds, his career took a darker turn.

Huff worked on a tobacco-funded sequel, How To Lie With Smoking Statistics, casting doubt on the scientific evidence that cigarettes were dangerous. (Mercifully, it was not published.)

He also appeared in front of a US Senate committee that was pondering mandating health warnings on cigarette packaging and explained to the lawmakers that there was a statistical correlation between babies and storks (which, it turns out, there is) even though the true origin of babies is rather different. The connection between smoking and cancer, he argued, was similarly tenuous.

Huff's statistical scepticism turned him into the ancestor of today's contrarian trolls, spouting bullsh** while claiming to be the straight-talking voice of common sense. It should be a warning to us all. There is a place in anyone's cognitive toolkit for healthy scepticism, but that scepticism can all too easily turn into a refusal to look at any evidence at all.

This crisis has reminded us of the lure of partisanship, cynicism and manufactured doubt. But surely it has also demonstrated the power of honest statistics. Statisticians, epidemiologists and other scientists have been producing inspiring work in the footsteps of epidemiologists Doll and Hill. I suggest we set aside How To Lie With Statistics and pay attention.

Carefully gathering the data we need, analysing it openly and truthfully, sharing knowledge and unlocking the puzzles that nature throws at us - this is the only chance we have to defeat the virus and, more broadly, an essential tool for understanding a complex and fascinating world.
