How Statistics became a weapon of propaganda in a pandemic

Let me preface this piece by saying that statistics as a method is not fundamentally flawed, but ask any machine learning engineer in your local cafe, and they’ll be quick to tell you that the process of data collection and storage usually is. A new engineer is quick to go to Kaggle, download a dataset, perform a few cleaning operations, do a little analysis, classification and prediction, and update their LinkedIn headline with “Machine Learning Enthusiast”, “AI Expert” and “Data Scientist”. While I find few points of contention with the first two titles, the last one is questionable, since you’re working with what is essentially a DIY data cleaning project instead of true data processing, but I digress.

Data, especially in these “unprecedented times” of the COVID-19 pandemic, is far from the utopia that is Kaggle. Data in real life is messy, full of clutter you don’t want, and has flaws that we will talk about. I will also go over how a lot of statistics are not absolute truths. Darrell Huff’s book “How to Lie with Statistics” has 10 chapters barring the introduction and acknowledgements, but we will look at 3 concepts.

The Sample with the Built-in Bias

A few months ago, just when the lockdowns were to start in India, an Islamic missionary group named Tablighi Jamaat had a congregation in South Delhi. This ended up becoming a hotspot for Coronavirus in India. Twitter trolls and journalists were too eager to attack the entire Muslim community. Muslims were lynched, attacked and discriminated against.

For example, here is an article by the Economic Times that starts off with:

NEW DELHI: Over 95% of the coronavirus cases reported over the last two days in India have been found to have links with the Tablighi Jamaat congregation in Delhi.

But nowhere did the article mention the sampling bias in testing. Once the Tablighi Jamaat cluster was discovered, there was aggressive testing of the Jamaatis (the members of the missionary group), and a lot of them tested positive. The fact that the group had a structure helped in contact tracing of spreaders. This level of testing was not performed on other groups of people. In some states, it was legally mandated for Jamaatis to get tested. Some journalists stated that the Jamaat was the single reason why India was in dire straits in fighting COVID-19.

This is essentially a biased sample. You aggressively test one section of people and more than half of them test positive, but you haven’t tested any other cluster nearly as thoroughly. So you establish a correlation between the high number of cases and the Tablighi Jamaat, and that is the only correlation you present. Intellectual dishonesty aside, this is simply muddy journalism.
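To make the effect concrete, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration: it assumes both groups have exactly the same true infection rate and that tests are given at random within each group. The function name `share_of_detected_cases` is my own and does not come from any article or dataset.

```python
def share_of_detected_cases(groups):
    """Return each group's share of *detected* cases.

    `groups` maps a name to (population, true_infection_rate, fraction_tested).
    Assumes tests are given at random within a group, so the expected number
    of detected cases is population * infection_rate * fraction_tested.
    """
    detected = {
        name: population * infection_rate * fraction_tested
        for name, (population, infection_rate, fraction_tested) in groups.items()
    }
    total = sum(detected.values())
    return {name: count / total for name, count in detected.items()}


# Hypothetical numbers: both groups have the SAME true infection rate (2%),
# but the traced cluster is tested far more aggressively than everyone else.
groups = {
    "traced_cluster": (4_000, 0.02, 0.90),
    "everyone_else": (1_000_000, 0.02, 0.0001),
}

shares = share_of_detected_cases(groups)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of detected cases")
```

With these made-up inputs, roughly 97% of detected cases are "linked" to the traced cluster, even though, by construction, almost all infected people are in the other group. The headline percentage reflects who was tested, not who was infected.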

The Semi-Attached Figure

In his book, Huff writes

If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference.

Researching this article was incredibly difficult, not for lack of data or reporting, but because the clear clickbaiting and bias of the Google search engine were an obstacle. Nonetheless, I came across some interesting articles that did this, and at the forefront is CNN.

The article leads with the title

Black Lives Matter protests have not led to a spike in coronavirus cases, research says

Alright, that seems like Good News! However, I am not one to trust media websites, so I did a little digging.

A new study, published this month by the National Bureau of Economic Research, used data on protests from more than 300 of the largest US cities, and found no evidence that coronavirus cases grew in the weeks following the beginning of the protests.

Now the following part of the article is almost a cartoonish, hilarious representation of the joke that modern media is.

In fact, researchers determined that social distancing behaviors actually went up after the protests — as people tried to avoid the protests altogether.

So the researchers state that people are afraid to go back out, and thus the number of cases did not rise, because people who are scared are staying indoors.

“Our findings suggest that any direct decrease in social distancing among the subset of the population participating in the protests is more than offset by increasing social distancing behavior among others who may choose to shelter-at-home and circumvent public places while the protests are underway”

In simpler words: any increase in cases due to the lack of social distancing at protests has been overshadowed by the decrease in cases due to people staying indoors because they are scared of the violence of the protests.

The lack of Social Distancing at a BLM protest

Here is another article, by Healthline, which I assumed was a moderate website, but the bias is showing.

“I have not seen any peer-reviewed research linking outdoor protests (or really any major outdoor events) to the surge here in Texas” said Rodney Rohde, PhD, an associate dean for research at Texas State’s College of Health Professions who focuses on public health microbiology.

I’ll break this down as well, but let me take a second to address one thing I have noticed in deceptive news articles: they bury the facts between lines of garbage, and use the titles of people of authority (here, Rodney Rohde) as a substitute for factual, ethical reporting. Anyway, in this article, Rohde simply states that he has not seen any paper correlating outdoor protests with the surge in cases. The ridiculousness of acquiring such an isolated dataset aside, you cannot conclude that a link does not exist simply because you haven’t found any papers showing it. On the part of healthline.com, this is simple intellectual dishonesty.

Rohde goes on to say

The COVID-19 spike in Texas is likely tied to the reopening, not the protests

One can be equally intellectually dishonest, feign ignorance and state that there is thus no correlation between the two, but there definitely is some degree of correlation.

One needs to understand that correlation is not a binary value: the correlation coefficient between two datasets, x and y, can take any value between -1 and 1. Using a correlation between datasets A and B, and presenting it as a correlation between C and B, is as bad as journalism gets, but are we surprised anymore?
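For reference, the most common way to measure correlation between two datasets x and y, and the one I am assuming here, is the Pearson correlation coefficient. For n paired observations with means \bar{x} and \bar{y}:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The result always lies between -1 and 1, which is why "there is no correlation" and "there is a correlation" are not the only two possible answers. Note also that the formula takes exactly two datasets as input: a value computed for A and B says nothing by itself about C and B.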

The outrage against people wanting the economy to safely open up was ridiculous, and the discrediting of any correlation between BLM protests and Coronavirus is insane.

The One Dimensional Picture

Alright, now let’s compare the COVID-19 case numbers across countries for a second. At the time of writing, these are the numbers, highest to lowest.

From worldometers.info/coronavirus/ on 15th Aug, 2020

As an aside, I always wonder how many stories are hidden in each of these numbers. Statistics is merciless for no fault of its own — stories, people, lives become just a number. I remember the first time I made my Coronavirus tracking module, I was sad as fuck after deploying. 3 months later, I have been numbed to this.

So here’s how it works: the more people you test, the more will test positive. Some countries, for example, test only if you have symptoms. Some deaths are also marked as COVID-related despite not being COVID-related. Otherwise, please explain how smaller countries with no form of contact tracing, a mostly labour-based economy (and thus a relatively less strict lockdown) and little healthcare infrastructure have fewer cases than some of the most successful countries.

A better measure of cases would be the number of cases relative to the number of tests done. Instead of measures such as death rate, which is almost impossible to pin down concretely, better measures would be the rate of hospitalization and the rate of infection, which would help countries prepare faster. Death rate is variable because there is no standard way of classifying a death as COVID-related or otherwise. Death rate can be calculated by anybody with a computer, but some data, such as the number of tests, is held by the government.
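As a rough sketch of the "cases vs tests" idea, here is how the comparison could look in Python. The two countries and all their figures are hypothetical, made up only to show how a raw case count and a per-test rate can point in opposite directions. `positivity_rate` is my own helper name, not a standard function.

```python
def positivity_rate(confirmed_cases, tests_done):
    """Fraction of tests that came back positive."""
    if tests_done <= 0:
        raise ValueError("tests_done must be positive")
    return confirmed_cases / tests_done


# Hypothetical figures, for illustration only.
countries = {
    "country_a": {"cases": 50_000, "tests": 1_000_000},
    "country_b": {"cases": 5_000, "tests": 20_000},
}

rates = {
    name: positivity_rate(data["cases"], data["tests"])
    for name, data in countries.items()
}

# Rank by raw case count and by positivity rate to compare the two views.
by_cases = sorted(countries, key=lambda name: countries[name]["cases"], reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print("Ranked by cases:     ", by_cases)
print("Ranked by positivity:", by_rate)
```

In this made-up example, country_a reports ten times as many cases, but only 5% of its tests are positive, while 25% of country_b's are. A leaderboard sorted by raw cases would put country_a on top; one sorted by positivity would put country_b there.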

Those are the three ways I have seen news agencies being deceptive. I am a little tired now, and had a little scare where I thought I lost this article. I can’t think of any more; if you can, let me know!

Cheers,
Mir.

Research Intern @ Persistence.one
