Covid-19: What the metrics do not reveal

April 26, 2020

Rarely have ordinary people seen so many charts and numbers.

We get a daily dose of the staggering number of confirmed cases that capture the magnitude of the coronavirus crisis across the globe. Yet these numbers sometimes conceal more than they reveal.

For instance, on April 19 Singapore, with 6,588 cases, overtook Indonesia (6,575) and the Philippines (6,259) to become the country with the highest number of confirmed coronavirus cases in Southeast Asia. This was alarming, especially since Singapore has a much smaller population than its neighbours.

Let’s examine these figures in light of the number of people tested.

According to various sources, Singapore had tested about 13.9 people per 1,000 (April 20), compared with about 0.19 per 1,000 in Indonesia. So the Singapore count is based on about 81,000 unique persons tested, while the Indonesian count is based on about 52,000. That translates to roughly 8% of tests yielding a confirmed case in Singapore (April 19), compared with about 13% in Indonesia. In other words, Singapore’s higher count reflects far more testing, not a higher share of positive results.
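The arithmetic behind this comparison is simply confirmed cases divided by the number of people tested. The short Python sketch below reproduces it using only the approximate figures quoted in this article; the function name `positivity` is our own label, not an official metric definition.

```python
# Approximate figures quoted in the article (April 19-20, 2020).
confirmed = {"Singapore": 6_588, "Indonesia": 6_575}
tested = {"Singapore": 81_000, "Indonesia": 52_000}


def positivity(cases: int, people_tested: int) -> float:
    """Share of tested people who were confirmed positive."""
    return cases / people_tested


rates = {country: positivity(confirmed[country], tested[country]) for country in confirmed}
for country, rate in rates.items():
    print(f"{country}: {rate:.1%} of tests positive")
```

With almost identical case counts, Singapore comes out at about 8.1% and Indonesia at about 12.6%: the raw totals hide the fact that Singapore tested roughly 1.5 times as many people.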

What about the metric that is quoted every day – the new confirmed covid-19 cases?

The number of new confirmed cases in the US shot up from fewer than 1,000 per day in the first half of March to an average of roughly 30,000 per day in April. We see a similar surge across the world, though the time frames differ from region to region.

While these numbers reflect the scale of the problem, they are also a function of the number of tests being conducted. If the supply of test kits fluctuates, so will the number of new cases.
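To see why the daily count tracks the supply of kits, consider a hypothetical case in which the share of positives among those tested never changes. The 10% positivity and the daily test numbers below are invented purely for illustration; they are not real data.

```python
# Hypothetical illustration: the share of positives among those tested
# is held fixed, yet the daily count still swings with the supply of kits.
POSITIVITY = 0.10  # assumed share of tests that come back positive (invented)
daily_tests = [2_000, 5_000, 3_000, 8_000, 4_000]  # invented daily kit supply

new_cases = [round(tests * POSITIVITY) for tests in daily_tests]
print(new_cases)  # [200, 500, 300, 800, 400]
```

Nothing about the epidemic changed from day to day in this sketch, yet the reported count quadrupled between the lowest and highest days simply because more kits were available.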

Importantly, we need to account for the reasons people are tested. Besides testing patients who show covid-19 symptoms, the authorities are testing individuals linked to known clusters. As the number of clusters increases, and as some very large clusters such as cruise ships and workers’ dormitories emerge, the number of tests conducted has multiplied. The fact that we were testing fewer people at the start of the crisis deflates the confirmed numbers for those days.

Comparisons can be misleading too. For instance, while it is true that the incidence of the disease is much higher among foreign workers in Singapore, we cannot benchmark the 0.02 per cent of the community that has tested positive (April 20) against the 1.9 per cent among the 323,000 workers living in dorms, because a very high proportion of dorm workers are being tested, compared with a very low proportion of the rest of the population.
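The underlying issue is that confirmed cases divided by a group’s population equals the share of the group that was tested multiplied by the share of tests that were positive. The sketch below uses invented group sizes and testing shares, not estimates of Singapore’s actual testing coverage, to show two groups with the same 5% positivity producing very different headline rates.

```python
def reported_rate(population: int, tested_share: float, positivity: float) -> float:
    """Confirmed cases as a share of the whole group.

    confirmed = population * tested_share * positivity, so dividing by the
    population leaves tested_share * positivity: testing coverage is baked
    into the headline figure.
    """
    confirmed = population * tested_share * positivity
    return confirmed / population


# Hypothetical groups with the SAME positivity among those tested (5%).
dorms = reported_rate(population=300_000, tested_share=0.40, positivity=0.05)
community = reported_rate(population=4_000_000, tested_share=0.004, positivity=0.05)
print(f"dorms: {dorms:.2%}, community: {community:.2%}")
```

In this made-up example the headline rates differ a hundredfold even though infected people are equally common among those tested in both groups; the gap comes entirely from how many people were tested.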

For these reasons, the number of new confirmed cases, though a useful indicator, is not an accurate measure of the scale of the epidemic, and it may conceal some hard facts.

The death count, another useful indicator, is also tricky to work with, because the death rate fluctuates as the epidemic moves through its phases of inception, growth, maturity and decline, and it must be interpreted in that context.

Importantly, the death count is understated because resources, including staff and test kits, are limited, surveillance is lacking, and reporting practices vary both across and within countries.

Testing patients is costly and time-consuming, and Covid RT-PCR test kits are in short supply in many regions. So if a patient with covid-like symptoms dies before being tested, an overstretched medical team may not spend scarce resources testing the body.

Belgium has the highest per capita coronavirus death rate in the world largely because the country counts deaths in nursing homes even when an infection was not confirmed.

A clearer picture of the number of deaths and the death rate can emerge from modelling deaths over the past few years. Historical data reveals the baseline, i.e., the expected number of deaths on any particular day or week. Since in most regions there is no other new contributing factor, deaths in excess of the baseline can be attributed to covid-19.
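A minimal sketch of the excess-deaths idea follows. It assumes the simplest possible baseline, the average of the same calendar week across earlier years; published analyses typically use richer models that allow for trends and seasonality. All death counts are invented for illustration.

```python
from statistics import mean

# Hypothetical weekly death counts for the same four calendar weeks in
# earlier years (the baseline years) and in the current year.
history = {
    2017: [400, 395, 402, 410],
    2018: [405, 400, 398, 400],
    2019: [410, 405, 400, 405],
}
current = [410, 450, 520, 540]

# Baseline: average deaths in the same week across the earlier years.
# zip(*history.values()) regroups the yearly lists week by week.
baseline = [mean(week) for week in zip(*history.values())]

# Excess deaths: observed minus expected, week by week.
excess = [observed - expected for observed, expected in zip(current, baseline)]
print(baseline, excess, sum(excess))  # [405, 400, 400, 405] [5, 50, 120, 135] 310
```

Here the 310 excess deaths would be attributed to covid-19 whether or not each death was ever tested, which is exactly what makes this measure robust to testing shortages.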

In conclusion, much of the information shared every day needs to be interpreted carefully. Examples include comparing confirmed cases between countries such as Singapore and Indonesia, comparing time periods in which the scale of testing differs dramatically, and comparing absolute confirmed cases between population groups in which the proportions tested vary greatly. These are what statisticians would call comparisons that are not apples to apples.

Moreover, testing is targeted, biased in favour of known clusters. This is practical, and it is the right approach at a time when resources are heavily constrained. Yet it is not easy to make firm projections based on observations from biased samples.

If there were less pressure on staff and more test kits, scientists could rely on proven quantitative research methods.

In this instance, we need what statisticians call stratified samples. Strata are simply distinct groups, such as people with symptoms, people from known clusters, and people chosen at random who have no symptoms and do not fall within known clusters.
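A stratified estimate computes the positive share within each group and then weights it by that group’s share of the population, instead of pooling every test together. The sketch below uses three hypothetical strata; the class name `Stratum` and every number are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population: int  # people belonging to this group
    tested: int      # tests carried out in this group
    positive: int    # positive results among those tests


# Hypothetical figures, chosen only to illustrate the arithmetic.
strata = [
    Stratum("symptomatic", population=20_000, tested=10_000, positive=1_000),
    Stratum("known-cluster contacts", population=80_000, tested=40_000, positive=2_000),
    Stratum("random, no symptoms, outside clusters",
            population=9_900_000, tested=10_000, positive=5),
]


def naive_positivity(groups: list[Stratum]) -> float:
    """Pool every test together, ignoring which group it came from."""
    return sum(g.positive for g in groups) / sum(g.tested for g in groups)


def stratified_prevalence(groups: list[Stratum]) -> float:
    """Weight each group's positivity by that group's share of the population."""
    total = sum(g.population for g in groups)
    return sum((g.population / total) * (g.positive / g.tested) for g in groups)


naive = naive_positivity(strata)
stratified = stratified_prevalence(strata)
print(f"pooled positivity:     {naive:.2%}")
print(f"stratified prevalence: {stratified:.2%}")
```

Pooling the tests gives about 5%, because most tests were aimed at high-risk groups. The stratified estimate is about 0.11%, or roughly 11,000 infections in a population of 10 million, and nearly half of that comes from the small random sample that represents most of the population.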

It is the last group, those tested at random, that reveals the full extent of the problem by yielding an estimate of the number of undetected cases.

For instance, if random tests in areas outside the known clusters and high-risk regions yield 5 confirmed cases out of 10,000 tests, the proportion that tested positive is 0.05%. For a population of, say, 10 million residents, that suggests about 5,000 infected residents whom we don’t know how to find. Furthermore, if no action is taken, these numbers will grow exponentially.
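The estimate scales the random-sample positivity up to the whole population. Because only a handful of positives drive it, it is also worth seeing how uncertain it is. The sketch below takes 5 positives out of 10,000 random tests and a population of 10 million, and adds a Wilson score interval, one standard choice for a binomial proportion. The interval is our addition for illustration, not something the article prescribes.

```python
from math import sqrt


def wilson_interval(positive: int, tested: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = positive / tested
    denom = 1 + z**2 / tested
    centre = (p + z**2 / (2 * tested)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2))
    return centre - half, centre + half


POPULATION = 10_000_000
positive, tested = 5, 10_000

# Point estimate: sample positivity scaled up to the whole population.
point = positive / tested * POPULATION
low, high = (bound * POPULATION for bound in wilson_interval(positive, tested))
print(f"estimate: {point:,.0f} undetected cases (95% interval {low:,.0f} to {high:,.0f})")
```

The point estimate is 5,000, but with only 5 positives the interval runs from roughly 2,100 to about 11,700. Larger random samples would narrow that range considerably.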

Once tests reveal the virus is past containment, all options are fraught with risks.

If it were at all feasible, the perfect solution to the coronavirus problem would be to test everyone every day. Then there would be no undetected cases, and therefore no need for circuit breakers or lockdowns.