Display New Daily Cases of COVID-19 with Care

Statistics are playing a major role during the COVID-19 pandemic. The ways that we collect, analyze, and report them greatly influence the degree to which they inform a meaningful response. An article in the Investor’s Business Daily titled “Dow Jones Futures Jump As Virus Cases Slow; Why This Stock Market Rally Is More Dangerous Than The Coronavirus Market Crash” (April 6, 2020, by Ed Carson) brought this concern to mind when I read the following table of numbers and the accompanying commentary:

U.S. coronavirus cases jumped 25,316 on Sunday [April 5th] to 336,673, with new cases declining from Saturday’s record 34,196. It was the first drop since March 21.

The purpose of the Investor’s Business Daily article was to examine how the pandemic was affecting the stock market. On Monday, April 6, 2020, the day after the decline in the number of reported new COVID-19 cases on Sunday, April 5th, the stock market surged (the Dow Jones gained 1,627.46 points, or 7.73%). This was perhaps a response to hope that the pandemic was easing. This brings a question to mind. Can we trust this apparent decline as a sign that the pandemic has turned the corner in the United States? I wish we could, but we dare not, for several reasons. The purpose of this blog post is not to critique the news article and certainly not to point out the inappropriateness of this data’s effects on the stock market, but merely to argue that we should not read too much into the daily ups and downs of newly reported COVID-19 case counts.

How accurate should we consider daily new case counts based on the date when those counts are recorded? Not at all accurate and of limited relevance. I’ll explain, but first let me show you the data displayed graphically. Because the article did not identify its data source, I chose to base the graph below on official CDC data, so the numbers are a little different. I also chose to begin the period with March 1st rather than 2nd, which seems more natural.

What feature most catches your eye? For most of us, I suspect, it is the steep increase in new cases on April 3rd, followed by a seemingly significant decline on April 4th and 5th.

A seemingly significant rise or fall in new cases on any single day, however, is not a clear sign that something significant has occurred. Most day-to-day volatility in reported new case counts is noise: it is influenced by several factors other than the actual number of new infections. There is a great deal of difference between the actual number of new infections and the number of new infections that were reported, as well as a significant difference between the date on which infections began and the date on which they were reported. We currently have no means to count the number of infections that occurred, and even if we tested everyone for the virus’s antibodies at some point, we would still have no way of knowing the date on which those infections began. The number of reported new COVID-19 cases is only a proxy for the measure that concerns us.

Given that the count of reported new cases is probably the best proxy that’s currently available to us, we could remove much of the noise related to the specific date on which infections began by expressing new case counts as a moving average. A moving average would provide us with a better overview of the pandemic’s trajectory. Here’s the same data as above, this time expressed as a 5-day moving average. With a 5-day moving average, the new case count for any particular day is averaged along with the counts for the four preceding days (i.e., five days’ worth of new case counts are averaged together), which smooths away most of the daily volatility.

While it still looks as if the new case count is beginning to increase at a lesser rate near the end of this period, this trend no longer appears as dramatic.
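For readers who want to reproduce this kind of smoothing, here is a minimal sketch in Python of the trailing 5-day moving average described above, in which each day is averaged with the four days before it. The function name and the sample counts are hypothetical and serve only as an illustration; the counts are made up and are not CDC figures.

```python
from typing import List, Optional


def trailing_moving_average(
    counts: List[float], window: int = 5
) -> List[Optional[float]]:
    """Average each day's count with the (window - 1) preceding days.

    Days that do not yet have a full window of data get None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages: List[Optional[float]] = []
    for i in range(len(counts)):
        if i + 1 < window:
            # Not enough preceding days to fill the window yet.
            averages.append(None)
        else:
            recent = counts[i - window + 1 : i + 1]
            averages.append(sum(recent) / window)
    return averages


# Made-up illustrative counts, NOT real CDC figures.
daily_new_cases = [10, 20, 15, 30, 25, 60, 35, 40]
smoothed = trailing_moving_average(daily_new_cases)
print(smoothed)
```

In this made-up series the raw count drops sharply from 60 to 35 on the seventh day, yet the smoothed values keep rising (20, 30, 33, 38), which is the kind of single-day swing that a moving average is meant to dampen.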

Daily volatility in reported new case counts is caused by many factors. We know that the number of new cases reported on any particular day does not accurately reflect the number of new infections. It’s likely that most people who have been infected have never been tested. Two prominent reasons for this are 1) the fact that most cases are mild to moderate and therefore never involve medical intervention, and 2) the fact that many people who would like to be tested cannot because tests are still not readily available. Of those who are tested and found to have the virus, not all of those cases are recorded or, if recorded, forwarded to an official national database. And finally, of those new cases that are recorded and do make it into an official national database, the dates on which they are recorded are not the dates on which the infections actually occurred. Several factors determine the specific day on which cases are recorded, including the following:

  1. When the patient chooses or is able to visit a medical facility.
  2. The availability of medical staff to collect the sample. Staff might not be available on particular days.
  3. The availability of lab staff to perform the test. The sample might sit in a queue for days.
  4. The speed at which the test can be completed. Some tests can be completed in a single day and some take several days.
  5. When medical staff has the time to record the case.
  6. When medical staff gets around to forwarding the new case record to an official national database.

There’s a lot that must come together for a new case to be counted and to be counted on a particular day. As the pandemic continues, this challenge will likely increase because, as medical professionals become increasingly overtaxed, both delays in testing and errors in reporting the results will no doubt increase to a corresponding degree.

Now, back to my warning that we shouldn’t read too much into daily case counts as events are unfolding. Here are the same daily values as before with one additional day, April 6th, included at the end.

Now what catches your eye? It’s different, isn’t it? As it turns out, by waiting one day we can see that reported new cases did not peak on April 3rd followed by a clear turnaround. New cases are still on the rise. Here’s the same data expressed as a 5-day moving average:

The trajectory is still heading upwards at the end of this period. We can all hope that expert projections that the curve will flatten out in the next few days will come to pass, but we should not draw that conclusion from the newly reported case count for any particular day. The statistical models that we’re using are just educated guesses based on approximate data. The true trajectory of this pandemic will only be known in retrospect, if ever, not in advance. Patience in interpreting the data will be rewarded with greater understanding, and ultimately, that will serve our needs better than hasty conclusions.

6 Comments on “Display New Daily Cases of COVID-19 with Care”


By RH. April 10th, 2020 at 1:17 pm

Each of the dips in the graph (except 3/20) falls on a weekend.

By Stephen Few. April 10th, 2020 at 3:10 pm

RH,

The first thing that I looked for when I examined this data graphically was a weekend effect. While it is reasonable to expect one, it actually does not appear to exist. Even though it is true that the dip in reported new cases on April 4th and 5th occurred on Saturday and Sunday, weekend days overall are not uncharacteristically lower in new case counts than weekdays.

Stephen

By Micah Rietschlin. April 10th, 2020 at 4:31 pm

Thank you; your analysis and considerations of data collection issues are timely. I have forwarded it to people in my circle.

Micah

By Carlos Barboza. April 13th, 2020 at 3:50 am

Great way to end this piece, Stephen:

“Patience in interpreting the data will be rewarded with greater understanding, and ultimately, that will serve our needs better than hasty conclusions.”

Your values run through the 6th of April, and I think and hope that with this last week’s values of new cases (from April 7th to April 12th), the situation has sort of “stabilized”… I hope it has, and that we stop losing loved ones…

By Dale Lehman. April 13th, 2020 at 12:38 pm

Thank you for this very sensible description of the limitations of the available data and a better way to think about the trends. Interestingly, Johns Hopkins is reporting the confirmed cases, along with a 5-day moving average, to address precisely the concerns you express (https://coronavirus.jhu.edu/data/new-cases). However, one interesting thing is that the 5-day moving average they are using is defined as follows:
“This analysis uses a 5-day moving average to visualize the number of new COVID-19 cases and calculate the rate of change. This is calculated for each day by averaging the values of that day, the two days before, and the two next days.”
Calculating the moving average by mixing past and future data seems a bit strange to me, and I was wondering if you had any thoughts on that. To me, it seems sensible to look at a variety of moving average windows since there is nothing magical about 5 days. In addition, other smoothing methods for the data might be explored (such as splines), and I was wondering if you had thoughts about that.

By Stephen Few. April 13th, 2020 at 2:45 pm

Dale,

When noise should be reduced to provide a better overview of change through time, I always use a moving average rather than other smoothing techniques because it was specifically designed for change through time, it works well, and it’s easy to understand. In the example above, there is nothing magical about averaging across five days, but the choice is reasonable. When choosing the number of time intervals across which to average, we seek the sweet spot between too few intervals, which would remove too little noise, and too many intervals, which would eliminate significant changes. In this particular case, either a 5-day or a 7-day moving average seems appropriate. I’m not familiar with the rationale that Johns Hopkins is using for calculating a 5-day moving average by including two prior and two later days in the average. It seems to me that one downside of this method is the fact that you cannot display a day’s value until three days later rather than on the next day. The normal method of averaging the current day with the four previous days to produce a 5-day moving average avoids this problem.
