| |

Playing with Covid-19 Data

I’ve been avoiding doing any kind of graphing or numerical play with the data that is being put out on this crisis mainly because the news is so distressing. All I have been doing is just gathering the data without doing anything with it. But with the states now starting to open up the economy, I wanted to see if I could detect any positive news in the current set of data and whether I see cases rise as projected by the health officials. Most, if not all, health officials say that if we re-open the economy too soon, or “skip a couple of steps”, then we will see infection rates start to rise and we would be right back to where we were about a month and a half ago.

And we would have to shut down the economy again, possibly more severely and longer. That doesn’t bear thinking.

Currently as a whole in the US, we seem to have leveled off or flattened, which my graphing shows. But that is not the case on a state by state case.

I decided to select a couple of states like those in the south that are said to be re-opening maybe too precipitously (Georgia, Texas, Florida, Tennessee, South Carolina) and compare them to one southern state that I don’t think will be opening (Louisiana), to some of the larger states (California and New York) and to a midwestern state that has been in the news for its protests (Michigan) for a total of nine states.

Pull in Data

I started off with pulling the numbers from the Wikipedia website at https://en.wikipedia.org/wiki/Template:2019-20_coronavirus_pandemic_data/United_States_medical_cases and table ‘Non-repatriated covid-19 cases in the US by state‘. I used the Get Data and the source as Other – Web. I plopped in the website address and selected the appropriate table.

First Simple Plot – Daily confirmed cases and daily cumulative

The data being pulled in had to be cleaned up first, such as deleting some extraneous rows, changing the state columns from text to whole numbers and changing the date column from text to date. Once the I got the data cleaned up, I proceeded to playing around with some graphing. The first one I did was a simple plot of daily confirmed cases against the date timeline and cumulative totals against the date timeline. The daily confirmed cases were easy enough to plot but a “quick new measure” called “running totals” had to be created for cumulative totals for each state you want to look at. Hover your cursor over the state field you want to calculate the cumulative and you should see an icon of three dots. Click on those dots and you will then be able to see “Quick new measure”.

The quick measure you want to use is the running totals but I really didn’t know how to set up the running totals to make it work, so I had to adjust it for what I wanted. I did the initial set up and then I googled and found something to use in the quick new measure running total formula:

South GA running total in South GA = 
CALCULATE(
    SUM('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    FILTER(
        ALLSELECTED('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]<= MAX('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date])
    )
)

That’s the formula I used for the running total quick new measure. I did some numerical testing and it appears to work.

Here’s the first attempt for Georgia:

Data from website https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases
Table: Non-repatriated covid-19 cases in the US by state

I didn’t really find the plot informative, especially the cumulative. The daily confirmed cases plot was wild, up and down, up and down. I really didn’t see the infection cases to be declining for the Georgia to open up after only 2 weeks of shut down.

Second Plot – Moving Average

I decided to try moving average for # of confirmed cases rather than daily cumulative to see if the plot was more informative. I believe I used a new measure, as opposed to a quick new measure. Again, I googled for the formula since I’m not an expert in writing in DAX. Here’s the formula I found – with my variables substituted in:

South GA_Moving_Ave = CALCULATE (
    AVERAGEX ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )', 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    DATESINPERIOD (
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date],
        LASTDATE ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        -(3),
        DAY
    ))

This formula is for a 3-day moving average – 2 days before the date plus the day in question. I decided to change the -3 into a variable that I could easily manage, so I created a measure called ‘Day Variable’ and made it equal to however many moving days I want. This ‘Day Variable’ can be changed in one go rather than changing all moving average measures for each state with the -3. Now my moving average formula looks like this:

South GA_Moving_Ave = CALCULATE (
    AVERAGEX ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )', 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    DATESINPERIOD (
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date],
        LASTDATE ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    ))

So, here’s the result:

Data from website https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases
Table: Non-repatriated covid-19 cases in the US by state

Now the plots are starting to show something more informative. Note I changed the moving average from 3 days to 7.

On the right-hand side, I added in cards to depict the latest cumulative total and the last moving average of # of confirmed cases.

At this point, I set aside Power BI overnight, and came back the next morning with new ideas.

Third Plot – Combine daily confirmed cases with moving average

The change isn’t really a big deal but I did want the moving average to flow with the daily confirmed cases so I did a combo chart called ‘Line and clustered column chart’. I also added a card for a quick new measure depicting the maximum daily confirmed cases. This quick new measure was easy enough to create.

Here’s the formula for it:

South GA max per South GA = 
MAXX(
    KEEPFILTERS(VALUES('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA])),
    CALCULATE(SUM('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]))
)

Now, I didn’t get a chance to add in two data: the date when the state's shutdown began and date of when the state's economy is scheduled to re-open. That’s probably my next post. I will use these charts to see if I can discern whether infection cases start to tick up, starting sometime around 2 weeks after the state’s businesses open up. The businesses are supposed to institute some various rules such as masks, social distancing, disinfecting, partial opening, etc. to help protect their employees and customers from further infections.

Current plots for 9 states

There are some interesting things to note:

  • Tennessee just a day ago saw a spike in new confirmed cases. I suspect that the test results are not coming in a timely manner. There must be some old tests that just got back yesterday. There may be other lags in reporting which was why I decided to use moving average.
  • Louisiana had a high daily spike early in April, even greater than California, but then there is a dramatic decline in new cases. Their latest moving average of new cases is at 367. Because Louisiana had such a high peak early on, the state may be hesitant to open up their businesses. I think they are one of a set of states that are going to remain close for a little longer.
  • New York is finally moving down but their latest moving average of new cases is at 4000, still too high for them to feel comfortable opening up their economy.
  • Michigan, the state with ongoing protests to open up businesses, has seen some decline in new cases but it's kind of bumpy. Their latest moving average is still a high (to me) 835.
  • California has not yet plateaued but it looks close.
  • South Carolina looks like it plateaued with the latest moving average of 168. Should they re-open the economy? It's hard to say since their numbers aren't declining. It might depend on whether the state's healthcare system can handle any surges that could come from re-opening.
  • Florida looks like the infection rates are on the decline but the latest moving average is at 570, so the state may not be ready to open.
  • Texas and Georgia do not look like they are ready to open up; their moving average looks to be on the high side.

All right, that's it for today's post.

Similar Posts