|

Updates On My States Covid Tracking

Time for an update on my graphing of certain states’ covid fighting progress. I’ve since added more states because quite a few have opted to start opening up the economy. Last time I had 9 states (Texas, Georgia, Florida, Tennessee, South Carolina, Louisiana, California, New York, Michigan); this time I’ve added 11 more states mainly because they were re-opening (Alabama, West Virginia, Colorado, Idaho, Indiana, Kansas, Missouri, Montana, New Jersey, Connecticut). Not only did I update data through Saturday 5/9 but I also added some additional analytical tools such as close and re-open dates and statistics taking into account population size (or per capita numbers). At the end of my last post on this topic, I mentioned that I needed to add these statistics to see if they shed any light on why some states may be opening or not.

Before diving into my latest analysis, I’m starting off with my data sources, a refresher on some calculations of cumulative / running totals and moving average, and a discussion on how I cleaned my coronavirus data.

Then, I will show the latest charts for those states introduced in the last post with minimal discussion because the post will be long if I add in the other states. I think I will save for another two posts discussions centering how I developed the new additions/calculations and a series of charts including the new states, just to keep things short.

Data Sources

Raw daily data: https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases Table – Non-repatriated COVID-19 cases in the US by state

Population data: https://worldpopulationreview.com/states/ Table – Table(0)

State abbreviations: https://www.50states.com/abbreviations.htm I actually copied the data into Excel because the site would not let me pull directly from the website. Then Power BI pulled the excel information.

Stay-at-home orders: https://en.wikipedia.org/wiki/U.S._state_and_local_government_response_to_the_2020_coronavirus_pandemic Table – Regions that formerly had stay at home orders or advisory

Refresher: DAX calculations for some measures

  • Running total or cumulative (I couldn’t make the quick measure work)
South GA running total in South GA = 
CALCULATE(
    SUM('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    FILTER(
        ALLSELECTED('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]<= MAX('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date])
    )
)
  • Moving average – This version is a 3-day moving average
South GA_Moving_Ave = CALCULATE (
    AVERAGEX ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )', 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    DATESINPERIOD (
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date],
        LASTDATE ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        -(3),
        DAY
    ))
  • Moving average with flexibility to denote how many days (or weeks, months, etc.)
South GA_Moving_Ave = CALCULATE (
    AVERAGEX ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )', 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    DATESINPERIOD (
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date],
        LASTDATE ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    ))
  • Maximum # of cases – this is a quick measure rather than a regular measure. This is easier to create. You don’t really need to write the formula but here it is anyway.
South GA max per South GA = 
MAXX(
    KEEPFILTERS(VALUES('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA])),
    CALCULATE(SUM('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]))
)

Data Cleanup

Something I didn’t talk about in my last post was whether I cleaned up my data that was pulled from the Wiki site and yes, I did do some clean ups. Here is how it looks when I first pulled the data into the Power Query Editor:

Here’s how it ended when I finished cleaning up the data last time:

Here are the steps, as best as I remember it. The steps are also spelled out in the right hand column – it was copied from the Advanced Editor section of Power BI:

  1. In the first column, filtered out “Date”, “Notes…..”, “States Sources”, and “Total”. Basically anything that is not a date.
  2. Still in the first column, changed the date column type to date.
  3. I saw that some errors popped up in column one due to a row that was text (“Daily Sources…”). I don’t think I could pull it up in step 1 so I removed the error in this step. If you could do it in step 1, you won’t have to remove the error here.
  4. Next, I changed all of the states’ column from text to whole number…I believe it was whole number.
  5. Check for errors – always check for errors. After step 4, there was no errors.
  6. Off to the right of the table, there is an additional date column. I deleted that or removed it.
  7. And then finally, for the remaining columns off to the right, I changed type from text to whole numbers.

Post continues below...

For those of you who can read DAX, here’s the language of the steps from the Advanced Editor in Power Query Editor:

let

Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases")),

Data0 = Source{0}[Data],

#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Date", type text}, {"West AK", type text}, {"West AZ", type text}, {"West CA", type text}, {"West CO", type text}, {"West HI", type text}, {"West ID", type text}, {"West MT", type text}, {"West NM", type text}, {"West NV", type text}, {"West OR", type text}, {"West UT", type text}, {"West WA", type text}, {"West WY", type text}, {"Midwest IA", type text}, {"Midwest IL", type text}, {"Midwest IN", type text}, {"Midwest KS", type text}, {"Midwest MI", type text}, {"Midwest MN", type text}, {"Midwest MO", type text}, {"Midwest ND", type text}, {"Midwest NE", type text}, {"Midwest OH", type text}, {"Midwest OK", type text}, {"Midwest SD", type text}, {"Midwest WI", type text}, {"South AL", type text}, {"South AR", type text}, {"South FL", type text}, {"South GA", type text}, {"South KY", type text}, {"South LA", type text}, {"South MS", type text}, {"South NC", type text}, {"South SC", type text}, {"South TN", type text}, {"South TX", type text}, {"South VA", type text}, {"South WV", type text}, {"Northeast CT", type text}, {"Northeast DC", type text}, {"Northeast DE", type text}, {"Northeast MA", type text}, {"Northeast MD", type text}, {"Northeast ME", type text}, {"Northeast NH", type text}, {"Northeast NJ", type text}, {"Northeast NY", type text}, {"Northeast PA", type text}, {"Northeast RI", type text}, {"Northeast VT", type text}, {"Territories GU", type text}, {"Territories MP", type text}, {"Territories PR", type text}, {"Territories VI", type text}, {"Date2", type text}, {"Confirmed New", type text}, {"Confirmed Cml", type text}, {"Deaths New", type text}, {"Deaths Cml", type text}, {"Recovered New", type text}, {"Recovered Cml", type text}}),

#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Date] <> "Date" and [Date] <> "Notes: Numbers shown above as of date of confirmation by state authorities, cases might have occurred before the date of confirmation." and [Date] <> "State Sources" and [Date] <> "Total")),

#"Changed Type1" = Table.TransformColumnTypes(#"Filtered Rows",{{"Date", type date}}),

#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type1", {"Date"}),

#"Changed Type2" = Table.TransformColumnTypes(#"Removed Errors",{{"West AK", Int64.Type}, {"West AZ", Int64.Type}, {"West CA", Int64.Type}, {"West CO", Int64.Type}, {"West HI", Int64.Type}, {"West ID", Int64.Type}, {"West MT", Int64.Type}, {"West NM", Int64.Type}, {"West NV", Int64.Type}, {"West OR", Int64.Type}, {"West UT", Int64.Type}, {"West WA", Int64.Type}, {"West WY", Int64.Type}, {"Midwest IA", Int64.Type}, {"Midwest IL", Int64.Type}, {"Midwest IN", Int64.Type}, {"Midwest KS", Int64.Type}, {"Midwest MI", Int64.Type}, {"Midwest MN", Int64.Type}, {"Midwest MO", Int64.Type}, {"Midwest ND", Int64.Type}, {"Midwest NE", Int64.Type}, {"Midwest OH", Int64.Type}, {"Midwest OK", Int64.Type}, {"Midwest SD", Int64.Type}, {"Midwest WI", Int64.Type}, {"South AL", Int64.Type}, {"South AR", Int64.Type}, {"South FL", Int64.Type}, {"South GA", Int64.Type}, {"South KY", Int64.Type}, {"South LA", Int64.Type}, {"South MS", Int64.Type}, {"South NC", Int64.Type}, {"South SC", Int64.Type}, {"South TN", Int64.Type}, {"South TX", Int64.Type}, {"South VA", Int64.Type}, {"South WV", Int64.Type}, {"Northeast CT", Int64.Type}, {"Northeast DC", Int64.Type}, {"Northeast DE", Int64.Type}, {"Northeast MA", Int64.Type}, {"Northeast MD", Int64.Type}, {"Northeast ME", Int64.Type}, {"Northeast NH", Int64.Type}, {"Northeast NJ", Int64.Type}, {"Northeast NY", Int64.Type}, {"Northeast PA", Int64.Type}, {"Northeast RI", Int64.Type}, {"Northeast VT", Int64.Type}, {"Territories GU", Int64.Type}, {"Territories MP", Int64.Type}, {"Territories PR", Int64.Type}, {"Territories VI", Int64.Type}}),

#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date2"}),

#"Changed Type3" = Table.TransformColumnTypes(#"Removed Columns",{{"Confirmed New", Int64.Type}, {"Confirmed Cml", Int64.Type}, {"Deaths New", Int64.Type}, {"Deaths Cml", Int64.Type}, {"Recovered New", Int64.Type}, {"Recovered Cml", Int64.Type}})

in

#"Changed Type3"

Graphics and Analysis

So, I see here that there is a distinct downward trend in the USA but I know New York has a huge effect on the US’s overall numbers just from the sheer population size of New York, so I don’t really rely on it. I just look at it to see if those numbers are driving the administration’s decision-making process.

The following graphics reflect updated statistics for the nine states used in the last post.

Now you will see I made some changes since last time: I changed some bars into red, green or dark blue. The red is supposed to represent the close down or stay-at-home order date and the green is the re-open date. The blue is for instances when the red or green bar does not contrast enough from the other colored bars. The states that did not have a re-open date was Louisiana, New York, California and Michigan. With California, you can see why the governor would not re-open the economy because the moving average is kind of still rising, but Louisiana and Michigan were a little harder to see. New York, well, the state went through a very severe period of rising number of infections and the latest infection rate hit 2,715 (you can't see it in the image) – still too high to open up, but they are on a downward trend. New York's daily infection rate needs to get lower before re-opening.

Now I don’t want to do a long analytical discussion about every state because this would be a very long post. I’ll just hit on a few interesting things I note. Again, the newer states will be left for another post.

Graphics and Discussion for Texas, Georgia and Florida

First Texas re-opened amidst rising infection rates and it looks like it has been rising since. Georgia seems to have flattened out and Florida did decline a little but appears to have flattened out; however both states have infection rates of over 500 on 5/9 and Texas stands at 1251.

Now I want to strip the population size effect on these statistics, creating what is commonly called per capita numbers for comparison of apples to apples. Here's the first one for Texas, Georgia and Florida:

The new graphic on the right lines up all of the states’ cumulative totals, from largest to smallest. You can see that New York, and then New Jersey overwhelm all other states in the top chart. I highlighted Texas, Georgia and Florida for greater visibility and you can see them clustered to the left, meaning the cumulative number of infection cases is pretty high - practically in the top ten.

The middle chart on the right hand side strips out the effect of population into a per capita infection rate per 100,000 and now you can see that Texas falls way to the right. Its per capita infection rate is very small, so now I’m getting a better understanding why Texas wants to re-open early. It’s a huge state of approximately 29 million residing in a few large cities amidst large rural areas. Texas is mostly rural so shutting down that state means adversely affecting the rural areas economically. The same kind of dynamics may be at play in Georgia and Florida but less so.

The last chart on the right shows per capita infection rate for the latest date, 5/9/2020, and now you see Texas shooting right back to the left, with high per capita infection rate for that day. So, the question boils down to whether the current infection rates are mostly in the cities or the rural areas. Right now, I’m guessing that Texas probably shouldn’t have re-opened because its infection rate is still going up and the state has one of the larger per capita infection rate for the most recent date. Florida and Georgia are also showing high per capita infection rate for 5/9 - again, practically in the top ten.

Graphics and Discussion for Tennessee, South Carolina and Louisiana

Last time I noted that Louisiana has shown a dramatic drop in infection rates so I was wondering why the state was not opening. Now when I look at the cumulative per capital, I see that Louisiana suffers from a very high per capita - seventh in the nation, so I think they may be a little gun-shy about opening the economy back up. The latest daily infection rate of 562 looks relatively low, lower than Texas, Georgia, or Florida.

But what daily or per capita infection rate is low enough? Does anybody know? That's the issue.

South Carolina and Tennessee appear to be safe to re-open. South Carolina, even though it's infection rate has plateaued, the overall per capita infection number and its latest infection rate is rather lower in the nation. Tennessee appears to be middle of the nation rank. We'll have to see what happens with the re-opening; this is sort of a national experiment, unfortunately with people's lives.

Graphics and Discussion for California, New York and Michigan

Right off the bat, I can see that California needs to remain closed; the infection cases are still rising, even on a per capita basis. The state was one of the earliest, if not the first, to institute a shut-down, so why are the infection rates still rising? The only thing I can think of is that they may have a large number of immigrants who have to still work but unprotected so the infection is proliferating amongst the poorest and the uninsured. That's just a guess. Another guess is that the infection had been percolating for a lot longer than known - some research indicates that the virus may have been spreading in December, way long before the first noted case in January. There also has been suspicion that the virus may have been in the US even earlier. If that's the case, then the virus is a lot more widespread and is still spreading.

Michigan, the state with the armed protesters in the state capital, shows declining infection rates but its cumulative per capita infection is 11th in the nation. Maybe the governor is wary of another surge. The 5/9 confirmed infection rate is 430. Is that low enough to be considered safe? I don't think anybody knows.

Closing thoughts in this post

My next post will be an explanation of how I got the colors in the charts and how I developed the per capita numbers or it will just show all of the charts, including the additional states that are or have re-opened their economy. They should be coming in the next few days.

Similar Posts