Some Data with Power BI Before Labor Day
Before the Labor Day weekend gets underway, I decided to pull and note the latest data on how the US is doing during the pandemic. Fall is arriving along with the dreaded flu season, so I wanted a snapshot (actually, I already take a snapshot every day) of where America stands and whether the American people are getting pandemic fatigue. The data I pulled is as of 9/3/2020.
New Goal
Previously, I was just trying to get some data and start using Power BI – all for the purpose of learning how to use it and developing some analytical tools.
Now that I have some rudimentary charts going, my new goal with Power BI and the coronavirus analysis is to develop charts and tables that give early warning of which parts of the US are facing a surge in cases. Furthermore, I want to distill the information into a few core graphics so that I don't have to wade through a large number of graphs and tables to find trouble spots. And finally, I may work on the design of the visuals to make them more interesting.
Sigh. I have spent the past two weeks trying to improve the tables, charts and information, to no avail. I feel like I am no closer than I was two weeks ago. Right now, the big difficulty is handling moving averages, especially for the US as a whole or for regions of the US (South, Midwest, Northeast, West). The absolute numbers come out wrong, but the shape of the line graphs matches the ones I create in Excel (I first do the math in Excel and graph the results as a way of checking whether Power BI is working). I'm a little frustrated right now because whatever I try to do, things just don't work (Power BI, Python programming, drawing, whatever).
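As a cross-check outside Power BI (the same role my Excel sheets play), here is a minimal Python sketch of a regional 7-day moving average. It assumes the daily state rows have already been loaded as (date, state, new cases) tuples with ISO date strings; the function names and that row layout are my own illustration, not anything from Power BI or the data source. The order of operations matters: sum the states for each date first, then average over the trailing seven days. If the state values were averaged across states instead of summed, the curve would keep the same shape while the absolute numbers shrank by about the number of states; that is one possible explanation for a chart with the right shape and wrong numbers, not a confirmed diagnosis.

```python
from collections import defaultdict

def regional_daily_totals(rows, region_states):
    """Sum daily new cases over the states in one region, date by date.

    rows: iterable of (date, state, new_cases) tuples, with ISO date strings
    such as "2020-09-03" so that sorting them also sorts them chronologically
    (an assumption about how the data was loaded).
    Returns the regional totals as a list, oldest date first.
    """
    totals = defaultdict(int)
    for date, state, new_cases in rows:
        if state in region_states:
            totals[date] += new_cases
    return [totals[d] for d in sorted(totals)]

def trailing_moving_average(values, window=7):
    """Trailing moving average; the first window - 1 entries are None
    because there are not yet enough days to fill the window."""
    averages = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that just left the window
        averages.append(running / window if i >= window - 1 else None)
    return averages

# Made-up numbers for hypothetical states "A", "B" (in the region) and "C" (not in it).
rows = [("2020-09-01", "A", 10), ("2020-09-01", "B", 5),
        ("2020-09-02", "A", 12), ("2020-09-01", "C", 100)]
print(regional_daily_totals(rows, {"A", "B"}))  # [15, 12]
print(trailing_moving_average([7, 7, 7, 7, 7, 7, 7, 14]))
# [None, None, None, None, None, None, 7.0, 8.0]
```

Feeding the output of `regional_daily_totals` into `trailing_moving_average` gives the regional curve, which can then be compared number for number against the Power BI visual.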
So….
Lately, the news has indicated that the Midwest is seeing signs of rising infection rates, but I've been seeing it for a while now, maybe since the end of July. So I do have one set of graphics that has worked pretty well in giving advance warning of a pending surge, but I'm wondering whether there are other signals I can capture. I have created some new graphs in the past two weeks (they still have some numerical problems), and I will test whether I can use them successfully as the days and weeks go by. The new ones are graphs of infection cases, deaths, hospitalizations, testing and positivity rate for the US and for the regions. Eventually I want to figure out how to deal with the individual states: I want a set of graphs that shows me up front which states have had rising infections over the past “x” weeks. I don't want to search through a series of graphs to find that information; I just want it to pop out. Right now, I have 50 sets of graphs showing daily infections, hospitalizations and positivity rates, spread out over multiple tabs. I need something better: more concise, with the key information popping out.
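One way to make rising states pop out is to compare each state's latest 7-day average of new cases with the 7-day average from “x” weeks earlier and list only the states above some growth threshold. This is a minimal Python sketch, assuming each state's daily new cases are in a plain list ordered oldest to newest; the two-week lag and the 20% threshold are hypothetical defaults to tune, not recommendations, and the state labels in the example are made up.

```python
def rising_states(daily_by_state, weeks=2, threshold=1.2):
    """Return (state, growth ratio) pairs for states whose latest 7-day average
    of new cases is more than `threshold` times the 7-day average `weeks` weeks
    earlier, fastest-growing first.

    daily_by_state: dict mapping state label -> list of daily new cases, oldest first.
    weeks and threshold are hypothetical defaults (1.2 means a 20% rise).
    """
    lag = 7 * weeks
    flagged = []
    for state, series in daily_by_state.items():
        if len(series) < lag + 7:
            continue  # not enough history to compare the two windows
        recent = sum(series[-7:]) / 7             # the most recent 7 days
        earlier = sum(series[-7 - lag:-lag]) / 7  # 7-day window ending `lag` days before the latest day
        if earlier == 0:
            continue  # skip to avoid dividing by zero
        ratio = recent / earlier
        if ratio > threshold:
            flagged.append((state, round(ratio, 2)))
    # Fastest-growing states first, so they show up at the top of the list
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Made-up numbers: state "A" triples over two weeks, state "B" stays flat.
sample = {"A": [10] * 14 + [30] * 7, "B": [50] * 21}
print(rising_states(sample))  # [('A', 3.0)]
```

A short list like this could sit on a single summary page, with the detailed per-state graphs only consulted for the states it names.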
The New Graphs
Here are the new graphics I've created. Note that some images do not show absolute numbers for cumulative totals or moving averages, and the regional graphs have no numbers on the Y-axis. That is because I am still having problems calculating the moving averages correctly. I'm also playing around with some visuals to test out what Power BI offers.
Sources of Data:
Mostly The Atlantic’s Covid Tracking Project: https://covidtracking.com/api/v1/states/daily.csv
The first two graphs on the left: Wikipedia https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases
We can see that the Northeast suffered early in the year and then the South surged over the summer. The South surged to a much higher peak than the Northeast, but its daily cases have declined since early August. You can barely see it, but the Midwest is beginning to surge; however, the rise is so small and gradual that you could debate whether a surge is really under way. There's another chart, which I'll show later, that shows the broader surge more clearly.
The daily deaths graph shows that deaths were much higher in the Northeast earlier in the year, when we did not know how to treat those sick with the coronavirus; now we appear to have a much better handle on treatment. The Northeast had the most hospitalizations of any region, but again, that was during the early phase of the virus spread, when our medical community knew very little about the virus. The hospitalization graphs show two peaks: early in the year in the Northeast, and then during the summer in the South, West and Midwest combined.
The testing graphs show that it took a while for testing to ramp up: it looks like five or six months to reach peak testing. Unfortunately, I don't think the medical community believes we are doing enough testing today. As a matter of fact, testing seems to have declined since July/August, just after Trump announced that he would like testing to decrease. I am not sure whether testing dipped because of what Trump wanted or because of resource constraints (supplies running out, too many people requesting tests, not enough skilled personnel, etc.). The decline in testing is worrisome.
The positivity rate looks better, hovering around 5% for the US as a whole; however, when you dig into the individual states, you see variations ranging from close to zero to greater than 15%. (Alabama has been having weird, wickedly high positivity rates: are they catching up on some testing data?) Sometimes these positivity rates look funky to me.
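Daily positivity can swing wildly if a state reports a backlog of results on a single day, which might be part of why some of these numbers look funky. A minimal Python sketch, assuming per-state lists of daily positive results and daily total test results (oldest first): pool the last seven days of positives and tests before dividing, rather than reading one day's percentage, and then flag states above 5%. Positivity here is simply positives divided by tests; states do not all count tests the same way, so treat the result as approximate. The function names and state labels are hypothetical.

```python
def weekly_positivity(positives, tests):
    """Pooled 7-day positivity: total positives / total tests over the last 7 days.
    Returns None when no tests were reported in that window."""
    pos = sum(positives[-7:])
    total = sum(tests[-7:])
    return pos / total if total else None

def high_positivity(data, limit=0.05):
    """data: dict of state label -> (daily positives list, daily tests list).
    Returns (state, rate) pairs above `limit`, highest rate first."""
    rates = {state: weekly_positivity(p, t) for state, (p, t) in data.items()}
    flagged = [(s, r) for s, r in rates.items() if r is not None and r > limit]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Made-up numbers for two hypothetical states.
sample = {"A": ([20] * 7, [100] * 7), "B": ([1] * 7, [100] * 7)}
print(high_positivity(sample))  # [('A', 0.2)]
```

Pooling weights each day by how many tests it had, so a single catch-up day with many tests moves the weekly figure less than it moves that day's own percentage.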
The Previous Graphs
Now, here are my old sets of graphs showing each region's states and their infection counts. These are the graphs I want to revise so that states approaching a critical point pop out at me. I have added tables at the end (bottom?), sorted from maximum to minimum, for cumulative infection rates, maximum daily cases, the latest number of cases and positivity rate.
Source of Data:
Wikipedia https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases
At this point (September 3), North Carolina, South Carolina and West Virginia in the southern region appear to be in the early stages of a surge. Florida and Georgia have declined over the last month or so, but is that a tiny uptick for Florida in the last few days? Overall, the southern region is either winding down or stabilizing. The West seems to be okay too, either declining or stabilizing, although Utah looks like it may be starting to surge. Alaska, Hawaii and Montana bear watching because they only recently stabilized, and I don't know if that “stabilization” will hold.
You can see it is kind of hard to tell which state will surge, and the positivity rates are not helping, since most of them are over the recommended 5%. Again, those positivity rates look funky to me.
In the Midwest, I see the following states as problematic, meaning they are surging: Ohio, Minnesota, maybe Iowa, Nebraska, Kansas, Oklahoma, and North and South Dakota. South Dakota had very few cases before the Sturgis Motorcycle Rally, but you can see that cases have been surging since then, so the rally looks like a super-spreader event. I feel sorry for the residents of Sturgis, because about 60% of them did not want the rally, but the town did not see how it could keep the motorcyclists from coming. South Dakota is definitely surging, and the entire Midwest is becoming problematic.
The Northeast looks to be under control: its overall positivity rate is under 5% and its infection rates appear stable. After the rocky start at the beginning of the year, the residents have mustered the discipline and done what needed to be done to bring down the infection rate.
You can see I'm having trouble telling which state may be in trouble, and these graphs may be as good as it gets. But I'm going to try to build something that pops out at me, rather than my having to scrutinize every graph for every state to discern trends.
The Graphs that Clued Me in to the Midwest Surge
These are the graphs that clearly told me the Midwest story. I could see Midwest infection cases starting to rise sometime in July, and by the end of August the Midwest's infection rate had surpassed the West's, even though California leads the West. As a matter of fact, California has been declining (improving) since mid-August. The graphs showing the individual states in each region clearly depict the Midwest infection spreading since mid-June, whereas the Northeast has remained relatively flat on the chart. The contrast is in how the charts behave: even the West shows improvement, as Arizona clearly declines so that the spread of lines narrows to a thin band with California hovering above the rest. The Midwest chart lines, by contrast, are spread out.
Source of Data:
Wikipedia https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases
And in Closing: Europe vs US
Earlier in the summer, the news was comparing the US with Europe, and the US looked like a sad case. Now, Europe's infection rates are surging after it (slowly) reopened its economy a couple of months ago.
Source of Data:
Wikipedia https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory
I hadn't been paying attention to this dynamic until a few weeks ago, when I was shocked to see that Europe was starting to surge. Right now, it looks like Spain and France are facing a possible second surge. So yes, Europe is also struggling, although not as badly as the US.
Final closing: Other People’s Coronavirus Tracking Using Power BI
I almost forgot: somehow, I caught sight of a Microsoft site that was tracking the coronavirus using Power BI, and right after that, I started to see YouTube videos describing how to create a coronavirus dashboard. My plan is to go through a few of these videos to see what data people are using, how they are tracking the virus, and what kind of analysis they are doing. I'm hoping to learn a few more Power BI tricks and analytical tools.
So much learning and so little time.