|

Update #3

This post is part 3 of my Power BI updates. Post 1 was an update on the state tracking on Power BI and contained a rehash of how I created some measures such as running total (or cumulative) from the daily cases, moving averages for “x” days (in my case, I’m using 7 days moving average), and maximum infection case number. I also talked about how I cleaned up the data and I want to continue on this topic in this post because I did some changes. And finally Post 1 ended up with new state charts showing the new colored bars and the per capita charting. Post 2 contains all of the images of charts for 5/9/2020, including the new states that were re-opening their economy. I want to see if cases show an uptick since re-opening. Right now, I’d say the results are mixed.

In this third and final installment, I will repeat the data clean up and add the changes, show how I imposed colored bars in the chart, and how I developed the per capita measures. In addition, there will be a brief discussion about linking tables because I brought in population figures for per capita measures, which required linking state’s population with the daily infection cases.

Before diving into my post, I will start off again with my data sources, provide a refresher on some calculations of cumulative or running totals and moving average, and then lead into today’s discussion on a new data clean up.


Data Sources

Raw daily data: https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases
Table – Non-repatriated COVID-19 cases in the US by state

Population data: https://worldpopulationreview.com/states/
Table – Table(0)

State abbreviations: https://www.50states.com/abbreviations.htm
I actually copied the data into Excel because the site would not let me pull directly from the website. Then Power BI pulled the excel information.

Stay-at-home orders: https://en.wikipedia.org/wiki/U.S._state_and_local_government_response_to_the_2020_coronavirus_pandemic
Table – Regions that formerly had stay at home orders or advisory


Refresher: DAX calculations for some measures

  • Running total or cumulative (I couldn’t make the quick measure work)
South GA running total in South GA = 
CALCULATE(
    SUM('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    FILTER(
        ALLSELECTED('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]<= MAX('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date])
    )
)
  • Moving average – This version is a 3-day moving average
South GA_Moving_Ave = CALCULATE (
    AVERAGEX ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )', 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    DATESINPERIOD (
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date],
        LASTDATE ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        -(3),
        DAY
    ))

  • Moving average with flexibility to denote how many days (or weeks, months, etc.)
South GA_Moving_Ave = CALCULATE (
    AVERAGEX ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )', 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]),
    DATESINPERIOD (
        'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date],
        LASTDATE ( 'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    ))

  • Maximum # of cases in a day – this is a quick measure rather than a regular measure. This is easier to create. You don’t really need to write the formula but here it is anyway.
South GA max per South GA = 
MAXX(
    KEEPFILTERS(VALUES('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA])),
    CALCULATE(SUM('Non-repatriated COVID-19 cases in the US by state (  vte  )'[South GA]))
)

Data Cleanup – the original plus the new steps

When trying to develop the per capita measures, I realized I could have gone further with the original data cleanup, leading to a very different table structure to enable me to create the per capita measures. I suspect I could have used this method initially when developing the charts but I still need to test this theory; however, I do believe that this different table would have reduced the need to create so many measures that I currently have.

In Post 1, I showed how the table looked when I first pulled the data into the Power Query Editor:

Here’s how it ended when I finished cleaning up the data:

For the per capita measures (a type of measure that eliminates the impact of population differences and becomes apples to apples on a per 100,000 basis), I expanded on that data clean up, but first, I made a copy of that original table. By the way, that table was called “Non-repatriated COVID-19 cases in the US by state (vte)” (the name of the table on the Wiki website). So, I copied this table and named the new table “Per Capita Calculations”.

Upon completion of data cleanup, the new table has the following data structure:


Steps 1 through 7 details the cleanup of the first table; steps 8 through 10 explains what I did to convert the first table into the second table (after duplicating the first table). On the right hand side, the same steps are spelled out in the Advanced Editor in Power Query Editor for those of you who know DAX – yellow highlights are the new additions.


  1. In the first column, filtered out “Date”, “Notes…..”, “States Sources”, and “Total”. Basically anything that is not a date.
  2. Still in the first column, changed the date column type to date.
  3. I saw that some errors popped up in column one due to a row that was text (“Daily Sources…”). I don’t think I could pull it up in step 1 so I removed the error in this step. If you could do it in step 1, you won’t have to remove the error here.
  4. Next, I changed all of the states’ column from text to whole number…I believe it was whole number.
  5. Check for errors – always check for errors. After step 4, there was no errors.
  6. Off to the right of the table, there is an additional date column. I deleted that or removed it.
  7. And then finally, for the remaining columns off to the right, I changed type from text to whole numbers.
  8. I unpivoted the columns. I highlighted all of the state columns and additional summary columns (basically all of the columns but the date column) and did a unpivot on those columns.
  9. I ended up with  3 columns: date, attribute, and values. I wanted to split the attribute column so that I could get at the state abbrev. I used the space as the delimiter.
  10. And finally, I renamed attribute.2 column (the newly split column) into States.

This new table becomes the basis for creating the per capita calculations.

On the right is the language of the same steps from the Advanced Editor in Power Query Editor for those of you who know DAX – yellow highlights are the new additions (steps 8 to 10):

Post continues down below.

The Colored Bars

The red and green colored bars that you see in the next image have to be manually manipulated. To get there, highlight the chart where you want those bars colored differently at a certain date point.


Then go to the Visualization section off to the right of Power BI and click on the paint roller (the icon circled in red shown below in image on left). The magic is going to happen in the “Show all” once you turn it “on”. Individual dates will appear below and you can specify what color you want for a specific date (see image below on right).

I will say, this is a very manual process. At this time, I don't know how to automate this process. But still, this colored bars for specific dates is a nice feature.

let

Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases")),

Data0 = Source{0}[Data],

#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Date", type text}, {"West AK", type text}, {"West AZ", type text}, {"West CA", type text}, {"West CO", type text}, {"West HI", type text}, {"West ID", type text}, {"West MT", type text}, {"West NM", type text}, {"West NV", type text}, {"West OR", type text}, {"West UT", type text}, {"West WA", type text}, {"West WY", type text}, {"Midwest IA", type text}, {"Midwest IL", type text}, {"Midwest IN", type text}, {"Midwest KS", type text}, {"Midwest MI", type text}, {"Midwest MN", type text}, {"Midwest MO", type text}, {"Midwest ND", type text}, {"Midwest NE", type text}, {"Midwest OH", type text}, {"Midwest OK", type text}, {"Midwest SD", type text}, {"Midwest WI", type text}, {"South AL", type text}, {"South AR", type text}, {"South FL", type text}, {"South GA", type text}, {"South KY", type text}, {"South LA", type text}, {"South MS", type text}, {"South NC", type text}, {"South SC", type text}, {"South TN", type text}, {"South TX", type text}, {"South VA", type text}, {"South WV", type text}, {"Northeast CT", type text}, {"Northeast DC", type text}, {"Northeast DE", type text}, {"Northeast MA", type text}, {"Northeast MD", type text}, {"Northeast ME", type text}, {"Northeast NH", type text}, {"Northeast NJ", type text}, {"Northeast NY", type text}, {"Northeast PA", type text}, {"Northeast RI", type text}, {"Northeast VT", type text}, {"Territories GU", type text}, {"Territories MP", type text}, {"Territories PR", type text}, {"Territories VI", type text}, {"Date2", type text}, {"Confirmed New", type text}, {"Confirmed Cml", type text}, {"Deaths New", type text}, {"Deaths Cml", type text}, {"Recovered New", type text}, {"Recovered Cml", type text}}),

#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Date] <> "Date" and [Date] <> "Notes: Numbers shown above as of date of confirmation by state authorities, cases might have occurred before the date of confirmation." and [Date] <> "State Sources" and [Date] <> "Total")),

#"Changed Type1" = Table.TransformColumnTypes(#"Filtered Rows",{{"Date", type date}}),

#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type1", {"Date"}),

#"Changed Type2" = Table.TransformColumnTypes(#"Removed Errors",{{"West AK", Int64.Type}, {"West AZ", Int64.Type}, {"West CA", Int64.Type}, {"West CO", Int64.Type}, {"West HI", Int64.Type}, {"West ID", Int64.Type}, {"West MT", Int64.Type}, {"West NM", Int64.Type}, {"West NV", Int64.Type}, {"West OR", Int64.Type}, {"West UT", Int64.Type}, {"West WA", Int64.Type}, {"West WY", Int64.Type}, {"Midwest IA", Int64.Type}, {"Midwest IL", Int64.Type}, {"Midwest IN", Int64.Type}, {"Midwest KS", Int64.Type}, {"Midwest MI", Int64.Type}, {"Midwest MN", Int64.Type}, {"Midwest MO", Int64.Type}, {"Midwest ND", Int64.Type}, {"Midwest NE", Int64.Type}, {"Midwest OH", Int64.Type}, {"Midwest OK", Int64.Type}, {"Midwest SD", Int64.Type}, {"Midwest WI", Int64.Type}, {"South AL", Int64.Type}, {"South AR", Int64.Type}, {"South FL", Int64.Type}, {"South GA", Int64.Type}, {"South KY", Int64.Type}, {"South LA", Int64.Type}, {"South MS", Int64.Type}, {"South NC", Int64.Type}, {"South SC", Int64.Type}, {"South TN", Int64.Type}, {"South TX", Int64.Type}, {"South VA", Int64.Type}, {"South WV", Int64.Type}, {"Northeast CT", Int64.Type}, {"Northeast DC", Int64.Type}, {"Northeast DE", Int64.Type}, {"Northeast MA", Int64.Type}, {"Northeast MD", Int64.Type}, {"Northeast ME", Int64.Type}, {"Northeast NH", Int64.Type}, {"Northeast NJ", Int64.Type}, {"Northeast NY", Int64.Type}, {"Northeast PA", Int64.Type}, {"Northeast RI", Int64.Type}, {"Northeast VT", Int64.Type}, {"Territories GU", Int64.Type}, {"Territories MP", Int64.Type}, {"Territories PR", Int64.Type}, {"Territories VI", Int64.Type}}),

#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date2"}),

#"Changed Type3" = Table.TransformColumnTypes(#"Removed Columns",{{"Confirmed New", Int64.Type}, {"Confirmed Cml", Int64.Type}, {"Deaths New", Int64.Type}, {"Deaths Cml", Int64.Type}, {"Recovered New", Int64.Type}, {"Recovered Cml", Int64.Type}}),

#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type3", {"Date"}, "Attribute", "Value"),

#"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Columns", "Attribute", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Attribute.1", "Attribute.2"}),

#"Changed Type4" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", type text}}),

#"Renamed Columns" = Table.RenameColumns(#"Changed Type4",{{"Attribute.2", "States"}})

in

#"Renamed Columns"

Per Capita Calculation

The per capital calculation is a calculated measure, but first I brought in a table of states’ populations. I have listed my source at the beginning of this blog. When I pulled it in, it looked like the following:

I made no changes to the table.

I also had to bring in a state abbreviations table because the Population table had the full name of the states but the Per Capita Calculation table had the abbreviated form (that form originated from the website). Bringing in additional tables meant I had to build relationships between the tables. I will talk briefly in the next section about relationships, so keep this fact in the back of your mind.

Once I got all of the tables connected, I then was able to write up a new measure “PerCapita”:

PerCapita = (Sum('Per Capita Calculations'[Value])/Sum('Population'[2020 Pop.]))*100000

This measure is a per capita per 100,000. All I’m doing is dividing some value – in this case, the number of infection cases – by the population to eliminate population effects. Then I multiplied that number by 100,000. You could use 1 million rather than hundred thousand. I’m using 100,00 because that’s the common usage and because not all states have a population of over a million.


Table Relationships

This section has to necessarily be very brief and light on how-to information since I’m not an expert in getting tables to link up properly. First, it’s easier to link up tables when most of the tables have the key listed once rather than multiple times. These kinds of links are commonly called one-to-one relationship. For example, the key that would connect the Population and State abbreviation tables would be the states. Population has the full state listed and State abbreviation has both the abbreviated name and the full name, but for both tables, the states are listed once. In the PerCapita Calculations table, the states will be listed multiple times due to the fact that daily data is being captured.

I start trying to link the tables by using the common keys that could connect the tables which in this case would be the states, either the full name or the abbreviation. I admit that this linking is done by trial and error in a search for one-to-one relationship or one-to-many relationship since I'm not an expert yet. I will usually create a matrix table visual with numbers to check that the linkings result in the way I expect it too – I always have a tab for testing numerical results when creating measures.

So, here are my table relationships - the red boxed area is of relevance to today's post.

Alright, I think I'm going to end here. I was going to do my usual thoughts on our current state of affairs but it is getting late and it's time to move on.

Similar Posts