| |

Power BI Measures for Roll Ups

The topic is going to be a little bit denser today as I will try to discuss creating measures for when you have several layers of roll ups. By roll ups, I mean you have granular detail data such as counties rolling up to state level details and then rolling up to USA level kind of detail. For today’s discussion, I’m using moving average: moving average for USA, moving average for states and moving average for counties.

When creating measures for these kind of roll ups, I find that I have to verify the results using Excel because the results can often turn out to be incorrect. I’m still learning how the whole Power BI works, so I always cross check the results using Excel.

When I first started tracking the coronavirus in order to learn Power BI, I originally pulled data from Wikipedia and then later, in 2021, I found the CDC and then even later the JHU raw data.

For discussion I will skip the Wikipedia data because they just recently updated their data structure and I need to work out the errors that are now popping up.

CDC Data

First the CDC. In the CDC data, I have the state level data that can roll up to the USA level data. I also did something that enabled me to get regional information: I connected a region table (list of states and the regions they fall in) to the CDC’s cases and deaths table to make a relationship.

When I first started developing the moving average measure, I created the following:

Moving_Ave_new_cases_CDC = 
CALCULATE (
    AVERAGEX ('CDC Cases and Deaths','CDC Cases and Deaths'[new_case]),
    DATESINPERIOD (
        'CDC Cases and Deaths'[submission_date],
        LASTDATE ( 'CDC Cases and Deaths'[submission_date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)

The formula is basically calling for an average of new cases (or I call it daily cases) in the last # of days residing in the “Day Variable”. The “Day Variable” contains the number 7, so I’m calculating a moving average over the last 7 days or over the last week.

You could probably create this formula using the New Measure icon. I think I started using that route but then slightly changed the measure to include a variable for number of days.

This formula worked perfectly for the state level, but I couldn’t make it work for rolling up to the USA level. The two graphics below show what I mean, and they pack a lot of information. To see what I’m talking about, the lower left corner of each Graphic 1 and 2 are the graphics to focus on. On the lower left chart for both Graphic 1 and 2, the blue line is the moving average line created by the measure titled “Moving_Ave_new_cases_CDC”. The bottom left graphic 1 is filtered in the Filter Pane at the state level, in this case Texas, whereas the bottom left graphics 2 has no filter, so it defaults to USA level. Note that the moving average line works at the state level in Graphics 1 but fails at the USA level in Graphics 2. The result is just a flat line.

Graphics 1 showing State and Region level daily cases and moving averages, filtering at state level
Graphics 2 showing USA and Region level daily cases and moving averages, no filtering.

So, I had to experiment to find a different formula for creating moving average at the USA level. Here is what I found to work at the USA level:

zUSA_new_cases_Moving_Ave_CDC = 
CALCULATE (
    SUMX('CDC Cases and Deaths','CDC Cases and Deaths'[new_case]),
    DATESINPERIOD (
        'CDC Cases and Deaths'[submission_date],
        LASTDATE ( 'CDC Cases and Deaths'[submission_date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    ))/('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable])

Instead of using an average function AVERAGEX, I used SUMX and divided that sum by the “Day Variable” of 7. It looks like the same kind of calculation, but it is that subtle difference that makes the second measure work for USA level moving average. Now, in Graphics 1 and 2, the upper left-hand graphics uses this second measure and note that the measure works both at the state level and at the USA level, unlike the graphics in lower left-hand corner.

Finally, to just add in a little twist, I find that both measures (“Moving_Ave_new_cases_CDC” and “zUSA_new_cases_Moving_Ave_CDC”) work with regional levels, when I filter one of the fields on the regions. In Graphics 1 and 2, the upper right graphs display the moving averages for the regions, and these graphs are created by dropping the region field into the Legend box to separate out the moving averages for each region.

Confused? Yes, me too. I don’t really know why SUMX and division by 7 works better than AVERAGEX.

JHU Data

Once I cracked the CDC data and got the graphs working, I thought I could apply the same reasoning to the JHU data but I ran into problems because of the 3 tier roll ups. The CDC has states and USA roll ups while JHU has counties, states and USA roll ups. I had trouble handling the moving average for the counties.

I don’t remember the exact steps I took to trying to get the JHU data to work, so I decided to replicate the paths for the CDC data that I took to do this post. So, beginning with the AVERAGEX formula, I get the following measure:

1_Moving_Ave_AVERAGEX = CALCULATE (
    averageX (FILTER('time_series_covid19_confirmed_US','time_series_covid19_confirmed_US'[Date]),'time_series_covid19_confirmed_US'[DailyCases]),
    DATESINPERIOD (
        'time_series_covid19_confirmed_US'[Date],
        LASTDATE ( 'time_series_covid19_confirmed_US'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)

Using the AVERAGEX function worked only on the lowest level of the roll up which in this database would be the counties. The measure fails at the state and USA level as seen in Graphics 3 and 4. In the CDC data, AVERAGEX worked at the state level and not at the USA level.

Then I tried using a measure with SUMX and found that measure worked for all three levels. I don’t recall being able to get at the county level but then, I don’t remember exactly what I did. It’s interesting that SUMX works for all levels of roll ups. The measure’s formula looked like:

2_Moving_Ave_SUMX = CALCULATE (
    sumX (FILTER('time_series_covid19_confirmed_US','time_series_covid19_confirmed_US'[Date]),'time_series_covid19_confirmed_US'[DailyCases]),
    DATESINPERIOD (
        'time_series_covid19_confirmed_US'[Date],
        LASTDATE ( 'time_series_covid19_confirmed_US'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)/'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]

I added a table to double check the numbers. You can see that the moving average measure using AVERAGEX also failed in the table in Graphic 3. The numbers in the table in Graphics 4 looks about right and calculations in Excel confirmed them.

By the way, in each of the graphics below (3 through 6), some filtering was done in the Filter Pane. I filtered the state field for California and filtered the county field for Los Angeles.

Graphics 3 showing daily cases and moving averages for all 3 levels: county, state, and USA, using AVERAGEX
Graphics 4 showing daily cases and moving averages for all 3 levels: county, state, and USA, using SUMX

The measures for the CDC versus for JHU are almost the same except the JHU measures utilized the FILTER. For my own satisfaction, I needed to test the JHU measures without the FILTER component and then try the graphics again.

I get the same results. I thought the FILTER aspect might make a difference, but it does not.

The revised measures for the JHU became:

3_Moving_Ave_AVERAGEX = CALCULATE (
    averageX ('time_series_covid19_confirmed_US','time_series_covid19_confirmed_US'[DailyCases]),
    DATESINPERIOD (
        'time_series_covid19_confirmed_US'[Date],
        LASTDATE ( 'time_series_covid19_confirmed_US'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)

4_Moving_Ave_SUMX = CALCULATE (
    sumX ('time_series_covid19_confirmed_US','time_series_covid19_confirmed_US'[DailyCases]),
    DATESINPERIOD (
        'time_series_covid19_confirmed_US'[Date],
        LASTDATE ( 'time_series_covid19_confirmed_US'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)/'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]


And Graphics 5 and 6 appear no different from Graphics 3 and 4:

Graphics 5 showing daily cases and moving averages for all 3 levels: county, state, and USA, using AVERAGEX but without FILTER
Graphics 6 showing daily cases and moving averages for all 3 levels: county, state, and USA, using SUMX but without FILTER

Conclusion

At this time the only conclusion I will make – other than it’s confusing – SUMX works at all levels. The FILTER component in the measure appears to make no difference – at least in how I set up the formulas.

I will have to revisit this again and see if I encounter anything new. It’s interesting that SUMX will work for all levels of roll ups, but AVERAGEX won’t. I previously thought that SUMX worked for one level of roll up and AVERAGEX worked for another level. But I was wrong – I was confused.

For my next post, I want to write down the problems I had using the filter “graphic” in trying to get county data into a bar and line graph.

Similar Posts