|

Creating Measures in Power BI

A couple of posts ago, when I was discussing the three ways of creating calculations (measures, new columns in Power Query, and new columns in the charting/visualization view), there was at one point in the post an image of 2 moving average measures and I mentioned that I would do a post about why I have 2 of them, possibly the next day. Well, I forgot about that. This post will address why I might have multiple measures calculating seemingly the same thing.

Now keep in mind that I am still in the beginning stage of learning how to work with Power BI and don’t have a lot of years of experience under my belt so there may be a way to develop one measure rather than multiple measures to calculate a certain metric, say moving average, for example, or the maximum number.

One formula should do the trick, right?

I thought so too, but, alas, I found that the calculations did not always work.

The issue has to do with how the data is structured: if you have multiple rollups, you are going to have to figure out how to structure your measures. As an example, the Covid Tracking Project has data for each state tracked on a day-by-day basis for cases, hospitalizations, deaths and a whole host of other things we can track. All of the states combined roll up to the overarching United States level, so we have a US level day-by-day data for cases, hospitalizations, deaths, etc. Let’s call that one level of rollups.

Single Level Roll Ups

Here’s the basic measure for cases – the first one being created:

posInc_Moving_Ave_Covid_tracking = 
CALCULATE (
    AVERAGEX (FILTER('Covid Tracking Historical','Covid Tracking Historical'[Attribute]="positiveIncrease"),'Covid Tracking Historical'[Value]),
    DATESINPERIOD (
        'Covid Tracking Historical'[Date],
        LASTDATE ( 'Covid Tracking Historical'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)

In the data from the Covid Tracking Project, there is a field “positiveIncrease” that represents a single day’s case count. Because of the way the data came in from the Covid Tracking Project, I had to “unpivot” parts of the data and the different types of measures (cases, deaths, hospitalization, etc) became a column called “Attributes”. The field “Value” would hold the values for each of those attributes.

The moving average measure calculates the moving average by taking the average (AVERAGEX) of “positiveIncrease” over so many days in a certain period range. But first, we have to filter (FILTER) the Covid Tracking data’s attributes for “positiveIncrease” (representing daily case infections) and discard the other measures (deaths, hospitalizations, etc.) The DATESINPERIOD dictates the number of days to take the averaging. The variable ‘Non-repatriated COVID-19 cases in the US by state ( vte )'[Day Variable] is a variable previously created for another chart and currently holds a constant 7 for 7 days and actually resides in another table. I could have created that variable in the Covid Tracking Project data table but I just decided to re-use this variable.

This measure “posInc_Moving_Ave_Covid_tracking” works beautifully at the state level but fails at the U.S. level. To get at the moving average at the U.S. level I had to do something slightly different:

posInc_US_Moving_Ave_Covid_tracking = 
CALCULATE (
    SUMX (FILTER('Covid Tracking Historical','Covid Tracking Historical'[Attribute]="positiveIncrease"),'Covid Tracking Historical'[Value]),
    DATESINPERIOD (
        'Covid Tracking Historical'[Date],
        LASTDATE ( 'Covid Tracking Historical'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)/7

This is the version that worked for me: I took out the averaging function and used SUMX to sum up the daily cases variable “positiveIncrease” over a period of 7 days and then I divided that result by 7 (I could have used the ‘Non-repatriated COVID-19 cases in the US by state ( vte )'[Day Variable] variable).

Two Levels of Roll Ups

What if I have two levels of roll ups? The John Hopkins University (JHU) data falls under that scenario and here’s what I did to make that scenario work. The JHU repository had data down to the county level, so I can pull county, state and overall U.S. numbers.

The following images for California illustrate the use of 3 different moving average measures: US level, state level, and county level. The left-hand side shows a line graph of the daily infection count. As you can see, using daily numbers creates a jagged line effect whereas on the right-hand side, the line smooths out if we use a 7-day moving (or rolling) average.

Image 1
Image 2

I’m using John Hopkins University data.

Here’s the county level measure.

CountyLA_Moving_Ave_JHU = 
CALCULATE (
    AVERAGEX (FILTER('time_series_covid19_confirmed_US','time_series_covid19_confirmed_US'[Combined_Key]="Los Angeles, California, US"),'time_series_covid19_confirmed_US'[DailyCases]),
    DATESINPERIOD (
        'time_series_covid19_confirmed_US'[Date],
        LASTDATE ( 'time_series_covid19_confirmed_US'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)

I don’t know why, but a plain Jane moving average variable did not work; consequently, I found I had to specify the county, in this case Los Angeles.

Moving on to the state level measure:

StateCA_Moving_Ave_JHU = 
CALCULATE (
    sumX (FILTER('time_series_covid19_confirmed_US','time_series_covid19_confirmed_US'[Province_State]="California"),'time_series_covid19_confirmed_US'[DailyCases]),
    DATESINPERIOD (
        'time_series_covid19_confirmed_US'[Date],
        LASTDATE ( 'time_series_covid19_confirmed_US'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)/'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]

Here you see me filtering for the state of California and using the [Day Variable] rather than a constant of 7.

And finally, here’s a variable for the U.S. as a whole:

US_Moving_Ave_JHU = 
CALCULATE (
    sumX (FILTER('time_series_covid19_confirmed_US','time_series_covid19_confirmed_US'[Date]),'time_series_covid19_confirmed_US'[DailyCases]),
    DATESINPERIOD (
        'time_series_covid19_confirmed_US'[Date],
        LASTDATE ( 'time_series_covid19_confirmed_US'[Date]),
        -('Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]),
        DAY
    )
)/'Non-repatriated COVID-19 cases in the US by state (  vte  )'[Day Variable]

I replaced the filtering by state with [Date] and it mysteriously worked. I crosschecked the results against Excel because that was weird. That measure doesn’t make a whole lot of logical sense to me and requires further investigation but for now it works for me.

General advice for creating measures

Needless to say, when developing your measures, you need to check the results because they may not be reporting the results you need. You will have to play around with the formula, starting with the simplest which is AVERAGEX to see what the measure gives you. Then proceed on to SUMX and various FILTERing to see if you can arrive at what the measure should be showing you.

That’s my advice for creating measures at this time. It could change as I learn more.

Similar Posts