| |

When Your Data Source Breaks – what to do

On Friday, Jan 1st, as I was doing my nightly ritual – updating my files with the latest coronavirus data – I got a weird image.

Graphic 1: January 1, 2021 at beginning of graph rather than at end after December 31, 2020.

That long blue vertical line at the front of the graph, around January 1st? That is supposed to be January 1, 2021. Originally, the dates from Wikipedia looked like this – dates with no year:

Graphic 2: Original date format from Wikipedia

What the Power Bi did was convert all of the dates into the year 2021 (March 21, 2020 became March 21, 2021; March 24, 2020 turned into March 24, 2021; and so on) – I suspect using the January 1st data as the blueprint. I stayed up until almost midnight on Friday trying to get a working version of the graphs but I eventually stopped with a plan to continue on Saturday. The big showstopper was having to update all of the graphics with the new “date” variable I had created to correct the problem in the Power BI. I thought Saturday would be the day spent on fixing things.

So, the first lesson when dealing with data from another source is:

Watch out for when the year rolls over. Or the dates may bite you.

Most larger companies or companies populated with engineers will have the foresight and the knowledge to set up the date data properly, but I find smaller ones may not handle dates very well. I did not have problems with data from the Covid Tracking Project or from John Hopkins University; it was the Wikipedia page that threw off the Power BI. But that’s okay because Wikipedia is a volunteer effort so I don’t expect them to be data managers or experts in such things. When I first started pulling data from them, I did wonder what would happen when the year switched over and kind of suspected I would have problems. And that came to pass.

So, Friday I was on the look out for issues regarding the new year and the expectation of date failures didn’t fail me. But what I did not expect was Wikipedia correcting the issue the very next day. That surprised me. They actually were on the ball or someone told them.

However, in addition to correcting the date issue, Wikipedia also proceeded to change some of the column headings. That was a surprise and caused quite a bit of problems and a bit of time trying to adjust to the new column headers.

And this leads to the second lesson when dealing with data from another source:

Develop a strategy to handle efficiently column heading changes. Or you may spend hours fixing your visuals.

Here is where I progressed in fits and starts, starting on Saturday when I learned that Wikipedia changed some column headings as well as fix the date. Some things I tried caused the program to run excessively long time, making me wonder if I broke the file.

Rather than illuminating all of the different approaches I tried, I will just strip down to what I finally figured out would be the most efficient approach for adjusting to column heading changes from the data source.

First, let’s look at the Query Applied Steps and the code or formula for a particular step – I’m making Graphic 4 larger to make the formula line more legible since it’s Microsoft’s “Snip and Sketch”:

Graphic 3: Query Applied Steps – “Changed Type”
Graphic 4: The code or formula for “Changed Type” Applied Step

The “Changed Type” step lists out all of the column headings that will come in through data refresh so if the column heading changes, problems will pop up here. Originally, the highlighted column headings were “Confirmed New”, “Confirmed Cml”, “Deaths New”, “Deaths Cml”, etc. I had to change the name of the column headings to “Confirmed Daily”, “Confirmed Total”, “Deaths Daily”, “Deaths Total”, etc., like you see in the second image above, because those were the new column headings as of January 2, 2021.

This change is what you will always have to do (at least as far as I’m aware of) whenever your data source makes a change in the column headings.

But you can’t stop there. There are other lines in the Query Applied Steps that may utilize those discontinued column headings, so those code lines will have to be revised as they are in error too. Once you have identified all of those code lines to be revised, you then might have to update the charts using those newly renamed fields.

So, if you have a lot of charts, you will have a lot of revising to do. You could spend quite some time going through all of your charts and maybe even your measures, changing everything to match the new column headings/field.

Here’s where strategy comes in: Make the least amount of changes/revisions by focusing on the reducing changes in the power query side.

What I did was instead of re-doing all of those lines of codes in the Applied Steps, I just added a step that renamed the new column headings back to the old column headings as Graphic 6 shows (again, image enlarged to make it easier to read).

Graphic 5: Query Applied Steps – Renamed Columns
Graphic 6: The code or formula for the renamed columns

Such a simple step. Just doing that alone cuts out a lot of work. The program might run a while but it will be a lot better than trying to redo all of the other applied steps code and all of the charts. After the first refresh, the hang-up in the program goes away.

Similar Posts