Linking Tables in Power BI: One Work Around When Your Table Relationships Do Not Work

Sometimes I try to graph a relationship between two sets of data through the use of table relationships but the results often fail. An example would be connecting a customer table with addresses and another table with customer purchases and I want to see if there is anything interesting about the geographic nature of customer purchases. The identifying feature that connects the two tables is the customer id number. I connect the two tables through table relationships. The customer table with addresses generally has only one entry for each customer (although not always if the customer moves) and the purchasing table will have many entries for a customer. This type of connection is a one-to-many relationship and they generally should work…although not always for me.

In my latest case though, I have been trying to connect the corona data from Wikipedia, the John Hopkins University (JHU) and the CDC so I could chart their daily cases on a line graph to see whether the data from different sources are actually trending very similarly or if the data sources are very different. I wanted to make sure the data were very similar so that if one data source temporarily goes down, I can trust the others because they track similarly. The data sources won’t be exactly the same due to maybe differences in collection process during the early days of the pandemic, or differences in how state corrections are updated, or even differences in terminology definitions. I know that at a cumulative level, Wikipedia is always less than the JHU and the CDC while JHU and the CDC are very close.

(Image 1 below has been enlarged for easier reading.)

Image 1: Cumulative Totals for Wikipedia, CDC and JHU

I was curious how well they tracked together; however, I was having problems trying to get the numbers line up correctly for each date. The table relationship was not working very well and that was probably because each of the data source had very different table structures (I’m not 100% sure but for right now, that is my reasoning). Also, the table relationships were a many-to-many relationship which can lead to weird results.

Image 2: CDC Data Source

Image 3: JHU Data Source

Image 4: Wikipedia Data Source

The images are hard to read but I have added links to the data sources at the end of this post so you can see them online. You will see that each data source handles the layout of the data very differently.

What I did as a workaround was to make copies of the data tables (in case I mess up very badly) and do further transformations to make them more alike. They don’t have to be perfectly alike but enough alike that I could append the tables together. In effect, I decided to make one big table with all of the data combined instead of doing a table relationship (mainly to get around the many-to-many issues); thus, I added a new column describing which source a data came from.

Image 5: New Realigned CDC Data

Image 6: New Realigned JHU Data

Image 7: New Realigned Wikipedia Data

Notice that the new CDC table has more columns than the others but that doesn’t matter. As long as there are similar columns and not too many other columns to confuse Power BI, these tables will append together. I used append rather than merge because I am not adding new columns other than a “Source” column.

I arrived at the new tables by basically deleting additional columns such as states and such, maybe moving around the date column, and adding a new column called “Source”. Then, starting from the Wikipedia query as my reasoning was I would get just the number of columns the Wikipedia query ended up with, I appended the data for CDC and JHU. Now, if I wanted to look at trending by states, I would have to re-do the tables to include the states.

Here’s the new table with all of the data combined into one table. Interestingly enough, this new table added the columns from the CDC anyway.

Image 8: Combined Table

Now this method does not really resolve the table relationship issue; I just found a completely different work around. The other issue is if I wanted to update the table with new data: it takes roughly 45 minutes.

But I do get these lovely graphics comparing CDC to JHU to Wiki, both in moving averages and daily case counts.

Image 9: Daily Cases Comparison of CDC, JHU and Wikipedia

Image 10: Comparison of Cases Moving Average for CDC, JHU and Wikipedia

Similar Posts