Home / Analysis / Experimenting with Heat Maps

Experimenting with Heat Maps

Research data - attempting to do analysis by using a form of heat map

I tried to do this post this morning but my machine wanted to do something else: either Apple was trying to push something or my security software wanted to do some updates. I can't tell which was preventing me from doing anything, either going out to the Internet or shutting off my machine. So I let the machine spend a couple of hours doing what it wanted to do and I initiated my security software to do some updates, just to be on the safe side.

And now it's evening and I'm tired.

Anyway, in today's post I want to talk about something I've been trying to do the last 2 or 3 weeks (starting from before Harvey hurricane came in and then about a week after the Irma hurricane). One of the thing I noticed this year was it seemed our managers were having more problems meeting their anticipated budget metrics. One of the thing we have on our Monday morning reporting system was some kind of color system signaling when certain budget metrics are not being met. It just seemed like more and more of our units were having problems. The problem with my "impressions" of the color system is that it can fool me: either I was not as aware of what was going on last year or my "impressions" were exaggerating the magnitude of the problems.

So I decided to try testing my impressions and see if the data supported the "impressions".

What actually prompted this exercise was a picture I saw on a book, giving me an idea of using a heat map to see if the resulting visuals indicated 2017 worsening relative to 2016 and 2015.

By the way, heat maps are new for me so I'm experimenting here.

The image below on the left is the first attempt. (These graphics are purposely small because I wanted the focus to be on the holistic effect, not the identifying details.) I first created a pivot table listing each unit on the left and the monthly date on the top row. The dates in the data are actually the weekly dates but in the pivot table, you can group dates into months or month-years. I chose month-years. The numbers in the body of the pivot table were variances - actuals minus budget metric. Then the cells and fonts in the body of the table were colored using conditional formatting, using the formula of actuals minus budget to determine the colors. If the actuals were beating the budget, then the cells would be colored some shade of green, but if the actuals were not meeting the budget, the cells would be colored with other colors.

After applying the conditional formatting, the result didn't really say that 2017 was worse than 2016 - at least to me it didn't.

Figure a: original color scheme

Figure b: new color scheme

Lesson 1: Colors Can Direct Your Impressions

After spending a week thinking about the results, I decided that the colors yellow and oranges were making me think those cells were positive rather than negative, so I changed the color scheme. Instead of yellow/orange/red gradations for negative numbers, I decided to stick with gradations of red. And for the green, I decided to try to have more contrast in the gradations because I thought some of the gradations were not differentiated very well and became a sea of light green. Figure b on the right shows the new color schema and is probably closer to reality. Gradations of red meant not meeting the budget, with the darker red being more negative. Gradations of green meant doing better than the budget, with darker green being more positive.

In the new version, 2016 appears to be worse than 2017, so the situation appears to be improving.

Now keep in mind, we're talking about variance against the budget, not year on year changes. It could be that year on year, we are improving or that the actual levels are really close to the tippy top of possibility and budget levels are in the nearly impossible level. Some sites are really in the high 90% level and really would be pressed to reach perfection.

Next, I went on to see if there was anything interesting if I did the same thing but by geographic location (city) or by supervisor.

Figure c: Variance by geography

Figure d: Variance by supervisors

Lesson 2: Sometimes you need to average your data, not sum them, in pivot tables

When you start grouping your data and you sum, you might get an answer you weren't expecting, so check to see if averaging or getting the maximum or minimum works better. So when I went from weekly data to monthly, I had to use the average. This problem became bigger when I started to look at the data from a geographic or supervisor standpoint.


Lesson 3: In pivot tables, if you are re-using a range of data for another pivot table, you will need to disconnect the two pivot tables.

To create another pivot table showing results by geography, I re-used the same set of data but chose geography rather than site to run down the left had side of the pivot table. The problem in doing that was that the first pivot table would then show geography also. The two pivot tables were connected. To disconnect the two tables, I tried making the data range to be slightly different for the second pivot table by adding one extra empty row and that worked.

Googling gave me the same approach to the problem.


Lesson 4: I suspect most of the time there is nothing interesting in the data. Maybe the interesting stuff has been "arbitraged" out.

Or I'm terrible at this. I do believe the interesting stuff is probably in the data not sitting in computers, or what is called "small data" by Martin Lindstrom. It's going to be people behavior that hasn't been captured yet.

By the way, I did look at the timeline against the unemployment rate and the oil price to see if I could visually see any relationship. At this time, I would say no.


Leave a Reply

Your email address will not be published. Required fields are marked *