Everybody talks about big data and analytics, (my understanding is that they are related but not the same thing) but we have probably been doing some form of analytics all along. Analytics is just a snazzy term for analysis which has always been around since man has a brain. Big data is a term that came about due to the proliferation of all kinds of data, not just numbers but also text, pictures, audios and who knows what else. The ability to store, retrieve and use those data is what is driving big data, but we’ve always had some kind of analytics.
What kind of analysis? It can be anything, although whether they are useful is another thing. Probably only a small subset of data or analysis is truly useful, despite all of the breathless talk about the data available.
As an example of the kinds of analysis…this past week I’ve been doing the last of the testing of the budget file which includes some interesting conundrums my boss posed to me. To me, those conundrums were analytical in nature. They were not a simple add and compute problems but more of a judging, analytical kind.
The one I’ll talk about in this post concerned the concept of contract gains/loss. This concept is basically what you could have sold (the market value) minus what you actually sold for (contract value). For the budget, January had to be input, or basically forecasted, before the rest of the months would be automatically filled in. So January was the trick because everything else was calculated based off of the initial input of January gain/loss; however, this initial input required forecasting a couple of months beforehand and everybody hated doing that. My boss said the supervisors generally did pretty good but still…
The only thing I could think of right off the bat was to use historical data but I really didn’t have that, at least not in the beginning. After working with the file and doing testing, I found that I did have data for 2013, which was better than nothing. (When my boss gets back from vacation, I will see if I can get more current data.)
From there, I had to decide on an easy way to incorporate the historical data in a fashion the supervisors could use. At first I thought a chart might help but then I decided a hard number might be more useful because the supervisors can actually use that number in the forecast for January. In the end, I decided on a moving average calculation, where the supervisors could play with how many months to do moving averages and then see the moving average gain/loss results throughout the year, with the suggested gain/loss being the last moving average for the year. The supervisors don’t have to use that number but at least they can see historical data and the calculated moving averages to help them make an initial guess.
So in this instance, I had a problem of how to help the supervisors arrive at the initial guess for January’s gain/loss. As an analysis, I provided moving averages of historical data to show them trends, if any, and provided a suggestion. In addition, I gave them flexibility to play around with how many months to do moving averages.
We’ve always have had some kind of analytics. It has just been given a sexy spin in the last few years.
[divider]Excel formulas to develop moving averages[/divider]
There is a function in Excel for moving averages but it gives you a chart of moving averages, so if you wanted the numbers, you have to develop the formulas. That is pretty simple: just sum up over the number of months you want and then divide that by the number of months. So if you wanted moving averages over four months, you would sum up January through April and divide by 4. That would provide you the average for April. May would be calculated as the sum of February through May divided by 4. And so on through the year. Please note, January, February and March would have no moving averages when the number of months is 4.
But what if you wanted the number of months to vary for experimental purposes? Ahh, then the formulas are not quite so simple. But this is where the fun begins!
First I tackled the issue of denoting some months without moving averages: so if the number of months is 3, then January and February need to be blank. I started out by setting up a row with counters 1 through 12. Let’s say this is on row 22, starting at H22. Also, let’s say the input cell for number of months is cell F26. The historical data for gain/loss is on row 24, starting in H24 for January and ending in S24 for December. Then the moving average row would contain a formula like: =If(H22<$F$26,””,other formula). So if the number of months is 6, then the first 5 months (1, 2, 3, 4, 5) would be blank because the counter on row 22 would be less than the number of months in F26.
The next problem was to actually sum up the correct months and then divide by the number of months. The way I did this was to first get a sum total up to the month in consideration. So if I’m trying to figure out the moving average for April with a given number of months of 3, I will first add up January through April. But I really want the summation of February through April so I figured out a way to subtract out January. Since April is month 4 and I only want to include 3 months in the summation, I want only months 2, 3, and 4. Another way to look at it is anything equal to or less than 4-3 (April’s count minus the number of months in F26 – in this example 3) will be excluded from the summation. As another example, a three month moving average for October will include months August, September and October (months 8, 9, 10) which is the same as saying anything less than or equal to 10-3 = 7 will be excluded from the summation of January through October.
So the final formula looks like: =IF(H22<$F$26,””,(SUM($H$24:H24)-SUMIF($H$22:H22,”<=”&H22-$F$26,$H$24:H24))/$F$26) This formula would be in the January column. The February formula would look like: =IF(I22<$F$26,””,(SUM($H$24:I24)-SUMIF($H$22:I22,”<=”&I22-$F$26,$H$24:I24))/$F$26).