Excel Trick Part 2: Converting certain downloads into flat files
(This is terribly long and technical. While I tried to proofread this, parts of it may come off garbled. Writing technical stuff is actually difficult.)
Earlier in the week I wrote a post that described some of the formulas that were created in order to convert a downloaded information into a flat file suitable for pivot tables. This post is more of the same and therefore very technical. Hopefully, I’ll wrap it up today.
To kind of set up the scene, there is a report tab containing the downloaded report and a flat file tab where I’m trying to convert the report into a flat file.
First, let’s take the months. The month header in the report tab runs across the top and was actually the first thing I tackled. The GL/site formulas came later. For the months, I used the offset formula because I could picture counting off positions from the January month. I could have done the same for the GL/sites columns but I thought the formula would get too complex.
The offset formula is structured as thus: OFFSET(a, r1, c1, r2, c2)
– where A is the home position that you want to count off of, in this case the cell holding January;
– r1 is the start row of the range of cells (or a single cell) you are considering and is counted from the home position; here we’re looking at a single cell at a time containing the month we want and it will be in row 0 (0 rows away from the January cell, if it was 1 that it would mean 1 cell down from the January cell);
-c1 is the start column of the range of cells (or a single cell) you are considering; here, it will be a single cell relating to a particular month and it will be columns from 0 to 11 as we move from month to month;
– r2 basically says how many rows there are in the range; in this instance 1 for a single cell;
– c2 says how many columns there are in the range; in this instance 1.
Frankly, you have to play with OFFSET to really understand how the formula works because you can work it using a range or work it as a single cell. For the problem of flat files, I used it as a single cell but I have done some spreadsheets where I used a range. For the problem at hand, let’s pretend the January field is located in cell E9, so the first part of the formula will be OFFSET($E$9, 0, c1, 1, 1).
The zero in the second part of the formula relates to how many rows away from the row January is sitting on. Since all of the months are on row 9, we will not be moving from that row; hence the zero in the second part.
Now we need to work on what column we’re on – c1. Remember the counting system set up in column A in the last post? We will use this to count the cycles.
To refresh your memory, column A of the flat file tab was used to create a counting system. In the report tab, I also included a cell $B$1 that counted the number of rows of data in the report, using function COUNTA.
So if you have 30 rows of data in your report (to continue the example started in the last post), your 31st item will be the first data for February and the data will be in a new column, starting the next cycle. By dividing A column in the flat file tab by the count of rows in the report tab, you can get how many times you are cycling through the items. So using the 31st item again, you would be looking at 31 divided by 30 which would be 1.0333. For the 65th item (we would be pulling data from the March column), 65/30 makes 2.1667. Column 0 is January, 1 is February, 2 is March and so on. To remove the decimal part, you will need to use the INT function. So the OFFSET formula becomes OFFSET($E$9, 0, INT(A1/COUNTA), 1, 1) where A1 refers to the column in the flat file tab where I’m counting off the rows and COUNTA is the cell in the report tab that counted the number of items in the report. A1 does not remain static; it moves down the column.
However, there is one slight problem: at what I call the inflection point, the INT function fails so I have to tweak it. The inflection point is at the point of perfect multiples of the number of rows in the report. So if you have 60 rows of data, the inflection points would be 60, 120, 180 and so on. I modify the formula using the IF statement and the MOD function to handle the inflection problem so the formula becomes OFFSET($E$9, 0, IF(MOD(A1,COUNTA)=0, INT(A1/COUNTA)-1, INT(A1/COUNTA)-1), 1, 1).
The 1’s for r2 and c2 simply reflects how large the range is. Since we’re looking at a single cell for a month, r2 = 1 and c2 = 1.
The final piece is pulling in the dollar values. Okay, I’ll try to keep this short and simple. I’m basically doing something similar to VLOOKUP but am using INDEX and MATCH. The MATCH function tells you where in the range an item exists. So in the report tab, March will be the 3rd item listed in the January through December range. For the rows in the report, there is a counting column so the 15th item will be on row 15 of the data range.
In the flat file tab, I have the following columns: Counting column A running from 1 to whatever, GL column, site column, month column, value column and off to the right, another counting column but going from 1 to the number of rows in the report tab (example, 30) and then starting over again at 1, repeating the count again.
To pull in the value for each row in the flat file tab, I will look up the month on that row and the count in the last column off to the right. I will take that month and find the position of that month in the report tab, using MATCH. I will also take the count and find the position in the report tab, again using MATCH. The formulas are 1) MATCH($Dx,’Report tab’!$E$9:$P$9,0) for the month where x means the current row and D is the month column in the flat file tab and 2) MATCH($Hx,’Report tab’!$A$11:$A$41,0) for the count position where H is the counting column in the flat file, and x is the current row. Again, in this example the report has 30 rows of data.
I won’t go into a lot of explanations into the formula for the counting column that is off to the right (column H) because I already went over it in a roundabout fashion in the first post but the formula will be:
IF($A5-INT($A5/’Report tab’!$B$1)*’Report tab’!$B$1=0,’Report tab’!$B$1,$A5-INT($A5/’Report tab’!$B$1)*’Report tab’!$B$1) where $B$1 is the cell counting the number of rows of data in the report using COUNTA function.
Finally, to pull all of this together to get the value, I will use the INDEX function. It will be in the form of INDEX(range of data in report, row, column). The specifics of the formula will be listed below.
Needless to say, if you have a lot of data, then you will have a lot of formulas which will slow down your file. I’m finding that is what is happening so the next thing I did was to write up a macro which performs essentially a lot of copying but much faster than I can ever do. That will probably be my next post.
[divider]Formulas Recap[/divider]
Second counting column in flat file tab, one repeats from 1 to the number of rows in the report tab: IF($A5-INT($A5/’Report tab’!$B$1)*’Report tab’!$B$1=0,’Report tab’!$B$1,$A5-INT($A5/’Report tab’!$B$1)*’Report tab’!$B$1). $B$1 is the cell in the report tab that contains the COUNTA to provide the number of rows in the report. This one resides in column H.
Month column in the flat file tab: OFFSET($E$9, 0, IF(MOD($A5,’Report tab’!$B$1)=0, INT($A5/’Report tab’!$B$1)-1, INT($A5/’Report tab’!$B$1)-1), 1, 1). $E$9 is the cell where January field header resides. $A5 is the start of the counting column in the flat file.
Value column: INDEX(‘Report tab’!$E$11:$P$41,MATCH($H5,’Report tab’!$A$11:$A$41,0),MATCH($E5,’Report tab’!$E$9:$P$9,0)). In the report tab, the months headers are found in E9 to P9, the counting column found in A11 to A41 (30 rows of data) and the dollar data in E11 to P41. $H5 is the second counting column in the flat file. See the first formula.
You must be logged in to post a comment.