Part 2 of Using Array Formulas
Before progressing on to the other puzzles in the HR Compliance reporting that I started in the last post, I wanted to provide another example of using arrays as a searching tool when VLOOKUPs or INDEX/MATCH do not work. The more times you use this kind of array formulas, the easier it is to understand. Arrays are not something that are commonly used, at least in places where I've worked.
Sometimes you are faced with multiple spelling variations of a word or phrase such as the Wizard of Oz, wizard of oz, wiz of oz, woz, wwiz off oz, or whatever permutations people come up. Or people give responses to a question and some responses could be clustered as a category: moving home, relocating, transferring, relocation, move, etc. You can't do a simple VLOOKUP because VLOOKUP requires a specificity of spelling and going through that list of responses and categorizing or correcting spellings would take too long. I've hit upon a workaround that works.
The above file shows two tabs: "Data" and "Dictionary Table". In "Data", column A is the downloaded data (part of a larger set) and column B provides a marketing source category. Column B has an array formula that pulls in a marketing source category from the "Dictionary Table" tab based upon the marketing source information in column A.
Create Your Categories
First, the "Dictionary Table" has to be created. You take a list of word or words and sort them in alphabetical order (in this case, column A in "Data"). Then you do an advanced Data filter when you pull in unique records. Your criteria will be a blank cell and you need to make sure you mark the "Unique records only" box as checked.
Once you have your list of unique words or phrases, you will have to go through the task of looking over them and deciding on your categories. You could create a macro to go through the list and count the words/categories so you can see what would be a good choice set of words, but that is for another post. In this one, I'm going to focus on the simpler activities: you eyeball the list and decide upon your categories. For this example, in "Dictionary Table", column B is my list of categories. Columns E through AK is my assignment of the words or phrases in the data that would go against the category. I had to either copy or type in the phrases. This is a manual work but it beats the other alternative of keying in your category to the list of data in "Data" tab.
Key Element in "Dictionary Table"
The key in "Dictionary Table" to making this search work is column C which is simply a formula that catenates all of the columns of words or phrases together into one: =E3&" "&F3&" "&G3&" "&H3&" "&I3&" "&J3&" "&K3&" "&L3&" "&M3&" "&N3&" "&O3&" "&P3&" "&Q3. This is a single "phrase" that contains all permutations of spellings or phrasing that you will search against. This will become clearer (hopefully) as we start to talk about the array formula in "Data" tab.
The Array Formula
These kind of array formulas will be long so first keep in mind this handy trick: in the formula bar (where you can see the formulas just beneath the menu system), you can highlight parts of the formula and hit F9 to see what the result is for that part of the formula. This little trick can help you debug the formula. But when you are finished looking at the results, do not hit - just escape. Otherwise, you will lose the formula and bake in the results into the formula.
The other thing to know, before we delve into the parts of the array formula, is those brackets { } are not created by you keying them in. They are created when you do Ctrl+Alt+Enter.
The full formula in cell B2 in "Data" tab is:
IF(MAX(IF(ISERROR(FIND(A2,'Dictionary Table'!$C$3:$C$35)),-1,1)*(ROW('Dictionary Table'!$C$3:$C$35)-ROW('Dictionary Table'!$C$3)+1))<0,"Other",INDEX('Dictionary Table'!$B$3:$B$35,MAX(IF(ISERROR(FIND(A2,'Dictionary Table'!$C$3:$C$35)),-1,1)*(ROW('Dictionary Table'!$C$3:$C$35)-ROW('Dictionary Table'!$C$3)+1))))
Yes, this is long but the IF part contains 3 pieces:
- MAX(IF(ISERROR(FIND(A2,'Dictionary Table'!$C$3:$C$35)),-1,1)*(ROW('Dictionary Table'!$C$3:$C$35)-ROW('Dictionary Table'!$C$3)+1))<0 - if the result of this long formula is less than zero, then do #2
- "Other" - enter into the cell the word "Other" (instead of choosing one of the categories in the "Dictionary Table") if we can't find the phrase in the "Dictionary Table".
- INDEX('Dictionary Table'!$B$3:$B$35,MAX(IF(ISERROR(FIND(A2,'Dictionary Table'!$C$3:$C$35)),-1,1)*(ROW('Dictionary Table'!$C$3:$C$35)-ROW('Dictionary Table'!$C$3)+1))) - this one will be broken down further but this is the part that will pull in the category from "Dictionary Table".
ROW Portion of the Formula
Starting from the part with the ROW: (ROW('Dictionary Table'!$C$3:$C$35)-ROW('Dictionary Table'!$C$3)+1). If you were to highlight that part of the formula in the formula bar and press F9, you will get an array of numbers going from 1 to 33: {1;2;3;4;...;32;33}. We're basically numbering the categories in the "Dictionary Table" from 1 to 33.
FIND Portion of the Formula
FIND(A2,'Dictionary Table'!$C$3:$C$35): Normally FIND is used to find a string within a larger string - FIND(string, larger string) or FIND(A2,'Dictionary Table'!$C$3). Here we are going to find a string within a column of strings $C$3:$C$35 in the "Dictionary Table". If you did F9 on the FIND portion, you would get something like {#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;1;#VALUE!;#VALUE!;...#VALUE!;#VALUE!}. What this is saying is that the first category in the "Dictionary Table" did not contain the string in cell A2 in "Data" tab. Nor did the second category. Not until we reached the 9th category did we find the string in "Dictionary Table".
IF(ISERROR Portion of the Formula
IF(ISERROR(FIND(A2,'Dictionary Table'!$C$3:$C$35)),-1,1): In order to make the entire formula work, we need to transform #VALUE! into something workable so that the entire formula does not give #VALUE! as an answer. So anywhere #VALUE! is found, #VALUE is replaced with -1. So doing F9 on the IF(ISERROR part of the formula gives you {-1;-1;-1;-1;-1;-1;-1;-1;1;-1;-1;...-1}.
Combine IF(ISERROR with ROW Portions of the Formula with *
IF(ISERROR(FIND(A2,'Dictionary Table'!$C$3:$C$35)),-1,1)*(ROW('Dictionary Table'!$C$3:$C$35)-ROW('Dictionary Table'!$C$3)+1)
What this does is basically multiply two arrays together: {-1;-1;-1;-1;-1;-1;-1;-1;1;-1;-1;...-1} * {1;2;3;4;...;32;33} to get {-1;-2;-3;-4;-5;-6;-7;-8;9;-10...;-33}.
So tack MAX in front of it and you get 9 which is the 9th category in the "Dictionary Table".
INDEX Portion of the Formula
For those of you who use INDEX/MATCH combination, the rest of the formula begins to fall into place. INDEX/MATCH is a more sophisticated and more flexible way of doing VLOOKUPs. The INDEX syntax is INDEX(range to search, row position in range, column position in range). If you have a column of data, the column position of the range would be 1. Most of the time MATCH is used to find the position in a range. I won't go into details of this here because we don't need it.
So INDEX becomes INDEX('Dictionary Table'!$B$3:$B$35, 9, 1). We are looking at the intersection of the 9th row and the 1st (and only) column of the range of categories in column B of the "Dictionary Table".
Wrap Up
This is a pretty ingenious way of doing VLOOKUPs when simple VLOOKUPs or INDEX/MATCHs fail. By the way, you could use SEARCH instead of FIND. FIND is case sensitive so if you don't mind the capital versus small letters, use SEARCH instead to give you greater flexibility.
This was a long post but I think it is useful to have another example in order to understand a complex subject such as arrays. You can use this same technique if you find that people continuously either misspell locations/departments or use acronyms and you need to group locations/departments somehow. I once used this technique when an accountant used various spellings for a department while she was doing journal entries and I needed to be able to pull together monthly expenses. I found that she had at maximum five permutations of spelling/acronyms so I created a similar "Dictionary Table" where 5 columns of spelling permutations were catenated.
The array formula is a really powerful technique.