Using Array Formula for the ACA Data Review
Since Saturday I’ve been working on reviewing data for the ACA forms (healthcare reporting for employees) which I just completed today. The forms are now in the process of being printed and put into envelopes for mailing. As part of my review I was looking at the internal logic of timelines for each employee to make sure there were no obvious red flags. One of my review was a search for employees who were terminated in the early part of the year and then rehired later in the year – what I call the missing months in coverage. I wanted to make sure that the code 1’s and 2’s were applied properly: those who were termed then hired disrupted my formulas and sometimes needed manual adjustments.
For this kind of situation, there is no simple formula; my gut instinct told me that finding such missing months (or, stated another way, months there were missing from a consecutive series of months) was probably going to require the use of arrays.
Array formulas in Excel really allows you to do some unusual things that can’t be done with normal formulas. I almost think that these arrays are like working with matrices (which as a kid I
never knew what we used them for) or like working in 3D. I’m now thinking that matrices may have been used as a way of handling numbers in computer programs.
The Easy Example of Using Array Formulas: Counting Unique Items in a List
For example, we all know how to count the number of items in a list using COUNT or COUNTA but did you know there is a way of counting unique elements in a list? It can be done with an array formula. The file below shows an example of counting unique names in a list in tab “Count Unique Names”. I got the idea from this website a long time ago, although I did slightly change the formula when trying to recreate on my own.
My formula is {=SUM(1/COUNTIF(B6:B21,B6:B21))}. The brackets {} are not keyed in; they come about when you do Ctrl+Shift+Enter rather than plain Enter. In the COUNTIF formula, note that the second part of the formula is different – it contains a range rather than a single number. I think of that as an array of numbers. I won’t explain in detail in how it works because the site ExcelJet does a good job, probably better than I could. But it would be good to understand how this formula works. A good trick in figuring out how such formulas work is by doing what I call the F9 trick: place your cursor in the formula bar and highlight a portion of the formula and then press F9. Pressing F9 shows you the results, most likely as an array. You can use the same F9 trick with regular formulas too so you can get the hang of this trick.
{=SUM(1/COUNTIF(B6:B21,B6:B21))}
{=SUM(($K$5:$K$9)*IF(ISERROR(SEARCH(B5,$L$5:$L$9))=TRUE,0,1))}
A Slightly More Difficult Example: Searching For a Variation of a Name
Another version of using an array is in tab “Name Search”. This version is when you know that the person, who does data entry every month, uses different spelling permutations of people’s names or place names. You do a download of data and you find different variations of names. Here, you can’t use a VLOOKUP because VLOOKUP generally (but not always) looks for an exact spelling whereas you may be looking for a partial, much like a SEARCH for a word in a text.
Fortunately, in my case, it turned out that there were a limited number of spelling variations and I suspect that is generally true in most name permutations: there is a limited number of name permutations. In my case, it was up to 5 spelling variations for each name but I could see it going up to 6 or 7. The first thing I did was make a “dictionary” of spelling variations, which is what columns M through R shows. In column L I concatenate the spelling variations into one cell using &.
Cells B5:B8 is a pretend download of data and cells C5:C8 contains formulas to find the accounting code for the names. For instructional purposes, in cells B12 to B17, I show pieces of the main formula and then off to the right in column H I show what the results would be if you did the F9 trick in the formula bar.
Notice that SEARCH does a search of the spelling variation in cell B5 and then searches in the array L5:L9 instead of trying to do search within a single cell. Also, notice we are multiplying two arrays: ($K$5:$K$9) and the second array result in the IF statement which is IF(ISERROR(SEARCH(B5,$L$5:$L$9))=TRUE,0,1).
The New Complicated Array for ACA: Searching for Missing Months of Coverage
The arrays in “Count Unique Names” and “Name Search” tabs are kind of the easy ones. The third tab “Missing Month” shows a more complicated situation. Here I’m looking for months where the employee was not covered, according to the data from the healthcare provider. I was checking to make sure that gap was because the employee had been termed and then later rehired and that the codes were applied correctly.
{=IF(MIN(COUNTIFS($d$15:$d$25,ROW(INDIRECT(MIN(IF($b$15:$b$25=b15,$d$15:$d$25))&":"&MAX(IF($b$15:$b$25=b15,$d$15:$d$25)))),$b$15:$b$25,b15))=0,"x","")}
In the example in the third tab, I start off with a simple version in rows 3 through 10 and got the array formula from this site. There are actually multiple sites covering this problem but I like this one because of the use of the ROW function.
The examples I found were for situations where you wanted to determine which, if any, numbers were missing from a list of numbers. The list was supposed to be a list of consecutive numbers but the list may be missing some numbers. My situation with the ACA differs in that I’ll have a long list of employees, some of whom enrolled to receive monthly benefits. Some employees were around the entire year and thus received benefits for all 12 months. Other employees were hired during the year and received benefits for part of the year. And yet, still others were terminated in the early part of the year and then rehired. It’s those employees who were terminated and then rehired later on that have missing months and I wanted to find those employees.
Rows 12 through 25 show this scenario. Person 1 must have been terminated in March and then rehired in September. Person 2 looks like he worked only May through July.
How do I explain the workings of the array formula? I don’t know. I’ll try to explain but understanding the first two tabs will help immensely.
The complicated formula is very much like the simple formula in rows 3 through 10 except 1) I have COUNTIFS instead of COUNTIF; 2) INDIRECT and MIN/MAX instead of ROW($1:$7); 3) replaced the second ROW($1:$7) with “x”; and 4) moved the IF from behind the first MIN to in front of the MIN.
The COUNTIFS portion of the formula has two criterias: look for months that falls between the min and the max and look for the social security number in question. If you didn’t include the social security number as part of the criteria, the formula would go through all of the rows and determine the min month and max month which would be 1 and 12. But you need MAX/MIN for a specific person and not everybody was employed for the entire year.
Within the COUNTIFS formula, there are ROW, INDIRECT and MIN/MAX. INDIRECT is a new function for me, and I find that this is a method of creating references that could change. I used to try referencing a cell containing the range I wanted to have in a formula but it never worked. Now I know I was missing INDIRECT. The MIN and MAX determines the smallest month and largest month for each person/SSN. The INDIRECT takes those MIN and MAX as INDIRECT(MIN month : MAX month) similar to $1:$7, but without the $ symbol. Then ROW(INDIRECT(MIN month : MAX month)) converts into something very much like ROW($1:$7) except the 1 and 7 will be different for each person.
As I said, this array formula is more complicated and requires some understanding of the use of arrays in simpler examples, which is why I included “Count Unique Names” and “Name Search” tabs. Go through those tabs first, use the F9 trick and try to get some understanding of what is going on. Then when you are ready for the “Missing Month” example, I have included a breakdown of the formulas for Person 1 in cells H17 to H22 and highlighted the results in orange (do F9 on those pieces of the formulas and you should get the results highlighted in orange).
Okay, that is as good as I can explain it. Good luck!
You must be logged in to post a comment.