Home / Data / When the Data is Messy

When the Data is Messy

When Your Data is Messy

“Some people have 2 social security numbers or others share the same security number.”

My boss just stared silently at me like, “You’ve got to be kidding me.”

While I’m waiting to move onto the next stage of the ACA electronic filing, I’m working on the reconciliation process our broker goes through and boy, is it a dilly. Our broker must be spending a lot of time researching the data when reconciling the health insurer’s invoice with what their system says. The first logistical hurdle in the reconciliation is trying to map by social security the data in the insurer system with that in the broker’s system. You can’t do it by name because people can spell names differently. And some systems will include Sr., Jr., II, etc. and others won’t. So you have to map by using social security numbers and it’s difficult to do that when one side can’t get the data straight.

What I don’t get is why the health insurer, which is surely a large insurer – not a mom and pop operation, is giving crappy data. I don’t understand how they can spit out people with 2 different social security numbers (one for medical and the other for dental) or spit out folks with the same security number. The other thing I don’t get is why the broker doesn’t tell the insurer to get their data straightened out – it looks bad. Maybe the insurer is in the more powerful position.

Anyway, I figured out a neat way to check on the security numbers and how to divine what is most likely the correct number.

First create a database with the following headings: Source, SSN, Last Name, First Name.

The order of the headings really doesn’t matter but having those fields are necessary. Source can be insurer and broker, at minimum. In my case, I used Insurer’s medical invoice, insurer’s dental invoice, and broker’s system data (field titles were shorter than this).

Second, copy over the data from your sources and place them under the appropriate headings.

So, in my case, I manually entered the source name, and then copied over the SSN, and first and last names.

Third, create a pivot table out of that new database.

I hope you see where I’m going with this. I moved the last name and first name fields under the row section. I made sure they lined up in a row as in a table. Under the columns area, I pulled over the Source field so I had Med Inv, Dent Inv and Broker as the column headings. Finally, in the data or number section, I pulled in the SSN and set it to a value of maximum, rather than sum or count. Now you have a table that shows you each person’s name and the SSN from each source.

To the right of the pivot table, create a column of formulas that shows whether that row contains multiple SSN for that person.

You can do something like, “If column A is not equal to column B or C and column B is not equal to C, then mark this row as ‘x'”

Then, in the last column, create a formula that would pull in the most likely correct SSN if that row has been marked as ‘x’.

First you would need to decide which database is more likely to contain correct SSN. In my case, we had a database from our system which seems to be more reliable (the broker’s system was a good option too but I wanted to do as much cross verification as possible so I added in our system). So, whenever there was an ‘x’ in the row, the formula would then do a vlookup on the name and pull in the correct SSN. Now, about the name – I need to provide a cautionary note: you may encounter different variations on a name. So using name is not foolproof; far from it. But I just cannot use SSN, so name it is. Names will at least get me most of the way there.

The very last thing to do is to filter on ‘x’ and see who has multiple SSN and to check on the names.

You can run your eyes across the row and see how your choice of SSN jives with the others. Here is where you can do some final adjustments, especially on the names.

And that is how I automate the search for incorrect SSN and substituting it with the “correct” number. It beats I looking up each person’s SSN, checking all of the sources, and keying in the correct one. Now I have the spreadsheet do it for me.

Leave a Reply

Your email address will not be published. Required fields are marked *

Top