I’ve begun work on a new project that actually has two components: reconciliation and automated reporting. The last two weeks were focused on the reconciliation, and the first thing I did was look at the data. Usually when I get new data and want to understand what I’ve been given, I will play around with the numbers. Sometimes that involves looking at dates, because dates can trip you up.
As an example, this particular dataset had a hire date, an effective date, and a term date. Most of the hire dates were the same as the effective dates, but there were a few where the effective date was later. I’m thinking that could be a situation where a person comes on board on a trial basis before being truly hired. But there were a few instances where the term date was before the hire date. Um, that’s a problem. In the industry I’m in, quite often people are let go because there is no more work for them, but then later they come back when other work opens up. So it could be possible for a term date to come before a hire date, if a record shows the most recent hire date alongside an earlier termination; still, this is an area that needs watching.
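The date checks above are easy to script. Here is a minimal sketch in Python with pandas (the actual work was done in a spreadsheet; the names, dates, and column names below are all invented for illustration):

```python
import pandas as pd

# Hypothetical sample resembling the dataset described; all values invented.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "hire_date": pd.to_datetime(["2023-01-02", "2023-03-01", "2023-05-15", "2023-06-01"]),
    "effective_date": pd.to_datetime(["2023-01-02", "2023-04-01", "2023-05-15", "2023-06-01"]),
    "term_date": pd.to_datetime(["2023-12-31", None, "2023-02-01", None]),
})

# Effective date later than hire date: possibly a trial period before a true hire.
trial_period = df[df["effective_date"] > df["hire_date"]]

# Term date before hire date: either bad data, or a rehire whose
# earlier termination carried over to the record. Worth watching.
suspect_terms = df[df["term_date"] < df["hire_date"]]
```

Comparisons against a missing term date come back false, so active employees drop out of both checks automatically.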
Once I had played around with this dataset, which I’ll call dataset A, I received another dataset, B, that needed to be reconciled against it. I needed to reconcile A and B to see if there were any differences and, if so, where and why. The two datasets came laid out so differently that I couldn’t simply build a pivot table comparing them. I first had to create a macro that would rearrange dataset B into the same layout as dataset A. Once I had successfully done that, I found that there were a lot of differences, about 100K in total. I came to the conclusion that dataset A was missing data, and it turned out I was right.
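The rearrange-then-compare step could be sketched like this in pandas. I’m assuming, purely for illustration, that A is one row per name and month while B arrived wide, with one column per month; the layouts, column names, and amounts are all made up:

```python
import pandas as pd

# Invented stand-ins: A is long (one row per name and month),
# B arrived wide (one column per month).
a = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob"],
    "month": ["Jan", "Feb", "Jan"],
    "amount": [100.0, 110.0, 200.0],
})
b_wide = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "Jan": [100.0, 250.0],
    "Feb": [110.0, None],
})

# Step 1: rearrange B into A's layout (the role the macro played).
b = b_wide.melt(id_vars="name", var_name="month", value_name="amount").dropna()

# Step 2: line the two up and compute the differences.
# The indicator column ("_merge") shows which dataset each row came from,
# which is how missing data in A would surface.
merged = a.merge(b, on=["name", "month"], how="outer",
                 suffixes=("_a", "_b"), indicator=True)
merged["diff"] = merged["amount_a"].fillna(0) - merged["amount_b"].fillna(0)
variances = merged[merged["diff"] != 0]
```

With real data, summing the `diff` column gives the total variance to chase down.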
After receiving a new dataset A, there were still some variances, but it was much better. Now it was time to dig into the datasets and see what could be causing the variances. When I do that, I look for commonalities driving the variances. When variances stem from a common reason, the problem is much easier to fix. Maybe I had handled the data incorrectly, or maybe I should have excluded a set of data. Whatever the cause, I look for commonalities. For example, I wanted to list all of the names that were in dataset A but missing in dataset B, so I wrote a formula to mark those names with an “x”. Then I tried a formula to pull in term dates to see if there was a pattern, and lo and behold, there was. Yet another formula marked with an “x” those names that were missing in dataset A but present in dataset B, and a corresponding formula pulled in hire or effective dates to see if there was a pattern for that second set of x’s. And so on; I kept looking for reasons that could be causing variances. After all of the columns of formulas were set up, I added a data filter so that I could start filtering to see if interesting things popped up.
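The columns of x’s and pulled-in dates translate directly into a few lines of pandas. Again a sketch only, with invented names and dates standing in for the spreadsheet formulas:

```python
import pandas as pd

# Invented stand-ins for the two name lists and their dates.
a = pd.DataFrame({"name": ["Alice", "Bob", "Carol"],
                  "term_date": ["2023-12-15", None, "2023-12-20"]})
b = pd.DataFrame({"name": ["Alice", "Dave"],
                  "hire_date": ["2020-01-01", "2023-11-30"]})

# Mark names in A that are missing from B (the column of "x"s).
a["missing_in_b"] = (~a["name"].isin(b["name"])).map({True: "x", False: ""})

# Mark names in B that are missing from A, to be paired with hire/effective dates.
b["missing_in_a"] = (~b["name"].isin(a["name"])).map({True: "x", False: ""})

# Pull term dates for the first group of x's to look for a pattern.
flagged = a[a["missing_in_b"] == "x"][["name", "term_date"]]
```

In this toy example the flagged names were both terminated in December, which is exactly the kind of pattern the term-date column is there to surface.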
With the filtering system in place, it was time to talk to somebody to see what I was missing in my logic. And I think that is what you need to do, especially if you are working in an area you have no knowledge of: go to the experts with the data you pulled and the various types of differences, and go over them together. Typically, they will immediately recognize something, such as a group of people whose data you didn’t receive or whose data should be handled differently. Whatever it is, they will spot something, as long as you separate the variances into groups with common characteristics (not in dataset A but in dataset B; all of the missing people were terminated in December; and so on).
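Before that conversation, it helps to boil the variances down to a count per group, so each common type can be discussed as a unit. A small sketch, with an invented variance log and invented type labels:

```python
import pandas as pd

# Invented variance log: one row per difference, tagged with its type.
variances = pd.DataFrame({
    "name": ["Bob", "Carol", "Dave", "Erin", "Frank"],
    "variance_type": ["missing_in_b", "missing_in_b", "missing_in_a",
                      "amount_differs", "missing_in_b"],
    "term_month": ["Dec", "Dec", None, None, "Dec"],
})

# Summarize by type so the biggest group gets discussed with the experts first.
summary = variances.groupby("variance_type").size().sort_values(ascending=False)
```

A table like this is usually enough for an expert to say “oh, that group is the December terminations we handle separately.”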
So I have pretty much done the first stage of the project, which is the reconciliation. There is more to be done, but the remaining differences have been handed over to the folks who will manually pull the information to find out what may be incorrect or missing.
So now, I’m on to the other part of the project, which is to figure out how to report all of this information.