Power Query and PDF: a team
Have you been keying in data from PDFs into your Excel spreadsheets? Did you know that Power Query and PDF work well together?
Yes! You no longer have to key in data from PDFs.
This helpful feature came on the scene maybe a few years ago but I was only reminded of this nifty feature when I was talking to a friend. She mentioned that she received PDFs of her bank statements and she was manually keying information from the PDFs into her Excel spreadsheet for her accountant.
If you are doing the same, maybe for your own accountant, for tax purposes, or for some kind of reporting purposes, you can stop and learn a new way of going about it.
It is like Power Query and PDF is a marriage made in heaven.
Since this post ended up being a long post, I went ahead and split it up in two: the first part, which is this post, how to pull data from PDF. The second part will show what one can do in the Power Query editor to clean up the data and make it more usable. I call the second post the Power Query capabilities.
The PDF file
So, I have some economic data sitting in a PDF which is highlighted in image 1 above. I’m going to pull in that information instead of manually inputting the information.
“Get Data” found in Excel
To initiate the power query, one uses “Get Data” to make the connection between Power Query and PDF.
Image 2 shows where the “Get Data” tool can be found. In the “Get & Transform Data” section, you will see where you can use Power Query to pull data from: text/CSV, from the web, from a table or range within your spreadsheet, from a picture (yep!), and others.
So, click on the “Get Data”.
Clicking on “Get Data” brings up a box with a couple of options and the one you want is “From File”. It is here that you will find “From PDF” option. Note that there is a “From Folder” option but we are not using that today.
Browse to where your file can be found and select it. From there, choose the “Import” button to start the importing process.
Navigator Box
The first thing you will see is the Navigator box containing two sections: the left-hand side with a number of images/tables/pages available and the right-hand side.
Images 6 and 7 provide a sneak preview of what is in the table/page. You will have to look through these and decide which one you prefer. Me, I prefer the pages because Page001 has column headings in the preview whereas Table001 just skips the column headings.
Choosing multiple tables/pages
Now that I have decided which set of images to select, I need to select multiple images to use.
Image 8 shows how I selected the multiple items. I checked the box for “Select multiple items” found at the top left of the Navigator and then I used the shift key and selected multiple Page items.
Once I have selected what I want, I then go to the bottom right to either click on “Transform Data” to clear up the data or if I like what I see, click on “Load”
Closing comments
What I have shown thus far is that Power Query and PDF marry well together. One can easily bring in data from PDFs into Excel through Power Query. I would have loved to have had this feature back a couple of years ago.
The next post will provide a couple of features in Power Query to help you clean your data. There is so much to Power Query; consequently, I will only show a small portion of its capabilities.
To move forward to the next post, just click on this link.
You must be logged in to post a comment.