|

Playing with Data Query

I thought since I’m on Thanksgiving holiday that I would go back to my climate files where I had gathered data from the web and see if I can update the data using the embedded data query. The one I was really interested in was the file pulling in temperature data history for each state because I wanted to update the file with new months of data. Last time, I thought I had found an “easier” way of pulling in the data. Initially I was pulling the data state by state in the web, copying the data from the web and pasting the data into a text file and saving the file, a file of data for each state, to later then be pulled by Excel. Whew. Later, I found a short cut where I could pull all of the states at once on the web and then pull that data from the web into Excel via data query or Get Data From Web.

Today, I couldn’t pull the data. The address itself seems to be down. There is a note at the site where parts of it may be down due to maintenance. I’m hoping that is the real reason why I can’t pull the data, so next week, I will try again. I just hope it is not because the Trump administration wants to shut down all data gathering especially since this is data that kind of relates to climate change.

So, this is a problem with trying to pull automatically information from the web: the site may be down for maintenance or, worse, the page may be moved or the page layout may have changed. So, trying to gather information from the web is not always the most reliable thing to do. If you are pulling data for work from an internal work program or database, that source might be more reliable, especially if you are pulling from large enterprise programs such as SAP. I have found SAP to be consistently reliable – I never had problems with it in the 15 years of using it. When you start pulling data from small or medium sized companies, there may be issues, the foremost being a reluctance to share that data in a form other than PDF. But there is also the issue of layout upon downloading: unlike SAP, downloads often do not come in as a flat file so the data is very difficult to pull via formulas. Doing a data query from Excel may or may not get around those issues.

For now, I’m going to talk about the pulling data from the web. Yes, a lot of screenshots are tiny. There are constraints between what Microsoft gives me when clipping and what WordPress or my theme gives me when adding pictures to this page. I have taken out the sidebar to give room but the pictures still don’t display across the screen. They still display scrunched in the middle.

Instead I’m going to use the file that pulls down data for the House of Representatives. I just ran a test of updating the data and found that the query went out to the web and pulled in the latest information. So that really worked. The thing about this website is that the data does not come in as one nice table showing all of the states and the representatives. Instead, the site is composed of 50+ tables and I had to pull in the data state by state. So, I have a file where there are 50 tabs for each state and then a 51st tab combining all of the states into one table. Furthermore, refreshing that single tab doesn’t work; you have to refresh ALL of the tables using “Refresh All”.

The initial set up requires you to create the separate tabs for all of the states and then combine those tabs into one master tab. Once that is done, in the future, you can just do “Refresh All” and everything refreshes, so long as the website itself does not change.

If you want to follow along, here’s the website: https://www.house.gov/representatives”

First thing you want to do is have that URL address handy.

Then in Excel, you can do Data – Get Data – From Web or Data – From Web. You will at this point have to enter the URL address. After that, you might run into weird things. Since I’ve gone to that site a few times, my computer may be recognizing the site so it doesn’t bring up how I want to “Sign” in such as anonymously or whatever the choice may be. Also, since the site is a government site, the site might just be allowing me to pull in data. If you were to pull data from Wikipedia, you might have additional thresholds to pass. In a nutshell, there may be additional screens that you have to pass through that I won’t be discussing today (namely because those screens are not pulling up). You will have to figure those out for the moment. So far though, the anonymous option has worked for me when I come upon those additional screens.

After I enter the URL (and if necessary, after I pass through those additional screens), I will come upon the Navigator screen where you may see a single table or a bunch of tables on the left-hand side. Click on one of those tables and watch the actual table of data appear on the right. For the House of Representatives, I clicked on a state and then “loaded” the data. After a state had loaded, I had to click on “Get Data – From Web” to bring in the next state. To speed things up, you can also click on the button “Select multiple items”, select multiple tables you want to use, and data query will pull up those tables into separate tabs. .

A table selected
Results from query

After you have pulled a couple of states, you will notice that the states are coming in separate tabs and that the individual tables themselves do not say which state the table represents. Before you go much further, you will want to add a column for the state so that later, when you combine all of the state data into one table, you will know which data is for which state.

The easy way to add a column for the state is to go into the Query tab (you might have to put your cursor inside the table before that Query tab appears) and then click on the Edit icon. This action will lead you to the query editor where there is a menu item called Add Column. You want to add a custom column.

Put in your column name as “State” and make sure you put in the same column name for each state. You want all of the tables to have the same field names for later when you combine those tables. In the custom formula, just write in the state name.

And that is all you do to add a column with the state name.

Sometimes you will want to amend your data or change something. There are two ways to get into the data editor to do extra work on the tables.

One way is to hover your cursor over the table in the “Queries and Connections box to get a screen showing you some information about the table such as when last updated. There is an edit option.

Or you can right click on the table name in the “Queries and Connections” box and you will get a box of options. (Why this image came out large? Beats me.)

Once you have pulled in all of the tables for all of the states you want to use, you will want to put them together into one master table for data analysis purposes or pivot tables. That process is called the Append. To append, just pick a table in the “Queries and Connections” box (if you don’t see it, go to Data tab and find the “Queries and Connections”), right click on your chosen table, and pick “Append”. Or you can go into the data editor using the “Edit” option in the “Query” tab.

You will get a box that pretty much walks you through the process: pick multiple tables if you have multiple tables you want to combine,

select the tables you want to add and click on Add.

After clicking on OK and loading the data, you will get a new table/tab with all of the tables combined into one big master table, generally called Append1. You can rearrange the “state” field to move it to the beginning of the table before “closing and loading” the table.

Now you have a table that you can update in the future by using “Refresh All” icon found under Data tab. Very nifty.

Similar Posts