Attempting Text Analysis

Lately, I’ve been trying to figure out how text analysis works. Last year, when I was looking at some comments, I created a dictionary of similar phrases so I could see trends in the comments, that is, which kinds of comments come up most frequently. In the business I’m in, certain phrases come up so often that one could receive hundreds of comments with the exact same phrasing, and the phrases tended to cluster around the same usage. This is very different from searching and analyzing comments on Facebook, where the comments could cover anything; the comments I was looking at were made by employees and covered a small slice of the business. So in this situation, I was able to create a rather small dictionary of similar phrases falling into about 15 to 20 categories of comments. It took a while to create this small dictionary, although I imagine that it could be reused from year to year.

Out of curiosity, I wanted to see if there was a way I could automate this process, or at least learn some of the process that goes into text analysis. I imagine that most text analysis is done with more robust tools, maybe Python, but I thought it would at least be instructive to try. The first stab was to count up the phrases, which was easy enough: a pivot table is sufficient. However, what was missing was that some phrases were similar and fell into the same category, but a pivot table will not count those similar phrases as one category. For example, a pivot table will regard “bought a house” and “buying a house” as totally different, but in my mind they belong to one category of buying: buying a house, buying a home, bought a home, buying a townhome, etc.
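To make the pivot table problem concrete, here is a minimal Python sketch of counting by category instead of by exact phrase. It is not the dictionary I actually built: the `PHRASE_TO_CATEGORY` entries, the category name, and the function names are all made up for illustration, and it assumes a phrase only matches when it equals a dictionary entry after lowercasing and trimming extra spaces.

```python
from collections import Counter

# Hypothetical phrase dictionary: each known phrase maps to one category.
# The entries below are illustrative only, taken from the house-buying example.
PHRASE_TO_CATEGORY = {
    "bought a house": "buying a home",
    "buying a house": "buying a home",
    "buying a home": "buying a home",
    "bought a home": "buying a home",
    "buying a townhome": "buying a home",
}


def categorize(comment):
    """Return the category for a comment, or 'uncategorized' if unknown."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(comment.lower().split())
    return PHRASE_TO_CATEGORY.get(key, "uncategorized")


def count_categories(comments):
    """Count comments per category rather than per exact phrase."""
    return Counter(categorize(comment) for comment in comments)


comments = ["Bought a house", "buying a house", "Buying a  townhome", "new baby"]
category_counts = count_categories(comments)
print(category_counts)
```

Anything that lands in "uncategorized" is a candidate for a new dictionary entry, which is roughly the manual work that made the original dictionary slow to build.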

So the next attempt was to count individual words to see what the most common words were. A problem cropped up: filler words (very important filler words, though) such as “a”, “the”, “in”, etc. would rise to the top, so I had to develop a macro that counts individual words but leaves out filler words. To do this, I had to create a dictionary of filler words.
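The same idea can be sketched in Python. This is not the macro itself, only a rough equivalent under a few assumptions: the `FILLER_WORDS` set is a tiny made-up sample rather than my real filler dictionary, and a "word" is taken to be any run of letters or apostrophes.

```python
import re
from collections import Counter

# Hypothetical filler-word dictionary; a real one would be much longer.
FILLER_WORDS = {"a", "an", "the", "in", "of", "to", "and"}


def count_words(comments, filler_words=FILLER_WORDS):
    """Count individual words across all comments, skipping filler words."""
    counts = Counter()
    for comment in comments:
        # Lowercase first so "Home" and "home" are counted together.
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(word for word in words if word not in filler_words)
    return counts


comments = ["Bought a house", "Buying a home in the city", "bought a home"]
word_counts = count_words(comments)
print(word_counts.most_common(3))
```

Note that this still treats “bought” and “buying” as different words, so it does not solve the similar-phrase problem on its own.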

Next, I thought, okay, how about finding the most common words associated with each of the “n” most common words? For example, if the word “home” was the most common single word, let’s count the words that come with “home” and see which are the most common. Then do the same for the second most common word after “home”, and so on down the list. I decided to restrict the count to the top 5 most common words, and then the set of top 5 words that accompany each of them. I got a macro that accomplishes this, but I’m not sure if this is going in a direction that will be useful.
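Here is a hedged Python sketch of that companion-word count. It assumes that a word “comes with” another word when both appear in the same comment, which may not be exactly what my macro does. The filler list, the `tokenize` helper, and the `top_companions` name are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical filler-word dictionary, as in the single-word count.
FILLER_WORDS = {"a", "an", "the", "in", "of", "to", "and"}


def tokenize(comment):
    """Split a comment into lowercase words, dropping filler words."""
    words = re.findall(r"[a-z']+", comment.lower())
    return [word for word in words if word not in FILLER_WORDS]


def top_companions(comments, top_n=5):
    """For each of the top_n most common words, list its top_n companions.

    Assumption: a companion is any other word in the same comment.
    """
    tokenized = [tokenize(comment) for comment in comments]
    word_counts = Counter(word for words in tokenized for word in words)
    result = {}
    for word, _ in word_counts.most_common(top_n):
        companions = Counter()
        for words in tokenized:
            if word in words:
                companions.update(w for w in words if w != word)
        result[word] = companions.most_common(top_n)
    return result


comments = ["buying a home", "bought a home", "buying a house"]
companions_by_word = top_companions(comments, top_n=2)
print(companions_by_word)
```

Restricting the companions to words that sit directly next to the top word, instead of anywhere in the comment, would be a stricter variant of the same count.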

I then did some research on Google, searching for “Excel text analysis”, and found mostly word counts or “n”-step word counts (there’s a term for this but I can’t find it). So this is where I’m at for right now.

