Graphs 2

I’m continuing my posts on what I’m learning from Algorithms for Dummies, by John Paul Mueller and Luca Massaron. Before proceeding further, I must point out that as I get deeper into the book, I’m not going to understand a lot of the concepts except at a very superficial level, if even that, so these posts may not be very helpful to readers. They are really just my process for hashing out my minuscule understanding, a learning exercise rather than a teaching one, and my way of continuing to learn and trying to understand the algorithms running our world.

With that said, let’s first refresh my memory on what I have learned thus far regarding “graphs”.

Refresh my memory on what I’ve learned about graphs

Very briefly, I noted that graphs are more than the typical line or bar graphs I’ve been making; the term extends to such things as maps or social graphs, and I listed quite a few examples of what could constitute a graph. Then, once you have your graph, you want to see which parts of it are really important, because you don’t want to waste your time analyzing things that may not be relevant to your problem-solving goals. The authors discussed several concepts of centrality: the number of connections a node has, clustering, distance or closeness, and betweenness. Finally, the end of the chapter gave some Python programs that convert a graph into numerical formats that any imported numerical packages (scientific packages or mathematical packages) can use.

All of that is mentioned briefly in my last post. I want to stop and remind myself that I probably have to stick with learning the algorithms at a high conceptual level, because the book does not go into extreme detail (the authors need to be able to sell to us dummies, so they can’t get too technical), or maybe I’m just not getting it at this early stage of learning. I might have to be satisfied with knowing what kinds of packages are available to import and what they are trying to do.

Next Chapter: Chapter 9

This new chapter on graphs is broken down into four sections: 1) searching for connections between points or nodes; 2) sorting the nodes to create an organization so that you can view the graph the same way each time the algorithm is applied (I think); 3) reducing the size of the graph to direct the algorithm where to look; and 4) finding the shortest distance between two points.

1 Searching for connections between points or nodes

The book also calls this “traversing the graph”. Immediately, this section started to give me problems because I could not really tell what the Python program was doing (maybe due to my inexperience with Python). For one thing, there was a Python program for traversing the graph, but I had already encountered a program for traversing the graph back in Chapter 6 (I’m now in Chapter 9), and that prior program was slightly different. Why the difference?

After playing around with a sample array of nodes and edges as input to the two different programs, I found that the earlier program only “traversed” the graph but could not draw it, because that program did not include the edges. The second program has some code that adds in the edges to enable drawing the graph.

But there were two extra lines of code for drawing a graph that differed from yet another, earlier piece of Python code for drawing a graph. (I often compare different code to see what may be different and why.) When I tried taking out those two extra lines, the same graph was still drawn. Those two extra lines seem to add nothing. So, I’m confused.

nx.draw(Graph, pos, with_labels=True)

Not sure of the following two lines. If I take them out, the program still draws the same graph.

nx.draw_networkx(Graph, pos)

plt.show()
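After digging around a bit, I think I can explain my own confusion (this is my understanding of the networkx and matplotlib libraries, not something the book spells out): nx.draw() already renders the nodes, edges, and labels onto the current matplotlib figure, and nx.draw_networkx() essentially draws the same picture again on top of it (its main difference is that it turns labels on by default), so removing it changes nothing visible. And plt.show() only matters when you run the code as a plain script; in a Jupyter-style notebook the figure gets displayed anyway. Here is a minimal sketch with a toy graph of my own:

import networkx as nx
import matplotlib.pyplot as plt

Graph = nx.Graph()
Graph.add_edges_from([(1, 2), (1, 3), (2, 4), (3, 4)])   # toy edges, my own example
pos = nx.spring_layout(Graph)                             # compute positions for the nodes

nx.draw(Graph, pos, with_labels=True)    # draws nodes, edges, and labels on the current figure
# nx.draw_networkx(Graph, pos)           # would draw much the same picture again on top
plt.show()                               # opens the figure window when run as a script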

Once I got past creating and drawing the graph, the book delved into searching through the graph for all of the connections or nodes. There appear to be two general methods for doing a search: 1) breadth-first search (BFS) and 2) depth-first search (DFS).

Breadth-first search (BFS)

The BFS method searches all of the connections to a node before going to the next level. So, you start at the root node and search for all of the nodes connecting to the root node; let’s call these connecting nodes the second-level nodes. Once you’ve noted all of the second-level nodes, the program then searches for all of the third-level nodes connecting to the second-level nodes. So, in my lingo, the program is searching/traversing the graph level by level.
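To make the level-by-level idea concrete for myself, here is a tiny BFS sketch of my own (not the book’s listing), using a simple queue and a toy graph I made up:

from collections import deque

graph = {                       # toy adjacency list, my own example
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C'],
}

def bfs(graph, start):
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()           # take the oldest discovered node first
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:  # each node gets discovered only once
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs(graph, 'A'))    # ['A', 'B', 'C', 'D'] - the root, then level 2, then level 3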

Depth-first search (DFS)

The DFS method, in my mind, goes down the tree path. My understanding is that the program traverses from the root to one of the second-level nodes, then to the third-level node connecting to that second-level node, and so on down, before moving back up to the next second-level node to travel. I think the program travels down the branches of the tree and explores each branch fully before going on to the next branch.
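And here is my own DFS sketch for comparison (again, not the book’s code), using the same toy graph as in the BFS sketch; it follows one branch all the way down before backing up:

graph = {                       # same toy adjacency list as in the BFS sketch
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C'],
}

def dfs(graph, node, visited=None):
    if visited is None:
        visited = []
    visited.append(node)                   # mark this node as explored
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)  # go deeper before trying the next neighbor
    return visited

print(dfs(graph, 'A'))    # ['A', 'B', 'D', 'C'] - down one branch first, then back up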

The authors provided two Python programs, one for BFS and the other for DFS, and I encountered another set of problems. In both the BFS and DFS programs, there was some code with the “%” symbol, which I don’t think I have seen outside of math calculations. I went back and played around with the “%” symbols and found that they affect how the in-between processing steps get printed on screen, but they don’t affect the final answer, which I think is to show all of the node connections. It’s basically a way of formatting the printout of the in-between work before you arrive at the answer.
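As far as I can tell, the “%” in those print statements is Python’s old-style string formatting, not the modulo (remainder) math operator. A tiny example of what I believe is going on:

node = 'A'
level = 2
# "%s" gets replaced by a string, "%d" by an integer
print("Visiting node %s at level %d" % (node, level))   # Visiting node A at level 2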

As for the resulting answers, I saw some of the nodes’ connections on a level-by-level basis, but two connections were missing, so I’m not 100% clear on what the program is trying to accomplish. It might be that once a node has been visited, it does not get visited again, even if some node-to-node connections have not been captured in a list (or stack).

So, at the end of the day, it appears that these programs do not seek out every single node connection; maybe they are more for discovering nodes rather than node-to-node connections. The results arising out of the breadth-first search can often differ from those of the depth-first search because they deploy different strategies. BFS is typically used to determine the shortest path because it “is guaranteed to return the shortest path between two points as the first output when used to find paths.” P177. So GPS, spanning trees, shortest paths, and minimization algorithms utilize BFS. DFS is typically used for finding the entire path before exploring any other path; games use DFS, as does finding a solution to a maze.

(Graphics are as best as I understand what is going on.)

2 Sorting nodes to create an organization

To search the graph efficiently, you have to sort it first. The “divide and conquer” search I learned earlier relied upon sorted data elements. Sorting aids in efficient searching.

This section of the chapter introduces two ideas, very briefly: DAGs (directed acyclic graphs) and topological sorting. Briefly, DAGs are graphs with a specific route, with directions going from one vertex to the next and no looping back. Scheduling fits in with DAGs: there may be optional routes, but in general the options lead to a very specific route, from vertex to vertex. Topological sorting “orders all of the vertexes of a graph on a line with the direct edges pointing from left to right.” P182.

This section didn’t go much further than that, other than to say that DFS algorithms provide a topological sorting.
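For my own benefit, here is a small sketch of topological sorting using networkx (a toy scheduling DAG I made up, not the book’s example); every task comes out after the tasks its arrows depend on:

import networkx as nx

dag = nx.DiGraph()                 # directed graph with no loops
dag.add_edges_from([
    ('wake up', 'shower'),
    ('shower', 'dress'),
    ('wake up', 'coffee'),
    ('dress', 'leave'),
    ('coffee', 'leave'),
])

print(list(nx.topological_sort(dag)))
# e.g. ['wake up', 'shower', 'coffee', 'dress', 'leave']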

3 Reducing graph size to minimum spanning tree

The start of the chapter talked about reducing the size of the graph to focus the program on what or where on the graph you want to look, but this section then talks about reducing “resources” such as time, distance, money, or energy, or finding an economical way to reach all points on a map. To me, that is a big difference in what we’re trying to do.

In my mind, the main focus of the third section was to find an economical solution in reaching all points in a graph rather than reducing the size of the graph to what you want to look at.

First, there is a definition of a spanning tree: a list of the edges required to connect all nodes or vertexes in an undirected graph; a spanning tree visits every vertex once, so there are no loops. In an unweighted, undirected graph, all edges have the same weight, so there is no single minimum solution. In a weighted, undirected graph, there is at least one minimum solution. Directed graphs, unweighted or weighted, are a different problem.


Again, the authors provide a little Python program that draws the graph, but this time it includes the weights. There is the same issue where a single line of code seems to be doing nothing, but for the most part, I get it.
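Here is my guess at how a weighted graph gets drawn (a toy graph of my own, not the book’s data): the weights are stored on the edges and then printed next to them as edge labels.

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_weighted_edges_from([('A', 'B', 2), ('A', 'C', 3), ('B', 'D', 2), ('C', 'D', 1)])

pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True)                          # draw the nodes and edges
weights = nx.get_edge_attributes(G, 'weight')              # {('A', 'B'): 2, ...}
nx.draw_networkx_edge_labels(G, pos, edge_labels=weights)  # print the weight next to each edge
plt.show()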


Then the book goes into two algorithms that determine the minimum spanning tree: Prim’s algorithm and Kruskal’s. The other two, mentioned but not covered in detail, are Boruvka’s and reverse-delete. But first, there is a foray into priority queues: data structures (much as queues and stacks are data structures) based on the heap tree that allow for fast ordering when you insert data into the tree. The discussion on priority queues is so brief that I don’t really know how this program works, but you need to run it before running Prim’s or Kruskal’s because both of those programs call for priority queues.
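The book builds its own little priority queue class, but Python’s standard heapq module does the same basic job, so here is a tiny sketch of the idea: no matter what order you push things in, the smallest weight always comes out first.

import heapq

pq = []
heapq.heappush(pq, (3, ('A', 'C')))   # each entry is (weight, edge)
heapq.heappush(pq, (1, ('C', 'D')))
heapq.heappush(pq, (2, ('A', 'B')))

while pq:
    weight, edge = heapq.heappop(pq)  # always pops the entry with the minimum weight
    print(weight, edge)               # 1 ('C', 'D'), then 2 ('A', 'B'), then 3 ('A', 'C')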

Prim's algorithm

This algorithm takes in your starting point and builds a minimum spanning tree by determining the minimum edge at each vertex, connecting each vertex with the minimum edge, stage by stage. This algorithm deploys a greedy approach (determining the best solution at each stage, which may not necessarily lead to the best solution overall), and it also tracks which vertexes have already been visited. I did run into a problem where the program gave a minimum spanning tree including an edge that led to what I regard as a dead end; in other words, there was an extra vertex and edge that I wouldn’t have thought would be part of the minimum spanning tree. Maybe at the second vertex (level 2), the two connecting edges had the same weight and the program could not decide which one to include in the spanning tree, so it included both. The program does not get to go back to this second stage and pick which one to use once it moves on to the next stage.
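Here is my own sketch of Prim’s idea (a toy weighted graph, not the book’s listing): start from one vertex, keep a priority queue of the edges leaving the tree so far, and always add the cheapest edge that reaches a vertex not yet in the tree.

import heapq

graph = {                  # toy weighted adjacency list: vertex -> [(weight, neighbor), ...]
    'A': [(2, 'B'), (3, 'C')],
    'B': [(2, 'A'), (2, 'D')],
    'C': [(3, 'A'), (1, 'D')],
    'D': [(2, 'B'), (1, 'C')],
}

def prim(graph, start):
    visited = {start}
    candidates = [(w, start, v) for w, v in graph[start]]   # edges leaving the tree
    heapq.heapify(candidates)
    tree = []
    while candidates:
        weight, u, v = heapq.heappop(candidates)   # cheapest candidate edge
        if v in visited:
            continue                               # would loop back into the tree, skip it
        visited.add(v)
        tree.append((u, v, weight))
        for w, nxt in graph[v]:
            if nxt not in visited:
                heapq.heappush(candidates, (w, v, nxt))
    return tree

print(prim(graph, 'A'))    # [('A', 'B', 2), ('B', 'D', 2), ('D', 'C', 1)]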

Kruskal's algorithm

This program also uses a greedy approach, but it starts from a pool of sorted edges. (Prim’s starts from a given random vertex and builds from there.) This program is harder for me to grasp. I get collecting the edges into a pool and sorting them, but I don’t understand creating trees of vertexes “so that the number of trees is the same as the vertexes”. So, is this just a list of vertexes? The other issue came when I ran the program: I got a different number of edges than what is shown in the graph, namely more edges than shown in the graph.
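Here is my attempt at what “one tree per vertex” might mean (a sketch with my own toy graph, not the book’s code): every vertex starts out as its own tiny tree, the edges get sorted by weight, and each edge merges two trees unless its two ends are already in the same tree (which would create a loop).

edges = [('C', 'D', 1), ('A', 'B', 2), ('B', 'D', 2), ('A', 'C', 3)]   # already sorted by weight

parent = {v: v for v in 'ABCD'}        # each vertex starts as the root of its own tree

def find(v):
    while parent[v] != v:              # walk up to the root of v's tree
        v = parent[v]
    return v

tree = []
for u, v, weight in edges:
    root_u, root_v = find(u), find(v)
    if root_u != root_v:               # the two ends are in different trees
        parent[root_u] = root_v        # merge the two trees
        tree.append((u, v, weight))

print(tree)    # [('C', 'D', 1), ('A', 'B', 2), ('B', 'D', 2)]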

Prim's versus Kruskal's

Prim’s versus Kruskal’s: For dense graphs, Prim’s is the preferred algorithm because it starts from a given starting point and then selects the minimum edge, stage by stage. Kruskal’s first creates an ensemble of “partial solutions”, which adds time and resources for dense graphs. Kruskal’s is better for regular, sparse graphs with fewer edges.

(As best as I understand spanning trees.)


4 Finding shortest route

Shortest route is as the phrase implies: finding the shortest distance. But the authors point out other uses of shortest-path algorithms: image processing and gaming (achieving goals using the least number of moves). The authors mention three algorithms that perform the shortest-route calculation (Dijkstra’s, Bellman-Ford, and Floyd-Warshall), but the book only deals with Dijkstra’s algorithm.

Dijkstra's algorithm

This algorithm apparently involves weighted and directed graphs, and Dijkstra’s deploys priority queues. The priority queues, at least the ones in this book, sort by edge weight. Dijkstra’s also requires a starting point and can optionally have an ending point.

As usual, I’m not totally clear on how the program works. At first, I thought the program starts from the given starting point, searches for the directed edge that gives the minimum, and then proceeds on to the vertex at the end of that minimum edge, but then a question arose: what if that path did not lead to the ending point (if one is given)? When I looked at the interim answers the program provided, it looked like the program somehow kept track of the various viable paths. The program visits every node and assigns to each visited node a “distance” from the starting point. Somehow, the program is able to keep track of the directed paths, and when it reaches the end destination, it knows which path was the minimum path.
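After staring at it for a while, here is my own sketch of what I think Dijkstra’s algorithm is doing (a toy directed, weighted graph, not the book’s listing): it keeps the best known distance to every vertex, always expands the closest unsettled vertex next, and remembers each vertex’s predecessor so the shortest path can be read back at the end.

import heapq

graph = {                    # toy directed, weighted adjacency list: vertex -> [(neighbor, weight), ...]
    'A': [('B', 2), ('C', 5)],
    'B': [('C', 1), ('D', 4)],
    'C': [('D', 1)],
    'D': [],
}

def dijkstra(graph, start):
    distance = {v: float('inf') for v in graph}
    distance[start] = 0
    previous = {}                            # remembers how we reached each vertex
    pq = [(0, start)]
    while pq:
        dist, node = heapq.heappop(pq)       # closest vertex not yet settled
        if dist > distance[node]:
            continue                         # stale queue entry, skip it
        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distance[neighbor]:
                distance[neighbor] = new_dist
                previous[neighbor] = node
                heapq.heappush(pq, (new_dist, neighbor))
    return distance, previous

distance, previous = dijkstra(graph, 'A')
print(distance)    # {'A': 0, 'B': 2, 'C': 3, 'D': 4} - shortest distance from 'A' to every vertex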

I've reached the end of my post on graphs. It looks like the next chapter will cover social network graphs, so in about 2 weeks, I'll be doing a post of whatever I learn about social network graphs. Maybe by then my site editor will be fixed.
