# Humanities Intensive Learning + Teaching, Day 2, Maryland Institute for Technology in the Humanities, #hilt2014

**Posted:**August 8, 2014

**Filed under:**Education |

**Tags:**betweenness, centrality, cMOOC, data visualisation, Elijah Meeks, Florence, graphs, HILT, HILT 2014, Medicis, MITH, MOOC, network, network measurements, network visualisation, visualisation, xMOOC Leave a comment

In Day 2, we looked at using the Gephi tool itself, along with the nature of how networks are tied together, looking at network degree and how we could calculate paths through the network. It’s probably important to talk about some of the key concepts of how we measure connectedness in a network, and the relevant importance of nodes.

The *degree* of a node is the number of links that connect to it. If we care about whether the links have some idea of direction associated with them then we might split this into *in-degree* and *out-degree*, being the number of links going in and the number of links going out. What do we mean by direction? Consider Facebook. If you and I are going to be Friends then we both have to agree to be Friends – I can’t be a friend unless you want to a friend to. This is an *undirected *arrangement and a connection between us implies that both us have an equal connection to each other. Now think about unrequited love: Duckie loves Andie but Andie loves Blane. In this case, the love is *directed*. Just because Duckie loves Andie, it doesn’t mean that Andie loves Duckie.

**ALTHOUGH IT SHOULD, JOHN HUGHES!!!**

(An amusing aside for network/maths people is that, sadly, love is not transitive or “Pretty in Pink” would have been way ahead of its time.)

One of the other things that we care about in networks is the relative importance of nodes in terms of how many other nodes they are connected to and what this means in terms of the *paths *we take through the network. When we talk about paths, we usually mean the shortest path, where we start somewhere and go through the minimum number of intermediate points until we get to the destination. We don’t run around in circles. We don’t go one way and then go another. This is important because the paths through a network can quickly identify the important nodes and, therefore, the links between them that are the most travelled thoroughfares.

In this world, we start to worry about the *centrality* of a node, which indicates how important it is by looking at how many other nodes it is connected to or how many other nodes have to use it to get to other places in the network. This means that we have to talk about *betweenness*, which measures how many times the shortest paths that traverse a network *have* to go through a node. By calculating the betweenness of every node, for every path, we can work out which of the elements in our network see the most traffic.

In the case of the Medicis, from yesterday, all roads in Florence lead to the Medicis, the family with the highest betweenness rather than the family with the most money or fame. The Medicis are an extreme case because they occupy their position of importance as the *only** *bridge (broker) between certain families.

If a network is made of highly connected elements and all of the betweenness is the same then no-one has an advantage. If your network can be effectively split into two highly connected groups, with a smaller number of high-betweenness elements linking them, then you are seeing a separation that may mean something significant in your particular network domain. From a power perspective, the high betweenness brokers now have the potential to have much more influence if they charge for transit and transform information that traverses them.

One of the things about creating a network from data is that the network we create may not necessarily model reality in a way that answers the questions we’re interested in, but by looking at the network and trying to work out if it’s got some different structure at the macro, meso and micro scale, then that might give us hints as to how to analyse it, to further develop our understanding of the area that we’re modelling with this network.

I’ve written about the difference between the real world and the model before, but let’s just say that “the map is not the territory” and move on. In terms of the structure of networks, while many people assume that the distribution of nodes and links associated with them would end up in some sort of Normal distribution, the truth is that we tend to see a hub-spoke tendency, where there are lots of nodes with few links and fewer nodes with lots of links. When we start to look at the detailed structure rather than the average structure, we can even talk about uniform the structure is. If a network looks the same across itself, such as the same number of connections between nodes, then it’s what we call *assortative*. If we have small clusters of highly connected nodes joined to other clusters with sparse links, then we’d thinking of it as *disassortative*. Suddenly, we are moving beyond some overall statistics to look at what we can really say about the structure of a network.

There’s also a phenomenon known as the *Matthew Effect*, where links and nodes added to a network tend to connect to better connected nodes and, as the network grows, nodes with more connections just get more connected – just like life.

Apart from a lot of Gephi, we finished the day by looking at paths in more detail, which allow us to talk about the *network diameter*, the size of the largest shortest path in the network. (Remembering that a shortest path contains no loops or false starts, so the largest shortest path shows you the largest number of unique nodes you can visit in the network and gives you an idea of how much time you can waste in that network. 🙂 )

There are some key concepts here, where having *redundant* paths in a network allows us to survive parts of a network going away (whether computer or social) and having redundant paths of the same quality allows us to endure loss without seeing a significant change in the shortest paths in the network. These are all concepts that we’re familiar with in real life but, once we start to measure them in our network models, we find out the critical components in our data model. If we’re modelling student forums, is there one person who is brokering all of the communication? What happens if she gets sick or leaves the course? Now we have numbers to discuss this – for more, tune in to the next instalment!