Gephi is an open-source program for creating network diagrams. A network diagram visualizes how data points are connected and can be grounded (such as showing how universities are connected on a map, with each node resting on the college's location) or abstract (such as a diagram showing which actors work together). Network diagrams have a wide range of applications, such as social networks, tracking the spread of diseases, and mapping collaborations. Once the user selects how the data should be connected, Gephi uses layout algorithms to place each data point and build the connections between them. Gephi includes a number of layout algorithms and lets you modify many of the visual aspects of the diagram, which gives you many ways to show your data. You need to know how the data you are working with is structured and what you wish to visualize. Gephi has requirements for how the data must look before it can be imported, but there are plug-ins available that can structure your data automatically to ease this process.

Image from: http://www.martingrandjean.ch/digital-humanities-on-twitter/

Data and Guide Overview:
This data and guide were adapted from an NLI Data Essentials Workshop, "Introduction to Visualizing Social and Geographical Networks with Gephi", run by Michael Stamper and Jonathan Briganti. As the workshop name suggests, we will be building a social diagram and a geographical network using data that shows how frequently different European cities are associated with one another. This data has already been cleaned so that it is ready to enter Gephi, so a portion of this guide will go over how the CSV files should be structured. The necessary data files are linked below.
Data Structure:
This step is not necessary for the data we will be using today, but will be needed when you build your own network diagrams. The two most crucial terms to know are nodes and edges. A node is a single data point on your diagram, shown as a dot in the network above. The edges are the lines stemming out of the nodes and show the connections in your data. Gephi expects (unless you're using a plug-in) that the nodes and edges tables are in separate spreadsheet files (CSV and Excel are most common), and the two are organized slightly differently.
Nodes Table: The types of information that can be added to a nodes table are listed below, with a small example table. The one crucial column in the nodes table is "Id".
- "Id": Each unique ID creates an additional node, and IDs do not have to follow any specific naming convention. This means your IDs can be "1", "2", "3", "4", and so on, or "Apple", "Pear", "Banana", and so on. This column must be titled "Id".
- "Label": Allows you to specify a label for each node if you do not wish to use the ID. This column must be titled "Label".
- Attributes: Let you record additional information about each node. This can be used to change the node shape, color, size, or location later. You can add as many attributes to your data as necessary. These columns do not have a specific naming convention.
Id | Label | Fruit/Vegetable | Color |
---|---|---|---|
1 | Apple | Fruit | red |
2 | Orange | Fruit | orange |
3 | Lettuce | Vegetable | green |
4 | Carrot | Vegetable | orange |
5 | Strawberry | Fruit | red |
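A nodes table like the example above can also be generated with a short script instead of being typed by hand. The following is a minimal Python sketch and was not part of the original workshop. The `NODES` list and the `nodes_to_csv` helper are hypothetical names chosen for illustration, and the duplicate-Id check is an added precaution of this sketch.

```python
import csv
import io

# Hypothetical node records mirroring the example nodes table.
NODES = [
    {"Id": "1", "Label": "Apple", "Fruit/Vegetable": "Fruit", "Color": "red"},
    {"Id": "2", "Label": "Orange", "Fruit/Vegetable": "Fruit", "Color": "orange"},
    {"Id": "3", "Label": "Lettuce", "Fruit/Vegetable": "Vegetable", "Color": "green"},
    {"Id": "4", "Label": "Carrot", "Fruit/Vegetable": "Vegetable", "Color": "orange"},
    {"Id": "5", "Label": "Strawberry", "Fruit/Vegetable": "Fruit", "Color": "red"},
]


def nodes_to_csv(nodes):
    """Return CSV text with "Id" and "Label" first, then the attribute columns."""
    ids = [node["Id"] for node in nodes]
    if len(ids) != len(set(ids)):
        # Assumption of this sketch: every row should describe a distinct node.
        raise ValueError("Each node needs a unique Id")

    # Attribute columns keep whatever names the data already uses.
    attribute_columns = [key for key in nodes[0] if key not in ("Id", "Label")]
    fieldnames = ["Id", "Label"] + attribute_columns

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(nodes)
    return buffer.getvalue()


nodes_csv = nodes_to_csv(NODES)
print(nodes_csv)
```

Writing `nodes_csv` to a file (for example with `open("nodes.csv", "w", newline="")`, where the file name is your choice) gives a file with the required "Id" and "Label" column titles.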
Edges Table: The types of information that can be added to an edges table are listed below, with a small example table. The crucial columns in the edges table are "Source" and "Target".
- "Source": Specifies which node "Id" the edge originates from. This column must be titled "Source".
- "Target": Specifies which node "Id" the edge connects to. This column must be titled "Target".
- "Label": Allows you to specify a label for each edge. This column must be titled "Label".
- Attributes: Let you record additional information about each edge. You can add as many attributes to your data as necessary. These columns do not have a specific naming convention.
- "Type": Used to specify if the edge is "Directed" or "Undirected". A directed edge will have an arrow leading from the source to the target.
Source | Target | Label |
---|---|---|
1 | 2 | Fruit Salad |
1 | 5 | Fruit Salad |
2 | 5 | Orange-Strawberry Salad |
3 | 4 | Salad |
3 | 2 | Orange-Strawberry Salad |
3 | 5 | Orange-Strawberry Salad |
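The edges table can be scripted in the same way. The sketch below is a hedged illustration, not something taken from the workshop: `NODE_IDS`, `EDGES`, and `edges_to_csv` are hypothetical names, and the check that every "Source" and "Target" matches a known node "Id" is an extra safeguard added here to catch typos before import.

```python
import csv
import io

# Node Ids from the example nodes table (hypothetical variable name).
NODE_IDS = {"1", "2", "3", "4", "5"}

# (Source, Target, Label) rows copied from the example edges table.
EDGES = [
    ("1", "2", "Fruit Salad"),
    ("1", "5", "Fruit Salad"),
    ("2", "5", "Orange-Strawberry Salad"),
    ("3", "4", "Salad"),
    ("3", "2", "Orange-Strawberry Salad"),
    ("3", "5", "Orange-Strawberry Salad"),
]


def edges_to_csv(edges, node_ids, edge_type="Undirected"):
    """Return CSV text with Source, Target, Label, and Type columns."""
    # Collect any edge endpoint that does not match a node Id.
    unknown = sorted(
        {end for source, target, _ in edges for end in (source, target)
         if end not in node_ids}
    )
    if unknown:
        raise ValueError(f"Edges reference unknown node Ids: {unknown}")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label", "Type"])
    for source, target, label in edges:
        writer.writerow([source, target, label, edge_type])
    return buffer.getvalue()


edges_csv = edges_to_csv(EDGES, NODE_IDS)
print(edges_csv)
```

Passing `edge_type="Directed"` instead would mark every row as a directed edge.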
Adding Additional Plug-Ins:
Gephi has an active community of developers who add useful plugins that can enhance your networks. We will be using "GeoLayout", which allows us to plot our points based on longitude and latitude (additional attribute columns included in the data). Each plugin has a specific use case and comes with its own documentation. As Gephi is open source and the plugins are developed by dedicated users, they may have additional steps or bugs that need to be accounted for. Installing the plugin only needs to be done once.
- Close the "Welcome to Gephi" window that appears when you open Gephi.
- At the top of your Gephi window click on "Tools" then "Plugins"
- Tools, Plugins

- In the window that appears you will be able to see what plugins you have installed, what needs updates, and what is available. Click on "Available Plugins (#)" and search for "GeoLayout." Click on the checkbox next to "GeoLayout" then "Install", and proceed through the Installer wizard that appears.
- Available Plugins (#), GeoLayout, Install

- Once you have installed the plugin and moved through the wizard, restart Gephi.
Importing Data Tables:
Creating a Geographical Layout
You've now made it past the data entry step and can begin building your visuals. This is where you will determine the layout of the network through various layout algorithms. We will start with a network graphic composed of cities, as that helps ensure the nodes are in a strict layout.
- If you do not see a certain window (like Layout or Appearance), there is a menu at the top of Gephi (or at the top of your screen if you use a Mac) called "Window" that lets you show or hide specific windows. If you cannot find something, it may be hidden. If you do not see the "Geo Layout" layout, make sure you installed it properly (steps above).
- In the Geo Layout fields, Latitude and Longitude should already have the proper attributes selected (named "latitude" and "longitude" respectively). If your columns are not named in a way that lets Gephi assign them automatically, use the drop-down menus to select the proper columns. Be sure that the fields you use are imported as "Double" in Gephi.
- We use "Noverlap" (no overlap) after "Geo Layout" because it ensures that no point overlaps another. As these coordinates typically center on one location in the city, there will be a large number of points hidden in our network; "Noverlap" forces each node to occupy a distinct spot.
- At any time during these steps you can look at the "Preview" window to see how your visual will look when completed. The graph window shows the edges as straight lines, and won't look as fancy as the final product. If you see nothing in the preview window, click "Refresh" in the bottom left.
|
- SAVE. Save, save; if you think about it, you should save. There is no undo in Gephi. That does not mean that one wrong keystroke will force you to discard your visual completely, but it does mean you need to be somewhat intentional in what you do. If you are planning on testing a bunch of different options for your visual, it is much easier to save a copy and experiment knowing you can jump back at any time. We will be using the data you entered for two different visuals, so saving now will stop you from having to re-enter it. To save, click on "File" at the top, then "Save".
- File, Save

- Step two is making sure you did step one. You saved, right?
- Great, now that you've saved (check steps one and two), we can choose our layout. In the layout window, typically in the bottom left of the page, click the drop-down menu that shows "Choose a Layout" and select "Geo Layout". Once you do so, the layout window will populate with new fields.
- Choose a Layout, Geo Layout
 
- As we are dealing with such a large geographic area, we will want to increase the scale. This will expand the visual to a wider total area, ensuring the points aren't the jumbled mess they currently are. Click the "1000.0" next to scale and change it to "20000". Once you have done this, click "Run".
- Change Scale to 20000, Run

- In the graph window, click the magnifying glass in the bottom left corner to center the network diagram. In the image below it is shown with an orange box.
- Center Diagram

- Change the Layout to "Noverlap" and click "Run" again. The different settings will change how quickly Gephi carries out this step, with higher values equating to more processing. The current settings will be fine for this visual.
- Change Layout to Noverlap, Run

- Now that we have our layout in place we can focus on modifying the appearance of the nodes and edges.
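To build some intuition for what the scale setting does, the sketch below projects latitude and longitude onto flat x/y coordinates with a standard Mercator formula and multiplies the result by a scale factor. This is an illustration only and is not the GeoLayout plugin's actual code. The function names, the choice of projection, and the two sample coordinates (approximate values for Paris and Berlin) are assumptions made for this example.

```python
import math


def mercator_xy(latitude, longitude, scale=1000.0):
    """Project degrees of latitude/longitude to flat x/y, multiplied by scale."""
    lat = math.radians(latitude)
    lon = math.radians(longitude)
    x = scale * lon
    y = scale * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Approximate coordinates, used only as sample input.
PARIS = (48.8566, 2.3522)
BERLIN = (52.5200, 13.4050)

gap_at_1000 = distance(mercator_xy(*PARIS, scale=1000.0),
                       mercator_xy(*BERLIN, scale=1000.0))
gap_at_20000 = distance(mercator_xy(*PARIS, scale=20000.0),
                        mercator_xy(*BERLIN, scale=20000.0))

print(f"Gap between the two cities at scale 1000:  {gap_at_1000:.1f}")
print(f"Gap between the two cities at scale 20000: {gap_at_20000:.1f}")
```

In this sketch, raising the scale from 1000 to 20000 makes every gap between nodes twenty times larger, which is why a larger scale leaves more room between cities that sit close together.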
Modifying Appearances
This step is part creative and part data-driven. Because network diagrams are abstract representations of data, the developer must make sure they can be easily understood. That may involve adding color, labels, an underlying map, additional graphs, or experimenting with different layouts. As you become familiar with network diagrams, think about the story you are trying to tell. Use the diagram, and any additional components, to make sure that story is fully fleshed out. It takes practice, but if you are familiar with the data it will never feel overwhelming. Below is a quick overview of the Appearance tab and the options within it. The Nodes/Edges selector is straightforward, but does not look like a button. You can tell whether you're modifying Nodes or Edges by checking whether the size icon (the nested circles in the blue box) is present. When in doubt, click again on the name of what you'd like to modify.
This box may change slightly depending on what you are trying to edit, meaning not every option is available for every edit.
- Unique: allows the user to manually select the appearance type independent of a data attribute.
- Partition: partitions the data using the attributes you made before importing your datasets. For instance, you can color the nodes differently for every city.
- Ranking: typically creates a spectrum ranging from the lowest to the highest value of a specific attribute. Degree, In-Degree, and Out-Degree rank the data based on how often a node is connected, how often an edge ends at a specific node, and how often a connection stems from a specific node, respectively.
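The three degree measures can be counted directly from an edges table. The sketch below is a small illustration rather than Gephi's own code: it treats each edge as directed from "Source" to "Target", reuses the (Source, Target) pairs from the example edges table, and the name `degree_counts` is hypothetical.

```python
from collections import Counter

# (Source, Target) pairs from the example edges table.
EDGES = [("1", "2"), ("1", "5"), ("2", "5"), ("3", "4"), ("3", "2"), ("3", "5")]


def degree_counts(edges):
    """Return (degree, in_degree, out_degree) counters keyed by node Id."""
    # Out-degree: how many edges start at the node (it is the Source).
    out_degree = Counter(source for source, _ in edges)
    # In-degree: how many edges end at the node (it is the Target).
    in_degree = Counter(target for _, target in edges)
    # Degree: every edge touching the node, regardless of direction.
    degree = in_degree + out_degree
    return degree, in_degree, out_degree


degree, in_degree, out_degree = degree_counts(EDGES)
for node_id in sorted(degree):
    print(node_id, degree[node_id], in_degree[node_id], out_degree[node_id])
```

A node that never appears as a "Target" simply reports an in-degree of 0, because a `Counter` returns 0 for missing keys.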
Similar to the other boxes, these do not instinctively look like buttons. Going from left to right, they allow you to edit the color, size, font type, and font size. If you are editing edges, you will not be able to modify the size of the edge.
- Click on "Nodes" in the top left, then the color option in the top right (blue box) of the appearance window, then "Partition". Using the drop-down menu, select "city"; a list of all cities will appear, along with the percentage of nodes that have each city attribute. By default, Gephi will color the top 8 cities, but that can be changed with the "Palette..." button that appears in the bottom right of the window. Click "Apply".
- Color nodes based on city partition

- To change the size of the nodes, click the size option (nested circles in the top right), then "Ranking". Choose "Degree" as the attribute, and change the minimum size to 10 and the maximum to 50. I would recommend re-running the "Noverlap" layout to ensure the re-sized nodes are all still visible.
- Size based on degree ranking

- Go to the preview window and click "Refresh" in the bottom left corner. You should now have a lovely network diagram! In the preview window there are a lot more modifications you can make, such as showing labels, changing background, changing the edge curvature, and more. These settings are best learned by using, so I recommend clicking through them to see how they change the visual (after you save, of course). A picture of my finished visual is shown below.
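The Partition and Ranking steps above can be pictured as two small calculations: counting what share of nodes falls into each category, and stretching a numeric attribute between a minimum and maximum size. The sketch below assumes plain linear interpolation, which may not match Gephi's internals exactly. The function names and sample values are hypothetical.

```python
from collections import Counter


def partition_shares(attribute_by_node):
    """Percentage of nodes per attribute value, most common first."""
    counts = Counter(attribute_by_node.values())
    total = len(attribute_by_node)
    return {value: 100.0 * count / total for value, count in counts.most_common()}


def rank_sizes(value_by_node, min_size=10.0, max_size=50.0):
    """Map each node's value linearly onto the range min_size..max_size."""
    low = min(value_by_node.values())
    high = max(value_by_node.values())
    if high == low:
        # Assumption of this sketch: identical values all get the minimum size.
        return {node: min_size for node in value_by_node}
    span = high - low
    return {
        node: min_size + (value - low) / span * (max_size - min_size)
        for node, value in value_by_node.items()
    }


# Hypothetical sample data: a city attribute and a degree per node.
cities = {"n1": "Paris", "n2": "Paris", "n3": "Berlin", "n4": "Rome"}
degrees = {"n1": 1, "n2": 3, "n3": 5, "n4": 3}

shares = partition_shares(cities)
sizes = rank_sizes(degrees)
print(shares)
print(sizes)
```

With a minimum of 10 and a maximum of 50, the least connected node is drawn at size 10, the most connected at size 50, and everything else falls proportionally in between.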

Creating a Second Layout:
You should now be generally familiar with Gephi and some of its functions. There is a lot more that can be done, but the foundation involves most of what we covered. We will use the same data to look at other possible layouts that don't rely on coordinates. This step will have less detail than before, as it uses much of the same general process.
- Choose "Fruchterman Reingold" as the layout and change the area to 20000. This is done to space the nodes out over a wider area than the starting jumble, creating cleaner completed visuals. Plus, it's fun to watch. Run the algorithm until the points look fairly settled, meaning there's only minor movement in your network.
- Layout, Fruchterman Reingold, change area to 20000, run

- To better see our nodes, click the size option (nested circles) in the appearance window, then ranking, and choose "Degree" from the drop down menu. Change the minimum to 10 and the maximum to 100 and click Apply.
- Size nodes on degree ranking

- Change the layout to "Force Atlas 2". This layout groups nodes based on how connected they are to one another, meaning nodes with a high number of shared edges attract and those without repel. In the options that appear, change the Scaling to 50, click the box next to "Prevent Overlap", then click "Run". Click "Stop" once the nodes seem mostly settled. You can recenter the graph using the magnifying glass in the graph window.
- Layout to Force Atlas 2, change scaling to 50 and prevent overlap

- Change any additional settings until you are satisfied with the network in the preview window. I colored the nodes based on city again and changed the background to black (which isn't ideal for papers or most websites, but looks cool). My network diagram is shown below.
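The idea behind force-directed layouts like the ones used above (connected nodes pull together, all nodes push apart) can be sketched in a few lines. The code below is a simplified, generic spring layout in the style of Fruchterman and Reingold. It is not Gephi's implementation of either layout, and the `area` parameter, the cooling rate, and all names are assumptions made for this illustration.

```python
import math
import random


def force_layout(nodes, edges, area=20000.0, iterations=200, seed=42):
    """Return {node: (x, y)} after a simple force-directed simulation."""
    rng = random.Random(seed)           # fixed seed so runs are repeatable
    side = math.sqrt(area)
    k = math.sqrt(area / len(nodes))    # preferred distance between nodes
    pos = {n: [rng.uniform(0, side), rng.uniform(0, side)] for n in nodes}
    temperature = side / 10             # largest move allowed per step

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction: nodes joined by an edge pull together.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, capped by the temperature, then cool down.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.95
    return {n: (p[0], p[1]) for n, p in pos.items()}


def mean_distance(layout, pairs):
    """Average straight-line distance between the given node pairs."""
    return sum(math.dist(layout[a], layout[b]) for a, b in pairs) / len(pairs)


# Hypothetical sample graph: two triangles with no edges between them.
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]

layout = force_layout(NODES, EDGES)
print("connected pairs:  ", round(mean_distance(layout, EDGES), 1))
print("unconnected pairs:", round(mean_distance(layout, [("a", "d"), ("b", "e"), ("c", "f")]), 1))
```

The shrinking `temperature` plays the same role as watching the nodes settle in Gephi: early steps move nodes a lot, later steps only make small adjustments.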

