Skip to content

A tutorial for pathway visualization using tidyverse, igraph, and ggraph

License

Notifications You must be signed in to change notification settings

cxli233/ggpathway

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ggpathway

A tutorial for pathway visualization using tidyverse, igraph, and ggraph.

Krebs cycle

DOI

Table of contents

  1. Introduction
  2. Example 1: simple linear pathway
  3. Example 2: more complex pathway
  4. Example 3: circular pathway
  5. Subsetting pathway
  6. Combinding pathways
  7. Other examples

Introduction

This markdown page describes how to make pathway diagrams using ggplot compatible functions. It requires:

  • R
  • RStudio
  • Rmarkdown, can be downloaded using install.packages("rmarkdown") in R.

Dependencies

The workflow is built upon tidyverse and igraph. Interactions between ggplot & igraph functions are achieved via ggraph.

If you want to read in excel files, you will need the readxl package.

library(tidyverse)
library(igraph)
library(ggraph)

library(readxl)

library(viridis)
library(RColorBrewer)
library(rcartocolor)

The rest of the loaded packages are for data visualization only (some nice colors in graphs).

The theory behind this workflow

To plot a pathway, we can model the pathway as a network, or a "graph" in graph theory. In mathematics, a graph is a structure that models the relationship between objects. A network can be constructed by:

  • an edge table
  • a node table

For example, we want to visualize a metabolic pathway. In this context, each metabolite is a node; each enzyme is an edge that connects the metabolites. This concept can be applied to signaling pathways as well, with modifications.

We will use tidyverse functions to handle tabular data operations regarding the edge and node tables. We will then use igraph functions to produce a network object from edge and node tables. Finally, we will use ggraph, a ggplot extension of igraph to make pretty plots.

Required input

  • Edge table - each row is an edge, with the following columns:

    • from: where the edge starts, e.g., name of metabolite (required).
    • to: where the edge ends, e.g., name of metabolite (required).
    • label: if you want the edge to be labeled, e.g., name of the enzyme.
    • other information as different columns, e.g., condition, tissue, cell types...
  • Node table - each row is a node, with the following columns:

    • name: name of the node, e.g., name of the metabolite (required).
    • x: x coordinate of the node on the graph.
    • y: y coordinate of the node on the graph.
    • other information as different columns, e.g., molecular weight, localization...

Note that the edge and node tables are tidy data frames. Each row is an observation, and each column is a variable. Also note that the union of from and to columns in the edge table should be identical to the name column of the node table.

Hopefully the above explanation will become more straightforward when do an example. Example input files can be found in the Data folder.

Example 1: simple linear pathway

We will start with a simple example, a linear pathway with 3 steps and 4 metabolites. We will use the oxidative segment of pentose phosphate pathway as an example.

This is a very short pathway, so we can actually write the tables in R by hand. We can write the tables row-by-row using the tribble() function in tidyverse.

Edge table

example1_edge_table <- tribble(
  ~from, ~to,  ~label,
  "Glc6P", "6P-gluconolactone",  "Glc6PHD",
  "6P-gluconolactone", "6P-glucoconate",  "6P-gluconolactonase",
  "6P-glucoconate", "Ru5P", "6P-gluconateDH"
)

head(example1_edge_table)

Node table

example1_nodes_table <- tribble(
  ~name, ~x,  ~y,
  "Glc6P", 1, 0,
  "6P-gluconolactone", 2, 0,  
  "6P-glucoconate", 3, 0,
  "Ru5P", 4, 0
)

head(example1_nodes_table)

Notice here I provided a manual layout; each node is given an x and y coordinate. For example, Glc6P will show up at (1, 0) on the graph and so on.

Make network object and graph

Once the node and edge tables are written, we can combined them into a network object. We use the graph_from_data_frame() function from igraph.

example1_network <- graph_from_data_frame(
  d = example1_edge_table,
  vertices = example1_nodes_table,
  directed = T
)

Note that the directed argument is set to TRUE.

Once the network object is made, we can visualize it using ggraph()

ggraph(example1_network, layout = "manual", 
      x = x, y = y) +
  geom_node_text(aes(label = name), hjust = 0.5) +
  geom_edge_link(aes(label = example1_edge_table$label), 
                   angle_calc = 'along',
                   label_dodge = unit(2, 'lines'),
                   arrow = arrow(length = unit(0.5, 'lines')), 
                   start_cap = circle(4, 'lines'),
                   end_cap = circle(4, 'lines')) +
  theme_void()  

ggsave("../Results/Pentose_1.svg", height = 2, width = 6.5, bg = "white")
ggsave("../Results/Pentose_1.png", height = 2, width = 6.5, bg = "white")

OPPP_short

And there it is! Not very sophisticated, but now we have the frame work to build more complex pathways.

Example 2: more complex pathway

For the 2nd example, let's do a more complex pathway. By more complex I mean more edges and more nodes, as well as branches. We will use the rest of the pentose phosphate pathway.

Once the pathway gets complex enough, it's better to prepare edge & node tables in Excel. Once they are written, you can load them into R.

example2_edges <- read_excel("../Data/OPPP_edges.xlsx")
example2_nodes <- read_excel("../Data/OPPP_nodes.xlsx")

head(example2_edges)
head(example2_nodes)

Important! If a compound appears multiple times in the pathway at different locations, each instance must have a different name.

In this example, Xu5P, Glyceral-3P, and Frc-6P all appear twice. So I named them {name}{1} or {name}{2}. For aesthetic purposes, we can make a new column in the node table called "label", such that different nodes can have the same label, but they must have unique names.

example2_nodes <- example2_nodes %>% 
  mutate(label = str_remove(name, "_\\d"))


head(example2_nodes)

I think we are all good to go.

example2_network <- graph_from_data_frame(
  d = example2_edges,
  vertices = example2_nodes,
  directed = T
)

For a complex pathway with multiple branch points, instead of manual layout, we can also use the layout methods provides by igraph and ggraph. Read more about layouts here.

ggraph(example2_network, layout = "kk") +
  geom_node_point(size = 3, aes(fill = as.factor(carbons)), 
                  alpha = 0.8, shape = 21, color = "grey20") +
  geom_node_text(aes(label = label), hjust = 0.5, repel = T) +
  geom_edge_link(#aes(label = example2_edges$label), 
                   #angle_calc = 'along',
                   label_dodge = unit(2, 'lines'),
                   arrow = arrow(length = unit(0.4, 'lines')), 
                   start_cap = circle(1, 'lines'),
                   end_cap = circle(2, 'lines')) +
  scale_fill_manual(values = carto_pal(7, "Vivid")) +
  labs(fill = "Carbons") +
  theme_void()  

ggsave("../Results/Pentose_2.svg", height = 5, width = 4, bg = "white")
ggsave("../Results/Pentose_2.png", height = 5, width = 4, bg = "white")

OPPP_2

That looks fine to me. I turned off the edge labels, because it's too much text to look at. We can incorporate other info on the graph, such as number of carbons each metabolite has. A purpose of the pentose phosphate pathway is to toggle between 6 or 3 carbon molecules for glycolysis and 5 carbon molecules for nucleotide biosynthesis.

Example 3: circular pathway

For the next example, let's do a circular pathway. An archtypal example is the TCA cycle, aka the Krebs cycle. Let's read in the nodes and edges.

example3_edges <- read_excel("../Data/TCA_cycle_edges.xlsx")
example3_nodes <- read_excel("../Data/TCA_cycle_nodes.xlsx")

head(example3_edges)
head(example3_nodes)

In this example, I also included co-factors (Co-enzymeA, NAD+/NADH, ATP...). Again, when a molecule appears multiple times, each instance must have unique names. For aesthetics only, let's make a label column.

example3_nodes <- example3_nodes %>% 
  mutate(label = str_remove(name, "_\\d"))


head(example3_nodes)

I did some high school math to layout the pathway around a circle.

example3_network <- graph_from_data_frame(
  d = example3_edges,
  vertices = example3_nodes,
  directed = T
)
ggraph(example3_network, layout = "manual",
       x = x, y = y) +
  geom_node_point(size = 3, aes(fill = as.factor(carbons)), 
                  alpha = 0.8, shape = 21, color = "grey20") +
  geom_edge_link(arrow = arrow(length = unit(0.4, 'lines')), 
                   start_cap = circle(0.5, 'lines'),
                   end_cap = circle(0.5, 'lines'), 
                 width = 1.1, alpha = 0.5) +
  geom_node_text(aes(label = label), hjust = 0.5, repel = T) +
  annotate(geom = "text", label = "TCA Cycle", 
           x = 0, y = 0, size = 5, fontface = "bold") +
  scale_fill_manual(values = carto_pal(7, "Vivid")) +
  labs(fill = "Carbons") +
  theme_void() +
  coord_fixed()

ggsave("../Results/TCA_1.svg", height = 4, width = 5, bg = "white")
ggsave("../Results/TCA_1.png", height = 4, width = 5, bg = "white")

TCA1

This looks fine to me. I had to play around with the line and arrow size. Maybe I was too ambitious to put all the cofactors on this.

Subsetting pathway

We can subset a pathway by removing nodes and edges.

example3_nodes_trim <- example3_nodes %>% 
  filter(carbons != "cofactor")

example3_edges_trim <- example3_edges %>% 
  filter(from %in% example3_nodes_trim$name &
           to %in% example3_nodes_trim$name)

Now re-make the network object

example3_network_trim <- graph_from_data_frame(
  d = example3_edges_trim,
  vertices = example3_nodes_trim,
  directed = T
)
ggraph(example3_network_trim, layout = "manual",
       x = x, y = y) +
  geom_node_point(size = 3, aes(fill = as.factor(carbons)), 
                  alpha = 0.8, shape = 21, color = "grey20") +
  geom_edge_link(arrow = arrow(length = unit(0.4, 'lines')), 
                   start_cap = circle(0.5, 'lines'),
                   end_cap = circle(1, 'lines'), 
                 width = 1.1, alpha = 0.5) +
  geom_node_text(aes(label = label), hjust = 0.5, repel = T) +
  annotate(geom = "text", label = "TCA Cycle", 
           x = 0, y = 0, size = 5, fontface = "bold") +
  scale_fill_manual(values = carto_pal(7, "Vivid")) +
  labs(fill = "Carbons") +
  theme_void() +
  coord_fixed()

ggsave("../Results/TCA_2.svg", height = 4, width = 5, bg = "white")
ggsave("../Results/TCA_2.png", height = 4, width = 5, bg = "white")

TCA_2

That's it!

Combining pathways

When we need to combine two pathways, the new edge and node tables are the unions of edges and nodes, respectively. This can be achieved by binding the tables as rows and then removing redundant rows using distinct(..., .keep.all = T) .

First read in data.

calvin_edges <- read_excel("../Data/Calvin_cycle_edges.xlsx")
calvin_nodes <- read_excel("../Data/Calvin_cycle_nodes.xlsx")

PR_edges <- read_excel("../Data/Photorespiration_edges.xlsx")
PR_nodes <- read_excel("../Data/Photorespiration_nodes.xlsx")

Then combine edges and remove redundant ones.

combined_edges <- rbind(
  calvin_edges, 
  PR_edges
) %>% 
  distinct(from, to, .keep_all = T)

Then combine nodes and remove redundant ones.

combined_nodes <- rbind(
  calvin_nodes %>% 
    select(-carbon),
  PR_nodes
) %>% 
  distinct( .keep_all = T)

Then make graph object.

combined_network <- graph_from_data_frame(
  d = combined_edges,
  vertices = combined_nodes,
  directed = T
)

Finally, plot!

ggraph(combined_network, layout = "kk") +
  geom_node_point(size = 3, aes(fill = localization), 
                  alpha = 0.8, shape = 21, color = "grey70") +
  geom_edge_link(label_dodge = unit(2, 'lines'),
                   arrow = arrow(length = unit(0.4, 'lines')), 
                   start_cap = circle(0.75, 'lines'),
                   end_cap = circle(0.75, 'lines'),
                 alpha = 0.5, width = 1.1, color = "grey30") +
  geom_node_text(aes(label = name), hjust = 0.5, repel = T) +
  scale_fill_manual(values = carto_pal(7, "Vivid")[c(4, 2, 5)],
                    limits = c("chloroplast", "peroxisome", "mitochondria")) +
  labs(fill = "Localization",
       title = "Calvin cycle & photorespiration") +
  theme_void() +
  theme(
    legend.position = c(0.8, 0.2)
  ) +
  scale_y_reverse()

ggsave("../Results/Calvin_PS_comb.svg", height = 4.5, width = 5.5, bg = "white")
ggsave("../Results/Calvin_PS_comb.png", height = 4.5, width = 5.5, bg = "white")

combined_pathway

Done! Example script on Calvin cycle, photorespiration, and combined can be found here.

Other examples

Pipeline/workflow visualized as network

Pipelines and workflows can be visualized as a network using ggraph.

Example pipeline

Example script for this pipeline visualization can be found here.

Signaling pathway with inhibitory edges

Signaling pathway with inhibitory edges requires additional customization, as activating and repressive interactions require distinct edge shapes. Activating interactions are usually represented by arrows (->), and repressive interactions are usually represented by bars (-|). We can use the ggarrow package to customize arrow shapes. However, it requires additional tinkering, and for small pathways, I am not sure this is more effective than making the diagram in powerpoint. Here is an example:

Ethylene signaling pathway

Example script for this signaling pathway visualization can be found here.

About

A tutorial for pathway visualization using tidyverse, igraph, and ggraph

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published