3C data, which measure the interaction frequency between two genomic coordinates, can be represented graphically in different ways. Often, simple barplots are used and cartoons are drawn to indicate the genomic position of anchors and restriction fragments of interest. However, this can be tedious and error-prone. 3C data can also be represented as arcs spanning from the anchor (aka viewpoint) to the regions of interest (aka restriction fragments), which gives a precise illustration of the genomic context. In these plots, the height of each arc represents the interaction frequency. A published example of such a plot can be found here. To my knowledge, there is no R package that conveniently generates this kind of plot. Below is my ggplot solution to generate ‘3C arc plots’. Feel free to email me with any questions, improvements or comments.
Load packages and set ggplot themes
# load required packages
library(tidyverse)

# these are general theme settings I commonly use for publication-grade plots
theme_linedraw_noframe <- theme_linedraw() +
  theme(panel.grid = element_blank(),
        panel.border = element_blank(),
        axis.line = element_line(),
        legend.justification = c(1, 1),
        legend.position = c(1, 1)) # move the legend into the plot
The dummy dataset is an example of a 3C experiment you would do in the lab: from one specific anchor, the interaction frequencies to 4 different fragments of interest are tested in two different conditions (control and knock-down). For each condition we have 3 replicates. If you would like to follow along, download the dummy data here.
# read 3C data
df <- read_tsv("../downloads/2020-01-03-3C-dummy-data.txt")
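If you cannot download the file, a stand-in with the same layout might look like the sketch below. The column names assay.name, anchor.coord and fragment.coord are taken from the code in this post; the sample column names (condition, a dot, then the replicate number), the "kd" prefix for the knock-down and every number are assumptions made up for illustration, not the real dummy data.

```r
# Hypothetical stand-in for the downloadable dummy data.
# All coordinates and frequencies are invented for illustration only.
df <- data.frame(
  assay.name     = c("frag1", "frag2", "frag3", "frag4"),
  anchor.coord   = c(1000, 1000, 1000, 1000),      # one shared anchor
  fragment.coord = c(21000, 41000, 61000, 81000),  # four fragments of interest
  control.1 = c(0.90, 0.60, 0.30, 0.10),
  control.2 = c(1.00, 0.70, 0.35, 0.15),
  control.3 = c(1.10, 0.65, 0.25, 0.12),
  kd.1      = c(0.40, 0.30, 0.20, 0.10),           # "kd" is an assumed name
  kd.2      = c(0.50, 0.35, 0.15, 0.08),
  kd.3      = c(0.45, 0.25, 0.18, 0.11)
)

# The condition is whatever precedes the first dot in the sample name
sample.cols <- setdiff(names(df), c("assay.name", "anchor.coord", "fragment.coord"))
conditions <- unique(gsub("\\..*", "", sample.cols))
print(conditions)
```

One row per assay (anchor - fragment pair) and one column per replicate is the wide layout that the gather step expects.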
The following steps first convert the data to long format (generally required for plotting with ggplot) using tidyr’s gather. We then generate the midpoint between anchor and fragment of interest: this is the coordinate at which the interaction frequency data will be plotted. We then summarize the data from all three replicates in each group for each assay (i.e. anchor - fragment pair) using dplyr’s summarise.
To plot a nice arc from anchor, over midpoint to fragment of interest coordinates, we need to ‘add back’ the anchor and fragment rows to the long format (this is done by another gather command) and set the interaction frequency at these points to 0 (once you look at the final plot, this will make sense).
Because geom_line needs groups to know which points should be connected, a new column combining condition and assay.name is generated.
# gather to convert from wide to long format
dfg <- df %>%
  gather(-assay.name, -anchor.coord, -fragment.coord, key = "sample", value = "frequency")

# extract condition from sample name
dfg$condition <- gsub("\\..*", "", dfg$sample)

# calculate midpoint between each anchor and fragment coordinate
# (the interaction frequency is going to be plotted over this coordinate)
dfg$mid.coord <- (dfg$fragment.coord + dfg$anchor.coord) / 2

# summarize data
dfs <- dfg %>%
  group_by(condition, anchor.coord, fragment.coord, mid.coord, assay.name) %>%
  summarise(mean.frequency = mean(frequency), sd.frequency = sd(frequency))

# 'add back' anchor and fragment rows using gather and rename key / points
dfsg <- dfs %>%
  gather(-condition, -mean.frequency, -sd.frequency, -assay.name, key = point, value = coord) %>%
  mutate(point = gsub("\\.coord$", "", point))

# set anchor and fragment frequency to 0:
dfsg$mean.frequency[dfsg$point %in% c("anchor", "fragment")] <- 0

# add group column (for ggplot's geom_line)
dfsg$group <- paste0(dfsg$condition, dfsg$assay.name)
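gather still works, but tidyr now recommends pivot_longer for this kind of reshaping. As a rough sketch, the wide-to-long and summary steps could be written as follows. The small df_demo table and its values are hypothetical, and the code assumes that every sample name contains exactly one dot separating condition and replicate.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-assay table in the same wide layout (values invented)
df_demo <- data.frame(
  assay.name     = c("frag1", "frag2"),
  anchor.coord   = c(1000, 1000),
  fragment.coord = c(21000, 41000),
  control.1 = c(1.0, 0.6),
  control.2 = c(1.2, 0.8),
  kd.1      = c(0.4, 0.3),
  kd.2      = c(0.6, 0.5)
)

# Wide to long: split "control.1" into condition = "control", replicate = "1"
dfg_demo <- df_demo %>%
  pivot_longer(
    cols      = -c(assay.name, anchor.coord, fragment.coord),
    names_to  = c("condition", "replicate"),
    names_sep = "\\.",
    values_to = "frequency"
  ) %>%
  mutate(mid.coord = (anchor.coord + fragment.coord) / 2)

# Mean and SD per condition and assay; .groups = "drop" returns an ungrouped table
dfs_demo <- dfg_demo %>%
  group_by(condition, assay.name, anchor.coord, fragment.coord, mid.coord) %>%
  summarise(mean.frequency = mean(frequency),
            sd.frequency   = sd(frequency),
            .groups = "drop")

print(dfs_demo)
```

Splitting the sample name inside pivot_longer replaces the separate gsub step, at the price of the one-dot assumption on the sample names.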
3C arc plot
The basic principle here is to use ggplot’s stat_smooth to plot a second-degree polynomial regression line from the anchor over the midpoint to the fragment of interest. Because each arc is defined by exactly three points, the fitted parabola passes through all of them. Note that the interaction frequency at the anchor and at the fragment is set to 0. The actual interaction frequency is plotted at the midpoint.
p <- ggplot(dfsg, aes(x = coord, y = mean.frequency, color = condition, group = group)) +
  geom_line(stat = "smooth", method = "lm", formula = y ~ poly(x, 2), se = FALSE,
            lineend = "round", size = 2, alpha = 0.8) +
  scale_color_manual(values = c("firebrick", "grey40")) + # pick colors manually
  labs(y = "Relative interaction\nfrequency", x = "genomic coordinate", color = "") + # rename axis labels
  theme_linedraw_noframe # these are the theme settings defined above
p
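To see why this produces an arc that peaks at the measured value, here is a small base R sketch of the same fit for a single arc, outside ggplot. The coordinates and the frequency of 0.8 are made up for illustration.

```r
# Three points of one hypothetical arc: anchor, midpoint, fragment
arc <- data.frame(
  coord          = c(1000, 11000, 21000),
  mean.frequency = c(0, 0.8, 0)  # 0 at both ends, measured value at the midpoint
)

# Same model as in the plot: second-degree polynomial regression
fit <- lm(mean.frequency ~ poly(coord, 2), data = arc)

# Evaluate the fitted curve at a few coordinates along the arc
grid <- data.frame(coord = seq(1000, 21000, length.out = 5))
pred <- predict(fit, newdata = grid)
print(round(pred, 3))
```

With three points and three coefficients the parabola is determined exactly, so the curve is 0 at the anchor and the fragment and reaches the mean interaction frequency at the midpoint.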
3C arc plot with errorbars
If you like, you can add error bars to this plot. Depending on how many fragments you are looking at, this could get messy, though.
p + geom_errorbar(data = dfsg %>% filter(point == "mid"),
                  mapping = aes(x = coord,
                                ymin = mean.frequency - sd.frequency,
                                ymax = mean.frequency + sd.frequency,
                                color = condition),
                  inherit.aes = FALSE, size = 0.5)
3C barplot with arcs
Of course, 3C data can also be represented as simple barplots. Here is a barplot with the 3C arcs added in. Arcs are drawn only for the most important condition. This could be useful, for example, when you have multiple conditions that would overcrowd the 3C arc plot above.
ggplot() +
  geom_line(data = dfsg %>% filter(condition == "control"),
            mapping = aes(x = coord, y = mean.frequency, color = condition, group = group),
            stat = "smooth", method = "lm", formula = y ~ poly(x, 2), se = FALSE,
            lineend = "round", size = 2, alpha = 0.8) +
  geom_bar(data = dfs,
           mapping = aes(x = fragment.coord, y = mean.frequency, fill = condition),
           stat = "identity", position = position_dodge()) +
  scale_color_manual(values = c("firebrick", "grey40")) + # pick colors manually
  scale_fill_manual(values = c("firebrick", "grey40")) + # pick colors manually
  labs(y = "Relative interaction\nfrequency", x = "genomic coordinate", fill = "") + # rename axis labels
  theme_linedraw_noframe + # these are the theme settings defined above
  guides(color = "none") # hide the color legend (the fill legend already shows the conditions)
Please feel free to email me with any questions, comments or suggestions and I’ll be happy to post them here.
info at jchellmuth.com