Plot read distribution across genomic features

Function to visualize the distribution of reads across different feature types for many alignment files in parallel. The plots are stacked bar plots representing the raw or normalized read counts for the sense and antisense strand of each feature. The graphics results are generated with ggplot2. Typically, the expected input is generated with the affiliated featuretypeCounts function.

plotfeaturetypeCounts(x, graphicsfile, graphicsformat = "pdf", scales = "fixed", anyreadlength = FALSE, 
                      drop_N_total_aligned = TRUE, scale_count_val = 10^6, scale_length_val = NULL)

Arguments

x: data.frame with feature counts generated by the featuretypeCounts function.
graphicsfile: Path to file where to write the output graphics. Note, the function returns the graphics instructions from ggplot2 for interactive plotting in R. However, due to the complexity of the graphics generated here, the finished results are written to a file directly.
graphicsformat: Graphics file format. Currently, supported formats are: pdf, png or jpeg. Argument accepts one of them as character string.
scales: Scales setting passed on to the facet_wrap function of ggplot2. For details see ggplot2::facet_wrap. The default fixed assures a constant scale across all bar plot panels, while free uses the optimum scale within each bar plot panel. To evaluate plots in all their details, it may be necessary to generate two graphics files one for each scaling option.
anyreadlength: If set to TRUE read length specific read counts will be summed up to a single count value to plot read counts for any read length. Otherwise the bar plots will show the counts for each read length value.
drop_N_total_aligned: If set to TRUE the special feature count N_total_aligned will not be included as a separate feature in the plots. However, the information will still be used internally for scaling the read counts to a fixed value if this option is requested under the scale_count_val argument.
scale_count_val: Scales (normalizes) the read counts to a fixed value of aligned reads in each sample such as counts per million aligned reads (default is 10^6). For this calculation the N_total_aligned values are used that are reported in the input data.frame generated by the upstream featuretypeCounds function. Assign NULL to turn off scaling by aligned reads.
scale_length_val: Allows to adjust the raw or scaled read counts to a constant length interval (e.g. scale_length_val=10^3 in bps) considering the total genomic length of the corresponding feature type. The required genomic length information for each feature type is obtained from the Featuretypelength column of the input data.frame generated by the featuretypeCount function. To turn off feature length adjustment, assign NULL (default).

Value

The function returns bar plot graphics for aligned read counts with read length resolution if the input contains this information and argument anyreadlength is set to FALSE. If the input contains counts for any read length and/or anyreadlength=TRUE then there will be only one bar per feature and sample. Due to the complexity of the plots, the results are directly written to file in the chosen graphics format. However, the function also returns the plotting instructions returned by ggplot2 to display the result components using R's plotting device.

Author

Thomas Girke

Examples

## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
#> Instance of 'SYSargs2':
#>    Slot names/accessors: 
#>       targets: 18 (M1A...V12B), targetsheader: 4 (lines)
#>       modules: 1
#>       wf: 0, clt: 1, yamlinput: 7 (inputs)
#>       input: 18, output: 18
#>       cmdlist: 18
#>    Sub Steps:
#>       1. hisat2-mapping-se (rendered: TRUE)
#> 
#> 

if (FALSE) {
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(GenomicFeatures)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0, verbose=TRUE)

## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=FALSE)

## Generate and plot feature counts for any read length  
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=TRUE)
}

Arguments

Value

Author

See also

Examples