featuretypeCounts.Rd
Counts how many reads in short read alignment files (BAM format) overlap with
entire annotation categories. This utility is useful for analyzing the
distribution of the read mappings across feature types, e.g. coding versus non-coding
genes. By default the read counts are reported for the sense and antisense
strand of each feature type separately. To minimize memory consumption, the
BAM files are processed in a stream using utilities from the Rsamtools
and GenomicAlignment
packages. The counts can be reported for each
read length separately or as a single value for reads of any length.
Subsequently, the counting results can be plotted with the associated
plotfeaturetypeCounts
function.
featuretypeCounts(bfl, grl, singleEnd = TRUE, readlength = NULL, type = "data.frame")
BamFileList
object containing the paths to the target BAM files stored
on disk. Note, memory-efficient processing is achieved by streaming through
BAM files rather than reading entire files into memory at once. The maximum
number of alignments to process in each iteration is determined by the
yieldSize
value passed on to the BamFileList
function. For
details see ?BamFileList
.
GRangesList
object containing in each list component the range set of
a feature type. Typically, this object is generated with the genFeatures
function. For details see the example section below and ?genFeatures
.
Specifies whether the targets BAM files contain alignments for single-end (SE) or paired-end
read data. TRUE
is for SE and FALSE
for PE data.
Integer vector specifying the read length values for which to report counts
separately. If readlength=NULL
the length of the reads will be ignored
resulting in a single value for each feature type and strand. Note, for PE data
the two reads in a pair may differ in length. In those cases the length of the
two reads is averaged and then assigned to the corresponding length category
after rounding the mean length to the closest integer. This is not an ideal
solution but a reasonable compromise for the purpose of the summary statistics
generated by featuretypeCounts
.
Determines whether the results are returned as data.frame
(type="data.frame"
)
or as list
(type="list"
). Each list component contains the counting
results for one BAM file and is named after the corresponding sample. The
data.frame
result contains this sample assignment information in a separate
column.
The results are returned as data.frame
or list
of data.frames
.
For details see above under types
argument. The result data.frames
contain
the following columns in the given order:
Sample names obtained from BamFileList
object.
Sense or antisense strand of read mappings.
Name of feature type provided by GRangesList
object. Note, the total number
of aligned reads is reported under the special feature type 'N_total_aligned'. This value
is useful for scaling/normalization purposes in plots, e.g. counts per million reads.
Total genomic length of each reduced feature type in bases. This value is useful to normalize the read counts by genomic length units, e.g. in plots.
Counts for reads of any length or for individual read lengths.
plotfeaturetypeCounts
, genFeatures
## Construct SYSargs2 object from param and targets files
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl",
input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args
#> Instance of 'SYSargs2':
#> Slot names/accessors:
#> targets: 18 (M1A...V12B), targetsheader: 4 (lines)
#> modules: 1
#> wf: 0, clt: 1, yamlinput: 7 (inputs)
#> input: 18, output: 18
#> cmdlist: 18
#> Sub Steps:
#> 1. hisat2-mapping-se (rendered: TRUE)
#>
#>
if (FALSE) {
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)
## Features from sample data of systemPipeRdata package
library(GenomicFeatures)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0, verbose=TRUE)
## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=FALSE)
## Generate and plot feature counts for any read length
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=TRUE)
}