- Introduction
- Motivation
- Design
- Templates
- Getting started
July 22, 2015
systemPipeR
- Workflow steps with input/output file operations are controlled by SYSargs objects.
- Each SYSargs instance is constructed from a targets file and a param file.
- The only input the user provides is the initial targets file. Subsequent targets instances are created automatically.
- Any number of predefined or custom workflow steps is supported.
systemPipeRdata: template workflows for systemPipeR.
Tools covered by the workflow templates:
- rsubread, Bowtie2/Tophat2
- edgeR or DESeq2
- gsnap, bwa
- VariantTools, GATK, BCFtools
- VariantTools and VariantAnnotation
- VariantAnnotation
- rsubread, Bowtie2
- MACS2, BayesPeak
- Tophat2 (or any other RNA-Seq aligner)
Install required packages
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")
BiocManager::install("systemPipeR") # Install systemPipeR from Bioconductor
BiocManager::install("tgirke/systemPipeRdata", build_vignettes=TRUE, dependencies=TRUE) # From GitHub
Load packages and access help
library("systemPipeR")
library("systemPipeRdata")
Access help
library(help="systemPipeR")
vignette("systemPipeR")
Targets file organizes samples
Structure of targets file for single-end (SE) library
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:5]
##                   FileName SampleName Factor SampleLong Experiment
## 1 ./data/SRR446027_1.fastq        M1A     M1  Mock.1h.A          1
## 2 ./data/SRR446028_1.fastq        M1B     M1  Mock.1h.B          1
## 3 ./data/SRR446029_1.fastq        A1A     A1   Avr.1h.A          1
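As a minimal sketch of why comment.char = "#" is passed to read.delim, the snippet below writes a small, made-up targets file to a temporary location and reads it back with base R. The comment line and the three-column content are assumptions for illustration, not the file shipped with systemPipeR.

```r
# Hypothetical targets file: a '#' comment line followed by a tab-delimited table
targets_lines <- c(
  "# Project ID: example (illustrative comment line)",
  "FileName\tSampleName\tFactor",
  "./data/SRR446027_1.fastq\tM1A\tM1",
  "./data/SRR446028_1.fastq\tM1B\tM1",
  "./data/SRR446029_1.fastq\tA1A\tA1"
)
tmp_targets <- tempfile(fileext = ".txt")
writeLines(targets_lines, tmp_targets)

# Lines starting with '#' are skipped, so the header row is parsed correctly
targets <- read.delim(tmp_targets, comment.char = "#")

# Count samples per experimental factor
samples_per_factor <- table(targets$Factor)
print(targets)
print(samples_per_factor)
```

Keeping free-text annotations in comment lines lets the same file serve as both documentation and machine-readable sample sheet.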
Structure of targets file for paired-end (PE) library
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:4]
##                  FileName1                FileName2 SampleName Factor
## 1 ./data/SRR446027_1.fastq ./data/SRR446027_2.fastq        M1A     M1
## 2 ./data/SRR446028_1.fastq ./data/SRR446028_2.fastq        M1B     M1
## 3 ./data/SRR446029_1.fastq ./data/SRR446029_2.fastq        A1A     A1
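A paired-end targets table can be sanity-checked before it is handed to a workflow. The sketch below uses base R only. The helper check_pe_targets and its rules (unique sample names, distinct mate files) are assumptions made for illustration and are not part of systemPipeR.

```r
# Hypothetical paired-end targets table mirroring the columns shown above
targets_pe <- data.frame(
  FileName1  = c("./data/SRR446027_1.fastq", "./data/SRR446028_1.fastq"),
  FileName2  = c("./data/SRR446027_2.fastq", "./data/SRR446028_2.fastq"),
  SampleName = c("M1A", "M1B"),
  Factor     = c("M1", "M1"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of systemPipeR): basic sanity checks
check_pe_targets <- function(x) {
  required <- c("FileName1", "FileName2", "SampleName")
  missing_cols <- setdiff(required, colnames(x))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(x$SampleName) > 0) stop("SampleName values must be unique")
  if (any(x$FileName1 == x$FileName2)) stop("Mates must be different files")
  invisible(TRUE)
}

check_pe_targets(targets_pe)

# Named vectors keyed by sample name make later lookups convenient
mate1 <- setNames(targets_pe$FileName1, targets_pe$SampleName)
mate2 <- setNames(targets_pe$FileName2, targets_pe$SampleName)
```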
SYSargs: targets & param
SYSargs instances are constructed from a targets file and a param file. The param file contains the settings for running command-line software.
parampath <- system.file("extdata", "tophat.param", package="systemPipeR")
(args <- suppressWarnings(systemArgs(sysma=parampath, mytargets=targetspath)))
## An instance of 'SYSargs' for running 'tophat' on 18 samples
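The idea of combining sample information with software settings can be sketched in base R. This is not how systemArgs is implemented. The param list, its field names, and the helper build_commands are hypothetical stand-ins chosen only to show how one command string per sample might be assembled.

```r
# Hypothetical, simplified stand-ins for a targets table and a param file
targets <- data.frame(
  FileName   = c("./data/SRR446027_1.fastq", "./data/SRR446028_1.fastq"),
  SampleName = c("M1A", "M1B"),
  stringsAsFactors = FALSE
)
param <- list(
  software  = "tophat",
  options   = "-p 4",
  reference = "./data/tair10.fasta",
  outdir    = "./results"
)

# Illustrative helper (not the systemPipeR implementation):
# one command string per sample, named by sample
build_commands <- function(targets, param) {
  outpaths <- file.path(param$outdir, paste0(basename(targets$FileName), ".tophat"))
  cmds <- paste(param$software, param$options, "-o", outpaths,
                param$reference, targets$FileName)
  setNames(cmds, targets$SampleName)
}

cmds <- build_commands(targets, param)
print(cmds)
```

Keeping the settings separate from the sample table means the same targets table can be reused with a different param definition for another tool.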
Slots and accessor functions have the same names
names(args)[c(5,8,13)]
## [1] "software" "reference" "sysargs"
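The convention of naming accessor functions after slots can be illustrated with a toy S4 class. ToyArgs is a hypothetical class with three slots; the real SYSargs class has more slots and is defined by systemPipeR. Note that running this in a session where systemPipeR is loaded would mask its generics of the same name.

```r
library(methods)

# Hypothetical toy class; the real SYSargs class has more slots
setClass("ToyArgs", representation(software  = "character",
                                   reference = "character",
                                   sysargs   = "character"))

# Accessor generics named after the slots they return
setGeneric("software", function(x) standardGeneric("software"))
setMethod("software", "ToyArgs", function(x) x@software)
setGeneric("sysargs", function(x) standardGeneric("sysargs"))
setMethod("sysargs", "ToyArgs", function(x) x@sysargs)

toy <- new("ToyArgs",
           software  = "tophat",
           reference = "tair10.fasta",
           sysargs   = c(M1A = "tophat -p 4 -o M1A.tophat tair10.fasta M1A.fastq"))

slotNames(toy)   # slot names
software(toy)    # accessor with the same name as the slot
sysargs(toy)[1]  # first command string
```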
Return command-line arguments for given software, here Tophat2 for 1st sample.
sysargs(args)[1]
## tophat -p 4 -o SRR446027_1.fastq.tophat tair10.fasta SRR446027_1.fastq .SRR446027_2.fastq
Run command-line tool, here Tophat2, on a single machine. The command-line tool needs to be installed for this.
runCommandline(args)
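Because the external tool must be installed, it can help to check for it before launching anything. The sketch below uses base R only. run_if_installed is a hypothetical helper, not a systemPipeR function, and it simply skips the call when the executable is not found on the PATH.

```r
# Illustrative helper (not part of systemPipeR): run a command only if its
# executable is found on the PATH
run_if_installed <- function(tool, args = character()) {
  path <- unname(Sys.which(tool))
  if (!nzchar(path)) {
    message("'", tool, "' not found on PATH; skipping")
    return(invisible(NULL))
  }
  # system2() returns the captured output lines when stdout = TRUE
  system2(path, args, stdout = TRUE, stderr = TRUE)
}

# 'tophat' is only run if it is installed on this machine
tophat_version <- run_if_installed("tophat", "--version")
```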
Submit command-line or R processes to a computer cluster with a queueing system.
clusterRun(args, ...)
The last step requires additional resource allocation arguments. For details, see the main systemPipeR manual.
Generate workflow template, e.g. "rnaseq", "varseq" or "chipseq"
genWorkenvir(workflow="varseq", mydirname=NULL)
setwd("varseq")
Command-line alternative for generating workflow environments
$ echo 'library(systemPipeRdata); genWorkenvir(workflow="varseq", mydirname=NULL)' | R --slave
The workflow templates generated by genWorkenvir contain the following preconfigured directory structure:
workflow_name/   # *.Rnw/*.Rmd scripts, targets file, etc.
    param/       # parameter files for command-line software
    data/        # inputs e.g. FASTQ, reference, annotations
    results/     # analysis result files
The above structure can be customized as needed, but for first-time users it is easier to keep changes to a minimum.
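For readers who want to see the layout on disk without installing the template package, the following base R sketch creates an empty skeleton with the same directory names. make_workflow_dirs is a hypothetical helper; it does not copy any template files, which is what genWorkenvir does.

```r
# Illustrative helper (not genWorkenvir itself): create the layout shown above
make_workflow_dirs <- function(root) {
  subdirs <- file.path(root, c("param", "data", "results"))
  for (d in subdirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(subdirs)
}

# Build the skeleton in a temporary location so nothing is overwritten
root <- file.path(tempdir(), "workflow_name")
make_workflow_dirs(root)
list.files(root)
```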
*.Rnw template file (or *.Rmd or *.R versions).
$ make -B
Analysis reports in PDF or HTML format are autogenerated when running a workflow using standard R resources for scientific report generation, including knitr and rmarkdown, respectively.
Integration of ReportingTools is also straightforward.
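As a minimal sketch of HTML report generation with rmarkdown, the snippet below writes a tiny, made-up report source to a temporary file and renders it only if the rmarkdown package and pandoc are available. The report content is an assumption for illustration and is unrelated to the workflow templates.

```r
# Hypothetical minimal report source written to a temporary file
fence <- strrep("`", 3)  # code chunk delimiter used in R Markdown files
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(c(1, 2, 3))",
  fence
), rmd_file)

# Render only when the required tooling is present on this machine
html_out <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_out <- rmarkdown::render(rmd_file, output_format = "html_document",
                                quiet = TRUE)
}
```

When rendering succeeds, html_out holds the path of the generated HTML file; otherwise it stays NULL.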
.param files