Workflow Templates

The intended way of running systemPipeR workflows is via *.Rmd files, which can be executed either line by line in interactive mode or with a single command from R or the command line. This way, comprehensive and reproducible analysis reports can be generated in PDF or HTML format in a fully automated manner, using the reporting utilities available for R. The following shows how to execute a workflow (e.g., systemPipeRNAseq.Rmd) from the command line.

Rscript -e "rmarkdown::render('systemPipeRNAseq.Rmd')"
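
The same report can also be rendered from within an R session. The following is a minimal sketch, assuming the rmarkdown package is installed and the template sits in the current working directory. `render_workflow` is a hypothetical helper name used for illustration only; it is not a systemPipeR function.

```r
# Hypothetical helper (not part of systemPipeR): render a workflow report
# from an R session instead of the shell.
render_workflow <- function(rmd, format = "html_document") {
  if (!file.exists(rmd)) {
    stop("Workflow file not found: ", rmd)
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("Package 'rmarkdown' is required to render the report.")
  }
  # rmarkdown::render() runs all code chunks and writes the report.
  rmarkdown::render(rmd, output_format = format)
}

# Example call, guarded so that nothing runs unless the template is present.
if (file.exists("systemPipeRNAseq.Rmd")) {
  render_workflow("systemPipeRNAseq.Rmd")                    # HTML report
  # render_workflow("systemPipeRNAseq.Rmd", "pdf_document")  # requires LaTeX
}
```

Checking for the file first gives a clearer error message than the one produced when rendering fails on a missing input.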

Templates for setting up custom project reports are provided as *.Rmd files by the helper package *systemPipeRdata* and in the vignettes subdirectory of systemPipeR. The corresponding HTML versions of these report templates are available for systemPipeRNAseq, systemPipeRIBOseq, systemPipeChIPseq and systemPipeVARseq. To work with *.Rnw or *.Rmd files efficiently, basic knowledge of Sweave or knitr and LaTeX or R Markdown v2 is required.

RNA-Seq sample

Load the RNA-Seq sample workflow into your current working directory.

library(systemPipeRdata)
genWorkenvir(workflow = "rnaseq")
setwd("rnaseq")

Run workflow

Next, run the chosen sample workflow systemPipeRNAseq (PDF, Rmd) by executing make -B from the command line within the rnaseq directory. Alternatively, run the code from the provided *.Rmd template file interactively from within R.
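
For the interactive route, it can help to extract the code chunks of the template into a plain R script that is then stepped through line by line. The sketch below uses `knitr::purl()` for this and assumes the knitr package is installed. `extract_workflow_code` is a hypothetical helper name, and the file name in the example call is taken from the template named above.

```r
# Hypothetical helper (not part of systemPipeR): write the code chunks of an
# Rmd template to an R script for interactive, step-by-step execution.
extract_workflow_code <- function(rmd, script = sub("\\.Rmd$", ".R", rmd)) {
  stopifnot(file.exists(rmd))
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("Package 'knitr' is required to extract the code chunks.")
  }
  # purl() copies only the code chunks; the narrative text is dropped.
  knitr::purl(rmd, output = script, quiet = TRUE)
}

# Example call, guarded so that nothing runs unless the template is present.
if (file.exists("systemPipeRNAseq.Rmd")) {
  extract_workflow_code("systemPipeRNAseq.Rmd")  # writes systemPipeRNAseq.R
}
```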

The workflow includes the following steps:

  1. Read preprocessing
    • Quality filtering (trimming)
    • FASTQ quality report
  2. Alignments: Tophat2 (or any other RNA-Seq aligner)
  3. Alignment stats
  4. Read counting
  5. Sample-wise correlation analysis
  6. Analysis of differentially expressed genes (DEGs)
  7. GO term enrichment analysis
  8. Gene-wise clustering

ChIP-Seq sample

Load the ChIP-Seq sample workflow into your current working directory.

library(systemPipeRdata)
genWorkenvir(workflow = "chipseq")
setwd("chipseq")

Run workflow

Next, run the chosen sample workflow systemPipeChIPseq_single (PDF, Rmd) by executing make -B from the command line within the chipseq directory. Alternatively, run the code from the provided *.Rmd template file interactively from within R.

The workflow includes the following steps:

  1. Read preprocessing
    • Quality filtering (trimming)
    • FASTQ quality report
  2. Alignments: Bowtie2 or rsubread
  3. Alignment stats
  4. Peak calling: MACS2, BayesPeak
  5. Peak annotation with genomic context
  6. Differential binding analysis
  7. GO term enrichment analysis
  8. Motif analysis

VAR-Seq sample

VAR-Seq workflow for a single machine

Load the VAR-Seq sample workflow into your current working directory.

library(systemPipeRdata)
genWorkenvir(workflow = "varseq")
setwd("varseq")

Run workflow

Next, run the chosen sample workflow systemPipeVARseq_single (PDF, Rmd) by executing make -B from the command line within the varseq directory. Alternatively, run the code from the provided *.Rmd template file interactively from within R.

The workflow includes the following steps:

  1. Read preprocessing
    • Quality filtering (trimming)
    • FASTQ quality report
  2. Alignments: gsnap, bwa
  3. Variant calling: VariantTools, GATK, BCFtools
  4. Variant filtering: VariantTools and VariantAnnotation
  5. Variant annotation: VariantAnnotation
  6. Combine results from many samples
  7. Summary statistics of samples

VAR-Seq workflow for a computer cluster

The workflow template provided for this step is called systemPipeVARseq.Rmd (PDF, Rmd). It runs the above VAR-Seq workflow in parallel on multiple compute nodes of an HPC system using Slurm as the scheduler.

Ribo-Seq sample

Load the Ribo-Seq sample workflow into your current working directory.

library(systemPipeRdata)
genWorkenvir(workflow = "riboseq")
setwd("riboseq")
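
The setup steps shown for each workflow type follow the same pattern and differ only in the workflow name. As a rough sketch, they could be wrapped in a single function. `setup_workflow` is a hypothetical name, not part of systemPipeRdata, and the sketch assumes that `genWorkenvir()` creates a directory named after the workflow, as the `setwd()` calls above suggest.

```r
# Hypothetical convenience wrapper (not part of systemPipeRdata).
setup_workflow <- function(workflow = c("rnaseq", "chipseq", "varseq", "riboseq")) {
  # Reject anything other than the four workflow names used in this section.
  workflow <- match.arg(workflow)
  if (!requireNamespace("systemPipeRdata", quietly = TRUE)) {
    stop("Package 'systemPipeRdata' is required.")
  }
  # Create the workflow environment, then change into it.
  systemPipeRdata::genWorkenvir(workflow = workflow)
  setwd(workflow)
  invisible(getwd())
}

# Example (commented out because it creates a directory and changes into it):
# setup_workflow("riboseq")
```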

Run workflow

Next, run the chosen sample workflow systemPipeRIBOseq (PDF, Rmd) by executing make -B from the command line within the riboseq directory. Alternatively, run the code from the provided *.Rmd template file interactively from within R.

The workflow includes the following steps:

  1. Read preprocessing
    • Adaptor trimming and quality filtering
    • FASTQ quality report
  2. Alignments: Tophat2 (or any other RNA-Seq aligner)
  3. Alignment stats
  4. Compute read distribution across genomic features
  5. Adding custom features to the workflow (e.g. uORFs)
  6. Genomic read coverage along transcripts
  7. Read counting
  8. Sample-wise correlation analysis
  9. Analysis of differentially expressed genes (DEGs)
  10. GO term enrichment analysis
  11. Gene-wise clustering
  12. Differential ribosome binding (translational efficiency)

Version information

Note: the most recent version of this tutorial can be found here.

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /home/dcassol/src/R-4.0.3/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] magrittr_2.0.1              batchtools_0.9.15          
##  [3] ape_5.4-1                   ggplot2_3.3.3              
##  [5] systemPipeR_1.25.6          ShortRead_1.48.0           
##  [7] GenomicAlignments_1.26.0    SummarizedExperiment_1.20.0
##  [9] Biobase_2.50.0              MatrixGenerics_1.2.1       
## [11] matrixStats_0.58.0          BiocParallel_1.24.1        
## [13] Rsamtools_2.6.0             Biostrings_2.58.0          
## [15] XVector_0.30.0              GenomicRanges_1.42.0       
## [17] GenomeInfoDb_1.26.2         IRanges_2.24.1             
## [19] S4Vectors_0.28.1            BiocGenerics_0.36.0        
## [21] BiocStyle_2.18.1           
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-0         rjson_0.2.20             hwriter_1.3.2           
##   [4] ellipsis_0.3.1           bit64_4.0.5              AnnotationDbi_1.52.0    
##   [7] xml2_1.3.2               codetools_0.2-18         splines_4.0.3           
##  [10] cachem_1.0.3             knitr_1.31               jsonlite_1.7.2          
##  [13] annotate_1.68.0          GO.db_3.12.1             dbplyr_2.1.0            
##  [16] png_0.1-7                pheatmap_1.0.12          graph_1.68.0            
##  [19] BiocManager_1.30.10      compiler_4.0.3           httr_1.4.2              
##  [22] GOstats_2.56.0           backports_1.2.1          assertthat_0.2.1        
##  [25] Matrix_1.3-2             fastmap_1.1.0            limma_3.46.0            
##  [28] formatR_1.7              htmltools_0.5.1.1        prettyunits_1.1.1       
##  [31] tools_4.0.3              gtable_0.3.0             glue_1.4.2              
##  [34] GenomeInfoDbData_1.2.4   Category_2.56.0          dplyr_1.0.4             
##  [37] rsvg_2.1                 rappdirs_0.3.3           V8_3.4.0                
##  [40] Rcpp_1.0.6               vctrs_0.3.6              nlme_3.1-152            
##  [43] blogdown_1.1.7           rtracklayer_1.50.0       xfun_0.21               
##  [46] stringr_1.4.0            lifecycle_1.0.0.9000     XML_3.99-0.5            
##  [49] edgeR_3.32.1             zlibbioc_1.36.0          scales_1.1.1            
##  [52] BSgenome_1.58.0          VariantAnnotation_1.36.0 hms_1.0.0               
##  [55] RBGL_1.66.0              RColorBrewer_1.1-2       yaml_2.2.1              
##  [58] curl_4.3                 memoise_2.0.0            biomaRt_2.46.3          
##  [61] latticeExtra_0.6-29      stringi_1.5.3            RSQLite_2.2.3           
##  [64] genefilter_1.72.1        checkmate_2.0.0          GenomicFeatures_1.42.1  
##  [67] DOT_0.1                  rlang_0.4.10             pkgconfig_2.0.3         
##  [70] bitops_1.0-6             evaluate_0.14            lattice_0.20-41         
##  [73] purrr_0.3.4              bit_4.0.4                tidyselect_1.1.0        
##  [76] GSEABase_1.52.1          AnnotationForge_1.32.0   bookdown_0.21           
##  [79] R6_2.5.0                 generics_0.1.0           base64url_1.4           
##  [82] DelayedArray_0.16.1      DBI_1.1.1                withr_2.4.1             
##  [85] pillar_1.4.7             survival_3.2-7           RCurl_1.98-1.2          
##  [88] tibble_3.0.6             crayon_1.4.1             BiocFileCache_1.14.0    
##  [91] rmarkdown_2.6            jpeg_0.1-8.1             progress_1.2.2          
##  [94] locfit_1.5-9.4           grid_4.0.3               data.table_1.13.6       
##  [97] blob_1.2.1               Rgraphviz_2.34.0         digest_0.6.27           
## [100] xtable_1.8-4             brew_1.0-6               openssl_1.4.3           
## [103] munsell_0.5.0            askpass_1.1
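
To keep a record like the one above next to the analysis results, the session information can be written to a text file. This is a minimal sketch using base R only; `save_session_info` and the output file name are illustrative choices, not part of systemPipeR.

```r
# Hypothetical helper: store the printed sessionInfo() output in a text file.
save_session_info <- function(path = "sessionInfo.txt") {
  info <- utils::capture.output(utils::sessionInfo())
  writeLines(info, path)
  invisible(path)
}

# tempdir() keeps this example free of side effects in the project directory;
# in practice, a path inside the workflow directory is likely more useful.
info_file <- save_session_info(file.path(tempdir(), "sessionInfo.txt"))
```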

Last modified 2021-04-26 : fix images path; no_render (7cf5004c)