Authors: Daniela Cassol, Le Zhang, Thomas Girke.

Institution: Institute for Integrative Genome Biology, University of California, Riverside, California, USA.

Overview

Workshop Description

This workshop introduces systemPipe (SP), a generic toolkit for designing and running reproducible data analysis workflows. The environment consists of three major modules implemented as R/Bioconductor packages. systemPipeR (SPR) provides core functionality for defining workflows, interacting with command-line software, executing R and command-line steps, and generating publication-quality analysis reports. systemPipeShiny (SPS) integrates a graphical user interface for managing workflows and visualizing results interactively. systemPipeWorkflow (SPW) offers a collection of pre-configured workflow templates.

This hands-on event covers the following topics: (1) a brief overview of the design principles and functionalities of the SP toolkit; (2) design and usage of SPR’s command-line interface, which is based on an object-oriented R implementation of CWL; (3) configuration and execution of workflows; (4) construction of custom workflows; (5) configuration and execution of a pre-configured workflow from start to finish, using the RNA-Seq template as an example; (6) parallel execution of workflows on HPC and cloud systems, with and without schedulers; (7) generation of technical and scientific analysis reports, including visualization; and (8) a demonstration of the core functionalities of SPS, the project’s Shiny app.

Pre-requisites

  • Basic knowledge of R and usage of Bioconductor packages for NGS analysis
  • Basic knowledge of running command-line software
  • Basic knowledge of parallelization concepts

Non-essential background reading:

Workshop Participation

Participants will be able to perform all analysis components of this workshop hands-on. Active participation is highly encouraged throughout the event, including during the lecture material, the hands-on sections, and the final discussion about package improvements. Questions are welcome at any time.

R / Bioconductor packages used

Time outline

1h 40m total

Activity                                                  Time
Overview of systemPipe toolkit                            5m
Introduction to SPR’s command-line interface              15m
Showcase RNA-Seq workflow                                 20m
Configuration and execution of workflows                  20m
Generation of technical and scientific analysis reports   5m
Overview of systemPipeShiny core functionalities          10m
systemPipeShiny Showcase                                  20m
Q&A                                                       5m

Workshop goals and objectives

Learning goals

  • Recognize the benefits of a generic R-based workflow construction environment that is both scalable and reproducible
  • Understand how command-line tools are integrated via the CWL community standard
  • Render R Markdown reports and critically assess scientific analysis reports
  • Parallelize big data analysis tasks

Learning objectives

  • Identify and practice how to make analysis workflows more robust, reproducible, and portable across heterogeneous computing systems
  • Use the new workflow control class to design, configure, and run workflows
  • Optimize and debug workflows
  • Inspect technical reports and log files
  • Design new, fully customized workflows
  • Practice interactive workflow management and visualization