SPR and CWL
How to connect CWL description files within systemPipeR
This section will demonstrate how to connect CWL parameters files to create
workflows. In addition, we will show how the workflow can be easily scalable
with systemPipeR
.
SYSargsList
container stores all the information and instructions needed for processing
a set of input files with a single or many command-line steps within a workflow
(i.e. several components of the software or several independent software tools).
The SYSargsList
object is created and fully populated with the SYSargsList
construct
function.
Full documentation of SYSargsList
management instances can be found here
and here.
The following imports a .cwl
file (here example.cwl
) for running the echo Hello World!
example.
HW <- SYSargsList(wf_file = "example/workflow_example.cwl", input_file = "example/example_single.yml",
dir_path = system.file("extdata/cwl", package = "systemPipeR"))
HW
## Instance of 'SYSargsList':
## WF Steps:
## 1. Step_x --> Status: Pending
## Total Files: 1 | Existing: 0 | Missing: 1
## 1.1. echo
## cmdlist: 1 | Pending: 1
##
cmdlist(HW)
## $Step_x
## $Step_x$defaultid
## $Step_x$defaultid$echo
## [1] "echo Hello World! > results/M1.txt"
However, we are limited to run just one command-line or one sample in this example.
To scale the command-line over many samples, a simple solution offered by systemPipeR
is to provide a variable
for each of the parameters that we want to run with multiple samples.
Let’s explore the example:
dir_path <- system.file("extdata/cwl", package = "systemPipeR")
yml <- yaml::read_yaml(file.path(dir_path, "example/example.yml"))
yml
## $message
## [1] "_STRING_"
##
## $SampleName
## [1] "_SAMPLE_"
##
## $results_path
## $results_path$class
## [1] "Directory"
##
## $results_path$path
## [1] "./results"
For the message
and SampleName
parameter, we are passing a variable connecting
with a third file called targets.
Now, let’s explore the targets
file structure:
targetspath <- system.file("extdata/cwl/example/targets_example.txt", package = "systemPipeR")
read.delim(targetspath, comment.char = "#")
## Message SampleName
## 1 Hello World! M1
## 2 Hello USA! M2
## 3 Hello Bioconductor! M3
The targets
file defines all input files or values and sample ids of an analysis workflow.
For this example, we have defined a string message for the echo
command-line tool,
in the first column that will be evaluated, and the second column is the
SampleName
id for each one of the messages.
Any number of additional columns can be added as needed.
Users should note here, the usage of targets
files is optional when using
systemPipeR's
new CWL interface. Since for organizing experimental variables targets
files are extremely useful and user-friendly. Thus, we encourage users to keep using them.
How to connect the parameter files and targets
file information?
The constructor function creates an SYSargsList
S4 class object connecting three input files:
- CWL command-line specification file (
wf_file
argument); - Input variables (
input_file
argument); - Targets file (
targets
argument).
As demonstrated above, the latter is optional for workflow steps lacking input files.
The connection between input variables (here defined by input_file
argument)
and the targets
file are defined under the inputvars
argument.
A named vector is required, where each element name needs to match with column
names in the targets
file, and the value must match the names of the .yml
variables. This is used to replace the CWL variable and construct all the command-line
for that particular step.
The variable pattern _XXXX_
is used to distinguish CWL variables that target
columns will replace. This pattern is recommended for consistency and easy identification
but not enforced.
The following imports a .cwl
file (same example demonstrated above) for running
the echo Hello World
example. However, now we are connecting the variable defined
on the .yml
file with the targets
file inputs.
HW_mul <- SYSargsList(step_name = "echo", targets = targetspath, wf_file = "example/workflow_example.cwl",
input_file = "example/example.yml", dir_path = dir_path, inputvars = c(Message = "_STRING_",
SampleName = "_SAMPLE_"))
HW_mul
## Instance of 'SYSargsList':
## WF Steps:
## 1. echo --> Status: Pending
## Total Files: 3 | Existing: 0 | Missing: 3
## 1.1. echo
## cmdlist: 3 | Pending: 3
##
cmdlist(HW_mul)
## $echo
## $echo$M1
## $echo$M1$echo
## [1] "echo Hello World! > results/M1.txt"
##
##
## $echo$M2
## $echo$M2$echo
## [1] "echo Hello USA! > results/M2.txt"
##
##
## $echo$M3
## $echo$M3$echo
## [1] "echo Hello Bioconductor! > results/M3.txt"
WConnectivity between CWL param files and targets files.
Creating the CWL param files
In the next two sections, we will discuss how to use createParam
from SPR
to create CWL param files. In createParam
, there are two versions of syntax:
- version 1: pseudo-bash script format, easy to write
- version 2:
;
separated format, has more rules, but support a lot more functionalities.
Contribute new CWL files to systemPipeR
systemPipeR
organizes a collection of CWL CommandLineTool and
Workflow descriptions for a variety of applications, that can be
found on Github cwl_collection.
If you have new cwl files would like to add to this collection, submit a pull request.
After adding, new files will automatically trigger a push to systemPipeRdata (SPRdata) and systemPipeR (SPR) repositories master branch shortly.