Design and run Workflows
About this section
In this section, we will discuss the following topics:
- How to create SPR data analysis projects.
- How to build a workflow step by step interactively, or use a template as a starting point.
- After designing the steps, how to run a workflow.
- After the workflow has finished running, how to check its status and visualize it.
- Different options for managing the workflow, e.g. resume, restart, or overwrite an SPR project.
- How to explore the workflow object (methods).
- Finally, how to generate some data analysis reports.
Project initialization
To create a workflow within systemPipeR, we can start by defining an empty
container and checking the directory structure:
sal <- SPRproject()
## Creating directory: /home/lab/Desktop/spr/systemPipeR.github.io/content/en/sp/spr/sp_run/data
## Creating directory: /home/lab/Desktop/spr/systemPipeR.github.io/content/en/sp/spr/sp_run/param
## Creating directory: /home/lab/Desktop/spr/systemPipeR.github.io/content/en/sp/spr/sp_run/results
## Creating directory '/home/lab/Desktop/spr/systemPipeR.github.io/content/en/sp/spr/sp_run/.SPRproject'
## Creating file '/home/lab/Desktop/spr/systemPipeR.github.io/content/en/sp/spr/sp_run/.SPRproject/SYSargsList.yml'
Internally, the SPRproject function creates a hidden folder, called .SPRproject
by default, to store all the log files.
A YAML file, here called SYSargsList.yml, is also created. Initially it
contains only the basic locations of the project structure; however, every time the
workflow object sal is updated in R, the new information is also stored in this
flat-file database for easy recovery.
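To see what is stored in this file, it can be printed as plain text. The following is a minimal sketch using only base R; it assumes the default location reported in the messages above (.SPRproject/SYSargsList.yml, relative to the working directory) and does nothing more than print the file if it is present.

```r
# Default location reported in the SPRproject() messages above (assumed here)
yml_path <- file.path(".SPRproject", "SYSargsList.yml")

if (file.exists(yml_path)) {
  # Print the raw YAML text exactly as stored on disk
  writeLines(readLines(yml_path))
} else {
  message("No project file found at ", yml_path, "; run SPRproject() first.")
}
```

Reading the file this way never modifies it, so it is safe to repeat after each update of sal to watch the stored information grow.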
If you desire different names for the logs folder and the YAML file, these can
be modified as follows:
sal <- SPRproject(logs.dir = ".SPRproject", sys.file = ".SPRproject/SYSargsList.yml")
This function also checks for the basic folder structure and creates it if missing,
that is, the data, param, and results folders, as described here.
If you want to use different names for these directories, they can be specified
as follows:
sal <- SPRproject(data = "data", param = "param", results = "results")
It is possible to separate all the R objects created within the workflow analysis
from the current environment. The SPRproject function provides the option to create
a new environment, so that the workflow does not overwrite any objects you may want
to keep in your current session.
sal <- SPRproject(envir = new.env())
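The benefit of a separate environment can be illustrated with base R alone. The sketch below is not systemPipeR's internal implementation; it only shows the general R mechanism that envir = new.env() relies on. The names counts and wf_env are hypothetical and chosen for illustration.

```r
# An object that already exists in the current session
counts <- c(a = 1, b = 2)

# A separate environment, analogous to the one passed via envir = new.env()
wf_env <- new.env()

# Code evaluated in wf_env creates its own `counts` there
eval(quote(counts <- c(a = 100, b = 200)), envir = wf_env)

counts                         # the session copy is unchanged
get("counts", envir = wf_env)  # the second copy lives only in wf_env
```

Because assignments made inside wf_env stay there, objects of the same name in the session are left untouched.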
At this stage, the object sal is an empty container, except for the project information, which can be accessed with the projectInfo method:
sal
## Instance of 'SYSargsList':
## No workflow steps added
projectInfo(sal)
## $project
## [1] "/home/lab/Desktop/spr/systemPipeR.github.io/content/en/sp/spr/sp_run"
##
## $data
## [1] "data"
##
## $param
## [1] "param"
##
## $results
## [1] "results"
##
## $logsDir
## [1] ".SPRproject"
##
## $sysargslist
## [1] ".SPRproject/SYSargsList.yml"
The length function returns how many steps the workflow contains.
In this case the workflow is still empty, as follows:
length(sal)
## [1] 0
Workflow Design
systemPipeR workflows can be designed and built from start to finish with a single command, importing from an R Markdown file or stepwise in interactive mode from the R console.
In the next section, we will demonstrate how to build the workflow in an interactive mode, and in the following section, we will show how to build from a file.
New workflows are constructed, or existing ones modified, by connecting each step
via the appendStep method. Each SYSargsList instance contains the instructions needed
for processing a set of input files with a specific command line, and the paths to
the corresponding output files generated.
The constructor function LineWise is used to build an R code-based step.
For more details about this S4 class container, see here.
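As a minimal sketch of how appendStep and LineWise fit together, the following adds one R code-based step to the container. The step name summarize_iris and the code inside the step are hypothetical and chosen only for illustration; the sketch assumes systemPipeR is installed and reuses the sal object from the previous section, creating it only if it does not exist yet.

```r
library(systemPipeR)

# Reuse the project container from above; create it only if it is missing
if (!exists("sal")) sal <- SPRproject()

# Append an R code-based step; the code is stored in the step, not run here
appendStep(sal) <- LineWise(
  code = {
    iris_summary <- summary(iris)  # hypothetical analysis code
  },
  step_name = "summarize_iris"     # hypothetical step name
)

# The container now holds at least one step
length(sal)
```

Each further call to appendStep adds another step after the existing ones, so the workflow grows in the order in which the steps are appended.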
Build workflow interactively
This tutorial shows a straightforward example for describing and explaining all main features available within systemPipeR to design, build, manage, run, and visualize the workflow. In summary, we are exporting a dataset to multiple files, compressing and decompressing each one of the files, importing to R, and finally performing a statistical analysis.
In the previous section, we initialized the project by building the sal object.
So far, the container has no steps:
sal
## Instance of 'SYSargsList':
## No workflow steps added
In the next subsection, we will discuss how to populate the object created with the first step in the workflow interactively.