...

nf-core, as well as Nextflow, should now be available to use.
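To confirm that both tools are available, you can check their versions:

Code Block
(nf-core)$ nextflow -version
(nf-core)$ nf-core --version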

Creating/Using Custom Nextflow Pipelines

Unfortunately, HMS IT is unable to support custom workflow creation beyond a surface level due to the high degree of customization involved. If you are interested in creating your own Nextflow workflow, please see the Nextflow documentation for guidance on how to set up the structure correctly. HMS IT may be able to make recommendations related to resource requirements on O2.
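For orientation only, here is a minimal sketch of a single-process DSL2 workflow; the file name main.nf and the process contents are illustrative, not an HMS-supported template:

Code Block
(nf-core)$ cat > main.nf <<'EOF'
// Minimal Nextflow DSL2 workflow: one process that emits a greeting
process SAY_HELLO {
    output:
    stdout

    script:
    """
    echo 'Hello from O2'
    """
}

workflow {
    SAY_HELLO().view()
}
EOF
(nf-core)$ nextflow run main.nf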

Executing nf-core Pipelines

Users that are interested in leveraging existing nf-core workflows may do so using the nf-core utility that they installed via the instructions above. Generally, these workflows are invoked with the singularity profile for reproducibility purposes, though the conda profile may also work on O2 (see the note below). If attempting to use an established Nextflow workflow that is independent of the official nf-core repositories, please refer to the instructions provided by the workflow maintainer.

Preparing Pipelines for Execution (using Singularity containers)

Note

O2 does not officially support software execution profiles other than singularity at this time. conda may work, but may require additional configuration.

If using the singularity profile, it is necessary to move the associated containers to a whitelisted directory, per O2 containerization policy.

Official nf-core Pipelines

...

Code Block
(nf-core)$ nf-core download -x none --container-system singularity --parallel-downloads 8 nf-core/PIPELINENAME
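For example, to fetch a specific release of a pipeline, you might run the following (nf-core/rnaseq and 3.14.0 are placeholders; substitute the pipeline and version you actually need):

Code Block
(nf-core)$ nf-core download -r 3.14.0 -x none --container-system singularity --parallel-downloads 8 nf-core/rnaseq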

...


If the pipeline is part of the official nf-core repositories (e.g., it is listed at https://nf-co.re/pipelines/), then please contact HMS Research Computing at rchelp@hms.harvard.edu with the pipeline name and version for which you would like the containers to be installed.

Custom or non-nf-core Pipelines

Users attempting to set up a Nextflow pipeline that is not an official nf-core pipeline will need to download the containers associated with the pipeline using whatever means is suggested by the pipeline maintainers.

At this point, please contact HMS IT Research Computing at rchelp@hms.harvard.edu for assistance with moving these containers to the whitelisted location. Please indicate the path to which you downloaded the containers, as well as whether the pipeline is for your personal use or will be shared with fellow lab members.

After containers are installed

If the requested containers were associated with an official nf-core pipeline, they will be installed to

Code Block
/n/app/singularity/containers/nf-core/PIPELINENAME/PIPELINEVERSION

For other pipelines, they will be installed to

Code Block
/n/app/singularity/containers/HMSID/

or

Code Block
/n/app/singularity/containers/shared/LABNAME

and possibly within some descriptive subdirectory, depending on preference.

For both cases, once the containers are installed, the NXF_SINGULARITY_CACHEDIR environment variable needs to be set before executing the workflow:

Code Block
(nf-core)$ export NXF_SINGULARITY_CACHEDIR=/n/app/singularity/containers/HMSID/nf-core/PIPELINENAME/PIPELINEVERSION

This will allow you to execute containers associated with PIPELINEVERSION of PIPELINENAME without having to re-download them locally. The exact directory can be negotiated on a per-request basis, but we recommend the nf-core/PIPELINENAME/PIPELINEVERSION organization scheme so that users and labs can juggle multiple pipelines and versions if desired (though we request that if a user or lab no longer has need for an older version, they ask for those containers to be deleted/removed).


This variable will need to be reset depending on the pipeline being executed. For example, switching to a different pipeline (OTHERPIPELINE and OTHERVERSION are placeholders) might look like:
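Code Block
(nf-core)$ export NXF_SINGULARITY_CACHEDIR=/n/app/singularity/containers/HMSID/nf-core/OTHERPIPELINE/OTHERVERSION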

From there, modification of the pipeline configuration files may be necessary. To start, there should be a nextflow.config file located at $HOME/.nextflow/assets/CATEGORY/PIPELINENAME/nextflow.config (where CATEGORY is nf-core for official pipelines). This file contains parameter settings associated with various steps in the workflow, as well as global maximum resource requirements.
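As a hedged illustration: pipelines built from the classic nf-core template expose global resource caps as the parameters max_cpus, max_memory, and max_time (check your pipeline's own documentation, as names can vary or be deprecated in newer templates). These can be overridden at run time rather than by editing the file:

Code Block
(nf-core)$ nextflow run nf-core/PIPELINENAME -profile cluster,singularity --max_cpus 20 --max_memory '250.GB' --max_time '5.d'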

...

  • profiles describes methods by which the pipeline can be invoked. This is specified at execution time via nextflow ... -profile profilename1,profilename2,.... At least one profile name must be specified. The profile names in this file are in addition to the default profiles (the singularity profile in this file augments the default singularity profile implemented by Nextflow, etc.).

    • the singularity profile sets parameters to allow usage of Singularity containers on O2 to execute pipeline steps. You shouldn’t need to mess with this profile.

    • the cluster profile sets parameters to allow submission of pipeline steps via O2’s slurm scheduler.

      • the only parameter you may be interested in is the queue parameter, which governs which partition a pipeline step is submitted to.

        • If a pipeline step requires less than 12 hours, it is submitted to short. If less than 5 days, medium. Otherwise, long.

        • If you have access to additional partitions (such as mpi, highmem, contributed partitions, etc.), set queue accordingly.

          • Keep in mind that such special partitions do not have the same time governances (other than the 30-day limit), so if you would like to integrate one or more of these partitions with the existing short / medium / long paradigm, you will likely need to modify one or more of the pipeline-specific configuration files as well (see the sketch following this list). Please contact rchelp@hms.harvard.edu with inquiries about this.

          • If you are planning to use a specialized partition exclusively, then simply overwrite the queue specification with that partition name.

    • the local profile causes the pipeline to be executed within your existing resource allocation (e.g., inside the active interactive session). You need to make sure you have requested the maximum number of cores and amount of memory desired by any single step of the pipeline in order for this profile to execute successfully.

  • executor describes how the pipeline processes will be run (such as on local compute resources, on cloud resources, or by interacting with a cluster compute scheduler). The executor keeps track of each of the processes, and whether they succeed or fail.

    • When using the slurm executor, Nextflow can submit each process in the workflow as an sbatch job.

      • Additional parameters that govern the Slurm job submission process are queueSize and submitRateLimit. queueSize is the number of tasks that can be processed at one time; here we use 1900 tasks. submitRateLimit is the maximum number of jobs that will be submitted within a specified time interval; in our file, we limit it to 20 jobs submitted per second.

    • When using the local executor, Nextflow will run each process using the resources available on the current compute node.
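To make the ideas above concrete, here is a minimal sketch of an override file that could be passed to a run via -c, redirecting jobs to a contributed partition and tuning the Slurm submission throttle. The file name custom.config, the partition name, and the values are placeholders; adjust them for your own setup.

Code Block
(nf-core)$ cat > custom.config <<'EOF'
// Hypothetical overrides -- example values only
process {
    queue = 'my_contributed_partition'  // replaces the short/medium/long routing
}
executor {
    queueSize       = 1900      // maximum number of tasks in flight at once
    submitRateLimit = '20/1sec' // at most 20 job submissions per second
}
EOF
(nf-core)$ nextflow run REPONAME/PIPELINENAME -profile cluster,singularity -c custom.config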

Executing Nextflow Pipelines

Once the NXF_SINGULARITY_CACHEDIR environment variable is set (assuming you are using the singularity profile), you have two options for invoking your pipeline:

  1. If the pipeline is an official nf-core pipeline, you can simply paste the command from the website and modify it to use the correct input, output, and profile settings.

  2. Otherwise, use nextflow run. A typical nextflow run command may look something like this:

    Code Block
    nextflow run REPONAME/PIPELINENAME -profile cluster,singularity -c /path/to/slurm.config --input /path/to/input --outdir /path/to/output

    You may need to refer to execution instructions provided by the pipeline maintainer; a fuller example follows below.
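Putting the pieces together, a complete session for a hypothetical official pipeline might look like the following (the pipeline name nf-core/rnaseq, version 3.14.0, and the input/output paths are placeholders; substitute your own):

Code Block
(nf-core)$ export NXF_SINGULARITY_CACHEDIR=/n/app/singularity/containers/nf-core/rnaseq/3.14.0
(nf-core)$ nextflow run nf-core/rnaseq -r 3.14.0 -profile cluster,singularity --input samplesheet.csv --outdir results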

To view a list of all pipelines you have ever downloaded/run, you can invoke the nextflow list command. These pipelines are located at $HOME/.nextflow/assets.

Cleaning Up After Execution

After your pipeline completes, there will be work and .nextflow directories at the location where you executed the workflow (not to be confused with your output directory). You may find it useful to occasionally delete these directories, especially if you find that you are using far more space than anticipated. You can keep track of your utilization with the quota-v2 tool (see https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1588662343/Filesystem+Quotas#Checking-Usage).

Note that these directories will be present at every location from which you have ever executed a pipeline, so you may need to remove multiple work directories from different locations if you do not have an established means of organization for juggling multiple workflows.
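As a starting point (run these from each directory where you launched a workflow, and double-check paths before deleting anything), cleanup might look like:

Code Block
(nf-core)$ nextflow clean -f   # remove work files for past runs recorded by Nextflow
(nf-core)$ rm -rf work .nextflow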

...