Run Nextflow

Nextflow usage

The nextflow command has many subcommands for handling workflows. In Nextflow, a workflow is called a 'project'.

Usage: nextflow [options] COMMAND [arg...]

Options:
  -C
     Use the specified configuration file(s) overriding any defaults
  -D
     Set JVM properties
  -bg
     Execute nextflow in background
  -c, -config
     Add the specified file to configuration set
  -config-ignore-includes
     Disable the parsing of config includes
  -h
     Print this help
  -log
     Set nextflow log file path
  -q, -quiet
     Do not print information messages
  -remote-debug
     Enable JVM interactive remote debugging (experimental)
  -syslog
     Send logs to syslog server (eg. localhost:514)
  -trace
     Enable trace level logging for the specified package name - multiple packages can be provided separating them with a comma e.g. '-trace nextflow,io.seqera'
  -v, -version
     Print the program version

Commands:
  clean         Clean up project cache and work directories
  clone         Clone a project into a folder
  config        Print a project configuration
  console       Launch Nextflow interactive console
  drop          Delete the local copy of a project
  help          Print the usage help for a command
  info          Print project and system runtime information
  inspect       Inspect process settings in a pipeline project
  kuberun       Execute a workflow in a Kubernetes cluster (experimental)
  list          List all downloaded projects
  log           Print executions log and runtime info
  plugin        Execute plugin-specific commands
  plugins       Execute plugin-specific commands
  pull          Download or update a project
  run           Execute a pipeline project
  secrets       Manage pipeline secrets (preview)
  self-update   Update nextflow runtime to the latest available version
  view          View project script file(s)

To get help on a particular Nextflow subcommand:

nextflow help COMMAND

Question

Launch the subcommands list and info.

Run a workflow

To run a workflow, you can use either:

a Nextflow file (.nf)
a project name (from a repository)
a repository URL

In this part we will see how to run a workflow from a file or from a project name, and also how to change a parameter value, resume a workflow, and use a scheduler.

Here is the usage of the run command.

$ nextflow run -help
Execute a pipeline project
Usage: run [options] Project name or repository url
Options:
  -E
     Exports all current system environment
     Default: false

  [...]

  -without-docker
     Disable process execution with Docker
     Default: false
  -without-podman
     Disable process execution in a Podman container
  -w, -work-dir
     Directory where intermediate result files are stored

Info

All the options associated with nextflow itself are prefixed by a single '-'.

Run from a file `*.nf`

Here we are going to execute a workflow defined in a file.

Download the file with the following command:

wget https://genotoul-bioinfo.pages.mia.inra.fr/use-nextflow-nfcore-course/nextflow/tutorial.nf
more tutorial.nf

The workflow contains 2 main steps (called processes). The first process splits a string into 6-character chunks and writes each one to a file with the prefix chunk_. The second receives these files and converts their contents to uppercase.
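Outside Nextflow, the two steps described above can be sketched in plain shell. This is only an illustration: the function name split_and_upper is hypothetical, and the use of split and tr, as well as the input 'Hello world!', are assumptions about what tutorial.nf does rather than a copy of it.

```shell
# Hypothetical helper mimicking the two processes of the tutorial workflow.
# Assumption: the workflow splits with `split -b 6` and uppercases with `tr`.
split_and_upper() {
    (
        # Work in a throwaway directory so chunk_ files do not pile up here
        tmpdir=$(mktemp -d) || exit 1
        cd "$tmpdir" || exit 1

        # Step 1: cut the string into 6-character chunks (chunk_aa, chunk_ab, ...)
        printf '%s' "$1" | split -b 6 - chunk_

        # Step 2: print each chunk in uppercase, one chunk per line
        for chunk in chunk_*; do
            tr '[:lower:]' '[:upper:]' < "$chunk"
            echo
        done

        # Clean up the throwaway directory
        cd / && rm -rf "$tmpdir"
    )
}

split_and_upper 'Hello world!'
```

Unlike Nextflow, this sketch handles the chunks one after the other, so the order of the printed lines is always the same.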

nextflow run tutorial.nf

It will output something similar to the text shown below:

Nextflow 24.04.3 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 24.04.2

Launching `tutorial.nf` [disturbed_gauss] DSL2 - revision: ddf5f40139

executor >  local (3)
[e7/441d9c] process > SPLITLETTERS (1)   [100%] 1 of 1 ✔
[bd/38959b] process > CONVERTTOUPPER (1) [100%] 2 of 2 ✔
WORLD!
HELLO

You can see that the first process is executed once and the second twice. Finally, the resulting strings are printed.

Note that the CONVERTTOUPPER process is executed in parallel, so there is no guarantee that the instance processing the first chunk (Hello) will finish before the one processing the second chunk (world!).

The two lines can therefore be printed in either order. In the run above, for instance, the second chunk came out first:

WORLD!
HELLO

What does this run create?

A work directory, which contains temporary files
A .nextflow directory, which contains the execution cache
.nextflow.log: the log of the last execution

$ ls -altr
total 18
-rw-r--r-- 1 pervenche formation  372 17 janv. 17:04 tutorial.nf
drwx--x--x 5 pervenche formation 8192 20 janv. 14:29 ..
drwxr-xr-x 4 pervenche formation 4096 20 janv. 15:08 .
drwxr-xr-x 5 pervenche formation 4096 20 janv. 15:08 work
drwxr-xr-x 3 pervenche formation 4096 20 janv. 15:08 .nextflow
-rw-r--r-- 1 pervenche formation 5306 20 janv. 15:08 .nextflow.log

The contents of these directories and files are explained in the Outputs section.

Change parameter value

This workflow has one parameter, named greeting. Display the help with the command:

nextflow run tutorial.nf --help

To change the default value, use --greeting on the command line.

nextflow run tutorial.nf --greeting "mon texte a mettre en majuscule"

Info

Workflow parameters are prefixed by two dashes: '--'.

It will output something similar to the text shown below:

Nextflow 24.04.2 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `tutorial.nf` [fervent_dijkstra] DSL2 - revision: cf991824f7
executor >  local (7)
[a1/868e57] process > SPLITLETTERS (1)   [100%] 1 of 1 ✔
[e7/6f64ab] process > CONVERTTOUPPER (5) [100%] 6 of 6 ✔
XTE A 
E
METTRE
MON TE
 EN MA
JUSCUL

Resume a workflow

Nextflow keeps track of all the processes executed in your pipeline. With the -resume option, processes that have not changed are skipped and their cached results are used instead.

nextflow run tutorial.nf --greeting "mon texte a mettre en majuscule" -resume

Nextflow 24.04.2 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `tutorial.nf` [focused_swirles] DSL2 - revision: cf991824f7
[a1/868e57] process > SPLITLETTERS (1)   [100%] 1 of 1, cached: 1 ✔
[e7/6f64ab] process > CONVERTTOUPPER (5) [100%] 6 of 6, cached: 6 ✔
 EN MA
METTRE
MON TE
XTE A 
E
JUSCUL

All the processes are retrieved from the cache, as shown above.
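The idea behind this caching can be illustrated with a toy shell function. This is a sketch only, not Nextflow's implementation: the name cached_upper and the cache layout are hypothetical, and the key here is a simple checksum of the input text, whereas Nextflow's real task hash takes more into account (for instance the process script and its input files).

```shell
# Toy illustration of result caching; NOT Nextflow's actual implementation.
# cached_upper TEXT CACHE_DIR: uppercase TEXT, reusing a stored result if any.
cached_upper() {
    text=$1
    cache_dir=$2
    # The cache key is a checksum of the input: same input, same cache entry
    key=$(printf '%s' "$text" | cksum | cut -d ' ' -f 1)
    if [ -f "$cache_dir/$key" ]; then
        # A result already exists for this input: reuse it
        printf 'cached: %s\n' "$(cat "$cache_dir/$key")"
    else
        # No result yet: do the work and store it under the key
        mkdir -p "$cache_dir"
        printf '%s' "$text" | tr '[:lower:]' '[:upper:]' > "$cache_dir/$key"
        printf 'computed: %s\n' "$(cat "$cache_dir/$key")"
    fi
}

demo_cache=$(mktemp -d)
cached_upper 'hello' "$demo_cache"   # first call does the work
cached_upper 'hello' "$demo_cache"   # second call reuses the stored result
```

Changing the input changes the key, so the work is done again. In the same way, Nextflow re-executes a process when its inputs or script have changed.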

Warning

nextflow options are prefixed by only one -
workflow parameters are prefixed by --

Run from a repository

When Nextflow runs a pipeline that is not available locally, it downloads it from a BitBucket, GitHub, or GitLab repository (see the Nextflow documentation for more information).

nextflow run nextflow-io/hello

Nextflow 24.04.2 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Pulling nextflow-io/hello ...
 downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [hopeful_edison] DSL2 - revision: 7588c46ffe [master]
executor >  local (4)
[f0/6c0524] process > sayHello (3) [100%] 4 of 4 ✔
Ciao world!

Hola world!

Bonjour world!

Hello world!

Where is the workflow downloaded?

If you cannot find it, try nextflow info nextflow-io/hello

Solution

Check the local path given by the command nextflow info nextflow-io/hello:

/home/$USER/.nextflow/assets/nextflow-io/hello

Use SLURM

Nextflow is designed to work with many executors, such as SGE or SLURM, and also with cloud platforms such as Kubernetes or Amazon.

On Genotoul, the batch scheduler is SLURM. To enable it, create a file named nextflow.config in the current directory and write the following line:

process.executor = 'slurm'
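If you prefer to create this file from the command line, the following sketch shows one way to do it. The helper name write_slurm_config is hypothetical, and the scratch directory is only there so that the example does not touch an existing configuration.

```shell
# Hypothetical helper: write a minimal nextflow.config that selects SLURM.
# It refuses to overwrite an existing file.
write_slurm_config() {
    config_file=${1:-nextflow.config}
    if [ -e "$config_file" ]; then
        echo "$config_file already exists, not overwriting" >&2
        return 1
    fi
    cat > "$config_file" <<'EOF'
process.executor = 'slurm'
EOF
}

# Example: create the file in a scratch directory and display it
demo_dir=$(mktemp -d)
write_slurm_config "$demo_dir/nextflow.config"
cat "$demo_dir/nextflow.config"
```

Called without argument, write_slurm_config creates nextflow.config in the current directory, which is where Nextflow looks for it when you launch the workflow from there.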

Run the workflow

nextflow run nextflow-io/hello

Where are the processes run?

You should see the line executor > slurm (4)

Nextflow run options

The nextflow run command has many options. Here are the most useful ones:

Configuration

-profile Choose a configuration profile. Pipelines can provide several profiles, and with this option you can override parameters (see the next section, on Nextflow configuration).

Execution

-resume Execute the script using the cached results, useful to continue executions that were stopped by an error
-w, -work-dir Directory where intermediate result files are stored

Trace

-with-dag Create pipeline DAG file
-with-report Create processes execution html report. This report is very useful for checking memory and CPU usage in order to calibrate pipeline parameters
-with-timeline Create processes execution timeline file
-with-trace Create processes execution tracing file

Dependencies

-with-conda Use the specified Conda environment package or file (must end with .yml|.yaml suffix)
-with-docker Enable process execution in a Docker container
-with-singularity Enable process execution in a Singularity container.
-without-docker Disable process execution with Docker Default: false

Workflow version

-latest Pull latest changes before run Default: false
-r, -revision Revision of the project to run (either a git branch, tag or commit SHA number)

Question

Execute the following command with its options and explore the newly generated files.

nextflow run tutorial.nf -with-timeline -with-trace -with-report -with-dag

This command generates the following files (the actual file names may include a timestamp, as in the trace file shown below):

trace.txt
report.html
timeline.html
dag.dot

Here is the content of trace.txt

more trace-20240603-70106879.txt 
task_id hash    native_id   name    status  exit    submit  duration    realtime    %cpu    peak_rss    peak_vmem   rchar   wchar
1   a8/ce12e7   8239525 SPLITLETTERS (1)    COMPLETED   0   2024-06-03 19:28:27.624 4.8s    77ms    22.7%   3.3 MB  12.7 MB 609.1 KB    2 KB
2   78/b0115e   8239526 CONVERTTOUPPER (1)  COMPLETED   0   2024-06-03 19:28:32.511 4.9s    71ms    37.0%   3.2 MB  12.7 MB 614.3 KB    2 KB
3   4a/b4d152   8239527 CONVERTTOUPPER (2)  COMPLETED   0   2024-06-03 19:28:32.536 29.9s   84ms    30.6%   3.4 MB  12.7 MB 615.4 KB    2 KB
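When a trace file has many columns, it can help to display only a few of them. The sketch below does this with awk, selecting columns by header name. The function name trace_columns is hypothetical, and it assumes the trace file is tab-separated; the sample file reuses a few values from the listing above.

```shell
# Hypothetical helper: print selected columns of a tab-separated trace file.
# Usage: trace_columns FILE COL1,COL2,...
trace_columns() {
    awk -F '\t' -v OFS='\t' -v wanted="$2" '
        NR == 1 {
            # Map each header name to its column position
            for (i = 1; i <= NF; i++) pos[$i] = i
            n = split(wanted, cols, ",")
            # Stop early if a requested column is not in the header
            for (j = 1; j <= n; j++)
                if (!(cols[j] in pos)) {
                    print "unknown column: " cols[j] > "/dev/stderr"
                    exit 1
                }
        }
        {
            # Rebuild each line (header included) with the requested columns only
            line = ""
            for (j = 1; j <= n; j++)
                line = line (j > 1 ? OFS : "") $(pos[cols[j]])
            print line
        }
    ' "$1"
}

# Example on a small sample file
sample=$(mktemp)
printf 'task_id\tname\tstatus\trealtime\tpeak_rss\n' > "$sample"
printf '1\tSPLITLETTERS (1)\tCOMPLETED\t77ms\t3.3 MB\n' >> "$sample"
printf '2\tCONVERTTOUPPER (1)\tCOMPLETED\t71ms\t3.2 MB\n' >> "$sample"
trace_columns "$sample" name,realtime,peak_rss
```

On a real run, replace the sample file with the trace file generated by -with-trace.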

Question

Go to the section trace and find the meaning of each column

Note

If you always want a trace file, you can enable it in nextflow.config with the following lines:

trace {
   enabled = true
   file = 'pipeline_trace.txt'
   fields = 'task_id,hash,name,status,exit,duration,realtime,%cpu,%mem,rss'
}

Running the command without any trace option will then generate pipeline_trace.txt:

nextflow run tutorial.nf

more pipeline_trace.txt
task_id hash    name    status  exit    duration    realtime    %cpu    %mem    rss
1   b4/80e053   SPLITLETTERS (1)    COMPLETED   0   4.8s    76ms    33.7%   0.0%    3.3 MB
3   44/17974f   CONVERTTOUPPER (2)  COMPLETED   0   4.8s    71ms    23.8%   0.0%    3.4 MB
2   9a/087528   CONVERTTOUPPER (1)  COMPLETED   0   4.8s    93ms    28.6%   0.0%    0

Warning

In nf-core, these reports are always created in the pipeline_info directory.

There are several ways to view the HTML files:

With MobaXterm, find the file in the left panel, right-click on it, and open it with your web browser.
Or copy the HTML file into your directory ~/save/public_html and visit the page https://web-genobioinfo.toulouse.inrae.fr/~username

Here is how to configure your public_html directory:

mkdir ~/save/public_html
ln -s ~/save/public_html ~/
chmod 711 /home/$USER 
chmod 711 ~/save/public_html
chmod 755 /save/users/$USER

Default permissions for the public_html folder are drwxr-xr-x: everyone can read and access its contents (uploaded files, for example).

To remove read access to the directory base: chmod o-r .

To make file or folder world readable: chmod o+r filename or foldername.

Nextflow log

We have run several workflows in the current directory. The command nextflow log gives an overview of each execution.

nextflow help log
Print executions log and runtime info
Usage: log [options] Run name or session id
  Options:
    -after
       Show log entries for runs executed after the specified one
    -before
       Show log entries for runs executed before the specified one
    -but
       Show log entries of all runs except the specified one
    -f, -fields
       Comma separated list of fields to include in the printed log -- Use the
       `-l` option to show the list of available fields
    -F, -filter
       Filter log entries by a custom expression e.g. process =~ /foo.*/ &&
       status == 'COMPLETED'
    -h, -help
       Print the command usage
       Default: false
    -l, -list-fields
       Show all available fields
       Default: false
    -q, -quiet
       Show only run names
       Default: false
    -s
       Character used to separate column values
       Default: \t
    -t, -template
       Text template used to each record in the log

Launch the following command

nextflow log

TIMESTAMP           DURATION    RUN NAME            STATUS  REVISION ID SESSION ID                              COMMAND                                                                   
2024-06-11 11:42:32 5.3s        trusting_bose       OK      ddf5f40139  04fc2849-5460-4f6f-977c-041214494716    nextflow run tutorial.nf                                                  
2024-06-11 11:43:38 5.7s        determined_faraday  OK      ddf5f40139  a6331287-405b-4e5b-84d2-05ce6bfc55e1    nextflow run tutorial.nf --greeting 'mon texte a mettre en majuscule'     
2024-06-11 11:45:36 3.9s        special_keller      OK      7588c46ffe  8084caca-bb17-4c20-b06e-ac074f87a7ba    nextflow run nextflow-io/hello                                            
2024-06-11 12:48:38 11.4s       insane_engelbart    OK      7588c46ffe  3de038ea-87dd-4e08-8a8b-6198c820a612    nextflow run nextflow-io/hello                                            
2024-06-11 12:56:36 32.8s       focused_blackwell   OK      ddf5f40139  dd17aba3-b8e6-4e39-8a4e-e1e19d21dd57    nextflow run tutorial.nf -with-timeline -with-trace -with-report -with-dag

Get the log of a particular run, for example focused_blackwell:

nextflow log focused_blackwell

The default information is not really relevant. To improve the log output, first show the list of available fields.

nextflow log -l dreamy_mahavira 
  attempt
  complete
  container
  cpu_model
  cpus
  disk
  duration
  env
  error_action
  exit
  hash
  hostname
  inv_ctxt
  log
  memory
  module
  name
  native_id
  pcpu
  peak_rss
  peak_vmem
  pmem
  process
  queue
  rchar
  read_bytes
  realtime
  rss
  scratch
  script
  start
  status
  stderr
  stdout
  submit
  syscr
  syscw
  tag
  task_id
  time
  vmem
  vol_ctxt
  wchar
  workdir
  write_bytes

You can find the definition of every field in the Nextflow documentation.

Try with the following options:

nextflow log -f task_id,hash,name,status,exit,duration,realtime,pcpu,pmem [RUN NAME]

1   ea/3fc18c   SPLITLETTERS (1)    COMPLETED   0   19.3s   102ms   34.9%   0.0%
2   5f/c88394   CONVERTTOUPPER (1)  COMPLETED   0   4.7s    101ms   43.6%   0.0%
3   c2/8270e1   CONVERTTOUPPER (2)  COMPLETED   0   4.6s    86ms    30.6%   0.0%
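To avoid copying the run name by hand, the two commands can be combined. The sketch below is one possible way to do it: the function name log_latest_run is hypothetical, and it assumes that nextflow log -q lists run names from oldest to newest, as in the listing shown earlier.

```shell
# Hypothetical helper: show selected fields for the most recent run.
# Assumes `nextflow log -q` lists run names oldest first.
log_latest_run() {
    fields=${1:-task_id,hash,name,status,exit,duration,realtime,pcpu,pmem}
    # -q prints only the run names; keep the last one
    last_run=$(nextflow log -q | tail -n 1)
    if [ -z "$last_run" ]; then
        echo "no previous run found in this directory" >&2
        return 1
    fi
    nextflow log -f "$fields" "$last_run"
}
```

Call log_latest_run for the default field list, or pass your own, for example log_latest_run name,status,exit.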

Key Points

Here we had an overview of many nextflow options:

get information on a workflow with nextflow info
run a workflow with nextflow run tutorial.nf
change parameters with --OPTION
use the file nextflow.config to define the executor
generate reports: nextflow run tutorial.nf -with-timeline -with-trace -with-report -with-dag
get a detailed log with: nextflow log -f task_id,hash,name,status,exit,duration,realtime,pcpu,pmem
nextflow options are prefixed by a single -
workflow parameters are prefixed by --
