Run Nextflow
Nextflow usage
The nextflow command has many subcommands for handling workflows. In Nextflow, a workflow is called a 'project'.
Usage: nextflow [options] COMMAND [arg...]
Options:
-C
Use the specified configuration file(s) overriding any defaults
-D
Set JVM properties
-bg
Execute nextflow in background
-c, -config
Add the specified file to configuration set
-config-ignore-includes
Disable the parsing of config includes
-h
Print this help
-log
Set nextflow log file path
-q, -quiet
Do not print information messages
-remote-debug
Enable JVM interactive remote debugging (experimental)
-syslog
Send logs to syslog server (eg. localhost:514)
-trace
Enable trace level logging for the specified package name - multiple packages can be provided separating them with a comma e.g. '-trace nextflow,io.seqera'
-v, -version
Print the program version
Commands:
clean Clean up project cache and work directories
clone Clone a project into a folder
config Print a project configuration
console Launch Nextflow interactive console
drop Delete the local copy of a project
help Print the usage help for a command
info Print project and system runtime information
inspect Inspect process settings in a pipeline project
kuberun Execute a workflow in a Kubernetes cluster (experimental)
list List all downloaded projects
log Print executions log and runtime info
plugin Execute plugin-specific commands
plugins Execute plugin-specific commands
pull Download or update a project
run Execute a pipeline project
secrets Manage pipeline secrets (preview)
self-update Update nextflow runtime to the latest available version
view View project script file(s)
To get help on a particular Nextflow subcommand:
nextflow help COMMAND
Question
Launch the subcommands list and info.
Run a workflow
To run a workflow you can use either:
- a Nextflow file (.nf)
- a project name (from a repository)
- a repository URL
In this part we will see how to run a workflow from a file or a project name, and also how to change a parameter value, resume a workflow, and use a scheduler.
Here is the usage of the command run.
$ nextflow run -help
Execute a pipeline project
Usage: run [options] Project name or repository url
Options:
-E
Exports all current system environment
Default: false
[...]
-without-docker
Disable process execution with Docker
Default: false
-without-podman
Disable process execution in a Podman container
-w, -work-dir
Directory where intermediate result files are stored
Info
All options that belong to Nextflow itself are prefixed by a single '-'.
Run from a file *.nf
Here we are going to execute a workflow defined in a file.
Download the file with the following command:
wget https://genotoul-bioinfo.pages.mia.inra.fr/use-nextflow-nfcore-course/nextflow/tutorial.nf
more tutorial.nf
The workflow contains 2 main steps (called processes): the first process splits a string into 6-character chunks, writing each one to a file with the prefix chunk_, and the second receives these files and converts their contents to uppercase letters.
nextflow run tutorial.nf
It will output something similar to the text shown below:
Nextflow 24.04.3 is available - Please consider updating your version to it
N E X T F L O W ~ version 24.04.2
Launching `tutorial.nf` [disturbed_gauss] DSL2 - revision: ddf5f40139
executor > local (3)
[e7/441d9c] process > SPLITLETTERS (1) [100%] 1 of 1 ✔
[bd/38959b] process > CONVERTTOUPPER (1) [100%] 2 of 2 ✔
WORLD!
HELLO
It's worth noting that the process CONVERTTOUPPER is executed in parallel, so there's no guarantee that the instance processing the first split (the chunk Hello) will be executed before the one processing the second split (the chunk world!).
Thus, the final result may well be printed in a different order from the input, as in the output above, where WORLD! appears before HELLO.
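The two steps described above can be approximated in plain shell. This is only a sketch of the idea, not the contents of tutorial.nf: the use of split and tr, and the function name split_and_upper, are assumptions made for illustration.

```shell
# Sketch of the two steps in plain shell (an assumption, not the actual tutorial.nf).
split_and_upper() {
  # $1 = input string
  tmpdir=$(mktemp -d) || return 1
  (
    cd "$tmpdir" || exit 1
    # Step 1: split the string into 6-byte chunks named chunk_aa, chunk_ab, ...
    printf '%s' "$1" | split -b 6 - chunk_
    # Step 2: convert each chunk to uppercase, one chunk per output line
    for f in chunk_*; do
      tr '[:lower:]' '[:upper:]' < "$f"
      echo
    done
  )
  rm -rf "$tmpdir"
}

split_and_upper 'Hello world!'
```

Unlike Nextflow, this loop handles the chunks one after another, so HELLO is always printed before WORLD!.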
What does this create?
- a work directory, which contains temporary files
- a .nextflow directory, which contains the execution cache
- .nextflow.log: the log of the last execution
$ ls -altr
total 18
-rw-r--r-- 1 pervenche formation 372 17 janv. 17:04 tutorial.nf
drwx--x--x 5 pervenche formation 8192 20 janv. 14:29 ..
drwxr-xr-x 4 pervenche formation 4096 20 janv. 15:08 .
drwxr-xr-x 5 pervenche formation 4096 20 janv. 15:08 work
drwxr-xr-x 3 pervenche formation 4096 20 janv. 15:08 .nextflow
-rw-r--r-- 1 pervenche formation 5306 20 janv. 15:08 .nextflow.log
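Each task runs in its own subdirectory of work, and the path of that subdirectory starts with the short hash printed in brackets in the run output (for instance [e7/441d9c] in the first run). The helper below is a hypothetical convenience, not a Nextflow command; the name task_dir is an assumption made for illustration.

```shell
# Hypothetical helper (not part of Nextflow): resolve a short task hash,
# as printed in the run output, to its directory under the work directory.
task_dir() {
  # $1 = short hash, e.g. e7/441d9c ; $2 = work directory (default: ./work)
  found=1
  for d in "${2:-work}"/"$1"*; do
    if [ -d "$d" ]; then
      printf '%s\n' "$d"
      found=0
    fi
  done
  # Return 0 if at least one directory matched, 1 otherwise
  return "$found"
}

# Example: list the files of one task, hidden ones included.
# ls -a "$(task_dir e7/441d9c)"
```

Listing that directory with ls -a is a quick way to inspect what a task actually ran and produced.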
Change parameter value
This workflow has one parameter named greeting.
See the help with the command:
nextflow run tutorial.nf --help
To change the default value, use --greeting on the command line.
nextflow run tutorial.nf --greeting "mon texte a mettre en majuscule"
Info
Workflow parameters are prefixed by two dashes: '--'.
It will output something similar to the text shown below:
Nextflow 24.04.2 is available - Please consider updating your version to it
N E X T F L O W ~ version 23.10.0
Launching `tutorial.nf` [fervent_dijkstra] DSL2 - revision: cf991824f7
executor > local (7)
[a1/868e57] process > SPLITLETTERS (1) [100%] 1 of 1 ✔
[e7/6f64ab] process > CONVERTTOUPPER (5) [100%] 6 of 6 ✔
XTE A
E
METTRE
MON TE
EN MA
JUSCUL
Resume a workflow
Nextflow keeps track of all the processes executed in your pipeline.
With the -resume option, processes that have not changed are skipped and their cached results are used instead.
nextflow run tutorial.nf --greeting "mon texte a mettre en majuscule" -resume
Nextflow 24.04.2 is available - Please consider updating your version to it
N E X T F L O W ~ version 23.10.0
Launching `tutorial.nf` [focused_swirles] DSL2 - revision: cf991824f7
[a1/868e57] process > SPLITLETTERS (1) [100%] 1 of 1, cached: 1 ✔
[e7/6f64ab] process > CONVERTTOUPPER (5) [100%] 6 of 6, cached: 6 ✔
EN MA
METTRE
MON TE
XTE A
E
JUSCUL
Warning
- Nextflow options are prefixed by a single -
- workflow parameters are prefixed by --
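The dash convention can be illustrated with a small shell function that classifies the arguments of a command line. It is purely illustrative: arg_kind is a hypothetical name, and this is not how Nextflow itself parses its arguments.

```shell
# Illustrative only: classify command-line arguments by their dash prefix.
arg_kind() {
  case "$1" in
    --*) echo "workflow parameter" ;;
    -*)  echo "nextflow option" ;;
    *)   echo "other (command, file or value)" ;;
  esac
}

# Classify each argument of a typical command line
for arg in run tutorial.nf -resume --greeting "mon texte"; do
  printf '%-12s %s\n' "$arg" "$(arg_kind "$arg")"
done
```

The --* pattern is tested first because an argument starting with two dashes would also match -*.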
Run from a repository
When Nextflow runs a pipeline that is not available locally, it downloads it from a Bitbucket, GitHub, or GitLab repository; more info here.
nextflow run nextflow-io/hello
Nextflow 24.04.2 is available - Please consider updating your version to it
N E X T F L O W ~ version 23.10.0
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [hopeful_edison] DSL2 - revision: 7588c46ffe [master]
executor > local (4)
[f0/6c0524] process > sayHello (3) [100%] 4 of 4 ✔
Ciao world!
Hola world!
Bonjour world!
Hello world!
Where is the workflow downloaded?
If you cannot find it, try nextflow info nextflow-io/hello
Solution
Check the local path given by the command nextflow info nextflow-io/hello:
/home/$USER/.nextflow/assets/nextflow-io/hello
Use SLURM
Nextflow is designed to work with many executors, such as SGE and SLURM, and even on cloud platforms such as Kubernetes and Amazon.
On Genotoul, we have the SLURM batch scheduler. To enable it, create a file named nextflow.config in the current directory and write the following line:
process.executor = 'slurm'
Run the workflow
nextflow run nextflow-io/hello
Where are the processes run?
You should see the line executor > slurm (4) in the output.
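If you also want to set default resources for each submitted job, the configuration file could be created from the shell as sketched below. The queue name workq and the resource values are placeholders (assumptions), not values taken from this course; executor, queue, cpus, memory and time are standard Nextflow process directives. The file is written to a scratch directory so that an existing nextflow.config is not overwritten.

```shell
# Write a minimal nextflow.config for SLURM into a scratch directory.
# The queue name and resource values are placeholders: adapt them to your cluster.
config_dir=$(mktemp -d)
cat > "$config_dir/nextflow.config" <<'EOF'
process {
    executor = 'slurm'
    queue    = 'workq'   // placeholder queue name
    cpus     = 1
    memory   = '2 GB'
    time     = '1h'
}
EOF

# Show the generated file
cat "$config_dir/nextflow.config"
```

Copy the generated file next to your workflow (or write it there directly) before launching nextflow run.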
Nextflow run options
The nextflow run command has many options; here are the most useful ones:
Configuration
- -profile: Choose a configuration profile. Pipelines can provide several profiles, and with this option you can override parameters (see the next section).
Execution
- -resume: Execute the script using the cached results; useful to continue executions that were stopped by an error
- -w, -work-dir: Directory where intermediate result files are stored
Trace
- -with-dag: Create pipeline DAG file
- -with-report: Create an HTML report of process execution; really useful for getting memory and CPU usage in order to calibrate pipeline parameters
- -with-timeline: Create processes execution timeline file
- -with-trace: Create processes execution tracing file
Dependencies
- -with-conda: Use the specified Conda environment package or file (must end with .yml|.yaml suffix)
- -with-docker: Enable process execution in a Docker container
- -with-singularity: Enable process execution in a Singularity container
- -without-docker: Disable process execution with Docker. Default: false
Workflow version
- -latest: Pull latest changes before run. Default: false
- -r, -revision: Revision of the project to run (either a git branch, tag or commit SHA number)
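As an illustration of the -profile option, a configuration file can declare several profiles and let you pick one at launch time. The sketch below is an assumption for illustration: the profile name cluster is hypothetical, while standard is the name Nextflow uses for the default profile. The file is written to a scratch directory.

```shell
# Write a nextflow.config declaring two profiles into a scratch directory.
# The profile name "cluster" is a hypothetical example.
profile_dir=$(mktemp -d)
cat > "$profile_dir/nextflow.config" <<'EOF'
profiles {
    standard {
        process.executor = 'local'
    }
    cluster {
        process.executor = 'slurm'
    }
}
EOF

# Then, from a directory containing this file and the workflow:
# nextflow run tutorial.nf -profile cluster
```

With such a file, the same workflow can run locally by default and on the scheduler when -profile cluster is given, without editing the configuration between runs.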
Question
Execute the following command with options and explore the new generated files.
nextflow run tutorial.nf -with-timeline -with-trace -with-report -with-dag
- trace.txt
- report.html
- timeline.html
- dag.dot
Here is the content of the trace file (its name includes a timestamp):
more trace-20240603-70106879.txt
task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar
1 a8/ce12e7 8239525 SPLITLETTERS (1) COMPLETED 0 2024-06-03 19:28:27.624 4.8s 77ms 22.7% 3.3 MB 12.7 MB 609.1 KB 2 KB
2 78/b0115e 8239526 CONVERTTOUPPER (1) COMPLETED 0 2024-06-03 19:28:32.511 4.9s 71ms 37.0% 3.2 MB 12.7 MB 614.3 KB 2 KB
3 4a/b4d152 8239527 CONVERTTOUPPER (2) COMPLETED 0 2024-06-03 19:28:32.536 29.9s 84ms 30.6% 3.4 MB 12.7 MB 615.4 KB 2 KB
Question
Go to the section trace and find the meaning of each column
Note
If you always want a trace file, you can enable it in nextflow.config with the following lines:
trace {
    enabled = true
    file = 'pipeline_trace.txt'
    fields = 'task_id,hash,name,status,exit,duration,realtime,%cpu,%mem,rss'
}
nextflow run tutorial.nf
more pipeline_trace.txt
task_id hash name status exit duration realtime %cpu %mem rss
1 b4/80e053 SPLITLETTERS (1) COMPLETED 0 4.8s 76ms 33.7% 0.0% 3.3 MB
3 44/17974f CONVERTTOUPPER (2) COMPLETED 0 4.8s 71ms 23.8% 0.0% 3.4 MB
2 9a/087528 CONVERTTOUPPER (1) COMPLETED 0 4.8s 93ms 28.6% 0.0% 0
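Because the trace file is a tab-separated table with a header line, standard tools can extract a column from it. The helper below is a hypothetical sketch (trace_column is not a Nextflow command), and the sample file only reproduces a few columns of the output above.

```shell
# Hypothetical helper: print one column of a tab-separated Nextflow trace file,
# selected by its header name.
trace_column() {
  # $1 = trace file ; $2 = column name, e.g. realtime
  awk -F '\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { print $idx }
  ' "$1"
}

# Small sample in the same layout as pipeline_trace.txt (fewer columns).
sample=$(mktemp)
printf 'task_id\thash\tname\tstatus\trealtime\n'             >  "$sample"
printf '1\tb4/80e053\tSPLITLETTERS (1)\tCOMPLETED\t76ms\n'   >> "$sample"
printf '3\t44/17974f\tCONVERTTOUPPER (2)\tCOMPLETED\t71ms\n' >> "$sample"

trace_column "$sample" name
trace_column "$sample" realtime
```

Looking the column up by name rather than by position keeps the helper usable when the fields setting changes the column order.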
Warning
In nf-core, these reports are always created in the pipeline_info directory.
To view the other HTML files, you have several options:
- Use MobaXterm: find the file in the left panel, right-click it, and open it with your web browser.
- Or copy the HTML file into your ~/save/public_html directory and visit the page https://web-genobioinfo.toulouse.inrae.fr/~username
Here is how to configure your public_html directory:
mkdir ~/save/public_html
ln -s ~/save/public_html ~/
chmod 711 /home/$USER
chmod 711 ~/save/public_html
chmod 755 /save/users/$USER
The default permissions for the public_html folder are drwxr-xr-x: everyone can read and access its contents (to download files, for example).
To remove read access to the base directory, use chmod o-r
To make a file or folder world-readable, use chmod o+r followed by the file or folder name.
Nextflow log
We have run several workflows in the current directory; the command nextflow log gives an overview of each execution.
nextflow help log
Print executions log and runtime info
Usage: log [options] Run name or session id
Options:
-after
Show log entries for runs executed after the specified one
-before
Show log entries for runs executed before the specified one
-but
Show log entries of all runs except the specified one
-f, -fields
Comma separated list of fields to include in the printed log -- Use the
`-l` option to show the list of available fields
-F, -filter
Filter log entries by a custom expression e.g. process =~ /foo.*/ &&
status == 'COMPLETED'
-h, -help
Print the command usage
Default: false
-l, -list-fields
Show all available fields
Default: false
-q, -quiet
Show only run names
Default: false
-s
Character used to separate column values
Default: \t
-t, -template
Text template used to each record in the log
Launch the following command
nextflow log
TIMESTAMP DURATION RUN NAME STATUS REVISION ID SESSION ID COMMAND
2024-06-11 11:42:32 5.3s trusting_bose OK ddf5f40139 04fc2849-5460-4f6f-977c-041214494716 nextflow run tutorial.nf
2024-06-11 11:43:38 5.7s determined_faraday OK ddf5f40139 a6331287-405b-4e5b-84d2-05ce6bfc55e1 nextflow run tutorial.nf --greeting 'mon texte a mettre en majuscule'
2024-06-11 11:45:36 3.9s special_keller OK 7588c46ffe 8084caca-bb17-4c20-b06e-ac074f87a7ba nextflow run nextflow-io/hello
2024-06-11 12:48:38 11.4s insane_engelbart OK 7588c46ffe 3de038ea-87dd-4e08-8a8b-6198c820a612 nextflow run nextflow-io/hello
2024-06-11 12:56:36 32.8s focused_blackwell OK ddf5f40139 dd17aba3-b8e6-4e39-8a4e-e1e19d21dd57 nextflow run tutorial.nf -with-timeline -with-trace -with-report -with-dag
Get the log of a particular run, for example focused_blackwell:
nextflow log focused_blackwell
This information is not really relevant on its own, so to get more useful log output, first show the list of available fields.
nextflow log -l dreamy_mahavira
attempt
complete
container
cpu_model
cpus
disk
duration
env
error_action
exit
hash
hostname
inv_ctxt
log
memory
module
name
native_id
pcpu
peak_rss
peak_vmem
pmem
process
queue
rchar
read_bytes
realtime
rss
scratch
script
start
status
stderr
stdout
submit
syscr
syscw
tag
task_id
time
vmem
vol_ctxt
wchar
workdir
write_bytes
You can find the definition of every field on this page.
Try with the following options:
nextflow log -f task_id,hash,name,status,exit,duration,realtime,pcpu,pmem [RUN NAME]
1 ea/3fc18c SPLITLETTERS (1) COMPLETED 0 19.3s 102ms 34.9% 0.0%
2 5f/c88394 CONVERTTOUPPER (1) COMPLETED 0 4.7s 101ms 43.6% 0.0%
3 c2/8270e1 CONVERTTOUPPER (2) COMPLETED 0 4.6s 86ms 30.6% 0.0%
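The -F option listed in the help above can be combined with -f to keep only some tasks. The wrapper below is a sketch: the function name completed_tasks is hypothetical, and it assumes nextflow is in your PATH and that you call it from the directory where the run was launched.

```shell
# Sketch: list only the completed tasks of a given run.
# Requires nextflow in PATH; call it from the directory where the run was launched.
completed_tasks() {
  # $1 = run name, e.g. focused_blackwell
  if ! command -v nextflow >/dev/null 2>&1; then
    echo "nextflow not found in PATH" >&2
    return 127
  fi
  nextflow log "$1" -f task_id,name,status,duration -F "status == 'COMPLETED'"
}

# Example call (commented out so that loading this snippet runs nothing):
# completed_tasks focused_blackwell
```

The filter expression follows the form shown in the help text (status == 'COMPLETED'); other fields from the list above can be used in the same way.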
Key Points
Here we had an overview of many Nextflow options:
- get information on a workflow with nextflow info
- run a workflow with nextflow run tutorial.nf
- change parameters with --OPTION
- use the nextflow.config file to define the executor
- generate reports: nextflow run tutorial.nf -with-timeline -with-trace -with-report -with-dag
- get a detailed log with: nextflow log -f task_id,hash,name,status,exit,duration,realtime,pcpu,pmem
- Nextflow options are prefixed by a single -
- workflow parameters are prefixed by --