Chap III. Tutorial nf-core for Genotoul / Correction
Exercise 8:
- Load the latest version of the nf-core module
search_module nfcore
module load bioinfo/nfcore-Nextflow-v20.11.0-edge
- List all pipelines
nf-core list
- List the content of the directory ~/.nextflow/assets/nf-core/
ls -altr ~/.nextflow/assets/nf-core/
- Fetch one of the pipelines using nextflow pull nf-core/PIPELINE
nextflow pull nf-core/methylseq
- List the content of the directory ~/.nextflow/assets/nf-core/ again
ls -altr ~/.nextflow/assets/nf-core/
A new directory has been created, containing the methylseq pipeline.
- Use nf-core list to see if the pipeline you pulled is up to date
nf-core list
- Get info on this pipeline with the nextflow info command
nextflow info nf-core/methylseq
Exercise 9:
- Remove the file nextflow.config in your current directory.
rm nextflow.config
- Run the nf-core/rnaseq pipeline with the profiles genotoul and test, and follow the execution on the cluster with squeue -u USERNAME in another terminal.
$ nextflow run nf-core/rnaseq -r 3.0 -profile test,genotoul
N E X T F L O W ~ version 20.11.0-edge
Launching `nf-core/rnaseq` [lonely_euler] - revision: 3643a94411 [3.0]
------------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/rnaseq v3.0
------------------------------------------------------
Core Nextflow options
revision : 3.0
runName : lonely_euler
containerEngine : singularity
launchDir : /work/cnoirot/nextflow_tutorial
workDir : /work/cnoirot/nextflow_tutorial/work
projectDir : /home/cnoirot/.nextflow/assets/nf-core/rnaseq
userName : cnoirot
profile : test,genotoul
configFiles : /home/cnoirot/.nextflow/assets/nf-core/rnaseq/nextflow.config
Input/output options
input : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/samplesheet.csv
UMI options
umitools_bc_pattern : NNNN
Reference genome options
fasta : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fa
gtf : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf.gz
gff : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gff.gz
transcript_fasta : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/transcriptome.fasta
additional_fasta : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/gfp.fa.gz
star_index : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/star.tar.gz
hisat2_index : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/hisat2.tar.gz
rsem_index : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/rsem.tar.gz
salmon_index : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/salmon.tar.gz
save_reference : true
igenomes_ignore : true
Alignment options
pseudo_aligner : salmon
Institutional config options
config_profile_description: Minimal test dataset to check pipeline function
config_profile_contact : support.bioinfo.genotoul@inra.fr
config_profile_url : http://bioinfo.genotoul.fr/
Max job request options
max_cpus : 2
max_memory : 6 GB
max_time : 2d
------------------------------------------------------
If you use nf-core/rnaseq for your analysis please cite:
* The pipeline
https://doi.org/10.5281/zenodo.1400710
* The nf-core framework
https://dx.doi.org/10.1038/s41587-020-0439-x
https://rdcu.be/b1GjZ
* Software dependencies
https://github.com/nf-core/rnaseq/blob/master/CITATIONS.md
------------------------------------------------------
WARN: =============================================================================
Both '--gtf' and '--gff' parameters have been provided.
Using GTF file as priority.
===================================================================================
[- ] process > RNASEQ:PREPARE_GENOME:GUNZIP_GTF -
[- ] process > RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA -
[- ] process > RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA -
[- ] process > RNASEQ:PREPARE_GENOME:GTF2BED -
[- ] process > RNASEQ:PREPARE_GENOME:GET_CHROM_SIZES -
[- ] process > RNASEQ:PREPARE_GENOME:UNTAR_STAR_INDEX -
executor > slurm (3)
[e2/c1a158] process > RNASEQ:PREPARE_GENOME:GUNZIP_GTF (genes.gtf.gz) [ 0%] 0 of 1
[- ] process > RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA [ 0%] 0 of 1
[- ] process > RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA -
[- ] process > RNASEQ:PREPARE_GENOME:GTF2BED -
[- ] process > RNASEQ:PREPARE_GENOME:GET_CHROM_SIZES -
[- ] process > RNASEQ:PREPARE_GENOME:UNTAR_STAR_INDEX
- List the options of the rnaseq pipeline. Note that the options of every pipeline are detailed on the web page of the corresponding workflow.
$ nextflow run nf-core/rnaseq -r 3.0 --help
- Copy the directory /usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data into the current directory nextflow-tutorial
cp -r /usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data .
- Create a samplesheet samples.csv with the following lines:
group,replicate,fastq_1,fastq_2,strandedness
mutant,1,data/MT_rep1_1_Ch6.fastq.gz,data/MT_rep1_2_Ch6.fastq.gz,unstranded
wild,1,data/WT_rep1_1_Ch6.fastq.gz,data/WT_rep1_2_Ch6.fastq.gz,unstranded
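A malformed samplesheet is easy to produce by hand, so a quick column check before launching the pipeline can save a failed run. The following is a minimal sketch, not part of nf-core: check_samplesheet is a hypothetical helper that only verifies that every line has the five comma-separated fields named in the header above.

```shell
# Report lines of a samplesheet that do not have exactly 5 comma-separated
# fields. Prints the offending line numbers and returns non-zero if any exist.
# check_samplesheet is a hypothetical helper name, not an nf-core command.
check_samplesheet() {
    awk -F ',' '
        NF != 5 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

It could be called as check_samplesheet samples.csv from the tutorial directory; no output and a zero exit status mean every line has five fields. It does not check that the fastq files exist.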
- Run the pipeline with the following data and parameters:
--fasta data/ITAG2.3_genomic_Ch6.fasta
--gtf data/ITAG2.3_genomic_Ch6.gtf
--input samples.csv
--outdir ResultsItag
- aligner star & rsem
- profile genotoul
- revision 3.0
nextflow run nf-core/rnaseq -r 3.0 -profile genotoul --input samples.csv --fasta data/ITAG2.3_genomic_Ch6.fasta --gtf data/ITAG2.3_genomic_Ch6.gtf --outdir ResultsItag --aligner star_rsem
- Monitor job execution with squeue -u USERNAME. On which queue are the jobs launched?
squeue -u USERNAME
- Copy the ResultsItag directory into ~/public_html and visit http://genoweb.toulouse.inra.fr/~USERNAME/
cp -r ResultsItag ~/public_html
- Was the memory sufficient? Which process took the longest? Which job consumed the most memory? (see execution_report.html)
Yes, the memory was sufficient for all jobs; rsem_calculateexpression was the longest job and picard_markduplicate consumed the most memory.
- Visit the nf-core Slack and check whether a channel is open for the workflow you are interested in, then read the latest discussions. What was the last discussion about?
Exercise 10:
We are going to use the methylseq pipeline for this exercise.
- Read the doc about genomes in methylseq pipelines : https://nf-co.re/methylseq/1.5/usage#reference-genomes
- Create the nextflow.config file with only the fasta file and its index:
params {
  genomes {
    'Itag' {
      fasta = './data/ITAG2.3_genomic_Ch6.fasta'
      fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
    }
  }
}
- Try to run the nf-core/methylseq workflow with the following options:
--genome Itag
--reads 'data/*_{1,2}_Ch6.fastq.gz'
--outdir ResultsMeth
-profile genotoul
nextflow run nf-core/methylseq --genome Itag --reads 'data/*_{1,2}_Ch6.fastq.gz' --outdir ResultsMeth -profile genotoul
N E X T F L O W ~ version 20.11.0-edge
Launching `nf-core/methylseq` [admiring_gilbert] - revision: 4f31ed1792 [master]
WARN: Access to undefined parameter `readPaths` -- Initialise it to a default value eg. `params.readPaths = some_value`
----------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/methylseq v1.5
----------------------------------------------------
Run Name : admiring_gilbert
Reads : data/*_{1,2}_Ch6.fastq.gz
Aligner : bismark
Data Type : Paired-End
Genome : Itag
Fasta Ref : ./data/ITAG2.3_genomic_Ch6.fasta
Trimming : 5'R1: 0 / 5'R2: 0 / 3'R1: 0 / 3'R2: 0
Deduplication : Yes
Directional Mode : Yes
All C Contexts : No
Cytosine report : No
Save Intermediates: Reference genome build
Output dir : ResultsMeth
Launch dir : /work/cnoirot/nextflow_tutorial
Working dir : /work/cnoirot/nextflow_tutorial/work
Pipeline dir : /home/cnoirot/.nextflow/assets/nf-core/methylseq
User : cnoirot
Config Profile : genotoul
Container : singularity - nfcore/methylseq:1.5
Config Description: The Genotoul cluster profile
Config Contact : support.bioinfo.genotoul@inra.fr
Config URL : http://bioinfo.genotoul.fr/
Max Resources : 120 GB memory, 48 cpus, 4d time per job
- Which fasta file is used as a reference? Look at the parameters summary in the console. See the lines:
Genome : Itag
Fasta Ref : ./data/ITAG2.3_genomic_Ch6.fasta
- How could you re-use the Bismark index generated by the pipeline? Add a line to nextflow.config:
params {
  genomes {
    'Itag' {
      fasta = './data/ITAG2.3_genomic_Ch6.fasta'
      fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
      bismark = './ResultsMeth/reference_genome/BismarkIndex/'
    }
  }
}
Or add --bismark_index './ResultsMeth/reference_genome/BismarkIndex/' on the command line.
- Copy the output directory ResultsMeth into your ~/public_html and view the results at http://genoweb.toulouse.inra.fr/~USERNAME/
Exercise 11:
We are going to resume the methylseq workflow after lowering the memory available for the bismark step and removing the results.
- Check the file ResultsMeth/pipeline_info/execution_report.html of your last execution of methylseq. Which memory value could be set for bismark? 1 GB
- Edit ~/.nextflow/assets/nf-core/methylseq/conf/base.config and set the memory of bismark_align to 800.MB:
withName:bismark_align {
  cpus = { check_max( 12 * task.attempt, 'cpus') }
  memory = { check_max( 800.MB * task.attempt, 'memory') }
  time = { check_max( 8.d * task.attempt, 'time') }
}
- Delete the job directory of the previously completed bismark_align process of MT_rep1 (use the file ResultsMeth/pipeline_info/execution_trace.txt to find the working path):
$ cut -f 2,4,5 ResultsMeth/pipeline_info/execution_trace.txt
hash name status
35/1a236f makeBismarkIndex (1) CACHED
...
64/8e430a bismark_align (MT_rep1) COMPLETED
a2/ed8201 get_software_versions COMPLETED
...
The complete path for this job is ./work/64/8e430ac54a4eebb11f81e34aceb154/. The path starts with the key 64/8e430a; use tab completion to get the complete path name.
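The lookup from task name to work directory can also be scripted instead of relying on tab completion. This is a minimal sketch under two assumptions taken from the cut command above: the trace file is tab-separated, with the short hash in column 2 and the task name in column 4. The helper names task_hash and task_workdir are hypothetical.

```shell
# Print the short hash (column 2) of the task whose name (column 4) matches
# exactly. The column positions are assumed from `cut -f 2,4,5` on the trace.
task_hash() {
    awk -F '\t' -v name="$2" 'NR > 1 && $4 == name { print $2 }' "$1"
}

# Expand a short hash such as 64/8e430a to the matching directory below the
# given work root. Prints nothing if no directory matches.
task_workdir() {
    for dir in "$1"/"$2"*/; do
        if [ -d "$dir" ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
}
```

For example, task_workdir ./work "$(task_hash ResultsMeth/pipeline_info/execution_trace.txt 'bismark_align (MT_rep1)')" would print the directory to delete; check the printed path before removing anything.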
- Rerun the methylseq workflow with the option -resume
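After the resumed run, the trace file can show which tasks were taken from the cache and which were executed again. This is a minimal sketch, assuming the trace is tab-separated with the status in column 5 (as in the cut -f 2,4,5 output of the previous step) and that the resumed run writes its trace to the same path. status_counts is a hypothetical helper name.

```shell
# Count tasks per status (column 5 of the trace), skipping the header line.
# Output is one "STATUS COUNT" pair per line, sorted by status name.
status_counts() {
    awk -F '\t' '
        NR > 1 { count[$5]++ }
        END { for (s in count) print s, count[s] }
    ' "$1" | sort
}
```

It could be called as status_counts ResultsMeth/pipeline_info/execution_trace.txt. Since its work directory was deleted, the bismark_align task of MT_rep1 is expected to be executed again rather than reported as CACHED.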