Chap III. Tutorial nf-core for Genotoul / Correction

Exercise 8

Load the latest version of the nf-core module
```
search_module nfcore
module load bioinfo/nfcore-Nextflow-v20.11.0-edge
```

List all pipelines
```
nf-core list
```
List the content of the directory ~/.nextflow/assets/nf-core/
```
ls -altr ~/.nextflow/assets/nf-core/
```
Fetch one of the pipelines using nextflow pull nf-core/PIPELINE
```
nextflow pull nf-core/methylseq
```
List the content of the directory ~/.nextflow/assets/nf-core/ again
```
ls -altr ~/.nextflow/assets/nf-core/
```
A new directory containing the methylseq pipeline has been created.
Use nf-core list to see whether the pipeline you pulled is up to date
```
nf-core list
```
Get information on this pipeline with the nextflow info command
```
nextflow info nf-core/methylseq
```

Exercise 9:

Remove the file nextflow.config in your current directory.
```
rm nextflow.config
```
Run the nf-core/rnaseq pipeline with the profiles genotoul and test, and follow the execution on the cluster with squeue -u USERNAME in another terminal.

```
$ nextflow run nf-core/rnaseq -r 3.0 -profile test,genotoul
N E X T F L O W  ~  version 20.11.0-edge
Launching `nf-core/rnaseq` [lonely_euler] - revision: 3643a94411 [3.0]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rnaseq v3.0
------------------------------------------------------

Core Nextflow options
    revision                  : 3.0
    runName                   : lonely_euler
    containerEngine           : singularity
    launchDir                 : /work/cnoirot/nextflow_tutorial
    workDir                   : /work/cnoirot/nextflow_tutorial/work
    projectDir                : /home/cnoirot/.nextflow/assets/nf-core/rnaseq
    userName                  : cnoirot
    profile                   : test,genotoul
    configFiles               : /home/cnoirot/.nextflow/assets/nf-core/rnaseq/nextflow.config

Input/output options
    input                     : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/samplesheet.csv

UMI options
    umitools_bc_pattern       : NNNN

Reference genome options
    fasta                     : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fa
    gtf                       : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf.gz
    gff                       : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gff.gz
    transcript_fasta          : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/transcriptome.fasta
    additional_fasta          : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/gfp.fa.gz
    star_index                : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/star.tar.gz
    hisat2_index              : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/hisat2.tar.gz
    rsem_index                : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/rsem.tar.gz
    salmon_index              : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/salmon.tar.gz
    save_reference            : true
    igenomes_ignore           : true

Alignment options
    pseudo_aligner            : salmon

Institutional config options
    config_profile_description: Minimal test dataset to check pipeline function
    config_profile_contact    : support.bioinfo.genotoul@inra.fr
    config_profile_url        : http://bioinfo.genotoul.fr/

Max job request options
    max_cpus                  : 2
    max_memory                : 6 GB
    max_time                  : 2d

------------------------------------------------------

If you use nf-core/rnaseq for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.1400710

* The nf-core framework
  https://dx.doi.org/10.1038/s41587-020-0439-x
  https://rdcu.be/b1GjZ

* Software dependencies
  https://github.com/nf-core/rnaseq/blob/master/CITATIONS.md

------------------------------------------------------
WARN: =============================================================================
  Both '--gtf' and '--gff' parameters have been provided.
  Using GTF file as priority.
===================================================================================
[-        ] process > RNASEQ:PREPARE_GENOME:GUNZIP_GTF                                         -
[-        ] process > RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA                            -
[-        ] process > RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA                               -
[-        ] process > RNASEQ:PREPARE_GENOME:GTF2BED                                            -
[-        ] process > RNASEQ:PREPARE_GENOME:GET_CHROM_SIZES                                    -
[-        ] process > RNASEQ:PREPARE_GENOME:UNTAR_STAR_INDEX                                   -
executor >  slurm (3)
[e2/c1a158] process > RNASEQ:PREPARE_GENOME:GUNZIP_GTF (genes.gtf.gz)                          [  0%] 0 of 1
[-        ] process > RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA                            [  0%] 0 of 1
[-        ] process > RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA                               -
[-        ] process > RNASEQ:PREPARE_GENOME:GTF2BED                                            -
[-        ] process > RNASEQ:PREPARE_GENOME:GET_CHROM_SIZES                                    -
[-        ] process > RNASEQ:PREPARE_GENOME:UNTAR_STAR_INDEX
```

List the options of the rnaseq pipeline. Note that the options of each pipeline are detailed on that workflow's web page.
```
$ nextflow run nf-core/rnaseq -r 3.0 --help
```
Copy the directory /usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data into the current directory nextflow-tutorial
```
cp -r /usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data .
```

Create a samplesheet named samples.csv with the following lines:

```
group,replicate,fastq_1,fastq_2,strandedness
mutant,1,data/MT_rep1_1_Ch6.fastq.gz,data/MT_rep1_2_Ch6.fastq.gz,unstranded
wild,1,data/WT_rep1_1_Ch6.fastq.gz,data/WT_rep1_2_Ch6.fastq.gz,unstranded
```

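One way to create the file from the shell is a here-document. The sketch below writes the three lines given in the exercise, checks that every row has five comma-separated fields, and prints a warning for each FASTQ path that is not found. It assumes the data directory has already been copied into the current directory; the checks are an illustration, not part of the pipeline.

```shell
# Write the samplesheet exactly as given in the exercise.
cat > samples.csv <<'EOF'
group,replicate,fastq_1,fastq_2,strandedness
mutant,1,data/MT_rep1_1_Ch6.fastq.gz,data/MT_rep1_2_Ch6.fastq.gz,unstranded
wild,1,data/WT_rep1_1_Ch6.fastq.gz,data/WT_rep1_2_Ch6.fastq.gz,unstranded
EOF

# Sanity check: every row must have 5 comma-separated fields.
awk -F',' 'NF != 5 { print "line " NR ": expected 5 fields, found " NF; bad = 1 }
           END { exit bad }' samples.csv

# Warn (on stderr) about FASTQ files listed in columns 3 and 4 that are missing.
tail -n +2 samples.csv | cut -d',' -f3,4 | tr ',' '\n' | while read -r fq; do
    [ -e "$fq" ] || echo "missing: $fq" >&2
done
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the here-document, so the file content is written verbatim.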
Run the pipeline with the following data and parameters:
- --fasta data/ITAG2.3_genomic_Ch6.fasta
- --gtf data/ITAG2.3_genomic_Ch6.gtf
- --input samples.csv
- --outdir ResultsItag
- aligner star & rsem
- profile genotoul
- revision 3.0

```
nextflow run nf-core/rnaseq -r 3.0 -profile genotoul --input samples.csv --fasta data/ITAG2.3_genomic_Ch6.fasta --gtf data/ITAG2.3_genomic_Ch6.gtf --aligner star_rsem --outdir ResultsItag
```

Monitor job execution with squeue -u USERNAME. On which queue are the jobs launched?
```
squeue -u USERNAME
```
Copy ResultsItag directory into ~/public_html and visit http://genoweb.toulouse.inra.fr/~USERNAME/
```
cp -r ResultsItag ~/public_html
```
Was the memory sufficient? Which process took the longest? Which job consumed the most memory? (See execution_report.html.)

Yes, the memory was sufficient for all jobs; rsem_calculateexpression was the longest job and picard_markduplicate consumed the most memory.
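The same figures can also be read on the command line from the Nextflow trace file. The sketch below prints the name, realtime and peak_rss columns of a trace file, locating them by header name. The function name trace_summary is hypothetical, the location ResultsItag/pipeline_info/execution_trace.txt is assumed by analogy with the trace file used in Exercise 11, and the three column names are assumed to be present (they are part of Nextflow's default trace fields).

```shell
# Print name, realtime and peak_rss for every task in a Nextflow trace file.
# Columns are located by header name, so their position does not matter.
# $1: path to a tab-separated trace file
trace_summary() {
    awk -F'\t' '
        BEGIN { OFS = "\t" }
        NR == 1 {
            # Remember the position of each header field.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        { print $(col["name"]), $(col["realtime"]), $(col["peak_rss"]) }
    ' "$1"
}

# Assumed location of the trace file for the rnaseq run.
trace=ResultsItag/pipeline_info/execution_trace.txt
if [ -f "$trace" ]; then
    trace_summary "$trace"
fi
```

The values are printed as Nextflow wrote them (for example durations with unit suffixes), so the output is meant for reading rather than numeric sorting.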
Visit the nf-core slack and check if a channel of the workflow you are interested in is open, check last discussions. What was the last discussion about?

Exercise 10:

We are going to use the methylseq pipeline for this exercise.

Read the documentation about reference genomes in the methylseq pipeline: https://nf-co.re/methylseq/1.5/usage#reference-genomes

Create the nextflow.config file with only the fasta file and its index:

```
params {
   genomes {
      'Itag' {
         fasta = './data/ITAG2.3_genomic_Ch6.fasta'
         fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
      }
   }
}
```

Try to run the nf-core/methylseq workflow with the following options:
- --genome Itag
- --reads 'data/*_{1,2}_Ch6.fastq.gz'
- --outdir ResultsMeth
- -profile genotoul

```
nextflow run nf-core/methylseq --genome Itag --reads 'data/*_{1,2}_Ch6.fastq.gz'  --outdir ResultsMeth -profile genotoul
N E X T F L O W  ~  version 20.11.0-edge
Launching `nf-core/methylseq` [admiring_gilbert] - revision: 4f31ed1792 [master]
WARN: Access to undefined parameter `readPaths` -- Initialise it to a default value eg. `params.readPaths = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/methylseq v1.5
----------------------------------------------------

Run Name          : admiring_gilbert
Reads             : data/*_{1,2}_Ch6.fastq.gz
Aligner           : bismark
Data Type         : Paired-End
Genome            : Itag
Fasta Ref         : ./data/ITAG2.3_genomic_Ch6.fasta
Trimming          : 5'R1: 0 / 5'R2: 0 / 3'R1: 0 / 3'R2: 0
Deduplication     : Yes
Directional Mode  : Yes
All C Contexts    : No
Cytosine report   : No
Save Intermediates: Reference genome build
Output dir        : ResultsMeth
Launch dir        : /work/cnoirot/nextflow_tutorial
Working dir       : /work/cnoirot/nextflow_tutorial/work
Pipeline dir      : /home/cnoirot/.nextflow/assets/nf-core/methylseq
User              : cnoirot
Config Profile    : genotoul
Container         : singularity - nfcore/methylseq:1.5
Config Description: The Genotoul cluster profile
Config Contact    : support.bioinfo.genotoul@inra.fr
Config URL        : http://bioinfo.genotoul.fr/
Max Resources     : 120 GB memory, 48 cpus, 4d time per job
```

Which fasta file is used as the reference? Look at the parameter summary in the console. See the lines:
```
Genome            : Itag
Fasta Ref         : ./data/ITAG2.3_genomic_Ch6.fasta
```
How could you reuse the Bismark index generated by the pipeline? Add a line to nextflow.config:

```
params {
   genomes {
      'Itag' {
         fasta = './data/ITAG2.3_genomic_Ch6.fasta'
         fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
         bismark = './ResultsMeth/reference_genome/BismarkIndex/'
      }
   }
}
```

Or, on the command line, add --bismark_index './ResultsMeth/reference_genome/BismarkIndex/'

Copy the output directory ResultsMeth onto your ~/public_html and visualise results at http://genoweb.toulouse.inra.fr/~USERNAME/

Exercise 11:

We are going to resume the methylseq workflow after lowering the memory available for the bismark step and removing its results.

Check the file ResultsMeth/pipeline_info/execution_report.html of your last execution of methylseq.
Which memory value could be set for bismark? 1 GB

Edit ~/.nextflow/assets/nf-core/methylseq/conf/base.config and set the memory of bismark_align to 800.MB:

```
withName:bismark_align {
  cpus = { check_max( 12 * task.attempt, 'cpus') }
  memory = { check_max( 800.MB * task.attempt, 'memory') }
  time = { check_max( 8.d * task.attempt, 'time') }
}
```
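Editing files under ~/.nextflow/assets changes the pulled copy of the pipeline itself. As an alternative sketch, which is not the approach this exercise asks for, the same memory override could be kept in a small separate config file and passed to Nextflow with its -c option. The file name low_mem.config is hypothetical, and the value is a fixed 800.MB rather than one scaled by task.attempt through the pipeline's check_max helper.

```shell
# Hypothetical file name; any name works as long as it is passed with -c.
cat > low_mem.config <<'EOF'
process {
    withName: bismark_align {
        // Fixed value: unlike base.config, it does not grow with task.attempt.
        memory = 800.MB
    }
}
EOF

# Show what was written.
cat low_mem.config
```

The file would then be supplied by adding -c low_mem.config to the nextflow run command, leaving the pipeline sources untouched.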

Delete the job directory of the previously completed bismark_align process of MT_rep1 (use the file ResultsMeth/pipeline_info/execution_trace.txt to find the working path):
```
$ cut -f 2,4,5 ResultsMeth/pipeline_info/execution_trace.txt

hash name status
35/1a236f makeBismarkIndex (1) CACHED
...
64/8e430a bismark_align (MT_rep1) COMPLETED
a2/ed8201 get_software_versions COMPLETED
...
```
The complete path for this job is `./work/64/8e430ac54a4eebb11f81e34aceb154/`. The path starts with the key `64/8e430a`; use tab completion to get the complete path name.
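The lookup can also be scripted instead of relying on tab completion. The sketch below reads the short hash of a task from the trace file and lets a shell glob expand it to the full work directory. The function name find_task_dir is hypothetical; it assumes the hash is in column 2 and the task name in column 4, as in the cut command used in this step.

```shell
# Resolve the full work directory of a task from its short hash prefix.
# $1: trace file, $2: task name as shown in the trace, $3: work directory root
find_task_dir() {
    # Column 2 holds the hash prefix, column 4 the task name.
    # If the task appears several times, keep the last occurrence.
    task_hash=$(awk -F'\t' -v name="$2" '$4 == name { print $2 }' "$1" | tail -n 1)
    [ -n "$task_hash" ] || return 1
    # The glob expands a prefix such as 64/8e430a to the complete directory name.
    for dir in "$3/$task_hash"*/; do
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
}
```

A call such as `find_task_dir ResultsMeth/pipeline_info/execution_trace.txt 'bismark_align (MT_rep1)' work` prints the directory; it is safer to check that output before passing it to rm -r.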

Rerun the methylseq workflow with the -resume option.
