Chap III. Tutorial nf-core for Genotoul / Correction


Exercise 8:

  • Load latest version of nf-core module
    search_module nfcore
    module load bioinfo/nfcore-Nextflow-v20.11.0-edge
    
  • List all pipelines
    nf-core list
    
  • List content of directory ~/.nextflow/assets/nf-core/
    ls -altr ~/.nextflow/assets/nf-core/
    
  • Fetch one of the pipelines using nextflow pull nf-core/PIPELINE
    nextflow pull nf-core/methylseq
    
  • List content of directory ~/.nextflow/assets/nf-core/

    ls -altr ~/.nextflow/assets/nf-core/
    

    A new directory containing the methylseq pipeline has been created.

  • Use nf-core list to see if the pipeline you pulled is up to date

    nf-core list
    
  • Get info on this pipeline with nextflow info command
    nextflow info nf-core/methylseq
    

Exercise 9:

  • Remove the file nextflow.config in your current directory.

    rm nextflow.config
    
  • Run the nf-core/rnaseq pipeline with the profiles genotoul and test, and follow the execution on the cluster with squeue -u USERNAME in another terminal.

$ nextflow run nf-core/rnaseq -r 3.0 -profile test,genotoul
N E X T F L O W  ~  version 20.11.0-edge
Launching `nf-core/rnaseq` [lonely_euler] - revision: 3643a94411 [3.0]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rnaseq v3.0
------------------------------------------------------

Core Nextflow options
    revision                  : 3.0
    runName                   : lonely_euler
    containerEngine           : singularity
    launchDir                 : /work/cnoirot/nextflow_tutorial
    workDir                   : /work/cnoirot/nextflow_tutorial/work
    projectDir                : /home/cnoirot/.nextflow/assets/nf-core/rnaseq
    userName                  : cnoirot
    profile                   : test,genotoul
    configFiles               : /home/cnoirot/.nextflow/assets/nf-core/rnaseq/nextflow.config

Input/output options
    input                     : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/samplesheet.csv

UMI options
    umitools_bc_pattern       : NNNN

Reference genome options
    fasta                     : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fa
    gtf                       : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf.gz
    gff                       : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gff.gz
    transcript_fasta          : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/transcriptome.fasta
    additional_fasta          : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/gfp.fa.gz
    star_index                : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/star.tar.gz
    hisat2_index              : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/hisat2.tar.gz
    rsem_index                : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/rsem.tar.gz
    salmon_index              : https://github.com/nf-core/test-datasets/raw/rnaseq/reference/salmon.tar.gz
    save_reference            : true
    igenomes_ignore           : true

Alignment options
    pseudo_aligner            : salmon

Institutional config options
    config_profile_description: Minimal test dataset to check pipeline function
    config_profile_contact    : support.bioinfo.genotoul@inra.fr
    config_profile_url        : http://bioinfo.genotoul.fr/

Max job request options
    max_cpus                  : 2
    max_memory                : 6 GB
    max_time                  : 2d

------------------------------------------------------

If you use nf-core/rnaseq for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.1400710

* The nf-core framework
  https://dx.doi.org/10.1038/s41587-020-0439-x
  https://rdcu.be/b1GjZ

* Software dependencies
  https://github.com/nf-core/rnaseq/blob/master/CITATIONS.md

------------------------------------------------------
WARN: =============================================================================
  Both '--gtf' and '--gff' parameters have been provided.
  Using GTF file as priority.
===================================================================================
[-        ] process > RNASEQ:PREPARE_GENOME:GUNZIP_GTF                                         -
[-        ] process > RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA                            -
[-        ] process > RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA                               -
[-        ] process > RNASEQ:PREPARE_GENOME:GTF2BED                                            -
[-        ] process > RNASEQ:PREPARE_GENOME:GET_CHROM_SIZES                                    -
[-        ] process > RNASEQ:PREPARE_GENOME:UNTAR_STAR_INDEX                                   -
executor >  slurm (3)
[e2/c1a158] process > RNASEQ:PREPARE_GENOME:GUNZIP_GTF (genes.gtf.gz)                          [  0%] 0 of 1
[-        ] process > RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA                            [  0%] 0 of 1
[-        ] process > RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA                               -
[-        ] process > RNASEQ:PREPARE_GENOME:GTF2BED                                            -
[-        ] process > RNASEQ:PREPARE_GENOME:GET_CHROM_SIZES                                    -
[-        ] process > RNASEQ:PREPARE_GENOME:UNTAR_STAR_INDEX

  • List the options of the rnaseq pipeline. Note that the options of every pipeline are detailed on that workflow's web page.
    $ nextflow run nf-core/rnaseq -r 3.0 --help
    
  • Copy the directory /usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data into the current directory (nextflow-tutorial)

    cp -r /usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data .
    
  • Create a samplesheet named samples.csv with the following lines:

    group,replicate,fastq_1,fastq_2,strandedness
    mutant,1,data/MT_rep1_1_Ch6.fastq.gz,data/MT_rep1_2_Ch6.fastq.gz,unstranded
    wild,1,data/WT_rep1_1_Ch6.fastq.gz,data/WT_rep1_2_Ch6.fastq.gz,unstranded
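
Before launching the pipeline, it can help to check that the samplesheet is well formed. The sketch below writes the samples.csv shown above and verifies that every row has as many comma-separated columns as the header. The helper name check_samplesheet is an illustration only; it is not an nf-core command, and it checks nothing beyond the column count.

```shell
# Write the samplesheet from the exercise (same three lines as above).
cat > samples.csv <<'EOF'
group,replicate,fastq_1,fastq_2,strandedness
mutant,1,data/MT_rep1_1_Ch6.fastq.gz,data/MT_rep1_2_Ch6.fastq.gz,unstranded
wild,1,data/WT_rep1_1_Ch6.fastq.gz,data/WT_rep1_2_Ch6.fastq.gz,unstranded
EOF

# check_samplesheet FILE: hypothetical helper (not part of nf-core).
# Prints every line whose column count differs from the header's and
# returns a non-zero status if at least one such line exists.
check_samplesheet() {
    awk -F, 'BEGIN   { bad = 0 }
             NR == 1 { n = NF }
             NF != n { printf "line %d: %d columns, expected %d\n", NR, NF, n; bad = 1 }
             END     { exit bad }' "$1"
}

check_samplesheet samples.csv && echo "samples.csv: OK"
```

A check like this catches a missing comma or a truncated line before any job is submitted to the cluster.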
    
  • Run the pipeline with the following data and parameters:

    • --fasta data/ITAG2.3_genomic_Ch6.fasta
    • --gtf data/ITAG2.3_genomic_Ch6.gtf
    • --input samples.csv
    • --outdir ResultsItag
    • aligner star & rsem
    • profile genotoul
    • revision 3.0

nextflow run nf-core/rnaseq -r 3.0 -profile genotoul --input samples.csv --fasta data/ITAG2.3_genomic_Ch6.fasta --gtf data/ITAG2.3_genomic_Ch6.gtf --aligner star_rsem --outdir ResultsItag

  • Monitor job execution with squeue -u USERNAME. On which queue are the jobs launched?
    squeue -u USERNAME
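
To see at a glance which queue (Slurm partition) the jobs land on, the squeue output can be reduced to a count per partition. This is a minimal sketch: count_partitions is a hypothetical helper, and the partition names in the demo are made up so that the block runs without a cluster.

```shell
# count_partitions: hypothetical helper. Reads one partition name per line
# on stdin and prints "COUNT NAME" pairs, most used partition first.
count_partitions() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# On the cluster (not run here): -h drops the header line and
# -o '%P' prints only the partition of each job.
# squeue -h -u "$USER" -o '%P' | count_partitions

# Demo with made-up partition names:
printf 'queue_a\nqueue_a\nqueue_b\n' | count_partitions
```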
    
  • Copy ResultsItag directory into ~/public_html and visit http://genoweb.toulouse.inra.fr/~USERNAME/
    cp -r ResultsItag ~/public_html
    
  • Was the memory sufficient? Which process took the longest? Which job consumed the most memory? (See execution_report.html.)

    Yes, memory was sufficient for all jobs. rsem_calculateexpression was the longest job and picard_markduplicate consumed the most memory.

  • Visit the nf-core slack and check if a channel of the workflow you are interested in is open, check last discussions. What was the last discussion about?

Exercise 10:

We are going to use the methylseq pipeline for this exercise.

  • Read the doc about genomes in methylseq pipelines : https://nf-co.re/methylseq/1.5/usage#reference-genomes
  • Create the nextflow.config file with only the fasta and fasta index files

    params {
      genomes {
        'Itag' {
          fasta = './data/ITAG2.3_genomic_Ch6.fasta'
          fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
        }
      }
    }
    
  • Try to run the nf-core/methylseq workflow with the following options:

    • --genome Itag
    • --reads 'data/*_{1,2}_Ch6.fastq.gz'
    • --outdir ResultsMeth
    • -profile genotoul
nextflow run nf-core/methylseq --genome Itag --reads 'data/*_{1,2}_Ch6.fastq.gz'  --outdir ResultsMeth -profile genotoul
N E X T F L O W  ~  version 20.11.0-edge
Launching `nf-core/methylseq` [admiring_gilbert] - revision: 4f31ed1792 [master]
WARN: Access to undefined parameter `readPaths` -- Initialise it to a default value eg. `params.readPaths = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/methylseq v1.5
----------------------------------------------------

Run Name          : admiring_gilbert
Reads             : data/*_{1,2}_Ch6.fastq.gz
Aligner           : bismark
Data Type         : Paired-End
Genome            : Itag
Fasta Ref         : ./data/ITAG2.3_genomic_Ch6.fasta
Trimming          : 5'R1: 0 / 5'R2: 0 / 3'R1: 0 / 3'R2: 0
Deduplication     : Yes
Directional Mode  : Yes
All C Contexts    : No
Cytosine report   : No
Save Intermediates: Reference genome build
Output dir        : ResultsMeth
Launch dir        : /work/cnoirot/nextflow_tutorial
Working dir       : /work/cnoirot/nextflow_tutorial/work
Pipeline dir      : /home/cnoirot/.nextflow/assets/nf-core/methylseq
User              : cnoirot
Config Profile    : genotoul
Container         : singularity - nfcore/methylseq:1.5
Config Description: The Genotoul cluster profile
Config Contact    : support.bioinfo.genotoul@inra.fr
Config URL        : http://bioinfo.genotoul.fr/
Max Resources     : 120 GB memory, 48 cpus, 4d time per job

  • Which fasta file is used as a reference? Look at the parameter summary in the console. See these lines:

    Genome            : Itag
    Fasta Ref         : ./data/ITAG2.3_genomic_Ch6.fasta
    
  • How could you re-use the bismark index generated by the pipeline? Add a line to nextflow.config:

params {
  genomes {
    'Itag' {
      fasta = './data/ITAG2.3_genomic_Ch6.fasta'
      fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
      bismark = './ResultsMeth/reference_genome/BismarkIndex/'
    }
  }
}

Alternatively, add --bismark_index './ResultsMeth/reference_genome/BismarkIndex/' to the command line.

Exercise 11:

We are going to resume the methylseq workflow after downgrading the memory available for the bismark step and removing the results.

  • Check the file ResultsMeth/pipeline_info/execution_report.html of your last execution of methylseq
  • Which memory value could be set for bismark? 1 GB
  • Edit ~/.nextflow/assets/nf-core/methylseq/conf/base.config and set the memory of bismark_align to 800.MB

    withName:bismark_align {
      cpus = { check_max( 12 * task.attempt, 'cpus') }
      memory = { check_max( 800.MB * task.attempt, 'memory') }
      time = { check_max( 8.d * task.attempt, 'time') }
    }
    
  • Delete the job directory of the previously completed bismark_align process of MT_rep1 (use the file ResultsMeth/pipeline_info/execution_trace.txt to find the working path):

    $ cut -f 2,4,5 ResultsMeth/pipeline_info/execution_trace.txt
    hash       name                     status
    35/1a236f  makeBismarkIndex (1)     CACHED
    ...
    64/8e430a  bismark_align (MT_rep1)  COMPLETED
    a2/ed8201  get_software_versions    COMPLETED
    ...

    The complete path for this job is ./work/64/8e430ac54a4eebb11f81e34aceb154/. The path starts with the key 64/8e430a; use tab completion to get the complete path name.
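
The lookup of the work directory can also be scripted. This is a minimal sketch, assuming the trace file is tab-separated with the hash in column 2, the task name in column 4 and the status in column 5, as the cut command in this step suggests. The function name task_workdir is hypothetical, and the demo builds a mock trace file and work directory so that it runs without a real pipeline execution.

```shell
# task_workdir TRACE NAME [WORKROOT]: hypothetical helper.
# Prints the work directory of the COMPLETED task called NAME.
task_workdir() {
    trace=$1 name=$2 root=${3:-work}
    # Column 2 = hash prefix, column 4 = task name, column 5 = status.
    prefix=$(awk -F'\t' -v n="$name" \
        '$4 == n && $5 == "COMPLETED" { print $2; exit }' "$trace")
    [ -n "$prefix" ] || return 1
    # The shell glob plays the role of tab completion.
    for d in "$root/$prefix"*/; do
        [ -d "$d" ] && printf '%s\n' "$d" && return 0
    done
    return 1
}

# Demo on mock data (hashes copied from the listing in this step).
demo=$(mktemp -d)
mkdir -p "$demo/work/64/8e430ac54a4eebb11f81e34aceb154"
printf 'task_id\thash\tnative_id\tname\tstatus\n'                >  "$demo/trace.txt"
printf '1\t35/1a236f\t101\tmakeBismarkIndex (1)\tCACHED\n'       >> "$demo/trace.txt"
printf '2\t64/8e430a\t102\tbismark_align (MT_rep1)\tCOMPLETED\n' >> "$demo/trace.txt"

task_workdir "$demo/trace.txt" 'bismark_align (MT_rep1)' "$demo/work"
```

In the real exercise, and assuming the command is run from the launch directory, the printed path is the directory to delete with rm -r.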

  • Rerun the methylseq workflow with the -resume option (the same command as before, with -resume appended).
