Handling genome


In nf-core, reference genomes can be provided through iGenomes. This system is not available on genotoul, but you can still supply reference genome paths on the command line through the pipeline's parameters, e.g. --fasta or --gtf.
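As an illustration, the sketch below passes the example reference files used on this page directly on the command line. It rests on assumptions: nf-core/rnaseq is taken as an example pipeline, and samplesheet.csv is a hypothetical input file that is not provided by this training. The command is only printed, so that it can be checked before launching.

```shell
# Example reference files available on the cluster.
DATA=/usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data
FASTA="$DATA/ITAG2.3_genomic_Ch6.fasta"
GTF="$DATA/ITAG2.3_genomic_Ch6.gtf"

# samplesheet.csv is a hypothetical input file: adapt it to your own data.
CMD="nextflow run nf-core/rnaseq -profile genotoul --input samplesheet.csv --fasta $FASTA --gtf $GTF"

# Print the command for review instead of running it.
echo "$CMD"
```

To launch the run, paste the printed line into your terminal.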

Alternatively, you can extend your own nextflow.config file and store your favorite genomes in it, as shown here:

params {
  genomes {
    'GENOME_NAME' {
      fasta  = '/usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data/ITAG2.3_genomic_Ch6.fasta'
      gtf = '/usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data/ITAG2.3_genomic_Ch6.gtf'
    }
    'OTHER-GENOME' {
      // [..]
    }
  }
  // Optional - default genome. Ignored if --genome 'OTHER-GENOME' specified on command line
  genome = 'YOUR-ID'
}

You will then be able to select one of these genomes with the pipeline's parameter --genome GENOME_NAME.
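A possible invocation is sketched below. It assumes that the command is launched from the directory containing this nextflow.config (Nextflow reads that file automatically), that nf-core/rnaseq is the pipeline being run, and that samplesheet.csv is a hypothetical input file. The command is only printed for review.

```shell
# Key defined under params.genomes in nextflow.config.
GENOME=GENOME_NAME

# nf-core/rnaseq and samplesheet.csv are assumptions made for this example.
CMD="nextflow run nf-core/rnaseq -profile genotoul --input samplesheet.csv --genome $GENOME"

# Print the command instead of running it.
echo "$CMD"
```

A --genome value given on the command line takes precedence over the default genome set in the config file.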

How to re-use indexes generated by nf-core?

When you run an nf-core pipeline without providing indexes, the pipeline builds them. To avoid rebuilding them on every run, you can re-use the contents of the results/genome folder.

Use the nextflow config command to find which parameters are expected when writing your genome.config file.

Here is the igenomes config of GRCh37:

genomes {
      GRCh37 {
         fasta = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa'
         bwa = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa'
         bowtie2 = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/'
         star = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/'
         bismark = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/'
         gtf = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf'
         bed12 = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed'
         readme = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt'
         mito_name = 'MT'
         macs_gsize = '2.7e9'
         blacklist = '/home/cnoirot/.nextflow/assets/nf-core/rnaseq/assets/blacklists/GRCh37-blacklist.bed'
      }
 }
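To see only the keys of one genome, the resolved configuration can be filtered. The sketch below assumes that the nf-core/rnaseq pipeline has already been downloaded locally and that iGenomes entries are present in its resolved configuration. With the -flat option each setting should appear as one dotted key per line.

```shell
PIPELINE=nf-core/rnaseq        # pipeline whose configuration is inspected (assumption)
GENOME=GRCh37                  # genome key to look up
PREFIX="params.genomes.$GENOME."

if command -v nextflow >/dev/null 2>&1; then
    # Keep only the lines describing the selected genome.
    nextflow config "$PIPELINE" -flat | grep -F "$PREFIX" \
        || echo "No entry found for $GENOME"
else
    echo "nextflow not found on PATH" >&2
fi
```

The key names printed this way (fasta, gtf, bed12, ...) are the ones to re-use in your own genome file.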

By looking at https://nf-co.re/rnaseq/3.0/parameters#reference-genome-options and listing the files of the genome output directory, you can write your own genome file, for example:

 params {
   genomes {
      'Itag' {
         fasta = './data/ITAG2.3_genomic_Ch6.fasta'
         gtf = './data/ITAG2.3_genomic_Ch6.gtf'
         fai = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
         bed12 = './ResultsItag/genome/ITAG2.3_genomic_Ch6.bed'
      }
   }
 }
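Before launching the pipeline, it can be worth checking that every path written in the config really exists. The helper below is a hypothetical sketch (check_paths is not part of Nextflow or nf-core); it reports each file as ok or MISSING and returns a non-zero status if at least one is absent.

```shell
# Hypothetical helper: report whether each given path exists.
check_paths() {
    missing=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"
}

# Paths copied from the 'Itag' entry above.
check_paths \
    ./data/ITAG2.3_genomic_Ch6.fasta \
    ./data/ITAG2.3_genomic_Ch6.gtf \
    ./ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai \
    ./ResultsItag/genome/ITAG2.3_genomic_Ch6.bed \
    || echo "Fix the paths before launching the pipeline."
```

Run it from the directory that holds nextflow.config, since the paths are relative.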

Currently, RSEM indexes are not re-used by the rnaseq pipeline (issue #568).

Exercise 10:

We are going to use the methylseq pipeline for this exercise.

  • Read the documentation about genomes in the methylseq pipeline: https://nf-co.re/methylseq/1.5/usage#reference-genomes
  • Create the nextflow.config file with only the fasta file and its index:

    params {
     genomes {
        'Itag' {
           fasta = './data/ITAG2.3_genomic_Ch6.fasta'
           fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
        }
     }
    }
    
  • Try to run the nf-core/methylseq workflow with the following options:

    • --genome Itag
    • --reads 'data/*_{1,2}_Ch6.fastq.gz'
    • --outdir ResultsMeth
    • -profile genotoul
  • Which fasta file is used as the reference? Look at the parameter summary printed in the console.

  • How could you re-use the Bismark index generated by the pipeline? Do it, and relaunch the pipeline. Is the 'makeBismarkIndex' process still executed?
