Handling genome
In nf-core, genomes can be provided through iGenomes.
This system is not available on genotoul, but you can still supply reference genome paths on
the command line via the pipeline's parameters, e.g. --fasta or --gtf.
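As a sketch of the command-line route, the script below passes both reference files to nf-core/rnaseq with the genotoul profile. The two paths, the choice of rnaseq, and the launch helper with its DRY_RUN switch are assumptions made for illustration; the pipeline's own input options are not shown and can be appended as extra arguments.

```shell
# Placeholder paths (assumptions): point them at your own reference files.
FASTA="/path/to/genome.fasta"
GTF="/path/to/annotation.gtf"

# With DRY_RUN left at its default the command is only printed;
# run the script with DRY_RUN= (empty) to launch Nextflow for real.
DRY_RUN="${DRY_RUN-1}"

# Hypothetical helper: "$@" carries any remaining pipeline options.
launch() {
    ${DRY_RUN:+echo} nextflow run nf-core/rnaseq \
        -profile genotoul \
        --fasta "$FASTA" \
        --gtf "$GTF" \
        "$@"
}

launch --outdir results
```

Printing the command first makes it easy to check the paths before submitting anything to the cluster.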
Alternatively, you can extend your own nextflow.config file to store your favorite genomes, as shown here:
params {
    genomes {
        'GENOME_NAME' {
            fasta = '/usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data/ITAG2.3_genomic_Ch6.fasta'
            gtf = '/usr/local/bioinfo/src/NextflowWorkflows/example_on_cluster/data/ITAG2.3_genomic_Ch6.gtf'
        }
        'OTHER-GENOME' {
            // [..]
        }
    }
    // Optional - default genome. Ignored if --genome 'OTHER-GENOME' specified on command line
    genome = 'YOUR-ID'
}
You can then select a stored genome with the pipeline parameter --genome GENOME_NAME.
How to re-use indexes generated by nf-core?
When you run an nf-core pipeline without providing indexes, the pipeline computes them. To avoid recomputing them on every run, you can reuse the contents of the results/genome folder.
Use the nextflow config command to find which parameters the pipeline expects, so that you can write your genome.config file.
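One way to narrow the configuration down to a single genome is to print it in flat notation and filter the result. This is a sketch only: the helper name genome_entries is hypothetical, nf-core/rnaseq and GRCh37 are used as examples, and it assumes that flat notation prints one dotted key per line starting with params.genomes.NAME.

```shell
# Keep only the lines describing one genome from flattened config text read on stdin.
genome_entries() {
    grep -F "params.genomes.$1."
}

# Query the pipeline configuration only when Nextflow is available in this shell.
if command -v nextflow >/dev/null 2>&1; then
    nextflow config -flat nf-core/rnaseq | genome_entries GRCh37
else
    echo "nextflow is not in PATH; load it before running this script" >&2
fi
```

The key names that appear after the genome name are the ones your own genome entry can define.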
Here is the igenomes config of GRCh37:
genomes {
    GRCh37 {
        fasta = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa'
        bwa = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa'
        bowtie2 = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/'
        star = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/'
        bismark = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/'
        gtf = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf'
        bed12 = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed'
        readme = 's3://ngi-igenomes/igenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt'
        mito_name = 'MT'
        macs_gsize = '2.7e9'
        blacklist = '/home/cnoirot/.nextflow/assets/nf-core/rnaseq/assets/blacklists/GRCh37-blacklist.bed'
    }
}
By looking at https://nf-co.re/rnaseq/3.0/parameters#reference-genome-options and listing the files in the genome output directory, you can write your own genome file, for example:
params {
    genomes {
        'Itag' {
            fasta = './data/ITAG2.3_genomic_Ch6.fasta'
            gtf = './data/ITAG2.3_genomic_Ch6.gtf'
            fai = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
            bed12 = './ResultsItag/genome/ITAG2.3_genomic_Ch6.bed'
        }
    }
}
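Listing the genome output directory and turning it into configuration lines can be sketched as follows. The function name suggest_genome_keys is hypothetical, and only the two file types from the example above are mapped (.fai to fai, .bed to bed12); any other file is reported as unmapped, because its key name depends on the pipeline and version you use.

```shell
# Print candidate genome entries for the files found in a genome output directory.
suggest_genome_keys() {
    dir="$1"
    for f in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty or missing.
        [ -e "$f" ] || continue
        case "$f" in
            *.fai) printf "fai   = '%s'\n" "$f" ;;
            *.bed) printf "bed12 = '%s'\n" "$f" ;;
            *)     printf "// unmapped: %s\n" "$f" ;;
        esac
    done
}

# Default directory taken from the example above; pass another one as first argument.
suggest_genome_keys "${1:-./ResultsItag/genome}"
```

The printed lines can be pasted into the genome block of nextflow.config after checking each key against the pipeline's parameter documentation.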
Currently, RSEM indexes are not reused by the rnaseq pipeline (see issue #568).
Exercise 10:
We are going to use the methylseq pipeline for this exercise.
- Read the documentation about genomes in the methylseq pipeline: https://nf-co.re/methylseq/1.5/usage#reference-genomes
Create the nextflow.config file with only the fasta and fasta_index entries:
params {
    genomes {
        'Itag' {
            fasta = './data/ITAG2.3_genomic_Ch6.fasta'
            fasta_index = './ResultsItag/genome/ITAG2.3_genomic_Ch6.fasta.fai'
        }
    }
}
Run the nf-core/methylseq workflow with the following options:
--genome Itag
--reads 'data/*_{1,2}_Ch6.fastq.gz'
--outdir ResultsMeth
-profile genotoul
Which fasta file is used as the reference? Look at the parameters summary in the console.
How could you reuse the Bismark index generated by the pipeline? Do it, then relaunch the pipeline. Is the 'makeBismarkIndex' process executed again?
- Copy the output directory ResultsMeth into your ~/public_html and view the results at http://genoweb.toulouse.inra.fr/~USERNAME/