
TP 1.2: Single execution on a cluster

Goal: Identify the genes in a transcript FASTA file with the BLAST alignment software (NCBI), running on the cluster compute nodes.

Simple submission command

We will use sequence alignment with NCBI_Blast+ as a use case.

Interactive job

Question

Connect to a node in interactive mode.

Warning

When you connect to a cluster node in interactive mode, you are always placed in your home directory.

Grumpy administrator

Never run a calculation on a login node! Use an interactive job or a batch job.

Solution
srun --pty bash

Prerequisite

Load the NCBI_Blast+ module

module load bioinfo/NCBI_Blast+/2.10.0+

Run blast

Question

Launch a blast against the ensembl_danio_rerio_pep databank in interactive mode on the cluster.

Your query is nucleic and your databank is protein, so you need to use the blastx program.

Tip

For more help on blast, type

blastx -help

Solution

blastx -query contigs.fasta -db ensembl_danio_rerio_pep -evalue 10e-10 -out contigs.blastx_dr
On the genobioinfo cluster, NCBI blast databanks are available in /bank/blastdb. The cluster is configured so that you don't need to specify the path.

Look for running jobs

Question

Open a new terminal and check all the jobs running or waiting on the cluster. Check your own job.

Solution
squeue
squeue -t R
squeue -t PD
squeue -u <username>

Question

On which node are you running?

Solution
squeue -u <username>

  JOBID    PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1232823        workq     bash     user  R       0:05      1 node129       
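
The node name is the last column of the squeue listing. As a minimal sketch, the snippet below extracts it with awk from the sample output shown above, so it runs without a cluster; the variable names are illustrative.

```shell
# Sample squeue output copied from the solution above (header + one job line).
squeue_output='  JOBID    PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1232823        workq     bash     user  R       0:05      1 node129'

# Skip the header (NR > 1) and print the job ID (column 1) and the node (column 8).
printf '%s\n' "$squeue_output" | awk 'NR > 1 { print $1, $8 }'

# Keep only the node name in a variable for later use.
node=$(printf '%s\n' "$squeue_output" | awk 'NR > 1 { print $8 }')
echo "Running on: $node"
```

On the cluster, squeue -u <username> -h -o "%i %N" should print the same two fields directly (-h drops the header, %i is the job ID, %N the node list).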

Stop a running job

Question

Kill your job.

Solution
scancel 1232823

Batch mode

Question

Use a text editor to create a command file blastn.sh with the same module load and almost the same blast command line (replace blastx with blastn and use a nucleic databank). The first line of the file must be:

#!/bin/sh

Launch it in batch mode.

Solution

File blastn.sh contains:

#!/bin/sh
module load bioinfo/NCBI_Blast+/2.10.0+
blastn -db ensembl_danio_rerio_cdna -query contigs.fasta -evalue 10e-10 -out contigs.blastn_dr

Launch it with:

sbatch blastn.sh
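
Slurm options can also be stored in the script itself, as #SBATCH comment lines placed before the first command. The following is a minimal sketch: the file name blastn_opts.sh, the job name and the resource values are illustrative assumptions, not recommendations for this cluster.

```shell
# Write a variant of blastn.sh that carries its Slurm options in #SBATCH lines.
# File name, job name and resource values are assumptions made for illustration.
cat > blastn_opts.sh <<'EOF'
#!/bin/sh
# Job name shown by squeue
#SBATCH -J blastn_dr
# Trace file; %j is replaced by the job ID
#SBATCH -o blastn_dr.%j.out
# Memory requested for the job
#SBATCH --mem=4G
# CPU cores requested for the task
#SBATCH --cpus-per-task=1
module load bioinfo/NCBI_Blast+/2.10.0+
blastn -db ensembl_danio_rerio_cdna -query contigs.fasta -evalue 10e-10 -out contigs.blastn_dr
EOF

# Show the directives that sbatch would read from the script.
grep '^#SBATCH' blastn_opts.sh
```

The file is submitted in the same way (sbatch blastn_opts.sh). Options passed on the sbatch command line take precedence over those written in the script.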

Check running job

Question

Check the execution. When it's over, look at the blast output file and the execution trace file slurm-xxxxx.out.

Has the job finished correctly?

Solution
squeue -u <username>
less contigs.blastn_dr
less slurm-XXXXX.out
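
Reading both files by eye works for one job. As a rough heuristic, the same check can be scripted: the output file should exist and be non-empty, and the trace file should not mention an error. The helper name check_run is hypothetical, and the demo runs on throwaway files so that it does not need a cluster.

```shell
# check_run OUT TRACE: hypothetical helper reporting whether a run looks clean.
check_run() {
    out=$1
    trace=$2
    # -s is true when the file exists and is not empty.
    if [ ! -s "$out" ]; then
        echo "FAILED: $out is missing or empty"
        return 1
    fi
    # Any line mentioning "error" in the trace file is treated as a failure.
    if grep -qi 'error' "$trace"; then
        echo "FAILED: errors reported in $trace"
        return 1
    fi
    echo "OK: $out written, no error in $trace"
}

# Demo on throwaway files: a non-empty output and an empty trace file.
tmp=$(mktemp -d)
echo "placeholder blast output" > "$tmp/demo.blastn"
: > "$tmp/slurm-demo.out"
status=$(check_run "$tmp/demo.blastn" "$tmp/slurm-demo.out")
echo "$status"
rm -r "$tmp"
```

On the cluster the call would be check_run contigs.blastn_dr slurm-XXXXX.out. If job accounting is enabled, sacct -j <job_id> --format=JobID,State,ExitCode reports the final state recorded by Slurm.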

Batch mode with inline command

Question

Launch the same command without using a file (option --wrap='command').

Check the execution.

When it's over, look at the blast output file and the execution trace file (slurm-xxxxx.out).

Has the job finished correctly?

Solution
sbatch -J blastdr --wrap='module load bioinfo/NCBI_Blast+/2.10.0+; blastn -db ensembl_danio_rerio_cdna -query contigs.fasta -evalue 10e-10 -out contigs.blastn_dr'
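
To find the right trace file you need the job ID, which sbatch prints when the job is submitted. The sketch below parses that message; the message text is a sample with a made-up job ID, so the snippet runs without a cluster.

```shell
# Message printed by sbatch on a successful submission (sample job ID).
submit_msg='Submitted batch job 1232823'

# The job ID is the fourth word of the message.
job_id=$(printf '%s\n' "$submit_msg" | awk '{ print $4 }')

# With default options the trace file is named after the job ID.
echo "Trace file: slurm-${job_id}.out"
```

On the cluster, sbatch --parsable prints only the job ID, so job_id=$(sbatch --parsable --wrap='...') captures it without any parsing.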

Look at the trace file

If you haven't had any errors so far, redo the previous submission with an error in the command. Have a look at the trace file.

How many resources

Question

Look at the resources used by previous jobs. In particular, pay attention to CPU and memory usage.

Solution

seff <job_id>
where you replace <job_id> with your previous job ID.
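
seff reports CPU efficiency as the CPU time consumed divided by the core-walltime, that is the elapsed time multiplied by the number of allocated cores. The worked example below uses hypothetical figures, not values from a real job, to show how to read that percentage.

```shell
# Hypothetical figures, not taken from a real job.
cpu_seconds=540      # CPU time actually consumed
elapsed_seconds=600  # wall-clock duration of the job
cores=1              # cores allocated to the job

# Efficiency (%) = CPU time / (elapsed time x cores), in integer arithmetic.
cpu_eff=$(( 100 * cpu_seconds / (elapsed_seconds * cores) ))
echo "CPU efficiency: ${cpu_eff}%"
```

A low percentage on a multi-core allocation usually means the job asked for more cores than the program used. Memory efficiency is read the same way: memory used divided by memory requested.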