TP 2: Multithreading and ecological behavior
Objective: speed up a job by using several CPUs on one node, and create efficient jobs in a context of digital sobriety and ecological practices.
Going faster
Multithreading
We use the work done in TP 1 as the basis for a script:
Question
Create a script file named blastx.sh
with the following content:
blastx.sh:

```shell
#!/bin/bash
#SBATCH -J blastx_dr

# reconstructed from the inline --wrap version given in the solution below;
# load the blast module as in TP 1 (exact module name not shown here)
module load bioinfo/...

blastx -db ensembl_danio_rerio_pep -query contigs.fasta -evalue 10e-10 -out contigs.blastx_dr
```
Edit blastx.sh
so that blastx runs with 8 CPUs on the same node.
Check the execution in detail while it is running.
Solution
The script file:
blastx.sh:

```shell
#!/bin/bash
#SBATCH --cpus-per-task 8
#SBATCH -J blastx_dr

# load the blast module as in TP 1 (exact module name not shown here)
module load bioinfo/...

blastx -num_threads $SLURM_CPUS_PER_TASK \
    -db ensembl_danio_rerio_pep -query contigs.fasta -evalue 10e-10 -out contigs.blastx_dr
```

- `--cpus-per-task 8` (line 2) defines the number of CPUs reserved by slurm.
- Two things about the blastx command:
    - the `\` at the end of line 8 allows splitting a command over several lines, for readability;
    - `$SLURM_CPUS_PER_TASK` is the value defined by `--cpus-per-task` (line 2), so the number of threads always matches the reservation.
The script blastx.sh is submitted as a job with the following command:
sbatch blastx.sh
An inline version, without a script file, is also possible using the `--wrap` option:
sbatch --cpus-per-task 8 -J blastx_dr \
--wrap="blastx -num_threads 8 -db ensembl_danio_rerio_pep -query contigs.fasta -evalue 10e-10 -out contigs.blastx_dr2"
The running job can be checked with one of the following commands:
squeue
squeue -u "$(whoami)"
`$(...)` is command substitution, run in a subshell: the shell executes the inner command and substitutes its output in place. Here `whoami` returns the username.
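As a minimal sketch of how the substitution works (plain shell, nothing slurm-specific is assumed):

```shell
# $(...) runs the inner command and substitutes its standard output in place
user=$(whoami)
echo "Listing jobs for ${user}"

# the same mechanism builds the squeue invocation shown above
cmd="squeue -u ${user}"
echo "${cmd}"
```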
How much faster?
Question
When the job has ended, take a look at the resources used. How much time and memory were consumed?
Here is an extract of the seff
command output for the blastx
job on 1 CPU. What is the speedup provided by the blastx
job on 8 CPUs? Compare the memory consumption.
Job ID: ...
Cluster: genobioinfo
User/Group: ...
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:05:57
CPU Efficiency: 95.20% of 00:06:15 core-walltime
Job Wall-clock time: 00:06:15
Memory Utilized: 18.14 MB
Memory Efficiency: 0.89% of 2.00 GB
Tip
It is good practice to check the resources a job has consumed.
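On clusters providing `seff`, a sketch of the check (the job id 123456 is a placeholder for the id printed by `sbatch`):

```shell
jobid=123456   # placeholder: use the real id printed as "Submitted batch job ..."
if command -v seff >/dev/null 2>&1; then
    seff "${jobid}"   # per-job summary of CPU and memory efficiency
else
    echo "seff not available on this machine"
fi
```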
Solution
8x CPUs does not mean 8x faster (~3.6x in this example). For blast, 4 CPUs is a good tradeoff (~2.7x in this example).
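The speedup is simply the ratio of wall-clock times; a minimal sketch of the arithmetic (the 8-CPU time below is an assumed value, chosen to be consistent with the ~3.6x observed, not a figure from the seff extract):

```shell
t1=375   # 00:06:15 wall-clock on 1 CPU, in seconds (from the seff extract above)
t8=104   # ~00:01:44 on 8 CPUs: assumed, consistent with the ~3.6x speedup
speedup=$(awk -v a="$t1" -v b="$t8" 'BEGIN { printf "%.1f", a / b }')
echo "speedup: ${speedup}x"
```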
Digital sobriety
Genotoul-bioinfo provides some resources about digital sobriety applied to bioinformatics.
Alternative tools
Question
Some alternatives are faster than blast
on proteins. Create a script diamondx.sh
where blastx
is replaced with diamond
.
When the job has ended, look at the resources used. What can you conclude regarding time and memory?
Solution
The script:
The script:
diamondx.sh:

```shell
#!/bin/bash
#SBATCH --cpus-per-task 8
#SBATCH -J diamondx_dr

# load the diamond module (exact module name not shown here)
module load bioinfo/...

# assumes a diamond-formatted database built beforehand with `diamond makedb`
diamond blastx --threads $SLURM_CPUS_PER_TASK \
    --db ensembl_danio_rerio_pep --query contigs.fasta --evalue 10e-10 --out contigs.diamondx_dr
```
Run with:
sbatch diamondx.sh
The speedup (x100 to x1000 faster) makes diamond
a good tool when targeting digital sobriety.
Tuning the slurm parameters
Question
Reduce the amount of memory used by the diamond job. What happens if you reduce it too much?
Tip
Setting resources correctly (#CPUs, memory, max time) ensures a job doesn't waste resources. A side effect is that your jobs may also start sooner. However, it requires some knowledge to set them beforehand.
We provide a page with some tools to help you. In addition, ask the community for help choosing the right tools and setting efficient parameters.
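As a sketch, such limits go in the script header; the values below are illustrative placeholders, to be adjusted after checking a first run with `seff`:

```shell
#SBATCH --cpus-per-task 4   # CPUs reserved on one node
#SBATCH --mem=500M          # maximum memory for the job
#SBATCH --time=00:30:00     # maximum wall-clock time
```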
Solution
The breaking point is just under 170 MB:
sbatch --mem=150M diamondx.sh
or edit the script diamondx.sh
to keep track of the parameters:
diamondx.sh:

```shell
#!/bin/bash
#SBATCH --cpus-per-task 8
#SBATCH -J diamondx_dr
#SBATCH --mem=150M

# load the diamond module (exact module name not shown here)
module load bioinfo/...

# assumes a diamond-formatted database built beforehand with `diamond makedb`
diamond blastx --threads $SLURM_CPUS_PER_TASK \
    --db ensembl_danio_rerio_pep --query contigs.fasta --evalue 10e-10 --out contigs.diamondx_dr
```
Even with ~10x the memory consumption, diamond
is still a good tool for digital sobriety. Some diamond
options allow reducing memory consumption by trading off speed.
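For example, diamond's `-b`/`--block-size` and `-c`/`--index-chunks` options control this trade-off. The sketch below only assembles the command line so the options can be inspected without running diamond; 0.5 and 8 are illustrative values, not recommendations from this course:

```shell
# lower -b (sequence block size, in billions of letters) reduces peak memory;
# higher -c (number of index chunks) also reduces peak memory; both cost speed
opts="-b 0.5 -c 8"
cmd="diamond blastx --threads 4 ${opts} --db ensembl_danio_rerio_pep --query contigs.fasta --evalue 10e-10 --out contigs.diamondx_dr"
echo "${cmd}"
```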