TP 1.1: Prepare data
Goal: Refresh your memory of Linux commands.
Prepare
Connect to cluster
Start your machine and open a terminal (on Windows, please use MobaXterm). You can now try to access the genotoul server using ssh.
ssh -X <username>@genobioinfo.toulouse.inrae.fr
Don't forget to replace <username> with your own username.
Create project
Question
In the work directory, create a new directory named cluster and go inside it.
Solution
cd work/
mkdir cluster
cd cluster/
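The same result can be obtained in fewer steps. This is only a sketch, assuming you start from the directory that contains work/ (your home directory on the cluster). Note that if work/ did not exist, -p would create it as an ordinary directory, which may not be what you want on the cluster.

```shell
# Assumption: the current directory contains work/ (your home directory on the cluster).
# -p creates missing parent directories and does not complain if cluster/ already exists.
mkdir -p work/cluster
cd work/cluster
pwd   # the printed path should end in work/cluster
```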
Get data
Question
Download the transcript file from https://web-genobioinfo.toulouse.inrae.fr/~formation/cluster/data/contigs.fasta.gz
Solution
wget https://web-genobioinfo.toulouse.inrae.fr/~formation/cluster/data/contigs.fasta.gz
Uncompress files
Question
Uncompress the file.
Solution
gunzip contigs.fasta.gz
Note
Manipulating files (compressing, zipping, ...) can use a lot of resources, so it should be done on a cluster node whenever possible. We will learn how to connect to a node in the next practical sessions.
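gunzip replaces the .gz file with its uncompressed version. If you would rather check the archive or peek at its content first, the sketch below shows two standard gzip options. It works on a small throwaway file (demo.fasta, a hypothetical name with made-up sequences) so that it can be run anywhere; on the cluster you would use contigs.fasta.gz instead.

```shell
# Build a tiny throwaway FASTA file (hypothetical content) and compress it.
printf '>seq1 demo\nACGTACGT\n>seq2 demo\nGGCCTTAA\n' > demo.fasta
gzip -f demo.fasta                 # produces demo.fasta.gz and removes demo.fasta

# -t tests the integrity of the compressed file without writing anything.
gzip -t demo.fasta.gz && echo "archive OK"

# -c writes the uncompressed data to standard output, so the .gz file is left untouched.
first_line=$(gunzip -c demo.fasta.gz | head -n 1)
echo "$first_line"

# Finally uncompress for real, as in the solution.
gunzip -f demo.fasta.gz
ls demo.fasta
```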
Look at data
Question
Display the first ten lines of the contigs.fasta file, then the first twenty lines.
What is the file format?
What kind of data does it contain?
Solution
The commands:
- The first ten lines:
head contigs.fasta
- The first twenty lines:
head -n 20 contigs.fasta
- The file format:
file contigs.fasta
This will return:
contigs.fasta: ASCII text
The file contigs.fasta is a FASTA file. It is a text file that contains blocks of data. Each block begins with a > followed by a description of the data (all on a single line). The lines immediately following the description line are the sequence data, which can be nucleic acid or protein.
Here, contigs.fasta contains nucleic acid sequences.
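One way to check this from the command line is to count the description lines and to look for sequence characters outside the nucleotide alphabet. This is a minimal sketch, assuming the sequences use only A, C, G, T and N (upper or lower case). demo.fasta and its content are hypothetical stand-ins for contigs.fasta.

```shell
# Hypothetical stand-in for contigs.fasta so that the example runs anywhere.
printf '>seq1 demo\nACGTACGT\nNNACGT\n>seq2 demo\nGGCCTTAA\n' > demo.fasta

# Each sequence has exactly one description line starting with '>'.
n_seq=$(grep -c '^>' demo.fasta)
echo "sequences: $n_seq"

# Count sequence lines containing a character other than A, C, G, T or N.
# grep exits with status 1 when nothing matches, hence the '|| true'.
n_other=$(grep -v '^>' demo.fasta | grep -c '[^ACGTNacgtn]' || true)
echo "lines with non-nucleotide characters: $n_other"
```

On the cluster, replace demo.fasta with contigs.fasta. A count of 0 on the second line is consistent with nucleic data, although a protein sequence made only of those letters would also pass this simple check.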