TP 1.1: Prepare data

Goal: Refresh your memory of Linux commands.

Prepare

Connect to cluster

Start your machine and open a terminal (on Windows, please use MobaXterm). You can now connect to the Genotoul server with ssh.

ssh -X <username>@genobioinfo.toulouse.inrae.fr

Don't forget to replace <username> with your own username.

Create project

Question

In the work directory, create a new directory named cluster and go inside it.

Solution
cd work/
mkdir cluster
cd cluster/
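
As a possible variant of the solution above, mkdir -p creates any missing parent directories and does not fail if the directory already exists, which makes the step safe to repeat. The sketch below is only an illustration: it works in a throwaway directory created with mktemp (the demo_home variable is an assumption, not part of the practical) so that it does not touch your real work directory.

```shell
# Illustration only: use a throwaway directory instead of the real home
demo_home=$(mktemp -d)
cd "$demo_home"

# -p creates work/ and work/cluster/ in one call,
# and does not complain if they already exist
mkdir -p work/cluster
mkdir -p work/cluster   # running it a second time is harmless

# Go inside the new directory and confirm the location
cd work/cluster
pwd
```

On the cluster itself, the two-step form shown in the solution is equally valid; -p mainly helps when the commands are re-run or placed in a script.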

Get data

Solution
wget http://web-genobioinfo.toulouse.inrae.fr/~formation/cluster/data/contigs.fasta.gz

Uncompress files

Question

Un-compress the file.

Solution
gunzip contigs.fasta.gz

Note

Manipulating files (compressing, zipping, ...) can use a lot of resources, so it should be done on a cluster node whenever possible. We will learn how to connect to a node in the next practical sessions.
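
To make the effect of decompression concrete, here is a minimal sketch on a tiny, hypothetical file (sample.fasta and its two sequences are invented for the illustration, not taken from contigs.fasta). It shows that gzip -dc streams the decompressed content to the terminal and leaves the .gz file in place, whereas gunzip replaces the .gz file with the uncompressed one.

```shell
# Hypothetical two-sequence file, created only for this illustration
tmpdir=$(mktemp -d)
printf '>seq1 demo\nACGT\n>seq2 demo\nGGCC\n' > "$tmpdir/sample.fasta"

# Compress it: sample.fasta is replaced by sample.fasta.gz
gzip "$tmpdir/sample.fasta"
ls "$tmpdir"

# Peek at the content without writing an uncompressed file to disk
first_line=$(gzip -dc "$tmpdir/sample.fasta.gz" | head -n 1)
echo "$first_line"

# Decompress for real: sample.fasta.gz is replaced by sample.fasta
gunzip "$tmpdir/sample.fasta.gz"
ls "$tmpdir"
```

Streaming with gzip -dc can be handy when you only need a quick look at a large compressed file.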

Look at data

Question

Display the first ten lines of the contigs.fasta file, then the first twenty lines.

What is the file format?

What kind of data does it contain?

Solution

The commands:

  • The first ten lines:

    head contigs.fasta
    
  • The first twenty lines:

    head -n 20 contigs.fasta
    
  • The file format:

    file contigs.fasta # returns: contigs.fasta: ASCII text
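
The behaviour of head used above can be checked on generated input. This is a small sketch, assuming the seq command is available (it is on most Linux systems): without options head prints the first 10 lines, and -n sets another count.

```shell
# seq 1 30 prints the numbers 1 to 30, one per line

# Default: head keeps the first 10 lines, so the last one kept is 10
last_default=$(seq 1 30 | head | tail -n 1)
echo "$last_default"

# With -n 20, head keeps the first 20 lines, so the last one kept is 20
last_twenty=$(seq 1 30 | head -n 20 | tail -n 1)
echo "$last_twenty"
```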

The file contigs.fasta is a FASTA file. It is a text file made of blocks of data. Each block begins with a > followed by a description of the data, all on a single line. The lines immediately following the description line are the sequence data, which can be nucleic acid or protein.

Here, contigs.fasta contains nucleic acid sequences.
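
Because each block starts with a > line, the number of sequences in a FASTA file can be obtained by counting those lines. The sketch below illustrates this on a small, hypothetical file (sample.fasta and its contig names are invented for the example); the same grep commands should apply to contigs.fasta.

```shell
# Hypothetical three-sequence FASTA file, created only for this illustration
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.fasta" <<'EOF'
>contig_1 hypothetical description
ACGTACGTAC
GGTTAACC
>contig_2 hypothetical description
TTGACA
>contig_3 hypothetical description
CCGGAATT
EOF

# Count the description lines: one per sequence
# (^> matches a > at the start of a line; the quotes stop the shell
#  from treating > as an output redirection)
n_seqs=$(grep -c '^>' "$tmpdir/sample.fasta")
echo "$n_seqs"

# Display only the description lines
grep '^>' "$tmpdir/sample.fasta"
```

Always quote the pattern: an unquoted > would redirect output and could overwrite the file.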