Welcome to the hands-on ETE Toolkit Phylogenomics practical course!
This document and the attached repository are intended to provide a practical, guided tour covering the most important features of the ETE Toolkit. For this, we use a fake biological dataset, and all examples and exercises are motivated by biological questions. You can see this course as support material complementing the existing technical ETE documentation:
The course is divided into several sections, organized as notebooks under the `nb/` folder. Each section represents a common phylogenomic question and covers different aspects of the ETE Toolkit.
All the data needed to run the exercises is in `data.tar.gz`. You should decompress the file in the root directory of this repository.
IMPORTANT: The data used in this course has been manually manipulated to ensure the expected results. Do not use it for real work!
To start practicing, you just need to:
- [Set up the environment](#environment-setup)
After many years of work, your lab has just isolated and sequenced a very interesting strain of the Aquifex aeolicus bacterium. This strain possesses a remarkable resistance to sulfur-rich environments.
To investigate it further, you decide to carry out an in-depth phylogenomic study in which your strain is analyzed in the context of other known sulfur-related organisms and reference species:
| TaxID | Scientific name |
|---|---|
| 224324999 | Aquifex aeolicus (your new strain; this TaxID does not exist) |
| 224324 | Aquifex aeolicus VF5 |
| 263820 | Picrophilus torridus DSM 9790 |
| 273063 | Sulfurisphaera tokodaii str. 7 |
| 525897 | Desulfomicrobium baculatum DSM 4028 |
| 555778 | Halothiobacillus neapolitanus c2 |
| 637389 | Acidithiobacillus caldus ATCC 51756 |
| 673860 | Aciduliprofundum sp. MAR08-339 |
| 713587 | Thioalkalivibrio thiocyanoxidans ARh 4 |
| 743299 | Acidithiobacillus ferrivorans SS3 |
| 933801 | Acidianus hospitalis W1 |
| 1051632 | Sulfobacillus acidophilus TPY |
| 1121405 | Desulfococcus multivorans DSM 2059 |
| 1158165 | Thioalkalivibrio sp. ALMg11 |
| 1255043 | Thioalkalivibrio nitratireducens DSM 14787 |
Start a blank notebook and try to work on the tasks proposed in the following notebooks:
| Topic | Tasks | ETE features |
|---|---|---|
| Preparing genomic data | General advice | None |
| Building gene families | Clustering homologous sequences | Tree basics, ete-build, ete-view |
| Building gene phylogenies | Building and handling phylogenetic gene trees | Tree basics, ete-build, ete-view |
| Building concatenated species trees | Building concatenated species trees | ete-build supermatrix, PhyloTree collections |
| Comparing trees | Comparing tree topologies | ete-compare, Tree.compare |
| Linking trees to alignments | Binding MSAs and PhyloTrees | SeqGroup, PhyloTree |
| Testing selection | Testing evolutionary hypotheses | ete-evol, codeml, visualization |
| Predicting evolutionary events | Duplication and speciation event detection | Rooting, standardizing, evol events |
| Linking to NCBI taxonomy | Querying NCBI taxonomy | ete-ncbiquery, NCBITaxa, Tree.annotate_ncbi_taxa |
| Annotation trees | NA | NA |
| Programmatic visualization | Custom visualization | tree.render, TreeStyle, NodeStyle, Faces |
This tutorial requires the following:

- git (to clone and update this repository)
- Python 3.6 (for full compatibility, avoid 3.7 and 3.8)
- ete3 3.1.2 (install the latest version from the etetoolkit conda channel; other channels might not be up to date)
- ete_toolchain (install from the etetoolkit conda channel)
- Jupyter Notebook and nbextensions
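If you already have an environment and want to check it against the list above, a quick shell sketch like the following reports whether git is on the PATH and which Python version is active. It only checks git and the Python version; `pyver` is an illustrative variable name, and the script assumes `python3` points to the interpreter of the environment you plan to use.

```shell
# Check an existing environment against the requirements listed above.
# Run it inside the environment you plan to use for the course.
if command -v git >/dev/null 2>&1; then
  echo "git: $(git --version)"
else
  echo "git: not found" >&2
fi

# Major.minor version of the python3 on the PATH, e.g. "3.6"
pyver=$(python3 -c 'import sys; print("{}.{}".format(*sys.version_info[:2]))')
if [ "$pyver" = "3.6" ]; then
  echo "python: $pyver (OK)"
else
  echo "python: $pyver (this course expects 3.6)" >&2
fi
```

A mismatch only prints a warning; it does not stop anything, so you can decide whether to create the dedicated environment described next.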
The following recipe sets up a clean environment for all the steps below:
```shell
# make a separate dir for this course and enter it
mkdir etecourse/
cd etecourse/
# clone this repository
git clone https://github.com/etetoolkit/course
```
- Install Miniconda. You can skip this step if you already have a conda installation and know how to use it.
- Remember to initialize conda. The Miniconda installation script will ask you about this:

```
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
```
On Linux:

```shell
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -p ~/eteconda/
```

On macOS:

```shell
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh > Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh -p ~/eteconda/
```
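If you answered `no` to the `conda init` prompt, conda can still be initialized afterwards with `conda init`. The sketch below assumes Miniconda was installed under `~/eteconda/` as in the commands above and that your login shell is bash (pass `zsh` instead of `bash` for zsh, the default on recent macOS); `CONDA_BIN` is just an illustrative variable name.

```shell
# Assumption: Miniconda was installed with "-p ~/eteconda/" as shown above.
CONDA_BIN="$HOME/eteconda/bin/conda"

if [ -x "$CONDA_BIN" ]; then
  # Writes conda's setup code into your bash startup file;
  # use "zsh" instead of "bash" if your login shell is zsh.
  "$CONDA_BIN" init bash
else
  echo "conda not found at $CONDA_BIN; adjust the path to your installation" >&2
fi
```

After running it, open a new shell so the change takes effect.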
- Open a new shell (IMPORTANT!) so that your new conda installation is initialized.
- Then create an environment to run this tutorial:
```shell
conda env create -f course/REQUIREMENTS_conda.yml
conda activate etecourse
pip install -I ete3
```
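After creating and activating the environment, a quick sanity check can confirm that ete3 is importable and that ete-build finds its external programs. This is a sketch: it assumes the `ete3 build check` subcommand (which reports the external tools from ete_toolchain that ETE can find) is available in your ete3 version, and `ete3_ok` is just an illustrative variable.

```shell
# Run with the "etecourse" environment activated.
# 1) Can this environment's Python interpreter import the ete3 module?
if python -c 'import ete3' 2>/dev/null; then
  ete3_ok=1
  echo "ete3 module: importable"
else
  ete3_ok=0
  echo "ete3 module: NOT importable from $(command -v python)" >&2
fi

# 2) Does ete-build find the external programs installed by ete_toolchain?
if command -v ete3 >/dev/null 2>&1; then
  ete3 build check
else
  echo "ete3 command-line tool not on PATH" >&2
fi
```

If the module is not importable, double-check that `conda activate etecourse` succeeded in the current shell.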
```shell
# Enter data directory and uncompress data
cd course/data/
tar zxf fasta.tar.gz
tar zxf phylo.tar.gz
tar zxf evol.tar.gz
# Start jupyter in the root directory of this repo
cd ../
jupyter notebook
```
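The `tar` commands above assume specific archive names. If your copy of the repository ships a different set (the introduction mentions a single `data.tar.gz`), a loop like the following sketch extracts every `*.tar.gz` in the current directory and reports failures instead of stopping. Run it from wherever the archives are; the `failed` counter is an illustrative variable.

```shell
# Extract every *.tar.gz archive in the current directory.
failed=0
for archive in *.tar.gz; do
  # If no archive exists, the glob stays literal and this test skips it.
  [ -e "$archive" ] || continue
  if tar zxf "$archive"; then
    echo "extracted: $archive"
  else
    echo "failed to extract: $archive" >&2
    failed=$((failed + 1))
  fi
done
echo "archives that failed: $failed"
```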