esayyari edited this page Aug 10, 2017 · 2 revisions

parameters

Under the parameters folder, we have all the parameter files:

  • annotation-1.txt: Annotation file for the first hypothesis structure
  • annotation-2.txt: Annotation file for the second hypothesis structure
  • annotation-3.txt: Annotation file for the third hypothesis structure
  • annotation-4.txt: Annotation file for the fourth hypothesis structure
  • annotation.txt: The overall (main) annotation file. It maps every species in your dataset (species tree) to a clade name. Note that each species should be assigned to only one clade.
  • clade-defs.txt: Clade definition file. In this file you define all the clades, following the instructions or the code provided previously. Note that the field separator in this file is a tab.
  • names.txt: Names file. List the names of the species you have in your dataset.
  • newModel.txt: Model condition definition file. On the first line, list the old model condition names in exactly the order in which you want them displayed on the x-axis of the species tree analysis. On the second line, list the new names for those model conditions in the same order. Note that the names are separated by tabs, not spaces.
  • newOrders.txt: Orders definition file. On the first line, list the old clade names in exactly the order in which you want them displayed. On the second line, list the new clade names in the same order. Note that the names are separated by tabs, not spaces.
  • rooting.txt: Rooting definition file. Specify the outgroup species in this file. You do not have to list them all on one line; instead, you can list them according to their distance from the ingroup species. For example, the rooting file provided here has 3 lines. The first line lists the outgroup species most distant from the ingroup. The other two lines list the remaining outgroup species after that main set, ordered by their distance from the ingroup.
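The two rename files described above, newModel.txt and newOrders.txt, share one format. Each has two lines of tab-separated names: the old names go on the first line in display order, and the new names go on the second line in the same order. Below is a minimal Python sketch for writing and reading such a file. The helper names write_rename_file and read_rename_file are hypothetical and not part of DiscoVista. Ending each line with a newline is also an assumption.

```python
# Hypothetical helpers, not part of DiscoVista: write/read a two-line,
# tab-separated rename file such as newModel.txt or newOrders.txt.


def write_rename_file(path, old_names, new_names):
    """Line 1: old names in display order. Line 2: new names, same order."""
    if len(old_names) != len(new_names):
        raise ValueError("old and new name lists must have the same length")
    for name in list(old_names) + list(new_names):
        # Tabs separate the names, so a name itself must not contain one
        # (spaces inside a name are fine).
        if "\t" in name or "\n" in name:
            raise ValueError(f"name contains a tab or newline: {name!r}")
    with open(path, "w") as handle:
        # Assumption: each of the two lines ends with a newline.
        handle.write("\t".join(old_names) + "\n")
        handle.write("\t".join(new_names) + "\n")


def read_rename_file(path):
    """Return the ordered list of (old, new) name pairs."""
    with open(path) as handle:
        old_line, new_line = handle.read().splitlines()[:2]
    old_names = old_line.split("\t")
    new_names = new_line.split("\t")
    if len(old_names) != len(new_names):
        raise ValueError("line 1 and line 2 list a different number of names")
    return list(zip(old_names, new_names))
```

For example, write_rename_file("parameters/newModel.txt", old, new) would write your own lists of old and new model condition names. Using tabs as the separator is what allows the display names to contain spaces.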

Species

The species folder contains 31 folders, each named Model_Condition-DST, where DST denotes the type of sequence alignment. For example, astral.trim50genes33taxa.no3rd.final-FNA2AA is a folder under species, where astral.trim50genes33taxa.no3rd.final is the model condition name and FNA2AA is the DST. Each folder contains a species tree named estimated_species_tree.tree. To generate the same figures as in the supplementary materials of the paper, run the following command if you have installed DiscoVista on your machine:

$WS_HOME/DiscoVista/src/utils/discoVista.py -a parameters/annotation.txt -c parameters/clade-defs.txt -p species/ -r parameters/rooting.txt -t 95 -w parameters/newOrders.txt -y parameters/newModel.txt -m 0

Alternatively, you can run DiscoVista interactively using the Docker image. First start a container with the following command, where <path to example folder> is the absolute path to the directory containing the 1KP example folder:

docker run -it --rm -v <path to example folder>/1KP:/data esayyari/discovista

Then, inside the container, run the same command you would use with a local installation:

$WS_HOME/DiscoVista/src/utils/discoVista.py -a parameters/annotation.txt -c parameters/clade-defs.txt -p species/ -r parameters/rooting.txt -t 95 -w parameters/newOrders.txt -y parameters/newModel.txt -m 0
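Before running mode 0, you could confirm that every species folder follows the Model_Condition-DST convention by splitting each folder name at its last '-'. The sketch below is hypothetical and not part of DiscoVista. It assumes that the DST itself (e.g. FNA2AA) contains no '-'. It also checks that each folder holds estimated_species_tree.tree.

```python
import os


def parse_species_folder(name):
    """Split 'Model_Condition-DST' into (model_condition, dst).

    Assumption: the DST contains no '-', so we split at the LAST '-';
    the model condition itself may contain other characters such as '.'.
    """
    model_condition, sep, dst = name.rpartition("-")
    if not sep or not model_condition or not dst:
        raise ValueError(f"not of the form Model_Condition-DST: {name!r}")
    return model_condition, dst


def list_species_inputs(species_dir):
    """Yield (model_condition, dst, tree_path) for each folder under species/."""
    for entry in sorted(os.listdir(species_dir)):
        folder = os.path.join(species_dir, entry)
        if not os.path.isdir(folder):
            continue  # ignore stray files next to the model-condition folders
        model_condition, dst = parse_species_folder(entry)
        tree_path = os.path.join(folder, "estimated_species_tree.tree")
        if not os.path.isfile(tree_path):
            raise FileNotFoundError(tree_path)
        yield model_condition, dst, tree_path
```

For the folder named above, parse_species_folder("astral.trim50genes33taxa.no3rd.final-FNA2AA") returns ("astral.trim50genes33taxa.no3rd.final", "FNA2AA"). Running list(list_species_inputs("species")) should then yield one entry per model condition folder.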

Genetrees

The genetrees/filtered folder contains 852 folders, one per gene ID. Each of these has 3 subfolders, and each subfolder holds a single gene tree in a file named estimated_gene_trees.tree. The subfolders are named ID-Model_Condition-DST. For example, the folder 4032 (the ID) contains the 3 subfolders 4032-c1c2_filterlen33-FNA2AA_c1c2_filterlen33, 4032-filterlen33-FAA_filterlen33, and 4032-filterlen33-FNA2AA_filterlen33. In these names, c1c2_filterlen33 and filterlen33 are the model conditions, and FAA and FNA2AA are the sequence alignment types. To generate the same figures as in the supplementary materials of the paper, run the following command if you have installed DiscoVista on your machine:

$WS_HOME/DiscoVista/src/utils/discoVista.py -a parameters/annotation.txt -c parameters/clade-defs.txt -p genetrees/filtered/ -r parameters/rooting.txt -t 75 -w parameters/newOrders.txt -y parameters/newModel.txt -m 1

Alternatively, you can run DiscoVista interactively using the Docker image. First start a container with the following command, where <path to example folder> is the absolute path to the directory containing the 1KP example folder:

docker run -it --rm -v <path to example folder>/1KP:/data esayyari/discovista

Then, inside the container, run the same command you would use with a local installation:

$WS_HOME/DiscoVista/src/utils/discoVista.py -a parameters/annotation.txt -c parameters/clade-defs.txt -p genetrees/filtered/ -r parameters/rooting.txt -t 75 -w parameters/newOrders.txt -y parameters/newModel.txt -m 1
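The ID-Model_Condition-DST subfolder names described above can be parsed in a similar way: the ID ends at the first '-', and the DST starts after the last '-'. The sketch below is a hypothetical layout check, not part of DiscoVista. It assumes that neither the ID nor the DST contains '-'. Note that in the example names the third field is the full suffix, such as FNA2AA_c1c2_filterlen33, and the sketch returns it unchanged.

```python
import os


def parse_gene_tree_folder(name):
    """Split 'ID-Model_Condition-DST' into (gene_id, model_condition, dst).

    Assumption: the ID and the DST contain no '-'; the model condition is
    everything between the first and the last '-'.
    """
    gene_id, sep1, rest = name.partition("-")
    model_condition, sep2, dst = rest.rpartition("-")
    if not (sep1 and sep2 and gene_id and model_condition and dst):
        raise ValueError(f"not of the form ID-Model_Condition-DST: {name!r}")
    return gene_id, model_condition, dst


def check_gene_tree_layout(root):
    """Return {gene_id: [(model_condition, dst), ...]} for genetrees/filtered/."""
    layout = {}
    for gene_id in sorted(os.listdir(root)):
        gene_dir = os.path.join(root, gene_id)
        if not os.path.isdir(gene_dir):
            continue
        entries = []
        for sub in sorted(os.listdir(gene_dir)):
            if not os.path.isdir(os.path.join(gene_dir, sub)):
                continue
            sub_id, model_condition, dst = parse_gene_tree_folder(sub)
            # The ID prefix of each subfolder should match its parent folder.
            if sub_id != gene_id:
                raise ValueError(f"{sub} is under {gene_id}, not under {sub_id}")
            tree = os.path.join(gene_dir, sub, "estimated_gene_trees.tree")
            if not os.path.isfile(tree):
                raise FileNotFoundError(tree)
            entries.append((model_condition, dst))
        layout[gene_id] = entries
    return layout
```

For example, calling check_gene_tree_layout("genetrees/filtered") and then checking that every value has length 3 would confirm the structure described above.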

GC

The example folder includes a GC/unfiltered folder. It contains 852 folders, one per gene, and the name of each folder is treated as the gene ID (e.g. gene ID 4032). Each of these folders contains a FASTA file named DST-alignment-noFilter.fasta. For example, FNA2AA-alignment-noFilter.fasta is located under GC/unfiltered/4032. To generate the same figures as in the supplementary materials of the paper, run the following command if you have installed DiscoVista on your machine:

$WS_HOME/DiscoVista/src/utils/discoVista.py -m 2 -a parameters/annotation.txt -p GC/unfiltered/

Alternatively, you can run DiscoVista interactively using the Docker image. First start a container with the following command, where <path to example folder> is the absolute path to the directory containing the 1KP example folder:

docker run -it --rm -v <path to example folder>/1KP:/data esayyari/discovista

Then, inside the container, run the same command you would use with a local installation:

$WS_HOME/DiscoVista/src/utils/discoVista.py -m 2 -a parameters/annotation.txt -p GC/unfiltered/
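Before running mode 2, it can help to inspect one of the input alignments directly. The sketch below reads a FASTA file and reports the GC fraction, (G + C) / (A + C + G + T), of each sequence, ignoring gaps and ambiguity codes. It is a hypothetical sanity check, not necessarily the statistic DiscoVista computes or plots. It only makes sense for nucleotide alignments such as FNA2AA, not amino-acid (FAA) alignments. The function names are illustrative.

```python
def read_fasta(path):
    """Return {sequence name: sequence}; the name is the first word of the header."""
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                header = line[1:].split()
                name = header[0] if header else ""
                chunks = []
            else:
                chunks.append(line)  # sequences may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_fraction(seq):
    """(G + C) / (A + C + G + T); gaps and ambiguity codes are not counted."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")  # no unambiguous bases to measure
    return (seq.count("G") + seq.count("C")) / acgt
```

Usage: {name: gc_fraction(seq) for name, seq in read_fasta("GC/unfiltered/4032/FNA2AA-alignment-noFilter.fasta").items()} gives one value per taxon in gene 4032.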

Occupancy

The example folder includes an occupancy/filtered folder. It contains 852 folders, one per gene, and the name of each folder is treated as the gene ID (e.g. gene ID 4032). Each of these folders contains FASTA files named DST-alignment-Model_Condition.fasta. For example, FNA2AA-alignment-f25.fasta and FNA2AA-alignment-filterlen33.fasta are located under occupancy/filtered/4032. To generate the same figures as in the supplementary materials of the paper, run the following command if you have installed DiscoVista on your machine:

$WS_HOME/DiscoVista/src/utils/discoVista.py -m 3 -a parameters/annotation.txt -p occupancy/filtered/

Alternatively, you can run DiscoVista interactively using the Docker image. First start a container with the following command, where <path to example folder> is the absolute path to the directory containing the 1KP example folder:

docker run -it --rm -v <path to example folder>/1KP:/data esayyari/discovista

Then, inside the container, run the same command you would use with a local installation:

$WS_HOME/DiscoVista/src/utils/discoVista.py -m 3 -a parameters/annotation.txt -p occupancy/filtered/
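As a rough cross-check of the occupancy inputs, the sketch below parses the DST-alignment-Model_Condition.fasta file names described above. For one (DST, model condition) pair, it reports the fraction of gene folders whose alignment contains each species, judged from the FASTA headers. This is a hypothetical approximation for inspecting the data, not DiscoVista's own occupancy computation. It assumes that the DST contains no '-' and that a species name is the first word of a header line.

```python
import os
import re
from collections import Counter

# Assumed file name pattern: <DST>-alignment-<Model_Condition>.fasta
OCC_FILE = re.compile(r"^(?P<dst>[^-]+)-alignment-(?P<model>.+)\.fasta$")


def fasta_names(path):
    """Return the set of sequence names (first word of each '>' header)."""
    with open(path) as handle:
        return {
            line[1:].split()[0]
            for line in handle
            if line.startswith(">") and line[1:].strip()
        }


def species_occupancy(root, dst, model):
    """Fraction of genes (having this DST/model file) that contain each species."""
    counts = Counter()
    n_genes = 0
    for gene_id in sorted(os.listdir(root)):
        gene_dir = os.path.join(root, gene_id)
        if not os.path.isdir(gene_dir):
            continue
        for fname in os.listdir(gene_dir):
            match = OCC_FILE.match(fname)
            if match and match.group("dst") == dst and match.group("model") == model:
                n_genes += 1
                counts.update(fasta_names(os.path.join(gene_dir, fname)))
    if n_genes == 0:
        return {}
    return {species: c / n_genes for species, c in counts.items()}
```

For example, species_occupancy("occupancy/filtered", "FNA2AA", "f25") would summarize the FNA2AA-alignment-f25.fasta files across all gene folders.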

Branch analysis

The branchAnalysis folder in the example folder contains 6 folders: GAMMA.2, c1c2.GAMMA.2, c1c2.f25, c1c2_filterlen33, f25, and filterlen33. Each of them contains a file named FNA2AA-estimated_gene_trees.tree, where you can replace FNA2AA with any alignment type or label you wish. Each of these files holds 852 gene trees in Newick format, one per line. To generate the same figures as in the supplementary materials of the paper, run the following command if you have installed DiscoVista on your machine:

$WS_HOME/DiscoVista/src/utils/discoVista.py -m 4 -a parameters/annotation.txt -p branchAnalysis/ -r parameters/rooting.txt

Alternatively, you can run DiscoVista interactively using the Docker image. First start a container with the following command, where <path to example folder> is the absolute path to the directory containing the 1KP example folder:

docker run -it --rm -v <path to example folder>/1KP:/data esayyari/discovista

Then, inside the container, run the same command you would use with a local installation:

$WS_HOME/DiscoVista/src/utils/discoVista.py -m 4 -a parameters/annotation.txt -p branchAnalysis/ -r parameters/rooting.txt
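Because each branchAnalysis file is expected to hold one Newick tree per line (852 in this example), a quick count can catch truncated or concatenated files before running mode 4. The sketch below is hypothetical and not part of DiscoVista. It takes the label from the <label>-estimated_gene_trees.tree file name. It treats every non-empty line as one tree and only checks that each line ends with ';'; it does not fully parse the Newick format.

```python
import os

SUFFIX = "-estimated_gene_trees.tree"


def count_gene_trees(root, expected=None):
    """Return {(folder, label): number of trees} for files under branchAnalysis/.

    Assumption: one Newick tree per non-empty line, each ending with ';'.
    If `expected` is given, raise when a file holds a different number of trees.
    """
    counts = {}
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue
        for fname in sorted(os.listdir(folder_path)):
            if not fname.endswith(SUFFIX):
                continue
            label = fname[: -len(SUFFIX)]  # e.g. "FNA2AA"
            with open(os.path.join(folder_path, fname)) as handle:
                trees = [line.strip() for line in handle if line.strip()]
            bad = [i for i, tree in enumerate(trees, 1) if not tree.endswith(";")]
            if bad:
                raise ValueError(f"{folder}/{fname}: tree number(s) {bad[:5]} lack ';'")
            if expected is not None and len(trees) != expected:
                raise ValueError(f"{folder}/{fname}: {len(trees)} trees, expected {expected}")
            counts[(folder, label)] = len(trees)
    return counts
```

For this example, count_gene_trees("branchAnalysis", expected=852) should return one entry for each of the 6 folders.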

Relative frequency

The folder relativeFreq/astral.trim50genes33taxa.no3rd.final-FNA2AA contains two files in Newick format: estimated_species_tree.tree, the species tree, and estimated_gene_trees.tree, the set of 852 gene trees. To generate the same figures as in the supplementary materials of the paper, run the following command if you have installed DiscoVista on your machine. Suppose you want to test the relative frequencies for the first hypothesis (annotation-1.txt), which has 5 clades: Base (the outgroup), Charales, Coleochaetales, Landplants, and Zygnematophyceae. Use the following command:

$WS_HOME/DiscoVista/src/utils/discoVista.py -a parameters/annotation-1.txt -m 5 -p relativeFreq/astral.trim50genes33taxa.no3rd.final-FNA2AA/ -l anot1 -g Base

Alternatively, you can run DiscoVista interactively using the Docker image. First start a container with the following command, where <path to example folder> is the absolute path to the directory containing the 1KP example folder:

docker run -it --rm -v <path to example folder>/1KP:/data esayyari/discovista

Then, inside the container, run the same command you would use with a local installation:

$WS_HOME/DiscoVista/src/utils/discoVista.py -a parameters/annotation-1.txt -m 5 -p relativeFreq/astral.trim50genes33taxa.no3rd.final-FNA2AA/ -l anot1 -g Base
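Because -g Base names the outgroup clade, it is worth confirming that Base actually appears as a clade in annotation-1.txt before running mode 5. The sketch below assumes that each line of an annotation file is species<TAB>clade. This page does not spell out that format, so adjust the parsing if your files differ. The sketch also enforces the rule stated for annotation.txt that each species belongs to only one clade. The function names read_annotation and check_outgroup are hypothetical.

```python
def read_annotation(path):
    """Return {species: clade}.

    Assumption: one 'species<TAB>clade' pair per non-empty line.
    Raises if a species is assigned to two different clades.
    """
    mapping = {}
    with open(path) as handle:
        for lineno, line in enumerate(handle, 1):
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                raise ValueError(f"{path}:{lineno}: expected 2 tab-separated fields")
            species, clade = (field.strip() for field in fields)
            if species in mapping and mapping[species] != clade:
                raise ValueError(f"{path}:{lineno}: {species} assigned to two clades")
            mapping[species] = clade
    return mapping


def check_outgroup(path, outgroup):
    """Return the sorted clade names, raising if the -g outgroup clade is missing."""
    clades = sorted(set(read_annotation(path).values()))
    if outgroup not in clades:
        raise ValueError(f"outgroup {outgroup!r} not among clades {clades}")
    return clades
```

For annotation-1.txt, check_outgroup("parameters/annotation-1.txt", "Base") would be expected to list the 5 clades named above.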