hzi-bifo/AntigenicTreeTools
COPYRIGHT:
==========

Copyright 2012 Lars Steinbrueck under the GPL. See file LICENSE.txt for details.

The software includes other software written by third parties. This has been distributed according to the licenses provided by the respective authors (see below).

JAMA (http://math.nist.gov/javanumerics/jama/):
This software is a cooperative product of The MathWorks and the National Institute of Standards and Technology (NIST) which has been released to the public domain. Neither The MathWorks nor NIST assumes any responsibility whatsoever for its use by other parties, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic.

BVLS (http://people.sc.fsu.edu/~jburkardt/f_src/bvls/bvls.html):
Charles Lawson, Richard Hanson, Solving Least Squares Problems, Revised edition, SIAM, 1995, ISBN: 0898713560, LC: QA275.L38.


GENERAL USAGE:
==============

The software is written in Java and, depending on the version, includes further C and Fortran code. It is distributed either as (i) a jar file or as (ii) Java, C and Fortran source code. Version (i) solves the non-negative least-squares (NNLS) problem using an implementation in Java. This version is already compiled and can be used as is. Version (ii) solves the NNLS problem with the Fortran implementation of C. Lawson and R. Hanson. We strongly recommend version (ii), as it is orders of magnitude faster than the Java implementation.

INSTALLATION:
-------------

Version (i) needs no installation and can be used directly.

To compile version (ii), adjust the file 'makefile' such that the necessary header files 'jni.h' and 'jni_md.h' can be included. After adjustment simply type 'make' and the source code will be compiled into the 'bin/' directory.

RUN THE PROGRAM:
----------------

Version (i):
  java -jar AntigenicTreeTools.jar [options]

Version (ii):
  java -cp [path to software folder]/bin/:[path to software folder]/jar/Jama.jar -Djava.library.path=[path to software folder]/bin/ phyloDriver [options]

Options:
........

-h will print the help message below.

  Use the following options: (options indicated with [] are optional)

  TREE INPUT
    [-f strategy -- infer intermediate sequences (AccTran/DelTran)]
    [-g -- count gaps as changes (when ancestral states are reconstructed)]
    [-i file -- input file with intermediate sequences in fasta format]
    [-l file -- file with node linkage]
    [-m file -- file with leaf node mapping]
    -n file -- file with tree in newick format
    -o name -- output name
    [-p -- given tree is in phylip format (default is nexus)]
    -t file -- input file with leave sequences in fasta format

  TREE MANIPULATION
    [-col -- permit branch collapsing]
    [-not list -- comma separated list of nodes to be pruned]
    [-r name -- reroot tree at leaf 'name']

  NNLS FIT
    -ls file -- input matrix for least squares fit
    [-d -- HI input matrix contains already log2 normalized distance values]
    [-loo -- do loo for fit?]
    [-cv "x,y" -- do x-fold cross validation y times for fit?]

Output:
.......

The program will output three files:

[output name].leastSquares.distance
  The squared training and testing (if applied) error for each element (HI titer / distance) and the total squared and absolute error. Each line is composed of
    distance label [tab] true value [tab] predicted value (training) [tab] squared error (training) [tab] predicted value (testing, if applied) [tab] squared error (testing, if applied)

[output name].leastSquares.mutationImpact
  Individual weights of each branch. Each line is composed of
    branch ID [tab] weight [tab] mapped mutations
  Positive branch IDs refer to up-weights, whereas negative branch IDs refer to down-weights. 'NaN' indicates that no weight could be inferred for that branch (e.g.
  if no antiserum is present in the subtree, such that the down-weight is not defined).

[output name].leastSquares.withMuts.tre
  Antigenic tree in nexus format with mutations and antigenic weights mapped to each branch. Branch lengths are set to the maximum of the respective up- and down-weight. The tree can be easily viewed using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Options in detail:
..................

-col  Collapse branches that are shorter than 1e-7. Results in multifurcating trees.

-cv   Perform an x-fold cross-validation for the specified data. The parameter is passed as 'x,n': x-fold cross-validation independently repeated n times. Folds are built randomly.

-d    Input matrix for least-squares optimization already contains log2-transformed distances.

-f    Strategy for ancestral character state reconstruction. Choose AccTran (accelerated transformation, default) or DelTran (delayed transformation).

-g    Count gaps as changes during ancestral character state reconstruction. If not specified, gaps will be treated as missing.

-i    Alignment file containing the sequences of intermediate nodes in fasta format.

-l    Linkage file to map ancestral sequences to intermediate nodes. The file is composed of pairs of nodes following the scheme: from [tab] to [line break]. The ordering of links is defined by the newick tree (parsing the newick string from left to right).

-ls   Either the HI titer matrix or already log2-transformed distance values. In the second case the option '-d' has to be used, too. For the HI titer matrix, the titer between antigen i and antiserum j will be transformed into a log2 distance:
        d(i,j) = log2(max(H(j))) - log2(H(i,j)).
      The general input format follows this specification:
        First row: Sera names, tab separated, starting with a tab ([tab] name 1 [tab] name 2 [tab] ...)
        Second row: Reference values for normalization (REF [tab] value for serum 1 [tab] value for serum 2 [tab] ...). If log2 distances are provided, set these values to 0.0.
        Remaining rows: Input values (antigen name [tab] value for serum 1 [tab] value for serum 2 [tab] ...)
      If a value for a specific serum is not present, use '*' (in case of HI titers) or 'NaN' (in case of log2-transformed distances).

-loo  Perform leave-one-out cross-validation for the specified data. For each element of the input matrix, train a model (antigenic tree) using all other elements and predict the distance for the left-out element.

-m    Mapping of additional information to leaf nodes. This file is adapted to the needs of influenza virus strains and allows additional information to be passed to the program. Each line has to follow this scheme:
        Node ID [tab] accession [tab] strain name [tab] serotype [tab] year of isolation [tab] host [tab] whole identifier string [tab] exact date of isolation
      In the current version of the program only columns one and three are used. The remaining information can be skipped (left blank). If this option is specified, the strain names will be output at the leaves of the tree rather than the node identifiers.

-n    The newick tree either in nexus format or in phylip format. If the tree is supplied in phylip format you have to specify the option '-p', too.

-not  Remove the specified leaf nodes from the tree. IDs should be passed comma separated (ID1,ID2,...).

-o    Output prefix used for output files.

-p    Input tree is in phylip format. If not specified, the input tree is assumed to be in nexus format.

-r    Reroot the tree at the specified leaf node.

-t    Alignment file for leaf sequences in fasta format.

Ancestral character state reconstruction:
.........................................

For the sake of simplicity we implemented a basic parsimony approach [1] for ancestral character state reconstruction. Ties are resolved using either accelerated transformation ('-f AccTran') or delayed transformation ('-f DelTran'). However, the output of other ancestral character state reconstruction techniques can be used, too. In this case do not specify the '-f' option.
Instead provide the sequences of intermediate nodes ('-i') and a linkage file ('-l') to specify where the sequences map in the tree.

Examples (called from within the software directory):
....................................................

(1) java -cp bin/:jar/Jama.jar -Djava.library.path=bin/ phyloDriver -n example_data/tree.phy -p -t example_data/aa.aln -f AccTran -m example_data/aa.map -col -r f0dp7 -ls example_data/HI_titers.txt -o WHO1988a

(2) java -cp bin/:jar/Jama.jar -Djava.library.path=bin/ phyloDriver -n example_data/tree.phy -p -t example_data/aa.aln -i example_data/aa.intermediate.aln -l example_data/aa.link -m example_data/aa.map -col -r f0dp7 -ls example_data/HI_distances.txt -d -o WHO1988b

(3) java -jar AntigenicTreeTools.jar -n example_data/tree.phy -p -t example_data/aa.aln -f AccTran -m example_data/aa.map -col -r f0dp7 -ls example_data/HI_titers.txt -loo -o WHO1988c

(4) java -cp bin/:jar/Jama.jar -Djava.library.path=bin/ phyloDriver -n example_data/tree.phy -p -t example_data/aa.aln -f Sankoff -m example_data/aa.map -col -r f0dp7 -ls example_data/HI_titers.txt -o WHO1988a -seed 4 -cost example_data/aa-cost.txt

These examples highlight the use of the different versions and parameters. Examples (1), (2) and (4) are invoked as version (ii) and use the Fortran library to solve the NNLS problem, whereas example (3) uses the jar file and hence the Java implementation. All examples produce the same output files; they differ as follows:

- Example (1) infers the ancestral character states using the implemented parsimony approach and transforms the HI titers into distances.
- Example (2) reads node linkage information, maps ancestral sequences that were inferred by a different program, and uses already log2-transformed distances.
- Example (3) is similar to example (1), but additionally computes the leave-one-out error.
- Example (4) uses '-f Sankoff', '-seed' and '-cost', which are not described in the option list above.

Example sequences were downloaded from the Influenza Virus Resource [2] and HI data were retrieved from [3].


References:
===========

[1] Fitch, W. (1971).
    Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool, 20 (4): 406-16.

[2] Bao, Y., P. Bolotov, D. Dernovoy, B. Kiryutin, L. Zaslavsky et al. (2008). The influenza virus resource at the National Center for Biotechnology Information. J Virol, 82 (2): 596-601.

[3] WHO (1988). Recommended composition of influenza virus vaccines for use in the 1988-1989 season. WHO Wkly Epidem Rec 63 (9): 57-60.
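
Appendix: illustration of the HI titer transformation
.....................................................

The titer transformation given for the '-ls' option, d(i,j) = log2(max(H(j))) - log2(H(i,j)), can be illustrated with a small Python sketch. This is not the program's own code: the function name, the toy table and the antigen/serum names are made up for illustration. The sketch takes max(H(j)) to be the largest titer observed for serum j and does not use the REF row, since its exact role in the normalization is not spelled out above; both choices are assumptions.

```python
import math


def titers_to_log2_distances(text):
    """Convert a tab-separated HI titer table into log2 distances.

    Returns a dict mapping (antigen, serum) to d(i,j). Entries marked '*'
    (missing titers) are left out.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    sera = rows[0][1:]       # first row starts with an empty cell
    antigen_rows = rows[2:]  # rows[1] is the REF row, not used in this sketch

    distances = {}
    for j, serum in enumerate(sera, start=1):
        # Collect the titers that are present for this serum.
        titers = {r[0]: float(r[j]) for r in antigen_rows if r[j] != "*"}
        if not titers:
            continue  # no measurements for this serum
        log_max = math.log2(max(titers.values()))  # log2(max(H(j)))
        for antigen, titer in titers.items():
            distances[(antigen, serum)] = log_max - math.log2(titer)
    return distances


# Hypothetical toy table: antigens A1..A3 against sera S1 and S2.
table = (
    "\tS1\tS2\n"
    "REF\t1280\t640\n"
    "A1\t1280\t160\n"
    "A2\t320\t*\n"
    "A3\t*\t640\n"
)
distances = titers_to_log2_distances(table)
for (antigen, serum), d in sorted(distances.items()):
    print(antigen, serum, round(d, 3))
```

In this toy table a fourfold drop in titer relative to the serum's maximum corresponds to a distance of 2, and the antigen with the maximum titer has distance 0. Pairs marked '*' simply do not appear in the result.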