| Calling Sequence | PhylogeneticTree(Seqs,Ids,Mode)
| Parameters |
| Return Type | Tree
| Globals | DimensionlessFit, MST_Qual, printlevel
| Synopsis | PhylogeneticTree constructs phylogenetic trees either by a least-squares fit between the distances in the real data and the distances in the computed tree, or by minimizing the number of changes (mutations) that the tree requires.
If the mode passed is DISTANCE, an all-against-all (each sequence aligned against every other sequence) is calculated, and the distance and variance information is used to compute a binary tree that approximates the distance information by least squares. If an optional array of Alignment data structures is passed as an argument, this all-against-all is used instead of recalculating it. Ten trees are constructed from random starting points and the best tree is returned. All trees are optimized using iterations of 4-optim and 5-optim, which optimize all subtrees with 4 and 5 branches respectively. The quality of the fit is measured by the sum of the squares of the weighted deviations divided by (n-2)(n-3)/2, where n is the number of sequences. This value is stored in the global variable MST_Qual. If the global variable MinLen is assigned a positive value, it determines the minimum length between internal or external nodes; if it is not set, 0.1 PAM is used. The branch lengths are the approximate distances calculated by least squares, in PAM units. Since the tree is made from alignments, the input sequences must be protein or DNA sequences.
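The fit measure described above can be sketched in a few lines of Python. This is only an illustration, not Darwin's implementation: the function name `fit_quality` and its three matrix arguments are hypothetical, and it assumes that "weighted deviation" means the deviation divided by the standard deviation of the pairwise distance (so the squared term is divided by the variance). One way to read the divisor is that n(n-1)/2 pairwise distances minus the 2n-3 branch lengths of an unrooted binary tree leaves (n-2)(n-3)/2.

```python
from itertools import combinations


def fit_quality(observed, variance, tree_dist):
    """Sum of squared weighted deviations divided by (n-2)(n-3)/2.

    observed[i][j]  : pairwise distance estimated from the alignments (PAM)
    variance[i][j]  : variance of that estimate
    tree_dist[i][j] : distance between leaves i and j along the fitted tree
    All three are hypothetical inputs; weighting by 1/variance is an assumption.
    """
    n = len(observed)
    if n < 4:
        raise ValueError("the divisor (n-2)(n-3)/2 is zero for fewer than 4 sequences")
    total = 0.0
    for i, j in combinations(range(n), 2):
        deviation = observed[i][j] - tree_dist[i][j]
        total += deviation ** 2 / variance[i][j]
    return total / ((n - 2) * (n - 3) / 2)
```

A value near zero means the tree distances reproduce the pairwise distances almost exactly; larger values indicate a poorer fit.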
If the mode passed is PARSIMONY, random trees are constructed and then optimized with 4-optim and 5-optim using the parsimony criterion (the tree with the fewest mutations is the best tree). This is sometimes also called character compatibility. Each position of the given sequences is treated as a character. The goal is to build a tree on whose branches character changes can be assigned so that the total number of changes is minimized. Amino acids or DNA bases can be used as characters, as can any other arbitrary symbol (characters are restricted to ASCII, though). If an MAlignment data structure is passed as an optional argument, this alignment is used. If all the sequences are exactly the same length, it is assumed that they have already been aligned and they are taken as given. If not, the sequences in Seqs are aligned with the circular tour method (see ?MAlign). The global variable MST_Qual is assigned the number of changes that the returned tree requires. The distances in the tree are taken from the parsimony construction and indicate the minimum number of changes that must occur on that particular branch. The PARSIMONY mode accepts an additional parameter that selects the method used to build the initial tree, which is later optimized. The methods to build the initial tree are:
NJRandom: Neighbour Joining with randomness in the selection of the best pair to join.
CircularTour: A circular tour of minimum cost is built at each step, and the pair of nodes with least cost is selected to be joined.
NeighJoin: Neighbour Joining. At each step the two subtrees with the least cost to join them are joined.
DynProgr(k): Use a dynamic programming approach among the k best results of Neighbour Joining.
DynProgr: Identical to DynProgr(10).
OptInsertion: Insert each leaf/subtree in the best possible branch of the previously built subtrees. This is the default choice; it is a bit slow, but normally gives the best trees.
Random: Leaves/subtrees are joined randomly. Quite fast, but produces poor trees.
LowerBound: Do not build a tree, just compute a lower bound on the cost of the tree (minimum number of changes).
SemiOptInsertion(t): Like OptInsertion, but limit the search for the best insertion to t seconds.
SemiOptInsertion: Synonym of SemiOptInsertion(10).
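To make the parsimony criterion concrete, the following Python sketch counts the minimum number of changes a fixed binary tree requires, using Fitch's counting procedure. This is a standard textbook technique and is not claimed to be how Darwin computes the value; the function name `fitch_changes` and the nested-tuple tree encoding are assumptions made for the illustration. The input reuses the five symbol sequences and the tree topology from the second example in the Examples section.

```python
def fitch_changes(tree, seqs):
    """Minimum number of changes on a fixed binary tree (Fitch counting).

    tree: a leaf index (int, 0-based) or a pair (left, right) of subtrees.
    seqs: equal-length strings; each position is treated as one character.
    """
    def state_set(node, pos):
        # Returns (possible states at this node, changes needed below it).
        if isinstance(node, int):
            return {seqs[node][pos]}, 0
        left, left_cost = state_set(node[0], pos)
        right, right_cost = state_set(node[1], pos)
        common = left & right
        if common:
            return common, left_cost + right_cost
        # No shared state: one change is unavoidable at this join.
        return left | right, left_cost + right_cost + 1

    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must already be aligned to equal length")
    return sum(state_set(tree, pos)[1] for pos in range(length))


seqs = ["B1xj", "B2zj", "G2zi", "G1xi", "G2xi"]
# Same topology as the example tree ((5,(1,4)),(2,3)), with 0-based leaf indices.
tree = ((4, (0, 3)), (1, 2))
changes = fitch_changes(tree, seqs)
print(changes)  # 6
```

The count of 6 agrees with the MST_Qual value reported for that example.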
If the mode passed is StrictCharacterCompatibility, it is assumed that the Seqs are strings (all of the same length) of binary characters. Any symbols can be used for the characters. If the characters are not compatible, an error is raised that reports the first pair of incompatible characters. The global variable MST_Qual will contain the minimum number of character changes, which is equal to the number of informative characters (and never greater than the length of the sequences of characters).
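The pairwise check described above can be sketched in Python with the standard four-combination test for binary characters: two binary characters are incompatible exactly when all four combinations of their states occur among the sequences. This is an illustration under that assumption, not Darwin's code; `first_incompatible_pair` is a hypothetical name, and positions are reported 0-based.

```python
from itertools import combinations


def first_incompatible_pair(seqs):
    """Return the first pair of incompatible character positions, or None.

    seqs: equal-length strings; each position must take at most two symbols.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all character strings must have the same length")
    columns = [[s[pos] for s in seqs] for pos in range(length)]
    for pos, col in enumerate(columns):
        if len(set(col)) > 2:
            raise ValueError(f"character {pos} is not binary")
    for a, b in combinations(range(length), 2):
        # Two symbols per column give at most four distinct state pairs.
        if len(set(zip(columns[a], columns[b]))) == 4:
            return (a, b)
    return None
```

For example, `first_incompatible_pair(["00", "01", "10", "11"])` returns `(0, 1)`, while `first_incompatible_pair(["AAx", "ABx", "BBy"])` returns `None`.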
If the mode passed is LINEAGE, it is assumed that the Seqs are lists containing lineage descriptions. Each list is assumed to classify its sequence from the most general to the most specific class. The lineage descriptions have to be consistent: if a particular class is used, it must always be preceded by the same sequence of classes. The classes are typically strings, but could be any valid Darwin object.
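The consistency rule for lineages can be illustrated with a short Python sketch that nests the identifiers by class and rejects a class that appears after two different prefixes. The function names, the dictionary-based node layout, and the sample lineages in the usage note are all hypothetical; classes are assumed to be hashable values such as strings.

```python
def new_node():
    """An empty classification node: child classes plus the ids that end here."""
    return {"children": {}, "ids": []}


def lineage_tree(ids, lineages):
    """Nest ids by lineage, from most general to most specific class.

    Raises ValueError if a class is preceded by different class sequences
    in different lineages.
    """
    seen_prefix = {}  # class -> the sequence of classes that preceded it
    root = new_node()
    for ident, lineage in zip(ids, lineages):
        node = root
        for depth, cls in enumerate(lineage):
            prefix = tuple(lineage[:depth])
            if seen_prefix.setdefault(cls, prefix) != prefix:
                raise ValueError(f"inconsistent lineage for class {cls!r}")
            node = node["children"].setdefault(cls, new_node())
        node["ids"].append(ident)
    return root
```

With the made-up input `lineage_tree(["human", "mouse", "yeast"], [["Eukaryota", "Metazoa", "Primates"], ["Eukaryota", "Metazoa", "Rodentia"], ["Eukaryota", "Fungi"]])`, the root has a single child `Eukaryota`, which in turn has the children `Metazoa` and `Fungi`. A second lineage such as `["Bacteria", "Metazoa"]` would be rejected, because `Metazoa` was already seen after `Eukaryota`.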
| Examples |
> Ids := ['one','two','three','four']:
> Seqs := ['RTHKLPEMNVC', 'KSHKLPEMNVC', 'SHKLMNVC', 'HKLPEMNVC']:
> PhylogeneticTree(Seqs,Ids,DISTANCE);
> MST_Qual;
0.01116240
> PhylogeneticTree(Seqs,Ids,PARSIMONY);
Tree(Tree(Leaf(four,1.0050,4),0.00500000,Leaf(two,1.0050,2)),0,Tree(Leaf(three,2.0050,3),0.00500000,Leaf(one,2.0050,1)))
> Seqs := [B1xj,B2zj,G2zi,G1xi,G2xi]:
> PhylogeneticTree(Seqs,[seq(i,i=1..5)],parsimony);
Tree(Tree(Leaf(5,0.5100,5),0.5000,Tree(Leaf(1,3.5000,1),1.5000,Leaf(4,1.5100,4))),0,Tree(Leaf(2,2.5000,2),0.5000,Leaf(3,0.5100,3)))
> MST_Qual;
6
| See also | BootstrapTree, ComputeDimensionlessFit, DrawTree, Entry, GapTree, Leaf, LeastSquaresTree, MAlignment, RBFS_Tree, Sequence, SignedSynteny, Synteny, Tree