Darwin Help


Clusters

Function Clusters - find clusters of sequences or objects

Calling Sequence  Clusters(seqs,lim)
                  Clusters(AllAll,lim)
                  Clusters(Dist,lim)
Parameters
Name    Type               Description

seqs    list(string)       a list of sequences (DNA or protein)
AllAll  matrix(Alignment)  all-vs-all Alignment matrix
Dist    matrix(numeric)    all-vs-all distance matrix (symmetric)
lim     symbol = positive  mode and value used to define clusters
Return Type  list(set(posint))
Synopsis This function finds clusters in a set of sequences, or of arbitrary objects, from their pairwise distances or similarities. The input is a list of sequences, a distance matrix or an AllAll matrix. The result is a list of sets, with one set per cluster. Cluster members are identified by their indices into the seqs, AllAll or Dist argument. The arguments are interpreted as follows:

List of sequences - n sequences. The sequences are aligned all against all using Global alignments with the default DM matrix. Clustering then proceeds as for an AllAll matrix.


AllAll matrix - an n x n symmetric matrix of Alignments. If the cluster definition is based on MaxDistance=ddd or AveDistance=ddd, the clusters are selected so that the PamDistance of the Alignments (or its average) is less than ddd. If MinSimil=sss or AveSimil=sss is specified, the clusters are determined by the Score of the Alignments (or its average) being greater than sss.


Distance matrix - an n x n symmetric distance matrix. MaxDistance=ddd or AveDistance=ddd should be specified. The clusters are then determined by this maximum or average distance.


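MaxDistance = ddd - The clusters are determined by the distance ddd: any two sequences or objects separated by less than ddd are placed in the same cluster. Because membership is transitive, two members of the same cluster may be farther apart than ddd if intermediate members link them.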


AveDistance = ddd - The clusters are determined by the average distance ddd. Clusters are built one at a time. Each starts with the first unassigned sequence or object and grows by one member at a time. The member added is one whose average distance to the current members of the cluster is less than ddd. Clusters built this way may depend on the order of the input sequences.


MinSimil = sss - Like MaxDistance, but the selection criterion is a Similarity (Score) greater than sss.


AveSimil = sss - Like AveDistance, but the selection criterion is an average Similarity (Score) greater than sss.
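The MaxDistance and MinSimil rules above can be read as threshold (single-linkage) clustering. Two objects are linked when their entry passes the threshold, and each cluster is a connected group of linked objects. The Python sketch below illustrates that reading on a plain numeric matrix using union-find. It is not Darwin's implementation. The function name threshold_clusters, the similarity flag and the ordering of clusters by smallest index are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def threshold_clusters(m: List[List[float]], threshold: float,
                       similarity: bool = False) -> List[Set[int]]:
    """Hypothetical sketch of MaxDistance (or MinSimil if similarity=True).

    m is a symmetric n x n matrix of distances (or similarities).
    Returns clusters as sets of 1-based indices; singletons are kept.
    """
    n = len(m)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Strict comparisons, following "less than ddd" / "greater than sss".
            linked = m[i][j] > threshold if similarity else m[i][j] < threshold
            if linked:
                parent[find(i)] = find(j)

    groups: Dict[int, Set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i + 1)
    # Assumption: clusters are listed in order of their smallest index.
    return sorted(groups.values(), key=min)
```

Consider three objects on a line with distances 2, 2 and 4 and a threshold of 2.5. The sketch places all three in one cluster even though the two ends are 4 apart. This is the transitive chaining described under MaxDistance.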


The output is the list of sets of indices. Each set is a cluster. All indices are included, hence some clusters may be singletons.
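The AveDistance and AveSimil descriptions above do not say which member is added when several candidates pass the threshold. The Python sketch below assumes the candidate with the best average is chosen (smallest average distance, or largest average similarity), with ties broken by input order. Like Darwin's output, it returns every index, singletons included. The function name greedy_average_clusters and this tie-breaking rule are hypothetical and not taken from Darwin.

```python
from typing import List, Optional, Set


def greedy_average_clusters(m: List[List[float]], threshold: float,
                            similarity: bool = False) -> List[Set[int]]:
    """Hypothetical sketch of AveDistance (or AveSimil if similarity=True).

    Clusters are grown one at a time from the first unassigned object,
    so the result depends on the input order. Returns 1-based index sets.
    """
    unassigned = list(range(len(m)))  # kept in input order
    clusters: List[Set[int]] = []
    while unassigned:
        members = [unassigned.pop(0)]  # seed with the first unassigned object
        while True:
            best: Optional[int] = None
            best_avg = 0.0
            for j in unassigned:
                avg = sum(m[i][j] for i in members) / len(members)
                passes = avg > threshold if similarity else avg < threshold
                # Assumption: prefer the best average; strict test keeps the earliest on ties.
                improves = best is None or (avg > best_avg if similarity else avg < best_avg)
                if passes and improves:
                    best, best_avg = j, avg
            if best is None:
                break  # no remaining object qualifies; close this cluster
            members.append(best)
            unassigned.remove(best)
        clusters.append({i + 1 for i in members})
    return clusters
```

Applied to the same three-point chain with a threshold of 2.5, this sketch returns [{1,2}, {3}]. Once the first two objects form a cluster, the third object's average distance to them is 3. If the input order is reversed, the middle point joins the other end instead, which illustrates the order dependence noted under AveDistance.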

Examples
> seqs := ['SSSSS', 'AAAAA', 'AAAAS', 'SASSS', 'SSSSA', 'ASAAA']:

> Clusters(seqs,AveSimil=8);
[{1,4,5}, {2,3,6}]


See also CircularTour, ComputeTSP, FindCircularOrder, MAlign