NAME

     AnalyseSeqs - Analyse a set of sequences of common length


SYNOPSIS

     AnalyseSeqs  [-X[bswn]]  [-Q]  [-M{mask}[+|!]]   [-D{H|A|G}]
     [-d{S|H|D|B}]


DESCRIPTION

     AnalyseSeqs reads a set of sequences from stdin and applies
     a variety of sequence analysis methods to them. Currently
     available are:
     statistical geometry for quadruples of sequences (THIS IS
     PRELIMINARY AND NOT YET WELL TESTED);
     split decomposition; neighbour joining and Ward's variance
     method for reconstructing phylogenies using various distance
     measures. For statistical geometry and the cluster methods
     PostScript output is available.
     The program continues reading until it encounters one of the
     separator characters '@' or '%'. Only sequences of
     alphabetical characters or of a specified alphabet are
     processed; all other lines are ignored. The program stops
     reading if it encounters an EOF condition or if there are no
     valid sequence data between two lines beginning with
     separator characters.
     A list of taxa names can be specified in the input stream.
     The list starts with a line beginning with '*'. Optionally,
     a file name prefix [fn] for the PostScript output can be
     specified on this line. The entries have the form 'x :
     Taxon', where x is the number of the taxon, i.e., of the
     corresponding entry in the list of input sequences. The taxa
     list need not be complete. It must end, however, with a line
     beginning with '*' or any of the separator characters. The
     taxa list is printed at the top of the output. The specified
     taxa names are used as labels in the PostScript output.
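
     The input layout described above can be sketched in Python.
     This is a minimal sketch, not part of the package: the helper
     build_input and the sample sequences are hypothetical, and it
     assumes that taxa are numbered from 1 and that the taxa list
     precedes the sequences.

```python
def build_input(seqs, taxa=None, prefix=""):
    """Return text for an AnalyseSeqs-style input stream (layout assumed)."""
    lines = []
    if taxa:
        # The taxa list opens with a '*' line that may carry a file name prefix.
        lines.append(f"* {prefix}".rstrip())
        # Entries have the form 'x : Taxon'; the list need not be complete.
        for number, name in sorted(taxa.items()):
            lines.append(f"{number} : {name}")
        # A line beginning with '*' closes the taxa list.
        lines.append("*")
    lines.extend(seqs)
    # A separator character ends the data set.
    lines.append("%")
    return "\n".join(lines) + "\n"


example = build_input(
    ["GGGAAACCC", "GGGAUACCC", "GCGAAAGCC"],
    taxa={1: "TaxonA", 3: "TaxonC"},
    prefix="demo",
)
print(example, end="")
```

     The resulting text could then be piped to AnalyseSeqs on
     stdin.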



OPTIONS

     -X[bswn]
          specifies the analysis methods to be used.

     [b]  Statistical   Geometry.   A   PostScript   file   named
          '[fn_]box.ps'  giving a graphical representation of the
          statistical geometry is created. The resulting box is a
          good  measure of 'tree likeness' of the data set.  This
          is the default.

     [s]  Split decomposition.

     [w]  Cluster analysis using Ward's method. A PostScript file
          named  '[fn_]wards.ps'  is created containing a drawing
          of the tree.

     [n]  Cluster  analysis  using  Saitou's  neighbour   joining
          method. A PostScript file named '[fn_]nj.ps' is created
          containing a drawing of the tree.


     -Q   indicates that a statistical geometry analysis is to be
          performed  comparing  four  data  sets, for instance to
          confirm the significance of a proposed phylogeny.  This
          option is only useful for statistical geometry analysis
          and hence the -X option is ignored. Each  of  the  four
          data sets must be of the form
          * [filename_prefix]
          # number
          [list of taxa names]
          *
          list of sequences
          %
          where number is 1, 2, 3 or 4 for the four groups to be
          compared.
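
          A minimal Python sketch that assembles four data sets in
          the form shown above. The helper build_group and the
          sample sequences are hypothetical; taxa entries are
          passed as ready-made 'x : Taxon' lines.

```python
def build_group(number, seqs, taxa=(), prefix=""):
    """Return one of the four data sets expected with -Q (layout as in the text)."""
    if number not in (1, 2, 3, 4):
        raise ValueError("group number must be 1, 2, 3 or 4")
    lines = [f"* {prefix}".rstrip(), f"# {number}"]
    lines.extend(taxa)      # optional list of taxa names
    lines.append("*")       # closes the header
    lines.extend(seqs)      # list of sequences
    lines.append("%")       # separator ends the data set
    return "\n".join(lines) + "\n"


# Hypothetical sample data: one short sequence per group.
groups = {
    1: ["GGGAAACCC"],
    2: ["GGGAUACCC"],
    3: ["GCGAAAGCC"],
    4: ["GCGAUAGCC"],
}
stream = "".join(build_group(n, s) for n, s in sorted(groups.items()))
print(stream, end="")
```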


     -M{mask}[+|!]
          specifies a mask for the input file. '{mask}' is either
          one of the letters listed below, each indicating a
          predefined alphabet, or the % sign followed by all
          characters to be accepted. A + sign at the very end of
          the mask indicates that the input is to be handled
          case-sensitively. By default the input is converted to
          upper case. A ! sign can be used to convert the input
          data to RY code: GgAaXx -> R, UuCcKkTt -> Y; all other
          letters are converted to *.

     -Ma  all letters A-Z and a-z.

     -Mu  uppercase letters.

     -Ml  lowercase letters.

     -Mc  digits [0-9].

     -Mn  all alphanumeric characters.

     -MR  RNA alphabet (GCAUgcau).

     -MD  DNA alphabet (GCATgcat).

     -MA  Amino acids in one-letter code.

     -MS  Secondary structures coded as '^.()'

     -M%alphabet
          use the specified alphabet.
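
     The RY conversion selected with the ! sign can be illustrated
     by a short Python sketch. The function to_ry is hypothetical
     and only mirrors the mapping stated above; mapping every
     other character (not just letters) to * is an assumption.

```python
PURINE_LIKE = set("GgAaXx")        # characters mapped to R
PYRIMIDINE_LIKE = set("UuCcKkTt")  # characters mapped to Y


def to_ry(seq):
    """Convert a sequence to RY code following the mapping in the text."""
    out = []
    for ch in seq:
        if ch in PURINE_LIKE:
            out.append("R")
        elif ch in PYRIMIDINE_LIKE:
            out.append("Y")
        else:
            out.append("*")  # assumption: anything else becomes '*'
    return "".join(out)


print(to_ry("GAUC"))
print(to_ry("gcNt"))
```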

     -D   specifies the algorithm to be used for calculating  the
          distance matrix of the input data set. Available are

     -DH  Hamming Distance

     -DA[,cost]
          Simple alignment distance according  to  Needleman  and
          Wunsch.   A gap cost different from 1. can be specified
          after the comma.

     -DG[,cost1,cost2]
          Gotoh's distance with gap cost function g(k) =
          cost2+cost1*(k-1). cost2<=cost1 must hold. Default
          values are cost1=1., cost2=1., yielding the same
          distance as option A.
          ONLY THE HAMMING DISTANCE IS WELL TESTED SO FAR!
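
     As an illustration, the Hamming distance and the gap cost
     function g(k) given above can be written in Python as
     follows. This is a sketch under stated assumptions, not the
     program's implementation; the function names are
     hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have a common length")
    return sum(x != y for x, y in zip(a, b))


def gotoh_gap_cost(k, cost1=1.0, cost2=1.0):
    """Gap cost g(k) = cost2 + cost1*(k-1) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    if cost2 > cost1:
        # Constraint as stated in the text: cost2 <= cost1.
        raise ValueError("cost2 <= cost1 is required")
    return cost2 + cost1 * (k - 1)


print(hamming("GGGAAACCC", "GGGAUACCC"))
print(gotoh_gap_cost(3))            # defaults: g(k) = k
print(gotoh_gap_cost(3, 2.0, 1.0))
```

     With the default values g(k) equals k, so every gap position
     costs 1, which is why the defaults give the same distance as
     option A.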


     -d   specifies the edit cost matrix to  be  used.  Available
          are

     -dS  simple distance. Indel and  substitution  of  different
          characters  all  have cost 1. The indel cost can be set
          by specifying the gap costs with the algorithm  options
          -DA and -DG. This is the default.

     -dH  A distance matrix for RNA secondary structures,
          inspired by Hogeweg's similarity measure (J.Mol.Biol
          1988). The gap function is set automatically.

     -dD  Dayhoff's matrix for amino acid distances.

     -dB  Distinguish purines and pyrimidines only. CAUTION: this
          option influences only the calculation of distances. It
          does NOT affect the computation of the statistical
          geometry, which is done directly on the sequences. To
          do statistical geometry on RY sequences, use the ! sign
          with the -M option, for instance -MR!.
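
     How an edit cost matrix enters an alignment distance can be
     sketched in Python. This is a generic dynamic programming
     sketch with a linear gap cost, not the program's code. The
     names simple_cost, ry_cost and alignment_distance are
     hypothetical, and the 0/1 costs used for the
     purine/pyrimidine case are an assumption.

```python
PURINES = set("GA")
PYRIMIDINES = set("UCT")


def simple_cost(x, y):
    """Simple distance: substituting different characters costs 1."""
    return 0.0 if x == y else 1.0


def ry_class(ch):
    """Map a nucleotide to its class; other characters stay as they are."""
    if ch in PURINES:
        return "R"
    if ch in PYRIMIDINES:
        return "Y"
    return ch


def ry_cost(x, y):
    """Assumed cost that only distinguishes purines from pyrimidines."""
    return 0.0 if ry_class(x) == ry_class(y) else 1.0


def alignment_distance(a, b, subst=simple_cost, gap=1.0):
    """Global alignment distance with substitution cost subst and linear gap cost."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + subst(x, y),  # match or substitution
                            prev[j] + gap,              # gap in b
                            curr[j - 1] + gap))         # gap in a
        prev = curr
    return prev[-1]


print(alignment_distance("GAUC", "GGUC"))
print(alignment_distance("GAUC", "GGUC", subst=ry_cost))
print(alignment_distance("GAUC", "GUC", gap=2.0))
```

     Swapping the subst argument changes only the distances; the
     sequences themselves are left untouched, in line with the
     caution above.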



REFERENCES

     The method of statistical geometry was introduced by M.
     Eigen, R. Winkler-Oswatitsch and A.W.M. Dress (Proc Natl
     Acad Sci, 85:1988,5912). The method of split decomposition
     was proposed by H.J. Bandelt and A.W.M. Dress (Adv Math,
     92:1992,47). The variance method for cluster analysis is
     due to J.H. Ward (J Amer Stat Ass, 58:1963,236). The
     neighbour joining method was published by Saitou and Nei
     (Mol Biol Evol, 4:1987,406).

     This program is part of the Vienna RNA Package.


WARNING

     This is the beta test version. Some options or  combinations
     of  options  may  still  produce  nonsense.  Please send bug
     reports to ivo@tbi.univie.ac.at.



VERSION

     This man page is part of the Vienna RNA Package version 1.2.


AUTHOR

     Peter F. Stadler, Ivo L. Hofacker.


BUGS

     Comments should be sent to ivo@itc.univie.ac.at.