Strmat

Strmat is a collection of C programs, tied together by a simple menu system, that implement a variety of string matching and pattern discovery algorithms. The emphasis is on exact matching methods, particularly those based on the Z-algorithm and on suffix trees, and on searching for repeats in strings under several different definitions of a repeat. Strmat is under continuing development, and we welcome the inclusion of additional programs. Strmat was initiated by Dan Gusfield at UC Davis with support from DOE and NSF. Many individuals have contributed to its development. The major implementation of strmat is due to Jim Knight and Jens Stoye.
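For readers unfamiliar with the Z-algorithm mentioned above, the following is a minimal, illustrative sketch in C of Z-value computation and its use for exact matching. It is not taken from the strmat sources: the function names compute_z and z_find, the 0-based indexing, the convention z[0] = n, and the use of a NUL byte as the separator between pattern and text are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from strmat): fills z[i] with the length of the
 * longest substring of s starting at i that matches a prefix of s.
 * z[0] is set to n purely as a convention of this sketch. */
static void compute_z(const char *s, size_t n, size_t *z)
{
    size_t l = 0, r = 0;          /* [l, r) is the rightmost Z-box seen so far */

    assert(s != NULL && z != NULL);
    if (n == 0)
        return;
    z[0] = n;
    for (size_t i = 1; i < n; i++) {
        size_t k = 0;
        if (i < r) {              /* i lies inside the current Z-box: reuse work */
            k = z[i - l];
            if (k > r - i)
                k = r - i;        /* cannot trust anything beyond the box */
        }
        while (i + k < n && s[k] == s[i + k])
            k++;                  /* extend the match by explicit comparison */
        z[i] = k;
        if (i + k > r) {          /* this match reaches further right */
            l = i;
            r = i + k;
        }
    }
}

/* Hypothetical helper (not from strmat): counts occurrences of pat in text
 * and stores up to cap starting offsets in out. Returns (size_t)-1 if
 * memory allocation fails. */
static size_t z_find(const char *pat, const char *text, size_t *out, size_t cap)
{
    size_t m = strlen(pat), n = strlen(text);
    size_t total = m + 1 + n, count = 0;

    if (m == 0 || m > n)
        return 0;

    char *buf = malloc(total);
    size_t *z = malloc(total * sizeof *z);
    if (buf == NULL || z == NULL) {
        free(buf);
        free(z);
        return (size_t)-1;
    }

    /* Build pat + separator + text. A NUL byte is used as the separator
     * because it cannot occur inside either C string. */
    memcpy(buf, pat, m);
    buf[m] = '\0';
    memcpy(buf + m + 1, text, n);

    compute_z(buf, total, z);
    for (size_t i = m + 1; i < total; i++) {
        if (z[i] == m) {          /* the whole pattern matches at this position */
            if (count < cap)
                out[count] = i - (m + 1);
            count++;
        }
    }

    free(buf);
    free(z);
    return count;
}
```

As a usage example, calling z_find("aba", "ababa", out, 8) on an array out of eight size_t values reports two occurrences, at offsets 0 and 2 of the text. The separator keeps every Z-value in the text portion from exceeding the pattern length, so the test z[i] == m identifies exactly the positions where the pattern occurs.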

The best reference for background on these algorithms is: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, by D. Gusfield; Cambridge University Press, 1997. ISBN 0-521-58519-8

To download a copy of strmat: Download (.tar.gz)

Another major sequence analysis tool, XPARAL, has been developed at UC Davis. It emphasizes inexact matching (alignment), and solves parametric sequence alignment problems, where scores and penalties are allowed to vary.

Another sequence handling tool, Seqio-1.2.2, has been developed at UC Davis. It contains programs that convert between the formats of a variety of bioinformatics files and databases.

Dan Gusfield, Feb. 2000