Genetic Algorithm Neural Networks for Regulatory Region Identification

Robert G. Beiko and Robert L. Charlebois


GANN is a machine learning method designed with the complexities of transcriptional regulation in mind. The key principle is that regulatory regions are composed of features such as consensus strings, characterized binding sites, and DNA structural properties. GANN identifies these features in a set of sequences, and then identifies combinations of features that can differentiate between the positive set (sequences with known or putative regulatory function) and the negative set (sequences with no regulatory function). Once these features have been identified, they can be used to classify new sequences of unknown function.
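The idea of searching for discriminating feature combinations can be illustrated with a small toy sketch. This is not GANN's code or its algorithm: GANN pairs a genetic algorithm with neural networks, whereas this sketch swaps in a nearest-centroid classifier to stay short. The sequences, the feature set (two consensus strings, GC content, and an A-tract count), and all function names are hypothetical choices made for illustration only.

```python
import random

def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)

# Hypothetical feature set: each entry maps a sequence to one number.
FEATURES = {
    "TATAAT": lambda s: count_motif(s, "TATAAT"),
    "TTGACA": lambda s: count_motif(s, "TTGACA"),
    "GC": gc_content,
    "AAAA": lambda s: count_motif(s, "AAAA"),
}

def extract(seq):
    """Turn a sequence into a feature vector (same order as FEATURES)."""
    return [f(seq) for f in FEATURES.values()]

def accuracy(mask, X, y):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # no features selected, nothing to classify with

    def centroid(label):
        rows = [[x[i] for i in idx] for x, lab in zip(X, y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos, neg = centroid(1), centroid(0)

    def dist(row, c):
        return sum((row[i] - cj) ** 2 for i, cj in zip(idx, c))

    correct = sum((dist(x, pos) < dist(x, neg)) == bool(lab) for x, lab in zip(X, y))
    return correct / len(X)

def fitness(mask, X, y):
    """Accuracy with a small penalty per feature, favouring compact combinations."""
    return accuracy(mask, X, y) - 0.01 * sum(mask)

def evolve(X, y, pop_size=12, generations=30, mutation_rate=0.1, seed=0):
    """Evolve feature masks with elitism, tournament selection, one-point crossover."""
    rng = random.Random(seed)
    n = len(X[0])
    score = lambda m: fitness(m, X, y)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: carry the two best masks over unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament of size 3
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randint(1, n - 1)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)

# Toy data: the positive set carries a TATAAT box, the negative set does not.
POSITIVES = ["GCGCTATAATGCGC", "ATGCTATAATCCGA", "TTGACATATAATGG"]
NEGATIVES = ["GCGCGCGCGCGCGC", "ATGCATGCATGCAT", "CCGGAATTCCGGAA"]
X = [extract(s) for s in POSITIVES + NEGATIVES]
y = [1] * len(POSITIVES) + [0] * len(NEGATIVES)

best = evolve(X, y)
print("selected features:", [name for name, bit in zip(FEATURES, best) if bit])
```

A new sequence of unknown function would be passed through the same `extract` step and classified with the selected features. In real use the accuracy should be measured on held-out sequences rather than on the training set, as it is here.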


The GANN suite is a set of Perl scripts and C++ programs that extract genomic sequences of interest, compute the desired sequence features, and identify useful combinations of these features with the core machine learning algorithm. The modular design of the suite allows tabular data from outside sources to be used as input, and allows observed sequence properties to be analysed with more traditional statistical methods. GANN is currently at Version 2.0. There are many more features that I would like to implement, but there is no set time frame for these changes. Requests for modifications and bug reports are welcome. Alternatively, since the source code is released under the GPL and available for download and inspection (and is hopefully not too inscrutable), you can always implement changes yourself :^>


GANN 2.0 flowchart (.pdf)

The GANN 2.0 Manual (txt) / (doc) / (PDF)

Download GANN


Win32 executables + Perl scripts

Source code

Source code for Win32 and UNIX

Each of the four C++ programs has its own makefile; simply type 'make' in the appropriate directory to generate the executable.

The included .mcp files are project files for Metrowerks CodeWarrior for Windows; if you have CodeWarrior, open these to compile the source code.

Unfortunately, due to differences in C++ string stream libraries, the current implementation of GANN will not compile properly on Mac OS X.

Citing GANN

The main citation for GANN is:

Beiko, R.G. and Charlebois, R.L. (2005). GANN: genetic algorithm neural networks for the detection of conserved combinations of features in DNA. BMC Bioinformatics 6: 36.


You can E-mail me your comments at

© 2004-2006 Robert G. Beiko