EEEP: Efficient Evaluation of Edit Paths
Robert G. Beiko and Nicholas Hamilton
Overview
EEEP is a program that reconciles a rooted reference tree and an unrooted "test" tree by performing subtree prune-and-regraft (SPR) operations on the reference tree. The effect of these SPR operations is analogous to that of lateral genetic transfer events, so the reconciliation path output by the program can be interpreted as a set of lateral gene transfers between organisms in the reference tree. The goal is to find the smallest number of SPR operations that can reconcile the reference and test trees.
Implementation and Building
EEEP is implemented in C++ and released under the GNU GPL. To run EEEP, you will need a C++ compiler (or one of the provided binaries) and Perl 5.x. To build your own version on any UNIX-like system with gcc/g++ installed, download the source code and type
make
We strongly recommend enabling whatever code optimizations your favourite compiler provides (for example, -O2 or -O3 with g++).
EEEP Revision History
May 16, 2006 - Uploaded Version 1.01: fixed 'convertNewick.pl' to accept tree files with integer branch lengths.
September 5, 2005 - Uploaded EEEP Version 1.00 to this site. This is the version described in the BMC Evolutionary Biology manuscript.
Download EEEP
Just the README
ReadMe.txt
Source code
Source code, makefile, and README
Binaries
Binary for Red Hat Linux (optimized for 64-bit architectures)
Citing EEEP
The citation for EEEP is:
Beiko, R.G. and Hamilton, N. (2006). Phylogenetic identification of lateral genetic transfer events. BMC Evolutionary Biology 6:15.
An earlier version of EEEP was described briefly in the supporting information of the following article:
Beiko, R.G., Harlow, T.J., and Ragan, M.A. (2005). Highways of gene sharing in prokaryotes. Proc. Natl. Acad. Sci. USA 102:14332-14337.
Benchmarking and Application of EEEP
The data sets and results summarized in the BMC Evolutionary Biology article are available for download below.
Benchmarking runs (e.g., Table 1 and Figure 4)
These data consist of reference/test tree pairs in which a defined number of SPR operations was performed on the rooted reference tree to yield the test tree.
The naming convention for the reference and test trees is 'tests3_$a_$b_$c_$d', where $a is 's' or 'l' (short or long, depending on the number of taxa), $b is the replicate ID (1..10), $c is the number of taxa (leaves), and $d is the number of random SPR moves used to generate the test tree from the reference tree.
In the results file, "0 scenarios of length 0" indicates that no paths were recovered because the run exceeded the available time or memory.
Datasets (1 MB .tar.gz)
Results (19 KB .txt.gz)
Random tree comparisons (e.g., Figure 6)
These results are split into two files: one containing the data sets that EEEP solved within the time and memory constraints, and the other containing the data sets it could not solve.
Both files follow the naming convention 'ref_eeep_$a-R$b.tre' and 'test_eeep_$a-R$b.tre', where $a is the number of taxa (leaves) and $b is the replicate ID (1..500).
Datasets and results (42 MB tarfile)
Comparisons of 22 432 protein trees (e.g., Figures 5 and 6)
These data include the protein trees inferred for the PNAS paper, collected into a single file (FileTrees.txt) for convenience. To run the current version of EEEP on them, you will need to extract them into separate files. Also included are the rooted reference tree 'MRP_095_biparts.txt' and a set of rules for converting the sequence IDs in the protein trees to the corresponding genome IDs (1-144).
The results are split into five groups, each run with a different ratchet setting (or with no ratchet). Sample run scripts are included to show the command line used to invoke EEEP for each of the five sets. The '.out' files in the archives use the EEEP output format indicated above.
Datasets (1.8 MB .tar.gz)
Results (15 MB .tar, containing five .tar.gz archives)
Contact
Contact the authors: Robert G. Beiko or Nick Hamilton
© 2005-2006 Robert G. Beiko