EEEP: Efficient Evaluation of Edit Paths

Robert G. Beiko and Nicholas Hamilton


EEEP is a program that reconciles a rooted reference tree and an unrooted "test" tree by performing subtree prune-and-regraft (SPR) operations on the reference tree. The effect of these SPR operations is analogous to that of lateral genetic transfer events, so the reconciliation path output by the program is equivalent to a set of lateral gene transfers between organisms in the reference tree. The goal is to find the smallest number of SPR operations that can reconcile the reference and test trees.
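To make the SPR operation concrete, the following Python sketch applies a single prune-and-regraft move to a small rooted binary tree. It is an illustration only: the nested-tuple tree representation and the function names (leaves, prune, regraft, spr) are assumptions made for this example and do not reflect EEEP's internal C++ data structures, and the sketch does not search for a shortest edit path.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def prune(tree, target):
    """Remove the subtree whose leaf set equals `target`.

    Returns (remaining_tree, pruned_subtree). The parent of the pruned
    subtree is collapsed so that the remaining tree stays binary.
    pruned_subtree is None if no proper subtree matches.
    """
    if isinstance(tree, str):
        return tree, None
    left, right = tree
    if leaves(left) == target:
        return right, left
    if leaves(right) == target:
        return left, right
    new_left, pruned = prune(left, target)
    if pruned is not None:
        return (new_left, right), pruned
    new_right, pruned = prune(right, target)
    return (left, new_right), pruned


def regraft(tree, subtree, target):
    """Attach `subtree` on the edge above the clade with leaf set `target`."""
    if leaves(tree) == target:
        return (tree, subtree)
    if isinstance(tree, str):
        return tree
    left, right = tree
    return (regraft(left, subtree, target), regraft(right, subtree, target))


def spr(tree, moved, destination):
    """Apply one SPR move: prune clade `moved`, regraft above `destination`."""
    remaining, pruned = prune(tree, frozenset(moved))
    if pruned is None:
        raise ValueError("no proper subtree has the requested leaf set")
    result = regraft(remaining, pruned, frozenset(destination))
    if leaves(result) != leaves(tree):
        raise ValueError("destination is not a clade of the pruned tree")
    return result


# Hypothetical five-taxon reference tree: (((A,B),C),(D,E))
reference = ((("A", "B"), "C"), ("D", "E"))
# Move leaf B next to leaf D, as a transfer into the lineage of D might.
edited = spr(reference, {"B"}, {"D"})
```

In this representation a clade is identified by its leaf set, so the move above is read as "prune B, regraft it on the branch leading to D".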

Implementation and Building

EEEP is implemented in C++ and released under the GNU GPL. To run EEEP, you will need a C++ compiler (or one of the provided binaries) and Perl 5.x. To build your own version on a UNIX-like system, download the source code and type


on any UNIX system with gcc/g++ installed. We highly recommend using whatever code optimizations are available with your favourite compiler.

EEEP Revision History

May 16, 2006 - Version 1.01 uploaded: correction to '' to allow for tree files with integer branch lengths.

September 5, 2005 - Uploaded EEEP Version 1.00 to this site. This is the version described in the BMC Evolutionary Biology manuscript.

Download EEEP

Just the README


Source code

Source code, makefile, and README


Binary for RedHat Linux (optimized for 64-bit architecture)

Citing EEEP

The citation for EEEP is:

Beiko, R.G. and Hamilton, N. (2006). Phylogenetic identification of lateral genetic transfer events. BMC Evolutionary Biology 6:15.

An earlier version of EEEP was described briefly in the supporting information of the following article:

Beiko, R.G., Harlow, T.J., and Ragan, M.A. (2005). Highways of gene sharing in prokaryotes. Proc. Natl. Acad. Sci. USA 102:14332-14337.

Benchmarking and Application of EEEP

The data sets and results summarized in the BMC Evolutionary Biology article are available for download below.

Benchmarking runs (e.g., Table 1 and Figure 4)
These results consist of reference / test tree pairs, with a defined number of SPR operations performed on the rooted reference tree to yield the test tree.

The naming convention for the reference and test trees is 'tests3_$a_$b_$c_$d', where $a is 's' or 'l' (short or long, depending on the number of taxa), $b is a replicate ID (1..10), $c is the number of taxa (leaves), and $d is the number of random SPR moves that were applied to the reference tree to yield the test tree.
In the results file, "0 scenarios of length 0" indicates that no paths were recovered because the run exhausted its time or memory limit.
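The naming convention can be decoded mechanically. The sketch below is a minimal Python example, not part of the EEEP distribution: the names BenchmarkName, parse_benchmark_name and is_failure are hypothetical, the sample file name is invented for illustration, and the exact layout of lines in the results file is assumed only to contain the failure phrase quoted above.

```python
import re
from typing import NamedTuple


class BenchmarkName(NamedTuple):
    size_class: str  # $a: 's' (short) or 'l' (long)
    replicate: int   # $b: replicate ID, 1..10
    taxa: int        # $c: number of taxa (leaves)
    spr_moves: int   # $d: number of random SPR moves


# Matches the documented pattern anywhere in a name, so a directory
# prefix or file extension (if any) is tolerated.
_NAME = re.compile(r"tests3_([sl])_(\d+)_(\d+)_(\d+)")

# Guard against matching e.g. "10 scenarios of length 0".
_FAILURE = re.compile(r"(?<!\d)0 scenarios of length 0(?!\d)")


def parse_benchmark_name(name):
    """Split a 'tests3_$a_$b_$c_$d' name into its four fields."""
    match = _NAME.search(name)
    if match is None:
        raise ValueError(f"not a benchmark tree name: {name!r}")
    a, b, c, d = match.groups()
    return BenchmarkName(a, int(b), int(c), int(d))


def is_failure(result_line):
    """True if a results line reports that no paths were recovered."""
    return _FAILURE.search(result_line) is not None


# Hypothetical example name: short class, replicate 3, 20 taxa, 4 moves.
info = parse_benchmark_name("tests3_s_3_20_4")
```

Grouping results by taxa and spr_moves in this way makes it straightforward to tabulate how often runs failed at each problem size.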

Datasets (1 MB .tar.gz)
Results (19K .txt.gz)

Random tree comparisons (e.g., Figure 6)
These results are grouped into two separate files: one containing data sets that were solved by EEEP within the time and memory constraints, the other containing those data sets that could not be solved.

The files in both sets are named 'ref_eeep_$a-R$b.tre' and 'test_eeep_$a-R$b.tre', where $a is the number of taxa (leaves) and $b is the replicate ID (1..500).
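Because each reference tree has a test tree with the same $a and $b, the two can be matched by name. The following Python sketch shows one way to do this; the function name pair_trees and the sample file names are assumptions made for illustration, and the directory layout inside the archive is not assumed.

```python
import re

# role is 'ref' or 'test'; $a is the taxon count, $b the replicate ID.
_TREE_FILE = re.compile(r"^(ref|test)_eeep_(\d+)-R(\d+)\.tre$")


def pair_trees(filenames):
    """Map (taxa, replicate) to {'ref': name, 'test': name}.

    Names that do not follow the convention are ignored, and keys
    missing either the reference or the test tree are dropped.
    """
    pairs = {}
    for name in filenames:
        match = _TREE_FILE.match(name)
        if match is None:
            continue
        role, taxa, replicate = match.groups()
        pairs.setdefault((int(taxa), int(replicate)), {})[role] = name
    return {key: files for key, files in pairs.items() if len(files) == 2}


# Hypothetical directory listing.
names = [
    "ref_eeep_20-R1.tre",
    "test_eeep_20-R1.tre",
    "ref_eeep_20-R2.tre",  # no matching test tree, so it is dropped
    "notes.txt",
]
matched = pair_trees(names)
```

In practice the list of names could come from os.listdir on the directory where the archive was unpacked.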

Datasets and results (42 MB tarfile)

Comparisons of 22 432 protein trees (e.g., Figures 5 and 6)
These results include the protein trees inferred for the PNAS paper, which are collected into a single file (FileTrees.txt) for convenience. To run the current version of EEEP on them, you will need to extract them into separate files. Also included is the rooted reference tree 'MRP_095_biparts.txt' and a set of rules to convert the sequence IDs in the protein trees to the appropriate 1-144 genome ID. The results are split into five groups, each run with a different ratchet (or lack of ratchet). Sample run shells are included to show the command line that was used to invoke EEEP for each of the five sets. The '.out' files in the archives are in the EEEP output format indicated above.

Datasets (1.8 MB .tar.gz)
Results (15 MB .tar, containing five .tar.gz archives)


Contact the authors: Robert G. Beiko or Nick Hamilton

© 2005-2006 Robert G. Beiko