Synthetic eight-taxon & putative orthologs datasets


This website presents data used in the manuscript

Is multiple sequence alignment required for accurate inference of phylogeny?

by Michael Höhl and Mark A. Ragan (submitted, 2006)

Data organization

The data is organized hierarchically into directories as follows:

Format of bipart-file

The bipart-file (ref-bipart/*/biparts) is a tab-delimited file consisting of a single line, which describes the single deep phylogenetic branch measured by DPB. The line has three fields:

dot-star partition1 partition2

dot-star is a bipartition of the reference tree in dot-star format (using characters '.' and '*').

partition1 and partition2 are comma-separated lists, each enclosed in '[' and ']', of the sequence identifiers (IDs) used in the FASTA file headers ('>ID 1').
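As an illustration, a bipart-file line could be split into its three fields with a short Python function such as the hypothetical parse_bipart_line below. This is a minimal sketch, not part of the distributed data. It assumes that the fields are separated by single tab characters, that the IDs are integers, and that the dot-star string has exactly one character per ID.

```python
def parse_bipart_line(line):
    """Split one bipart-file line into (dot_star, partition1, partition2).

    Hypothetical helper. Assumes tab-separated fields and integer IDs.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    dot_star, raw1, raw2 = fields

    def parse_list(text):
        # Each partition looks like "[1, 52]"; int() tolerates the spaces.
        text = text.strip()
        if not (text.startswith("[") and text.endswith("]")):
            raise ValueError(f"partition not enclosed in brackets: {text!r}")
        inner = text[1:-1].strip()
        return [int(x) for x in inner.split(",")] if inner else []

    partition1, partition2 = parse_list(raw1), parse_list(raw2)
    # Assumed consistency check: one dot-star character per ID.
    if len(dot_star) != len(partition1) + len(partition2):
        raise ValueError("dot-star length does not match number of IDs")
    return dot_star, partition1, partition2


print(parse_bipart_line(".**.\t[1, 52]\t[2, 42]"))
```

Splitting on tabs rather than on arbitrary whitespace matters here, because the partition lists themselves contain spaces after the commas.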

Each character of dot-star corresponds to one ID. The IDs from partition1 and partition2 are combined and sorted in ascending numerical order. A dot marks an ID from partition1, and a star marks an ID from partition2. Example:

.**. [1, 52] [2, 42]

The combined and ordered IDs are [1, 2, 42, 52]. These IDs map to dot-star format as follows. 1 is in partition1, so the first character is '.'. 2 is in partition2, so the second character is '*'. The same holds for 42. 52, like 1, is in partition1 and yields '.'. Hence the resulting string is '.**.'.
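The mapping just described can be expressed as a pair of small Python functions: one builds the dot-star string from the two partitions, and the other recovers the partitions from a dot-star string and the full ID set. The names to_dot_star and from_dot_star are hypothetical. The code is a sketch of the rule stated above and assumes integer IDs.

```python
def to_dot_star(partition1, partition2):
    """Return the dot-star string for two disjoint partitions of integer IDs.

    IDs are sorted numerically (not as strings); '.' marks partition1, '*' partition2.
    """
    p1, p2 = set(partition1), set(partition2)
    if p1 & p2:
        raise ValueError("partitions overlap")
    return "".join("." if i in p1 else "*" for i in sorted(p1 | p2))


def from_dot_star(dot_star, ids):
    """Recover (partition1, partition2) from a dot-star string and all IDs."""
    if set(dot_star) - {".", "*"}:
        raise ValueError("dot-star may contain only '.' and '*'")
    ordered = sorted(ids)
    if len(ordered) != len(dot_star):
        raise ValueError("dot-star length does not match number of IDs")
    partition1 = [i for i, c in zip(ordered, dot_star) if c == "."]
    partition2 = [i for i, c in zip(ordered, dot_star) if c == "*"]
    return partition1, partition2


print(to_dot_star([1, 52], [2, 42]))              # .**.
print(from_dot_star(".**.", [1, 2, 42, 52]))      # ([1, 52], [2, 42])
```

The IDs are sorted as integers because the format specifies ascending numerical order. If they were sorted as strings, "10" would come before "2" and the resulting string would be wrong.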


(bzip2-compressed tarballs)
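The datasets are distributed as bzip2-compressed tarballs. These can be unpacked with the standard tar tool or from Python. The sketch below uses the standard-library tarfile module and then locates every file named biparts. The archive name, the destination directory, and the helper names extract_bz2_tarball and find_bipart_files are hypothetical. The code does not assume where ref-bipart sits inside the archive.

```python
import tarfile
from pathlib import Path


def extract_bz2_tarball(archive, dest):
    """Extract a bzip2-compressed tarball into dest; return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()
        # Use the safe "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def find_bipart_files(root):
    """Return all files named 'biparts' below root, in sorted order."""
    return sorted(p for p in Path(root).rglob("biparts") if p.is_file())


# Hypothetical usage:
# extract_bz2_tarball("dataset.tar.bz2", "data")
# for path in find_bipart_files("data"):
#     print(path, path.read_text().strip())
```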

Michael Höhl, 4 May 2006