ChIP-seq has become a method of choice for studying the binding preferences of transcription factors and the localisation of epigenetic regulatory marks at a genomic scale. There is a crucial need for specialised software tools to make sense of these data. While various programs have been developed for read mapping and peak calling, the subsequent steps have not yet matured: identifying relevant transcription factor binding motifs and the precise location of their binding sites remains a bottleneck. Most existing tools limit the size of the input sequences, and typically restrict motif discovery to a few hundred peaks.
We present a pipeline called peak-motifs, integrated into the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/), which takes as input a set of peak sequences, discovers exceptional motifs, compares them with motif databases, predicts binding site positions, and offers different visualisation interfaces. The pipeline relies on tried-and-tested algorithms whose computing time increases linearly with sequence size, ensuring scalability to massive datasets of several tens of Mb. In addition to the website, peak-motifs can be used as a stand-alone application, as well as via SOAP/WSDL web services.
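To illustrate why word-counting approaches scale linearly with sequence size, the following minimal Python sketch counts every k-mer in a single pass over the peak sequences and ranks the k-mers by their observed/expected ratio. It is not the peak-motifs implementation: the function names (`count_kmers`, `over_represented`), the Bernoulli background estimated from the input nucleotide frequencies, and the simple ratio score are all assumptions made for illustration, and the algorithms and statistics actually used by the pipeline are not detailed here.

```python
from collections import Counter
from math import prod


def count_kmers(sequences, k):
    """Count every k-mer in one pass; time grows linearly with total sequence length."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip words containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def over_represented(sequences, k=6, top=5):
    """Rank k-mers by observed/expected ratio (hypothetical scoring, for illustration).

    The expected count assumes independent nucleotides (Bernoulli background)
    with frequencies estimated from the input sequences themselves.
    """
    kmer_counts = count_kmers(sequences, k)
    base_counts = count_kmers(sequences, 1)
    total_bases = sum(base_counts.values())
    total_kmers = sum(kmer_counts.values())
    freq = {base: n / total_bases for base, n in base_counts.items()}

    scored = []
    for kmer, observed in kmer_counts.items():
        expected = total_kmers * prod(freq[base] for base in kmer)
        scored.append((kmer, observed, observed / expected))
    # Highest observed/expected ratio first.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top]
```

Calling `over_represented(peak_sequences, k=6)` on a list of peak sequences returns tuples of (k-mer, observed count, ratio). A real analysis would replace the raw ratio with a proper significance test and a background model calibrated on genomic sequences.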
We assessed the performance of peak-motifs on several published datasets. In all cases, it discovered relevant motifs. For example, we discovered individual Oct and Sox motifs in Sox2 and Oct4 peak collections, whereas the original study only found the composite Sox/Oct motif. For the generic transcriptional co-activator p300, examined in heart and midbrain, peak-motifs identified motifs bound by tissue-specific transcription factors consistent with these two tissues.
In summary, peak-motifs supports time-efficient and statistically reliable analysis of complete ChIP-seq datasets, while offering a user-friendly and well-documented online interface.