Examples for TFO search, TTS search and Triplex search

Triplexator Examples

We give three generic examples of how Triplexator can be applied to biological sequence data. Each example addresses one of the three steps involved in triplex formation: TFO search, TTS search and triplex search.

The file ./demos/P00374.fasta serves as an example sequence. It contains the genomic region spanning 1000 bp upstream to 200 bp downstream of the transcription start site of the human DHFR gene.
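
To inspect the example sequence before running any search, a small FASTA reader can help. The following is a minimal Python sketch, not part of Triplexator; the function name read_fasta is hypothetical, and it assumes plain FASTA input in which each record starts with a ">" header line.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated and upper-cased.
            records[current_id] += line.upper()
    return records


# Small in-memory demonstration (made-up sequences, not the DHFR promoter).
demo = read_fasta([">seq1 demo", "acgt", "GGAA", "", ">seq2", "TTTT"])
for name, seq in demo.items():
    print(name, len(seq))
```

To apply it to the example file, pass an open file handle, e.g. read_fasta(open("./demos/P00374.fasta")), and check the reported sequence length against the region described above.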

Identify TFOs in single-strand sequences

We want to find all putative triplex-forming oligonucleotides (TFOs) in a set of transcripts, subject to the following specifics:
  • at least 15 bp in length "-l 15"
  • having at most 15% errors in the motif "-e 15"
  • restricted to TFOs that form triplexes via the purine or the purine-pyrimidine motif "-m R,M"
  • remove low-complexity regions of length >= 7 and period <= 1 (e.g. for polyA filtering) "-fr on -mrl 7 -mrp 1"
  • output all sites "-of 0"
  • indicate errors with lowercase letters (pretty output) "-po"
  • output to the file named transcripts.tfo "-o transcripts.tfo"
  • place the results in a specific folder "-od folder"

Command:
>triplexator -l 15 -e 15 -m R,M -fr on -mrl 7 -mrp 1 -of 0 -po -od folder -o transcripts.tfo -ss transcripts.fasta
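
To make the length, error-rate and repeat-filter settings above concrete, here is a minimal Python sketch. It is not Triplexator's implementation: the function names are hypothetical, and it assumes that the error budget is rounded down and that a low-complexity region is a stretch in which every base equals the base "period" positions earlier.

```python
def max_errors(length, error_rate_percent):
    """Number of errors tolerated in a site of the given length.

    Assumption: the percentage is applied to the site length and rounded down.
    """
    return length * error_rate_percent // 100


def low_complexity_regions(seq, min_len=7, max_period=1):
    """Return sorted half-open (start, end) spans of simple repeats.

    A span qualifies if it is at least min_len long and repeats with some
    period <= max_period. This is a naive scan for illustration only.
    """
    seq = seq.upper()
    regions = set()
    for period in range(1, max_period + 1):
        start = 0  # start of the current stretch repeating with this period
        for i in range(period, len(seq) + 1):
            if i < len(seq) and seq[i] == seq[i - period]:
                continue  # the stretch keeps going
            # The stretch [start, i) ended here (mismatch or end of sequence).
            if i - start >= min_len and i - start > period:
                regions.add((start, i))
            start = i - period + 1
    return sorted(regions)


print(max_errors(15, 15))
print(low_complexity_regions("ACAAAAAAAGT", min_len=7, max_period=1))
```

Under these assumptions, a TFO of length 15 searched with "-e 15" may carry at most 2 errors, and a run of seven consecutive A's is the kind of region that "-mrl 7 -mrp 1" is meant to filter.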

An example script searching TFOs in the DHFR gene promoter is provided in demos:
>./demos/tfo_search.sh

It produces three output files:

Identify high quality putative TTSs in a genome

We want to find all putative triplex target sites (TTSs) in a genome that comply with the following specifics:
  • at least 15 bp in length "-l 15"
  • containing at least 50% guanines "-g 50"
  • having at most 10% pyrimidine interruptions "-e 10"
  • filtered for low-complexity regions of length >= 7 and period <= 3 "-fr on -mrl 7 -mrp 3"
  • at most 2 duplicates in the duplex set, using the strict detection algorithm "-dd 2 -dc 2"
  • output all sites "-of 0"
  • output to the file named genome.tts "-o genome.tts"

Command:
>triplexator -l 15 -g 50 -e 10 -fr on -mrl 7 -mrp 3 -dd 2 -dc 2 -of 0 -o genome.tts -ds genome.fasta
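
The length, guanine and interruption thresholds above can be illustrated with a minimal Python sketch. It is not Triplexator's code: the function name is hypothetical, the site is given as its purine-rich strand, and it assumes that both percentages are taken relative to the full site length.

```python
def passes_tts_filters(site, min_len=15, min_g_pct=50, max_err_pct=10):
    """Check a candidate target site against the thresholds in the text.

    Assumptions (for illustration): 'site' is the purine-rich strand, the
    guanine percentage and the pyrimidine-interruption percentage are both
    computed over the whole site length.
    """
    site = site.upper()
    length = len(site)
    if length < min_len:
        return False
    guanines = site.count("G")
    pyrimidines = site.count("C") + site.count("T")
    # Integer comparisons avoid floating-point rounding at the thresholds.
    enough_g = guanines * 100 >= min_g_pct * length
    few_interruptions = pyrimidines * 100 <= max_err_pct * length
    return enough_g and few_interruptions


print(passes_tts_filters("GGAGGAGGGAGGAAG"))  # G-rich, no interruptions
print(passes_tts_filters("GGAGGAGTGAGCAAG"))  # two pyrimidine interruptions
print(passes_tts_filters("AAAAAAAGAAAAAAA"))  # too few guanines
```

With "-l 15 -e 10" a site of exactly 15 bp tolerates at most one pyrimidine interruption under this reading, which is why the second example is rejected.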

An example script searching TTSs in the DHFR gene promoter is provided in demos:
>./demos/tfo_search.sh

It produces three output files:

Identify TFO-TTS pairs in single-strand and duplex sequences

We want to find all putative triplexes that can form between a set of transcripts and a set of promoters, subject to the following specifics:
  • at least 15 bp in length "-l 15"
  • having at most 20% errors "-e 20"
  • tolerate up to 2 consecutive errors "-c 2"
  • require a guanine ratio of at least 20% "-g 20"
  • disable low-complexity filtering "-fr off"
  • use the purine motif only "-m R"
  • given the previous parameter, disable q-gram filtering since it won't help "-fm 0"
  • output the alignments "-of 1"
  • we like to look at the alignments, so make them pretty "-po"
  • output to the file named transcripts_promoters.tpx "-o transcripts_promoters.tpx"
  • we don't have much time but lots of memory, so run in parallel; promoters are fairly short, so parallelize on duplexes "-rm 2"
  • but don't use all my processors, I still have to work; I'll give you 3 "-p 3"

Attention: this specific parameter setting spans a large search space and may require substantial computational resources.

Command:
>triplexator -l 15 -e 20 -c 2 -fr off -g 20 -m R -fm 0 -of 1 -o transcripts_promoters.tpx -po -rm 2 -p 3 -ss transcripts.fasta -ds promoters.fasta
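
The interplay of the error rate ("-e 20") and the consecutive-error limit ("-c 2") can be sketched in Python. This is an illustration only, not Triplexator's implementation: the function names are hypothetical, and it assumes a pretty-printed site string in which lowercase letters mark errors, as described for "-po" above.

```python
def error_profile(pretty_site):
    """Return (number of errors, longest run of consecutive errors).

    Assumption: lowercase letters mark error positions, everything else
    counts as a match.
    """
    errors = 0
    longest = current = 0
    for ch in pretty_site:
        if ch.islower():
            errors += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return errors, longest


def within_limits(pretty_site, max_err_pct=20, max_consecutive=2):
    """Check the error rate and the consecutive-error limit together."""
    errors, longest = error_profile(pretty_site)
    rate_ok = errors * 100 <= max_err_pct * len(pretty_site)
    return rate_ok and longest <= max_consecutive


print(within_limits("GGAGGtcGGAGGAGG"))  # 2 errors, run of 2
print(within_limits("GGAGGtctGAGGAGG"))  # 3 errors, run of 3
```

The second string stays within the 20% error rate for a length of 15 but is rejected because its three errors are adjacent, which shows why the two limits are specified separately.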

An example script searching triplexes in the DHFR gene promoter is provided in demos:
>./demos/tfo_search.sh

It produces three output files: