Machine learning applications in functional annotation – Marina Naval Sanchez

Functional genomics is at the forefront of applying machine learning methods to predict enhancers and other cis-regulatory regions, and to predict the impact of SNPs in regulatory regions on downstream function, namely chromatin accessibility and gene expression. These tools are trained on data generated by high-throughput technologies such as ChIP-seq and ATAC-seq, which capture transcription factor binding and chromatin states, for example the data sets produced by the Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics project in human. For non-model organisms, our lab is part of the Functional Annotation of Animal Genomes (FAANG) initiative, which aims to produce high-throughput experimental profiles of regulatory elements across tissues, mostly in livestock species. CSIRO has generated ATAC-seq data for four tissues in tropical cattle, and for several tissues and developmental stages in salmon. The next step is to use machine learning methods, starting with Support Vector Machines (SVMs), Random Forests and deep learning approaches, to unravel the regulatory logic underlying function and to predict the impact of mutations on phenotype. In this study, I will demonstrate how we applied machine learning methods (e.g. SVMs) to identify master regulators in ATAC-seq data from cattle tissues and salmon development.
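One common way to apply an SVM to ATAC-seq peaks is to represent each sequence by its k-mer counts, train a linear classifier to separate accessible regions from background sequence, and then read the highest-weighted k-mers as candidate transcription factor motifs. Gapped k-mer SVMs (gkm-SVM) are a well-known variant of this idea. The sketch below is a simplified, ungapped toy version and is not necessarily the pipeline used in this study. Everything in it is an assumption for illustration: the sequences are synthetic, `MOTIF` is a hypothetical planted "regulatory" motif, `K = 4` is arbitrary, and the SVM is trained with the Pegasos stochastic sub-gradient method, without a bias term, to keep it dependency-free.

```python
import math
import random
from itertools import product

K = 4  # k-mer length (illustrative choice, not from the study)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_features(seq):
    """L2-normalised k-mer count vector (the 'spectrum' representation)."""
    vec = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])
        if j is not None:  # skips k-mers containing N or other symbols
            vec[j] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train_linear_svm(X, y, lam=0.01, epochs=15, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.
    No separate bias term is learned, to keep the sketch short."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrinkage
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def random_seq(rng, n):
    return "".join(rng.choice("ACGT") for _ in range(n))


def plant(rng, seq, motif):
    """Overwrite a random window of seq with motif."""
    p = rng.randrange(len(seq) - len(motif) + 1)
    return seq[:p] + motif + seq[p + len(motif):]


MOTIF = "GATAAG"  # hypothetical motif standing in for a TF binding site
rng = random.Random(42)
# Toy "accessible peaks": two planted motif copies per 100-bp sequence.
pos = [plant(rng, random_seq(rng, 50), MOTIF) + plant(rng, random_seq(rng, 50), MOTIF)
       for _ in range(100)]
# Toy background: uniformly random 100-bp sequences.
neg = [random_seq(rng, 100) for _ in range(100)]

X = [kmer_features(s) for s in pos + neg]
y = [1] * len(pos) + [-1] * len(neg)
w = train_linear_svm(X, y)

accuracy = sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)
top_kmers = [KMERS[j] for j in sorted(range(len(KMERS)), key=lambda j: w[j], reverse=True)[:5]]
print(f"training accuracy: {accuracy:.2f}")
print("highest-weighted k-mers:", top_kmers)
```

On this toy data the top-weighted k-mers should be substrings of the planted motif. In a real analysis, such k-mers would typically be compared against a database of known transcription factor motifs to nominate candidate regulators, and performance would be assessed on held-out peaks rather than on the training set.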

