Project Description

Dr Marina Naval Sanchez

CSIRO, Brisbane

Machine learning applications in functional annotation

Data science and machine learning for bioinformatics

Thursday 4 July 2019

Marina Naval Sanchez received a BS degree in Agri-food and an MSc in Agriculture Engineering, majoring in animal biotechnology, from Universitat de Lleida (Spain), and an MSc in Applied Bioinformatics from Cranfield University (UK). She completed a PhD at Katholieke Universiteit Leuven (Belgium) under the mentorship of Professor Stein Aerts. She moved to Australia in 2015 as an OCE Postdoctoral Fellow at CSIRO (Brisbane), working first with Dr James Kijas and currently with Dr Toni Reverter. Her main research interests include genomics, transcriptomics, population and evolutionary genomics, and gene regulatory networks, with a focus on livestock species.

Functional genomics is at the forefront of applying machine learning methods to predict enhancers and cis-regulatory regions, and to predict the impact of SNPs in regulatory regions on downstream function, namely open chromatin and gene expression. These tools are based on data generated by high-throughput technologies such as ChIP-seq and ATAC-seq, together with transcription factor and chromatin state maps such as those produced in human by the Encyclopedia of DNA Elements (ENCODE) or the Roadmap Epigenomics project. In the realm of non-model organisms, our lab is part of the Functional Annotation of Animal Genomes (FAANG) consortium, which aims to produce high-throughput experimental profiles of regulatory elements across tissues, mostly in livestock species. CSIRO has generated ATAC-seq data in tropical cattle for four tissues, and in salmon for several tissues and developmental stages. The next step is to use machine learning methods, starting with Support Vector Machines (SVM), Random Forests and Deep Learning approaches, to unravel the regulatory logic underlying functionality and to predict the impact of mutations on phenotype. In this talk, I will demonstrate how we applied machine learning methods (e.g. SVM) to identify master regulators in ATAC-seq data from cattle tissues and salmon development.