Project Description

Ms Alexandra Essebier

The University of Queensland

Extracting knowledge from models trained on biological data to measure performance and improve understanding of the system

Data science and machine learning for bioinformatics

Wednesday 3 July 2019

Alex Essebier completed her undergraduate degrees in Science (Biochemistry and Molecular Biology) and Information Technology at the University of Queensland in 2013, and a Master of Bioinformatics in 2015. Alex first developed an interest in bioinformatics in her second year of university, when she discovered it would allow her to apply her programming skills to a variety of biological problems. She has undertaken a number of research projects over the last five years as part of A/Prof. Mikael Bodén’s group at UQ.

These projects involved large biological datasets and allowed Alex to explore big data techniques for extracting relevant patterns and relationships. Her main focus is the use of machine learning to analyse high-throughput genomic datasets. She is currently a PhD student investigating the application of machine learning to predict long-distance regulatory interactions. Accurately detecting these interactions can improve our understanding of developmental disorders and diseases such as cancer.

Alex’s multidisciplinary background has allowed her to contribute bioinformatic insight to a number of research projects involving multiple collaborators. It has also given her the skills to drive her own research and to work toward developing new bioinformatic tools and techniques.

Extracting knowledge from statistical and machine learning models is a challenge faced by many researchers, including those in bioinformatics. Large amounts of data are now available that capture multiple aspects of human biology at the cellular level. We face the task of extracting knowledge, patterns and relationships from these data to improve our understanding of how our bodies function at the molecular level and to aid in the treatment of diseases such as cancer.

To explore approaches to extracting knowledge from a model, we built a Bayesian network with a limited set of features and a relatively simple structure to identify transcription factor binding sites in vivo, information that is key to understanding regulation in the genome. Our network did not achieve the best predictive performance, but that was not its main purpose. Bayesian networks are generative models of the joint distribution of their variables, and our aim was to gain a better understanding of the features that define a transcription factor binding event.

In this talk I will discuss the approaches we used, and the challenges we faced, in extracting knowledge from our Bayesian network and in linking performance back to the input data, in order to learn about the features of transcription factor binding and how they vary under different conditions.