Project Description

Dr Denis Bauer

CSIRO

How novel compute technology and artificial intelligence transform life science research

Data science and machine learning for bioinformatics

Tuesday 3 July 2018

Dr Denis Bauer is the team leader of the Transformational Bioinformatics team at Australia’s research agency, CSIRO. She has a PhD in Bioinformatics from the University of Queensland and held postdoctoral appointments in biological machine learning at the Institute for Molecular Bioscience and in high-throughput genetics at the Queensland Brain Institute. Her machine learning solutions for efficiently processing genomic and genome-engineering Big Data have been featured in blogs (Jeff Barr at AWS, Databricks) and in the international press (GenomeWeb, CIO magazine), and were included in Computer Weekly’s “Top 10 IT stories of 2017”. She delivered keynotes at YOW! 2017 (Sydney, Melbourne, Brisbane), Agile India 2018 and AI Dev Days 2018. She is involved in national and international initiatives (with $200M in funding) tasked with bringing genomic information into medical practice. She has 31 peer-reviewed publications (14 as first or senior author), seven of them in journals with an impact factor above 8 (e.g. Nature Genetics), and an h-index of 12. To date she has attracted more than $6.5 million in funding as chief investigator.

Genomic data is outpacing traditional Big Data disciplines, producing more information than astronomy, Twitter and YouTube combined. As a result, genomic research has leapfrogged to the forefront of Big Data and cloud solutions. This talk outlines how we use Apache Spark to identify genomic associations in population-scale whole-genome sequencing data, and how the accuracy of genome-editing approaches can be improved with massively parallel serverless cloud functions. Analysing this sheer volume of data is itself a complex task, which we address using artificial intelligence and machine learning. The talk therefore also showcases our custom random forest implementations, which handle the 80 million features of genomic data and allow genomic target-site activity to be scored in less than a second.
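The abstract does not describe how the serverless target-site scoring is implemented. The following is only a minimal sketch of the general fan-out pattern, under several assumptions that are not from the talk: targets are SpCas9-style sites (a 20-nt protospacer followed by an NGG PAM), only the forward strand is scanned, the score is a hypothetical placeholder (GC fraction) standing in for a real activity model, and local threads stand in for individual cloud-function invocations.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumption: SpCas9-style target = 20-nt protospacer followed by an NGG PAM.
# The lookahead lets finditer report overlapping sites. Forward strand only.
TARGET = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def find_target_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (start position, protospacer) for every forward-strand NGG site."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in TARGET.finditer(seq)]


def score_site(site: tuple[int, str]) -> tuple[int, str, float]:
    """Hypothetical placeholder score: GC fraction of the protospacer.

    A trained activity model would replace this function in practice.
    """
    pos, protospacer = site
    gc = sum(base in "GC" for base in protospacer) / len(protospacer)
    return pos, protospacer, gc


def score_all(sequence: str, workers: int = 8) -> list[tuple[int, str, float]]:
    """Score each site as an independent task, mimicking one function call per site.

    pool.map preserves input order, so results line up with the sites found.
    """
    sites = find_target_sites(sequence)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_site, sites))


if __name__ == "__main__":
    for pos, spacer, score in score_all("A" * 20 + "TGG" + "GC" * 10 + "AGG"):
        print(pos, spacer, round(score, 2))
```

Because each site is scored independently, the same structure maps onto a serverless platform by replacing the thread pool with one function invocation per site (or per batch of sites); the thread pool is used here only so the sketch runs locally.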