Project Description

Dr Denis Bauer


The dawn of cloud-native bioinformatics

Data science and machine learning for bioinformatics

Wednesday 3 July 2019

Dr Denis Bauer is an internationally recognised expert in machine learning, specifically in processing big genomic data to help unlock the secrets in human DNA. Her achievements include developing an open-source, artificial intelligence-based cloud service that accelerates disease research, and contributing to national and international initiatives for genomic medicine funded with over $500M.

As CSIRO’s transformational bioinformatics leader, Denis is frequently invited as a keynote speaker at international medical and IT conferences, including Amazon Web Services Summit 2018, International conference on Frontotemporal Dementia’18, Alibaba Infinity Singapore’18 and Open Data Science Conference India’18. Her revolutionary achievements have been featured in international press such as GenomeWeb, ZDNet, Computer World, CIO Magazine and the AWS Jeff Barr blog, and were listed in ComputerWeekly’s Top 10 IT stories of 2017.

Denis holds a BSc from Germany and a PhD in Bioinformatics from The University of Queensland, and has completed postdoctoral research in both biological machine learning and high-throughput genetics. She has 33 peer-reviewed publications (14 as first or senior author), with over 1000 citations and an H-index of 14.

Denis advocates for gender equality in IT, and is active on CSIRO’s Inclusion and Diversity committee.

Genomics produces more data than astronomy, Twitter, and YouTube combined, which has caused research in this discipline to leapfrog to the forefront of cloud technology. Using machine learning and harnessing radically new architecture patterns, a new cloud-native discipline of bioinformatics is emerging.

The talk illustrates this transformation using the example of disease gene discovery. Here, a Spark-based machine learning framework, VariantSpark, was custom designed to deal with ‘wide’ or ultra-high-dimensional data (80 million columns) to find the genetic origin of ALS in 22,000 whole genome sequences. It is available on Amazon Web Services (AWS) and Microsoft Azure through notebook-style access portals, allowing international researchers to explore large volumes of data in real time.

The talk also discusses a new cloud architecture paradigm, serverless, tipped to become an $8 billion market for its ability to make analysis more economical, akin to how prefabrication scaled up the construction sector over bricklaying. The talk illustrates this with the “search engine for the genome” (GT-Scan), a web service that enables researchers to identify the optimal spot in the 3 billion letter-long genome to make alterations (CRISPR) that may one day help to cure or prevent diseases.

Providing practical tips for the new cloud-native generation of bioinformaticians, the talk compares cloud setups across AWS, Alibaba and Azure and touches on how to evolve cloud architecture more efficiently through a hypothesis-driven approach to DevOps.