Project Description

Dr Kim-Anh Lê Cao

The University of Melbourne

Statistical ‘omics integration

Next generation sequencing & bioinformatics

Monday 1 July 2019

Dr Kim-Anh Lê Cao completed her PhD in 2008 at the Université de Toulouse, France. Soon after, she moved to Australia, where she was appointed as a postdoctoral research fellow at the Institute for Molecular Bioscience, University of Queensland, and then as a Research and Consultant Biostatistician at QFAB Bioinformatics between 2009 and 2013. Kim-Anh’s research turned towards biomedical problems when she moved to the UQ Diamantina Institute in 2014 and was awarded an NHMRC Career Development Fellowship (CDF1). In 2017 she joined the University of Melbourne as a Senior Lecturer in the School of Mathematics and Statistics and at Melbourne Integrative Genomics, which hosts biology-focussed researchers with statistical and computational skills. In 2019 she was awarded an NHMRC CDF2, focusing on microbiome studies, and received the biennial Moran Medal in Statistical Sciences from the Australian Academy of Science.

Dr Kim-Anh Lê Cao is an expert in multivariate statistical methods and develops novel methods for ‘omics data integration. Since 2009, her team has been developing the R toolkit mixOmics, dedicated to the integrative analysis of ‘omics data, to help researchers mine and make sense of biological data. More information about Kim-Anh’s research group:

Technological improvements have allowed for the collection of data from different molecular compartments (e.g. gene expression, protein abundance), resulting in multiple omics datasets from the same set of biospecimens or individuals (e.g. transcriptomics, proteomics). We propose to adopt a holistic systems biology approach by statistically integrating data from multiple biological compartments. Such an approach provides improved biological insights compared with traditional single-omics analyses, as it takes into account interactions between omics layers.

In this talk, I will present DIABLO, a multivariate dimension reduction method that addresses data integration challenges such as the complexity and sheer size of the datasets (each with few samples and many molecules) and the heterogeneous nature of data measured on different scales and technological platforms. DIABLO is a hypothesis-free method that constructs combinations of variables (e.g. cytokines, transcripts, proteins, metabolites) that are maximally correlated across data types, in order to identify a minimal subset of markers: a multi-omics signature. This signature can highlight novel findings and is also a starting point for network modelling. DIABLO is not limited to data-driven analysis; it can also handle pathway-based analysis, or a mix of knowledge- and data-driven analyses.
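As a rough illustration of the core idea (building combinations of variables that are maximally correlated across two data types while keeping only a few variables in each), here is a minimal two-block sketch in plain Python. It is not DIABLO and not the mixOmics code: DIABLO (`block.splsda` in mixOmics) handles more than two data types, a design between blocks, and an outcome. This sketch instead uses a simplified sparse PLS-style power iteration on the cross-product matrix X^T Y. The soft-thresholding step keeps `keep_x` / `keep_y` non-zero weights, a parameter style loosely modelled on mixOmics' `keepX`. All function names here are hypothetical.

```python
import math

# Hypothetical helpers: a simplified two-block sparse PLS-style sketch, NOT DIABLO itself.


def center(block):
    """Column-centre a block given as a list of rows (samples x variables)."""
    n = len(block)
    means = [sum(row[j] for row in block) / n for j in range(len(block[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in block]


def keep_top(vec, keep):
    """Soft-threshold so that only the `keep` largest |entries| stay non-zero."""
    if keep >= len(vec):
        return list(vec)
    threshold = sorted((abs(x) for x in vec), reverse=True)[keep]
    return [math.copysign(max(abs(x) - threshold, 0.0), x) for x in vec]


def normalise(vec):
    """Scale a weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("all weights were thresholded to zero")
    return [x / norm for x in vec]


def sparse_two_block_component(X, Y, keep_x, keep_y, max_iter=100, tol=1e-10):
    """Heuristically find sparse unit weights (u, v) making cov(X u, Y v) large,
    with at most keep_x / keep_y non-zero weights. X and Y share the same samples (rows)."""
    Xc, Yc = center(X), center(Y)
    p, q = len(Xc[0]), len(Yc[0])
    # Cross-product matrix M = Xc^T Yc (p x q); u^T M v is proportional to cov(Xc u, Yc v).
    M = [[sum(xr[i] * yr[j] for xr, yr in zip(Xc, Yc)) for j in range(q)]
         for i in range(p)]
    v = [1.0 / math.sqrt(q)] * q
    u = [0.0] * p
    for _ in range(max_iter):
        # Alternate: update u from M v, then v from M^T u, sparsifying and normalising each time.
        u_new = normalise(keep_top(
            [sum(M[i][j] * v[j] for j in range(q)) for i in range(p)], keep_x))
        v_new = normalise(keep_top(
            [sum(M[i][j] * u_new[i] for i in range(p)) for j in range(q)], keep_y))
        delta = max(abs(a - b) for a, b in zip(u_new + v_new, u + v))
        u, v = u_new, v_new
        if delta < tol:
            break
    t = [sum(a * b for a, b in zip(row, u)) for row in Xc]  # sample scores for block X
    s = [sum(a * b for a, b in zip(row, v)) for row in Yc]  # sample scores for block Y
    return {
        "u": u,
        "v": v,
        "t": t,
        "s": s,
        "selected_x": [i for i, w in enumerate(u) if w != 0.0],
        "selected_y": [j for j, w in enumerate(v) if w != 0.0],
    }
```

For example, if the first column of X carries a shared signal and the second column of Y is a scaled copy of it, while the remaining columns are small noise, calling the function with `keep_x=1, keep_y=1` should select exactly those two variables. The selected variables play the role of a tiny "signature". Soft-thresholding to a fixed number of variables, rather than to a fixed penalty, is an assumption made here to mirror the `keepX` style of parameter.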

I will illustrate the use of DIABLO in bulk omics, microbiome and single-cell studies that we have analysed. DIABLO is implemented in our R package mixOmics, dedicated to omics data integration.