Extracting knowledge from statistical and machine learning models is a challenge faced by many researchers, including those in bioinformatics. A large amount of data is now available that captures multiple aspects of human biology at the cellular level. We are faced with the task of extracting knowledge, patterns and relationships from these data, both to improve our understanding of how our bodies function at the molecular level and to aid in the treatment of diseases such as cancer.
To explore ways of extracting knowledge from a model, we built a Bayesian network with a limited set of features and a relatively simple structure to identify transcription factor binding sites in vivo, information that is key to understanding gene regulation. Our network did not achieve the best predictive performance, but prediction was not our only aim. Bayesian networks are generative models of the joint distribution of their features, and we wanted to use ours to gain a better understanding of the features that define a transcription factor binding event.
In this talk I will discuss the approaches we used to extract knowledge from our Bayesian network and the challenges we faced in doing so, including linking the network's performance back to the input data. Through this, we aimed to learn about the features of transcription factor binding and how they vary under different conditions.