The central dogma of molecular biology is that the genome, composed of DNA, encodes many thousands of genes that can be transcribed into RNA. The RNA may then be translated into a chain of amino acids, giving a functional protein. While the genome is essentially identical in every cell of an individual's body, the number of transcribed RNA copies of each gene differs because each tissue type has different functional requirements. An important area of research within genetics is the study of the genome in action, through RNA: for example, comparing the quantity of each gene’s RNA between different tissue types, through development, in disease or in different environments. This is known as differential gene expression analysis.
RNA-Seq, or high-throughput RNA sequencing, has accelerated research in this area. The technology works by reverse transcribing the RNA into DNA, shearing it into smaller fragments, then reading each fragment's sequence in parallel to give millions of short “reads”, each approximately 50-200 bases in length. This data brings a computational and statistical challenge, because the biology must be inferred from millions of short sequences. Along with technical biases, there is true biological variability between samples of the same type, which must be accounted for.
In this talk I will discuss the applications of RNA-Seq, its challenges and some of the bioinformatics strategies being employed to analyse this complex data. In particular, I will focus on the steps involved in differential gene expression analysis, both for model organisms, like human, and for more exotic organisms without a sequenced genome.
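
As a rough illustration of the comparison at the heart of differential gene expression analysis, the sketch below scales per-gene read counts by the total number of reads in each sample (counts per million) and then computes a log2 fold change between two samples. The gene names, tissue labels, counts and the pseudocount of 1 are all made up for illustration. A real analysis would also use replicate samples and a statistical model to account for the technical biases and biological variability mentioned above.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(sample_a, sample_b, pseudocount=1.0):
    """Log2 ratio of normalised expression in sample_b relative to sample_a.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one of the samples.
    """
    cpm_a = counts_per_million(sample_a)
    cpm_b = counts_per_million(sample_b)
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Hypothetical read counts per gene for two tissue samples.
liver = {"geneA": 500, "geneB": 300, "geneC": 200}
brain = {"geneA": 1000, "geneB": 600, "geneC": 2400}

lfc = log2_fold_change(liver, brain)
for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2 fold change = {value:.2f}")
```

In this toy data, geneA has twice as many raw reads in the second sample, yet its normalised expression is lower because that sample was sequenced more deeply overall. This is one reason raw read counts cannot be compared directly between samples.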