Harnessing the power of bioinformatics
Large volumes of biological data are constantly being generated as high-throughput sequencing technologies become increasingly accessible; yet there is often a backlog in the processing and analysis of this data. The TrainMALTA project provides training in bioinformatics analysis, enabling researchers to gain new insights into the genetic causes of disease, as Dr Rosienne Farrugia explains.
A wide range of bioinformatics techniques and software tools are available today, enabling researchers to draw new insights from biological datasets. The volume of data being generated and the rapid development of bioinformatics techniques mean there is an ongoing need to provide high-quality training, an issue which lies at the core of the TrainMALTA project. “We saw a need to provide training, to improve local expertise in bioinformatics, with a specific focus on the analysis of high-throughput sequencing data including genomics, RNA transcriptomics and epigenetics work. We also aim to tie that in with our other ongoing research into the background of disease,” says Dr Rosienne Farrugia, the project’s Principal Investigator.
Bioinformatics is a key element of modern medical research, enabling scientists to analyse biological data in greater depth. “We focus on providing training in the use of informatics, command line open-source tools and high-throughput analysis pipelines, to query biological datasets. The main type of biological datasets that we are looking at are those generated from high-throughput sequencing,” continues Dr Farrugia. This could be whole exome or genome
sequences from the DNA of a group of people, many with a specific condition. The power of informatics can then be applied to sieve through these huge volumes of data and investigate certain research questions; Dr Farrugia and her colleagues are looking at several biological datasets. “In one of the projects I work on together with Dr Stephanie Bezzina Wettinger, we study relatively rare diseases, and we try to identify the mutation or mutations giving rise to each disease,” she outlines. In another project researchers are investigating myocardial infarction (MI), a highly complex condition. “We usually look at pathways. Which pathways are affected? Which pathways have accumulated changes? We look at people who have had a heart attack and people who haven’t, and compare the data,” says Dr Farrugia. “In both of these cases, the rare diseases and MI, we need to query these big volumes of data in different ways, but always within a biological context of the specific condition. Informatics is being used in this way to gain a deeper understanding of biological processes. This cannot be done manually – nobody could manually sieve through all that data in a reasonable timeframe.”
High-throughput sequencing
This is due not only to the volume of the data being generated, but also the complexity of it. With high-throughput sequencing, the DNA of an individual is effectively chopped up into many small fragments, which are then ‘read’ using appropriate chemistry. The data from all the fragments then have to be put together again, from which point researchers can start to analyse the data. “Once the sequence has been put together again, we can start to draw comparisons and pull out a list of variants, a list of differences,” explains Dr Farrugia.
With the project on MI, Dr Farrugia says researchers have a collection of approximately 1,000 individuals, including people who have had an MI and people who haven’t. “There may be millions of differences between a patient sample and the control. So you need to pull out these differences, and then find out the difference that is the cause of the condition,” she says. “For a complex condition like MI, it will very rarely be a single causative mutation. So we look at individual pathways instead of the entire genome, and we investigate whether patients have differences in that pathway when compared to the controls.”
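To make the idea concrete, the two steps described above (listing the differences from a reference, then comparing a single pathway between patients and controls) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the pipeline used by the TrainMALTA researchers: the sequences are assumed to be already assembled and aligned, the gene names (GENE_A and so on) and the pathway are hypothetical placeholders, and a simple average count stands in for a proper statistical test.

```python
from statistics import mean

# Hypothetical pathway: placeholder gene names, not real annotations.
PATHWAY_GENES = {"GENE_A", "GENE_B", "GENE_C"}


def call_variants(reference, sample):
    """List positions where an already-aligned sample differs from the reference."""
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def pathway_burden(variant_genes, pathway):
    """Count how many of one individual's variants fall in genes of the pathway."""
    return sum(1 for gene in variant_genes if gene in pathway)


def compare_groups(cases, controls, pathway):
    """Return the average pathway burden for cases and for controls."""
    case_mean = mean(pathway_burden(genes, pathway) for genes in cases)
    control_mean = mean(pathway_burden(genes, pathway) for genes in controls)
    return case_mean, control_mean


# Toy data: each inner list holds the genes in which one individual carries a variant.
cases = [["GENE_A", "GENE_B", "GENE_X"], ["GENE_A", "GENE_C"]]
controls = [["GENE_X"], ["GENE_B", "GENE_Y"]]

if __name__ == "__main__":
    print(call_variants("ACGTAC", "ACCTAA"))
    print(compare_groups(cases, controls, PATHWAY_GENES))
```

In practice each individual would carry millions of variants instead of a handful, which is why the comparison is restricted to one pathway at a time and why none of this can be done by hand.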
EU Research