Evolution’s Deluge of Data

01 February 2013

The South Australian Museum's Evolutionary Biology Unit is an incredible hub of talented researchers who are asking the key questions about life: how are organisms related? How do we adapt to our environment? What roles do particular genes play in evolution? Their studies offer amazing windows into the past and the future.

Advances in technology mean researchers can generate more data about organisms than ever before. So much, in fact, that finding, using and keeping the information can be a very difficult and expensive task.

To produce the essential, complex comparisons that illustrate stories of evolution for us, scientists pore over hundreds of millions of DNA sequences, which are provided to them in the form of the letters A, C, G and T. They find the same gene in different organisms and compare the versions to draw conclusions about behaviour or physical changes. Managing the information takes such skill that an entirely separate field of science, bioinformatics, exists.
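As a rough illustration of what comparing the same gene in two organisms can mean at its simplest, the sketch below counts the positions at which two already-aligned sequences differ. The sequences and the function name `count_differences` are invented for this example; real comparisons first align the sequences and use far more sophisticated models.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two equal-length, pre-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Made-up fragments of the "same gene" in two hypothetical species.
species_a = "ATGGCCATTGTA"
species_b = "ATGGCTATTGCA"

differences = count_differences(species_a, species_b)
identity = 1 - differences / len(species_a)
print(f"{differences} differences, {identity:.0%} identical")
```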

The South Australian Museum's Senior Researcher in Bioinformatics, Dr Terry Bertozzi, is in charge of finding, analysing and storing genetic information for the many important projects underway in the Evolutionary Biology Unit. Many of these studies, including partnerships with the University of Adelaide and Flinders University, need genetic data. He says a bioinformatics scientist needs a background in computer programming to handle the important job.

"It's a really big shift in the way that you would traditionally approach your data. You can no longer manually look at a sequence and then make some determination about what it is or what to do with it. The data files are just too big. You really need to use computer programs specifically designed for that study to do the work for you."

Next-generation DNA sequencers output the data in one or more large text files. The sequencer also produces a quality score so that scientists have an idea of how reliable the DNA sequences are. There is generally so much data that sequences of too poor a quality are simply thrown away.
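The discarding of poor-quality sequences can be sketched in a few lines of Python. The article does not name a file format, so this example assumes the widely used four-line FASTQ layout with Phred+33 quality encoding; the threshold of 20 and the function names are hypothetical choices for illustration, not the Museum's actual pipeline.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (identifier, bases, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (identifier, bases, quality) from simple four-line FASTQ records."""
    cleaned = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(cleaned) - 3, 4):
        header, bases, _separator, quality = cleaned[i:i + 4]
        yield header[1:], bases, quality  # drop the leading "@"


def mean_quality(quality: str) -> float:
    """Average Phred score, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


def filter_reads(reads: Iterable[Read], min_mean_quality: float = 20.0) -> List[Read]:
    """Keep only reads whose mean quality reaches the assumed threshold."""
    return [read for read in reads if mean_quality(read[2]) >= min_mean_quality]


# Two invented reads: one with high scores ("I" = 40), one with low ("!" = 0).
example = """\
@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTTTGA
+
!!!!!!!!
"""

kept = filter_reads(parse_fastq(example.splitlines()))
for identifier, bases, _quality in kept:
    print(identifier, bases)
```

Filtering on the mean score is only one possible rule; trimming low-quality ends of a read rather than discarding it whole is another common approach.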

Before this technology existed, researchers would scan the published literature hoping to find gene markers that had been developed in a similar organism and use those markers in their study. Sometimes they worked, sometimes they didn't.

"We've developed ways of making marker selection easier and more targeted. We can take some next generation sequence data and filter it in particular ways to find good markers that we can then test to make sure they're appropriate to use in a study. It means we can answer the questions we want to ask in a faster and more rigorous way."

Dr Bertozzi says that many of the computer programs used to manipulate DNA data are not designed for Windows or Mac but run on Unix-based operating systems using a command-line interface.

"A lot of students haven't grown up with that and aren't particularly comfortable working on the command-line, so we need to upskill the students who we work with. I'll be developing some teaching materials, to hopefully make sure they understand their data and the analysis process better."

Last year the Museum purchased a high-end, large-memory computer server, which gives Museum researchers the capability to run long and complex analyses. Whole genomes and transcriptomes can now be assembled from many small DNA sequences, which wouldn't be feasible without this resource.

Dr Bertozzi likened it to "a huge jigsaw puzzle with a hundred million pieces and trying to fit it together without knowing what it should look like at the end."

He tries to keep on top of the latest technology in the field of Bioinformatics. "The field is changing so quickly that what was in yesterday is no longer relevant," he says.

"There are myriad papers coming out with new techniques and new algorithms to analyse data. We choose those that best address the specific questions our scientists are asking. It's a steep learning curve and very difficult to keep on top of all of the literature being published."

The latest significant publication from the unit examines the notion of 'gene jumping' between organisms. Some animals, such as snakes, cows and elephants, share similar genetic information that doesn't follow traditional genetic inheritance. The paper was published in January 2013 in the Proceedings of the National Academy of Sciences of the United States of America, and covered by media outlets including National Geographic.

Dr Bertozzi is currently working with Principal Researcher Dr Steve Cooper to understand the evolution of the eyes of beetles that live underground. They have sequenced beetle genomes and will use the information to compare the genes of those with full eye function and those with reduced eye function, to see whether blindness in the underground organisms is reflected in their genes.

He is also working with Professor Steve Donnellan to examine the recent evolutionary history and origins of the humble black rat, which managed to be such an effective carrier of diseases such as the plague.

"We're using the genetic signals in the DNA to try and make sense of the world around us," says Dr Bertozzi, "and it's very satisfying when the puzzle comes together."

"Bioinformatics is becoming more and more important as our capacity to generate data increases. Bigger data sets not only take up a lot of disk space and take longer to analyse, but the resulting files just get bigger and bigger the more you work on them! Efficient storage and management of information is one of the biggest hurdles facing biologists."

The Evolutionary Biology Unit is also lucky enough to have access to the Museum's Australian Biological Tissue Collection, which includes about one hundred thousand valuable frozen tissues. Having the samples onsite means even greater access to vital information.