Training & Development Committee member Chris Nowak interviewed Tim Frayling, PhD, about the role of big data in genetics.
May 2016
The UK Biobank offers a unique open-access resource with records of 500,000 people aged 40-69 years who, between 2006 and 2010, underwent detailed measurements, including genotyping, biochemical tests and lifestyle questionnaires, and who have been followed up in health registries ever since. With the emergence of the first high-impact publications based on the not-for-profit UK Biobank project, its value is increasingly being recognized by the international research community. For almost two decades, Tim Frayling, PhD, professor of human genetics at the University of Exeter in the UK, has been at the forefront of molecular genetics research–not least since his leading role in discovering the FTO locus for obesity. He urges trainees to be bold and innovative in making use of the opportunities offered by resources like the UK Biobank, and to become used to handling large datasets and bioinformatics applications.
“It is quite simply mind boggling,” says Dr. Frayling about the UK Biobank, that “every qualified scientist on the planet will be able to access, in one study, genome-wide genotype and extensive phenotype data from half a million individuals. Consortia of hundreds of scientists and hundreds of studies have taken seven or eight years to build up sample sizes of over 100,000 individuals for genome-wide association studies.”
So, with these new opportunities, what has changed in recent years? “Genome-wide association studies are now a routine tool that inform other experiments–follow-up in vitro, in vivo, and clinical studies,” summarizes Dr. Frayling. “That said, most variants that influence common diseases and traits do so with very subtle effect sizes. Studies of millions of individuals will continue to identify more fascinating leads into new biology.”
Beyond its size, Dr. Frayling considers the UK Biobank’s main strength “that anyone with a scientific qualification to understand and analyze the data can apply for it and use it–from a PhD student in the smallest university to the most famous professor in the largest institutions. It is a major advance for the democratization of sciences. The only limiting factors are your ideas and having enough trained people in your team to implement them.” To make the most of the growing possibilities offered by big data science, Dr. Frayling’s advice to trainees is to “get yourself trained in handling large datasets – make sure you are familiar with Unix/Linux computing environments and statistical genetics software and some basic coding. Then apply for the data and get going!”
Thinking about the future, how can projects like the UK Biobank make a difference in clinical practice? “I don’t think general physicians will be looking at people’s genetic profile to make decisions on every patient,” says Dr. Frayling. “As well as rare monogenic diseases and profiling tumors, the exceptions will be those patients who fall into uncertain diagnostic categories–a patient with diabetes diagnosed at age 30 but with a body mass index in the obese range, for example–a type 1 diabetes risk score will help inform a clinician as to whether their patient has type 1 or type 2 diabetes. What is beyond doubt in my opinion is that genetic associations will inform biology and disease etiology and ultimately drug development. A recent paper showed how a drug target validated by a human genetic association was twice as likely to make it through the drug development pipeline.”
Researchers can gain access to the UK Biobank by submitting an application for ethical approval. As the protagonist K. in Franz Kafka’s novel The Castle put it: “You can encourage someone with his eyes blind-folded to see through the blindfold as much as you like, he’ll still never see a thing until the blindfold is removed.” So start acquiring your data analysis skills early!