Cordelia Langford

Genome Analysis Production

Cordelia is Head of Genome Analysis Production and has worked at the Sanger Institute since 1994. She is responsible for managing DNA pipelines that provide data on sequence variation and gene expression. She has contributed to a number of studies investigating the genetic components of human diseases and is currently involved in the UK 10,000 Genomes Project (UK10K). This study aims to sequence 10,000 human genomes to uncover many rare genetic variants important in human disease.

Previously, Cordelia worked at the MRC Laboratory of Molecular Biology and the BBSRC Babraham Institute in Cambridge. When she started at the Sanger Institute she was responsible for flow-sorting chromosomes that were destined for sequencing as part of the Human Genome Project. In 1999 she became manager of the newly established Microarray Facility. She then completed her PhD and became Head of the Microarray Facility in 2003.

Microarray (or DNA chip) technology involves the use of glass or silicon slides carrying millions of DNA fragments on their surface. Microarrays are used to identify which genes are expressed or present within a sample, and to detect regions of the human genome that have been duplicated or lost in individuals. This information can identify changes to DNA that may lead to the development of a disease. Cordelia established the Institute’s own production of microarrays, in which robots create an array by ‘printing’ DNA fragments on to a slide in a microscopic grid pattern.

Cordelia was appointed Head of Genome Analysis Production in September 2009 with additional responsibility for the sample logistics and genotyping facilities. In 2010, she established a new core pipeline for exome sequencing (sequencing the coding regions of the genome) in preparation for delivering large-scale programmes.







My role at the Sanger Institute
Cordelia Langford 1:03 min - 6,547 kb
My name’s Cordelia Langford and I’m an Operational Manager. My job title is Head of Genome Analysis Production and I’m responsible for managing DNA pipelines within the Institute. We’re responsible for taking thousands of samples a week and passing them through pipelines, which consist of a series of workflows through different labs, to generate thousands of data points for analysis by the scientists. The way that we use the data is to analyze gene function. Part of my role involves translating the scientific vision of the faculty into practical protocols and procedures. So I keep an eye on cutting-edge technology. I have a view of what’s coming round the corner, what’s been designed recently and become available, and I introduce those technologies into the Institute to set up new pipelines and activities.
The DNA pipelines at the Sanger Institute
Cordelia Langford 1:23 min - 8,629 kb
I’m responsible for eight pipelines within the Institute, and they’re all referred to as DNA pipelines. Loosely, this means that we input DNA or RNA samples at one end and generate large quantities of data at the other. Each pipeline has a specific role and comprises a series of laboratory activities, with the samples or the data flowing from one activity to the next. At the hub of all of those activities is the Sample Logistics Facility. As its name suggests, the staff within that facility are responsible for bringing samples into the Institute from outside. The samples are logged, archived and tracked, and then re-formatted and distributed among the different DNA pipelines depending on the biological question we’re interested in. It’s probably worth mentioning that well over a hundred thousand samples a year pass through that conduit. The other seven pipelines are each responsible for performing a particular research assay on the samples.
What is sequence capture and exome sequencing?
Cordelia Langford 1:40 min - 10,406 kb
The fourth pipeline involves a process that we call targeted sequencing, but this time, instead of using PCR (the polymerase chain reaction), we use a different approach called sequence capture. We take genomic DNA, fragment it and generate what is called a standard library for sequencing; this sort of library would typically go down the standard route into sequence production. After generating the library we check its quality, and if it’s looking good we go through an additional step where we isolate the fragments of interest. What we’re doing is fractionating the library so that we send only the parts of the genome we’re interested in down for sequencing. The reason for doing that is that if we fractionate the library, rather than sequencing the whole genome, we can get sequence information much more quickly, and it’s a lot cheaper to perform. At the moment our main application is to isolate all of the coding regions of the genome. This is referred to as the exome, because exons are the annotated coding regions of the genome.
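The cost argument above comes down to how much sequence has to be generated. The Python sketch below is a rough back-of-the-envelope comparison. The genome size (about 3.1 billion bases), the capture target size (50 million bases) and the mean depth (50x) are round illustrative assumptions, not Sanger pipeline figures. The calculation also ignores off-target reads, duplicates and uneven capture.

```python
# Back-of-the-envelope sketch: raw sequence needed for whole-genome vs exome
# sequencing at the same mean depth. All constants are illustrative assumptions.

GENOME_BP = 3_100_000_000  # approximate human genome size (assumption, rounded)
EXOME_BP = 50_000_000      # hypothetical capture target size; varies by bait design
DEPTH = 50                 # hypothetical mean sequencing depth


def bases_to_sequence(target_bp: int, mean_depth: int) -> int:
    """Raw bases needed to cover a target at a mean depth, assuming every read lands on target."""
    return target_bp * mean_depth


genome = bases_to_sequence(GENOME_BP, DEPTH)
exome = bases_to_sequence(EXOME_BP, DEPTH)
print(f"whole genome: {genome:,} bases")  # whole genome: 155,000,000,000 bases
print(f"exome: {exome:,} bases")  # exome: 2,500,000,000 bases
print(f"ratio: {genome / exome:.0f}x")  # ratio: 62x
```

Because real capture also pulls down some off-target fragments, the practical saving is smaller than this idealised ratio.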
How do we prepare libraries for exome sequencing?
Cordelia Langford 1:09 min - 7,173 kb
The library that we’ve constructed is fractionated through a process called pull-down, or sequence capture. To do that, we take the library and mix it in solution so that it hybridizes with what we call baits. These baits are made up of millions of short strands of RNA whose sequences are complementary to the exons, the coding regions of the genome. Because we’re driving a hybridization process, the strands from the library that represent the exons will bind to the RNA baits. The really interesting part of the protocol is that we attach the RNA baits to magnetic beads and use magnets to isolate the bound fraction that we’re interested in. That means we can wash away the unbound, non-coding fraction of the genome, then release the bound coding fraction, the exome, and send it down for sequencing.
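Pull-down itself is a wet-lab step, but its logic can be mimicked in silico to make the idea concrete. The Python sketch below treats library fragments and bait targets as hypothetical half-open coordinate intervals on one chromosome. It keeps any fragment that overlaps a bait by at least one base, as the "bound" fraction. Real hybridization depends on sequence similarity and overlap length, so this is an analogy rather than a model of the protocol. All coordinates are made up.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical half-open [start, end) coordinates on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def capture(fragments: List[Interval], baits: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Split fragments into a 'bound' fraction (overlaps any bait) and an 'unbound' fraction (washed away)."""
    bound: List[Interval] = []
    unbound: List[Interval] = []
    for fragment in fragments:
        if any(overlaps(fragment, bait) for bait in baits):
            bound.append(fragment)
        else:
            unbound.append(fragment)
    return bound, unbound


exon_baits = [(100, 250), (900, 1000)]  # made-up exon coordinates
library = [(0, 90), (80, 180), (300, 450), (950, 1100), (1200, 1350)]  # made-up fragments
bound, unbound = capture(library, exon_baits)
print(bound)    # [(80, 180), (950, 1100)]
print(unbound)  # [(0, 90), (300, 450), (1200, 1350)]
```

The bound list plays the role of the fraction held on the magnetic beads. The unbound list is what gets washed away.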
The Gene Expression Pipeline at the Sanger Institute
Cordelia Langford 0:55 min - 5,763 kb
When we process samples through the Gene Expression Pipeline, we take RNA that has been isolated from cells and apply it to a DNA chip, which enables us to measure the activity of genes within a particular cell sample. One application is to study gene activity, that is, which genes are switched on and which are switched off, during the normal development of cells and organisms. But we can also apply it to samples taken, for example, from clinical patients who may already be in a diseased state, perhaps suffering from coronary artery disease. We can extract RNA from a blood sample and identify and measure the activity of genes once that diseased state has manifested itself.
The Array CGH [Comparative Genomic Hybridisation] Pipeline at the Sanger Institute
Cordelia Langford 1:35 min - 9,909 kb
The Array CGH [Comparative Genomic Hybridisation] Pipeline also involves the use of DNA chips, or microarrays, but this time we are interested in measuring the genome biology, or the structure of the genome. We would take genomic DNA isolated perhaps from a cell population or from a clinical sample from a patient, apply it to a DNA chip, and the readings we get out from that chip would help us understand whether there have been mutations in the genome on quite a large structural scale. The sort of mutations I’m talking about might be the addition of extra material (which we refer to as insertions) or a loss of material (which we refer to as deletions). These large-scale variations would typically have occurred prior to the development of the disease. So it's very important for us to understand what variations like that occur in a healthy individual, and to compare that genome profile with the mutations that might have occurred in an individual who has developed a disease.
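One simple way to picture how chip readings reveal gains and losses of material is sketched below in Python. It assumes, as is common for array CGH, that each probe reports a log2 ratio of test-sample to reference signal. Values well above zero then suggest extra material, and values well below zero suggest deleted material. The thresholds (±0.3), the minimum run length and the example ratios are hypothetical. Production analyses typically use dedicated segmentation methods such as circular binary segmentation rather than this naive run-finding.

```python
from typing import List, Optional, Tuple

# Hypothetical thresholds on per-probe log2(test / reference) ratios.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def probe_state(log2_ratio: float) -> Optional[str]:
    """Classify one probe as a gain, a loss, or neither (None)."""
    if log2_ratio >= GAIN_THRESHOLD:
        return "gain"
    if log2_ratio <= LOSS_THRESHOLD:
        return "loss"
    return None


def call_segments(log2_ratios: List[float], min_probes: int = 3) -> List[Tuple[str, int, int]]:
    """Report (state, first_probe, last_probe) for runs of at least
    min_probes consecutive probes that share a gain or loss state."""
    calls = []
    start = 0
    while start < len(log2_ratios):
        state = probe_state(log2_ratios[start])
        end = start
        # Extend the run while neighbouring probes share the same state.
        while end + 1 < len(log2_ratios) and probe_state(log2_ratios[end + 1]) == state:
            end += 1
        if state is not None and end - start + 1 >= min_probes:
            calls.append((state, start, end))
        start = end + 1
    return calls


# Made-up ratios for 11 probes ordered along a chromosome.
ratios = [0.0, 0.1, 0.6, 0.55, 0.7, 0.0, -0.9, -1.1, -0.8, -0.7, 0.05]
print(call_segments(ratios))  # [('gain', 2, 4), ('loss', 6, 9)]
```

Requiring several consecutive probes to agree is a crude guard against calling a single noisy probe as a structural change.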
The Genotyping Pipelines at the Sanger Institute
Cordelia Langford 1:27 min - 9,064 kb
So there are actually three pipelines for performing genotyping, and each of them is subtly different. Two of them perform essentially the same process but use different commercial products, from Illumina and Affymetrix. The third, rather than giving us the ability to profile SNPs (single nucleotide polymorphisms) on a genome-wide scale, uses the Sequenom platform, which enables us to focus on a very small number of SNPs. It is very high throughput: we could process something like 30,000 samples in a single week, but we would only be analyzing the status of about 50 or maybe 100 SNPs. If we compare that to the Illumina or Affymetrix pipelines, we would be processing maybe 3,000 samples a week, but interrogating up to a million SNPs each time.
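The trade-off between the platforms is easiest to see as genotypes per week, that is, samples multiplied by SNPs per sample. The Python sketch below multiplies the approximate figures quoted above. These are rough spoken estimates, and "a million SNPs" is an upper bound, so the results are indicative only.

```python
# Approximate weekly figures quoted in the transcript: (samples per week, SNPs per sample).
PLATFORMS = {
    "Sequenom (50 SNPs)": (30_000, 50),
    "Sequenom (100 SNPs)": (30_000, 100),
    "Illumina or Affymetrix": (3_000, 1_000_000),
}


def genotypes_per_week(samples: int, snps_per_sample: int) -> int:
    """Total individual SNP genotypes generated in a week."""
    return samples * snps_per_sample


for name, (samples, snps) in PLATFORMS.items():
    print(f"{name}: {genotypes_per_week(samples, snps):,} genotypes per week")
# Sequenom (50 SNPs): 1,500,000 genotypes per week
# Sequenom (100 SNPs): 3,000,000 genotypes per week
# Illumina or Affymetrix: 3,000,000,000 genotypes per week
```

Sequenom handles ten times as many samples, while the genome-wide arrays produce roughly one to two thousand times more genotypes in total.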
Working with researchers to investigate new areas of genomics
Cordelia Langford 1:50 min - 11,516 kb
Quite often when researchers come along they’ve already got a preconceived idea, and we'll discuss what all of the options are. But sometimes someone will come in who wants to ask a particular question or generate some biological data and just doesn’t know what platforms are available or what can be done. So we go through all of the options to work out whether we want to process, quite cheaply, perhaps tens of thousands of samples but generate just a small amount of data on each, maybe about a very small region of the genome or perhaps just a few single nucleotide polymorphisms. Alternatively, we might be looking at just a few thousand samples, or maybe even a hundred, with the option of generating genome-wide data. What we’re doing is balancing the practical aspects of each of those choices against the cost, the budget available, the capacity in the pipelines and how much sample is available. In a typical gene expression study, where we’re analyzing the activity of genes, we are able to look at the activity of every single gene in the genome in one experiment, so you can imagine that this generates a vast amount of data. Quite often we find that the researcher is overwhelmed by the amount of information generated from a single experiment, so we provide guidance about what the most important results probably are, to allow them to plan follow-up experiments. That’s where informatics and statistics come in, to help us prioritize follow-up experiments from the huge amounts of data that we’re able to generate.
The most interesting part of my job
Cordelia Langford 0:25 min - 2,599 kb
I think probably the most interesting aspect of my work is trying to understand what technologies are being developed and what’s just around the corner. The field is always developing at a really fast pace, and sometimes it’s difficult to keep up. I’m intrigued to know what’s going to be revealed as the next big thing that will continue to help us develop our understanding of the genome and how it works.