Genome-wide association studies

Genome-wide association studies have led to the discovery of hundreds of genes with a role in common diseases. 

What is a genome-wide association study?

A genome-wide association study (or “GWAS” for short) involves scanning many genomes to find common genetic variations associated with a particular characteristic. Most often they involve looking for single base changes in the DNA called single nucleotide polymorphisms or SNPs. These are places in the genome that are known to vary between individuals and can be associated with a particular disease.

It is called ‘genome-wide’ because it involves looking at many SNPs across the whole genome in one go. For example, researchers may compare many SNPs in the genome in people with a particular disease against those in people without the disease. This enables them to see if there are any genotypes that occur more or less commonly in people with the disease.

How are case-control genome-wide association studies carried out?

To carry out a case-control genome-wide association study, the genomes of two groups of people are examined. The two groups are:

  1. a set of individuals with the disease or characteristic being studied
  2. a set of people who are similar to the first set but who do not have the disease or characteristic being studied (this is the control group).

The participants provide a blood sample or cheek swab from which their DNA can be extracted. Their genome is then analysed, commonly using a SNP-chip, to look at several hundred thousand SNPs across the genome. The two groups are then compared to see if there are any specific SNPs that are more common in one group compared to the other.

Differences in the SNPs found in the genomes of these two sets of people can help point towards the genes involved in the development of that characteristic or disease. If certain SNPs are more commonly found in people with a disease, compared to those without the disease, the SNPs are said to be “associated” with the disease.

These SNPs may not directly cause the disease but may just “tag along” with the specific genetic variant that does cause it. After identifying the approximate location of the causal genetic variant, scientists may have to look in more detail at the region of DNA to try to find the exact variant involved in the disease. This may include sequencing the region of DNA associated with the characteristic or disease, to find all of the DNA variation that occurs there. 
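The comparison between the two groups can be sketched as a statistical test at a single SNP. The example below is a minimal illustration, not the method of any particular study: it assumes a standard chi-square test (1 degree of freedom) on allele counts, and the function name and all counts are hypothetical.

```python
import math


def allele_association(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Each argument is a pair: (copies of the allele of interest,
    copies of the other allele) counted in that group.
    Returns (odds_ratio, chi_square, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table, no continuity correction.
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person,
# so 2,000 alleles are counted in each group.
cases = (1100, 900)
controls = (900, 1100)

odds_ratio, chi_square, p_value = allele_association(cases, controls)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.1f}, p = {p_value:.2e}")
```

In a real study this test would be repeated for every SNP on the chip, so a much stricter significance threshold is needed than for a single test.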

Here are the simplified results from a comparison between 1,000 people with heart disease and 1,000 people without heart disease.

Results from a case-control genome-wide association study investigating genetic variants associated with heart disease. Image credit: Genome Research Limited

From the data we can see that a higher percentage of people with heart disease have the DNA base ‘C’ at this position in the genome compared to the control group. But does this mean that a ‘C’ in this position causes heart disease?

We know from looking at the control group that some people who do not have heart disease carry the ‘C’ at this position. This suggests that there may be other genetic variants elsewhere in the genome or environmental factors that also play a role in the disease.

Case-control genome-wide association studies are just one type of genome-wide association study. Other genome-wide association studies can be carried out to look at characteristics such as height, blood pressure, body mass index (BMI) or insulin levels. In contrast to case-control studies, everyone in these projects has the characteristic of interest, and scientists look for associations between differences in their genomes and differences in the characteristic. For example, are some SNPs more common in tall people?
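For a measured characteristic such as height, one common way to look for an association is to ask how much the characteristic changes, on average, with each extra copy of a variant. The sketch below assumes a simple least-squares fit with the genotype coded as 0, 1 or 2 copies; the function name and the data are hypothetical and purely illustrative.

```python
def per_allele_effect(genotypes, trait_values):
    """Least-squares slope of a trait on genotype.

    genotypes: number of copies of the variant (0, 1 or 2) per person.
    trait_values: the measured characteristic for the same people.
    Returns the average change in the trait per extra copy of the variant.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait_values) / n
    covariance = sum((g - mean_g) * (t - mean_t)
                     for g, t in zip(genotypes, trait_values))
    variance = sum((g - mean_g) ** 2 for g in genotypes)
    return covariance / variance


# Hypothetical data: six people, their genotype at one SNP and their height.
genotypes = [0, 0, 1, 1, 2, 2]
heights_cm = [168.0, 170.0, 171.0, 173.0, 174.0, 176.0]

effect = per_allele_effect(genotypes, heights_cm)
print(f"Estimated effect: {effect:.1f} cm per copy of the variant")
```

A real analysis would also estimate how likely such a slope is to arise by chance, and would typically adjust for factors such as age, sex and ancestry.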

Why are genome-wide association studies important?

Armed with the information from genome-wide association studies, scientists can get a better understanding of how diseases develop and how they might be diagnosed and treated.

One of the key benefits of the genome-wide association study approach is that you can test a very large number of SNPs at the same time. As a result, a large number of genetic locations associated with various diseases have been identified. These have then been investigated further to get a better understanding of how diseases develop.

What have genome-wide association studies found so far?

Over the last few years, genome-wide association studies in humans have revealed many genes associated with disease and provided an insight into the mechanisms of a number of complex diseases. 

Genome-wide association studies have helped identify SNPs associated with several complex conditions such as type 2 diabetes, Alzheimer’s disease, Parkinson’s disease and Crohn’s disease. They have also highlighted SNPs that influence an individual’s response to anti-depressant medication and provided an insight into the genetics of obesity.

The first successful genome-wide association study was published in 2005 and investigated patients with age-related macular degeneration (AMD). This is a painless eye condition that can lead to loss of vision. The study found two SNPs were more common in individuals with the condition, compared with individuals without AMD. The study suggested that the regulation of inflammation in the eye was important in the disease, something that few scientists had previously considered. As a result, anti-inflammatory therapies are now being explored as a potential treatment option for AMD.

In 2007, the Wellcome Trust Case Control Consortium (WTCCC) published a paper in Nature. Their study was the first large genome-wide association study to examine complex diseases using a SNP-chip or array. This chip enabled scientists to scan a person’s entire genome for SNPs in a single experiment. This technique sparked an explosion in the discovery of genetic variants associated with disease (shown in the graph below). As a result, there are now well over 2,000 specific areas of the genome (loci) that have been associated with one or more complex disease traits. This means that scientists have identified more than 2,000 new biological leads that may help to explain how some diseases develop. These include over a hundred genetic variants associated with inflammatory bowel diseases, such as Crohn’s disease.

A graph to show the number of genome-wide association studies published between 2005 and 2013.
Image credit: Genome Research Limited

Before genome-wide association studies, the only robust association between genetics and either body mass index (BMI) or weight involved variants in just one gene, called Melanocortin 4 receptor (MC4R). Now there are over 30 genes that are associated with BMI. A particularly strong association has been found with a variant in the Fat mass and obesity-associated (FTO) gene. A study looking into this gene found that people carrying one copy of this particular variant had a 30 per cent increased risk of being obese compared to a person with no copies of the variant. A person carrying two copies of the variant had a 70 per cent increased risk of being obese. People with two copies were, on average, 3 kg heavier than a similar person with no copies.
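To make these percentages concrete, the sketch below converts them into absolute risks. It assumes the quoted figures are relative increases in risk and uses a hypothetical baseline risk of 20 per cent for people with no copies of the variant; neither the baseline nor the function name comes from the study.

```python
def absolute_risk(baseline_risk, percent_increase):
    """Apply a relative increase (in per cent) to a baseline risk (a fraction)."""
    return baseline_risk * (1 + percent_increase / 100)


# Hypothetical baseline: 20 per cent of people with no copies are obese.
BASELINE_RISK = 0.20

# Relative increases quoted in the text for one and two copies of the variant.
for copies, increase in [(0, 0), (1, 30), (2, 70)]:
    risk = absolute_risk(BASELINE_RISK, increase)
    print(f"{copies} copies: about {risk:.0%} risk of obesity")
```

Under these assumptions, most people carrying the variant would still not be obese, which is consistent with the idea that many genetic and environmental factors contribute.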

What could this mean for the future?

Genome-wide association studies are currently being used to investigate many diseases. These include autoimmune disorders, where the body’s immune system attacks and destroys healthy body tissue by mistake, and metabolic diseases, where there are problems with how the body absorbs or makes energy from food.  

There is no doubt that genome-wide association studies will lead to a better understanding of disease mechanisms, which will in turn lead to improved, novel and perhaps more personalised treatments for disease. However, translating information from genome-wide association studies into practical applications takes time.

The majority of discoveries from genome-wide association studies have only been made since 2007. Although these discoveries are significant, a lot of work still needs to be done to compile a full list of the genetic variants that contribute to disease and to find out exactly how they influence disease risk. Only then will scientists be able to apply this knowledge to develop new treatments for disease.

In addition, genome-wide association studies cannot realistically provide all the answers. Although many genetic variants have been linked to disease, not all of them turn out to be directly relevant to understanding the disease.

Nevertheless, many genome-wide association studies have led to new biological knowledge about genes and disease that was not available a decade ago. This knowledge can improve our understanding of the role of genetics in disease and support the development of new diagnostics and treatments for people with certain diseases. As DNA sequencing becomes faster and cheaper, we will be able to sequence and compare many more areas of the genome, rather than just SNPs. As a result, our knowledge of disease will increase further over the coming decades.

This page was last updated on 2016-06-13