What is shotgun sequencing?

Shotgun sequencing is a way of assembling a long fragment of DNA (such as a gene, a chromosome or even a whole genome) from shorter sequencing reads. When it is applied to an entire genome, it is called whole genome shotgun sequencing.

  • Earlier sequencing techniques produce short fragments of DNA during the sequencing process.
  • These need to be reassembled in the right order to understand the sequence of the full stretch of DNA – such as a gene or even a whole genome.
  • There are two main strategies for this: shotgun sequencing and clone-by-clone sequencing.
  • Both methods are used less frequently now that sequencing techniques can produce increasingly long sequences – although, in the 2020s, shotgun sequencing is the more common technique.

 

What is shotgun sequencing?

 

  • Large, mammalian genomes are particularly difficult to clone, sequence and assemble because of their size and structural complexity. Using earlier sequencing methods, many small DNA fragments produced during the sequencing process have to be reassembled into the right order to understand the full sequence.
  • There are two main methods of putting the sequences back together: clone-by-clone sequencing and shotgun sequencing (this page).
  • Both techniques have been used for several decades – shotgun sequencing was first used in early Sanger sequencing to sequence small genomes, such as those of viruses and bacteria. Clone-by-clone sequencing was favoured during the Human Genome Project.
  • Shotgun sequencing involves randomly breaking up DNA sequences into lots of small pieces and then reassembling the sequence by looking for regions of overlap. It is generally the preferred method today, as it is quicker and less expensive.

 

How does shotgun sequencing work?

 

  • Whole genome shotgun sequencing bypasses the time-consuming mapping and cloning steps that make clone-by-clone sequencing so slow.
  • First, the entire genome is broken up into small fragments of DNA for sequencing. These are of varying sizes, ranging from 2,000 bases to 300,000 bases long.
  • These fragments are sequenced to determine the order of the DNA bases, A, T, C and G.
  • Computer programmes look for where the fragments overlap and begin to assemble them in the right sequence.
  • A good analogy is shredding multiple copies of a book (which in this case is a genome). The fragments of the book are mixed together and then pieced back together by finding fragments of text that overlap.
  • This method of genome sequencing was used by Craig Venter, founder of the private company Celera Genomics, to sequence the human genome. Venter wanted to sequence the human genome faster than the publicly funded Human Genome Project and felt this was the best way. By contrast, the Human Genome Project used clone-by-clone sequencing.
  • Now, as technologies are improving, whole genome shotgun sequencing is being used to improve the accuracy of existing genome sequences, such as the reference human genome.
  • It’s used to remove errors, fill in gaps or correct parts of the sequence that were originally assembled incorrectly when clone-by-clone sequencing was used.
  • Consequently, the reference human genome is constantly being improved to ensure that the genome sequence is of the highest possible standard.
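
The overlap-and-merge idea described above can be illustrated with a minimal sketch. It assumes error-free reads that all come from the same DNA strand, and it uses a simple greedy strategy (always merge the pair of reads with the longest overlap). The function names and the example sequence are made up for illustration; real assembly software is far more sophisticated.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedy sketch: repeatedly merge the two sequences with the largest overlap.

    Assumes reads are error-free and from one strand (a simplification).
    """
    unique = sorted(set(reads))
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]

    while len(contigs) > 1:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: the pieces stay as separate contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


# Three overlapping reads taken from a made-up 18-base sequence.
reads = ["CCTTGAACGT", "ATGGCGTAC", "GCGTACCTTG"]
print(greedy_assemble(reads))
```

In the book analogy, `overlap` is the step of spotting that the end of one shredded strip matches the start of another, and `greedy_assemble` is the step of taping the best-matching strips together until no matches remain.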

 

What are the advantages of shotgun sequencing?

  • By removing the mapping stages, whole genome shotgun sequencing is a much faster process than clone-by-clone sequencing.
  • Whole genome shotgun sequencing uses a fraction of the DNA that clone-by-clone sequencing needs.
  • Whole genome shotgun sequencing is particularly efficient if there is an existing reference sequence. It is much easier to assemble the genome sequence by aligning it to an existing reference genome.
  • Shotgun sequencing is much faster and less expensive than methods requiring a genetic map.
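
To show why an existing reference makes assembly easier, here is a rough sketch in which each read is simply looked up in the reference rather than compared with every other read. It assumes the reads match the reference exactly, which real reads rarely do; the function names and sequences are hypothetical.

```python
def place_reads(reference: str, reads: list[str]) -> dict[str, int]:
    """Map each read to the 0-based position of its first exact match (-1 if none)."""
    return {read: reference.find(read) for read in reads}


def depth_per_base(reference: str, reads: list[str]) -> list[int]:
    """Count how many placed reads cover each base of the reference."""
    depth = [0] * len(reference)
    for read, pos in place_reads(reference, reads).items():
        if pos >= 0:
            for k in range(pos, pos + len(read)):
                depth[k] += 1
    return depth


reference = "ATGGCGTACCTTGAACGT"  # made-up reference sequence
reads = ["ATGGCG", "GCGTAC", "TTGAAC"]

print(place_reads(reference, reads))
print(depth_per_base(reference, reads))
```

Bases with a depth of zero are gaps that no read covers, which is one way of seeing where more sequencing would be needed.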

 

What are the disadvantages of shotgun sequencing?

  • Vast amounts of computing power and sophisticated software are required to assemble shotgun sequences. To sequence the genome from a mammal (billions of bases long), you need about 60 million individual DNA sequence reads.
  • Errors in assembly are more likely to be made because a genetic map is not used. However, these errors are generally easier to resolve than in other methods and are minimised if a reference genome can be used.
  • Whole genome shotgun sequencing is far easier to carry out if a reference genome is already available. Without an existing genome to match the reads to, assembly is very difficult.
  • Whole genome shotgun sequencing can also lead to errors which need to be resolved by other, more labour-intensive types of sequencing, such as clone-by-clone sequencing.
  • Repetitive genomes and sequences can be more difficult to assemble.
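
The figure of about 60 million reads can be related to genome size with some simple arithmetic: the average number of times each base is sequenced (the coverage) is the number of reads multiplied by the read length, divided by the genome size. The sketch below assumes a genome of 3 billion bases and reads of 500 bases each; both numbers are illustrative assumptions, not values stated above.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(genome_size: int, read_length: int, target_coverage: float) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_000_000_000  # assumed: a genome of 3 billion bases
READ_LENGTH = 500            # assumed: 500 bases per read

print(mean_coverage(60_000_000, READ_LENGTH, GENOME_SIZE))
print(reads_needed(GENOME_SIZE, READ_LENGTH, target_coverage=10))
```

Under these assumptions, 60 million reads cover each base about ten times on average. That redundancy is what gives the assembly software enough overlapping reads to work with.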
