What happens to DNA sequence when it comes off a sequencing machine?

DNA sequencing produces huge amounts of data, essentially consisting of lots of short sections of DNA letters. The first step is to check that the sequence is of the highest quality before we start to piece the sections together.

Bioinformatics 1: quality control

What’s the challenge?

  • After we have sequenced a sample of DNA, we need a process to check that:
  1. The laboratory stage of the process, that prepares the DNA sample for sequencing, has worked properly
  2. The instrument carrying out the sequencing itself has run properly
  3. The DNA sample is from a single source and has not been contaminated with DNA from another sample.

What do we need to do?

  • Quality control is an extensive set of procedures carried out to ensure that the sample and DNA sequence are of good quality. It is used to check that:
  1. The DNA sequence is of suitable quality to be sent on and used for the scientific study.
    • One way this is assessed is by counting how many clusters of DNA there are in every mm² of each lane of the sequencing machine.
    • For a sample to be accepted, there should be hundreds of thousands to millions of clusters of DNA per mm² of each lane (depending on the sequencing machine being used). If the number of clusters is outside the range for a certain machine, it indicates that something has gone wrong during sequencing and the sample will not be accepted for further processing.
    • The strength of the signal from the DNA bases in the sequence is also measured. The signals should be as bright as possible, particularly for the first base in the sequence. If the signal is dull it means that something might have gone wrong or that the camera on the machine was out of focus.
  2. The DNA sample is not contaminated with DNA from another sample.
    • This is checked by aligning the DNA sequence against the reference genome for that organism and checking that it matches the expected species. For example, if you have sequenced a mouse genome, you would expect to see a 98-99 per cent match to the reference mouse genome and much lower matches with other reference genomes. The match will never be 100 per cent because there is always some genetic variation between individuals of the same species.
    • Individual ‘tags’ are added to each DNA sample before sequencing. These tags are short sequences of DNA that act as barcodes to identify DNA fragments from the same individual, so that the fragments can be easily identified and sorted afterwards. After sequencing, if a tag does not appear in a sample when it should, it is a sign that something has gone wrong before or during sequencing. This may be a result of contamination or human error.
  • Transferring the sequence data off the machines and carrying out primary analysis takes about three to four days, although the manual quality control process usually takes only about one hour.
  • After this, the sample is either passed or failed.
  • If the sample fails, its sequence is discarded and sequencing is carried out again.
  • For all the samples that pass, the DNA sequence is stored in a large data ‘bucket’ along with additional information about the sample. This will include which sample the DNA sequence is from, which species it is from and which study the genome was sequenced for. 
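
The pass or fail decision described above can be sketched in a few lines of Python. This is only an illustration, not the software actually used: the `LaneMetrics` fields, the `qc_failures` function, the density range and the 98 per cent match threshold are all assumptions made for the example, and real limits depend on the sequencing machine and the organism. Signal brightness is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class LaneMetrics:
    """QC measurements for one lane (all field names are hypothetical)."""
    clusters_per_mm2: float  # cluster density reported for the lane
    reference_match: float   # fraction matching the expected reference, 0 to 1
    expected_tags: set       # barcode tags that should be present
    observed_tags: set       # barcode tags actually seen after sequencing


def qc_failures(lane, density_range, min_match=0.98):
    """Return the reasons a lane fails QC; an empty list means it passes.

    density_range and min_match are illustrative values only.
    """
    reasons = []
    low, high = density_range
    # Check 1: cluster density must fall inside the range for the machine.
    if not (low <= lane.clusters_per_mm2 <= high):
        reasons.append("cluster density outside accepted range")
    # Check 2: the sequence should closely match the expected species.
    if lane.reference_match < min_match:
        reasons.append("poor match to expected reference genome")
    # Check 3: every tag added before sequencing should reappear afterwards.
    missing = lane.expected_tags - lane.observed_tags
    if missing:
        reasons.append("missing tags: " + ", ".join(sorted(missing)))
    return reasons


# Made-up example lane: density and reference match are fine, one tag is absent.
lane = LaneMetrics(
    clusters_per_mm2=850_000,
    reference_match=0.985,
    expected_tags={"ACGTAC", "TTGCAA"},
    observed_tags={"ACGTAC"},
)
failures = qc_failures(lane, density_range=(500_000, 1_500_000))
print("PASS" if not failures else "FAIL: " + "; ".join(failures))
```

In this made-up example the lane fails only because one expected tag is missing, so the sample would be sequenced again rather than stored.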


This page was last updated on 2021-07-21