Reading Genomes: A brief history of next-generation sequencing.


We’ve talked a lot about DNA, genes, and the identification of specific mutations in specific genes related to cancer progression.  But how is this done?  What is the methodology?  Why should you care? What are the practical applications for research? For patients? What are the limitations? 

The technology we are talking about is called “next-generation sequencing” and it is used to decipher the genomes of entire organisms or, for our purposes, the genome of cancers.  In fact, the genetic differences between cancer types or those responsible for intratumor heterogeneity, as we previously discussed (see Mutational Landscape of Cancer, Intratumor Heterogeneity), were identified through the use of next-generation sequencing.  In this three-part discussion, we will first learn about the methodology involved in this latest technology and in the following discussion we will delve into its implications for both researchers and patients.

Imagine that your genome, or your cancer's genome, is like a book composed of a series of letters: our DNA.  There are four letters, called nucleotides, in our genomic alphabet (A, T, C and G) and, as with letters, the number and order of these nucleotides spell out specific words, or genes.  Like words organized into sentences and paragraphs, genes are organized onto chromosomes.  Our entire genomic book is composed of 23 pairs of chromosomes.  Sequencing the entire genome of an individual or of a specific cancer is akin to reading that book.  And the longer the book, the more difficult it can be.  Additionally, spelling mistakes, or mutations, can further complicate the reading.

Our ability to read through genomes, to sequence the genes and the DNA that makes up those genes, is based on our understanding of the structure of DNA.  Thanks to Watson and Crick, we know that DNA is configured as an anti-parallel double helix.  Essentially this means DNA consists of two strands that twist together in a head-to-toe fashion.  Think of it like a zipper: each half binds to the other half to make a tightly closed structure.  The most critical aspect of this structure lies in the fact that the two DNA strands are complementary.  Through experimentation, we know that each of the four nucleotides pairs with a specific partner: A only binds T and C only binds G.  Thus, if we know the sequence of one strand, we can easily deduce the sequence of the second.
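To make the pairing rule concrete, here is a minimal Python sketch of deducing the second strand from the first.  The function name and the example sequence are made up for illustration; the only facts it relies on are the pairing rules above and the head-to-toe arrangement of the two strands.

```python
# Pairing rules from the text: A binds T, C binds G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Return the partner strand, written in its own reading direction."""
    # Walk the strand backwards (the partner runs head-to-toe) and
    # swap every letter for the one it binds.
    return "".join(PAIRS[base] for base in reversed(strand))


print(complementary_strand("ATGCCG"))  # prints CGGCAT
```

Applying the function twice returns the original strand, which is another way of saying that either strand carries all the information needed to rebuild the other.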

Sanger Sequencing: the Gold Standard
For the last 35 years, the gold standard for sequencing has been Sanger sequencing, also referred to as the chain termination method.  Although next-generation sequencing (NGS) has taken this knowledge to the next level, allowing large genomes to be analyzed quickly and accurately, Sanger sequencing is still widely used for its simplicity and cost-effectiveness.  To understand how NGS has improved our ability to read DNA sequences, we need to understand the methodology behind Sanger sequencing.
This method utilizes our knowledge of how DNA is replicated in normal cellular processes.  It involves unzipping template DNA (the DNA that needs to be sequenced) to create single-stranded DNA and mixing it with a short, single-stranded complementary piece of DNA, called a primer, that specifically binds to the template.  The reaction is started by the addition of an enzyme called a polymerase and a mixture of nucleotides (the letters of our DNA).  The DNA polymerase is the powerhouse behind DNA replication, extending the primer by adding complementary nucleotides (dNTPs) one by one.  In sequencing reactions, the mixture also contains a small proportion of nucleotides for each of the four bases (A, T, C, G) that have been altered in two ways: they are labeled (by fluorescence) for identification in downstream steps, and they are modified so that elongation terminates upon their addition.  These chain-terminating nucleotides are called ddNTPs.  Because a labeled ddNTP can halt elongation at any position, the reaction produces fragments of every possible length, and the label on each fragment reveals the identity of its terminating base.  The next step is capillary or gel electrophoresis, which separates the fragments by length so that the terminating bases can be read in order, shortest fragment first (see Figure 1).
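The logic of reading a sequence from fragment lengths can be sketched in a few lines of Python.  This is a toy model under simplifying assumptions, not a description of any real instrument: the function names are invented, the template is written in the order the polymerase reads it, and we assume exactly one terminated fragment of every length.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template):
    """Return one (length, label) pair per chain-terminated fragment."""
    # The newly built strand is complementary to the template.
    new_strand = "".join(PAIRS[base] for base in template)
    # A labeled terminator can stop elongation after any position, so we
    # keep, for each length, only the label of the final base.
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_by_electrophoresis(fragments):
    """Sort fragments by length and read off the terminating labels."""
    return "".join(label for _, label in sorted(fragments))


fragments = sanger_fragments("TACGGC")
# The fragments come out of the reaction in no particular order;
# sorting by length is what recovers the sequence.
print(read_by_electrophoresis(reversed(fragments)))  # prints ATGCCG
```

Note that what is read is the newly synthesized strand; the template sequence follows from it through the pairing rules.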


Figure 1: Sanger Sequencing1.


The great success story of Sanger sequencing is the Human Genome Project.  Sequencing the human genome required the cooperation of multiple international research institutions and the injection of billions of dollars by governments and private corporations.  After more than a decade of work, the project produced the first picture of what the human genome looks like.2  Such knowledge yields power for understanding the genetic basis behind all diseases, from cancer to neurological disorders.  It also helps answer some basic biological questions: Why do we taste bitter foods?  Why do we see colour?  The implications of this technology are huge.  However, its limitations in speed, scalability, and resolution or accuracy prompted the development of technologies that could read longer stretches of DNA more quickly and more accurately, and therefore answer more genetic questions.

Next-generation sequencing: Illumina
Next-generation sequencing is the all-encompassing term for these new methodologies that aim for high-throughput analysis of large sequences such as entire chromosomes or even entire genomes.  Several biotech companies developed NGS methods that differ in their template preparation, method of sequencing and types of analysis.  As an example, let’s highlight the method used by Illumina.



Figure 2: Illumina Next Generation sequencing3.

The first step in this process involves breaking the DNA to be sequenced into small fragments.  Adaptors, short pieces of DNA of known sequence that essentially serve as primer binding sites, are attached to these fragments, which are then bound to a solid surface.  This solid surface substrate is proprietary to this technology.  The next step involves amplifying these DNA fragments in a manner referred to as bridge amplification.  As in Sanger sequencing, a DNA polymerase creates the complementary DNA strands.  With this technology, clusters of up to 1000 identical copies of each fragment can be generated.  The sequencing itself uses another proprietary element: the incorporation of 4 nucleotides, each labeled with a different dye.  Like the ddNTPs of Sanger sequencing, these modified nucleotides terminate the reaction after addition to the DNA fragment, but here the termination is reversible.  After laser excitation, the emitted fluorescent signal is captured to identify the base, the dye and the terminating group are removed, and the next sequencing cycle can proceed.  In this way the DNA clusters are sequenced one base at a time, and the resulting sequences are then aligned to a reference sequence.
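The final alignment step can be illustrated with a deliberately naive Python sketch: slide each short read along the reference and keep the position with the fewest mismatched letters.  The function name and the sequences are invented for illustration, and this brute-force search is only a teaching device; real alignment software relies on far more efficient indexing methods and also handles inserted or deleted letters, which this sketch does not.

```python
def best_alignment(read, reference):
    """Return (position, mismatches) for the best gap-free placement."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        # Count the letters that disagree at this placement.
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (pos, mismatches)
    return best  # None if the read is longer than the reference


reference = "GATTACAGGCTTACGATCCA"
print(best_alignment("CAGGCTTA", reference))  # prints (5, 0)
print(best_alignment("CAGGATTA", reference))  # prints (5, 1)
```

The second read still lands at position 5 despite one differing letter.  That single mismatch could be a sequencing error or a genuine mutation, which is exactly why each region needs to be read many times.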


The advantages of NGS compared to Sanger sequencing are clear.  Whereas Sanger sequencing reads one region at a time, next-generation sequencing can read many fragments simultaneously, speeding up the process tremendously.  Additionally, Sanger sequencing generally reads a specific region only a limited number of times.  The number of times a region is read, its “read depth”, is dramatically higher with NGS technology, whose techniques allow for deep sequencing.  This increases the accuracy of the sequence.  With the increased speed and accuracy, sequencing large genomes is now feasible in a short amount of time.
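Why does reading a region more often improve accuracy?  A small Python sketch, assuming the reads have already been placed on the reference, shows the idea: stack the reads, count how many cover each position (the read depth), and let the majority decide each letter.  The function names, the reads and the simple majority vote are illustrative assumptions rather than the method of any particular software.

```python
from collections import Counter


def pileup(reads, reference_length):
    """Count the bases that aligned reads place at each position."""
    columns = [Counter() for _ in range(reference_length)]
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            columns[start + offset][base] += 1
    return columns


def consensus(columns):
    """Majority-vote base per position; 'N' where no read covers it."""
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


# Each read is (start position, sequence); the third contains one error.
reads = [(0, "GATTAC"), (2, "TTACAG"), (2, "TTGCAG"), (4, "ACAGGC")]
columns = pileup(reads, 10)
print([sum(c.values()) for c in columns])  # [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]
print(consensus(columns))                  # prints GATTACAGGC
```

At position 4 three reads say A and one says G, so the erroneous G is outvoted.  At the ends, where the depth is only 1, a single error would go unnoticed.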

Once a sequence has been determined, it can be mapped to the latest human whole-genome reference using computational algorithms.  The technology to accurately map samples and identify mutations has also improved dramatically.  When it comes to using this technology to identify mutations in cancers, additional factors need to be considered, including germline mutations (what the patient carries independent of the cancer) as well as single nucleotide polymorphisms, the small single-letter differences in DNA that vary naturally between individuals.  The interpretation of these data requires intensive knowledge of genetics.
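One way to picture the germline question is a three-way comparison, sketched below in Python.  It assumes, beyond what is described above, that a sample of the patient's healthy tissue has been sequenced alongside the tumor, and that all three sequences are already aligned and of equal length.  The function name, labels and sequences are hypothetical.

```python
def classify_variants(reference, normal, tumor):
    """Label each position where the tumor or normal differs from reference."""
    calls = []
    for pos, (ref, norm, tum) in enumerate(zip(reference, normal, tumor)):
        if tum == ref and norm == ref:
            continue  # nothing differs at this position
        if norm != ref and tum == norm:
            kind = "germline"  # carried by the patient independent of the cancer
        elif norm == ref:
            kind = "somatic"   # found only in the tumor
        else:
            kind = "other"     # tumor and normal disagree in some other way
        calls.append((pos, ref, tum, kind))
    return calls


reference = "GATTACAGGC"
normal    = "GATCACAGGC"  # the patient's healthy tissue
tumor     = "GATCACAGTC"
print(classify_variants(reference, normal, tumor))
# prints [(3, 'T', 'C', 'germline'), (8, 'G', 'T', 'somatic')]
```

Deciding whether a tumor-only change actually matters for the cancer is a separate question that no simple comparison can answer, which is where the genetics expertise comes in.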

Stay tuned for the implications of this technology!  We’ll be discussing the role of NGS in the cancer clinic as well as in the cancer research lab.

Today’s uncovered cancer morsel: The advancement of technology directly impacts cancer research and cancer patients.


References
     2.     International Human Genome Sequencing Consortium.  Initial sequencing and analysis of the human genome.  Nature.  2001. 409: 860-921.
     3.     http://res.illumina.com/documents/products/techspotlights/techspotlight_sequencing.pdf
     4.     Michael Metzker.  Sequencing technologies – the next generation.  Nature Reviews: Genetics.  2010. 11: 31-46