We’ve talked a lot about DNA, genes, and the identification
of specific mutations in specific genes related to cancer progression. But how is this done? What is the methodology? Why should you care? What are the practical
applications for research? For patients? What are the limitations?
The technology we are
talking about is called “next-generation sequencing” and it is used to decipher
the genomes of entire organisms or, for our purposes, the genome of
cancers. In fact, the genetic
differences between cancer types or those responsible for intratumor
heterogeneity, as we previously discussed (see Mutational Landscape of Cancer, Intratumor Heterogeneity), were identified through the
use of next-generation sequencing.
In this three-part discussion, we will first learn about the methodology
involved in this latest technology, and in the following discussions we will
delve into its implications for both researchers and patients.
Imagine your genome, or your
cancer's genome, as a book written in a series of letters, our DNA. There are four letters, called
nucleotides, in our genomic alphabet (A, T, C and G) and, as with letters, the
number and order of these nucleotides spell out specific words,
or genes. Like words organized
into sentences and paragraphs, genes are organized onto
chromosomes. Our entire genomic book
is composed of 23 pairs of chromosomes.
Sequencing the entire genome of individuals or that of specific cancers
is akin to reading that book. And
the longer the book, the more difficult it can be. Additionally, spelling mistakes, or mutations, can further
complicate the reading.
Our ability to read through
genomes, to sequence the genes and the DNA that makes up those genes is based
on our understanding of the structure of DNA. Thanks to Watson and Crick, we know that DNA is configured
as an anti-parallel double helix.
Essentially this means DNA consists of two strands that twist together
in a head-to-toe fashion. Think of
it like a zipper: each half binds to the other half to make a
tightly closed structure. The most
critical aspect of this structure lies in the fact that the two DNA strands are
complementary. Through
experimentation, we know that the four nucleotides pair with
precise specificity: A binds only T and C binds only G. Thus, if we know the sequence of one
strand, we can easily deduce the sequence of the second.
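To make the pairing rule concrete, here is a minimal Python sketch of deducing one strand from the other. The function name complement_strand and the example sequence are made up for illustration; the only facts taken from the text are the A-T and C-G pairings and the anti-parallel (head-to-toe) arrangement.

```python
# Pairing rules from the text: A binds only T, C binds only G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(strand: str) -> str:
    """Return the partner strand, written in its own reading direction."""
    # The two strands run head-to-toe, so the partner is read in reverse.
    return "".join(PAIRS[base] for base in reversed(strand))

print(complement_strand("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is another way of saying that each strand carries all the information needed to rebuild the other.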
Sanger Sequencing: the Gold Standard
For the last 35 years, the gold standard for sequencing has
been Sanger sequencing, also referred to as the chain termination method. Although next-generation sequencing
(NGS) has taken this knowledge to the next level, allowing for analysis of
large genomes in a quick and accurate manner, Sanger sequencing is still widely
utilized for its simplicity and cost-effectiveness. To understand how NGS has improved our ability to read DNA
sequences, we need to understand the methodology behind Sanger sequencing.
This method utilizes
our knowledge of how DNA is replicated in normal cellular processes. It involves unzipping template DNA (the
DNA that needs to be sequenced) to create single stranded DNA and mixing it with
a short single stranded complementary DNA strand, called a primer, that specifically
binds to the template DNA. The
reaction is started by the addition of an enzyme called a polymerase and a
mixture of nucleotides (the letters of our DNA). The DNA polymerase is the powerhouse
behind DNA replication, adding complementary nucleotides (dNTPs) one by one. In sequencing reactions, the
mixture also contains a small proportion of modified nucleotides (ddNTPs), in which each of the four
bases (A, T, C, G) has been altered in two ways: they are labeled (by
fluorescence) for identification in downstream steps, and they are modified
so that elongation terminates upon their addition. Because elongation halts wherever one of these labeled ddNTPs
is incorporated, the reaction produces fragments of every possible length, and the length of each
fragment reveals the position of its terminating base. Subsequent
steps include capillary or gel electrophoresis, which separates the
fragments by length so that the terminating base of each can be identified (see Figure 1).
Figure 1: Sanger Sequencing1.
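The chain termination idea can be mimicked in a few lines of Python. This is only a toy sketch, not a model of real chemistry: it assumes the template is written in the direction the polymerase reads it, that exactly one terminated fragment exists for every possible length, and the function names are invented for illustration.

```python
import random

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Every chain-terminated fragment as (length, labeled last base)."""
    # The new strand is built from complementary bases, one position at a time.
    new_strand = [PAIRS[base] for base in template]
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_by_length(fragments: list[tuple[int, str]]) -> str:
    """Mimic electrophoresis: order fragments by length, read each label."""
    return "".join(base for _, base in sorted(fragments))

fragments = terminated_fragments("TACGGT")
random.shuffle(fragments)         # in the tube, the fragments are all mixed together
print(read_by_length(fragments))  # ATGCCA
```

Sorting by length plays the role of electrophoresis here: once the fragments are in order, reading off the dye on each one spells out the newly synthesized strand.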
The success story of Sanger
sequencing is the Human Genome Project. Sequencing the human genome required the cooperation of multiple
international research institutions and the injection of billions of dollars by
governments and private corporations.
After more than a decade, an understanding of what the human genome
looks like was generated.2
Such knowledge yields power for understanding the genetic basis behind
all diseases from cancer to neurological disorders. It also helps answer some basic biological questions: why do
we taste bitter foods? Why do we see colour? The implications of this
technology are huge. However, limitations in speed, scalability, and
resolution or accuracy prompted the development of technologies that could
read larger sequences more quickly and more accurately, and therefore
answer more genetic questions.
Next-Generation Sequencing: Illumina
Next-generation sequencing is the all-encompassing term for
these new methodologies that aim for high-throughput analysis of large
sequences such as entire chromosomes or even entire genomes. Several biotech companies developed NGS
methods that differ in their template preparation, method of sequencing and
types of analysis. As an example,
let’s highlight the method used by Illumina.
Figure 2: Illumina Next Generation sequencing3.
The first step in this process
involves breaking the DNA to be sequenced into small fragments. These
fragments are attached to adaptors, short sequences that act essentially as primer-binding sites
and anchor each fragment to a solid surface. This solid surface substrate is
proprietary to this technology. The
next step involves amplifying these DNA fragments in a manner referred to as
bridge amplification. Using a
DNA polymerase, as in Sanger sequencing, complementary DNA strands are
created. With this technology, up
to 1000 identical copies of each fragment can be generated, forming a cluster. The sequencing step uses another proprietary element: the
incorporation of 4 nucleotides, each labeled with a different dye. Like those in Sanger sequencing, these modified
nucleotides terminate the reaction after addition to the DNA fragment, but here the termination is reversible. In this way, after laser excitation,
the emitted fluorescent signal is captured, the block is removed, and the subsequent sequencing reactions
can proceed. These DNA clusters
are sequenced one base at a time, and the resulting reads are then aligned to a reference sequence.
The advantages of NGS compared to Sanger sequencing are
clear. Whereas Sanger sequencing reads
one region at a time, next-generation sequencing can read multiple
fragments simultaneously, speeding up the process tremendously. Additionally, Sanger sequencing
generally sequences a specific region only a limited number of times. This “read depth” is dramatically
improved with NGS technology, whose techniques allow for deep sequencing, and the repeated reads increase the accuracy of the
sequence. With the increased speed
and accuracy, sequencing large genomes is now feasible in a short
amount of time.
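Why does reading the same region many times improve accuracy? A simple way to picture it is a majority vote at each position. The sketch below is a simplified illustration with made-up reads that are assumed to be the same length and already lined up; real pipelines also weigh the quality score of each base.

```python
from collections import Counter

def consensus(reads: list[str]) -> str:
    """Majority vote at each position across reads covering the same region."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*reads)
    )

# Five reads of the same region; two contain a single read error each.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGTAA"]
print(consensus(reads))  # ACGTAC
```

With only one read, either error would have gone unnoticed. With five, each error is outvoted four to one.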
Once a sequence has been determined it can be mapped to the
latest human whole-genome reference using computational algorithms. The technology to accurately map
samples and identify mutations has also dramatically improved in our technological
age. When it comes to using this
technology for the identification of mutations in cancers, additional factors
need to be considered, including germline mutations (what the patient carries
independent of the cancer) as well as single nucleotide polymorphisms, or small
single-letter differences in DNA between individuals. The
interpretation of this data requires extensive knowledge of genetics.
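As a rough illustration of separating cancer-specific mutations from what the patient already carries, here is a toy Python comparison. The sequences are invented, they are assumed to be already aligned to the reference with no insertions or deletions, and find_variants is a hypothetical helper rather than part of any real analysis tool.

```python
def find_variants(reference: str, sample: str) -> set[tuple[int, str, str]]:
    """Positions (1-based) where an aligned sample differs from the reference."""
    return {
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    }

reference = "ACGTACGTAC"
normal    = "ACGTACGTAT"  # the patient's healthy tissue
tumor     = "ACGAACGTAT"  # the same patient's cancer

germline = find_variants(reference, normal)          # carried independent of the cancer
somatic = find_variants(reference, tumor) - germline  # found only in the cancer
print(sorted(somatic))  # [(4, 'T', 'A')]
```

The difference at position 10 appears in both samples, so it is treated as germline. Only the change at position 4 is unique to the tumor.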
Stay tuned for the implications of this technology! We’ll be discussing the role of NGS in
the cancer clinic as well as in the cancer research lab.
Today’s uncovered cancer morsel: The advancement of
technology directly impacts cancer research and cancer patients.
References
2. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001. 409: 860-922.
4. Michael Metzker. Sequencing technologies – the next generation. Nature Reviews Genetics. 2010. 11: 31-46.