Chapter 12
Structure of Nucleic Acids

"Scherzo
in D & E" (detail) by David E. Rodale (1955-1985)
Chapter 11 presented the structure and chemistry of nucleotides and how these units are joined via phosphodiester bonds to form nucleic acids, the biological polymers for information storage and transmission. In this chapter, we investigate biochemical methods that reveal this information by determining the sequential order of nucleotides in a polynucleotide, the so-called primary structure of nucleic acids. Then, we consider the higher orders of structure in the nucleic acids, the secondary and tertiary levels. Although the focus here is primarily on the structural and chemical properties of these macromolecules, it is fruitful to keep in mind the biological roles of these remarkable substances. The sequence of nucleotides in nucleic acids is the embodiment of genetic information (see Part IV). We can anticipate that the cellular mechanisms for accessing this information, as well as reproducing it with high fidelity, will be illuminated by knowledge of the chemical and structural qualities of these polymers.
12.1 · The Primary Structure of Nucleic Acids
As recently as 1975, determining the primary structure of nucleic acids (the nucleotide sequence) was a more formidable problem than amino acid sequencing of proteins, simply because nucleic acids contain only four unique monomeric units whereas proteins have twenty. With only four, there are apparently fewer specific sites for selective cleavage, distinctive sequences are more difficult to recognize, and the likelihood of ambiguity is greater. The much greater number of monomeric units in most polynucleotides as compared to polypeptides is a further difficulty. Two important breakthroughs reversed this situation so that now sequencing nucleic acids is substantially easier than sequencing polypeptides. One was the discovery of restriction endonucleases that cleave DNA at specific oligonucleotide sites, generating unique fragments of manageable size (see Chapter 11). The second is the power of polyacrylamide gel electrophoresis separation methods to resolve nucleic acid fragments that differ from one another in length by just one nucleotide.
Sequencing Nucleic Acids
Two basic protocols for nucleic acid sequencing are in widespread use: the chain termination or dideoxy method of F. Sanger and the base-specific chemical cleavage method developed by A. M. Maxam and W. Gilbert. Because both methods are carried out on nanogram amounts of DNA, very sensitive analytical techniques are used to detect the DNA chains following electrophoretic separation on polyacrylamide gels. Typically, the DNA molecules are labeled with radioactive 32P, and following electrophoresis, the pattern of their separation is visualized by autoradiography. A piece of X-ray film is placed over the gel and the radioactive disintegrations emanating from 32P decay create a pattern on the film that is an accurate image of the resolved oligonucleotides. Recently, sensitive biochemical and chemiluminescent methods have begun to supersede the use of radioisotopes as tracers in these experiments.
Chain Termination or Dideoxy Method
To appreciate the rationale of the chain termination or dideoxy method, we first must briefly examine the biochemistry of DNA replication. DNA is a double-helical molecule. In the course of its replication, the sequence of nucleotides in one strand is copied in a complementary fashion to form a new second strand by the enzyme DNA polymerase. Each original strand of the double helix serves as template for the biosynthesis that yields two daughter DNA duplexes from the parental double helix (Figure 12.1).
Figure 12.1 DNA replication yields two daughter DNA duplexes identical to the parental DNA molecule. Each original strand of the double helix serves as a template, and the sequence of nucleotides in each of these strands is copied to form a new complementary strand by the enzyme DNA polymerase. By this process, biosynthesis yields two daughter DNA duplexes from the parental double helix.
DNA polymerase carries out this reaction in vitro in the presence of the four deoxynucleotide monomers and copies single-stranded DNA, provided a double-stranded region of DNA is artificially generated by adding a primer. This primer is merely an oligonucleotide capable of forming a short stretch of dsDNA by base pairing with the ssDNA (Figure 12.2). The primer must have a free 3'-OH end from which the new polynucleotide chain can grow as the first residue is added in the initial step of the polymerization process. DNA polymerases synthesize new strands by adding successive nucleotides in the 5' → 3' direction.
Figure 12.2 DNA polymerase copies ssDNA in vitro in the presence of the four deoxynucleotide monomers, provided a double-stranded region of DNA is artificially generated by adding a primer, an oligonucleotide capable of forming a short stretch of dsDNA by base pairing with the ssDNA. The primer must have a free 3'-OH end from which the new polynucleotide chain can grow as the first residue is added in the initial step of the polymerization process.
Chain Termination Protocol
In the chain termination method of DNA sequencing, a DNA fragment of unknown sequence serves as template in a polymerization reaction using some type of DNA polymerase, usually Sequenase 2®, a genetically engineered version of bacteriophage T7 DNA polymerase that lacks all traces of exonuclease activity that might otherwise degrade the DNA. The primer requirement is met by an appropriate oligonucleotide (this method is also known as the primed synthesis method for this reason). Four parallel reactions are run; all four contain the four deoxynucleoside triphosphates dATP, dGTP, dCTP, and dTTP, which are the substrates for DNA polymerase (Figure 12.3 on the facing page). In each of the four reactions, a different 2',3'-dideoxynucleotide is included, and it is these dideoxynucleotides that give the method its name.
Figure 12.3 The chain termination or dideoxy method of DNA sequencing. (a) DNA polymerase reaction. (b) Structure of dideoxynucleotide. (c) Four reaction mixtures with nucleoside triphosphates plus one dideoxynucleoside triphosphate. (d) Electrophoretogram. Note that the nucleotide sequence as read from the bottom to the top of the gel is the order of nucleotide addition carried out by DNA polymerase.
Because dideoxynucleotides lack 3'-OH groups, these nucleotides cannot serve as acceptors for 5'-nucleotide addition in the polymerization reaction, and thus the chain is terminated where they become incorporated. The concentrations of the four deoxynucleotides and the single dideoxynucleotide in each reaction mixture are adjusted so that the dideoxynucleotide is incorporated infrequently. Therefore, base-specific premature chain termination is only a random, occasional event, and a population of new strands of varying length is synthesized. Four reactions are run, one for each dideoxynucleotide, so that termination, although random, can occur everywhere in the sequence. In each mixture, each newly synthesized strand has a dideoxynucleotide at its 3'-end, and its presence at that position demonstrates that a base of that particular kind was specified by the template. A radioactively labeled dNTP is included in each reaction mixture to provide a tracer for the products of the polymerization process.
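The outcome of the four reactions can be illustrated with a minimal sketch. It assumes an idealized experiment in which termination occurs at least once at every possible position, counts product length as the number of nucleotides added beyond the primer, and uses hypothetical function names chosen only for illustration. The template shown is the one quoted for Figure 12.3.

```python
# Idealized sketch of the four dideoxy reactions (names are illustrative).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_5to3):
    """Strand made by DNA polymerase, written 5'->3', for a template written 5'->3'."""
    # The polymerase reads the template 3'->5', so the template is reversed
    # before each base is replaced by its complement.
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3))


def termination_lengths(template_5to3):
    """Map each ddNTP reaction to the lengths of the chains it terminates.

    Lengths count nucleotides added beyond the primer (an assumption made
    to keep the example small).
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand(template_5to3), start=1):
        # A chain of this length ends in the dideoxy analog of `base`.
        lanes[base].append(length)
    return lanes


if __name__ == "__main__":
    for base, lengths in termination_lengths("GCTACGCT").items():
        print(f"dd{base}TP reaction: chain lengths {lengths}")
```

Each length appears in exactly one reaction, which is why the four lanes together give one band per position in the sequence.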
Reading Dideoxy Sequencing Gels
The sequencing products are visualized by autoradiography (or similar means) following their separation according to size by polyacrylamide gel electrophoresis (Figure 12.3). Because the smallest fragments migrate fastest upon electrophoresis and because fragments differing by only single nucleotides in length are readily resolved, the autoradiogram of the gel can be read from bottom to top, noting which lane has the next largest band at each step. Thus, the gel in Figure 12.3 is read AGCGTAGC (5' → 3'). Because of the way DNA polymerase acts, this observed sequence is complementary to the corresponding unknown template sequence. Knowing this, the template sequence now can be written GCTACGCT (5' → 3').
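The two steps just described, reading the gel from the bottom up and then converting the result to the template sequence, can be sketched as follows. The band positions are hypothetical values chosen to be consistent with the sequence quoted for Figure 12.3, and the function names are illustrative only.

```python
# Sketch: read a dideoxy gel bottom to top, then infer the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_gel(lanes):
    """Return the synthesized strand (5'->3') from band positions per lane.

    `lanes` maps a base to the fragment lengths seen in that lane; the
    shortest fragment runs fastest and is therefore read first.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


def reverse_complement(seq_5to3):
    """Complementary strand, also written in the 5'->3' convention."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3))


if __name__ == "__main__":
    # Hypothetical band positions (fragment length in each lane).
    lanes = {"A": [1, 6], "C": [3, 8], "G": [2, 4, 7], "T": [5]}
    synthesized = read_gel(lanes)
    print("read from gel:", synthesized)
    print("template:     ", reverse_complement(synthesized))
```

The reversal step matters: the two strands are antiparallel, so writing the template 5' → 3' requires reversing the complement, not just substituting bases.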
Base-Specific Chemical Cleavage Method
The base-specific chemical cleavage (or Maxam–Gilbert) method starts with a single-stranded DNA that is labeled at one end with radioactive 32P. (Double-stranded DNA can be used if only one strand is labeled at only one of its ends.) The DNA strand is then randomly cleaved by reactions that specifically fragment its sugar–phosphate backbone only where certain bases have been chemically removed. There is no unique reaction for each of the four bases. However, there is a reaction specific to G only and a purine-specific reaction that removes A or G (Figure 12.4).
Figure 12.4 Maxam–Gilbert sequencing of DNA: cleavage at purines uses dimethyl sulfate, followed by strand scission with piperidine.
Cleavage at G using dimethyl sulfate, followed by strand scission with piperidine: Under alkaline conditions, dimethyl sulfate reacts with guanine to methylate it at the 7-position (1). This substitution leads to instability of the N-9 glycosidic bond, so that in the presence of OH⁻ and the secondary amine piperidine (2), the purine ring is degraded and released. A β-elimination reaction facilitated by piperidine (3) then causes the excision of the naked deoxyribose moiety from the sugar–phosphate backbone, with consequent scission of the DNA strand to yield 5'- and 3'-fragments.
Cleavage at A or G: If the DNA is first treated with acid, dimethyl sulfate methylates adenine at the 3-position as well as guanine at the 7-position (not shown). Subsequent reaction with OH⁻ and piperidine triggers degradation and displacement of the methylated A or G purine base and strand scission, essentially as indicated here for reaction of dimethyl sulfate with guanine.
Thus, the difference in these two reactions is a specific indication of where A occurs. Similarly, there is a cleavage reaction specific for the pyrimidines (C+T) (Figure 12.5), which, if run in the presence of 1 or 2 M NaCl, works only with C. Differences in these two are thus attributable to the presence of T in the nucleotide sequence.
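The subtraction logic behind the four Maxam–Gilbert reactions can be written out as a small sketch. It assumes ideal cleavage chemistry (no missing or spurious bands), represents each lane as a set of ladder positions counted from the bottom of the gel, and uses hypothetical band positions and an illustrative function name.

```python
# Sketch: deduce bases from the G, A+G, C+T, and C lanes by set difference.
def call_bases(g, a_plus_g, c_plus_t, c):
    """Return the sequence read from the bottom of the ladder upward.

    Each argument is the set of ladder positions showing a band in that lane.
    """
    a_only = a_plus_g - g   # bands in A+G but not in G mark A
    t_only = c_plus_t - c   # bands in C+T but not in C mark T
    calls = {}
    for positions, base in ((g, "G"), (a_only, "A"), (c, "C"), (t_only, "T")):
        for position in positions:
            calls[position] = base
    return "".join(calls[position] for position in sorted(calls))


if __name__ == "__main__":
    # Hypothetical band positions for an eight-step ladder.
    sequence = call_bases(
        g={4},
        a_plus_g={4, 5},
        c_plus_t={1, 2, 3, 6, 7, 8},
        c={1, 2, 7, 8},
    )
    print(sequence)
```

A band that appears in the G lane should always appear in the A+G lane as well, and likewise for C and C+T; a real reading would treat a violation of that rule as a sign of a gel artifact.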
Figure 12.5 Maxam–Gilbert sequencing of DNA: hydrolysis of pyrimidine rings by hydrazine. Hydrazine (H2N–NH2) attacks across the C-4 and C-6 atoms of pyrimidines to open the ring. This degradation subsequently leads to modification of the deoxyribose, rendering it susceptible to β-elimination by piperidine in the presence of hydroxide ion. Shown here is the excision of a T residue. As in Figure 12.4, 5'- and 3'-fragments are produced. The presence of high salt concentrations protects T (but not C) from reaction with hydrazine. In the presence of 2 M NaCl, the reaction shown here occurs only at C residues.
Note that the key to Maxam–Gilbert sequencing is to modify a base chemically so that it is removed from its sugar. Then piperidine excises the sugar from its 5'- and 3'-links in a β-elimination reaction. The conditions of chemical cleavage described in Figures 12.4 and 12.5 are generally adjusted so that, on average, only a single scission occurs per DNA molecule. However, because a very large number of DNA molecules exist in each reaction mixture, the products are a random collection of different-sized fragments wherein the occurrence of any base is represented by its unique pair of 5'- and 3'-cleavage products. These products form a complete set, the members of which differ in length by only one nucleotide, and they can be resolved by gel electrophoresis into a “ladder,” which can be visualized by autoradiography of the gel if the DNA fragments are radioactively labeled (Figure 12.6).
Figure 12.6 Autoradiogram of a hypothetical electrophoretic pattern obtained for four reaction mixtures, performed as described in Figures 12.4 and 12.5 and run in the four lanes G, A+G, C+T, and C, respectively. Reading this pattern from the bottom up yields the sequence CCTGATCCCAGTCTA. The correct 5' → 3' order is determined by knowing which end of the ssDNA was 32P-labeled. If the 5'-end was 32P-labeled, only the 5'-fragments will be evident on the autoradiogram; the 3'-ends will be invisible. Similarly, if the 3'-end was originally labeled, only the 3'-fragments light up the autoradiogram. Assuming that the 5'-end was labeled, the sequence would be CCTGATCCCAGTCTA. If it were the 3'-end, the sequence read in the 5' → 3' convention would be ATCTGACCCTAGTCC. An interesting feature of the Maxam–Gilbert sequencing procedure is that the base that is “read” in the ladder is actually not present in the oligonucleotide that identifies it. Thus, an unidentified base bears the label at the end of the smallest fragment; this unidentified base is the one that preceded the first identified base. For example, an oligonucleotide of either
32P-5'-(A,C,G,T)CCTGATCCCAGTCTA-3'
or
5'-ATCTGACCCTAGTCC(A,C,G,T)-3'-32P
would yield the same pattern in the autoradiogram. (Indication here of T as the end-labeled nucleotide is arbitrary.)
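How the labeled end fixes the orientation of the read can be checked with a very small sketch. The function name and the string labels for the two ends are illustrative assumptions; the sequence is the one quoted for Figure 12.6.

```python
# Sketch: orient a bottom-to-top ladder read according to the labeled end.
def sequence_5to3(ladder_bottom_to_top, labeled_end):
    """Return the sequence in the 5'->3' convention.

    The shortest fragments lie nearest the labeled end, so a bottom-up read
    runs away from the label: 5'->3' for a 5'-label, 3'->5' for a 3'-label.
    """
    if labeled_end == "5'":
        return ladder_bottom_to_top
    if labeled_end == "3'":
        return ladder_bottom_to_top[::-1]
    raise ValueError("labeled_end must be \"5'\" or \"3'\"")


if __name__ == "__main__":
    ladder = "CCTGATCCCAGTCTA"
    print("5'-labeled:", sequence_5to3(ladder, "5'"))
    print("3'-labeled:", sequence_5to3(ladder, "3'"))
```

Note that only the order is reversed for a 3'-label; no complementation is involved, because the same strand is being read in both cases.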
In principle, the Maxam–Gilbert method can provide the total sequence of a dsDNA molecule just by determining the purine positions on one strand and then the purines on the complementary strand. Complementary base-pairing rules then reveal the pyrimidines along each strand, T complementary to where A is, C complementary to where G occurs. (The analogous approach of locating the pyrimidines on each strand would also provide sufficient information to write the total sequence.)
Figure 12.7 A photograph of the autoradiogram from an actual sequencing gel. A portion of the DNA sequence of nit-6, the Neurospora gene encoding the enzyme nitrite reductase. (James D. Colandene,
With current technology, it is possible to read the order of as many as 400 bases from the autoradiogram of a sequencing gel (Figure 12.7). The actual chemical or enzymatic reactions, electrophoresis, and autoradiography are now routine, and a skilled technician can sequence about 1 kbp per week using these manual techniques. The major effort in DNA sequencing is in the isolation and preparation of fragments of interest, such as cloned genes.
Automated DNA Sequencing
In recent years, automated DNA sequencing machines capable of identifying about 10^4 bases per day have become commercially available. One clever innovation has been the use of fluorescent dyes of different colors to uniquely label the primer DNA introduced into the four sequencing reactions; for example, red for the A reaction, blue for T, green for G, and yellow for C. Then, all four reaction mixtures can be combined and run together on one electrophoretic gel slab. As the oligonucleotides are separated and pass to the bottom of the gel, each is illuminated by a low-power argon laser beam that causes the dye attached to the primer to fluoresce. The color of the fluorescence is detected automatically, revealing the identity of the primer, and hence the base, immediately (Figure 12.8). The development of such automation has opened the possibility for sequencing the entire human genome, some 2.9 billion bp. Even so, if 100 automated machines operating at peak efficiency were dedicated to the task, it would still take at least 8 years to complete!
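The time estimate can be reproduced with a few lines of arithmetic. This sketch assumes a throughput of 10^4 bases per machine per day, a single pass over the genome with no overlap or repeat reads, and a 365-day year; the function name is illustrative.

```python
# Sketch: how long 100 machines would need for one pass over the genome.
def years_to_sequence(genome_bp, bases_per_machine_per_day, machines):
    """Years needed for a single, uninterrupted pass over the genome."""
    days = genome_bp / (bases_per_machine_per_day * machines)
    return days / 365


if __name__ == "__main__":
    years = years_to_sequence(
        genome_bp=2.9e9,                 # human genome size quoted in the text
        bases_per_machine_per_day=1e4,   # assumed peak throughput
        machines=100,
    )
    print(f"about {years:.1f} years")
```

The result is just under 8 years for a single pass. Any real project needs each region read several times, so this figure is best read as a lower bound.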
Figure 12.8 Schematic diagram of the methodology used in fluorescent labeling and automated sequencing of DNA. Four reactions are set up, one for each base, and the primer in each is end-labeled with one of four different fluorescent dyes; the dyes serve to color-code the base-specific sequencing protocol (a unique dye is used in each dideoxynucleotide reaction). The four reaction mixtures are then combined and run in one lane. Thus, each lane in the gel represents a different sequencing experiment. As the differently sized fragments pass down the gel, a laser beam excites the dye in the scan area. The emitted energy passes through a rotating color filter and is detected by a fluorometer. The color of the emitted light identifies the final base in the fragment. (Applied Biosystems, Inc.,
12.2 · The ABZs of DNA Secondary Structure
Double-stranded DNA molecules assume one of three secondary structures, termed A, B, and Z. Fundamentally, double-stranded DNA is a regular two-chain structure with hydrogen bonds formed between opposing bases on the two chains (see Chapter 11). Such H-bonding is possible only when the two chains are antiparallel. The polar sugar–phosphate backbones of the two chains are on the outside. The bases are stacked on the inside of the structure; these heterocyclic bases, as a consequence of their π-electron clouds, are hydrophobic on their flat sides. One purely hypothetical conformational possibility for a two-stranded arrangement would be a ladderlike structure (Figure 12.9) in which the base pairs are fixed at 0.6 nm apart because this is the distance between adjacent sugars in the DNA backbone. Because H2O molecules would be accessible to the spaces between the hydrophobic surfaces of the bases, this conformation is energetically unfavorable. This ladderlike structure converts to a helix when given a simple right-handed twist. Helical twisting brings the base-pair rungs of the ladder closer together, stacking them 0.34 nm apart, without affecting the sugar–sugar distance of 0.6 nm. Because this helix repeats itself approximately every 10 bp, its pitch is 3.4 nm. This is the major conformation of DNA in solution and it is called B-DNA.
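The relationship between base-pair spacing, repeat, and pitch is simple enough to check numerically. The sketch below uses the rounded values given in the text (0.34 nm rise and 10 bp per turn); the function names are illustrative.

```python
# Sketch: pitch and helical twist from rise per base pair and bp per turn.
def pitch_nm(rise_nm, bp_per_turn):
    """Axial distance covered by one complete helical turn."""
    return rise_nm * bp_per_turn


def helical_twist_deg(bp_per_turn):
    """Average rotation between successive base pairs."""
    return 360 / bp_per_turn


if __name__ == "__main__":
    print(f"pitch: {pitch_nm(0.34, 10):.1f} nm")
    print(f"twist: {helical_twist_deg(10):.0f} degrees per base pair")
    # Compare with the hypothetical untwisted ladder at 0.6 nm per rung.
    print(f"ladder length for 10 bp: {0.6 * 10:.1f} nm")
```

Ten rungs of the untwisted ladder would span 6 nm, so twisting shortens the same stretch to 3.4 nm while closing the gaps between the hydrophobic faces of the bases.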
Figure 12.9 (a) Double-stranded DNA as an imaginary ladderlike structure. (b) A simple right-handed twist converts the ladder to a helix.
Structural Equivalence of Watson–Crick Base Pairs
As indicated in Chapter 11, the base pairing in DNA is very specific: the purine adenine pairs with the pyrimidine thymine; the purine guanine pairs with the pyrimidine cytosine. Further, the A:T pair and G:C pair have virtually identical dimensions (Figure 12.10). Watson and Crick realized that units of such similarity could serve as spatially invariant substructures to build a polymer whose exterior dimensions would be uniform along its length, regardless of the sequence of bases.
Figure 12.10 Watson–Crick A:T and G:C base pairs. All H bonds in both base pairs are straight, with each H atom pointing directly at its acceptor N or O atom. Linear H bonds are the strongest. The mandatory binding of larger purines with smaller pyrimidines leads to base pairs that have virtually identical dimensions, allowing the two sugar–phosphate backbones to adopt identical helical conformations.
The DNA Double Helix Is a Stable Structure
Several factors account for the stability of the double-helical structure of DNA. First, both internal and external hydrogen bonds stabilize the double helix. The two strands of DNA are held together by H-bonds that form between the complementary purines and pyrimidines, two in an A:T pair and three in a G:C pair (Figure 12.10), while polar atoms in the sugar-phosphate backbone form external H bonds with surrounding water molecules. Second, the negatively charged phosphate groups are all situated on the exterior surface of the helix in such a way that they have minimal effect on one another and are free to interact electrostatically with cations in solution such as Mg2+. Third, the core of the helix consists of the base pairs, which, in addition to being H-bonded, stack together through hydrophobic interactions and van der Waals forces that contribute significantly to the overall stabilizing energy.
Figure 12.11 The bases in a base pair are not directly across the helix axis from one another along some diameter but rather are slightly displaced. This displacement, and the relative orientation of the glycosidic bonds linking the bases to the sugar–phosphate backbone, leads to differently sized grooves in the cylindrical column created by the double helix, the major groove and the minor groove, each coursing along its length.
A stereochemical consequence of the way A:T and G:C base pairs form is that the sugars of the respective nucleotides have opposite orientations, and thus the sugar–phosphate backbones of the two chains run in opposite or “antiparallel” directions. Furthermore, the two glycosidic bonds holding the bases in each base pair are not directly across the helix from each other, defining a common diameter (Figure 12.11). Consequently, the sugar–phosphate backbones of the helix are not equally spaced along the helix axis, and the grooves between them are not the same size. Instead, the intertwined chains create a major groove and a minor groove (Figure 12.11). The edges of the base pairs have a specific relationship to these grooves. The “top” edges of the base pairs (“top” as defined by placing the glycosidic bond at the bottom, as in Figure 12.10) are exposed along the interior surface or “floor” of the major groove; the base-pair edges nearest to the glycosidic bond form the interior surface of the minor groove. Some proteins that bind to DNA can actually recognize specific nucleotide sequences by “reading” the pattern of H-bonding possibilities presented by the edges of the bases in these grooves. Such DNA–protein interactions provide one step toward understanding how cells regulate the expression of genetic information encoded in DNA (see Chapter 32).
Conformational Variation in Double-Helical Structures
In solution, DNA ordinarily assumes the structure we have been discussing: B-DNA. However, nucleic acids also occur naturally in other double-helical forms. The base-pairing arrangement remains the same, but the sugar–phosphate groupings that constitute the backbone are inherently flexible and can adopt different conformations. One conformational variation is propeller twist (Figure 12.12). Propeller twist allows greater overlap between successive bases along a strand of DNA and diminishes the area of contact between bases and solvent water.
Figure 12.12 Helical twist and propeller twist in DNA. (a) Successive base pairs in B-DNA show a rotation with respect to each other (so-called helical twist) of 36° or so, as viewed down the cylindrical axis of the DNA. (b) Rotation in a different dimension (propeller twist) allows the hydrophobic surfaces of bases to overlap better. The view here is edge-on to two successive bases in one DNA strand (as if the two bases on the right-hand strand of DNA in (a) were viewed from the right-hand margin of the page; dots represent end-on views down the glycosidic bonds). Clockwise rotation (as shown here) has a positive sign. (c) The two bases on the left-hand strand of DNA in (a) also show positive propeller twist (a clockwise rotation of the two bases in (a) as viewed from the left-hand margin of the paper). (Adapted from Figure 3.4 in Calladine, C. R., and Drew, H. R., 1992. Understanding DNA: The Molecule and How It Works.
Alternative Form of Right-Handed DNA
An alternative form of the right-handed double helix is A-DNA. A-DNA molecules differ in a number of ways from B-DNA. The pitch, or distance required to complete one helical turn, is different. In B-DNA, it is 3.4 nm, whereas in A-DNA it is 2.46 nm. One turn in A-DNA requires 11 bp to complete. Depending on local sequence, 10 to 10.6 bp define one helical turn in B-form DNA. In A-DNA, the base pairs are no longer nearly perpendicular to the helix axis but instead are tilted 19° with respect to this axis. Successive base pairs occur every 0.23 nm along the axis, as opposed to 0.332 nm in B-DNA. The B-form of DNA is thus longer and thinner than the short, squat A-form, which has its base pairs displaced around, rather than centered on, the helix axis. Figure 12.13 shows the relevant structural characteristics of the A- and B-forms of DNA. (Z-DNA, another form of DNA to be discussed shortly, is also depicted in Figure 12.13.) A comparison of the structural properties of A-, B-, and Z-DNA is summarized in Table 12.1.
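The statement that B-DNA is longer than A-DNA for the same number of base pairs follows directly from the axial rise per base pair. A minimal sketch, using the rise values quoted above and an illustrative function name:

```python
# Sketch: axial length of a duplex from its rise per base pair.
RISE_NM = {"A": 0.23, "B": 0.332}  # values quoted in the text


def axial_length_nm(n_bp, form):
    """Length along the helix axis of n_bp base pairs in the given form."""
    return n_bp * RISE_NM[form]


if __name__ == "__main__":
    n_bp = 1000
    for form in ("A", "B"):
        print(f"{n_bp} bp of {form}-DNA: {axial_length_nm(n_bp, form):.0f} nm")
    ratio = RISE_NM["A"] / RISE_NM["B"]
    print(f"A-form is about {ratio:.0%} of the B-form length")
```

On these figures a stretch of A-form duplex is roughly 70% as long as the same sequence in the B-form.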


Figure 12.13 (here and on the facing page) Comparison of the A-, B-, and Z-forms of the DNA double helix. The distance required to complete one helical turn is shorter in A-DNA than it is in B-DNA. The alternating pyrimidine–purine sequence of Z-DNA is the key to the “left-handedness” of this helix. (Robert Stodola,
Although relatively dehydrated DNA fibers can be shown to adopt the A-conformation under physiological conditions, it is unclear whether DNA ever assumes this form in vivo. However, double-helical DNA:RNA hybrids probably have an A-like conformation. The 2'-OH in RNA sterically prevents double-helical regions of RNA chains from adopting the B-form helical arrangement. Importantly, double-stranded regions in RNA chains assume an A-like conformation, with their bases strongly tilted with respect to the helix axis.
Z-DNA: A Left-Handed Double Helix
Z-DNA was first recognized by Alexander Rich and his colleagues at MIT in X-ray analysis of the synthetic deoxynucleotide dCpGpCpGpCpG, which crystallized into an antiparallel double helix of unexpected conformation. The alternating pyrimidine–purine (Py-Pu) sequence of this oligonucleotide is the key to its unusual properties. The N-glycosyl bonds of G residues in this alternating copolymer are rotated 180° with respect to their conformation in B-DNA, so that now the purine ring is in the syn rather than the anti conformation (Figure 12.14). The C residues remain in the anti form. Because the G ring is “flipped,” the C ring must also flip to maintain normal Watson–Crick base pairing. However, pyrimidine nucleosides do not readily adopt the syn conformation because it creates steric interference between the pyrimidine C-2 oxy substituent and atoms of the pentose.

Figure 12.14 Comparison of the deoxyguanosine conformation in B- and Z-DNA. In B-DNA, the C1'–N-9 glycosyl bond is always in the anti position (left). In contrast, in the left-handed Z-DNA structure, this bond rotates (as shown) to adopt the syn conformation.
Because the cytosine ring does not rotate relative to the pentose, the whole C nucleoside (base and sugar) must flip 180° (Figure 12.15). It is topologically possible for the G to go syn and the C nucleoside to undergo rotation by 180° without breaking and re-forming the G:C hydrogen bonds. In other words, the B to Z structural transition can take place without disruption of the bonding relationships among the atoms involved.
Figure 12.15 The change in topological relationships of base pairs from B- to Z-DNA. A six-base-pair segment of B-DNA is converted to Z-DNA through rotation of the base pairs, as indicated by the curved arrows. The purine rings (green) of the deoxyguanosine nucleosides rotate via an anti to syn change in the conformation of the guanine–deoxyribose glycosidic bond; the pyrimidine rings (blue) are rotated by flipping the entire deoxycytidine nucleoside (base and deoxyribose). As a consequence of these conformational changes, the base pairs in the Z-DNA region no longer share π,π stacking interactions with adjacent B-DNA regions.
Because alternate nucleotides assume different conformations, the repeating unit on a given strand in the Z-helix is the dinucleotide. That is, for any number of bases, n, along one strand, n − 1 dinucleotides must be considered. For example, a GpCpGpC subset of sequence along one strand comprises three successive dinucleotide units: GpC, CpG, and GpC. (In B-DNA, the nucleotide conformations are essentially uniform and the repeating unit is the mononucleotide.) It follows that the CpG sequence is distinct conformationally from the GpC sequence along the alternating copolymer chains in the Z-double helix. The conformational alterations going from B to Z realign the sugar–phosphate backbone along a zigzag course that has a left-handed orientation (Figure 12.13), hence the designation Z-DNA. Note that in any GpCpGp subset, the sugar–phosphates of GpC form the horizontal “zig” while the CpG backbone segment forms the vertical “zag.” The mean rotation angle circumscribed around the helix axis is −15° for a CpG step and −45° for a GpC step (giving −60° for the dinucleotide repeat). The minus sign denotes a left-handed or counterclockwise rotation about the helix axis. Z-DNA is more elongated and slimmer than B-DNA.
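The dinucleotide bookkeeping and the rotation angles can be made concrete with a short sketch. It assumes a strictly alternating G/C strand and uses the mean step rotations quoted above; the function names are illustrative.

```python
# Sketch: dinucleotide steps and cumulative rotation along a Z-DNA strand.
STEP_ROTATION_DEG = {"CpG": -15, "GpC": -45}  # negative = left-handed


def dinucleotide_steps(strand):
    """List the n - 1 overlapping dinucleotide steps of an n-base strand."""
    return [f"{first}p{second}" for first, second in zip(strand, strand[1:])]


def total_rotation_deg(strand):
    """Sum of the step rotations; raises KeyError for non-alternating steps."""
    return sum(STEP_ROTATION_DEG[step] for step in dinucleotide_steps(strand))


if __name__ == "__main__":
    print(dinucleotide_steps("GCGC"))
    print("rotation over GCGC:", total_rotation_deg("GCGC"), "degrees")
    per_repeat = STEP_ROTATION_DEG["CpG"] + STEP_ROTATION_DEG["GpC"]
    repeats_per_turn = 360 // abs(per_repeat)
    print("base pairs per turn:", 2 * repeats_per_turn)
```

With −60° per dinucleotide repeat, six repeats complete one left-handed turn, which corresponds to 12 base pairs per turn of the Z-helix.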
Cytosine Methylation and Z-DNA
The Z-form can arise in sequences that are not strictly alternating Py–Pu. For example, the hexanucleotide m5CGATm5CG, a Py-Pu-Pu-Py-Py-Pu sequence containing two 5-methylcytosines (m5C), crystallizes as Z-DNA. Indeed, the in vivo methylation of C at the 5-position is believed to favor a B to Z switch because, in B-DNA, these hydrophobic methyl groups would protrude into the aqueous environment of the major groove and destabilize its structure. In Z-DNA, the same methyl groups can form a stabilizing hydrophobic patch. It is likely that the Z-conformation naturally occurs in specific regions of cellular DNA, which otherwise is predominantly in the B-form. Furthermore, because methylation is implicated in gene regulation, the occurrence of Z-DNA may affect the expression of genetic information (see Part IV, Information Transfer).
The Double Helix in Solution
The long-range structure of B-DNA in solution is not a rigid, linear rod. Instead, DNA behaves as a dynamic, flexible molecule. Localized thermal fluctuations temporarily distort and deform DNA structure over short regions. Base and backbone ensembles of atoms undergo elastic motions on a time scale of nanoseconds. To some extent, these effects represent changes in rotational angles of the bonds comprising the polynucleotide backbone. These changes are also influenced by sequence-dependent variations in base-pair stacking. The consequence is that the helix bends gently. When these variations are summed over the great length of a DNA molecule, the net result of these bending motions is that, at any given time, the double helix assumes a roughly spherical shape, as might be expected for a long, semi-rigid rod undergoing apparently random coiling. It is also worth noting that, on close scrutiny, the surface of the double helix is not that of a totally featureless, smooth, regular “barber pole” structure. Different base sequences impart their own special signatures to the molecule by subtle influences on such factors as the groove width, the angle between the helix axis and base planes, and the mechanical rigidity. Certain regulatory proteins bind to specific DNA sequences and participate in activating or suppressing expression of the information encoded therein. These proteins bind at unique sites by virtue of their ability to recognize novel structural characteristics imposed on the DNA by the local nucleotide sequence.
Intercalating Agents Distort the Double Helix
Aromatic macrocycles, flat hydrophobic molecules composed of fused, heterocyclic rings, such as ethidium bromide, acridine orange, and actinomycin D (Figure 12.16), can insert between the stacked base pairs of DNA. The bases are forced apart to accommodate these so-called intercalating agents, causing an unwinding of the helix to a more ladderlike structure. The deoxyribose–phosphate backbone is almost fully extended as successive base pairs are displaced 0.7 nm from one another, and the rotational angle about the helix axis between adjacent base pairs is reduced from 36° to 10°.
Figure 12.16 The structures of ethidium bromide, acridine orange, and actinomycin D, three intercalating agents, and their effects on DNA structure.
Dynamic Nature of the DNA Double Helix in Solution
Intercalating substances insert with ease into the double helix, indicating that the van der Waals bonds they form with the bases sandwiching them are more favorable than similar bonds between the bases themselves. Furthermore, the fact that these agents slip in suggests that the double helix must temporarily unwind and present gaps for these agents to occupy. That is, the DNA double helix in solution must be represented by a set of metastable alternatives to the standard B-conformation. These alternatives constitute a flickering repertoire of dynamic structures.
12.3 · Denaturation and Renaturation of DNA
Thermal Denaturation and Hyperchromic Shift
When duplex DNA molecules are subjected to conditions of pH, temperature, or ionic strength that disrupt hydrogen bonds, the strands are no longer held together. That is, the double helix is denatured and the strands separate as individual random coils. If temperature is the denaturing agent, the double helix is said to melt. The course of this dissociation can be followed spectrophotometrically because the relative absorbance of the DNA solution at 260 nm increases as much as 40% as the bases unstack. This absorbance increase, or hyperchromic shift, is due to the fact that the aromatic bases in DNA interact via their π electron clouds when stacked together in the double helix. Because the UV absorbance of the bases is a consequence of π electron transitions, and because the potential for these transitions is diminished when the bases stack, the bases in duplex DNA absorb less 260-nm radiation than expected for their numbers. Unstacking alleviates this suppression of UV absorbance. The rise in absorbance coincides with strand separation, and the midpoint of the absorbance increase is termed the melting temperature, Tm (Figure 12.17).
Figure 12.17 Heat denaturation of DNA from various sources, so-called melting curves. The midpoint of the melting curve is defined as the melting temperature, Tm. (From Marmur, J., 1959. Nature 183:1427–1429.)
DNAs differ in their Tm values because they differ in relative G + C content. The higher the G + C content of a DNA, the higher its melting temperature because G:C pairs are held by three H bonds whereas A:T pairs have only two. The dependence of Tm on the G + C content is depicted in Figure 12.18. Also note that Tm is dependent on the ionic strength of the solution; the lower the ionic strength, the lower the melting temperature. At 0.2 M Na+, Tm = 69.3 + 0.41(% G + C). Ions suppress the electrostatic repulsion between the negatively charged phosphate groups in the complementary strands of the helix, thereby stabilizing it. (DNA in pure water melts even at room temperature.) At high concentrations of ions, Tm is raised and the transition between helix and coil is sharp.
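The linear relationship quoted above can be turned into a small calculation. The following is a minimal sketch in Python, assuming the formula Tm = 69.3 + 0.41(% G + C) at 0.2 M Na+ exactly as given in the text; the function names are illustrative only, and applying the relationship to a short example sequence is an assumption made purely for demonstration, since the relationship describes long natural DNAs.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G + C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def melting_temperature_c(percent_gc: float) -> float:
    """Estimate Tm in degrees C at 0.2 M Na+ from Tm = 69.3 + 0.41(% G + C)."""
    if not 0.0 <= percent_gc <= 100.0:
        raise ValueError("percent_gc must lie between 0 and 100")
    return 69.3 + 0.41 * percent_gc


# Tm rises linearly with G + C content.
for pct in (30.0, 50.0, 70.0):
    print(f"{pct:.0f}% G + C -> Tm = {melting_temperature_c(pct):.1f} C")
```

Because the intercept and slope both depend on ionic strength, the numbers are meaningful only for the stated salt condition.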
Figure 12.18 The dependence of melting temperature on relative (G + C) content in DNA. Note that Tm increases if ionic strength is raised at constant pH (pH 7); 0.01 M phosphate + 0.001 M EDTA versus 0.15 M NaCl/0.015 M Na citrate. In 0.15 M NaCl/0.015 M Na citrate, duplex DNA consisting of 100% A : T pairs melts at less than 70°C, whereas DNA of 100% G : C has a Tm greater than 110°C. (From Marmur, J., and Doty, P., 1962. Journal of Molecular Biology 5:120.)
pH Extremes or Strong H-Bonding Solutes Also Denature DNA Duplexes
At pH values greater than 10, extensive deprotonation of the bases occurs, destroying their hydrogen bonding potential and denaturing the DNA duplex. Similarly, extensive protonation of the bases below pH 2.3 disrupts base pairing. Alkali is the preferred denaturant because, unlike acid, it does not hydrolyze the glycosidic linkages in the sugar–phosphate backbone. Small solutes that readily form H bonds are also DNA denaturants at temperatures below Tm if present in sufficiently high concentrations to compete effectively with the H-bonding between the base pairs. Examples include formamide and urea.
DNA Renaturation
Denatured DNA will renature to re-form the duplex structure if the denaturing conditions are removed (that is, if the solution is cooled, the pH is returned to neutrality, or the denaturants are diluted out). Renaturation requires reassociation of the DNA strands into a double helix, a process termed reannealing. For this to occur, the strands must realign themselves so that their complementary bases are once again in register and the helix can be zippered up (Figure 12.19). Renaturation is dependent both on DNA concentration and time. Many of the realignments are imperfect, and thus the strands must dissociate again to allow for proper pairings to be formed. The process occurs more quickly if the temperature is warm enough to promote diffusion of the large DNA molecules but not so warm as to cause melting.
Figure 12.19 Steps in the thermal denaturation and renaturation of DNA. The nucleation phase of the reaction is a second-order process depending on sequence alignment of the two strands. This process takes place slowly because it takes time for complementary sequences to encounter one another in solution and then align themselves in register. Once the sequences are aligned, the strands zipper up quickly.
Renaturation Rate and DNA Sequence Complexity—c0t Curves
The renaturation rate of DNA is an excellent indicator of the sequence complexity of DNA. For example, bacteriophage T4 DNA contains about 2 × 10⁵ nucleotide pairs, whereas Escherichia coli DNA possesses 4.64 × 10⁶. E. coli DNA is considerably more complex in that it encodes more information. Expressed another way, for any given amount of DNA (in grams), the sequences represented in an E. coli sample are more heterogeneous, that is, more dissimilar from one another, than those in an equal weight of phage T4 DNA. Therefore, it will take the E. coli DNA strands longer to find their complementary partners and reanneal. This situation can be analyzed quantitatively.
If c is the concentration of single-stranded DNA at time t, then the second-order rate equation for two complementary strands coming together is given by the rate of decrease in c:
$$-\frac{dc}{dt} = k_2 c^2$$
where k2 is the second-order rate constant. Starting with a concentration, c0, of completely denatured DNA at t = 0, the amount of single-stranded DNA remaining at some time t is
$$\frac{c}{c_0} = \frac{1}{1 + k_2 c_0 t}$$
where the units of c are mol of nucleotide per L and t is in seconds. The time for half of the DNA to renature (when c/c0 = 0.5) is defined as t = t1/2. Then,
$$0.5 = \frac{1}{1 + k_2 c_0 t_{1/2}} \quad \text{and thus} \quad 1 + k_2 c_0 t_{1/2} = 2$$
yielding
$$c_0 t_{1/2} = \frac{1}{k_2}$$
A graph of the fraction of single-stranded DNA reannealed (c/c0) as a function of c0t on a semilogarithmic plot is referred to as a c0t (pronounced “cot”) curve (Figure 12.20). The rate of reassociation can be followed spectrophotometrically by the UV absorbance decrease as duplex DNA is formed. Note that relatively more complex DNAs take longer to renature, as reflected by their greater c0t1/2 values. Poly A and poly U (Figure 12.20) are minimally complex in sequence and anneal rapidly to form a double-stranded A : U polynucleotide. Mouse satellite DNA is a highly repetitive subfraction of mouse DNA. Its lack of sequence heterogeneity is seen in its low c0t1/2 value. MS-2 is a small bacteriophage whose genetic material is RNA. Calf thymus DNA is the mammalian representative in Figure 12.20.
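The second-order expression for c/c0 can be evaluated directly to see why more complex DNAs have larger c0t1/2 values. The following is a minimal sketch in Python; the two rate constants are hypothetical values chosen only to illustrate the trend and are not measurements for any real DNA.

```python
def fraction_single_stranded(c0t: float, k2: float) -> float:
    """Fraction of DNA still single-stranded: c/c0 = 1 / (1 + k2 * c0t)."""
    if c0t < 0 or k2 <= 0:
        raise ValueError("c0t must be non-negative and k2 must be positive")
    return 1.0 / (1.0 + k2 * c0t)


def c0t_half(k2: float) -> float:
    """c0t value at which half of the DNA has renatured: c0t1/2 = 1 / k2."""
    if k2 <= 0:
        raise ValueError("k2 must be positive")
    return 1.0 / k2


# Hypothetical rate constants: a simple sequence reanneals with a large k2,
# a complex genome with a small k2.
examples = {"simple DNA (hypothetical)": 1.0e3, "complex DNA (hypothetical)": 1.0e-1}

for name, k2 in examples.items():
    half = c0t_half(k2)
    # Sample the curve one decade below, at, and one decade above c0t1/2,
    # mirroring the logarithmic c0t axis of a c0t curve.
    points = [(x, fraction_single_stranded(x, k2)) for x in (half / 10, half, half * 10)]
    print(name, "c0t1/2 =", half, points)
```

On a logarithmic c0t axis every such curve has the same shape; only its position along the axis, set by 1/k2, changes.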
Figure 12.20 These c0t curves show the rates of reassociation of denatured DNA from various sources and illustrate how the rate of reassociation is inversely proportional to genome complexity. The nucleic acid sources are as follows: poly A + poly U, a synthetic duplex of poly A and poly U polynucleotide chains; mouse satellite DNA, a fraction of mouse DNA in which the same sequence is repeated many thousands of times; MS-2 dsRNA, the double-stranded form of RNA found during replication of MS-2, a simple bacteriophage; T4 DNA, the DNA of a more complex bacteriophage; E. coli DNA, bacterial DNA; calf DNA (nonrepetitive fraction), mammalian DNA (calf) from which the highly repetitive DNA fraction (satellite DNA) has been removed. Arrows indicate the genome size (in bp) of the various DNAs. (From Britten, R. J., and Kohne, D. E., 1968. Science 161:529–540.)
Nucleic Acid Hybridization
If DNAs from two different species are mixed, denatured, and allowed to cool slowly so that reannealing can occur, artificial hybrid duplexes may form, provided the DNA from one species is similar in nucleotide sequence to the DNA of the other. The degree of hybridization is a measure of the sequence similarity or relatedness between the two species. Depending on the conditions of the experiment, about 25% of the DNA from a human forms hybrids with mouse DNA, implying that some of the nucleotide sequences (genes) in humans are very similar to those in mice. Mixed RNA : DNA hybrids can be created in vitro if single-stranded DNA is allowed to anneal with RNA copies of itself, such as those formed when genes are transcribed into mRNA molecules.
Nucleic acid hybridization is a commonly employed procedure in molecular biology. First, it can reveal evolutionary relationships. Second, it gives researchers the power to identify specific genes selectively against a vast background of irrelevant genetic material. An appropriately labeled oligo- or polynucleotide, referred to as a probe, is constructed so that its sequence is complementary to a target gene. The probe specifically base pairs with the target gene, allowing identification and subsequent isolation of the gene. Also, the quantitative expression of genes (in terms of the amount of mRNA synthesized) can be assayed by hybridization experiments.
Buoyant Density of DNA
Not only the melting temperature of DNA but also its density in solution is dependent on relative G:C content. G:C-rich DNA has a significantly higher density than A:T-rich DNA. Furthermore, a linear relationship exists between the buoyant densities of DNA from different sources and their G:C content (Figure 12.21). The density of DNA, ρ (in g/mL), as a function of its G:C content is given by the equation ρ = 1.660 + 0.098(GC), where (GC) is the mole fraction of (G + C) in the DNA. Because of its relatively high density, DNA can be purified from cellular material by a form of density gradient centrifugation known as isopycnic centrifugation (see Appendix to this chapter).
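The density relationship can be used in either direction: to predict a buoyant density from base composition, or to estimate base composition from a measured density. The following is a minimal sketch in Python, assuming only the equation 1.660 + 0.098(GC) given in the text; the function names are illustrative.

```python
def buoyant_density(gc_mole_fraction: float) -> float:
    """Buoyant density in g/mL from density = 1.660 + 0.098 * (GC mole fraction)."""
    if not 0.0 <= gc_mole_fraction <= 1.0:
        raise ValueError("gc_mole_fraction must lie between 0 and 1")
    return 1.660 + 0.098 * gc_mole_fraction


def gc_from_density(density_g_per_ml: float) -> float:
    """Invert the same linear relationship to estimate the G + C mole fraction."""
    return (density_g_per_ml - 1.660) / 0.098


for gc in (0.25, 0.50, 0.75):
    print(f"GC = {gc:.2f} -> density = {buoyant_density(gc):.4f} g/mL")
```

The whole range from pure A:T to pure G:C spans only 0.098 g/mL, which is why equilibrium density gradients are needed to resolve DNAs of different composition.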
Figure 12.21 The relationship of the densities (in g/mL) of DNAs from various sources and their G:C content. (From Doty, P., 1961. Harvey Lectures 55:103.)
12.4 · Supercoils and Cruciforms: Tertiary Structure in DNA
The conformations of DNA discussed thus far are variations sharing a common secondary structural theme, the double helix, in which the DNA is assumed to be in a regular, linear form. DNA can also adopt regular structures of higher complexity in several ways. For example, many DNA molecules are circular. Most, if not all, bacterial chromosomes are covalently closed, circular DNA duplexes, as are almost all plasmid DNAs. Plasmids are naturally occurring, self-replicating, circular, extrachromosomal DNA molecules found in bacteria; plasmids carry genes specifying novel metabolic capacities advantageous to the host bacterium. Various animal virus DNAs are circular as well.
Supercoils
In duplex DNA, the two strands are wound about each other once every 10 bp, that is, once every turn of the helix. Double-stranded circular DNA (or linear DNA duplexes whose ends are not free to rotate) forms supercoils if the strands are underwound (negatively supercoiled) or overwound (positively supercoiled) (Figure 12.22). Underwound duplex DNA has fewer than the natural number of turns, whereas overwound DNA has more. DNA supercoiling is analogous to twisting or untwisting a two-stranded rope so that it is torsionally stressed. Negative supercoiling introduces a torsional stress that favors unwinding of the right-handed B-DNA double helix, while positive supercoiling overwinds such a helix. Both forms of supercoiling compact the DNA so that it sediments faster upon ultracentrifugation or migrates more rapidly in an electrophoretic gel in comparison to relaxed DNA (DNA that is not supercoiled).
Figure 12.22 Toroidal and interwound varieties of DNA supercoiling. (a) The DNA is coiled in a spiral fashion about an imaginary toroid. (b) The DNA interwinds and wraps about itself. (c) Supercoils in long, linear DNA arranged into loops whose ends are restrained, a model for chromosomal DNA. (Adapted from Figures 6.1 and 6.2 in Calladine, C. R., and Drew, H. R., 1992. Understanding DNA: The Molecule and How It Works.
Linking Number
The basic parameter characterizing supercoiled DNA is the linking number (L). This is the number of times the two strands are intertwined, and, provided both strands remain covalently intact, L cannot change. In a relaxed circular DNA duplex of 400 bp, L is 40 (assuming 10 bp per turn in B-DNA). The linking number for relaxed DNA is usually taken as the reference parameter and is written as L0. L can be equated to the twist (T) and writhe (W) of the duplex, where twist is the number of helical turns and writhe is the number of supercoils:
L = T + W
Figure 12.23 Supercoiled DNA topology. (Adapted from Figures 6.5 and 6.6 in Calladine, C. R., and Drew, H. R., 1992. Understanding DNA: The Molecule and How It Works.
Figure 12.23a shows the values of T and W for various positively and negatively supercoiled circular DNAs. In any closed, circular DNA duplex that is relaxed, W = 0. A relaxed circular DNA of 400 bp has 40 helical turns, T = L = 40. This linking number can only be changed by breaking one or both strands of the DNA, winding them tighter or looser, and rejoining the ends. Enzymes capable of carrying out such reactions are called topoisomerases because they change the topological state of DNA. Topoisomerases fall into two basic classes, I and II. Topoisomerases of the I type cut one strand of a DNA double helix, pass the other strand through, and then rejoin the cut ends. Topoisomerase II enzymes cut both strands of a dsDNA, pass a region of the DNA duplex between the cut ends, and then rejoin the ends (Figure 12.24). Topoisomerases are important players in DNA replication (see Chapter 30).
Figure 12.24 A simple model for the action of bacterial DNA gyrase (topoisomerase II). The A-subunits cut the DNA duplex and then hold onto the cut ends. Conformational changes occur in the enzyme that allow a continuous region of the DNA duplex to pass between the cut ends and into an internal cavity of the protein. The cut ends are then re-ligated, and the intact DNA duplex is released from the enzyme. The released intact circular DNA now contains two negative supercoils as a consequence of DNA gyrase action.
DNA Gyrase
The bacterial enzyme DNA gyrase is a topoisomerase that introduces negative supercoils into DNA in the manner shown in Figure 12.24. Suppose DNA gyrase puts four negative supercoils into the 400-bp circular duplex; then W = -4, T remains the same (T = 40), and L = 36 (Figure 12.25). In actuality, the negative supercoils cause a torsional stress on the molecule so that T tends to decrease; that is, the helix becomes a bit unwound so that base pairs are separated. The extreme would be that T would decrease by 4 and the supercoiling would be removed (T = 36, L = 36, and W = 0). Usually the real situation is a compromise in which the negative value of W is reduced, T decreases slightly, and these changes are distributed over the length of the circular duplex so that no localized unwinding of the helix ensues. Although the parameters T and W are conceptually useful, neither can be measured experimentally at the present time.
Figure 12.25 A 400-bp circular DNA molecule in different topological states: (a) relaxed, (b) negative supercoils distributed over the entire length, and (c) negative supercoils creating a localized single-stranded region. Negative supercoiling has the potential to cause localized unwinding of the DNA double helix so that single-stranded regions (or bubbles) are created.
Superhelix Density
The difference between the linking number of a DNA and the linking number of its relaxed form is ΔL: ΔL = (L - L0). In our example with four negative supercoils, ΔL = -4. The superhelix density or specific linking difference is defined as ΔL/L0 and is sometimes termed sigma, σ. For our example, σ = -4/40, or -0.1. As a ratio, σ is a measure of supercoiling that is independent of length. Its sign reflects whether the supercoiling tends to unwind (negative σ) or overwind (positive σ) the helix. In other words, the superhelix density states the number of supercoils per 10 bp, which also is the same as the number of supercoils per B-DNA repeat. Circular DNA isolated from natural sources is always found in the underwound, negatively supercoiled state.
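The bookkeeping for the 400-bp example can be checked in a few lines. The following is a minimal sketch in Python, assuming 10 bp per turn for relaxed B-DNA as the text does; the function names are illustrative.

```python
def relaxed_linking_number(length_bp: int, bp_per_turn: float = 10.0) -> float:
    """L0 for a relaxed circular duplex: one link per helical turn."""
    return length_bp / bp_per_turn


def linking_number(twist: float, writhe: float) -> float:
    """L = T + W."""
    return twist + writhe


def superhelix_density(L: float, L0: float) -> float:
    """Specific linking difference: (L - L0) / L0."""
    return (L - L0) / L0


L0 = relaxed_linking_number(400)            # relaxed circle: T = L0, W = 0
L = linking_number(twist=40.0, writhe=-4.0)  # gyrase adds four negative supercoils

print("L0 =", L0, " L =", L, " delta L =", L - L0)
print("superhelix density =", superhelix_density(L, L0))

# The other extreme from the text: the stress is absorbed entirely by
# unwinding the helix (T = 36, W = 0). L, and therefore the superhelix
# density, is unchanged because no strand has been broken.
assert linking_number(twist=36.0, writhe=0.0) == L
```

The final assertion makes the central point of the passage explicit: T and W can trade off against each other, but L changes only when a topoisomerase cuts and rejoins the DNA.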
Toroidal Supercoiled DNA
Negatively supercoiled DNA can arrange into a toroidal state (Figure 12.26). The toroidal state of negatively supercoiled DNA is stabilized by wrapping around proteins which serve as spools for the DNA “ribbon.” This toroidal conformation of DNA is found in protein : DNA interactions that are the basis of phenomena as diverse as chromosome structure (see Figure 12.31) and gene expression.
Figure 12.26 Supercoiled DNA in a toroidal form wraps readily around protein “spools.” A twisted segment of linear DNA with two negative supercoils (a) can collapse into a toroidal conformation if its ends are brought closer together (b). Wrapping the DNA toroid around a protein “spool” stabilizes this conformation of supercoiled DNA (c). (Adapted from Figure 6.6 in Calladine, C. R., and Drew, H. R., 1992. Understanding DNA: The Molecule and How It Works.
Cruciforms
Palindromes are words, phrases, or sentences that are the same when read backward or forward, such as “radar,” “sex at noon taxes,” “Madam, I’m Adam,” and “a man, a plan, a canal, Panama.” DNA sequences that are inverted repeats, or palindromes, have the potential to form a tertiary structure known as a cruciform (literally meaning “cross-shaped”) if the normal interstrand base pairing is replaced by intrastrand pairing (Figure 12.27). In effect, each DNA strand folds back on itself in a hairpin structure to align the palindrome in base-pairing register. Such cruciforms are never as stable as normal DNA duplexes because an unpaired segment must exist in the loop region. However, negative supercoiling causes a localized disruption of hydrogen bonding between base pairs in DNA and may promote formation of cruciform loops. Cruciform structures have a twofold rotational symmetry about their centers and potentially create distinctive recognition sites for specific DNA-binding proteins.
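Unlike a palindrome in ordinary language, a DNA palindrome reads the same 5' to 3' on both strands, which means the sequence equals its own reverse complement. The following is a minimal sketch in Python of that test; the function names and the example sequences are illustrative, and the check covers only perfect inverted repeats with no unpaired spacer between the two halves.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def is_inverted_repeat(sequence: str) -> bool:
    """True if the sequence is identical to its own reverse complement."""
    return sequence.upper() == reverse_complement(sequence)


for seq in ("GAATTC", "GATTACA"):
    print(seq, reverse_complement(seq), is_inverted_repeat(seq))
```

A sequence that passes this test can pair with itself within one strand, which is the intrastrand pairing a cruciform requires.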
Figure 12.27 The formation of a cruciform structure from a palindromic sequence within DNA. The self-complementary inverted repeats can rearrange to form hydrogen-bonded cruciform loops.
A typical human cell is 20 μm in diameter. Its genetic material consists of 23 pairs of dsDNA molecules in the form of chromosomes, the average length of which is 3 × 10⁹ bp/23 or 1.3 × 10⁸ nucleotide pairs. At 0.34 nm/bp in B-DNA, this represents a DNA molecule 5 cm long. Together, these 46 dsDNA molecules amount to more than 2 m of DNA that must be packaged into a nucleus perhaps 5 μm in diameter! Clearly, the DNA must be condensed by a factor of more than 10⁵. This remarkable task is accomplished by neatly wrapping the DNA around protein spools called nucleosomes and then packing the nucleosomes to form a helical filament that is arranged in loops associated with the nuclear matrix, a skeleton or scaffold of proteins providing a structural framework within the nucleus.
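The packaging arithmetic above can be reproduced step by step. The following is a minimal sketch in Python using only the figures quoted in the text; the variable names are illustrative, and the final ratio simply compares total contour length with nuclear diameter, which is a rough measure rather than a true packing ratio.

```python
HAPLOID_GENOME_BP = 3e9      # base pairs in one set of 23 chromosomes
CHROMOSOMES_PER_SET = 23
CHROMOSOMES_PER_CELL = 46    # 23 pairs
RISE_PER_BP_M = 0.34e-9      # 0.34 nm per base pair in B-DNA
NUCLEUS_DIAMETER_M = 5e-6    # about 5 micrometers

average_bp = HAPLOID_GENOME_BP / CHROMOSOMES_PER_SET
average_length_m = average_bp * RISE_PER_BP_M
total_length_m = average_length_m * CHROMOSOMES_PER_CELL
length_to_nucleus_ratio = total_length_m / NUCLEUS_DIAMETER_M

print(f"average chromosome: {average_bp:.2e} bp, {average_length_m * 100:.1f} cm")
print(f"total DNA per cell: {total_length_m:.2f} m")
print(f"length / nuclear diameter: {length_to_nucleus_ratio:.1e}")
```

The average chromosome works out to roughly 4.4 cm, which the text rounds to 5 cm, and the total of about 2 m exceeds the nuclear diameter by a factor of several hundred thousand.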
Nucleosomes
The DNA in a eukaryotic cell nucleus during the interphase between cell divisions exists as a nucleoprotein complex called chromatin. The proteins of chromatin fall into two classes: histones and nonhistone chromosomal proteins. Histones are abundant structural proteins, whereas the nonhistone class is represented only by a few copies each of many diverse proteins involved in genetic regulation. The histones are relatively small, positively charged arginine- or lysine-rich proteins that interact via ionic bonds with the negatively charged phosphate groups on the polynucleotide backbone. Five distinct histones are known: H1, H2A, H2B, H3, and H4 (Table 12.2). Pairs of histones H2A, H2B, H3, and H4 aggregate to form an octameric core structure, which is the core of the nucleosome, around which the DNA helix is wound (see Figure 11.23).
Figure 12.28 Electron micrograph of Drosophila melanogaster chromatin after swelling reveals the presence of nucleosomes as “beads on a string.” (Electron micrograph courtesy of Oscar L. Miller, Jr., of the
If chromatin is swelled suddenly in water and prepared for viewing in the electron microscope, the nucleosomes are evident as “beads on a string,” dsDNA being the string (Figure 12.28). The structure of the histone octamer core has been determined by X-ray crystallography without DNA by E. N. Moudrianakis’s laboratory (Figure 12.29) and wrapped with DNA by T. J. Richmond and collaborators (Figure 12.30).
Figure 12.29 Four orthogonal views of the histone octamer as determined by X-ray crystallography: (a) front view; (b) top view; and (c) disk view, that is, as viewed down the long axis of the chromatin fiber. In the (c) perspective, the DNA duplex would wrap around the octamer, with the axis of the DNA supercoil perpendicular to the plane of the picture. (d) Suggested appearance of the nucleosome when wrapped with DNA. (Photographs courtesy of Evangelos N. Moudrianakis of
The octamer (Figure 12.29) has surface landmarks that guide the course of the DNA around the octamer; 146 bp of B-DNA in a flat, left-handed superhelical conformation make 1.65 turns around the histone core (Figure 12.30), which itself is a protein superhelix consisting of a spiral array of the four histone dimers. Histone 1, a three-domain protein, serves to seal the ends of the DNA turns to the nucleosome core and to organize the additional 40 to 60 bp of DNA that link consecutive nucleosomes.
Figure 12.30 (a) Deduced structure of the nucleosome core particle wrapped with 1.65 turns of DNA (146 bp). The DNA is shown as a ribbon. (left) View down the axis of the nucleosome; (right) view perpendicular to the axis. (b) One-half of the nucleosome core particle with 73 bp of DNA, as viewed down the nucleosome axis. Note that the DNA does not wrap in a uniform circle about the histone core, but instead follows a course consisting of a series of somewhat straight segments separated by bends. (Adapted from Luger, C., et al., 1997.
Organization of Chromatin and Chromosomes
A higher order of chromatin structure is created when the nucleosomes, in their characteristic beads-on-a-string motif, are wound in the fashion of a solenoid having six nucleosomes per turn (Figure 12.31). The resulting 30-nm filament contains about 1200 bp in each of its solenoid turns. Interactions between the respective H1 components of successive nucleosomes stabilize the 30-nm filament. This 30-nm filament then forms long DNA loops of variable length, each containing on average between 60,000 and 150,000 bp. Electron microscopic analysis of human chromosome 4 suggests that 18 such loops are then arranged radially about the circumference of a single turn to form a miniband unit of the chromosome. According to this model, approximately 10⁶ of these minibands are arranged along a central axis in each of the chromatids of human chromosome 4 that form at mitosis (Figure 12.31). Despite intensive study, much of the higher-order structure of chromosomes remains a mystery.
Figure 12.31 A model for chromosome structure, human chromosome 4. The 2-nm DNA helix is wound twice around histone octamers to form 10-nm nucleosomes, each of which contains 160 bp (80 per turn). These nucleosomes are then wound in solenoid fashion with six nucleosomes per turn to form a 30-nm filament. In this model, the 30-nm filament forms long DNA loops, each containing about 60,000 bp, which are attached at their base to the nuclear matrix. Eighteen of these loops are then wound radially around the circumference of a single turn to form a miniband unit of a chromosome. Approximately 10⁶ of these minibands occur in each chromatid of human chromosome 4 at mitosis.
| Human Biochemistry | |
| Telomeres and Tumors | |
|
Eukaryotic chromosomes are linear. The ends of chromosomes have specialized structures known as telomeres. The telomeres of virtually all eukaryotic chromosomes consist of short, tandemly repeated nucleotide sequences at the ends of the chromosomal DNA. For example, the telomeres of human germline (sperm and egg) cells contain between 1000 and 1700 copies of the hexameric repeat TTAGGG (see figure). Telomeres are believed to be responsible for maintaining chromosomal integrity by protecting against DNA degradation or rearrangement. Telomeres are added to the ends of chromosomal DNA by an RNA-containing enzyme known as telomerase (Chapter 30); telomerase is an unusual DNA polymerase that was discovered in 1985 by Elizabeth Blackburn and Carol Greider of the University of California, San Francisco. |
However, most normal somatic cells lack telomerase. Consequently, upon every cycle of cell division when the cell replicates its DNA, about 50-nucleotide portions are lost from the end of each telomere. Thus, over time, the telomeres of somatic cells in animals become shorter and shorter, eventually leading to chromosome instability and cell death. This phenomenon has led some scientists to espouse a “telomere theory of aging” that implicates telomere shortening as the principal factor in cell, tissue, and even organism aging. Interestingly, cancer cells appear “immortal” because they continue to reproduce indefinitely. A survey of 20 different tumor types by Geron Corporation of Menlo Park, California, revealed that all contained telomerase activity. |
(a) Telomeres on human chromosomes consist of the hexanucleotide sequence TTAGGG repeated between 1000 and 1700 times. These TTAGGG tandem repeats are attached to the 3'-ends of the DNA strands and are paired with the complementary sequence 3'-AATCCC-5' on the other DNA strand. Thus, a G-rich region is created at the 3'-end of each DNA strand and a C-rich region is created at the 5'-end of each DNA strand. Typically, at each end of the chromosome, the G-rich strand protrudes 12 to 16 nucleotides beyond its complementary C-rich strand. (b) Like other telomerases, human telomerase is a ribonucleoprotein. The ribonucleic acid of human telomerase is an RNA molecule 962 nucleotides long. This RNA serves as the template for the DNA polymerase activity of telomerase. Nucleotides 46 to 56 of this RNA are CUAACCCUAAC and provide the template function for the telomerase-catalyzed addition of TTAGGG units to the 3'-end of a DNA strand. |
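The template relationship described in part (b) of the telomere figure legend can be verified by copying the RNA template into DNA. The following is a minimal sketch in Python, assuming simple Watson-Crick copying of the quoted template nucleotides 46 to 56; the function name is illustrative, and the sketch says nothing about how the enzyme repositions itself between rounds of addition.

```python
# RNA template base -> DNA base that pairs with it
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_copied_from_rna_template(rna_template: str) -> str:
    """DNA strand (written 5' to 3') complementary to an RNA template (written 5' to 3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_template.upper()))


template = "CUAACCCUAAC"  # nucleotides 46 to 56 of human telomerase RNA, as quoted
product = dna_copied_from_rna_template(template)

print(product)
print("contains TTAGGG:", "TTAGGG" in product)
```

The copied strand contains the TTAGGG hexamer, consistent with the statement that this stretch of the RNA provides the template for telomere repeat addition.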
12.6 · Chemical Synthesis of Nucleic Acids
Laboratory synthesis of oligonucleotide chains of defined sequence presents some of the same problems encountered in chemical synthesis of polypeptides (see Chapter 5). First, functional groups on the monomeric units (in this case, bases) are reactive under conditions of polymerization and therefore must be protected by blocking agents. Second, to generate the desired sequence, a phosphodiester bridge must be formed between the 3'-O of one nucleotide (B) and the 5'-O of the preceding one (A) in a way that precludes the unwanted bridging of the 3'-O of A with the 5'-O of B. Finally, recoveries at each step must be high so that overall yields in the multistep process are acceptable. As in peptide synthesis (see Chapter 5), solid phase methods are used to overcome some of these problems. Commercially available automated instruments, called DNA synthesizers or “gene machines,” are capable of carrying out the synthesis of oligonucleotides of 150 bases or more.
Phosphoramidite Chemistry
Phosphoramidite chemistry is currently the accepted method of oligonucleotide synthesis. The general strategy involves the sequential addition of nucleotide units as nucleoside phosphoramidite derivatives to a nucleoside covalently attached to the insoluble resin. Excess reagents, starting materials, and side products are removed after each step by filtration. After the desired oligonucleotide has been formed, it is freed of all blocking groups, hydrolyzed from the resin, and purified by gel electrophoresis. The four-step cycle is shown in Figure 12.32. Chemical synthesis takes place in the 3' → 5' direction (the reverse of the biological polymerization direction).
Figure 12.32 Solid phase oligonucleotide synthesis. The four-step cycle starts with the first base in nucleoside form (N-1) attached by its 3'-OH group to an insoluble, inert resin or matrix, typically either controlled pore glass (CPG) or silica beads. Its 5'-OH is blocked with a dimethoxytrityl (DMTr) group (a). If the base has reactive -NH2 functions, as in A, G, or C, then N-benzoyl or N-isobutyryl derivatives are used to prevent their reaction (b). In step 1, the DMTr protecting group is removed by trichloroacetic acid treatment. Step 2 is the coupling step: the second base (N-2) is added in the form of a nucleoside phosphoramidite derivative whose 5'-OH bears a DMTr blocking group so it cannot polymerize with itself (c). The presence of a weak acid, such as tetrazole, activates the phosphoramidite, and it rapidly reacts with the free 5'-OH of N-1, forming a dinucleotide linked by a phosphite group. Chemical synthesis thus takes place in the 3' → 5' direction. Unreacted free 5'-OHs of N-1 (usually only 2–6% of the total) are blocked from further participation in the polymerization process by acetylation with acetic anhydride in step 3, referred to as capping. The phosphite linkage between N-1 and N-2 is highly reactive and, in step 4, it is oxidized by aqueous iodine (I2) to form the desired more stable phosphate group. This completes the cycle. Subsequent cycles add successive residues to the resin-immobilized chain. When the chain is complete, it is cleaved from the support with NH4OH, which also removes the N-benzoyl and N-isobutyryl protecting groups from the amino functions on the A, G, and C residues.
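The requirement that recoveries at each step be high becomes concrete when per-cycle losses are compounded over a long chain. The following is a minimal sketch in Python, assuming (as a simplification not stated in the text) that the 2 to 6% of 5'-OH groups left unreacted in each coupling is the only loss, so that per-cycle coupling efficiency is 94 to 98%; the function name is illustrative.

```python
def full_length_yield(coupling_efficiency: float, n_bases: int) -> float:
    """Fraction of chains reaching full length after n_bases - 1 coupling cycles.

    The first nucleoside is already attached to the resin, so a chain of
    n_bases requires n_bases - 1 couplings.
    """
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie between 0 and 1")
    if n_bases < 1:
        raise ValueError("n_bases must be at least 1")
    return coupling_efficiency ** (n_bases - 1)


for efficiency in (0.94, 0.98):
    for length in (20, 50, 150):
        fraction = full_length_yield(efficiency, length)
        print(f"efficiency {efficiency:.2f}, {length}-mer: {fraction:.4f}")
```

Under this assumption, even a small change in per-cycle efficiency has a large effect on the yield of long oligonucleotides, which is why capping and final purification are part of the procedure.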
Chemically Synthesized Genes
Table 12.3 lists some of the genes that have been chemically synthesized. Because protein-coding genes are characteristically much larger than the 150-bp practical limit on oligonucleotide synthesis, their synthesis involves joining a series of oligonucleotides to assemble the overall sequence. A prime example of such synthesis is the gene for rhodopsin.
Figure 12.33 illustrates the strategy used in the total synthesis of the gene for bovine rhodopsin. This gene, which is 1057 base pairs long, encodes the 348 amino-acid photoreceptor protein of the vertebrate retina. Theoretically, no gene is beyond the scope of these methods, a fact that opens the door to an incredibly exciting range of possibilities for investigating structure-function relationships in the organization and expression of hereditary material.
Figure 12.33 Total synthesis of the bovine rhodopsin gene was achieved by joining 72 synthetic oligonucleotides, 36 representing one strand and 36 the complementary strand. These oligonucleotides are overlapping. Once synthesized, the various oligonucleotides, each 15 to 40 nucleotides long, were assembled by annealing and enzymatic ligation into three large fragments, representing nucleotides -5 to 338 (-5 meaning 5 nucleotides before the start of the coding region), 335 to 702, and 699 to 1052. The total gene was then created by joining these fragments. This figure shows only one fragment (fragment PB, comprising nucleotides 699 through 1052), assembled from 20 complementary oligonucleotides whose ends overlap. Odd-numbered oligonucleotides (1, 3, 5, . . . ) compose the 5' → 3' strand; even-numbered ones (2, 4, 6, . . . ) represent the 3' → 5' strand. (Vertical arrows indicate nucleotides that were changed from the native gene sequence. Restriction sites are shown boxed in blue lines; those removed from the gene through nucleotide substitutions are shown as yellow shaded boxes.) Note the single-stranded overhangs at either end of the 3' → 5' strand. The sequences at these overhangs correspond to restriction endonuclease sites (PstI and BamHI), which facilitate subsequent manipulation of the fragment in gene assembly and cloning.
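The fragment coordinates in the legend to Figure 12.33 can be checked against the stated gene length of 1057 base pairs. The following is a minimal sketch in Python, assuming a numbering scheme in which position 1 is the first nucleotide of the coding region, position -1 immediately precedes it, and there is no position 0; that convention and the function name are assumptions made for this illustration.

```python
def span_length(start: int, end: int) -> int:
    """Number of nucleotides from start to end inclusive, with no position 0."""
    if start == 0 or end == 0 or start > end:
        raise ValueError("positions must be nonzero and start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the nonexistent position 0 was counted above
    return length


fragments = [(-5, 338), (335, 702), (699, 1052)]

lengths = [span_length(start, end) for start, end in fragments]
# Shared nucleotides between each fragment and the next one.
overlaps = [
    span_length(fragments[i + 1][0], fragments[i][1])
    for i in range(len(fragments) - 1)
]
assembled = sum(lengths) - sum(overlaps)

print("fragment lengths:", lengths)
print("overlaps between neighbors:", overlaps)
print("assembled length:", assembled)
```

Under this convention adjacent fragments share four nucleotides, and the assembled length matches the 1057 base pairs given for the gene.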
12.7 · Secondary and Tertiary Structure of RNA
RNA molecules (see Chapter 11) are typically single-stranded. Nevertheless, they are often rich in double-stranded regions that form when complementary sequences within the chain come together and join via intrastrand hydrogen bonding. RNA strands cannot fold to form B-DNA type double helices because their 2'-OH groups are a steric hindrance to this conformation. Instead, RNA double helices adopt a conformation similar to the A-form of DNA, having about 11 bp per turn, and the bases strongly tilted from the plane perpendicular to the helix axis (see Figure 12.13). Both tRNA and rRNA have characteristic secondary structures formed in this manner. Secondary structures are presumed to exist in mRNA species as well, although their nature is as yet little understood. (The functions of tRNA, rRNA, and mRNA are discussed in detail in Part IV: Information Transfer.)
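The idea that complementary stretches within one chain can fold back and pair is easy to illustrate in code. The following Python sketch searches a single RNA strand for the longest run of antiparallel Watson-Crick pairs that could form a hairpin stem. It is a brute-force toy, not a structure-prediction method: it ignores G:U pairs, modified bases, and energetics, it requires at least three unpaired bases in the loop (an assumed minimum), and the example sequence is invented.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_length(rna: str, i: int, j: int) -> int:
    """Number of consecutive base pairs formed by walking inward from i and j.

    Position i pairs with j, i+1 with j-1, and so on (antiparallel pairing).
    At least 3 unpaired bases are left between the two arms for the loop.
    """
    n = 0
    while j - i > 3 and (rna[i], rna[j]) in WATSON_CRICK:
        n += 1
        i += 1
        j -= 1
    return n


def best_hairpin(rna: str):
    """Return (stem_length, i, j) for the longest stem found, by brute force."""
    best = (0, -1, -1)
    for i in range(len(rna)):
        for j in range(i + 4, len(rna)):
            n = stem_length(rna, i, j)
            if n > best[0]:
                best = (n, i, j)
    return best


# Hypothetical sequence: the 5' arm GGGAC can pair with the 3' arm GUCCC.
example = "GGGACUUCGGUCCC"
print(best_hairpin(example))  # (5, 0, 13): a 5-bp stem closing a 4-base loop
```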
Transfer RNA
In tRNA molecules, which contain from 73 to 94 nucleotides in a single chain, a majority of the bases are hydrogen-bonded to one another. Figure 12.34 shows the structure that typifies tRNAs. Hairpin turns bring complementary stretches of bases in the chain into contact so that double-helical regions form. Because of the arrangement of the complementary stretches along the chain, the overall pattern of H-bonding can be represented as a cloverleaf. Each cloverleaf consists of four H-bonded segments: three loops and the stem where the 3'- and 5'-ends of the molecule meet. These four segments are designated the acceptor stem, the D loop, the anticodon loop, and the TψC loop.
Figure 12.34 A general diagram for the structure of tRNA. The positions of invariant bases as well as bases that seldom vary are shown in color. The numbering system is based on yeast tRNA^Phe. R = purine; Y = pyrimidine. Dotted lines denote sites in the D loop and variable loop regions where varying numbers of nucleotides are found in different tRNAs.
tRNA Secondary Structure
The acceptor stem is where the amino acid is linked to form the aminoacyl-tRNA derivative, which serves as the amino acid–donating species in protein synthesis; this is the physiological role of tRNA. The amino acid adds to the 3'-OH of the 3'-terminal A nucleotide (Figure 12.35). The 3'-end of tRNA is invariantly CCA-3'-OH. This CCA sequence plus a fourth nucleotide extends beyond the double-helical portion of the acceptor stem. The D loop is so named because this tRNA loop often contains dihydrouridine, or D, residues. In addition to dihydrouridine, tRNAs characteristically contain a number of unusual bases, including inosine, thiouridine, pseudouridine, and hypermethylated purines (see Figure 11.26). The anticodon loop consists of a double-helical segment and seven unpaired bases, three of which are the anticodon. (The anticodon is the three-nucleotide unit that recognizes and base pairs with a particular mRNA codon, a complementary three-base unit in mRNA which is the genetic information that specifies an amino acid.) Reading 3' → 5', the anticodon is invariably preceded by a purine (often an alkylated one) and followed by a U.
Figure 12.35 Amino acids are linked to the 3'-OH end of tRNA molecules by an ester bond formed between the carboxyl group of the amino acid and the 3'-OH of the terminal ribose of the tRNA.
Anticodon base pairing to the codon on mRNA allows a particular tRNA species to deliver its amino acid to the protein-synthesizing apparatus. It represents the key event in translating the information in the nucleic acid sequence so that the appropriate amino acid is inserted at the right place in the amino acid sequence of the protein being synthesized. Next along the tRNA sequence in the 5' → 3' direction comes a loop that varies from tRNA to tRNA in the number of residues that it has, the so-called extra or variable loop. The last loop in the tRNA, reading 5' → 3', is the TψC loop, which contains seven unpaired bases including the sequence TψC, where ψ is the symbol for pseudouridine. Ribosomes bind tRNAs through recognition of this TψC loop. Almost all of the invariant residues common to tRNAs lie within the non-hydrogen-bonded regions of the cloverleaf structure (Figure 12.34). Figure 12.36 depicts the complete nucleotide sequence and cloverleaf structure of yeast alanine tRNA.
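Because codon and anticodon pair in antiparallel fashion, the anticodon written 5' → 3' is the reverse complement of the codon. The Python sketch below shows that relationship for the idealized case of strict Watson-Crick pairing. It is a simplification: real tRNAs often carry modified bases such as inosine in the anticodon, and those are deliberately left out here. The function name anticodon_for is hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the strictly Watson-Crick-complementary anticodon, written 5'->3'."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in RNA_COMPLEMENT for base in codon):
        raise ValueError("codon must be three bases drawn from A, C, G, U")
    # Pairing is antiparallel, so complement each base and reverse the order.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


print(anticodon_for("GCU"))  # AGC, pairing with the alanine codon 5'-GCU-3'
```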
Figure 12.36 The complete nucleotide sequence and cloverleaf structure of yeast alanine tRNA.
tRNA Tertiary Structure
Tertiary structure in tRNA arises from hydrogen-bonding interactions between bases in the D loop with bases in the variable and TψC loops, as shown for yeast phenylalanine tRNA in Figure 12.37.
Figure 12.37 Tertiary interactions in yeast phenylalanine tRNA. The molecule is presented in the conventional cloverleaf secondary structure generated by intrastrand hydrogen bonding. Solid lines connect bases that are hydrogen-bonded when this cloverleaf pattern is folded into the characteristic tRNA tertiary structure (see also Figure 12.36).
Note that these H bonds involve the invariant nucleotides of tRNAs, thus emphasizing the importance of the tertiary structure they create to the function of tRNAs in general. These H bonds fold the D and TψC arms together and bend the cloverleaf into the stable L-shaped tertiary form (Figure 12.38). Many of these H bonds involve base pairs that are not canonical A:U or G:C pairings (Figure 12.38). The amino acid acceptor stem is at one end of the L, separated by 7 nm or so from the anticodon at the opposite end of the L. The D and TψC loops form the corner of the L. In the L-conformation, the bases are oriented to maximize hydrophobic stacking interactions between their flat faces. Such stacking is a second major factor contributing to L-form stabilization.
Figure 12.38 (a) The three-dimensional structure of yeast phenylalanine tRNA as deduced from X-ray diffraction studies of its crystals. The tertiary folding is illustrated in the center of the diagram with the ribose–phosphate backbone presented as a continuous ribbon; H bonds are indicated by crossbars. Unpaired bases are shown as short, unconnected rods. The anticodon loop is at the bottom and the -CCA 3'-OH acceptor end is at the top right. The various types of noncanonical hydrogen-bonding interactions observed between bases surround the central molecule. Three of these structures show examples of unusual H-bonded interactions involving three bases; these interactions aid in establishing tRNA tertiary structure. (b) A space-filling model of the molecule. (After Kim, S. H., in Schimmel, P., Söll, D., and Abelson, J. N., eds., 1979. Transfer RNA: Structure, Properties, and Recognition.
Ribosomal RNA
rRNA Secondary Structure
Ribosomes, the protein-synthesizing machinery of cells, are composed of two subunits, called small and large, and ribosomal RNAs are integral components of these subunits (see Table 11.2). A large degree of intrastrand sequence complementarity is found in all rRNA strands, and all assume a highly folded pattern that allows base pairing between these complementary segments. Figure 12.39 shows the secondary structure assigned to the E. coli 16S rRNA. This structure is based on alignment of the nucleotide sequence into H-bonding segments. The reliability of these alignments is then tested through a comparative analysis of whether identical secondary structures can be predicted from nucleotide sequences of 16S-like rRNAs from other species. If so, then such structures are apparently conserved. The approach is based on the thesis that, because ribosomal RNA species (regardless of source) serve common roles in protein synthesis, it may be anticipated that they share structural features. The structure is marvelously rich in short, helical segments separated and punctuated by single-stranded loops.
Figure 12.39 The proposed secondary structure for E. coli 16S rRNA, based on comparative sequence analysis in which the folding pattern is assumed to be conserved across different species. The molecule can be subdivided into four domains (I, II, III, and IV) on the basis of contiguous stretches of the chain that are closed by long-range base-pairing interactions. I, the 5'-domain, includes nucleotides 27 through 556. II, the central domain, runs from nucleotide 564 to 912. Two domains make up the 3'-end of the molecule. III, the major one, comprises nucleotides 923 to 1391. IV, the 3'-terminal domain, covers residues 1392 to 1541.
Comparison of rRNAs from Various Species
If a phylogenetic comparison is made of the 16S-like rRNAs from an archaebacterium (Halobacterium volcanii), a eubacterium (E. coli), and a eukaryote (the yeast Saccharomyces cerevisiae), a striking similarity in secondary structure emerges (Figure 12.40). Remarkably, these secondary structures are similar despite the fact that the nucleotide sequences of these rRNAs themselves exhibit a low degree of similarity. Apparently, evolution is acting at the level of rRNA secondary structure, not rRNA nucleotide sequence. Similar conserved folding patterns are seen for the 23S-like and 5S-like rRNAs that reside in the large ribosomal subunits of various species. An insightful conclusion may be drawn regarding the persistence of such strong secondary structure conservation despite the millennia that have passed since these organisms diverged: all ribosomes are constructed to a common design and all function in a similar manner.

Figure 12.40 Phylogenetic comparison of secondary structures of 16S-like rRNAs from (a) a eubacterium (E. coli), (b) an archaebacterium (H. volcanii), (c) a eukaryote (S. cerevisiae, a yeast).
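The observation that secondary structure is conserved while sequence is not can be made concrete with a small test: if two alignment columns form a base pair, then in each species the bases at those columns should still be complementary, even when the bases themselves differ between species. The Python sketch below applies that test to a toy alignment. The three sequences are invented for illustration and are not real rRNA data; only strict Watson-Crick pairs are counted, and the function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pair_is_conserved(alignment: dict, col_i: int, col_j: int) -> bool:
    """True if columns col_i and col_j can base-pair in every aligned sequence."""
    return all(
        (seq[col_i], seq[col_j]) in WATSON_CRICK for seq in alignment.values()
    )


def fraction_identical(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Invented toy alignment: a 3-bp stem (columns 0-10, 1-9, 2-8) around a loop.
toy = {
    "species_1": "GGCAUUCGGCC",
    "species_2": "CAGAUUCGCUG",
    "species_3": "UCAAUUCGUGA",
}

stem_columns = [(0, 10), (1, 9), (2, 8)]
print([pair_is_conserved(toy, i, j) for i, j in stem_columns])
print(fraction_identical(toy["species_1"], toy["species_2"]))
```

In this toy case every stem position differs between species 1 and species 2, yet all three proposed pairs remain complementary in all three sequences, which is the pattern the comparative approach looks for.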
rRNA Tertiary Structure
Despite the unity in secondary structural patterns, little is known about the three-dimensional, or tertiary, structure of rRNAs. Even less is known about the quaternary interactions that occur when ribosomal proteins combine with rRNAs and when the ensuing ribonucleoprotein complexes, the small and large subunits, come together to form the complete ribosome. Furthermore, assignments of functional roles to rRNA molecules are still tentative and approximate. (We return to these topics in Chapter 33.)
Isopycnic Centrifugation and Buoyant Density of DNA
Density gradient ultracentrifugation is a variant of the basic technique of ultracentrifugation (discussed in the Appendix to Chapter 5). Density gradient centrifugation can be used to isolate DNA. The densities of DNAs are about the same as those of concentrated solutions of cesium chloride, CsCl (1.6 to 1.8 g/mL). Centrifugation of CsCl solutions at very high rotational speeds, where the centrifugal force becomes 10⁵ times stronger than the force of gravity, causes the formation of a density gradient within the solution. This gradient is the result of a balance that is established between the sedimentation of the salt ions toward the bottom of the tube and their diffusion upward toward regions of lower concentration. If DNA is present in the centrifuged CsCl solution, it moves to a position of equilibrium in the gradient equivalent to its buoyant density (Figure A12.1). For this reason, this technique is also called isopycnic centrifugation.
Figure A12.1 Density gradient centrifugation is a common method of separating macromolecules, particularly nucleic acids, in solution. A cell extract is mixed with a solution of CsCl to a final density of about 1.7 g/cm³ and centrifuged at high speed (40,000 rpm, giving relative centrifugal forces of about 200,000 g). The biological macromolecules in the extract will move to equilibrium positions in the CsCl gradient that reflect their buoyant densities.
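The relation between rotor speed and relative centrifugal force follows from the centripetal acceleration ω²r divided by the acceleration of gravity. The Python sketch below evaluates it for the 40,000 rpm quoted in the caption. The rotor radius is not given in the text; the value of 0.11 m used here is an assumption chosen only to show that a radius of that order gives a force near 200,000 g.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def relative_centrifugal_force(rpm: float, radius_m: float) -> float:
    """Centripetal acceleration at radius_m, expressed as a multiple of g."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return omega ** 2 * radius_m / STANDARD_GRAVITY


# 40,000 rpm is from the caption; the 0.11 m radius is an assumed value.
rcf = relative_centrifugal_force(40_000, 0.11)
print(f"{rcf:.3g} x g")
```

Because the force scales with the square of the rotor speed, halving the speed at the same radius would cut the relative centrifugal force to one quarter.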
Cesium chloride centrifugation is an excellent means of removing RNA and proteins in the purification of DNA. The density of DNA is typically slightly greater than 1.7 g/cm³, while the density of RNA is more than 1.8 g/cm³. Proteins have densities less than 1.3 g/cm³. In CsCl solutions of appropriate density, the DNA bands near the center of the tube, RNA pellets to the bottom, and the proteins float near the top. Single-stranded DNA is denser than double-helical DNA. The irregular structure of randomly coiled ssDNA allows the atoms to pack together through van der Waals interactions. These interactions compact the molecule into a smaller volume than that occupied by a hydrogen-bonded double helix.
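The separation described above amounts to comparing each component's buoyant density with the range of densities spanned by the gradient. The Python sketch below encodes that comparison. The gradient limits of 1.6 and 1.8 g/cm³ are an assumption based on the range quoted for concentrated CsCl solutions, and the sample densities are illustrative values chosen to be consistent with the text rather than measurements.

```python
def fate_in_gradient(density: float, gradient_min: float, gradient_max: float) -> str:
    """Predict where a component ends up relative to the density gradient."""
    if density < gradient_min:
        return "floats near the top"
    if density > gradient_max:
        return "pellets at the bottom"
    return "bands within the gradient"


# Assumed gradient limits (g/cm^3); a real gradient depends on run conditions.
GRADIENT_MIN, GRADIENT_MAX = 1.6, 1.8

# Illustrative buoyant densities (g/cm^3), consistent with the ranges in the text.
samples = {"protein": 1.25, "DNA": 1.71, "RNA": 1.85}

for name, rho in samples.items():
    print(name, "->", fate_in_gradient(rho, GRADIENT_MIN, GRADIENT_MAX))
```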
The net movement of solute particles in an ultracentrifuge is the result of two processes: diffusion (from regions of higher concentration to regions of lower concentration) and sedimentation due to centrifugal force (in the direction away from the axis of rotation). In general, diffusion rates for molecules are inversely proportional to their molecular weight; larger molecules diffuse more slowly than smaller ones. On the other hand, sedimentation rates increase with increasing molecular weight. A macromolecular species that has reached its position of equilibrium in isopycnic centrifugation has formed a concentrated band of material.
Essentially three effects are influencing the movement of the molecules in creating this concentration zone: (1) diffusion away to regions of lower concentration; (2) sedimentation of molecules situated at positions of slightly lower solution density in the density gradient; and (3) flotation (buoyancy or “reverse sedimentation”) of molecules that have reached positions of slightly greater solution density in the gradient. The consequence of the physics of these