Chapter 11

Nucleotides and Nucleic Acids


Francis Crick and James Watson point out features of their model for the
structure of DNA. (©A. Barrington Brown/Science Source/Photo Researchers, Inc.)

Nucleotides and nucleic acids are biological molecules that possess heterocyclic nitrogenous bases as principal components of their structure. The biochemical roles of nucleotides are numerous; they participate as essential intermediates in virtually all aspects of cellular metabolism. Serving an even more central biological purpose are the nucleic acids, the elements of heredity and the agents of genetic information transfer. Just as proteins are linear polymers of amino acids, nucleic acids are linear polymers of nucleotides. Like the letters in this sentence, the orderly sequence of nucleotide residues in a nucleic acid can encode information. The two basic kinds of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Complete hydrolysis of nucleic acids liberates nitrogenous bases, a five-carbon sugar, and phosphoric acid in equal amounts. The five-carbon sugar in DNA is 2-deoxyribose; in RNA, it is ribose. (See Chapter 7 for a detailed discussion of sugars and other carbohydrates.) DNA is the repository of genetic information in cells, while RNA serves in the transcription and translation of this information (Figure 11.1). An interesting exception to this rule is that some viruses have their genetic information stored as RNA.

Figure 11.1 · The fundamental process of information transfer in cells. Information encoded in the nucleotide sequence of DNA is transcribed through synthesis of an RNA molecule whose sequence is dictated by the DNA sequence. As the sequence of this RNA is read (as groups of three consecutive nucleotides) by the protein synthesis machinery, it is translated into the sequence of amino acids in a protein. This information transfer system is encapsulated in the dogma: DNA → RNA → protein.
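
The flow of information in Figure 11.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the short test sequence are hypothetical, the input is taken to be the DNA coding strand (so the RNA is the same sequence with U in place of T), and the codon table is the standard genetic code, which this chapter does not list.

```python
# Minimal sketch of DNA -> RNA -> protein; names and the example sequence are illustrative.
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding strand (assumption: T is simply replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three nucleotides at a time until a stop codon or the end of the sequence."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```

The one-letter output uses the conventional amino acid abbreviations (M for methionine, A for alanine).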

            This chapter describes the chemistry of nucleotides and the major classes of nucleic acids. Chapter 12 presents methods for determination of nucleic acid primary structure (nucleic acid sequencing) and describes the higher orders of nucleic acid structure. Chapter 13 introduces the molecular biology of recombinant DNA: the construction and uses of novel DNA molecules assembled by combining segments from other DNA molecules.

 

11.1 · Nitrogenous Bases

The bases of nucleotides and nucleic acids are derivatives of either pyrimidine or purine. Pyrimidines are six-membered heterocyclic aromatic rings containing two nitrogen atoms (Figure 11.2a). The atoms are numbered in a clockwise fashion, as shown in the figure. The purine ring structure is represented by the combination of a pyrimidine ring with a five-membered imidazole ring to yield a fused ring system (Figure 11.2b). The nine atoms in this system are numbered according to the convention shown.


Figure 11.2 · (a) The pyrimidine ring system; by convention, atoms are numbered as indicated. (b) The purine ring system, atoms numbered as shown.

            The pyrimidine ring system is planar, while the purine system deviates somewhat from planarity in having a slight pucker between its imidazole and pyrimidine portions. Both are relatively insoluble in water, as might be expected from their pronounced aromatic character.

Common Pyrimidines and Purines

The common naturally occurring pyrimidines are cytosine, uracil, and thymine (5-methyluracil) (Figure 11.3). Cytosine and thymine are the pyrimidines typically found in DNA, whereas cytosine and uracil are common in RNA. To view this generality another way, the uracil component of DNA occurs as the 5-methyl variety, thymine. Various pyrimidine derivatives, such as dihydrouracil, are present as minor constituents in certain RNA molecules.

Figure 11.3 · The common pyrimidine bases—cytosine, uracil, and thymine—in the tautomeric forms predominant at pH 7.

            Adenine (6-amino purine) and guanine (2-amino-6-oxy purine), the two common purines, are found in both DNA and RNA (Figure 11.4). Other naturally occurring purine derivatives include hypoxanthine, xanthine, and uric acid (Figure 11.5).

Figure 11.4  · The common purine bases—adenine and guanine—in the tautomeric forms predominant at pH 7.

Hypoxanthine and xanthine are found only rarely as constituents of nucleic acids. Uric acid, the most oxidized state for a purine derivative, is never found in nucleic acids.

Figure 11.5 · Other naturally occurring purine derivatives—hypoxanthine, xanthine, and uric acid.

 

Properties of Pyrimidines and Purines

The aromaticity of the pyrimidine and purine ring systems and the electron-rich nature of their —OH and —NH2 substituents endow them with the capacity to undergo keto-enol tautomeric shifts. That is, pyrimidines and purines exist as tautomeric pairs, as shown in Figure 11.6 for uracil.

Figure 11.6 · The keto/enol tautomerism of uracil.

The keto tautomer is called a lactam, whereas the enol form is a lactim. The lactam form vastly predominates at neutral pH. In other words, pKa values for ring nitrogen atoms 1 and 3 in uracil are greater than 8 (the pKa value for N-3 is 9.5) (Table 11.1).

Table 11.1

Proton Dissociation Constants (pKa Values) for Nucleotides

Nucleotide    pKa Base-N    pK1 Phosphate    pK2 Phosphate
5'-AMP        3.8 (N-1)     0.9              6.1
5'-GMP        9.4 (N-1)     0.7              6.1
              2.4 (N-7)
5'-CMP        4.5 (N-3)     0.8              6.3
5'-UMP        9.5 (N-3)     1.0              6.4

In contrast, as might be expected from the form of cytosine that predominates at pH 7, the pKa value for N-3 in this pyrimidine is 4.5. Similarly, tautomeric forms can be represented for purines, as given for guanine in Figure 11.7.

Figure 11.7 · The tautomerism of the purine, guanine.

Here, the pKa value is 9.4 for N-1 and less than 5 for N-3. These pKa values specify whether hydrogen atoms are associated with the various ring nitrogens at neutral pH. As such, they are important in determining whether these nitrogens serve as H-bond donors or acceptors. Hydrogen bonding between purine and pyrimidine bases is fundamental to the biological functions of nucleic acids, as in the formation of the double helix structure of DNA (see Section 11.6). The important functional groups participating in H-bond formation are the amino groups of cytosine, adenine, and guanine; the ring nitrogens at position 3 of pyrimidines and position 1 of purines; and the strongly electronegative oxygen atoms attached at position 4 of uracil and thymine, position 2 of cytosine, and position 6 of guanine (see Figure 11.21).
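
How a pKa value translates into the protonation state of a ring nitrogen at pH 7 can be estimated with the Henderson-Hasselbalch relationship. The sketch below is illustrative only: the function name is hypothetical, the pKa values are those of Table 11.1, and the donor/acceptor label is a simplification that treats a site as a donor whenever it is mostly protonated.

```python
def fraction_protonated(pka, ph=7.0):
    """Henderson-Hasselbalch: fraction of a group that still carries its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Base-nitrogen pKa values copied from Table 11.1.
RING_NITROGEN_PKA = {
    "AMP N-1": 3.8,
    "GMP N-1": 9.4,
    "GMP N-7": 2.4,
    "CMP N-3": 4.5,
    "UMP N-3": 9.5,
}

for site, pka in RING_NITROGEN_PKA.items():
    fraction = fraction_protonated(pka)
    # Simplifying assumption: a mostly protonated nitrogen can donate an H bond,
    # a mostly unprotonated one can accept an H bond.
    role = "H-bond donor" if fraction > 0.5 else "H-bond acceptor"
    print(f"{site}: {fraction:.3f} protonated at pH 7 ({role})")
```

With these values, N-3 of uracil and N-1 of guanine come out almost fully protonated at pH 7, whereas N-3 of cytosine and N-1 of adenine are almost fully unprotonated.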
            Another property of pyrimidines and purines is their strong absorbance of ultraviolet (UV) light, which is also a consequence of the aromaticity of their heterocyclic ring structures. Figure 11.8 shows characteristic absorption spectra of several of the common bases of nucleic acids—adenine, uracil, cytosine, and guanine—in their nucleotide forms: AMP, UMP, CMP, and GMP (see Section 11.4). This property is particularly useful in quantitative and qualitative analysis of nucleotides and nucleic acids.

Figure 11.8 · The UV absorption spectra of the common ribonucleotides.
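
As a sketch of how UV absorbance is used quantitatively, the Beer-Lambert law (A = εcl) can be solved for concentration. The function name is hypothetical, and the molar absorptivity used here (roughly 15,400 M⁻¹ cm⁻¹ for AMP near 260 nm at neutral pH) is an assumed, commonly quoted approximate value that does not come from this chapter; substitute a measured value for real work.

```python
def concentration_from_absorbance(absorbance, molar_absorptivity, path_length_cm=1.0):
    """Beer-Lambert law, A = epsilon * c * l, solved for the molar concentration c."""
    return absorbance / (molar_absorptivity * path_length_cm)


# Assumed value, not taken from this chapter: epsilon of about 15,400 per M per cm for AMP near 260 nm.
EPSILON_AMP_260 = 15_400.0

conc = concentration_from_absorbance(0.77, EPSILON_AMP_260)
print(f"{conc * 1e6:.1f} micromolar")  # 50.0 micromolar
```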

11.2 · The Pentoses of Nucleotides and Nucleic Acids

Five-carbon sugars are called pentoses (see Chapter 7). RNA contains the pentose D-ribose, while 2-deoxy-D-ribose is found in DNA. In both instances, the pentose is in the five-membered ring form known as furanose: D-ribofuranose for RNA and 2-deoxy-D-ribofuranose for DNA (Figure 11.9). When these ribofuranoses are found in nucleotides, their atoms are numbered as 1', 2', 3', and so on to distinguish them from the ring atoms of the nitrogenous bases. As we shall see, the seemingly minor difference of a hydroxyl group at the 2'-position has far-reaching effects on the secondary structures available to RNA and DNA, as well as their relative susceptibilities to chemical and enzymatic hydrolysis.

Figure 11.9 ·  Furanose structures—ribose and deoxyribose.

11.3 · Nucleosides Are Formed by Joining a Nitrogenous Base to a Sugar

Nucleosides are compounds formed when a base is linked to a sugar via a glycosidic bond (Figure 11.10). Glycosidic bonds by definition involve the carbonyl carbon atom of the sugar, which in cyclic structures is joined to the ring O atom (see Chapter 7). Such carbon atoms are called anomeric. In nucleosides, the bond is an N-glycoside because it connects the anomeric C-1' to N-1 of a pyrimidine or to N-9 of a purine. Glycosidic bonds can be either α or β, depending on their orientation relative to the anomeric C atom. Glycosidic bonds in nucleosides and nucleotides are always of the β-configuration, as represented in Figure 11.10.

Figure 11.10 · β-Glycosidic bonds link nitrogenous bases and sugars to form nucleosides.

Nucleosides are named by adding the ending -idine to the root name of a pyrimidine or -osine to the root name of a purine. The common nucleosides are thus cytidine, uridine, thymidine, adenosine, and guanosine. The structures shown in Figure 11.11 are ribonucleosides. Deoxyribonucleosides, in contrast, lack a 2'-OH group on the pentose. The nucleoside formed by hypoxanthine and ribose is inosine.
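
The naming rule can be written out as a small lookup. This is a minimal sketch: the root spellings and the function name are illustrative choices, and inosine is handled as the special case named in the text.

```python
# Root names of the common bases (illustrative spellings chosen to reproduce the names in the text).
PYRIMIDINE_ROOTS = {"cytosine": "cyt", "uracil": "ur", "thymine": "thym"}
PURINE_ROOTS = {"adenine": "aden", "guanine": "guan"}
IRREGULAR = {"hypoxanthine": "inosine"}  # does not follow the root + ending rule


def nucleoside_name(base):
    """Apply the -idine (pyrimidine) or -osine (purine) ending to the root name of a base."""
    base = base.lower()
    if base in IRREGULAR:
        return IRREGULAR[base]
    if base in PYRIMIDINE_ROOTS:
        return PYRIMIDINE_ROOTS[base] + "idine"
    if base in PURINE_ROOTS:
        return PURINE_ROOTS[base] + "osine"
    raise ValueError(f"no naming rule recorded for {base!r}")


for base in ("cytosine", "uracil", "thymine", "adenine", "guanine", "hypoxanthine"):
    print(base, "->", nucleoside_name(base))
```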

 

Figure 11.11 · The common ribonucleosides—cytidine, uridine, adenosine, and guanosine. Also, inosine drawn in anti conformation.

 

Nucleoside Conformation

In nucleosides, rotation of the base about the glycosidic bond is sterically hindered, principally by the hydrogen atom on the C-2' carbon of the furanose. (This hindrance is most easily seen and appreciated by manipulating accurate molecular models of these structures.) Consequently, nucleosides and nucleotides (see next section) exist in either of two conformations, designated syn and anti (Figure 11.12). For pyrimidines in the syn conformation, the oxygen substituent at position C-2 lies immediately above the furanose ring; in the anti conformation, this steric interference is avoided. Consequently, pyrimidine nucleosides favor the anti conformation. Purine nucleosides can adopt either the syn or anti conformation. In either conformation, the roughly planar furanose and base rings are not coplanar but lie at approximately right angles to one another.

Figure 11.12 · Rotation around the glycosidic bond is sterically hindered; syn versus anti conformations in nucleosides are shown.

 

Nucleosides Are More Water-Soluble Than Free Bases

Nucleosides are much more water-soluble than the free bases because of the hydrophilicity of the sugar moiety. Like glycosides (see Chapter 7), nucleosides are relatively stable in alkali. Pyrimidine nucleosides are also resistant to acid hydrolysis, but purine nucleosides are easily hydrolyzed in acid to yield the free base and pentose.

Human Biochemistry
Adenosine: A Nucleoside with Physiological Activity

For the most part, nucleosides have no biological role other than to serve as component parts of nucleotides. Adenosine is an exception. In mammals, adenosine functions as an autacoid, or “local hormone.” This nucleoside circulates in the bloodstream, acting locally on specific cells to influence such diverse physiological phenomena as blood vessel dilation, smooth muscle contraction, neuronal discharge, neurotransmitter release, and metabolism of fat. For example, when muscles work hard, they release adenosine, causing the surrounding blood vessels to dilate, which in turn increases the flow of blood and its delivery of O2 and nutrients to the muscles. In a different autacoid role, adenosine acts in regulating heartbeat. The natural rhythm of the heart is controlled by a pacemaker, the sinoatrial node, that cyclically sends a wave of electrical excitation to the heart muscles. By blocking the flow of electrical current, adenosine slows the heart rate. Supraventricular tachycardia is a heart condition characterized by a rapid heartbeat. Intravenous injection of adenosine causes a momentary interruption of the rapid cycle of contraction and restores a normal heart rate. Adenosine is licensed and marketed as Adenocard™ to treat supraventricular tachycardia.
            In addition, adenosine is implicated in sleep regulation. During periods of extended wakefulness, extracellular adenosine levels rise as a result of metabolic activity in the brain, and this increase promotes sleepiness. During sleep, adenosine levels fall. Caffeine promotes wakefulness by blocking the interaction of extracellular adenosine with its neuronal receptors.*

*Porkka-Heiskanen, T., et al., 1997. Adenosine: A mediator of the sleep-inducing effects of prolonged wakefulness. Science 276:1265–1268.

11.4 · Nucleotides Are Nucleoside Phosphates

A nucleotide results when phosphoric acid is esterified to a sugar —OH group of a nucleoside. The nucleoside ribose ring has three —OH groups available for esterification, at C-2', C-3', and C-5' (although 2'-deoxyribose has only two). The vast majority of monomeric nucleotides in the cell are ribonucleotides having 5'-phosphate groups. Figure 11.13 shows the structures of the common four ribonucleotides, whose formal names are adenosine 5'-monophosphate, guanosine 5'-monophosphate, cytidine 5'-monophosphate, and uridine 5'-monophosphate. These compounds are more often referred to by their abbreviations: 5'-AMP, 5'-GMP, 5'-CMP, and 5'-UMP, or even more simply as AMP, GMP, CMP, and UMP. Nucleoside 3'-phosphates and nucleoside 2'-phosphates (3'-NMP and 2'-NMP, where N is a generic designation for “nucleoside”) do not occur naturally, but are biochemically important as products of polynucleotide or nucleic acid hydrolysis. Because the pKa value for the first dissociation of a proton from the phosphoric acid moiety is 1.0 or less (Table 11.1), the nucleotides have acidic properties. This acidity is implicit in the other names by which these substances are known: adenylic acid, guanylic acid, cytidylic acid, and uridylic acid. The pKa value for the second dissociation, pK2, is about 6.0, so at neutral pH or above, the net charge on a nucleoside monophosphate is −2. Nucleic acids, which are polymers of nucleoside monophosphates, derive their name from the acidity of these phosphate groups.
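
The statement about net charge can be checked numerically from the pK1 and pK2 values in Table 11.1. The following is a rough sketch, not a full titration model: it considers only the two phosphate dissociations, ignores any charge on the base, and uses hypothetical function names.

```python
def deprotonated_fraction(pka, ph):
    """Fraction of an acidic group that has lost its proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def phosphate_net_charge(pk1, pk2, ph=7.0):
    """Average charge contributed by the phosphate group (each lost proton adds -1)."""
    return -(deprotonated_fraction(pk1, ph) + deprotonated_fraction(pk2, ph))


# (pK1, pK2) pairs for the phosphate group, copied from Table 11.1.
PHOSPHATE_PKAS = {
    "5'-AMP": (0.9, 6.1),
    "5'-GMP": (0.7, 6.1),
    "5'-CMP": (0.8, 6.3),
    "5'-UMP": (1.0, 6.4),
}

for name, (pk1, pk2) in PHOSPHATE_PKAS.items():
    print(f"{name}: {phosphate_net_charge(pk1, pk2):.2f} at pH 7, "
          f"{phosphate_net_charge(pk1, pk2, ph=9.0):.2f} at pH 9")
```

Under these assumptions the average charge at pH 7 falls slightly short of −2 because pK2 lies within one pH unit of neutrality, and it approaches −2 as the pH rises.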

Figure 11.13 · Structures of the four common ribonucleotides (AMP, GMP, CMP, and UMP) together with their two sets of full names, for example, adenosine 5'-monophosphate and adenylic acid. Also shown is the nucleotide 3'-AMP.

Cyclic Nucleotides

Nucleoside monophosphates in which the phosphoric acid is esterified to two of the available ribose hydroxyl groups (Figure 11.14) are found in all cells. Forming two such ester linkages with one phosphate results in a cyclic structure. 3',5'-cyclic AMP, often abbreviated cAMP, and its guanine analog 3',5'-cyclic GMP, or cGMP, are important regulators of cellular metabolism (see Part III: Metabolism and Its Regulation).

Figure 11.14 · Structures of the cyclic nucleotides cAMP and cGMP.

 

Nucleoside Diphosphates and Triphosphates

Additional phosphate groups can be linked to the phosphoryl group of a nucleotide through the formation of phosphoric anhydride linkages, as shown in Figure 11.15. Addition of a second phosphate to AMP creates adenosine 5'-diphosphate, or ADP, and adding a third yields adenosine 5'-triphosphate, or ATP. The respective phosphate groups are designated by the Greek letters α, β, and γ, starting with the α-phosphate as the one linked directly to the pentose. The abbreviations GTP, CTP, and UTP represent the other corresponding nucleoside 5'-triphosphates. Like the nucleoside 5'-monophosphates, the nucleoside 5'-diphosphates and 5'-triphosphates all occur in the free state in the cell, as do their deoxyribonucleoside phosphate counterparts, represented as dAMP, dADP, and dATP; dGMP, dGDP, and dGTP; dCMP, dCDP, and dCTP; dUMP, dUDP, and dUTP; and dTMP, dTDP, and dTTP.

Figure 11.15 · Formation of ADP and ATP by the successive addition of phosphate groups via phosphoric anhydride linkages. Note the removal of equivalents of H2O in these dehydration synthesis reactions.

NDPs and NTPs Are Polyprotic Acids

Nucleoside 5'-diphosphates (NDPs) and nucleoside 5'-triphosphates (NTPs) are relatively strong polyprotic acids, in that they dissociate three and four protons, respectively, from their phosphoric acid groups. The resulting phosphate anions on NDPs and NTPs form stable complexes with divalent cations such as Mg2+ and Ca2+. Because Mg2+ is present at high concentrations (5 to 10 mM) intracellularly, NDPs and NTPs occur primarily as Mg2+ complexes in the cell. The phosphoric anhydride linkages in NDPs and NTPs are readily hydrolyzed by acid, liberating inorganic phosphate (often symbolized as Pi) and the corresponding NMP. A diagnostic test for NDPs and NTPs is quantitative liberation of Pi upon treatment with 1 N HCl at 100°C for 7 min.

Nucleoside 5'-Triphosphates Are Carriers of Chemical Energy

Nucleoside 5'-triphosphates are indispensable agents in metabolism because the phosphoric anhydride bonds they possess are a prime source of chemical energy to do biological work. ATP has been termed the energy currency of the cell (Chapter 3). GTP is the major energy source for protein synthesis (see Chapter 33), CTP is an essential metabolite in phospholipid synthesis (see Chapter 25), and UTP forms activated intermediates with sugars that go on to serve as substrates in the biosynthesis of complex carbohydrates and polysaccharides (see Chapter 23). The evolution of metabolism has led to the dedication of one of these four NTPs to each of the major branches of metabolism. To complete the picture, the four NTPs and their dNTP counterparts are the substrates for the synthesis of the remaining great class of biomolecules—the nucleic acids.

The Bases of Nucleotides Serve as “Information Symbols”

Virtually all of the biochemical reactions of nucleotides involve either phosphate or pyrophosphate group transfer: the release of a phosphoryl group from an NTP to give an NDP, the release of a pyrophosphoryl group to give an NMP unit, or the acceptance of a phosphoryl group by an NMP or an NDP to give an NDP or an NTP (Figure 11.16). Interestingly, the pentose and the base are not directly involved in this chemistry. However, a “division of labor” directs ATP to serve as the primary nucleotide in central pathways of energy metabolism, while GTP, for example, is used to drive protein synthesis. Thus, the various nucleotides are channeled in appropriate metabolic directions through specific recognition of the base of the nucleotide. That is, the bases of nucleotides serve solely as information symbols aloof from the covalent bond chemistry that goes on. This role as information symbols extends to nucleotide polymers, the nucleic acids, where the bases serve as the information symbols for the code of genetic information.

Figure 11.16 · Phosphoryl and pyrophosphoryl group transfer, the major biochemical reactions of nucleotides.

 

 

11.5 · Nucleic Acids Are Polynucleotides

Nucleic acids are linear polymers of nucleotides linked 3' to 5' by phosphodiester bridges (Figure 11.17). They are formed as 5'-nucleoside monophosphates are successively added to the 3'-OH group of the preceding nucleotide, a process that gives the polymer a directional sense. Polymers of ribonucleotides are named ribonucleic acid, or RNA. Deoxyribonucleotide polymers are called deoxyribonucleic acid, or DNA. Because C-1' and C-4' in deoxyribonucleotides are involved in furanose ring formation and because there is no 2'-OH, only the 3'- and 5'-hydroxyl groups are available for internucleotide phosphodiester bonds. In the case of DNA, a polynucleotide chain may contain hundreds of millions of nucleotide units. Any structural representation of such molecules would be cumbersome at best, even for a short oligonucleotide stretch.

Figure 11.17 · 3'-5' phosphodiester bridges link nucleotides together to form polynucleotide chains.

 

Shorthand Notations for Polynucleotide Structures

Several conventions have been adopted to convey the sense of polynucleotide structures. A repetitious uniformity exists in the covalent backbone of polynucleotides, in which the chain can be visualized as running from 5' to 3' along the atoms of one furanose and thence across the phosphodiester bridge to the furanose of the next nucleotide in line. Thus, this backbone can be portrayed by the symbol of a vertical line representing the furanose and a slash representing the phosphodiester link, as shown in Figure 11.18.

Figure 11.18 · Furanoses are represented by lines; phosphodiesters are represented by diagonal slashes in this shorthand notation for nucleic acid structures.

 

The diagonal slash runs from the middle of a furanose line to the bottom of an adjacent one to indicate the 3'- (middle) to 5'- (bottom) carbons of neighboring furanoses joined by the phosphodiester bridge. The base attached to each furanose is indicated above it by a one-letter designation: A, C, G, or U (or T). The convention in all notations of nucleic acid structure is to read the polynucleotide chain from the 5'-end of the polymer to the 3'-end. Note that this reading direction actually passes through each phosphodiester from 3' to 5'.

 

Base Sequence

The only significant variation that commonly occurs in the chemical structure of nucleic acids is the nature of the base at each nucleotide position. These bases are not part of the sugar-phosphate backbone but instead serve as distinctive side chains, much like the R groups of amino acids along a polypeptide backbone. They give the polymer its unique identity. A simple notation of these structures is merely to list the order of bases in the polynucleotide using single capital letters—A, G, C, and U (or T). Occasionally, a lowercase “p” is written between each successive base to indicate the phosphodiester bridge, as in GpApCpGpUpA. A “p” preceding the sequence indicates that the nucleic acid carries a PO4 on its 5'-end, as in pGpApCpGpUpA; a “p” terminating the sequence connotes the presence of a phosphate on the 3'-OH end, as in GpApCpGpUpAp.
            A more common method of representing nucleotide sequences is to omit the “p” and write only the order of bases, such as GACGUA. This notation assumes the presence of the phosphodiesters joining adjacent nucleotides. The presence of 3'- or 5'-phosphate termini, however, must still be specified, as in GACGUAp for a 3'-PO4 terminus. To distinguish between RNA and DNA sequences, DNA sequences are typically preceded by a lowercase “d” to denote deoxy, as in d-GACGTA. From a simple string of letters such as this, any biochemistry student should be able to draw the unique chemical structure for a pentanucleotide, even though it may contain over 200 atoms.
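
These notational conventions are regular enough to be read by a short program. The sketch below is illustrative: the function name and the returned field names are hypothetical, and a sequence without the “d-” prefix is simply assumed to be RNA.

```python
def parse_shorthand(notation):
    """Parse notations such as 'pGpApCpGpUpA', 'GACGUAp', or 'd-GACGTA' into their parts."""
    is_dna = notation.startswith("d-")
    body = notation[2:] if is_dna else notation
    bases = body.replace("p", "")  # lowercase p marks a phosphate, not a base
    unexpected = set(bases) - set("ACGTU")
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    return {
        "type": "DNA" if is_dna else "RNA",  # assumption: no prefix means RNA
        "bases_5_to_3": bases,               # sequences are read 5' to 3'
        "five_prime_phosphate": body.startswith("p"),
        "three_prime_phosphate": body.endswith("p"),
        "phosphodiester_bridges": max(len(bases) - 1, 0),
    }


for example in ("GpApCpGpUpA", "pGpApCpGpUpA", "GACGUAp", "d-GACGTA"):
    print(example, "->", parse_shorthand(example))
```

Both the full “p” notation and the abbreviated form reduce to the same base order, with the terminal phosphates reported separately.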

11.6 · Classes of Nucleic Acids

The two major classes of nucleic acids are DNA and RNA. DNA has only one biological role, but it is the more central one. The information to make all the functional macromolecules of the cell (even DNA itself) is preserved in DNA and accessed through transcription of the information into RNA copies. Coincident with its singular purpose, there is only a single DNA molecule (or “chromosome”) in simple life forms such as viruses or bacteria. Such DNA molecules must be quite large in order to embrace enough information for making the macromolecules necessary to maintain a living cell. The Escherichia coli chromosome has a molecular mass of 2.9 × 10⁹ D and contains over 9 million nucleotides. Eukaryotic cells have many chromosomes, and DNA is found principally in two copies in the diploid chromosomes of the nucleus, but it also occurs in mitochondria and in chloroplasts, where it encodes some of the proteins and RNAs unique to these organelles.

Table 11.2
Various Kinds of RNA Found in an E. coli Cell

Type    Sedimentation    Molecular Weight     Number of Nucleotide    Percentage of
        Coefficient                           Residues                Total Cell RNA
mRNA    6–25             25,000–1,000,000     75–3,000                ~2
tRNA    ~4               23,000–30,000        73–94                   16
rRNA    5                35,000               120                     82
        16               550,000              1542
        23               1,100,000            2904

            In contrast, RNA occurs in multiple copies and various forms (Table 11.2). Cells contain up to eight times as much RNA as DNA. RNA has a number of important biological functions, and on this basis, RNA molecules are categorized into several major types: messenger RNA, ribosomal RNA, and transfer RNA. Eukaryotic cells contain an additional type, small nuclear RNA (snRNA). With these basic definitions in mind, let’s now briefly consider the chemical and structural nature of DNA and the various RNAs. Chapter 12 elaborates on methods to determine the primary structure of nucleic acids by sequencing methods and discusses the secondary and tertiary structures of DNA and RNA. Part IV, Information Transfer, includes a detailed treatment of the dynamic role of nucleic acids in the molecular biology of the cell.

DNA

The DNA isolated from different cells and viruses characteristically consists of two polynucleotide strands wound together to form a long, slender, helical molecule, the DNA double helix. The strands run in opposite directions; that is, they are antiparallel and are held together in the double helical structure through interchain hydrogen bonds (Figure 11.19). These H bonds pair the bases of nucleotides in one chain to complementary bases in the other, a phenomenon called base pairing.

Figure 11.19 · The antiparallel nature of the DNA double helix.

 

Chargaff’s Rules

A clue to the chemical basis of base pairing in DNA came from the analysis of the base composition of various DNAs by Erwin Chargaff in the late 1940s. His data showed that the four bases commonly found in DNA (A, C, G, and T) do not occur in equimolar amounts and that the relative amounts of each vary from species to species (Table 11.3). Nevertheless, Chargaff noted that certain pairs of bases, namely, adenine and thymine, and guanine and cytosine, are always found in a 1:1 ratio and that the number of pyrimidine residues always equals the number of purine residues. These findings are known as Chargaff’s rules: [A] = [T]; [C] = [G]; [pyrimidines] = [purines].

Table 11.3
Molar Ratios Leading to the Formulation of Chargaff’s Rules

Source                     Adenine to   Thymine to   Adenine to   Guanine to   Purines to
                           Guanine      Cytosine     Thymine      Cytosine     Pyrimidines
Ox                         1.29         1.43         1.04         1.00         1.1
Human                      1.56         1.75         1.00         1.00         1.0
Hen                        1.45         1.29         1.06         0.91         0.99
Salmon                     1.43         1.43         1.02         1.02         1.02
Wheat                      1.22         1.18         1.00         0.97         0.99
Yeast                      1.67         1.92         1.03         1.20         1.0
Hemophilus influenzae      1.74         1.54         1.07         0.91         1.0
E. coli K-12               1.05         0.95         1.09         0.99         1.0
Avian tubercle bacillus    0.4          0.4          1.09         1.08         1.1
Serratia marcescens        0.7          0.7          0.95         0.86         0.9
Bacillus schatz            0.7          0.6          1.12         0.89         1.0

Source: After Chargaff, E., 1951. Federation Proceedings 10:654–659.
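
The five ratios tabulated above can be computed from any base composition. The sketch below is illustrative: the function name is hypothetical, and the mole percentages in the example are made-up round numbers, not data from Table 11.3.

```python
def chargaff_ratios(a, g, c, t):
    """Return the five molar ratios tabulated by Chargaff for a given base composition."""
    return {
        "A/G": a / g,
        "T/C": t / c,
        "A/T": a / t,
        "G/C": g / c,
        "purines/pyrimidines": (a + g) / (c + t),
    }


# Hypothetical composition in mole percent (illustrative values only).
ratios = chargaff_ratios(a=30.0, g=20.0, c=20.0, t=30.0)
for label, value in ratios.items():
    print(f"{label}: {value:.2f}")
```

In this hypothetical composition, A/G and T/C are both 1.5 while A/T, G/C, and purines/pyrimidines all equal 1, which is the pattern Chargaff's rules describe: the first two ratios vary with the source, and the last three stay near unity.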

Watson and Crick’s Double Helix

James Watson and Francis Crick, working in the Cavendish Laboratory at Cambridge University in 1953, took advantage of Chargaff’s results and the data obtained by Rosalind Franklin and Maurice Wilkins in X-ray diffraction studies on the structure of DNA to conclude that DNA was a complementary double helix. Two strands of deoxyribonucleic acid (sometimes referred to as the Watson strand and the Crick strand) are held together by hydrogen bonds formed between unique base pairs, always consisting of a purine in one strand and a pyrimidine in the other. Base pairing is very specific: if the purine is adenine, the pyrimidine must be thymine. Similarly, guanine pairs only with cytosine (Figure 11.20). Thus, if an A occurs in one strand of the helix, T must occupy the complementary position in the opposing strand. Likewise, a G in one dictates a C in the other. Because exceptions to this exclusive pairing of A only with T and G only with C are rare, these pairs are taken as the standard or accepted law, and the A:T and G:C base pairs are often referred to as canonical. As Watson recognized from testing various combinations of bases using structurally accurate models, the A:T pair and the G:C pair form spatially equivalent units (Figure 11.20). The backbone-to-backbone distance of an A:T pair is 1.11 nm, virtually identical to the 1.08 nm chain separation in G:C base pairs.

Figure 11.20 · The Watson–Crick base pairs A:T and G:C.

            The DNA molecule not only conforms to Chargaff’s rules but also has a profound property relating to heredity: The sequence of bases in one strand has a complementary relationship to the sequence of bases in the other strand. That is, the information contained in the sequence of one strand is conserved in the sequence of the other. Therefore, separation of the two strands and faithful replication of each, through a process in which base pairing specifies the nucleotide sequence in the newly synthesized strand, leads to two progeny molecules identical in every respect to the parental double helix (Figure 11.21). Elucidation of the double helical structure of DNA represented one of the most significant events in the history of science. This discovery more than any other marked the beginning of molecular biology. Indeed, upon solving the structure of DNA, Crick proclaimed in The Eagle, a pub just across from the Cavendish lab, “We have discovered the secret of life!”

Figure 11.21 · Replication of DNA gives identical progeny molecules because base pairing is the mechanism determining the nucleotide sequence synthesized within each of the new strands during replication.
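
The logic of complementary, antiparallel strands and of replication by base pairing can be sketched as follows. This is a toy model under stated assumptions: the function names are hypothetical, each strand is written 5' to 3', and copying is treated as error-free.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A pairs with T, G pairs with C


def complementary_strand(strand_5_to_3):
    """Return the antiparallel partner strand, also written 5' to 3'."""
    return strand_5_to_3.translate(COMPLEMENT)[::-1]


def replicate(duplex):
    """Separate the two strands and pair each with a newly synthesized complement."""
    watson, crick = duplex
    daughter_1 = (watson, complementary_strand(watson))  # old Watson strand, new partner
    daughter_2 = (complementary_strand(crick), crick)    # new partner, old Crick strand
    return daughter_1, daughter_2


parent = ("GACGTA", complementary_strand("GACGTA"))
daughter_1, daughter_2 = replicate(parent)
print(parent)                                          # ('GACGTA', 'TACGTC')
print(daughter_1 == parent and daughter_2 == parent)   # True
```

Each daughter duplex keeps one parental strand, yet both are identical to the parent because either strand fully specifies the other.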

 

 

Size of DNA Molecules

Because of the double helical nature of DNA molecules, their size can be represented in terms of the numbers of nucleotide base pairs they contain. For example, the E. coli chromosome consists of 4.64 × 10⁶ base pairs (abbreviated bp) or 4.64 × 10³ kilobase pairs (kbp). DNA is a threadlike molecule. The diameter of the DNA double helix is only 2 nm, but the length of the DNA molecule forming the E. coli chromosome is over 1.6 × 10⁶ nm (1.6 mm). Because the long dimension of an E. coli cell is only 2000 nm (0.002 mm), its chromosome must be highly folded. Because of their long, threadlike nature, DNA molecules are easily sheared into shorter fragments during isolation procedures, and it is difficult to obtain intact chromosomes even from the simple cells of prokaryotes.
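
The figures quoted for the E. coli chromosome can be combined in a short calculation. This is a back-of-the-envelope sketch using only numbers stated in this section; the variable names are illustrative, and because the length is given as a lower bound (“over 1.6 mm”), the derived values are approximate.

```python
BASE_PAIRS = 4.64e6        # size of the E. coli chromosome in base pairs, from the text
LENGTH_NM = 1.6e6          # approximate contour length in nm, from the text
CELL_LENGTH_NM = 2000.0    # long dimension of an E. coli cell in nm, from the text
MOLECULAR_MASS_D = 2.9e9   # molecular mass of the chromosome in daltons, from the text

kilobase_pairs = BASE_PAIRS / 1e3                  # size in kbp
length_per_bp_nm = LENGTH_NM / BASE_PAIRS          # implied length along the helix per base pair
length_excess = LENGTH_NM / CELL_LENGTH_NM         # how many cell lengths the DNA would span
mass_per_bp = MOLECULAR_MASS_D / BASE_PAIRS        # implied average mass of one base pair

print(f"{kilobase_pairs:.0f} kbp")
print(f"{length_per_bp_nm:.3f} nm per base pair")
print(f"{length_excess:.0f} times the length of the cell")
print(f"{mass_per_bp:.0f} D per base pair")
```

The chromosome is roughly 800 times longer than the cell that contains it, which is why it must be highly folded.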

DNA in the Form of Chromosomes


Figure 11.22 · If the cell walls of bacteria such as Escherichia coli are partially digested and the cells are then osmotically shocked by dilution with water, the contents of the cells are extruded to the exterior. In electron micrographs, the most obvious extruded component is the bacterial chromosome, shown here surrounding the cell. (Dr. Gopal Murti/CNRI/Phototake NYC)

 

DNA occurs in various forms in different cells. The single chromosome of prokaryotic cells (Figure 11.22) is typically a circular DNA molecule. Relatively little protein is associated with prokaryotic chromosomes. In contrast, the DNA molecules of eukaryotic cells, each of which defines a chromosome, are linear and richly adorned with proteins. A class of arginine- and lysine-rich basic proteins called histones interact ionically with the anionic phosphate groups in the DNA backbone to form nucleosomes, structures in which the DNA double helix is wound around a protein “core” composed of pairs of four different histone polypeptides (Figure 11.23; see also Section 12.5 in Chapter 12). Chromosomes also contain a varying mixture of other proteins, so-called non-histone chromosomal proteins, many of which are involved in regulating which genes in DNA are transcribed at any given moment. The amount of DNA in a diploid mammalian cell is typically more than 1000 times that found in an E. coli cell. Some higher plant cells contain more than 50,000 times as much.

Figure 11.23 · A diagram of the histone octamer. Nucleosomes consist of two turns of DNA supercoiled about a histone “core” octamer.

RNA

Messenger RNA

Messenger RNA (mRNA) serves to carry the information or “message” that is encoded in genes to the sites of protein synthesis in the cell, where this information is translated into a polypeptide sequence. Because mRNA molecules are transcribed copies of the protein-coding genetic units that comprise most of DNA, mRNA is said to be “the DNA-like RNA.”

            Messenger RNA is synthesized during transcription, an enzymatic process in which an RNA copy is made of the sequence of bases along one strand of DNA. This mRNA then directs the synthesis of a polypeptide chain as the information that is contained within its nucleotide sequence is translated into an amino acid sequence by the protein-synthesizing machinery of the ribosomes. Ribosomal RNA and tRNA molecules are also synthesized by transcription of DNA sequences, but unlike mRNA molecules, these RNAs are not subsequently translated to form proteins. Only the genetic units of DNA sequence that encode proteins are transcribed into mRNA molecules. In prokaryotes, a single mRNA may contain the information for the synthesis of several polypeptide chains within its nucleotide sequence (Figure 11.24). In contrast, eukaryotic mRNAs encode only one polypeptide, but are more complex in that they are synthesized in the nucleus in the form of much larger precursor molecules called heterogeneous nuclear RNA, or hnRNA. hnRNA molecules contain stretches of nucleotide sequence that have no protein-coding capacity. These noncoding regions are called intervening sequences or introns because they intervene between coding regions, which are called exons. Introns interrupt the continuity of the information specifying the amino acid sequence of a protein and must be spliced out before the message can be translated. In addition, eukaryotic hnRNA and mRNA molecules have a run of 100 to 200 adenylic acid residues attached at their 3'-ends, so-called poly(A) tails. This polyadenylylation occurs after transcription has been completed and is believed to contribute to mRNA stability. The properties of messenger RNA molecules as they move through transcription and translation in prokaryotic versus eukaryotic cells are summarized in Figure 11.24.
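The processing of a eukaryotic transcript described above can be illustrated with a toy model. This is only a sketch: the sequence, the exon coordinates, and the function names are invented for illustration, and the order of events in the cell is not modeled.

```python
def splice(hnrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive coordinates), dropping introns."""
    return "".join(hnrna[start:end] for start, end in exons)


def add_poly_a(rna: str, tail_length: int = 150) -> str:
    """Append a poly(A) tail; the text gives a range of 100 to 200 residues."""
    return rna + "A" * tail_length


# Hypothetical precursor: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
hnrna = exon1 + intron + exon2
exon_coords = [(0, 6), (18, 24)]  # assumed to be known in advance

mature = add_poly_a(splice(hnrna, exon_coords))
print(mature[:12], "+", len(mature) - 12, "A residues")
```

The spliced message reads continuously from one exon into the next, which is the point of removing the intervening sequence before translation.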

Figure 11.24 · The properties of mRNA molecules in prokaryotic versus eukaryotic cells during transcription and translation.

 

Ribosomal RNA

Ribosomes, the supramolecular assemblies where protein synthesis occurs, are about 65% RNA of the ribosomal RNA type. Ribosomal RNA (rRNA) molecules fold into characteristic secondary structures as a consequence of intramolecular hydrogen bond interactions (marginal figure). The different species of rRNA are generally referred to according to their sedimentation coefficients (see the Appendix to Chapter 5), which are a rough measure of their relative size (Table 11.2 and Figure 11.25).
            Ribosomes are composed of two subunits of different sizes that dissociate from each other if the Mg2+ concentration is below 10^-3 M. Each subunit is a supramolecular assembly of proteins and RNA and has a total mass of 10^6 daltons or more. E. coli ribosomal subunits have sedimentation coefficients of 30S (the small subunit) and 50S (the large subunit). Eukaryotic ribosomes are somewhat larger than prokaryotic ribosomes, consisting of 40S and 60S subunits. The properties of ribosomes and their rRNAs are summarized in Figure 11.25.

Figure 11.25 · The organization and composition of prokaryotic and eukaryotic ribosomes.

 

The 30S subunit of E. coli contains a single RNA chain of 1542 nucleotides. This small subunit rRNA itself has a sedimentation coefficient of 16S. The large E. coli subunit has two rRNA molecules, a 23S (2904 nucleotides) and a 5S (120 nucleotides). The ribosomes of a typical eukaryote, the rat, have four rRNA molecules: 18S (1874 nucleotides), 28S (4718 nucleotides), 5.8S (160 nucleotides), and 5S (120 nucleotides). The 18S rRNA is in the 40S subunit, and the other three are all part of the 60S subunit.
            Ribosomal RNAs characteristically contain a number of specially modified nucleotides, including pseudouridine residues, ribothymidylic acid, and methylated bases (Figure 11.26). The central role of ribosomes in the biosynthesis of proteins is treated in detail in Chapter 33. Here we briefly note the significant point that genetic information in the nucleotide sequence of an mRNA is translated into the amino acid sequence of a polypeptide chain by ribosomes.


Figure 11.26 · Unusual bases of RNA: pseudouridine, ribothymidylic acid, and various methylated bases.

Transfer RNA also has a complex secondary structure due to many intrastrand hydrogen bonds.


Transfer RNA

Transfer RNA (tRNA) serves as a carrier of amino acid residues for protein synthesis. Transfer RNA molecules also fold into a characteristic secondary structure (marginal figure). The amino acid is attached as an aminoacyl ester to the 3'-terminus of the tRNA. Aminoacyl-tRNAs are the substrates for protein biosynthesis. The tRNAs are the smallest RNAs (size range—23 to 30 kD) and contain 73 to 94 residues, a substantial number of which are methylated or otherwise unusually modified. Transfer RNA derives its name from its role as the carrier of amino acids during the process of protein synthesis (see Chapters 32 and 33). Each of the 20 amino acids of proteins has at least one unique tRNA species dedicated to chauffeuring its delivery to ribosomes for insertion into growing polypeptide chains, and some amino acids are served by several tRNAs. For example, five different tRNAs act in the transfer of leucine into proteins. In eukaryotes, there are even discrete sets of tRNA molecules for each site of protein synthesis—the cytoplasm, the mitochondrion, and, in plant cells, the chloroplast. All tRNA molecules possess a 3'-terminal nucleotide sequence that reads -CCA, and the amino acid is carried to the ribosome attached as an acyl ester to the free 3'-OH of the terminal A residue. These aminoacyl-tRNAs are the substrates of protein synthesis, the amino acid being transferred to the carboxyl end of a growing polypeptide. The peptide bond-forming reaction is a catalytic process intrinsic to ribosomes.

Small Nuclear RNAs

Small nuclear RNAs, or snRNAs, are a class of RNA molecules found in eukaryotic cells, principally in the nucleus. They are neither tRNA nor small rRNA molecules, although they are similar in size to these species. They contain from 100 to about 200 nucleotides, some of which, like tRNA and rRNA, are methylated or otherwise modified. No snRNA exists as naked RNA. Instead, snRNA is found in stable complexes with specific proteins forming small nuclear ribonucleoprotein particles, or snRNPs, which are about 10S in size. Their occurrence in eukaryotes, their location in the nucleus, and their relative abundance (1 to 10% of the number of ribosomes) are significant clues to their biological purpose: snRNPs are important in the processing of eukaryotic gene transcripts (hnRNA) into mature messenger RNA for export from the nucleus to the cytoplasm (Figure 11.24).

Significance of Chemical Differences Between DNA and RNA

Two fundamental chemical differences distinguish DNA from RNA:

1. DNA contains 2-deoxyribose instead of ribose.

2. DNA contains thymine instead of uracil.

What are the consequences of these differences and do they hold any significance in common? An argument can be made that, because of these differences, DNA is a more stable polymeric form than RNA. The greater stability of DNA over RNA is consistent with the respective roles these macromolecules have assumed in heredity and information transfer.


Figure 11.27 ·
Deamination of cytosine forms uracil.

            Consider first why DNA contains thymine instead of uracil. The key observation is that cytosine deaminates to form uracil at a finite rate in vivo (Figure 11.27). Because C in one DNA strand pairs with G in the other strand, whereas U would pair with A, conversion of a C to a U could potentially result in a heritable change of a CG pair to a UA pair. Such a change in nucleotide sequence would constitute a mutation in the DNA. To prevent this reaction from leading to changes in nucleotide sequence, a cellular repair mechanism “proofreads” DNA, and when a U arising from C deamination is encountered, it is treated as inappropriate and is replaced by a C. If DNA normally contained U rather than T, this repair system could not readily distinguish U formed by C deamination from U correctly paired with A. However, the U in DNA is “5-methyl-U” or, as it is conventionally known, thymine (Figure 11.28). That is, the 5-methyl group on T labels it as if to say “this U belongs; do not replace it.”

Figure 11.28 · The 5-methyl group on thymine labels it as a special kind of uracil.
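The reasoning about cytosine deamination can be traced with a toy model. This is a minimal sketch with invented names: strands are compared position by position, strand polarity is ignored, and the repair function simply assumes that any U found in DNA arose from C.

```python
# Base-pairing partners; U pairs with A just as T does.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def partner_strand(strand: str) -> str:
    """Return the bases that would be laid down opposite `strand`, position by position."""
    return "".join(PARTNER[base] for base in strand)


def deaminate(strand: str, index: int) -> str:
    """Convert the cytosine at `index` to uracil."""
    if strand[index] != "C":
        raise ValueError("only cytosine deaminates to uracil")
    return strand[:index] + "U" + strand[index + 1:]


def repair(strand: str) -> str:
    """Replace every U with C; valid only because normal DNA uses T, never U."""
    return strand.replace("U", "C")


template = "GACT"
damaged = deaminate(template, 2)                 # C at position 2 becomes U
unrepaired_copy = partner_strand(damaged)        # A is placed opposite the U
repaired_copy = partner_strand(repair(damaged))  # G is restored opposite the C

print(partner_strand(template), unrepaired_copy, repaired_copy)
```

If DNA normally contained U, the `repair` step here would also overwrite legitimate U residues paired with A, which is the ambiguity the 5-methyl group on thymine removes.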

 

            The ribose 2'-OH group of RNA is absent in DNA. Consequently, the ubiquitous 3'-O of polynucleotide backbones lacks a vicinal hydroxyl neighbor in DNA. This difference leads to a greater resistance of DNA to alkaline hydrolysis, examined in detail in the following section. To view it another way, RNA is less stable than DNA because its vicinal 2'-OH group makes the 3'-phosphodiester bond susceptible to nucleophilic cleavage (Figure 11.29). For just this reason, it is selectively advantageous for the heritable form of genetic information to be DNA rather than RNA.

Figure 11.29 · The vicinal -OH groups of RNA are susceptible to nucleophilic attack leading to hydrolysis of the phosphodiester bond and fracture of the polynucleotide chain; DNA lacks a 2'-OH vicinal to its 3'-O-phosphodiester backbone. Alkaline hydrolysis of RNA results in the formation of a mixture of 2'- and 3'-nucleoside monophosphates.

 

11.7 · Hydrolysis of Nucleic Acids

Most reactions of nucleic acid hydrolysis break bonds in the polynucleotide backbone. Such reactions are important because they can be used to manipulate these polymeric molecules. For example, hydrolysis of polynucleotides generates smaller fragments whose nucleotide sequence can be more easily determined.

Hydrolysis by Acid or Base

RNA is relatively resistant to the effects of dilute acid, but gentle treatment of DNA with 1 mM HCl leads to hydrolysis of purine glycosidic bonds and the loss of purine bases from the DNA. The glycosidic bonds between pyrimidine bases and 2'-deoxyribose are not affected, and, in this case, the polynucleotide’s sugar-phosphate backbone remains intact. The purine-free polynucleotide product is called apurinic acid.
            DNA is not susceptible to alkaline hydrolysis. On the other hand, RNA is alkali labile and is readily hydrolyzed by dilute sodium hydroxide. Cleavage is random in RNA, and the ultimate products are a mixture of nucleoside 2'- and 3'-monophosphates. These products provide a clue to the reaction mechanism (Figure 11.29). Abstraction of the 2'-OH hydrogen by hydroxide ion leaves a 2'-O⁻ that carries out a nucleophilic attack on the δ+ phosphorus atom of the phosphate moiety, resulting in cleavage of the 5'-phosphodiester bond and formation of a cyclic 2',3'-phosphate. This cyclic 2',3'-phosphodiester is unstable and decomposes randomly to either a 2'- or 3'-phosphate ester. DNA has no 2'-OH; therefore DNA is alkali stable.

A Deeper Look
Peptide Nucleic Acids (PNAs) Are Synthetic Mimics of DNA and RNA

Synthetic chemists have invented analogs of DNA (and RNA) in which the sugar–phosphate backbone is replaced by a peptide backbone, creating a polymer appropriately termed a peptide nucleic acid, or PNA. The PNA peptide backbone was designed so that the space between successive bases was the same as in natural DNA (see figure). PNA consists of repeating units of N-(2-aminoethyl)-glycine residues linked by peptide bonds; the bases are attached to this backbone through methylene carbonyl linkages. This chemistry provides six bonds along the backbone between bases and three bonds between the backbone and each base, just like natural DNA. PNA oligomers interact with DNA (and RNA) through specific base-pairing interactions, just as would be expected for a pair of complementary oligonucleotides. PNAs are resistant to nucleases and also are poor substrates for proteases. PNAs thus show great promise as specific diagnostic probes for unique DNA or RNA nucleotide sequences. PNAs also have potential application as antisense drugs (see problem 5 in the end-of-chapter problems).

Note the repeating six bonds (in bold) between base attachments and the three-bond linker between base (B) and backbone.

Buchardt, O., et al., 1993. Peptide nucleic acids and their potential applications in biotechnology. Trends in Biotechnology 11:384–386.

Enzymatic Hydrolysis

Enzymes that hydrolyze nucleic acids are called nucleases. Virtually all cells contain various nucleases that serve important housekeeping roles in the normal course of nucleic acid metabolism. Organs that provide digestive fluids, such as the pancreas, are rich in nucleases and secrete substantial amounts to hydrolyze ingested nucleic acids. Fungi and snake venom are often good sources of nucleases. As a class, nucleases are phosphodiesterases because the reaction that they catalyze is the cleavage of phosphodiester bonds by H2O. Because each internal phosphate in a polynucleotide backbone is involved in two phosphoester linkages, cleavage can potentially occur on either side of the phosphorus (Figure 11.30).

Figure 11.30 · Cleavage in polynucleotide chains: a cleavage yields 5'-phosphate products, whereas b cleavage gives 3'-phosphate products.

 

Convention labels the 3'-side as a and the 5'-side as b. Cleavage on the a side leaves the phosphate attached to the 5'-position of the adjacent nucleotide, while b-side hydrolysis yields 3'-phosphate products. Enzymes or reactions that hydrolyze nucleic acids are characterized as acting at either a or b. A second convention denotes whether the nucleic acid chain was cleaved at some internal location, endo, or whether a terminal nucleotide residue was hydrolytically removed, exo. Note that exo a cleavage characteristically occurs at the 3'-end of the polymer, whereas exo b cleavage involves attack at the 5'-terminus (Figure 11.31).

Figure 11.31 ·  Snake venom phosphodiesterase and spleen phosphodiesterase are exonucleases that degrade polynucleotides from opposite ends.

 

 

 

Nuclease Specificity

Like most enzymes (see Chapter 14), nucleases exhibit selectivity or specificity for the nature of the substance on which they act. That is, some nucleases act only on DNA (DNases), while others are specific for RNA (the RNases). Still others are nonspecific and are referred to simply as nucleases, as in nuclease S1 (see Table 11.4).

Table 11.4
Specificity of Various Nucleases

Enzyme | DNA, RNA, or Both | a or b | Specificity

Exonucleases
Snake venom phosphodiesterase | Both | a | Starts at 3'-end, 5'-NMP products
Spleen phosphodiesterase | Both | b | Starts at 5'-end, 3'-NMP products

Endonucleases
RNase A (pancreas) | RNA | b | Where 3'-PO4 is to pyrimidine; oligos with pyrimidine 3'-PO4 ends
Bacillus subtilis RNase | RNA | b | Where 3'-PO4 is to purine; oligos with purine 3'-PO4 ends
RNase T1 | RNA | b | Where 3'-PO4 is to guanine
RNase T2 | RNA | b | Where 3'-PO4 is to adenine
DNase I (pancreas) | DNA | a | Preferably between Py and Pu; nicks dsDNA, creating 3'-OH ends
DNase II (spleen, thymus, Staphylococcus aureus) | DNA | b | Oligo products
Nuclease S1 | Both | a | Cleaves single-stranded but not double-stranded nucleic acids

Nucleases may also show specificity for only single-stranded nucleic acids or may only act on double helices. Single-stranded nucleic acids are abbreviated by an ss prefix, as in ssRNA; the prefix ds denotes double-stranded. Nucleases may also display a decided preference for acting only at certain bases in a polynucleotide (Figure 11.32), or, as we shall see for restriction endonucleases, some nucleases will act only at a particular nucleotide sequence four to eight nucleotides in length. Table 11.4 lists the various permutations in specificity displayed by these nucleases and gives prominent examples of each. To the molecular biologist, nucleases are the surgical tools for the dissection and manipulation of nucleic acids in the laboratory.

Figure 11.32 · An example of nuclease specificity: The specificity of RNA hydrolysis by bovine pancreatic RNase. This RNase cleaves b at 3'-pyrimidines, yielding oligonucleotides with pyrimidine 3'-PO4 ends.
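Base-specific b cleavage of RNA can be mimicked on a sequence written as a string. The sketch below is illustrative only: the sequence is invented, the function name is hypothetical, and each enzyme is reduced to the single rule given in Table 11.4 (cut on the 3'-side of the named bases, leaving the phosphate on the fragment's 3'-end).

```python
def digest_after(rna: str, bases: str) -> list[str]:
    """Cut on the 3'-side of every residue found in `bases`.

    Fragments are returned in 5'->3' order; every fragment except possibly
    the last ends in one of the recognized bases.
    """
    fragments, current = [], ""
    for base in rna:
        current += base
        if base in bases:
            fragments.append(current)
            current = ""
    if current:  # 3'-terminal piece that does not end in a recognized base
        fragments.append(current)
    return fragments


rna = "AGUCGGAUCA"                     # hypothetical sequence, 5'->3'
rnase_a = digest_after(rna, "CU")      # pancreatic RNase: after pyrimidines
rnase_t1 = digest_after(rna, "G")      # RNase T1: after guanine

print("RNase A :", rnase_a)
print("RNase T1:", rnase_t1)
```

Comparing the two fragment sets for the same RNA gives overlapping pieces, which is what makes base-specific nucleases useful for working out a sequence.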

            Exonucleases degrade nucleic acids by sequentially removing nucleotides from their ends. Two in common use are snake venom phosphodiesterase and bovine spleen phosphodiesterase (Figure 11.31). Because they act on either DNA or RNA, they are referred to by the generic name phosphodiesterase. These two enzymes have complementary specificities. Snake venom phosphodiesterase acts by a cleavage and starts at the free 3'-OH end of a polynucleotide chain, liberating nucleoside 5'-monophosphates. In contrast, the bovine spleen enzyme starts at the 5'-end of a nucleic acid, cleaving b and releasing 3'-NMPs.
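The complementary action of the two phosphodiesterases can be sketched as follows. This is a simplified model with invented function names: the chain is a string written 5'→3', terminal phosphate states are ignored, and products use the common shorthand in which "pN" is a nucleoside 5'-monophosphate and "Np" is a nucleoside 3'-monophosphate.

```python
def snake_venom_pde(chain: str, steps: int) -> tuple[str, list[str]]:
    """a cleavage from the 3'-end: each step releases a 5'-NMP (written pN)."""
    released = []
    for _ in range(min(steps, len(chain))):
        released.append("p" + chain[-1])  # last residue leaves with its 5'-phosphate
        chain = chain[:-1]
    return chain, released


def spleen_pde(chain: str, steps: int) -> tuple[str, list[str]]:
    """b cleavage from the 5'-end: each step releases a 3'-NMP (written Np)."""
    released = []
    for _ in range(min(steps, len(chain))):
        released.append(chain[0] + "p")   # first residue leaves with its 3'-phosphate
        chain = chain[1:]
    return chain, released


remaining_sv, products_sv = snake_venom_pde("ACGU", 2)
remaining_sp, products_sp = spleen_pde("ACGU", 2)
print(remaining_sv, products_sv)
print(remaining_sp, products_sp)
```

Running both enzymes on the same chain shows the chain shrinking from opposite ends and the phosphate ending up on opposite sides of the released nucleotides.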

Restriction Enzymes

Restriction endonucleases are enzymes, isolated chiefly from bacteria, that have the ability to cleave double-stranded DNA. The term restriction comes from the capacity of prokaryotes to defend against or “restrict” the possibility of takeover by foreign DNA that might gain entry into their cells. Prokaryotes degrade foreign DNA by using their unique restriction enzymes to chop it into relatively large but noninfective fragments. Restriction enzymes are classified into three types, I, II, or III. Types I and III require ATP to hydrolyze DNA and can also catalyze chemical modification of DNA through addition of methyl groups to specific bases. Type I restriction endonucleases cleave DNA randomly, while type III recognize specific nucleotide sequences within dsDNA and cut the DNA at or near these sites.

 Type II Restriction Endonucleases

Type II restriction enzymes have received widespread application in the cloning and sequencing of DNA molecules. Their hydrolytic activity is not ATP-dependent, and they do not modify DNA by methylation or other means. Most importantly, they cut DNA within or near particular nucleotide sequences that they specifically recognize. These recognition sequences are typically four or six nucleotides in length and have a twofold axis of symmetry. For example, E. coli has a restriction enzyme, EcoRI, that recognizes the hexanucleotide sequence GAATTC:

5'-GAATTC-3'
3'-CTTAAG-5'

Note the twofold symmetry: the sequence read 5' → 3' is the same in both strands.

            When EcoRI encounters this sequence in dsDNA, it causes a staggered, double-stranded break by hydrolyzing each chain between the G and A residues:

5'-G↓AATTC-3'
3'-CTTAA↑G-5'

Staggered cleavage results in fragments with protruding single-stranded 5'-ends:

5'-G          AATTC-3'
3'-CTTAA          G-5'

Because the protruding termini of EcoRI fragments have complementary base sequences, they can form base pairs with one another.

Therefore, DNA restriction fragments having such “sticky” ends can be joined together to create new combinations of DNA sequence. If the fragments are derived from DNA molecules of different origin, novel recombinant forms of DNA are created.
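The symmetry of the EcoRI site and the origin of its sticky ends can be verified with a few lines of code. This is a sketch under stated assumptions: the test sequence is invented, only the top strand is tracked, and `offset` (the number of site bases left of the cut) is a hypothetical parameter introduced for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_twofold_symmetry(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def cut_top_strand(seq: str, site: str = "GAATTC", offset: int = 1) -> list[str]:
    """Split the top strand at every occurrence of `site`.

    `offset` is how many bases of the site stay on the left-hand fragment
    (1 for EcoRI, which cuts between G and A).
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


dna = "CCGAATTCGGTTGAATTCAA"   # hypothetical top strand, 5'->3'
fragments = cut_top_strand(dna)
overhang = "AATT"              # single-stranded 5'-end left by EcoRI

print(has_twofold_symmetry("GAATTC"), fragments)
print(reverse_complement(overhang) == overhang)  # overhangs can pair with each other
```

Because the four-base overhang is its own reverse complement, any two EcoRI ends can base-pair, regardless of which DNA molecule they came from.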
            EcoRI leaves staggered 5'-termini. Other restriction enzymes, such as PstI, which recognizes the sequence 5'-CTGCAG-3' and cleaves between A and G, produce cohesive staggered 3'-ends. Still others, such as BalI, act at the center of the twofold symmetry axis of their recognition site and generate blunt ends that are noncohesive. BalI recognizes 5'-TGGCCA-3' and cuts between G and C.
            Table 11.5 lists many of the commonly used restriction endonucleases and their recognition sites. Because these sites all have twofold symmetry, only the sequence on one strand needs to be designated.

 

Table 11.5
Restriction Endonucleases

About 1000 restriction enzymes have been characterized. They are named by italicized three-letter codes, the first a capital letter denoting the genus of the organism of origin, while the next two letters are an abbreviation of the particular species. Because prokaryotes often contain more than one restriction enzyme, the various representatives are assigned letter and number codes as they are identified. Thus, EcoRI is the initial restriction endonuclease isolated from Escherichia coli, strain R. With one exception (NciI), all known type II restriction endonucleases generate fragments with 5'-PO4 and 3'-OH ends.

In the recognition sequences, ↓ marks the cleavage site, N is any nucleotide, Pu is a purine, Py is a pyrimidine, and (A/T) is either A or T.

Enzyme | Common Isoschizomers | Recognition Sequence | Compatible Cohesive Ends
AluI | | AG↓CT | Blunt
ApyI | AtuI, EcoRII | CC↓(A/T)GG |
AsuII | | TT↓CGAA | ClaI, HpaII, TaqI
AvaI | | G↓PyCGPuG | SalI, XhoI, XmaI
AvrII | | C↓CTAGG |
BalI | | TGG↓CCA | Blunt
BamHI | | G↓GATCC | BclI, BglII, MboI, Sau3A, XhoII
BclI | | T↓GATCA | BamHI, BglII, MboI, Sau3A, XhoII
BglII | | A↓GATCT | BamHI, BclI, MboI, Sau3A, XhoII
BstEII | | G↓GTNACC |
BstXI | | CCANNNNN↓NTGG |
ClaI | | AT↓CGAT | AccI, AcyI, AsuII, HpaII, TaqI
DdeI | | C↓TNAG |
EcoRI | | G↓AATTC |
EcoRII | AtuI, ApyI | ↓CC(A/T)GG |
FnuDII | ThaI | CG↓CG | Blunt
HaeI | | GG↓CC | Blunt
HaeII | | PuGCGC↓Py |
HaeIII | | GG↓CC | Blunt
HincII | | GTPy↓PuAC | Blunt
HindIII | | A↓AGCTT |
HpaI | | GTT↓AAC | Blunt
HpaII | | C↓CGG | AccI, AcyI, AsuII, ClaI, TaqI
KpnI | | GGTAC↓C | BamHI, BclI, BglII, XhoII
MboI | Sau3A | ↓GATC |
MspI | | C↓CGG |
MstI | | TGC↓GCA | Blunt
NotI | | GC↓GGCCGC |
PstI | | CTGCA↓G |
SacI | SstI | GAGCT↓C |
SalI | | G↓TCGAC | AvaI, XhoI
Sau3A | | ↓GATC | BamHI, BclI, BglII, MboI, XhoII
SfiI | | GGCCNNNN↓NGGCC |
SmaI | XmaI | CCC↓GGG | Blunt
SphI | | GCATG↓C |
SstI | SacI | GAGCT↓C |
TaqI | | T↓CGA | AccI, AcyI, AsuII, ClaI, HpaII
XbaI | | T↓CTAGA |
XhoI | | C↓TCGAG | AvaI, SalI
XhoII | | ↓GATC | BamHI, BclI, BglII, MboI, Sau3A
XmaI | SmaI | C↓CCGGG | AvaI

Isoschizomers.  Different restriction enzymes sometimes recognize and cleave within identical target sequences. For example, MboI and Sau3A recognize the same tetranucleotide run: 5'-GATC-3'. Both cleave the DNA strands at the same position, namely, on the 5'-side of the G. Such enzymes are called isoschizomers, meaning that they cut at the same site. The enzyme BamHI is an isoschizomer of MboI and Sau3A except that it has greater specificity because it acts only at hexanucleotide sequences reading GGATCC. BamHI cuts between the two G’s, leaving cohesive 5'-ends that can match up with MboI or Sau3A fragments.

Restriction Fragment Size.  Assuming random distribution and equimolar proportions for the four nucleotides in DNA, a particular tetranucleotide sequence should occur every 4^4 nucleotides, or every 256 bases. Therefore, the fragments generated by a restriction enzyme that acts at a four-nucleotide sequence should average about 250 bp in length. “Six-cutters,” enzymes such as EcoRI or BamHI, will find their unique hexanucleotide sequences on the average once in every 4096 (4^6) bp of length. Because the genetic code is a triplet code with three bases of DNA specifying one amino acid in a polypeptide sequence, and because polypeptides typically contain at most 1000 amino acid residues, the fragments generated by six-cutters are approximately the size of prokaryotic genes. This property makes these enzymes useful in the construction and cloning of genetically useful recombinant DNA molecules. For the isolation of even larger nucleotide sequences, such as those of genes encoding large polypeptides (or those of eukaryotic genes that are disrupted by large introns), partial or limited digestion of DNA by restriction enzymes can be employed. However, restriction endonucleases that cut only at specific nucleotide sequences 8 or even 13 nucleotides in length are also available, such as NotI and SfiI.
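The fragment-size estimate follows directly from the assumption of random, equimolar base composition. A short sketch (function names are illustrative) makes the arithmetic explicit and compares it with the coding length of a 1000-residue polypeptide.

```python
def expected_site_spacing(site_length: int, alphabet_size: int = 4) -> int:
    """Average distance in bp between occurrences of one specific sequence,
    assuming all four bases are equally likely and independent."""
    return alphabet_size ** site_length


def coding_length_bp(amino_acids: int) -> int:
    """Base pairs needed to encode a polypeptide with a triplet code."""
    return 3 * amino_acids


four_cutter = expected_site_spacing(4)
six_cutter = expected_site_spacing(6)
eight_cutter = expected_site_spacing(8)   # e.g. an 8-bp site such as NotI's
gene_bp = coding_length_bp(1000)

print(four_cutter, six_cutter, eight_cutter, gene_bp)
```

The six-cutter spacing and the coding length of a large polypeptide are of the same order, which is the basis for the statement that six-cutter fragments are roughly gene-sized. Real genomes are not random, so observed spacings can differ considerably from these averages.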

Restriction Mapping

The application of these sequence-specific nucleases to problems in molecular biology is considered in detail in Chapter 13, but one prominent application is described here. Because restriction endonucleases cut dsDNA at unique sites to generate large fragments, they provide a means for mapping DNA molecules that are many kilobase pairs in length. Restriction digestion of a DNA molecule is in many ways analogous to proteolytic digestion of a protein by an enzyme such as trypsin (see Chapter 5): the restriction endonuclease acts only at its specific sites so that a discrete set of nucleic acid fragments is generated. This action is analogous to trypsin cleavage only at Arg and Lys residues to yield a particular set of tryptic peptides from a given protein. The restriction fragments represent a unique collection of different-sized DNA pieces. Fortunately, this complex mixture can be resolved by electrophoresis (see the Appendix to Chapter 5). Electrophoresis of DNA molecules on gels of restricted pore size (as formed in agarose or polyacrylamide media) separates them according to size, the largest being retarded in their migration through the gel pores while the smallest move relatively unhindered. Figure 11.33 shows a hypothetical electrophoretogram obtained for a DNA molecule treated with two different restriction nucleases, alone and in combination. Just as cleavage of a protein with different proteases to generate overlapping fragments allows an ordering of the peptides, restriction fragments can be ordered or “mapped” according to their sizes, as deduced from the patterns depicted in Figure 11.33.

 

Figure 11.33 · Restriction mapping of a DNA molecule as determined by an analysis of the electrophoretic pattern obtained for different restriction endonuclease digests. (Keep in mind that a dsDNA molecule has a unique nucleotide sequence and therefore a definite polarity; thus, fragments from one end are distinctly different from fragments derived from the other end.)
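The logic of ordering fragments from single and double digests can be sketched as a brute-force search. The fragment sizes below are invented for illustration and are not taken from Figure 11.33; the molecule is assumed to be linear, sizes are assumed to be exact, and all function names are hypothetical.

```python
from itertools import accumulate, permutations


def cut_sites(fragment_order: tuple) -> set:
    """Internal cut positions implied by laying fragments end to end."""
    return set(accumulate(fragment_order[:-1]))


def digest(sites: set, total: int) -> list:
    """Sorted fragment sizes produced by cutting a linear molecule at `sites`."""
    bounds = [0] + sorted(sites) + [total]
    return sorted(right - left for left, right in zip(bounds, bounds[1:]))


def candidate_maps(frags_a: list, frags_b: list, frags_ab: list) -> list:
    """Try every ordering of the single-digest fragments and keep those
    whose combined cut sites reproduce the double digest."""
    total = sum(frags_a)
    target = sorted(frags_ab)
    maps = set()
    for order_a in set(permutations(frags_a)):
        for order_b in set(permutations(frags_b)):
            sites_a, sites_b = cut_sites(order_a), cut_sites(order_b)
            if digest(sites_a | sites_b, total) == target:
                maps.add((tuple(sorted(sites_a)), tuple(sorted(sites_b))))
    return sorted(maps)


# Hypothetical 10-kbp linear DNA: sizes in kbp for enzyme A, enzyme B, and A + B.
maps = candidate_maps([2, 8], [3, 7], [1, 2, 7])
for sites_a, sites_b in maps:
    print("A cuts at", sites_a, "B cuts at", sites_b)
```

The search returns two maps that are mirror images of each other, since fragment sizes alone cannot tell which end of the molecule is which; additional information is needed to fix the orientation. Exhaustive permutation is practical only for small numbers of fragments.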