The Maya encoded their history in hieroglyphs
carved on stelae and temples like these ruins in
Tikal, Guatemala. (© George Holton/Photo
Researchers Inc.)

Chapter 32

The Genetic Code

We turn now to the problem of how the sequence of nucleotides in an mRNA molecule is translated into the specific amino acid sequence of a protein. The problem raises both informational and mechanical questions. First, what is the genetic code that allows the information specified in a sequence of bases to be translated into the amino acid sequence of a polypeptide? That is, how is the 4-letter language of nucleic acids translated into the 20-letter language of proteins? Implicit in this question is a mechanistic dilemma for structural biologists: It is easy to see how base pairing establishes a one-to-one correspondence that allows the template-directed synthesis of polynucleotide chains in the processes of replication and transcription. However, there is no obvious chemical affinity between the purine and pyrimidine bases and the 20 different amino acids. Nor is there any structural complementarity or stereochemical connection between polynucleotides and amino acids that might guide the translation of information.

Figure 32.1 · The general structure of tRNA molecules. Circles represent nucleotides in the tRNA sequence. The numbers given indicate the standardized numbering system for tRNAs (which differ in total number of nucleotides). Dots indicate places where the number of nucleotides may vary in different tRNA species. Recall from Chapter 11 that tRNA molecules often have modified or unusual bases.

Francis Crick reasoned that adapter molecules must bridge this informational gap. These adapter molecules must interact specifically with both nucleic acids (mRNAs) and amino acids. At least 20 different adapter molecules would be needed, at least one for each amino acid. The various adapter molecules would be able to read the genetic code in an mRNA template and align the amino acids according to the template’s directions so that they could be polymerized into a unique polypeptide. Transfer RNAs (tRNAs; Figure 32.1) are the adapter molecules (Chapters 11 and 12). Amino acids are attached to the 3'-OH at the 3'-CCA end of tRNAs as aminoacyl esters. The formation of these aminoacyl-tRNAs, so-called “charged tRNAs,” is catalyzed by specific aminoacyl-tRNA synthetases. There is one of these enzymes for each of the 20 amino acids, and each aminoacyl-tRNA synthetase loads its amino acid only onto tRNAs designed to carry it. In turn, these tRNAs specifically recognize unique sequences of bases in the mRNA through complementary base pairing.

Figure 32.2 · (a) An overlapping versus a nonoverlapping code. (b) A continuous versus a punctuated code.

 

32.1 · Elucidating the Genetic Code

Once it was realized that the sequence of bases in a gene specified the sequence of amino acids in a protein, various possibilities for the genetic code came under consideration. How many bases were necessary to specify each amino acid? Is the code overlapping or nonoverlapping (Figure 32.2)? Is the code punctuated or continuous? Mathematical considerations favored a triplet of bases as the minimal code word, or codon, for each amino acid: A doublet code based on pairs of the four possible bases, A, C, G, and U, has 4² = 16 unique arrangements, an insufficient number to encode the 20 amino acids. A triplet code of four bases has 4³ = 64 possible code words, more than enough for the task. Genetic results gave early answers to several of the other questions. For example, point mutations in the gene encoding the coat protein of TMV (tobacco mosaic virus, Chapter 29) caused single amino acid substitutions, discounting the possibility that the code was overlapping. A single base change in an overlapping code should cause multiple amino acid changes in the protein. For example, three changes would occur if the code were an overlapping triplet one (Figure 32.2).
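The counting argument and the overlap argument above can be checked with a short sketch. The function names and the 12-base message length are illustrative choices, not part of the original discussion; the "overlapping" case assumes the most extreme form of overlap, in which a new triplet begins at every base.

```python
from itertools import product

BASES = "ACGU"

def code_words(length):
    """All possible code words of the given length over the four RNA bases."""
    return ["".join(p) for p in product(BASES, repeat=length)]

def codons_covering(position, seq_len, overlapping):
    """Start indices of the triplets that include the base at `position` (0-based)."""
    if overlapping:
        starts = range(seq_len - 2)        # assumed: a triplet starts at every base
    else:
        starts = range(0, seq_len - 2, 3)  # triplets start at every third base
    return [s for s in starts if s <= position < s + 3]

# A doublet code is too small for 20 amino acids; a triplet code is large enough.
print(len(code_words(2)), len(code_words(3)))

# Number of code words affected by changing one interior base of a 12-base message.
print(len(codons_covering(4, 12, overlapping=True)))
print(len(codons_covering(4, 12, overlapping=False)))
```

Under these assumptions, a single interior base belongs to three code words in the overlapping scheme but to only one in the nonoverlapping scheme, which is why single amino acid substitutions in the TMV coat protein argued against overlap.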

The General Nature of the Genetic Code

The genetic code is a triplet code read continuously from a fixed starting point in each mRNA. Specifically, it is defined by the following:

1.  A group of three bases codes for one amino acid.

2.  The code is not overlapping.

3.  The base sequence is read from a fixed starting point without punctuation. That is, the mRNA sequences contain no “commas” signifying appropriate groupings of triplets. If the reading frame is displaced by one base, it remains shifted throughout the subsequent message; no “commas” are present to restore the “correct” frame.

4.  The code is degenerate, meaning that, in most cases, each amino acid can be coded by any of several triplets.

Regarding this last point, recall that a triplet code yields 64 codons for 20 amino acids. If only 20 of these were used, the majority of codons would be nonsense in that they would not code for any amino acid. A consequence of degeneracy is that most codons (61 of 64) code for some amino acid.
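Point 3 in the list above, reading from a fixed starting point without punctuation, can be illustrated with a minimal sketch. The message below is a made-up sequence chosen only for illustration, and `read_triplets` is a hypothetical helper name.

```python
def read_triplets(mrna, start=0):
    """Split an mRNA string into consecutive, nonoverlapping triplets from `start`.

    Trailing bases that do not fill a complete triplet are ignored.
    """
    usable_end = len(mrna) - (len(mrna) - start) % 3
    return [mrna[i:i + 3] for i in range(start, usable_end, 3)]

message = "AUGGCUUUUCCCAAAUAA"  # hypothetical sequence, not from the text

in_frame = read_triplets(message)           # reading from the fixed start
shifted = read_triplets(message, start=1)   # reading frame displaced by one base

print(in_frame)
print(shifted)
```

Because nothing in the sequence marks where one triplet ends and the next begins, the displaced reading never falls back into register: every triplet downstream of the shift differs from the in-frame reading.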

Elucidating the Genetic Code Through Biochemistry

The actual assignment of codons to the respective amino acids came from in vitro studies using synthetic oligo- and polyribonucleotides as messenger RNAs. Marshall Nirenberg and Heinrich Matthaei discovered that a cell-free system from Escherichia coli catalyzed the synthesis of polyphenylalanine (poly[Phe]) in the presence of polyuridylic acid (poly[U]). This cell-free system contained, among other things, ribosomes, tRNAs, and the soluble enzymes necessary to activate amino acids for protein synthesis. Even though the other 19 amino acids were present in the reaction mixture, only phenylalanine was incorporated into protein when poly[U] served as mRNA. The first codon had been deciphered: UUU codes for Phe. Similar experiments with polyadenylic acid (poly[A]) and polycytidylic acid (poly[C]) yielded polylysine and polyproline, respectively, showing that AAA codes for Lys and CCC codes for Pro.¹

Trinucleotides Bound to Ribosomes Promote the Binding of Specific Aminoacyl-tRNAs

Figure 32.3 · The filter-binding assay for elucidation of the genetic code. Reaction mixture includes washed ribosomes, Mg²⁺, a particular trinucleotide (pUpUpU in this example), and all 20 aminoacyl-tRNAs, one of which is radioactively (¹⁴C) labeled. (a) ¹⁴C-labeled prolyl-tRNA. (b) ¹⁴C-labeled phenylalanyl-tRNA. Only the aminoacyl-tRNA whose binding is directed by the trinucleotide codon will become bound to the ribosomes and retained on the nitrocellulose filter. The amount of radioactivity retained by the filter is a measure of trinucleotide-directed binding of a particular labeled aminoacyl-tRNA by ribosomes. Use of this binding assay to test the 64 possible codon trinucleotides against the 20 different amino acids quickly enabled researchers to assign triplet code words to the individual amino acids. The genetic code was broken. (Adapted from Nirenberg, M. W., and Leder, P., 1964. RNA codewords and protein synthesis. Science 145:1399-1407)

In 1964, Marshall Nirenberg and Philip Leder reported that trinucleotides bound to ribosomes directed the binding of specific aminoacyl-tRNAs. That is, ternary ribosome:trinucleotide:aminoacyl-tRNA complexes could be formed, provided the right trinucleotide and aminoacyl-tRNA combination was present. Aminoacyl-tRNAs were prepared by adding all 20 amino acids to a purified tRNA mixture in the presence of a soluble E. coli fraction containing the necessary aminoacyl-tRNA synthetases. Only one of the amino acids was ¹⁴C-labeled in any one binding assay. Trinucleotides are the equivalent of codons, so if a specific trinucleotide promoted the binding of a particular ¹⁴C-labeled aminoacyl-tRNA, the base sequence of the trinucleotide must be the code word for that amino acid. Binding was detected because the ribosomes were retained on a nitrocellulose filter while free aminoacyl-tRNAs passed through; only aminoacyl-tRNAs bound by ribosomes were retained (Figure 32.3).
This system was quickly exploited to elucidate the genetic code. Elucidation of the genetic code was probably the greatest scientific achievement of the 1960s. For their roles in it, Marshall Nirenberg and H. Gobind Khorana shared in the 1968 Nobel Prize in Physiology or Medicine.

¹ Because polyguanylic acid (poly[G]) has a very strong tendency to form multistranded helices, it was a poor template for protein synthesis. The fact that GGG codes for Gly was not learned until later.

32.2 · The Nature of the Genetic Code

The complete translation of the genetic code is presented in Table 32.1. Codons, like other nucleotide sequences, are read 5' → 3'. Codons represent triplets of bases in mRNA or, replacing U with T, triplets along the nontranscribed (nontemplate) strand of DNA. Several noteworthy features characterize the genetic code:

1.  All the codons have meaning. Sixty-one of the 64 codons specify particular amino acids. The remaining 3 (UAA, UAG, and UGA) specify no amino acid and thus are nonsense codons. Nonsense codons serve as termination codons: they are “stop” signals indicating that the end of the protein has been reached.

Table 32.1

The Genetic Code

First Position                Second Position                 Third Position
(5'-end)         U          C          A          G           (3'-end)

U                UUU Phe    UCU Ser    UAU Tyr    UGU Cys     U
                 UUC Phe    UCC Ser    UAC Tyr    UGC Cys     C
                 UUA Leu    UCA Ser    UAA Stop   UGA Stop    A
                 UUG Leu    UCG Ser    UAG Stop   UGG Trp     G

C                CUU Leu    CCU Pro    CAU His    CGU Arg     U
                 CUC Leu    CCC Pro    CAC His    CGC Arg     C
                 CUA Leu    CCA Pro    CAA Gln    CGA Arg     A
                 CUG Leu    CCG Pro    CAG Gln    CGG Arg     G

A                AUU Ile    ACU Thr    AAU Asn    AGU Ser     U
                 AUC Ile    ACC Thr    AAC Asn    AGC Ser     C
                 AUA Ile    ACA Thr    AAA Lys    AGA Arg     A
                 AUG Met*   ACG Thr    AAG Lys    AGG Arg     G

G                GUU Val    GCU Ala    GAU Asp    GGU Gly     U
                 GUC Val    GCC Ala    GAC Asp    GGC Gly     C
                 GUA Val    GCA Ala    GAA Glu    GGA Gly     A
                 GUG Val    GCG Ala    GAG Glu    GGG Gly     G

*AUG signals translation initiation as well as coding for Met residues.
Third-Base Degeneracy Is Color-Coded

Third-Base Relationship    Third Bases with Same Meaning    Number of Codons
Third base irrelevant      U, C, A, G                       32 (8 families)
Purines                    A or G                           12 (6 pairs)
Pyrimidines                U or C                           14 (7 pairs)
Three out of four          U, C, A                          3 (AUX = Ile)
Unique definitions         G only                           2 (AUG = Met) (UGG = Trp)
Unique definition          A only                           1 (UGA = Stop)
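The codon counts in the third-base summary can be re-derived from the code table itself. The following is a minimal sketch, assuming the compact 64-letter encoding of Table 32.1 shown in the comments (one-letter amino acid codes, with `*` for Stop); the classification rules in `classify_box` are tailored to the patterns that occur in the standard code and are not a general-purpose scheme.

```python
from collections import Counter

BASES = "UCAG"
# One-letter amino acid codes ('*' = Stop), ordered by first, second, and
# third base, each running through U, C, A, G (the layout of Table 32.1).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def classify_box(prefix):
    """Tally third-base relationships for the four codons sharing `prefix`."""
    u, c, a, g = (CODON_TABLE[prefix + third] for third in BASES)
    tally = Counter()
    if u == c == a == g:
        tally["third base irrelevant"] = 4
    elif u == c == a:
        tally["three out of four"] = 3
        tally["G only"] = 1
    else:
        if u == c:
            tally["pyrimidines"] = 2
        else:
            tally["U only"] = 1
            tally["C only"] = 1
        if a == g:
            tally["purines"] = 2
        else:
            tally["A only"] = 1
            tally["G only"] = 1
    return tally

totals = Counter()
for b1 in BASES:
    for b2 in BASES:
        totals += classify_box(b1 + b2)

for relationship, count in totals.most_common():
    print(f"{relationship}: {count}")
```

The tallies account for all 64 codons and match the numbers listed in the table.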

 

2.  The genetic code is unambiguous. Each of the 61 “sense” codons encodes only one amino acid.

3.  The genetic code is degenerate. With the exception of Met and Trp, every amino acid is coded by more than one codon. Several—Arg, Leu, and Ser—are represented by 6 different codons. Codons coding for the same amino acid are called synonymous codons.

4.  Codons representing the same amino acid or chemically similar amino acids tend to be similar in sequence. Often the third base in a codon is irrelevant, so that, for example, all 4 codons in the GGX family specify Gly, and the UCX family specifies Ser (Table 32.1). This feature is known as third-base degeneracy. Note also that codons with a pyrimidine as second base likely encode amino acids with hydrophobic side chains, and codons with a purine in the second-base position typically specify polar or charged amino acids. The two negatively charged amino acids, Asp and Glu, are encoded by GAX codons; GA-pyrimidine gives Asp and GA-purine specifies Glu. The consequence of these similarities is that mutations are less likely to be deleterious because single base changes in a codon will result either in no change or in a substitution with an amino acid similar to the original amino acid. The degeneracy of the code is evolution’s buffer against mutational disruption.

5.  The genetic code is “universal.” Although certain minor exceptions in codon usage occur (see A Deeper Look: Natural Variations in the Standard Genetic Code), the more striking feature of the code is its universality: Codon assignments are virtually the same throughout all organisms—archaea, eubacteria, and eukaryotes. This conformity means that all extant organisms use the same genetic code, providing strong evidence that they all evolved from a common primordial ancestor.
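Features 3 and 4 can be made concrete with a short calculation over the code table. This is a sketch under stated assumptions: the 64-letter string is a compact encoding of Table 32.1 (one-letter amino acid codes, `*` for Stop), and `synonymous_fraction` is a hypothetical helper that considers only sense codons and counts a change to a stop codon as nonsynonymous.

```python
from collections import Counter

BASES = "UCAG"
# Compact encoding of Table 32.1: first, second, third base each in U, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Number of synonymous codons for each amino acid (stop codons excluded).
codons_per_residue = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

def synonymous_fraction(position):
    """Fraction of single-base changes at `position` (0, 1, or 2) in sense
    codons that leave the encoded amino acid unchanged."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total

print(sorted(codons_per_residue.items(), key=lambda item: -item[1]))
for position in range(3):
    print(position + 1, round(synonymous_fraction(position), 3))
```

The tally shows Met (M) and Trp (W) with a single codon each and Arg (R), Leu (L), and Ser (S) with six, and the per-position fractions show that silent changes are concentrated at the third base.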

A DEEPER LOOK
Natural Variations in the Standard Genetic Code
The genomes of mitochondria, some prokaryotes, and lower eukaryotes show some exceptions to the standard genetic code (Table 32.1) in codon assignments. The phenomenon is more common in mitochondria. For example, the termination codon UGA codes for tryptophan in mitochondria from various animals, protozoans, and fungi. AUA, normally an Ile codon, codes for methionine in some animal and fungal mitochondrial genomes, and AGA (an Arg codon) is a termination codon in vertebrate mitochondria, but a Ser codon in fruit fly mitochondria. Mitochondria in several species of yeast use the CUX codons to specify Thr instead of Leu. Higher plant mitochondria use CGG, normally an Arg codon, to specify Trp.
Less common are genomic codon variations within the genomes of prokaryotic and eukaryotic cells. Among the lower eukaryotes, certain ciliated protozoans (Tetrahymena and Paramecium) use UAA and UAG as glutamine codons rather than stop codons. Instances in prokaryotes include use of the stop codon UGA to specify Trp by Mycoplasma. Perhaps most interesting is the use of some UGA codons by both prokaryotes and eukaryotes (including humans) to specify selenocysteine, an analog of cysteine in which the sulfur atom is replaced by a selenium atom. Indeed, the identification of selenocysteine residues in proteins from bacteria, archaea, and eukaryotes has led some people to nominate selenocysteine as the 21st amino acid! Selenocysteine formation requires a novel selenocysteine-specific tRNA known as tRNASec. This tRNASec is loaded with a Ser residue by seryl-tRNA synthetase, the aminoacyl-tRNA synthetase for serine. Then, in an ATP-dependent process, the oxygen of the Ser side-chain -OH is replaced by Se. Translation of certain selected UGA codons by selenocysteinyl-tRNASec requires additional proteins and formation of a stable stem-loop secondary structure in the mRNA next to the particular UGA codon.


Selenocysteine

Adapted from Fox, T. D., 1987. Natural variation in the genetic code. Annual Review of Genetics 21:67-91; and Low, S. C., and Berry, M. J., 1996. Knowing when not to stop. Trends in Biochemical Sciences 21:203-208.
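One convenient way to think about such variations is as a small set of overrides applied to the standard table. The sketch below assumes exactly that representation. The override dictionary contains only the three vertebrate mitochondrial reassignments named in this box (UGA, AUA, and AGA); it is not a complete mitochondrial code table, and the message is a made-up sequence.

```python
BASES = "UCAG"
# Compact encoding of Table 32.1 (one-letter codes, '*' = Stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NAMES = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "C": "Cys", "W": "Trp",
    "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile", "M": "Met",
    "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala", "D": "Asp",
    "E": "Glu", "G": "Gly", "*": "Stop",
}
STANDARD = {
    b1 + b2 + b3: NAMES[AMINO[16 * i + 4 * j + k]]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumption: only the reassignments mentioned in the text are listed here.
VERTEBRATE_MITO_OVERRIDES = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop"}

def with_overrides(standard, overrides):
    """Return a copy of the standard table with the given codons reassigned."""
    table = dict(standard)
    table.update(overrides)
    return table

def translate(mrna, table):
    """Read triplets from the first base and stop at the first Stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "Stop":
            break
        residues.append(residue)
    return residues

message = "AUGAUAUGAAGA"  # hypothetical sequence, not from the text
mito = with_overrides(STANDARD, VERTEBRATE_MITO_OVERRIDES)

print(translate(message, STANDARD))
print(translate(message, mito))
```

Read with the standard table, this message terminates at UGA after two residues; with the overrides, AUA is read as Met, UGA as Trp, and termination occurs at AGA.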

 

32.3 · The Second Genetic Code: Aminoacyl-tRNA Synthetase Recognition of the Proper Substrates      

Figure 32.4 · Reduction of cysteinyl-tRNACys with Raney nickel converts the cysteine -CH₂SH R group to -CH₃. That is, Cys is transformed into Ala.

Codon recognition is achieved by aminoacyl-tRNAs. In order for accurate translation to occur, the appropriate aminoacyl-tRNA must “read” the codon through base pairing via its anticodon loop (Chapter 12). Once an aminoacyl-tRNA has been synthesized, the amino acid part makes no contribution to accurate translation of the mRNA. Von Ehrenstein proved this point by loading ¹⁴C-cysteine onto its particular tRNA, tRNACys. The product cysteinyl-tRNACys was chemically reduced so that its -SH group was removed to yield Ala-tRNACys (Figure 32.4). The Ala-tRNACys was then added to an in vitro hemoglobin-synthesizing system and the product Hb was analyzed. Alanine was found at positions in the Hb amino acid sequence normally occupied by Cys. The protein-synthesizing machinery was unable to recognize Ala-tRNACys as “foreign” or inappropriate. Thus, the amino acid attached to a tRNA vehicle is passively chauffeured and becomes inserted into a growing peptide chain as dictated through codon-anticodon recognition.
Thus, a second genetic code exists, the code by which each aminoacyl-tRNA synthetase discriminates between the 20 amino acids and the many tRNAs and uniquely picks out its proper substrates, one specific amino acid and the tRNA(s) appropriate to it, from among the more than 400 possible combinations. The appropriate tRNA(s) are those having anticodons that can base-pair with the codon(s) specifying the particular amino acid. Clearly, the proper amino acids must be loaded onto the various tRNAs so that the mRNA is translated with fidelity. Although the primary genetic code is key to understanding the central dogma of molecular biology, that is, how DNA encodes proteins, the second genetic code is just as crucial to the fidelity of information transfer.

cognate · kindred; in this sense, cognate refers to those tRNAs having anticodons that can read one or more of the codons that specify one particular amino acid.

Cells have 20 different aminoacyl-tRNA synthetases, one for each amino acid. Each of these enzymes catalyzes the ATP-dependent esterification of its specific amino acid substrate to the 3'-end of its cognate tRNA molecules (Figure 32.5). The aminoacyl-tRNA synthetase reaction serves two purposes:

1.  It activates the amino acid so that it will readily react to form a peptide bond.

2.  It bridges the information gap between amino acids and codons.

The underlying mechanisms of molecular recognition used by each aminoacyl-tRNA synthetase to bring the proper amino acid to its cognate tRNA are the embodiment of the second genetic code.

Figure 32.5 · The aminoacyl-tRNA synthetase reaction. (a) The overall reaction. (b) The overall reaction commonly proceeds in two steps: (i) formation of an aminoacyl-adenylate, and (ii) transfer of the activated amino acid moiety of the mixed anhydride to either the 2'-OH (class I aminoacyl-tRNA synthetases) or 3'-OH (class II aminoacyl-tRNA synthetases) of the ribose on the terminal adenylic acid at the 3'-CCA terminus common to all tRNAs. Those aminoacyl-tRNAs formed as 2'-aminoacyl esters undergo a transesterification that moves the aminoacyl function to the 3'-O of tRNA. Only the 3'-esters are substrates for protein synthesis.

The Aminoacyl-tRNA Synthetase Reaction

Amino acid activation is a two-step process:

1.  Activation of the amino acid through the ATP-dependent formation of an aminoacyl adenylate. Ever-present pyrophosphatases in cells quickly hydrolyze the pyrophosphate product of this reaction, rendering amino acid activation thermodynamically favorable and essentially irreversible.

2.  Transfer of the aminoacyl group from the aminoacyl adenylate to a specific tRNA.

Aminoacyl-tRNA synthetases that must discriminate between similar amino acids (such as Ile and Val) show two levels of specificity in the two-step aminoacyl-tRNA synthetase reaction. The specificity at the first step is not absolute, as shown by the ability of isoleucyl-tRNA synthetase to catalyze an ATP ⇌ PPi exchange reaction with either isoleucine or valine, that is, reaction (i) of Figure 32.5. Although valyl adenylate is synthesized, no valyl-tRNAIle is released. That is, the overall specificity of the isoleucyl-tRNA synthetase reaction is virtually absolute. The enzyme has an editing function that establishes this specificity: Synthesis of misacylated valyl-tRNAIle triggers an editing deacylase site in the enzyme that hydrolyzes the misacylated aminoacyl-tRNA.

The Two Classes of Aminoacyl-tRNA Synthetases

Table 32.2
The Two Classes of Aminoacyl-tRNA Synthetases

Class I    Class II
Arg        Ala
Cys        Asn
Gln        Asp
Glu        Gly
Ile        His
Leu        Lys
Met        Phe
Trp        Pro
Tyr        Ser
Val        Thr

Despite their common enzymatic function, aminoacyl-tRNA synthetases are a diverse group of proteins in terms of size, amino acid sequence, and oligomeric structure. The subunits show a broad range of sizes (for example, from 334 to 951 amino acid residues in the E. coli enzymes), and four different patterns of subunit organization are seen: α, α₂, α₄, and α₂β₂. In higher eukaryotes, at least some aminoacyl-tRNA synthetases assemble into large multiprotein complexes. The aminoacyl-tRNA synthetases fall into two fundamental classes on the basis of similar amino acid sequence motifs, oligomeric state, and acylation function (Table 32.2). Class I enzymes are chiefly monomeric (α), whereas class II aminoacyl-tRNA synthetases are always oligomeric (usually homodimers). Furthermore, class I aminoacyl-tRNA synthetases first add the amino acid to the 2'-OH of the terminal adenylate residue of tRNA before shifting it to the 3'-OH; class II enzymes add it directly to the 3'-OH (Figure 32.5). Only the 3'-aminoacyl-tRNA esters are substrates for protein synthesis.

Figure 32.6 · Mirror-symmetric interactions of class I versus class II aminoacyl-tRNA synthetases with their tRNA substrates. The two different classes of aminoacyl-tRNA synthetases bind to opposite faces of tRNA molecules. On the left is a space-filling model of the class I glutaminyl-tRNAGln synthetase. Class I synthetases bind to the side of their tRNA substrates shown as closest in this figure (the model tRNA structure is tRNAPhe for purposes of illustration). On the right is a space-filling model of the class II aspartyl-tRNAAsp synthetase; this class of synthetase binds to the side of tRNA closest to it here. (Adapted from Arnez, J. G., and Moras, D., 1997. Structural and functional considerations of the aminoacylation reaction. Trends in Biochemical Sciences 22:211-216, Figure 5)

Class I aminoacyl-tRNA synthetases have active site structures based on a parallel β-sheet nucleotide-binding fold (named the Rossmann fold, after its discoverer) and two conserved sequence motifs (HIGH and KMSKS) that complete the ATP-binding site. In contrast, class II aminoacyl-tRNA synthetases share a set of conserved sequence motifs (motifs 1, 2, and 3). Motif 1 forms part of the dimerization motif, and motifs 2 and 3 contribute essential residues to the active site. These results suggest that the catalytic domains of these enzymes evolved from two different ancestral predecessors. Apparently, aminoacyl-tRNA synthetases are ranked among the oldest proteins because different forms of these enzymes were present very early in evolution. X-ray crystallographic structures have been solved for a majority of the 20 aminoacyl-tRNA synthetases. Class I and class II aminoacyl-tRNA synthetases interact with the tRNA 3'-terminal CCA and acceptor stem in a mirror-symmetric fashion with respect to each other (Figure 32.6). Class I enzymes bind to the tRNA acceptor stem helix from the minor-groove side, whereas class II enzymes bind it from the major-groove side.

Figure 32.7 · (a) E. coli glutamyl-tRNA synthetase, a class I enzyme. (b) Thermus thermophilus glycyl-tRNA synthetase, a class II enzyme. (Adapted from Cusack, S., 1995. Eleven down and nine to go. Nature Structural Biology 2:824-831, Figures 2 and 5)

Both class I and class II aminoacyl-tRNA synthetases can be approximated as two-domain structures, as can their L-shaped tRNA substrates, which have the acceptor stem/CCA-3'-OH at one end and the anticodon stem-loop at the other (see Figures 12.38 and 32.8). This L-shaped tertiary structure of tRNAs separates the 3'-CCA acceptor end from the anticodon loop by a distance of 7.6 nm. The two domains of tRNAs have distinct functions: The 3'-CCA end is the site of aminoacylation, and the anticodon-containing domain interacts with the mRNA template. The two domains of tRNAs interact with the separate domains in the synthetases. One of the two major aminoacyl-tRNA synthetase domains is the catalytic domain (which defines the difference between class I and class II enzymes); this domain interacts with the tRNA 3'-CCA end. The other major domain in aminoacyl-tRNA synthetases is highly variable and interacts with parts of the tRNA beyond the acceptor-TΨC stem-loop domain, including, in some cases, the anticodon. Ribbon structures of representative class I and class II aminoacyl-tRNA synthetases are shown in Figure 32.7.

Selective tRNA Recognition by Aminoacyl-tRNA Synthetases

Aside from the need to uniquely recognize their cognate amino acids, aminoacyl-tRNA synthetases must be able to discriminate between the various tRNAs. The structural features that permit the synthetases to recognize and aminoacylate their cognate tRNA(s) are not universal. That is, a common set of rules does not govern tRNA recognition by these enzymes. Most surprising is the fact that the recognition features are not limited to the anticodon and, in some instances, do not even include the anticodon. For most tRNAs, a set of sequence elements is recognized by its specific aminoacyl-tRNA synthetase, rather than a single distinctive nucleotide or base pair. These elements include one or more of the following: (a) at least one base in the anticodon; (b) one or more of the three base pairs in the acceptor stem; and (c) the base at canonical position 73 (the unpaired base preceding the CCA end), referred to as the discriminator base because this base is invariant in the tRNAs for a particular amino acid. Figure 32.8 presents a ribbon diagram of a tRNA molecule showing the common location of nucleotides that contribute to specific recognition by the respective aminoacyl-tRNA synthetases for each of the 20 amino acids. Interestingly, the same set of tRNA features that serves as positive determinants for binding and aminoacylation of the tRNA by its cognate aminoacyl-tRNA synthetase may act as negative determinants that prohibit binding and aminoacylation by other (noncognate) aminoacyl-tRNA synthetases. Because no common set of rules exists, the second genetic code is an operational code based on aminoacyl-tRNA synthetase recognition of varying sequence and structural features in the different tRNA molecules during the operation of aminoacyl-tRNA synthesis. Some examples of this code are given in Figure 32.9 and the discussion below.

Figure 32.8 · Ribbon diagram of tRNA tertiary structure. Numbers represent the consensus nucleotide sequence (see Figure 32.1). The locations of nucleotides recognized by the various aminoacyl-tRNA synthetases are indicated; shown within the boxes are one-letter designations of the amino acids whose respective aminoacyl-tRNA synthetases interact at the discriminator base (position 73), acceptor stem, variable pocket and/or loop, or anticodon. The inset shows additional recognition sites in those tRNAs having a variable loop that forms a stem-loop structure. (Adapted from Saks, M. E., Sampson, J. R., and Abelson, J. N., 1994. The transfer RNA problem: A search for rules. Science 263:191-197, Figure 2)

Figure 32.9 · Major identity elements in four tRNA species. Each base in the tRNA is represented by a circle. Numbered filled circles indicate positions of identity elements within the tRNA that are recognized by its specific aminoacyl-tRNA synthetase. (Adapted from Schulman, L. H., and Abelson, J., 1988. Recent excitement in understanding transfer RNA identity. Science 240:1591-1592)

 

tRNA Recognition Sites in E. coli Glutaminyl-tRNAGln Synthetase

Figure 32.10 · (a) A solvent-accessible representation of E. coli glutaminyl-tRNAGln synthetase complexed with tRNAGln and ATP, derived from analysis of the crystal structure of the complex. The protein is colored blue. The sugar-phosphate backbone of the tRNA is red; its bases are yellow. The protein:tRNA contact region extends along one side of the entire length of this extended protein. The acceptor stem of the tRNA and the ATP (green) fit into a cleft at the top of the protein in this view. The enzyme also interacts extensively with the anticodon (lower tip of tRNAGln). (b) Diagram showing the structure of tRNAGln, as represented by its phosphorus atoms (purple spheres), in complex with E. coli glutaminyl-tRNAGln synthetase, as represented by its Cα atoms (blue). ((a) adapted from Rould, M. A., et al., 1989. Structure of E. coli glutaminyl-tRNA synthetase complexed with tRNAGln and ATP at 2.8 Å resolution. Science 246:1135; photo courtesy of Thomas A. Steitz of Yale University)

E. coli glutaminyl-tRNAGln synthetase is a 63.4-kD (553-residue) class I monomeric enzyme. The crystal structure of glutaminyl-tRNAGln synthetase complexed with tRNAGln reveals that the enzyme shares a continuous interaction with its cognate tRNA that extends from the anticodon to the acceptor stem along the entire inside of the L-shaped tRNA (Figure 32.10). Specific recognition elements include enzyme contacts with the discriminator base, acceptor stem, and anticodon, particularly the central U in the CUG anticodon. The carboxylate group of Asp235 makes sequence-specific H bonds in the tRNA minor groove with the 2-NH₂ group of G3 in the base pair G3:C70 of the acceptor stem. A mutant glutaminyl-tRNAGln synthetase with Asn substituted for Asp at position 235 shows relaxed specificity; that is, it now incorrectly acylates a noncognate tRNA with Gln.

The Identity Elements Recognized by Some Aminoacyl-tRNA Synthetases Reside in the Anticodon

Alteration of the anticodons of either tRNATrp or tRNAVal to CAU, the anticodon for the methionine codon AUG, transforms each of these tRNAs into tRNAMet. That is, these tRNAs are now recognized and charged with methionine by methionyl-tRNA synthetase. Similarly, reversing the methionine CAU anticodon of tRNAMet to UAC transforms this tRNA into a tRNAVal. Clearly, methionyl-tRNA synthetase and valyl-tRNA synthetase rely on the anticodon for crucial identity elements.
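The anticodon swaps described above follow directly from antiparallel base pairing, which a few lines of code can make explicit. This is a minimal sketch that assumes strict Watson-Crick pairing at all three positions (no wobble); `reverse_complement` is a hypothetical helper name, and both codons and anticodons are written 5' to 3'.

```python
# Watson-Crick pairing partners in RNA.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(triplet):
    """Return the triplet that pairs with `triplet` in antiparallel orientation.

    Both the input and the result are written 5' to 3', so the same function
    converts a codon to its anticodon and an anticodon to the codon it reads.
    """
    return "".join(PAIR[base] for base in reversed(triplet))

print(reverse_complement("AUG"))  # anticodon that reads the Met codon AUG
print(reverse_complement("UAC"))  # codon read by the reversed anticodon UAC
```

The Met codon AUG pairs with the anticodon CAU, and the reversed anticodon UAC pairs with GUA, which Table 32.1 lists as a Val codon, consistent with the conversion of tRNAMet into a tRNAVal.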

Five Different Bases in Yeast tRNAPhe Serve as Its Identity Elements

The structure of yeast tRNAPhe is known in great detail (Figure 32.9; see also Figures 12.37 and 12.38). Five of its bases serve as identity elements for the yeast phenylalanyl-tRNAPhe synthetase: the three bases of the anticodon, residue G20 in the D loop, and A73 near the 3'-end. When yeast tRNAArg, tRNAMet, and tRNATyr were altered so that they each contained a complete set of the five identity elements (namely, G20, G34, A35, A36, and A73), they became excellent substrates of yeast phenylalanyl-tRNAPhe synthetase. The G20 of tRNAPhe may be an important discriminatory nucleotide because a G has not been found at this position in any other yeast tRNA.

Twelve Nucleotides in Common Define the tRNASer Family

Six codons specify serine. Six distinct isoacceptor tRNAs can be aminoacylated by the E. coli seryl-tRNASer synthetase. Five of these tRNAs are the product of E. coli genes; the sixth is encoded by the phage-T4 genome. These 6 tRNAs have anticodons that include UGA, CGA, GGA, and GCU; thus, variations occur at all 3 anticodon base positions. The nucleotide sequences of these 6 tRNAs have been compared and only 12 positions are held in common. These nucleotides include G1, G2, A3 (or U3) and U70 (or A70), C71, C72, and G73 in the acceptor stem, and C11 and G24 in the dihydrouridine (D) stem. All of these nucleotides except G73 are involved in intrachain H bonds (Figure 32.9). When a leucine-specific tRNA was modified so that it shared all 12 tRNASer identities, it was transformed into a tRNASer.

A Single G:U Base Pair Defines tRNAAlas

Figure 32.11 · A microhelix analog of tRNAAla is aminoacylated by alanyl-tRNAAla synthetase, provided it retains the G3:U70 base pair. Substituting C for U at position 70 abolishes its ability to accept Ala. The sequences of tRNAAla/GGC and its microhelix analog are shown. MicrohelixAla consists only of nucleotides 1 through 13 directly connected to 66 through 76 to re-create the tRNAAla 7-bp acceptor stem. (Adapted from Schimmel, P., 1989. Parameters for the molecular recognition of transfer RNAs. Biochemistry 28:2747-2759)

In contrast, a single, noncanonical base pair, G3:U70, is the principal element by which alanyl-tRNAAla synthetase recognizes tRNAs as its substrates. All cytoplasmic tRNAAla representatives that have been sequenced thus far, from archaebacteria to eukaryotes, possess this G3:U70. Altering the G3:C70 base pairs found in Lys-specific, Cys-specific, and Phe-specific tRNAs to G3:U70 confers alanine acceptability on these tRNAs. Altering the unusual G3:U70 base pair of tRNAAla to G:C, A:U, or even U:G abolishes its ability to be aminoacylated with Ala. On the other hand, provided the G3:U70 base pair is present, alanyl-tRNAAla synthetase aminoacylates a 24-nucleotide “microhelix” analog of tRNAAla (Figure 32.11).
            Paul Schimmel has deduced that the key structural feature of the G3:U70 determinant is the 2-NH2 group of G3. A series of analogs was prepared in which other base pairs replaced the G3:U70 base pair in an analog of tRNAAla (Figure 32.12). Only the original G3:U70 analog was aminoacylated with Ala by Ala-tRNA synthetase. Note that in RNA A-form double-helical structures such as tRNAs, the G3 2-amino group is exposed in the minor groove of the helix (Figure 32.12). If G3 base-pairs with U at position 70, this -NH2 is not H-bonded. In the G:C and 2-aminopurine:U (2-AP:U) analogs, this -NH2 is not free but is hydrogen-bonded with C or U; the I:U and A:U analogs lack a 2-NH2. The inverse U3:G70 analog (not shown) places the 2-NH2 in the major groove. Schimmel and his colleagues thus concluded that an unpaired guanine 2-amino group at the proper location within the minor groove earmarks a tRNA for aminoacylation by Ala-tRNA synthetase. Because this structural feature is common to all cytoplasmic tRNAAlas, the various tRNA recognition elements must have been established very early in evolution.
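The reasoning behind the analog experiment can be restated as a simple lookup. The sketch below only transcribes the states described in the paragraph above; the dictionary name, the state labels, and the function name are illustrative assumptions, and nothing here is measured data.

```python
# State of the purine 2-NH2 group at the 3:70 position for each analog,
# transcribed from the text (labels are illustrative, not measured values).
AMINO_GROUP_STATE = {
    "G3:U70": "free in minor groove",
    "G3:C70": "hydrogen-bonded",
    "2-AP3:U70": "hydrogen-bonded",
    "I3:U70": "absent",
    "A3:U70": "absent",
    "U3:G70": "in major groove",
}


def predicted_ala_acceptor(pair):
    """Apply the stated rule: only a free 2-NH2 in the minor groove marks the tRNA."""
    return AMINO_GROUP_STATE[pair] == "free in minor groove"


acceptors = [pair for pair in AMINO_GROUP_STATE if predicted_ala_acceptor(pair)]
print(acceptors)  # ['G3:U70']
```

Under this rule only the G3:U70 analog qualifies, which matches the reported outcome that it alone was aminoacylated with Ala.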

Figure 32.12 · Structures of various base pairs in relationship to the 3:70 position in tRNA molecules. The double-helical regions of transfer RNAs adopt the A-form double-helical conformation of nucleic acids (Chapter 12), which has a deep, narrow major groove on one side and a shallow, wide minor groove on the other.

 

32.4 · Codon-Anticodon Pairing, Third-Base Degeneracy, and the Wobble Hypothesis

Figure 32.13 · Codon-anticodon pairing. Complementary trinucleotide sequence elements align in antiparallel fashion.

Protein synthesis depends on the codon-directed binding of the proper aminoacyl-tRNAs so that the right amino acids are sequentially aligned according to the specifications of the mRNA undergoing translation. This alignment is achieved via codon-anticodon pairing in antiparallel orientation (Figure 32.13). However, considerable degeneracy exists in the genetic code at the third position. Conceivably, this degeneracy could be handled in either of two ways: (a) codon-anticodon recognition could be highly specific so that a complementary anticodon is required for each codon, or (b) fewer than 61 anticodons could be used for the “sense” codons if certain allowances were made in the base-pairing rules. In case (b), some anticodons could recognize more than one codon. Nirenberg demonstrated as early as 1965 that poly(U) bound all the Phe-tRNAPhe, even though UUC is also a Phe codon. This result suggested that the phenylalanine-specific tRNAs could recognize both UUU and UUC. The yeast tRNAAla isolated by Robert Holley in 1965 (Chapter 12) bound to three codons: GCU, GCC, and GCA.

The Wobble Hypothesis

Figure 32.14 · Various base-pairing alternatives. (a) G:A is unlikely because the 2-NH2 of G cannot form one of its H bonds; even water is sterically excluded. U:C may be possible even though the two C=O are juxtaposed. Two U:U arrangements are feasible. G:U and I:U are both possible and somewhat similar. The purine pair I:A is also possible. (b) The relative positions of the glycosidic C1' atoms in various base-pairing alternatives. The positional variation seen for the codon C1' carbon atom is a measure of wobble. The U:C, C:U, and either of the two possible U:U base pairs bring the respective glycosidic C1' atoms closer than the standard position; C1' atoms in I:U, G:U, and U:G pairs are spaced similarly to the standard; the I:A pair moves them farther apart. (Adapted from Crick, F. H. C., 1966. Codon-anticodon pairing: The wobble hypothesis. Journal of Molecular Biology 19:548-555.)

 

Francis Crick considered these results and tested alternative base-pairing possibilities by model building. He hypothesized that the first two bases of the codon and the last two bases of the anticodon form canonical Watson-Crick A:U or G:C base pairs, but pairing between the third base of the codon and the first base of the anticodon follows less stringent rules. That is, a certain amount of play, or wobble, might occur in base pairing at this position. The first base of the anticodon is sometimes referred to as the wobble position. Crick examined the steric consequences of various noncanonical base pairs. The purine inosine was included because it was known to be a component of tRNAs. In some pairs, the bases were rather close together, as revealed by the relative positions of their respective C1' atoms (Figure 32.14). In Figure 32.14b, the C1' of the first nucleotide in the anticodon is taken as fixed and the relative position of the corresponding codon third-nucleotide C1' is shown. The genetic code must often distinguish pyrimidines (U or C) from purines (A or G) in the third position (as in the codons for Phe versus Leu or His versus Gln). Therefore, pairing possibilities that bring the two C1' atoms close to one another (as in the 2 U ··· U possibilities and the U ··· C/C ··· U possibility in Figure 32.14b) must not be tolerated. Otherwise, anticodon U would not interact specifically with A or G but would instead read any base in the third position of the codon indiscriminately, contributing a U:U, U:C, U:A, or U:G pairing to the anticodon-codon interaction.

Table 32.3
Base-Pairing Possibilities at the Third Position of the Codon

Base on the Anticodon    Bases Recognized on the Codon
U                        A, G
C                        G
A                        U
G                        U, C
I                        U, C, A

Source: Adapted from Crick, F. H. C., 1966. Codon-anticodon pairing: The wobble hypothesis. Journal of Molecular Biology 19:548-555.

     This constraint leads to a set of rules for pairing between the third base of the codon and the first base of the anticodon (Table 32.3). The wobble rules indicate that a first-base anticodon U could recognize either an A or G in the codon third-base position; first-base anticodon G might recognize either U or C in the third-base position of the codon; and first-base anticodon I might interact with U, C, or A in the codon third position.²
     Note that inosine is a versatile base in establishing degeneracy. (Inosine arises in tRNAs from specific A residues that undergo deamination.) Yeast tRNAAla (Figure 12.36) has I in the wobble position. The wobble rules also predict that four-codon families (like Pro or Thr), where any of the four bases may be in the third position, require at least two different tRNAs. Such four-codon families could be read by two tRNAs whose recognition patterns are either U/C and A/G, or U/C/A and G.
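The wobble rules of Table 32.3 can be applied mechanically to list the codons a given anticodon reads. The following is a minimal sketch; the function and dictionary names are illustrative, anticodons and codons are both written 5' to 3', and the IGC anticodon used for yeast tRNAAla is an assumption that is consistent with the three Ala codons mentioned earlier.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Table 32.3: first anticodon base -> codon third bases it can read.
WOBBLE = {"U": "AG", "C": "G", "A": "U", "G": "UC", "I": "UCA"}


def codons_read(anticodon):
    """Return the codons (5'->3') read by an anticodon written 5'->3'."""
    wobble, middle, last = anticodon
    # Pairing is antiparallel: the anticodon's 3' base pairs with the codon's
    # 5' base, and the anticodon's 5' (wobble) base pairs with the codon's 3' base.
    prefix = COMPLEMENT[last] + COMPLEMENT[middle]
    return [prefix + third for third in WOBBLE[wobble]]


print(codons_read("GUA"))  # ['UAU', 'UAC']
print(codons_read("IGC"))  # ['GCU', 'GCC', 'GCA']
```

An anticodon beginning with G reads two codons and one beginning with I reads three, in line with the rules above.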
     The only codons for a given amino acid that differ in either of the first two bases are the 6-codon families for Leu, Ser, and Arg; these amino acids require at least three different tRNAs. Altogether, a minimum of 31 tRNAs is necessary to interpret the 61 sense codons. However, most cells have more than 32 different tRNA species. We saw that E. coli has 5 distinct tRNAs for the 6 Ser codons. Some tRNAs have the same anticodon but differ in their nucleotide sequences. For example, there are two distinct tRNATyr species in E. coli, both having a GUA anticodon capable of reading the UAU and UAC codons for Tyr. All members of the set of tRNAs specific for a particular amino acid (termed isoacceptor tRNAs) are served by one aminoacyl-tRNA synthetase.
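The minimum of 31 tRNAs can be checked by brute force. The sketch below assumes the standard genetic code and the wobble rules of Table 32.3, and it assumes that an anticodon is acceptable only if every codon it reads specifies the same amino acid. All names are illustrative.

```python
from itertools import combinations

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
# One-letter amino acid symbols; '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Table 32.3: first anticodon base -> codon third bases it can read.
WOBBLE = {"U": "AG", "C": "G", "A": "U", "G": "UC", "I": "UCA"}


def min_trnas(third_bases):
    """Fewest anticodons that read exactly the given set of codon third bases."""
    targets = set(third_bases)
    # Discard wobble bases that would also read a codon outside the target set.
    usable = [w for w, reads in WOBBLE.items() if set(reads) <= targets]
    for size in range(1, len(usable) + 1):
        for combo in combinations(usable, size):
            if set("".join(WOBBLE[w] for w in combo)) == targets:
                return size
    raise ValueError("no combination of anticodons covers " + "".join(third_bases))


def minimum_trnas_per_amino_acid():
    """Sum the minimum over every group of codons sharing the first two bases."""
    counts = {}
    for i in range(4):
        for j in range(4):
            start = 16 * i + 4 * j
            box = AMINO_ACIDS[start:start + 4]
            for aa in set(box) - {"*"}:
                thirds = [BASES[k] for k in range(4) if box[k] == aa]
                counts[aa] = counts.get(aa, 0) + min_trnas(thirds)
    return counts


counts = minimum_trnas_per_amino_acid()
print(sum(counts.values()))                   # 31
print(counts["L"], counts["S"], counts["R"])  # 3 3 3
```

Each four-codon family contributes two tRNAs and each two-codon family one; Ile, Met, and Trp need one each, and the six-codon families for Leu, Ser, and Arg need three each.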

² Thus, the first base of the anticodon indicates whether the tRNA can read one, two, or three different codons: anticodons beginning with A or C read only one codon, those beginning with G or U read two, while anticodons beginning with I can read three codons.

The Purpose of Wobble

The first two bases of the codon confer most of the codon-anticodon specificity. The wobble position also contributes to codon recognition and specificity, but hydrogen bonds between noncanonical base pairs are weaker, and thus the pairing here is “looser.” Wobbling is possible because the 5'-side of the anticodon is situated in a conformationally flexible part of the tRNA anticodon loop. There is a kinetic advantage to wobble: If all three base pairs of the codon-anticodon complex were of the strong Watson-Crick type, codon-anticodon associations would be more stable and the tRNAs would dissociate less readily from the mRNA, slowing the rate of protein synthesis. However, because the wobble position makes only a marginal contribution to codon-anticodon interaction, wobble tends to accelerate the process of translation.

32.5 · Codon Usage

Because more than one codon exists for most amino acids, the possibility for variation in codon usage arises. Indeed, variation in codon usage accommodates the fact that the DNA of different organisms varies in relative A:T/G:C content. However, even in organisms of average base composition, codon usage may be biased. Table 32.4 gives some examples from E. coli and humans reflecting the nonrandom usage of codons. Of over 109,000 Leu codons tabulated in human genes, CUG was used over 48,000 times, CUC over 23,000 times, but UUA just 6,000 times.
            The occurrence of codons in E. coli mRNAs correlates well with the relative abundance of the tRNAs that read them. Preferred codons are represented by the most abundant isoacceptor tRNAs. Further, mRNAs for proteins that are synthesized in abundance tend to employ preferred codons. Rare tRNAs correspond to rarely used codons, and messages containing such codons might experience delays in translation.
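Codon bias is easiest to see when each codon is expressed as a fraction of all codons for the same amino acid. The sketch below uses the human Leu frequencies (per 1000 codons) from Table 32.4; the function and variable names are illustrative.

```python
# Human Leu codon frequencies per 1000 codons, copied from Table 32.4.
HUMAN_LEU = {
    "CUA": 6.1, "CUC": 20.1, "CUG": 42.1,
    "CUU": 10.8, "UUA": 5.4, "UUG": 11.1,
}


def synonymous_fractions(freq_per_1000):
    """Convert per-1000 frequencies into each codon's share of its synonymous set."""
    total = sum(freq_per_1000.values())
    return {codon: freq / total for codon, freq in freq_per_1000.items()}


fractions = synonymous_fractions(HUMAN_LEU)
for codon, share in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{codon}: {share:.1%}")
```

CUG comes out at roughly 44% of human Leu codons and UUA at under 6%, in line with the counts quoted above (over 48,000 and about 6,000 out of more than 109,000).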

32.6 · Nonsense Suppression

Mutations that alter a sense codon to one of the three nonsense codons (UAA, UAG, or UGA) result in premature termination of protein synthesis and the release of truncated (incomplete) polypeptides. Geneticists found that second mutations elsewhere in the genome were able to suppress the effects of nonsense mutations so that the organism survived. The molecular basis for such intergenic suppression was a mystery until it was realized that suppressors were mutations in tRNA genes that altered the anticodon so that the mutant tRNA could now read a particular “stop” codon and insert an amino acid. For example, alteration of the anticodon of a tRNATyr from GUA to CUA allows this tRNA to read the so-called amber stop codon, UAG, and insert Tyr. (The nonsense codons are whimsically named amber [UAG], ochre [UAA], and opal [UGA].) Suppressor tRNAs are typically generated from minor tRNA species within a set of isoacceptor tRNAs, so their recruitment to a new role via mutation does not involve loss of an essential tRNA; that is, the mutation is not particularly deleterious to the organism. Several different suppressor tRNAs for each of the stop codons have been characterized in E. coli.
            An interesting amber suppressor mutation results when the anticodon of a tRNATrp is altered from CCA to CUA. Surprisingly, this suppressor tRNA inserts glutamine, not tryptophan. Thus, suppressor tRNAs don’t necessarily carry the same amino acid as the wild-type tRNA. (Cells carrying this suppressor must also have a wild-type copy of tRNATrp to survive.) This suppressor tRNA is no longer a good substrate for tryptophanyl-tRNATrp synthetase, which evidently selects its tRNAs via anticodon recognition. Instead, glutaminyl-tRNA synthetase charges it with Gln. Apparently, glutaminyl-tRNA synthetase has a relaxed specificity for this tRNA substrate. A single base change has influenced both codon-anticodon recognition and the interaction of the tRNA with aminoacyl-tRNA synthetases.
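The effect of a suppressor tRNA on a nonsense mutation can be illustrated with a toy translation routine. This is a sketch under simplifying assumptions: the message is hypothetical, reading starts at the first base, and suppression is treated as fully efficient, whereas in cells a suppressor tRNA competes with normal termination. All names are illustrative.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, suppressors=None):
    """Translate until the first stop codon that no suppressor tRNA reads.

    `suppressors` maps a stop codon to the amino acid inserted by a
    suppressor tRNA, for example {"UAG": "Y"} for an amber suppressor
    derived from tRNATyr.
    """
    suppressors = suppressors or {}
    peptide = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        amino_acid = suppressors.get(codon, CODON_TABLE[codon])
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical message with a premature amber codon (UAG) before the normal stop (UAA).
mutant = "AUGUUUUAGGGCUAA"
print(translate(mutant))                # MF
print(translate(mutant, {"UAG": "Y"}))  # MFYG
print(translate(mutant, {"UAG": "Q"}))  # MFQG
```

The third call corresponds to the tRNATrp-derived suppressor described above, which reads UAG but inserts glutamine.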

Table 32.4
Representative Examples of Codon Usage in E. coli and Human Genes
The results are expressed as frequency of occurrence of a codon per 1000 codons tabulated in 1562 E. coli genes and 2681 human genes, respectively. (Because E. coli and human proteins differ somewhat in amino acid composition, the frequencies for a particular amino acid do not correspond exactly between the two species.)

Amino Acid  Codon  E. coli Gene Frequency/1000  Human Gene Frequency/1000
Leu         CUA     3.2                          6.1
            CUC     9.9                         20.1
            CUG    54.6                         42.1
            CUU    10.2                         10.8
            UUA    10.9                          5.4
            UUG    11.5                         11.1
Pro         CCA     8.2                         15.4
            CCC     4.3                         20.6
            CCG    23.8                          6.8
            CCU     6.6                         16.1
Ala         GCA    15.6                         14.4
            GCC    34.4                         29.7
            GCG    32.9                          7.2
            GCU    13.4                         18.9
Lys         AAA    36.5                         21.9
            AAG    12.0                         35.2
Glu         GAA    43.5                         26.4
            GAG    19.2                         41.6

Adapted from Wada, K., et al., 1992. Codon usage tabulated from GenBank genetic sequence data. Nucleic Acids Research 20:2111-2118.