“Monk Transcribing Manuscript,” ca. 1470. Jean Mielot (author of Miracles de Notre
Dame) at his desk using a quill and scraping knife (Bibliotheque National de Paris/Mary
Evans Picture Library/London)

Chapter 31

Transcription and the Regulation of Gene Expression

In 1958, Francis Crick enunciated the “central dogma of molecular biology” (Figure 31.1). This scheme outlined the residue-by-residue transfer of biological information as encoded in the primary structure of the informational biopolymers, nucleic acids and proteins. The predominant path of information transfer, DNA → RNA → protein, postulated that RNA was an information carrier between DNA and proteins, the agents of biological function. In 1961, François Jacob and Jacques Monod extended this hypothesis to predict that the RNA intermediate, which they dubbed messenger RNA, or mRNA, would have the following properties:

Figure 31.1 · Crick’s 1958 view of the “central dogma of molecular biology”: directional flow of detailed sequence information includes DNA→DNA (replication), DNA→RNA (transcription), RNA→protein (translation), RNA→DNA (reverse transcription). Note that no pathway exists for the flow of information from proteins to nucleic acids, that is, protein→RNA or DNA. A possible path from DNA to protein has since been discounted. Interestingly, in 1958, mRNA had not yet been discovered.

1.  Its base composition would reflect the base composition of DNA (a property consistent with genes as protein-encoding units).

2.  It would be very heterogeneous with respect to molecular mass, yet the average molecular mass would be several hundred kD. (A 200-kD RNA contains roughly 750 nucleotides, which could encode a protein of about 250 amino acids, approximately 30 kD, a reasonable estimate for the average size of polypeptides.)

3.  It would be able to associate with ribosomes because ribosomes are the site of protein synthesis.

4.  It would have a high rate of turnover. (That is, mRNA would be rapidly degraded. Turnover of mRNA would allow the rate of mRNA synthesis to control the rate of protein synthesis.)
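The size estimate in property 2 can be checked with a few lines of arithmetic. The sketch below takes the 750-nucleotide figure and the triplet code from the text; the average residue mass of 120 daltons is an assumed round number chosen for illustration, not a value given above.

```python
# Back-of-envelope check of the sizes quoted in property 2.
NUCLEOTIDES_PER_CODON = 3   # triplet code
AVG_RESIDUE_MASS_DA = 120   # ASSUMPTION: rough average amino acid residue mass (daltons)

def protein_from_mrna(n_nucleotides):
    """Return (amino acids, protein mass in kD) for a coding RNA of this length."""
    n_amino_acids = n_nucleotides // NUCLEOTIDES_PER_CODON
    mass_kd = n_amino_acids * AVG_RESIDUE_MASS_DA / 1000
    return n_amino_acids, mass_kd

print(protein_from_mrna(750))  # (250, 30.0)
```

The calculation ignores untranslated regions, so it gives an upper bound on the protein encoded by a transcript of a given length.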

        Since Jacob and Monod’s 1961 hypothesis, it has been realized that cells contain three major classes of RNA—mRNA, ribosomal RNA (rRNA), and transfer RNA (tRNA)—all of which participate in protein synthesis (Chapter 11). All of these RNAs are synthesized from DNA templates by DNA-dependent RNA polymerases in the process known as transcription. However, only mRNAs direct the synthesis of proteins. Thus, not all genes encode proteins; some encode rRNAs or tRNAs. Protein synthesis occurs via the process of translation, wherein the instructions encoded in the sequence of bases in mRNA are translated into a specific amino acid sequence by ribosomes, the “workbenches” of polypeptide synthesis (Chapter 33).
        Transcription is tightly regulated in all cells. In prokaryotes, only about 3% of the genes are undergoing transcription at any given time. The metabolic conditions and the growth status of the cell dictate which gene products are needed at any moment. In a differentiated eukaryotic cell, the figure is around 0.01%. Such differentiated cells express only the information needed for their biological functions, not the full genetic potential encoded in their chromosomes.

31.1 · Transcription in Prokaryotes

In prokaryotes, virtually all RNA is synthesized by a single species of DNA-dependent RNA polymerase. (The only exception is the short RNA primers formed by primase during DNA replication.) Like DNA polymerases, RNA polymerase links ribonucleoside 5'-triphosphates (ATP, GTP, CTP, and UTP, represented generically as NTPs) in an order specified by base pairing with a DNA template:

                                                      n NTP → (NMP)n + n PPi

The enzyme moves along a DNA strand in the 3' → 5' direction, joining the 5'-phosphate of an incoming ribonucleotide to the 3'-OH of the previous residue. Thus, the RNA chain grows 5' → 3' during transcription, just as DNA chains do during replication. The reaction is driven by subsequent hydrolysis of PPi to inorganic phosphate by ubiquitous pyrophosphatase activity.

The Structure and Function of Escherichia coli RNA Polymerase

The RNA polymerase of E. coli, the so-called RNA polymerase holoenzyme, is a complex multimeric protein (450 kD) large enough to be visible in the electron microscope. Its subunit composition is α₂ββ'σ. The largest subunit, β' (155 kD), functions in DNA binding; β (151 kD) binds the nucleoside triphosphate substrates and interacts with σ (70 kD). Any of a number of related proteins, the sigma (σ) factors, can serve as the σ subunit. Sigma subunits function in recognizing specific sequences on DNA called promoters that identify the location of transcription start sites, where transcription begins. Both β and β' contribute to formation of the catalytic site. The two α subunits (36.5 kD each) are essential for assembly of the enzyme and activation by some regulatory proteins. Dissociation of the σ subunit from the holoenzyme leaves the so-called core polymerase (α₂ββ'), which is catalytically competent but unable to recognize promoters.

The Steps of Transcription in Prokaryotes

Transcription can be divided into four stages: (a) binding of RNA polymerase holoenzyme at promoter sites, (b) initiation of polymerization, (c) chain elongation, and (d) chain termination. A discussion of these stages follows.

Figure 31.2 · Sequence of events in the initiation and elongation phases of transcription as it occurs in prokaryotes. Nucleotides in this region are numbered with reference to the base at the transcription start site, which is designated +1.

Binding of RNA Polymerase to Template DNA

The process of transcription begins when the σ subunit of RNA polymerase recognizes a promoter sequence (Figure 31.2), and RNA polymerase holoenzyme and the promoter form a so-called closed promoter complex (Figure 31.2, Step 2). Dissociation constants for RNA polymerase holoenzyme:closed promoter complexes range from 10⁻⁶ to 10⁻⁹ M. This stage in RNA polymerase:DNA interaction is referred to as the closed promoter complex because the DNA duplex has not yet been unwound; the strands must be separated before RNA polymerase can read and transcribe the DNA template strand into a complementary RNA sequence.
        Once the closed promoter complex is established, the RNA polymerase holoenzyme unwinds about 12 base pairs of DNA (base pairs located at positions -10 to +2, relative to the transcription start site; see later), forming the very stable open promoter complex (Figure 31.2, Step 3). In this complex, RNA polymerase holoenzyme is bound very tightly to the DNA (K_D ≈ 10⁻¹⁴ M).
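To get a feel for what these binding constants mean energetically, a dissociation constant can be converted to a standard free energy of association with ΔG° = RT ln K_D. The sketch below takes the closed-complex range as 10⁻⁶ to 10⁻⁹ M and the open-complex value as about 10⁻¹⁴ M; the temperature of 310 K is an assumption made for illustration.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def binding_free_energy_kj(kd_molar, temp_k=310.0):
    """Standard free energy of association (kJ/mol) from a dissociation constant.

    dG = RT ln(K_D); a more negative value means tighter binding.
    temp_k defaults to 310 K, an ASSUMED temperature not stated in the text.
    """
    return R * temp_k * math.log(kd_molar) / 1000

for label, kd in [("closed, weak", 1e-6), ("closed, tight", 1e-9), ("open", 1e-14)]:
    print(f"{label:14s} K_D = {kd:.0e} M   dG = {binding_free_energy_kj(kd):6.1f} kJ/mol")
```

Each tenfold decrease in K_D adds the same increment of binding free energy, so the open complex is stabilized substantially more than even the tightest closed complex.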
        Promoter sequences can be identified in vitro by DNA footprinting: RNA polymerase holoenzyme is bound to a putative promoter sequence in a DNA duplex and the DNA:protein complex is treated with DNase I. DNase I cleaves the DNA at sites not protected by bound protein, and the set of DNA fragments left after DNase I digestion reveals the promoter (by definition, the promoter is the RNA polymerase holoenzyme binding site¹).

¹ Promoters can also be defined genetically in terms of mutations (nucleotide changes) in this region that block gene expression because they inactivate the promoter.

 

A Deeper Look

Conventions Employed in Expressing the Sequences of Nucleic Acids and Proteins

Certain conventions are useful in tracing the course of information transfer from DNA to protein. The strand of duplex DNA that is read by RNA polymerase is termed the template strand. Thus, the strand that is not read is the nontemplate strand. Because the template strand is read by the RNA polymerase moving 3' → 5' along it, the RNA product, the so-called transcript, grows in the 5' → 3' direction (see figure). Note that the nontemplate strand has a nucleotide sequence and direction identical to the RNA transcript, except that the transcript has U residues in place of T. The RNA transcript will eventually be translated into the amino acid sequence of a protein (Chapters 32, 33) by a process in which successive triplets of bases (termed codons), read 5' → 3', specify a particular amino acid. Polypeptide chains are synthesized in the N→C direction, and the 5'-end of mRNA encodes the N-terminus of the protein.
            By convention, when the order of nucleotides in DNA is specified, it is the 5' → 3' sequence of nucleotides in the nontemplate strand that is presented. Consequently, if convention is followed, DNA sequences are rendered in terms that correspond directly to mRNA sequences, which correspond in turn to the amino acid sequences of proteins as read beginning with the N-terminus.
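These strand conventions can be made concrete with a short sketch. The template sequence below is hypothetical and is written 3' to 5', the direction in which the polymerase reads it, so that the transcript comes out 5' to 3' without any reversal step.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Transcript (5'->3') made by base pairing against a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

def nontemplate_strand(template_3_to_5):
    """Nontemplate strand (5'->3'), the DNA complement of the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5)

template = "TACGGATTC"                 # hypothetical template, written 3'->5'
rna = transcribe(template)             # AUGCCUAAG
coding = nontemplate_strand(template)  # ATGCCTAAG

# The transcript matches the nontemplate strand except for U in place of T.
assert rna == coding.replace("T", "U")
print(coding, rna)
```

This is why published DNA sequences, which give the nontemplate strand, can be read directly as mRNA sequences.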

From Rhodes, D., and Fairall, L., 1997. Analysis of sequence-specific DNA-binding proteins. In Protein Function: A Practical Approach, T. E. Creighton, ed., Oxford: IRL Press at Oxford University Press.

        RNA polymerase binding typically protects a nucleotide sequence spanning the region from -40 to +20, where the +1 position is defined as the transcription start site: that base in DNA that specifies the first base in the RNA transcript. The next base, +2, specifies the second base in the transcript. Bases in the 5' or “minus” direction from the transcript start site are numbered -1, -2, and so on. (Note that there is no zero.) Nucleotides in the “minus” direction are said to lie upstream of the transcription start site, whereas nucleotides in the 3' or “plus” direction are downstream of the transcription start site. The transcript start site on the template strand is almost always a pyrimidine, so almost all transcripts begin with a purine.
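Because the numbering scheme skips zero, lengths cannot be found by simple subtraction when a span crosses the start site. The helper functions below are hypothetical names introduced only to illustrate the bookkeeping.

```python
def position_to_offset(position):
    """Convert a promoter coordinate (+1 = start site, no zero) to an offset from +1."""
    if position == 0:
        raise ValueError("there is no position 0 in this numbering scheme")
    return position - 1 if position > 0 else position

def span_length(start, end):
    """Number of base pairs from `start` to `end`, inclusive."""
    return position_to_offset(end) - position_to_offset(start) + 1

print(span_length(-40, +20))  # 60
```

The footprint from -40 to +20 therefore covers 60 base pairs, not 61 as it would if a position 0 existed.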

PROPERTIES OF PROKARYOTIC PROMOTERS.  Prokaryotic promoters vary in size from 20 to 200 bp, but typically consist of a 40-bp region located on the 5'-side of the transcription start site. Within the promoter are two consensus sequence elements. (A consensus sequence can be defined as the bases that appear with highest frequency at each position when a series of sequences believed to have common function are compared.) These two elements are the Pribnow box² near -10, whose consensus sequence is the hexameric TATAAT, and a sequence in the -35 region containing the hexameric consensus TTGACA (Figure 31.3). The Pribnow box and the -35 region are separated by about 17 bp of nonconserved sequence. RNA polymerase holoenzyme uses its σ subunit to bind to these sequences, and the more closely the -35 region sequence corresponds to its consensus sequence, the greater is the efficiency of transcription of the gene. The highly expressed rrn genes in E. coli, which encode ribosomal RNA (rRNA), have a third sequence element in their promoters, the upstream element (UP element), a region of about 20 bp located immediately upstream of the -35 region. (Transcription from the rrn genes accounts for more than 60% of total RNA synthesis in rapidly growing E. coli cells.) Whereas the σ subunit recognizes the -10 and -35 elements, the C-terminal domains (CTD) of the α subunits of RNA polymerase recognize and bind the UP element.
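The definition of a consensus sequence translates directly into a column-by-column tally. In the sketch below the five hexamers are invented for illustration and are not taken from real promoters; the function assumes the sequences are already aligned and of equal length.

```python
from collections import Counter

def consensus(aligned_sequences):
    """Most frequent base at each column of equal-length, pre-aligned sequences."""
    columns = zip(*aligned_sequences)
    return "".join(Counter(column).most_common(1)[0][0] for column in columns)

# HYPOTHETICAL aligned -10 hexamers, invented for this example.
hexamers = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
print(consensus(hexamers))  # TATAAT
```

Note that no single input sequence needs to match the consensus exactly, and that ties at a position (absent here) would need an explicit rule.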

Figure 31.3 · The nucleotide sequences of representative E. coli promoters. (In accordance with convention, these sequences are those of the nontemplate strand where RNA polymerase binds.) Consensus sequences for the -35 region, the Pribnow box, and the initiation site are shown at the bottom. The numbers represent the percent occurrence of the indicated base. (Note: the -35 region is only roughly 35 nucleotides from the transcription start site; the Pribnow box [the -10 region] likewise is located at approximately position -10.) In this figure, sequences are aligned relative to the Pribnow box.

        In order for transcription to begin, the DNA duplex must be “opened” so that RNA polymerase has access to single-stranded template. The efficiency of initiation is inversely proportional to the melting temperature, Tm, in the Pribnow box, suggesting that the A:T-rich nature of this region is aptly suited for facile “melting” of the DNA duplex and creation of the open promoter complex (Figure 31.2). Negative supercoiling facilitates transcription initiation by favoring DNA unwinding.
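The link between base composition and ease of melting can be illustrated with the Wallace rule, a rough empirical estimate for short oligonucleotide duplexes (2 °C per A:T pair, 4 °C per G:C pair). This rule is brought in here only for comparison; it does not give the true melting temperature of a hexamer embedded in chromosomal DNA.

```python
def wallace_tm(seq):
    """Rough Tm (deg C) of a short duplex: 2 per A:T pair plus 4 per G:C pair."""
    seq = seq.upper()
    at_pairs = seq.count("A") + seq.count("T")
    gc_pairs = seq.count("G") + seq.count("C")
    return 2 * at_pairs + 4 * gc_pairs

print(wallace_tm("TATAAT"))  # 12  (Pribnow box consensus, all A:T)
print(wallace_tm("GCGCGC"))  # 24  (hypothetical all-G:C hexamer for contrast)
```

The relative values, not the absolute numbers, carry the point: an A:T-rich element is the easiest kind of sequence to open.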
        The RNA polymerase σ subunit is directly involved in melting the dsDNA. Interaction of the σ subunit with the nontemplate strand maintains the open complex formed between RNA polymerase and promoter DNA, with the σ subunit acting as a sequence-specific single-stranded DNA-binding protein. Such association of the σ subunit with the nontemplate strand stabilizes the open promoter complex and leaves the bases along the template strand available to the catalytic site of the RNA polymerase.

Initiation of Polymerization

RNA polymerase has two binding sites for NTPs: the initiation site and the elongation site. The initiation site binds the purine nucleotides ATP and GTP preferentially; most RNAs begin with a purine at the 5'-end. The first nucleotide binds at the initiation site, H-bonding with the +1 base exposed within the open promoter complex (Figure 31.2, Step 4). The second incoming nucleotide binds at the elongation site, H-bonding with the +2 base. The ribonucleotides are then united when the 3'-O of the first nucleotide makes a nucleophilic attack on the α-phosphorus atom of the second nucleotide. A phosphoester bond is formed, and PPi is eliminated. Note that the 5'-end of the transcript starts out with a triphosphate attached to it. Movement of RNA polymerase along the template strand (translocation) to the next base prepares the RNA polymerase to add the next nucleotide (Figure 31.2, Step 5). Once an oligonucleotide 6 to 10 residues long has been formed, the σ subunit dissociates from RNA polymerase, signaling the completion of initiation (Figure 31.2, Step 6). The core RNA polymerase goes on to synthesize the remainder of the mRNA. As the core RNA polymerase progresses, advancing the 3'-end of the RNA chain, the DNA duplex is unwound just ahead of it. About 12 base pairs of the growing RNA remain base-paired to the DNA template at any time, with the RNA strand becoming displaced as the DNA duplex rewinds behind the advancing RNA polymerase.

² Named for David Pribnow, who, along with David Hogness, first recognized the importance of this sequence element in transcription.

Figure 31.4 · The structures of rifamycin B and rifampicin, specific inhibitors of prokaryotic RNA polymerases. Because these compounds do not inhibit eukaryotic RNA polymerases, they have proven useful in the treatment of tuberculosis and infections caused by Gram-positive bacteria.

        Rifamycin B and its analog, rifampicin, are inhibitors of initiation. Despite their structural similarity (Figure 31.4), they act in different ways. Rifamycin binds to the β subunit of RNA polymerase and blocks binding of incoming NTP at the initiation site. Rifampicin allows the first phosphodiester bond to be formed, but it prevents the translocation of RNA polymerase along the DNA template. However, once the second phosphodiester bond is formed, creating an RNA trinucleotide, rifampicin is without effect.

Chain Elongation

Figure 31.5 · Cordycepin is the name given 3'-deoxyadenosine.

Elongation of the RNA transcript is catalyzed by the core polymerase, because once a short oligonucleotide chain has been synthesized, the σ subunit dissociates. Cordycepin (Figure 31.5) is an inhibitor of chain elongation in prokaryotes. This nucleoside can be phosphorylated in vivo to give 3'-deoxyadenosine 5'-triphosphate, which can bind to the core polymerase and add to the growing RNA. However, because cordycepin lacks a 3'-OH, it aborts further elongation. The accuracy of transcription is such that, about once every 10⁴ nucleotides, an error is made and the wrong base is inserted. Because many transcripts are made per gene, this error rate is acceptable. Also, the nature of the genetic code is such that errors are often innocuous (Chapter 32).
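An error frequency of about one per 10⁴ nucleotides can be turned into the fraction of transcripts expected to be free of errors. The sketch below assumes that misincorporations are independent and equally likely at every position, a simplification introduced here and not a claim made by the text.

```python
ERROR_RATE = 1e-4  # about one wrong base per 10^4 nucleotides

def fraction_error_free(length_nt, error_rate=ERROR_RATE):
    """Probability that a transcript of this length contains no errors,
    ASSUMING independent errors at a uniform rate."""
    return (1 - error_rate) ** length_nt

def expected_errors(length_nt, error_rate=ERROR_RATE):
    """Average number of misincorporated bases per transcript."""
    return length_nt * error_rate

for length in (300, 1000, 10_000):  # hypothetical transcript lengths
    print(length, round(fraction_error_free(length), 3), expected_errors(length))
```

Under these assumptions, roughly nine out of ten transcripts of 1,000 nucleotides are copied perfectly, which is consistent with the statement that the error rate is tolerable when many transcripts are made per gene.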
Figure 31.6 · Supercoiling versus transcription. (a) If the RNA polymerase followed the template strand around the axis of the DNA duplex, no supercoiling of the DNA would occur, but the RNA chain would be wrapped around the double helix once every 10 bp. This possibility seems unlikely because it would be difficult to disentangle the transcript from the DNA duplex. (b) Alternatively, topoisomerases could remove the supercoils. A topoisomerase capable of relaxing positive supercoils situated ahead of the advancing transcription bubble would “relax” the DNA. A second topoisomerase behind the bubble would remove the negative supercoils. (Adapted from Futcher, B., 1988. Supercoiling and transcription, or vice versa? Trends in Genetics 4:271-272)

       Chain elongation does not proceed at a constant rate, but varies between 20 and 50 nucleotides per second. The RNA polymerase slows down and even pauses in G:C-rich regions due to the greater difficulty in unwinding G:C base pairs. As the RNA polymerase moves along the template, the DNA double helix is unwound ahead of it and recloses after the polymerase has passed by. Only a short stretch of RNA:DNA hybrid duplex exists at any time. Two possibilities can be envisioned for the course of the new RNA chain. In one, the RNA chain is wrapped around the DNA as the RNA polymerase follows the template strand around the axis of the DNA duplex, but this possibility seems unlikely due to its potential for tangling the nucleic acid strands (Figure 31.6a). The more likely possibility involves supercoiling of the DNA, so that positive supercoils are created ahead of the transcription bubble and negative supercoils are created behind it (Figure 31.6b). To prevent torsional stress from inhibiting transcription, topoisomerases act to remove these supercoils from the DNA segment undergoing transcription (Figure 31.6b).
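The scale of the topological problem becomes clearer with numbers. The sketch below uses the elongation rates quoted above and one helical turn per 10 bp, the approximation used in Figure 31.6; the 1,500-nucleotide gene length is hypothetical.

```python
BP_PER_HELICAL_TURN = 10  # approximation used in the Figure 31.6 caption

def transcription_time_s(length_nt, rate_nt_per_s):
    """Seconds needed to transcribe `length_nt` at a constant rate (pauses ignored)."""
    return length_nt / rate_nt_per_s

def helical_turns(length_bp):
    """Turns of the double helix that must be unwound to traverse `length_bp`."""
    return length_bp / BP_PER_HELICAL_TURN

gene_length = 1500  # HYPOTHETICAL gene length in nucleotides
print(transcription_time_s(gene_length, 50))  # 30.0 s at the fast end
print(transcription_time_s(gene_length, 20))  # 75.0 s at the slow end
print(helical_turns(gene_length))             # 150.0 turns
```

Transcribing even this modest gene passes through about 150 helical turns in roughly a minute, so either the transcript would wrap around the DNA that many times or an equivalent number of supercoils must be relaxed by topoisomerases.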

Chain Termination

Two types of transcription termination mechanisms operate in bacteria: one that is dependent on a specific protein termination factor called ρ (pronounced “rho”) and another that is not dependent on this protein. In the latter, termination of transcription is determined by specific sequences in the DNA called termination sites. These sites are not characterized by a unique base where transcription halts. Instead, these sites consist of three structural features whose base-pairing possibilities lead to termination:

Figure 31.7 · The termination site for the E. coli trp operon (the trp operon encodes the enzymes of tryptophan biosynthesis). The inverted repeats give rise to a stem-loop or “hairpin” structure ending in a series of U residues.

 

1.  Inverted repeats, which are typically G:C-rich, so a stable stem-loop structure can form in the transcript via intrachain hydrogen bonding (Figure 31.7).

2.  A nonrepeating segment that punctuates the inverted repeats.

3.  A run of 6 to 8 As in the DNA template, coding for Us in the transcript.

Figure 31.8 · The ρ factor mechanism of transcription termination. ρ factor (a) attaches to a recognition site on mRNA and (b) moves along it behind RNA polymerase. (c) When RNA polymerase pauses at the termination site, ρ factor unwinds the DNA:RNA hybrid in the transcription bubble, (d) releasing the nascent mRNA.

Termination then occurs as follows: A G:C-rich, stem-loop structure, or “hairpin,” forms in the transcript. The hairpin apparently causes the RNA polymerase to pause, whereupon the A:U base pairs between the transcript and the DNA template strand are displaced through formation of somewhat more stable A:T base pairs between the template and nontemplate strands of the DNA. The result is spontaneous dissociation of the nascent transcript from DNA.
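The three features of a factor-independent termination site suggest a simple pattern search: an inverted repeat, a short intervening loop, and a run of U residues immediately after the second arm. The sketch below is a deliberately naive illustration. It requires perfect Watson-Crick pairing over a fixed stem length, and the stem length, loop range, and test transcript are all hypothetical choices; only the minimum U run of 6 comes from the text.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Sequence that would pair with `rna` in an antiparallel stem."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))

def find_terminators(rna, stem=6, loop_range=(3, 8), min_u_run=6):
    """Return (start, left_arm, loop_length) for each perfect stem-loop
    that is followed directly by at least `min_u_run` U residues."""
    hits = []
    for i in range(len(rna) - 2 * stem):
        left = rna[i:i + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                 # index where the right arm starts
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_u_run]
            if right == reverse_complement(left) and tail == "U" * min_u_run:
                hits.append((i, left, loop))
    return hits

# HYPOTHETICAL transcript: G:C-rich inverted repeat, 4-nt loop, then eight Us.
transcript = "AAC" + "GCCCGC" + "GAAA" + "GCGGGC" + "UUUUUUUU" + "AA"
print(find_terminators(transcript))  # [(3, 'GCCCGC', 4)]
```

Real terminators tolerate mismatches and variable stem lengths, so practical tools score hairpin stability instead of demanding an exact match.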
        The alternative mechanism of termination, factor-dependent termination, is less common and mechanistically more complex. ρ factor is an ATP-dependent helicase (hexamer of 50-kD subunits) that catalyzes the unwinding of RNA:DNA hybrid duplexes (or RNA:RNA duplexes). The ρ factor recognizes and binds to C-rich regions in the RNA transcript. These regions must lack secondary structure and be unoccupied by translating ribosomes for ρ factor to bind. Once bound, ρ factor advances in the 5' → 3' direction until it reaches the transcription bubble (Figure 31.8). There it catalyzes the unwinding of the transcript and template, releasing the nascent RNA chain. It is likely that the RNA polymerase stalls in a G:C-rich termination region, allowing ρ factor to overtake it.

31.2 · Transcription in Eukaryotes

Eukaryotic cells have three classes of RNA polymerase, each of which synthesizes a different class of RNA. All three enzymes are found in the nucleus. RNA polymerase I is localized to the nucleolus and transcribes the major ribosomal RNA genes. RNA polymerase II transcribes protein-encoding genes, and thus it is responsible for the synthesis of mRNA. RNA polymerase III transcribes tRNA genes, the ribosomal RNA genes encoding 5S rRNA, and a variety of other small RNAs, including several involved in mRNA processing and protein transport.
        All three RNA polymerase types are large, complex multimeric proteins (500 to 700 kD), consisting of 10 or more types of subunits. Although the three differ in overall subunit composition, they have several smaller subunits in common. Further, all possess two large subunits (each 140 kD or greater) having sequence similarity to the large β and β' subunits of E. coli RNA polymerase, indicating that the fundamental catalytic site of RNA polymerase is conserved among its various forms.
Figure 31.9 · The structure of α-amanitin, one of a series of toxic compounds known as amatoxins that are found in the mushroom Amanita phalloides.

        In addition to their different functions, the three classes of RNA polymerase can be distinguished by their sensitivity to α-amanitin (Figure 31.9), a bicyclic octapeptide produced by the poisonous mushroom Amanita phalloides (the “death cap” mushroom). α-Amanitin blocks RNA chain elongation. Although RNA polymerase I is resistant to this compound, RNA polymerase II is very sensitive and RNA pol III is less sensitive.
        The existence of three classes of RNA polymerases acting on three distinct sets of genes implies that at least three categories of promoters exist to maintain this specificity. All three polymerases interact with their promoters via so-called transcription factors, DNA-binding proteins that recognize and accurately initiate transcription at specific promoter sequences. For RNA polymerase I, its templates are the rRNA genes. Ribosomal RNA genes are present in multiple copies. Optimal expression of these genes requires the first 150 nucleotides in the immediate 5'-upstream region, but the precise locations and sequences of the promoter(s) are not known with certainty.
        RNA polymerase III interacts with transcription factors TFIIIA, TFIIIB, and TFIIIC. Interestingly, TFIIIA and/or TFIIIC bind to specific recognition sequences that in some instances are located within the coding regions of the genes, not in the 5'-untranscribed region upstream from the transcription start site. TFIIIB associates with TFIIIA or TFIIIC already bound to the DNA and in turn facilitates the association of RNA pol III to establish an initiation complex.

The Structure and Function of RNA Polymerase II

As the enzyme responsible for the regulated synthesis of mRNA, RNA polymerase II has aroused greater interest than RNA pol I and pol III. RNA pol II must be capable of transcribing a great diversity of genes, yet it must carry out its function at any moment only on those genes whose products are appropriate to the needs of the cell in its ever-changing metabolism and growth. The RNA pol II from yeast (Saccharomyces cerevisiae) has been extensively characterized. Yeast is viewed by molecular biologists as an excellent eukaryotic prototype. The yeast RNA pol II consists of 10 different polypeptides, designated RPB1 through RPB10, ranging in size from 220 to 10 kD (Table 31.1).³ RPB1 and RPB2 functions are homologous to those of the prokaryotic RNA polymerase β' and β subunits, respectively: RPB1 has a DNA-binding site, RPB2 binds nucleotide substrates, and both contribute to the catalytic site. RPB3 is the functional homolog of the prokaryotic α; there are two RPB3 subunits per enzyme and RPB3 is essential for assembly of the polymerase. RPB4 resembles the σ subunit in amino acid sequence. RPB 3, 4, and 7 are unique to RNA pol II, whereas RPB 5, 6, 8, and 10 are common to all three eukaryotic RNA polymerases. RPB 4 and 7 readily dissociate from RNA pol II.
        The RPB1 subunit has an unusual structural feature not found in prokaryotes: Its C-terminal domain (CTD) contains 27 repeats of the amino acid sequence PTSPSYS. (The analogous subunit in RNA pol II enzymes of other eukaryotes has this heptapeptide tandemly repeated as many as 52 times.) Note that the side chains of 5 of the 7 residues in this repeat have -OH groups, endowing the CTD with considerable hydrophilicity and multiple sites for phosphorylation. This domain may project more than 50 nm from the globular enzyme. The CTD is essential to RNA pol II function. Only RNA pol II whose CTD is not phosphorylated can initiate transcription. However, transcription elongation proceeds only after protein phosphorylation within the CTD, suggesting that phosphorylation triggers the conversion of an initiation complex into an elongation complex. Following termination of transcription, a phosphatase recycles RNA pol II to its unphosphorylated form.
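The counts given for the CTD follow directly from the heptapeptide sequence. The sketch below builds the repeat region from the sequence and repeat numbers stated in the text and tallies the residues bearing side-chain hydroxyl groups (Ser, Thr, and Tyr). It counts every hydroxyl as a potential phosphorylation site, which is an upper bound and not a statement about which sites are used.

```python
HEPTAD = "PTSPSYS"
HYDROXYL_RESIDUES = set("STY")  # Ser, Thr, Tyr carry side-chain -OH groups

def hydroxyl_count(peptide):
    """Number of residues with a side-chain hydroxyl group."""
    return sum(1 for residue in peptide if residue in HYDROXYL_RESIDUES)

def ctd_repeat_region(n_repeats):
    """Idealized CTD made of perfect tandem copies of the heptad."""
    return HEPTAD * n_repeats

yeast_ctd = ctd_repeat_region(27)
print(hydroxyl_count(HEPTAD))     # 5 of the 7 residues
print(len(yeast_ctd))             # 189 residues in 27 repeats
print(hydroxyl_count(yeast_ctd))  # 135 potential phosphorylation sites
```

With 52 repeats the same calculation gives 364 residues and 260 hydroxyl groups, which helps explain how phosphorylation can so strongly change the character of this domain.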

Table 31.1
Yeast RNA Polymerase II Subunits
Subunit   Size (kD)*   Features                 Prokaryotic Homolog
RPB1      220          PTSPSYS CTD              β'
RPB2      150          NTP binding              β
RPB3      45           Core assembly            α
RPB4      32           Promoter recognition     σ
RPB5      27           In pol I, II, and III
RPB6      23           In pol I, II, and III
RPB7      17           Unique to pol II
RPB8      14           In pol I, II, and III
RPB9      13
RPB10     10           In pol I, II, and III
*Protein sizes estimated from protein mobilities in SDS-polyacrylamide gel electrophoresis. Actual protein molecular weights deviate somewhat from these values.
Source: Adapted from Woychik, N. A., and Young, R. A., 1990. RNA polymerase II: Subunit structure and function. Trends in Biochemical Sciences 15:347-351.

³ RPB stands for RNA polymerase B; RNA pol I, II, and III are sometimes called RNA pol A, B, and C.

Transcription Initiation by RNA Polymerase II

Figure 31.10 · The TATA box in selected eukaryotic genes. The consensus sequence of a number of such promoters is presented in the lower part of the figure, the numbers giving the percent occurrence of various bases at the positions indicated.

Promoters

RNA polymerase II promoters commonly consist of two separate sequence features, the core element, near the transcription start site, where general transcription factors bind, and more distantly located regulatory elements, known variously as enhancers or silencers. These latter elements are recognized by specific DNA-binding proteins that activate transcription above basal levels (enhancers) or repress transcription (silencers). The core region often consists of a TATA box (a TATAAA consensus element) and the transcription start site; the TATA motif is usually located at position -25 (Figure 31.10). An important role of the TATA box is to indicate the site of the initiator element, or Inr, where transcription is initiated. The initiator element Inr encompasses the transcription start site. The sequence of Inr is not highly conserved between genes; a consensus Inr for one gene family is YYCAYYYYY, spanning positions -3 to +6 (where Y represents any pyrimidine). Regulatory elements occurring near the core promoter (within 50 to 200 bp), the so-called promoter proximal elements, possess one or more binding sites for interaction with DNA-binding regulatory proteins and show great variation in sequence. Other regulatory elements, so-called distal enhancer (or silencer) elements, where another group of DNA-binding regulatory proteins bind, can be located far from the core promoter, either upstream or, rarely, downstream.
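The two core promoter elements can be located in a nontemplate-strand sequence with simple pattern matching. The sketch below reads the Inr consensus as running from position -3 to +6, so that the A of YYCAYYYYY is taken as +1; that reading, the exact-match TATAAA pattern, and the test sequence are assumptions made for illustration. Real TATA boxes and initiators vary considerably.

```python
import re

TATA_BOX = re.compile(r"TATAAA")
# Y = pyrimidine (C or T). The lookahead lets overlapping candidates be reported.
INR = re.compile(r"(?=([CT]{2}CA[CT]{5}))")

def find_core_elements(nontemplate):
    """Return (TATA start indices, predicted +1 indices) in a 5'->3' sequence."""
    tata_starts = [m.start() for m in TATA_BOX.finditer(nontemplate)]
    # The A of YYCA is three bases into the match, ASSUMED to be position +1.
    start_sites = [m.start(1) + 3 for m in INR.finditer(nontemplate)]
    return tata_starts, start_sites

# HYPOTHETICAL promoter: TATA box, 16-bp spacer, then an Inr.
promoter = "GC" + "TATAAA" + "G" * 16 + "CTCACTCTT" + "GG"
tata, starts = find_core_elements(promoter)
print(tata, starts)          # [2] [27]
print(tata[0] - starts[0])   # -25, the position of the first base of the TATA box
```

Because upstream coordinates are simply negative offsets from +1, the subtraction in the last line gives the promoter coordinate of the TATA box directly.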

Initiation of Transcription in Eukaryotes

Figure 31.11 · Transcription initiation. (a) Model of the yeast TATA-binding protein (TBP) in complex with a yeast DNA TATA sequence. The sugar-phosphate backbone of the TATA box is shown in yellow; the TATA base pairs are in red; adjacent DNA segments are in blue. The saddle-shaped TBP (green) is unusual in that it binds in the minor groove of DNA, sitting on the DNA like a saddle on a horse. TBP-binding pries open the minor groove, creating a 100° bend in the DNA axis and unwinding the DNA within the TATA sequence. The other components of the TFIID heteromer (Table 31.2) sit on TBP, like a “cowboy on a saddle.” All known eukaryotic genes (those lacking a TATA box as well as those transcribed by RNA polymerase I or III) rely on TBP. (Photo courtesy of Paul B. Sigler of Yale University.) (b) Formation of a preinitiation complex at a TATA-containing promoter. Binding of TFIID, the multisubunit protein (>100 kD) consisting of the TATA-binding protein (TBP) and other polypeptides, is stimulated by TFIIA. TFIID bound to the TATA motif recruits TFIIB, forming a DB complex. In association with TFIIF, RNA pol IIA (the nonphosphorylated form of RNA pol II) joins the DB complex to give the DBpol F complex. TFIIE and TFIIH then associate to yield the preinitiation complex. Melting of the DNA duplex around Inr generates the open complex and transcription ensues. (Adapted from Weiss, L., and Reinberg, D., 1992. FASEB Journal 6:3300, Figure 1)

A universal set of proteins, called the basal apparatus, binds the core promoter and initiates transcription. The basal apparatus consists of RNA polymerase II and the general transcription factors (GTFs). There are six GTFs (Table 31.2), five of which are required for transcription: TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. One, TFIIA, stimulates transcription by stabilizing the interaction of TFIID with the TATA box. TFIID consists of TBP (TATA-binding protein), which directly recognizes the TATA box, and a set of TBP-associated factors (TAFs or TAFIIs), which have positive or negative effects on transcription; some are capable of recognizing core promoters lacking a TATA box. TBP binds to the core promoter through contacts made with the minor groove of the DNA, distorting and bending the DNA so that DNA sequences upstream and downstream of the TATA box come into closer proximity (Figure 31.11a). In one model of transcription initiation, once TBP binds to the core promoter, TFIIB joins it, followed by RNA polymerase IIA in association with TFIIF. Then other factors join (Figure 31.11b), establishing a competent transcription preinitiation complex. Another model for transcription initiation suggests that RNA polymerase II holoenzyme (RNA polymerase IIA in association with various general transcription factors other than TBP or TFIID) assembles in the absence of any interaction with DNA and then binds to TBP/TFIID. In either case, once RNA polymerase IIA and the GTFs have assembled into a preinitiation complex on DNA, an open complex then forms and transcription begins. Figure 31.12a is an illustration of the preinitiation complex with the various components drawn to scale; Figure 31.12b is a computer-modeled representation of the TFIIA-TBP-TFIIB-promoter complex.

Figure 31.12 · (a) Structure of the preinitiation complex, showing the distortion of the DNA and the relative positions of RNA polymerase IIA (pol II), the TATA box, TBP, TFIIB (B), and the TFIIE dimer (E). Transcription initiation occurs at a site on DNA within the region encircled by RNA polymerase IIA next to TFIIE. (Adapted from Kornberg, R. D., 1996. RNA polymerase II transcription control. Trends in Biochemical Sciences 21:325-326.) (b) Computer-generated model of the TFIIA-TBP-TFIIB-promoter complex. Note the strong lateral displacement of upstream and downstream DNA segments induced by the proteins. ([a] From Figure 2 of Roeder, R. G., 1996. The role of general initiation factors in transcription by RNA polymerase II. Trends in Biochemical Sciences 21:327-335 and [b] from Figure 3 in Patikoglou, G., and Burley, S. K., 1997. Eukaryotic transcription factor-DNA complexes. Annual Review of Biophysics and Biomolecular Structure 26:289-325. Figure [b] courtesy of Stephen K. Burley of Rockefeller University.)

⁴ A polycistronic mRNA is a single RNA transcript that encodes more than one polypeptide. “Cistron” is a genetic term for a DNA region representing a protein: “cistron” and “gene” are essentially equivalent terms.

Table 31.2
General Transcription Initiation Factors from Human Cells

Factor | Number of Subunits | Mass (kD) | Function
TFIID: TBP | 1 | 38 | Core promoter recognition (TATA); TFIIB recruitment
TFIID: TAFs | 12 | 15-250 | Core promoter recognition (non-TATA elements); positive and negative regulatory functions
TFIIA | 3 | 12, 19, 35 | Stabilization of TBP binding; stabilization of TAF-DNA interactions
TFIIB | 1 | 35 | RNA Pol II-TFIIF recruitment; start-site selection by RNA Pol II
TFIIF | 2 | 30, 74 | Promoter targeting of Pol II; destabilization of nonspecific RNA Pol II-DNA interactions
RNA Pol II | 12 | 10-220 | Enzymatic synthesis of RNA; TFIIE recruitment
TFIIE | 2 | 34, 57 | TFIIH recruitment; modulation of TFIIH helicase, ATPase, and kinase activities; promoter melting
TFIIH | 9 | 35-89 | Promoter melting using helicase activity; promoter clearance via CTD phosphorylation

Adapted from Table 1 in Roeder, R. G., 1996. The role of general initiation factors in transcription by RNA polymerase II. Trends in Biochemical Sciences 21:327-335.

Transcriptional Activation in Eukaryotes

The regulatory elements or enhancer components of eukaryotic promoters stimulate transcription above basal levels. DNA-binding regulatory proteins that bind to these elements influence transcription from the core promoter through interactions with RNA polymerase that are conveyed through the TAFs or through a set of proteins known as the mediator complex, which is intimately associated with the CTD of RNA polymerase II. Association of the mediator complex with RNA polymerase II is essential to formation of the RNA polymerase II holoenzyme that carries out transcription.

31.3 · Transcription Regulation in Prokaryotes

Figure 31.13 · The general organization of operons. Operons consist of transcriptional control regions and a set of related structural genes, all organized in a contiguous linear array along the chromosome. The transcriptional control regions are the promoter and the operator, which lie next to, or overlap, each other, upstream from the structural genes they control. Operators may lie at various positions relative to the promoter, either upstream or downstream. Expression of the operon is determined by access of RNA polymerase to the promoter, and occupancy of the operator by regulatory proteins influences this access. Induction activates transcription from the promoter; repression prevents it.

 

In bacteria, genes encoding the enzymes of a particular metabolic pathway are often grouped adjacent to one another in a cluster on the chromosome. Such clusters, together with the regulatory sequences that control their transcription, are called operons. This pattern of organization allows all of the genes in the group to be expressed in a coordinated fashion through transcription into a single polycistronic mRNA encoding all the enzymes of the metabolic pathway.4 A regulatory sequence lying adjacent to this unit of transcription determines whether it is transcribed. This sequence is termed the operator (Figure 31.13). The operator is located next to the promoter. Interaction of a regulatory protein with the operator controls transcription of the operon by governing the accessibility of RNA polymerase to the promoter. Although this is the paradigm for prokaryotic gene regulation, it must be emphasized that many regulated prokaryotic genes do not contain operators and are regulated in ways that do not involve protein:operator interactions.

Transcription of Operons Is Controlled by Induction and Repression

Figure 31.14 · The structure of lactose, a β-galactoside.

In prokaryotes, regulation is ultimately responsive to small molecules serving as signals of the nutritional or environmental conditions confronting the cell. Increased synthesis of enzymes in response to the presence of a particular substrate is termed induction. For example, lactose (Figure 31.14) can serve as both carbon and energy source for E. coli. Metabolism of this substrate depends on hydrolysis into its component sugars, glucose and galactose, by the enzyme β-galactosidase. In the absence of lactose, E. coli cells contain very little β-galactosidase (less than 5 molecules per cell). However, lactose availability induces the synthesis of β-galactosidase by activating transcription of the lac operon. One of the genes in the lac operon, lacZ, is the structural gene for β-galactosidase. When its synthesis is fully induced, β-galactosidase can amount to almost 10% of the total soluble protein in E. coli. When lactose is removed from the culture, synthesis of β-galactosidase halts.
        The alternative to induction, namely decreased synthesis of enzymes in response to a specific metabolite, is termed repression. For example, the enzymes of tryptophan biosynthesis in E. coli are encoded in the trp operon. If sufficient Trp is available to the growing bacterial culture, the trp operon is not transcribed, so the Trp biosynthetic enzymes are not made; that is, their synthesis is repressed. Repression of the trp operon in the presence of Trp is an eminently logical control mechanism: If the end product of the pathway is present, why waste cellular resources making unneeded enzymes?
Figure 31.15 · The structure of IPTG (isopropyl β-thiogalactoside).

       Induction and repression are two faces of the same phenomenon. In induction, a substrate activates enzyme synthesis. Substrates capable of activating synthesis of the enzymes that metabolize them are called co-inducers, or often simply inducers. Some substrate analogs can induce enzyme synthesis even though the enzymes are incapable of metabolizing them. These analogs are called gratuitous inducers. A number of thiogalactosides, such as IPTG (isopropylthiogalactoside, Figure 31.15), are excellent gratuitous inducers of β-galactosidase activity in E. coli. In repression, a metabolite, typically an end product, depresses synthesis of its own biosynthetic enzymes. Such metabolites are called co-repressors.

lac: The Paradigm of Operons

Figure 31.16 · The lac operon. The operon consists of two transcription units. In one unit, there are three structural genes, lacZ, lacY, and lacA, under control of the promoter, plac, and the operator O. In the other unit, there is a regulator gene, lacI, with its own promoter, placI. lacI encodes a 360-residue, 38.6-kD polypeptide that forms a tetrameric lac repressor protein. lacZ encodes β-galactosidase, a tetrameric enzyme of 116-kD subunits. lacY is the structural gene for β-galactoside permease, a 46.5-kD integral membrane protein active in β-galactoside transport into the cell. The remaining structural gene, lacA, encodes a 22.7-kD polypeptide that forms a dimer displaying thiogalactoside transacetylase activity in vitro, transferring an acetyl group from acetyl-CoA to the C-6 OH of thiogalactosides, but the metabolic role of this protein in vivo remains uncertain. lacA mutants show no identifiable metabolic deficiency. Perhaps the lacA protein acts to detoxify toxic analogs of lactose through acetylation.

In 1961, François Jacob and Jacques Monod proposed the operon hypothesis to account for the coordinate regulation of related metabolic enzymes. The operon was considered to be the unit of gene expression, consisting of two classes of genes: the structural genes for the enzymes, and regulatory elements or genes that controlled expression of the structural genes. The two kinds of genes could be distinguished by mutation. Mutations in a structural gene would abolish one particular enzymatic activity, but mutations in a regulatory gene would affect all of the different enzymes under its control. Mutations of both kinds were known in E. coli for lactose metabolism. Bacteria with mutations in either the lacZ gene or the lacY gene (Figure 31.16) could no longer metabolize lactose: the lacZ mutants (lacZ⁻ strains) because β-galactosidase activity was absent, the lacY mutants because lactose was no longer transported into the cell. Lactose transport could still be induced in lacZ mutants, and lacY mutants displayed lactose-inducible β-galactosidase activity. Other mutations defined another gene, the lacI gene. lacI mutants were different because they both expressed β-galactosidase activity and immediately transported lactose, without prior exposure to an inducer. That is, a single mutation led to the expression of lactose metabolic functions independently of inducer. Expression of genes independently of regulation is termed constitutive expression. Thus, lacI had the properties of a regulatory gene. The lac operon includes the regulatory gene lacI, its promoter placI, and three structural genes, lacZ, lacY, and lacA, with their own promoter plac and operator O (Figure 31.16).
Figure 31.17 · The mode of action of lac repressor.

 

        The structural genes of the lac operon are controlled by negative regulation. That is, they are transcribed to give an mRNA unless turned off by the lacI gene product. This gene product is the lac repressor, a tetrameric protein (Figure 31.17). The lac repressor has two kinds of binding sites—one for inducer and another for DNA. In the absence of inducer, lac repressor blocks lac gene expression. It accomplishes repression by binding to the operator DNA site upstream from the lac structural genes. Despite the presence of lac repressor, RNA polymerase can still initiate transcription at the plac promoter, but lac repressor blocks elongation of transcription, so initiation is aborted. In lacI mutants, the lac repressor is absent or defective in binding to operator DNA, lac gene transcription is not blocked, and the lac operon is constitutively expressed in these mutants. Note that lacI is normally expressed constitutively from its promoter, so that lac repressor protein is always available to fill its regulatory role. About 10 molecules of lac repressor are present in an E. coli cell.
       Derepression of the lac operon occurs when appropriate β-galactosides occupy the inducer site on lac repressor, causing a conformational change in the protein that lowers the repressor’s affinity for operator DNA. As a tetramer, lac repressor has four inducer binding sites and its response to inducer shows cooperative allosteric effects. Thus, as a consequence of the “inducer”-induced conformational change, the inducer:lac repressor complex dissociates from the DNA, and RNA polymerase transcribes the structural genes (Figure 31.17). Induction reverses rapidly: lac mRNA has a half-life of only 3 minutes, and once the inducer is used up through metabolism by the enzymes, free lac repressor re-associates with the operator DNA, transcription of the operon is halted, and the residual lac mRNA decays.

The lac Operator

Figure 31.18 · The nucleotide sequence of the lac operator. This sequence comprises 35 bp showing nearly palindromic symmetry. The inverted repeats that constitute this approximate twofold symmetry are shaded in rose. The bases are numbered relative to the +1 start site for transcription. The G:C base pair at position +11 represents the axis of symmetry. In vitro studies show that bound lac repressor protects a 26-bp region from -5 to +21 against nuclease digestion. Oc mutants are shown above the operator. Bases that are protected against methylation by dimethyl sulfate or that undergo UV-induced cross-linking to bound lac repressor are indicated below the operator. Note the symmetry of protection at +1 through +4 TTAA to +18 through +21 AATT.

 

The lac operator is a palindromic DNA sequence (Figure 31.18). Palindromes, or “inverted repeats” (Chapter 12), provide a twofold, or dyad, symmetry, a structural feature common at sites in DNA where proteins specifically bind. While the operator consists of 35 bp, 26 of which are protected from nuclease digestion when lac repressor is bound, a central core defined by 13 bp (from +5 to +17) is involved in specific contacts with lac repressor. Mutations at eight sites in this restricted region lead to constitutive expression of the lac operon because repressor can no longer bind (Figure 31.18). These mutants are so-called Oc, or operator-constitutive, mutants. Note that the distribution of Oc mutants is not symmetrical about the axis of symmetry. Further, certain Oc mutations, as in G:C→A:T changes at positions 7 or 9, actually render the palindrome more perfect. The distribution of Oc mutants indicates that repressor contacts with the left half of the palindrome may be more crucial than those with the right half. The operator and promoter (plac) sites overlap: lac repressor protects a region roughly covering nucleotides -5 to +21 from nuclease digestion, whereas RNA polymerase binding and nuclease protection define plac as falling within the -45 to +18 region.
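The idea of dyad symmetry can be made concrete with a short sketch. The function names and the toy sequences below are hypothetical illustrations, not the lac operator sequence itself (which is given only in Figure 31.18). The check compares each base in the left half-site with the complement of its mirror-image partner in the right half-site; for an odd-length site, the central base pair lies on the axis and is skipped.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of one DNA strand (read 5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def dyad_mismatches(seq: str) -> int:
    """Count half-site positions that break twofold (dyad) symmetry.

    Position i is compared with the complement of position n-1-i.
    For odd-length sequences the central base sits on the axis and
    is not counted.
    """
    s = seq.upper()
    n = len(s)
    return sum(1 for i in range(n // 2) if s[i] != COMPLEMENT[s[n - 1 - i]])


# Toy examples (hypothetical sequences, chosen only for illustration).
print(dyad_mismatches("GAATTC"))       # perfect palindrome
print(dyad_mismatches("GAATGC"))       # one position breaks the symmetry
print(dyad_mismatches("TTGACGGTCAA"))  # odd length, axis on the central G
```

A "nearly palindromic" site such as the lac operator would give a small, nonzero mismatch count under this measure, and a mutation that makes the palindrome more perfect would lower that count.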

A Deeper Look
Quantitative Evaluation of lac Repressor:DNA Interactions

The affinity of lac repressor for random DNA ensures that effectively all repressor is DNA bound. Assume that E. coli DNA has a single specific lac operator site for repressor binding and 4.64 × 10⁶ nonspecific sites. (Because the E. coli genome consists of 4.64 × 10⁶ base pairs and any nucleotide sequence even one base out of phase with the operator constitutes a nonspecific binding site, there are 4.64 × 10⁶ nonspecific sites for repressor binding.)

            The binding of repressor to DNA is given by the association constant, KA:

\[ K_A = \frac{[\text{repressor:DNA}]}{[\text{repressor}]\,[\text{DNA}]} \]

where [repressor:DNA] is the concentration of the repressor:DNA complex, [repressor] is the concentration of free repressor, and [DNA] is the concentration of nonspecific binding sites.

            Rearranging gives the following:

\[ \frac{[\text{repressor}]}{[\text{repressor:DNA}]} = \frac{1}{K_A\,[\text{DNA}]} \]

If the number of nonspecific binding sites is 4.64 × 10⁶, there are (4.64 × 10⁶)/(6.023 × 10²³) = 0.77 × 10⁻¹⁷ “moles” of binding sites contained in the volume of a bacterial cell (roughly 10⁻¹⁵ liters). Therefore, [DNA] = (0.77 × 10⁻¹⁷)/(10⁻¹⁵) = 0.77 × 10⁻² M. Since KA = 2 × 10⁶ M⁻¹ (Table 31.3),

\[ \frac{[\text{repressor}]}{[\text{repressor:DNA}]} = \frac{1}{(2 \times 10^{6}\ \mathrm{M^{-1}})(0.77 \times 10^{-2}\ \mathrm{M})} = 6.5 \times 10^{-5} \]

So, the ratio of free repressor to DNA-bound repressor is 6.5 × 10⁻⁵. Less than 0.01% of repressor is not bound to DNA! The behavior of lac repressor is characteristic of DNA-binding proteins. These proteins bind with low affinity to random DNA sequences, but with much higher affinity to their unique target sites (Table 31.3).
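The arithmetic in this box can be checked with a few lines of code. This is a minimal sketch using only the numbers quoted above (4.64 × 10⁶ nonspecific sites, a cell volume of about 10⁻¹⁵ liters, and a nonspecific association constant of 2 × 10⁶ per molar). The function name is a hypothetical label, and Avogadro's number is taken as 6.022 × 10²³, which differs negligibly from the value used in the text.

```python
AVOGADRO = 6.022e23  # particles per mole


def free_to_bound_ratio(k_a: float, n_sites: float, cell_volume_l: float) -> float:
    """Ratio [repressor] / [repressor:DNA] = 1 / (K_A * [DNA]).

    k_a           association constant for nonspecific DNA, in M^-1
    n_sites       number of nonspecific binding sites per cell
    cell_volume_l cell volume in liters
    """
    site_conc_molar = n_sites / AVOGADRO / cell_volume_l
    return 1.0 / (k_a * site_conc_molar)


ratio = free_to_bound_ratio(k_a=2e6, n_sites=4.64e6, cell_volume_l=1e-15)
fraction_free = ratio / (1.0 + ratio)

print(f"free/bound ratio: {ratio:.2e}")
print(f"percent of repressor free: {100 * fraction_free:.4f}%")
```

Because the site concentration enters linearly, changing the assumed cell volume or the number of sites scales the ratio in direct proportion, so the conclusion that nearly all repressor is DNA bound is not sensitive to modest changes in these estimates.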

 

Interactions of lac Repressor with DNA

Limited digestion of lac repressor with trypsin removes an N-terminal, 59-residue fragment from each subunit, leaving a “core” tetramer that is no longer capable of binding to operator DNA. IPTG binding by the “core” tetramer is unaffected. The N-terminal, 59-residue fragment retains DNA-binding ability. Thus, the protein is composed of an N-terminal, DNA-binding domain, with the rest of the protein functioning in inducer binding and tetramer formation. In the absence of inducer, intact lac repressor nonspecifically binds to duplex DNA with an association constant, KA, of 2 × 10⁶ M⁻¹ (Table 31.3), and to the lac operator DNA sequence with much higher affinity, KA = 2 × 10¹³ M⁻¹. Thus, lac repressor binds 10⁷ times better to lac operator DNA than to any random DNA sequence. IPTG binds to lac repressor with an association constant of about 10⁶ M⁻¹. The IPTG:lac repressor complex binds to operator DNA with an association constant, KA = 2 × 10¹⁰ M⁻¹. Although this affinity is high, it is 3 orders of magnitude less than the affinity of inducer-free repressor for lac operator. There is no difference in the affinity of free lac repressor and lac repressor with IPTG bound for nonoperator DNA. The lac repressor apparently acts by binding to DNA and sliding along it, testing sequences in a one-dimensional search until it finds the lac operator. The lac repressor then binds there with high affinity until inducer causes this affinity to drop by 3 orders of magnitude.

Positive Control of the lac Operon by CAP

Transcription by RNA polymerase from some promoters proceeds with low efficiency unless assisted by an accessory protein that acts as a positive regulator. One such protein is CAP, or catabolite activator protein. Its name derives from the phenomenon of catabolite repression in E. coli. Catabolite repression is a global control that coordinates gene expression with the total physiological state of the cell: As long as glucose is available, E. coli catabolizes it in preference to any other energy source, such as lactose or galactose. Catabolite repression ensures that the operons necessary for metabolism of these alternative energy sources, that is, the lac and gal operons, remain repressed until the supply of glucose is exhausted. Catabolite repression overrides the influence of any inducers that might be present.

Table 31.3
The Affinity of lac Repressor for DNA*

DNA | Repressor | Repressor + Inducer
lac operator | 2 × 10¹³ M⁻¹ | 2 × 10¹⁰ M⁻¹
All other DNA | 2 × 10⁶ M⁻¹ | 2 × 10⁶ M⁻¹
Specificity | 10⁷ | 10⁴

*Values for repressor:DNA binding are given as association constants, KA, for the formation of DNA:repressor complex from DNA and repressor.
Specificity is defined as the ratio (KA for repressor binding to operator DNA)/(KA for repressor binding to random DNA).
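The specificity values in Table 31.3 follow directly from the association constants, and the same numbers suggest why a 1000-fold drop in operator affinity matters. The sketch below recomputes the specificity ratio and then applies a deliberately simple competition estimate that is an assumption of this example, not a result from the text: a single repressor is treated as partitioning between one operator site and 4.64 × 10⁶ nonspecific sites in proportion to KA times the number of sites, ignoring free repressor and the presence of several repressor molecules per cell. All names are hypothetical.

```python
# Association constants from Table 31.3, in M^-1.
K_A = {
    ("operator", "repressor"): 2e13,
    ("operator", "repressor+inducer"): 2e10,
    ("nonspecific", "repressor"): 2e6,
    ("nonspecific", "repressor+inducer"): 2e6,
}

N_NONSPECIFIC_SITES = 4.64e6  # one per base pair of the E. coli genome


def specificity(form: str) -> float:
    """K_A(operator) / K_A(random DNA) for a given repressor form."""
    return K_A[("operator", form)] / K_A[("nonspecific", form)]


def operator_share(form: str) -> float:
    """Assumed simple partition: share of one DNA-bound repressor at the operator.

    Weight of each class of site = K_A * number of sites in that class.
    """
    w_operator = K_A[("operator", form)] * 1
    w_nonspecific = K_A[("nonspecific", form)] * N_NONSPECIFIC_SITES
    return w_operator / (w_operator + w_nonspecific)


for form in ("repressor", "repressor+inducer"):
    print(form, f"specificity={specificity(form):.0e}",
          f"operator share={operator_share(form):.4f}")
```

Under these assumptions the inducer-free repressor spends most of its time on the operator despite the millions of competing sites, whereas the inducer-bound form is found there only a fraction of a percent of the time.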


Figure 31.19 · The mechanism of catabolite repression and CAP action. Glucose instigates catabolite repression by lowering cAMP levels. cAMP is necessary for CAP binding near promoters of operons whose gene products are involved in the metabolism of alternative energy sources such as lactose, galactose, and arabinose. The binding sites for the CAP-(cAMP)2 complex are consensus DNA sequences containing the conserved pentamer TGTGA and a less well conserved inverted repeat, TCANA (where N is any nucleotide).

        Catabolite repression is mediated by cAMP levels, which in turn are regulated by glucose. Transport of glucose into the cell is accompanied by deactivation of E. coli adenylyl cyclase, leading to lower cAMP levels. The action of CAP as a positive regulator is cAMP-dependent. cAMP binding to CAP enhances its DNA-binding affinity. CAP, also referred to as CRP (for cAMP receptor protein), is a dimer of identical 210-residue (22.5-kD) polypeptides. The N-terminal domains bind cAMP; the C-terminal domains constitute the DNA-binding site. Two molecules of cAMP are bound per dimer. The CAP-(cAMP)2 complex binds to specific target sites near the promoters of operons (Figure 31.19). Its presence assists closed promoter complex formation by RNA polymerase holoenzyme. For example, CAP binding at the -72 to -52 region of lac DNA promotes formation of an RNA polymerase holoenzyme:plac DNA closed promoter complex. Analysis of the structure of the CAP:DNA complex reveals that the DNA is bent more than 90° about the center of dyad symmetry (Figure 31.20). This bend may be related to the ability of CAP to assist in transcription initiation.

Figure 31.20 · Binding of CAP-(cAMP)2 induces a severe bend in DNA about the center of dyad symmetry at the CAP-binding site. The CAP dimer with two molecules of cAMP bound interacts with 27 to 30 base pairs of duplex DNA. The cAMP-binding domain of CAP protein is shown in blue and the DNA-binding domain in purple. The two cAMP molecules bound by the CAP dimer are indicated in red. For DNA, the bases are shown in white and the sugar-phosphate backbone in yellow. DNA phosphates that interact with CAP are highlighted in red. Binding of CAP-(cAMP)2 to its specific DNA site involves H bonding and ionic interactions between protein functional groups and DNA phosphates, as well as H-bonding interactions in the DNA major groove between amino acid side chains of CAP and DNA base pairs. (Adapted from Schultz, S. C., Shields, G. C., and Steitz, T. A., 1991. Crystal structure of a CAP-DNA complex: The DNA is bent by 90°. Science 253:1001-1007. Photograph courtesy of Professor Thomas A. Steitz of Yale University)

 

Positive Versus Negative Control

Figure 31.21 · Control circuits governing the expression of genes. These circuits can be either negative or positive, inducible or repressible.

Negative- and positive-control systems are fundamentally different. Genes under negative control are transcribed unless they are turned off by the presence of a repressor protein. Often, transcription activation is essentially anti-inhibition; that is, the reversal of negative control. In contrast, genes under positive control are expressed only if an active regulator protein is present. The lac operon illustrates these differences. The action of lac repressor is negative. It binds to operator DNA and blocks transcription; expression of the operon only occurs when this negative control is lifted through release of the repressor. In contrast, regulation of the lac operon by CAP is positive: Transcription of the operon by RNA polymerase is stimulated by CAP’s action as a positive regulator.
        Operons can also be classified as inducible or repressible, or both, depending on how they respond to the small molecules that mediate their expression. Repressible operons are expressed only in the absence of their co-repressors. Inducible operons are transcribed only in the presence of small-molecule co-inducers (Figure 31.21).
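The combined negative and positive control of the lac operon can be summarized as a small truth table. This is a qualitative sketch only: the function name, the boolean inputs, and the three output labels ("off", "low", "high") are hypothetical simplifications of the behavior described above, and real expression levels are continuous rather than discrete.

```python
def lac_transcription(lactose_present: bool,
                      glucose_present: bool,
                      laci_functional: bool = True,
                      operator_functional: bool = True) -> str:
    """Qualitative lac operon output under negative (repressor) and positive (CAP) control."""
    # Negative control: repressor occupies the operator unless inducer is bound,
    # the repressor is defective (lacI mutant), or the operator cannot bind it (Oc mutant).
    repressor_bound = laci_functional and operator_functional and not lactose_present

    # Positive control: glucose lowers cAMP, so CAP-(cAMP)2 forms only when glucose is absent.
    cap_active = not glucose_present

    if repressor_bound:
        return "off"
    return "high" if cap_active else "low"


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> "
              f"{lac_transcription(lactose, glucose)}")
```

Setting `laci_functional=False` or `operator_functional=False` reproduces constitutive expression: the operon is no longer "off" in the absence of lactose, although the CAP requirement still applies.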

The araBAD Operon: Positive and Negative Control by AraC

Figure 31.22 · Regulation of the araBAD operon by the combined action of CAP and AraC protein.

E. coli can use the plant pentose L-arabinose as sole source of carbon and energy. Arabinose is metabolized via conversion to D-xylulose-5-P (a pentose phosphate pathway intermediate and transketolase substrate [Chapter 23]) by three enzymes encoded in the araBAD operon. Transcription of this operon is regulated by both catabolite repression and arabinose-mediated induction. CAP functions in catabolite repression; arabinose induction is achieved via the product of the araC gene, which lies next to the araBAD operon on the E. coli chromosome. The araC gene product, the protein AraC,5 is a 292-residue protein consisting of an N-terminal domain (residues 1-170) that binds arabinose and acts as a dimerization motif and a C-terminal (residues 178-292) DNA-binding domain. Regulation of araBAD by AraC is novel in that it acts both negatively and positively. The ara operon has three binding sites for AraC: araO1, located at nucleotides -106 to -144 relative to the araBAD transcription start site; araO2 (spanning positions -265 to -294); and araI, the araBAD promoter. The araI site consists of two “half-sites”: araI1 (nucleotides -56 to -78) and araI2 (-35 to -51). (The araO1 site contributes minimally to ara operon regulation.)
        The details of araBAD regulation are as follows: When AraC protein levels are low, the araC gene is transcribed from its promoter pc (adjacent to araO1) by RNA polymerase (Figure 31.22). araC is transcribed in the direction away from araBAD. When cAMP levels are low and arabinose is absent, an AraC protein dimer binds to two sites, araO2 and the araI1 half-site, forming a DNA loop between them and restricting transcription of araBAD (Figure 31.22). In the presence of L-arabinose, the monomer of AraC bound to the araO2 site is released from that site; it then associates with the unoccupied araI half-site, araI2. L-Arabinose thus behaves as an allosteric effector that alters the conformation of AraC. In the arabinose-liganded conformation, the AraC dimer interacts with CAP-(cAMP)2 to activate transcription by RNA polymerase. Thus, AraC protein is both a repressor and an activator.
 Figure 31.23 · Introduction or removal of half a helical turn in the DNA between araO2 and araI prevents AraC protein from interacting with both sites and achieving araBAD repression. (Adapted from Schleif, R., 1987. The L-arabinose operon. In Escherichia coli and Salmonella typhimurium, vol. 2. Edited by Neidhardt, F. C., et al. Washington, DC: American Society for Microbiology.)

 

       Deletion studies reveal that both araO2 and araI must be present on the chromosome in order for AraC protein to repress araBAD. The DNA loop created when AraC binds both araO2 and araI1 consists of some 210 bp. If 5 bp of DNA (one-half a helical turn) are added or deleted in this intervening region, the two AraC-binding sites are rotated away from each other, so that interaction of AraC with both sites is not possible, and repression is no longer observed (Figure 31.23). The creation of DNA loops by sequence-specific, DNA-binding proteins is a mechanism common to many regulatory phenomena involving DNA. (DNA looping is considered in greater detail later in this chapter.)
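The helical-phase argument can be expressed as a short calculation. The sketch below assumes 10 bp per helical turn, as implied by the statement that 5 bp is one-half turn; the function names and the 45° tolerance used to decide whether two sites still face the same side of the helix are hypothetical choices made only for illustration.

```python
def relative_rotation_deg(delta_bp: int, bp_per_turn: float = 10.0) -> float:
    """Rotation of one site relative to the other after inserting (+) or deleting (-) bp."""
    return (delta_bp / bp_per_turn * 360.0) % 360.0


def sites_in_phase(delta_bp: int,
                   bp_per_turn: float = 10.0,
                   tolerance_deg: float = 45.0) -> bool:
    """True if the two sites still face roughly the same side of the helix."""
    angle = relative_rotation_deg(delta_bp, bp_per_turn)
    offset = min(angle, 360.0 - angle)  # smallest angular distance from perfect phasing
    return offset <= tolerance_deg


for delta in (-10, -5, 0, 5, 10):
    print(f"{delta:+3d} bp: rotation {relative_rotation_deg(delta):5.1f} deg, "
          f"in phase: {sites_in_phase(delta)}")
```

Adding or removing a full turn restores the original face-to-face orientation, which is why half-turn changes abolish repression while whole-turn changes would be expected to preserve it.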
        Positive control of the araBAD operon occurs in the presence of L-arabinose and cAMP. Arabinose binding by AraC protein causes the release of araO2, opening of the DNA loop, and association of AraC with araI2. CAP-(cAMP)2 binds at a site between araO1 and araI, and together the AraC-(arabinose)2 and CAP-(cAMP)2 complexes influence RNA polymerase, through protein:protein interactions, to create an active transcription initiation complex. Supercoiling-induced DNA looping may promote protein:protein interactions between DNA-binding proteins by bringing them into juxtaposition.

The trp Operon: Attenuation as a Mechanism to Regulate Gene Expression

Figure 31.24 · The trp operon of E. coli.

 

The trp operon of E. coli (and S. typhimurium) encodes a leader peptide sequence (trpL) and five polypeptides, trpE through trpA (Figure 31.24). The five polypeptides comprise three enzymes that catalyze the formation of tryptophan from chorismate (Chapter 26). Expression of the trp operon is under the control of Trp repressor, a dimer of 108-residue polypeptide chains. When tryptophan is plentiful, Trp repressor binds two molecules of tryptophan and associates with the trp operator that is located within the trp promoter. Trp repressor binding excludes RNA polymerase from the promoter, preventing transcription of the trp operon. When Trp becomes limiting, repression is lifted because Trp repressor lacking bound Trp (Trp apo-repressor) has a lowered affinity for the trp promoter. Thus, the behavior of Trp repressor corresponds to a co-repressor-mediated, negative control circuit (Figure 31.21). The Trp repressor regulates two other operons: trpR and aroH (Figure 31.25). Trp repressor is itself encoded by the trpR operon, and its regulation of this operon serves as an example of autogenous regulation (autoregulation), which is regulation of gene expression by the product of the gene. The aroH operon encodes the Trp-sensitive DAHP synthase isozyme of aromatic amino acid biosynthesis (Chapter 26).

Figure 31.25 · The three operators recognized by Trp repressor.

Attenuation

Figure 31.26 · Amino acid sequences of leader peptides in various amino acid biosynthetic operons regulated by attenuation. Color indicates amino acids synthesized in the pathway catalyzed by the operon’s gene products. (The ilv operon encodes enzymes of isoleucine, leucine, and valine biosynthesis.)

Figure 31.27 · Alternative secondary structures for the leader region (trpL mRNA) of the trp operon transcript.

In addition to repression, the trp operon is controlled by transcription attenuation. Unlike the mechanisms discussed thus far, attenuation regulates transcription after it has begun. Charles Yanofsky, the discoverer of this phenomenon, has defined attenuation as any regulatory mechanism that manipulates transcription termination or transcription pausing to regulate gene transcription downstream. In prokaryotes, transcription and translation (Chapters 11 and 33) are coupled, and the translating ribosome is affected by the formation and persistence of pause and termination structures in the mRNA. Attenuation occurs under normal conditions but is blocked when levels of specific charged tRNAs (aminoacyl-tRNAs) are lowered on account of amino acid limitation. In many operons encoding enzymes of amino acid biosynthesis, a transcribed 150- to 300-bp leader region is positioned between the promoter and the first major structural gene. These regions encode a short leader peptide containing multiple codons for the pertinent amino acid. For example, the leader peptide of the leu operon has four Leu codons, the trp operon has two tandem Trp codons, and so forth (Figure 31.26). Translation of these codons depends on an adequate supply of the relevant aminoacyl-tRNA, which in turn rests on the availability of the amino acid.

When Trp is scarce, the entire trp operon from trpL to trpA is transcribed to give a polycistronic mRNA. But, as [Trp] increases, more and more of the trp transcripts consist of only a 140-nucleotide fragment corresponding to the 5'-end of trpL. Trp availability is causing premature termination of trp transcription, that is, transcription attenuation. The secondary structure of the 160-bp leader region transcript is the principal control element in transcription attenuation (Figure 31.27). This RNA segment includes the coding region for the 14-residue leader peptide. Three critical base-paired hairpins can form in this RNA: the 1:2 pause structure, the 3:4 terminator, and the 2:3 antiterminator. Obviously, the 1:2 pause, 3:4 terminator, and the 2:3 antiterminator represent mutually exclusive alternatives. A significant feature of this coding region is the tandem UGG Trp codons.
Figure 31.28 · The mechanism of attenuation in the trp operon.

        Transcription by RNA polymerase begins and progresses until position 92 is reached, whereupon the 1:2 hairpin is formed, causing RNA polymerase to pause in its elongation cycle. While RNA polymerase is paused, a ribosome begins to translate the leader region of the transcript. Translation by the ribosome releases the paused RNA polymerase and transcription continues, with RNA polymerase and the ribosome moving in unison. As long as Trp is plentiful enough that Trp-tRNATrp is not limiting, the ribosome is not delayed at the two Trp codons and follows closely behind RNA polymerase, translating the message soon after it is transcribed. The presence of the ribosome atop segment 2 blocks formation of the 2:3 antiterminator hairpin, allowing the alternative 3:4 terminator hairpin to form (Figure 31.28). Stable hairpin structures followed by a run of Us are features typical of rho-independent transcription termination signals, so the RNA polymerase perceives this hairpin as a transcription stop signal and transcription is terminated at this point. On the other hand, a paucity of Trp and hence Trp-tRNATrp causes the ribosome to stall on segment 1. This leaves segment 2 free to pair with segment 3 and to form the 2:3 antiterminator hairpin in the transcript. Because this hairpin precludes formation of the 3:4 terminator, termination is prevented and the entire operon is transcribed. Thus, transcription attenuation is determined by the availability of charged tRNATrp and its transitory influence over the formation of alternative secondary structures in the mRNA.
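The decision logic of attenuation can be laid out as a two-branch model. This is a schematic sketch under a strong simplification: the supply of charged tRNATrp is treated as a yes/no input, whereas in the cell the outcome varies continuously with Trp availability. The function name and dictionary keys are hypothetical.

```python
def trp_leader_outcome(charged_trna_trp_plentiful: bool) -> dict:
    """Schematic outcome of transcribing the trp leader region (trpL)."""
    if charged_trna_trp_plentiful:
        # Ribosome reads through the tandem Trp codons and covers segment 2,
        # so segment 3 is free to pair with segment 4.
        return {
            "ribosome": "follows RNA polymerase and covers segment 2",
            "hairpin": "3:4 terminator",
            "full_operon_transcribed": False,
        }
    # Ribosome stalls at the Trp codons in segment 1,
    # leaving segment 2 free to pair with segment 3.
    return {
        "ribosome": "stalled on segment 1 at the Trp codons",
        "hairpin": "2:3 antiterminator",
        "full_operon_transcribed": True,
    }


for plentiful in (True, False):
    result = trp_leader_outcome(plentiful)
    print(f"charged tRNA-Trp plentiful={plentiful}: "
          f"{result['hairpin']}, full operon transcribed={result['full_operon_transcribed']}")
```

The two branches return different hairpins because the 2:3 and 3:4 structures share segment 3 and therefore cannot form at the same time.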

Transcription Is Regulated by a Diversity of Mechanisms

A surprising variety of control mechanisms operate in transcriptional regulation, as we have just seen. Several organizing principles materialize. First, DNA:protein interactions are a central feature in transcriptional control, and the DNA sites where regulatory proteins bind commonly display at least partial dyad symmetry or inverted repeats. Further, DNA-binding proteins themselves are generally even-numbered oligomers (for example, dimers, tetramers) that have an innate twofold rotational symmetry. Second, protein:protein interactions are an essential component of transcriptional activation. We see this latter feature in the activation of RNA polymerase by CAP-(cAMP)2 or AraC-(arabinose)2, to select just two examples. Third, the regulator proteins receive cues that signal the status of the environment (for example, Trp, lactose, cAMP) and act to communicate this information to the genome, typically via the medium of conformational changes and DNA:protein interactions.

Transcriptional Activators Work Through Protein:Protein Contacts with RNA Polymerase

Although transcriptional control is governed by a variety of mechanisms, an underlying principle of transcriptional activation has emerged. Transcriptional activation can take place when a transcriptional activator protein (such as CAP-(cAMP)2 or AraC-(arabinose)2) bound to DNA makes protein:protein contacts with RNA polymerase, and the degree of transcriptional activation is proportional to the strength of the protein:protein interaction. Generally speaking, a nucleotide sequence that provides a binding site for a DNA-binding protein can serve as an activator site if the DNA-binding protein bound there can interact with promoter-bound RNA polymerase. DNA-binding proteins that activate transcription thus have a DNA-binding domain and an activation domain capable of interacting with RNA polymerase. Such activation domains activate transcription through protein:protein interactions with either the α, β, β′, or σ subunits of RNA polymerase. Further, if the DNA-bound transcriptional activator makes contacts with two different components of RNA polymerase, a synergistic effect takes place such that transcription is markedly elevated. Thus, transcriptional activation at specific genes relies on the presence of one or more activator sites where one or more transcriptional activator proteins can bind and make contacts with RNA polymerase bound at the promoter of the gene. Indeed, transcriptional activators may facilitate the recruitment and binding of RNA polymerase to the promoter. This general principle applies to transcriptional activation in both prokaryotic and eukaryotic cells.

31.4 · Transcription Regulation in Eukaryotes

In eukaryotes, the situation is substantially more complicated. First, the DNA is organized into chromatin, which represses transcription by severely limiting the access of transcriptional regulatory proteins to promoters. Thus, eukaryotic transcription requires factors that can reorganize the chromatin so that the transcriptional machinery can gain access to promoters. One such factor is the yeast Swi/Snf complex, which may occur as a subcomponent of the mediator complex of RNA polymerase II holoenzyme. Swi/Snf is a highly conserved complex containing about 10 proteins that becomes physically and functionally associated with the CTD of RNA pol II. Swi/Snf disrupts nucleosomal arrays in chromatin in an ATP-dependent manner, thereby facilitating the binding of TBP and activator proteins to the DNA template. Another aspect of chromatin remodeling involves reversible acetylation of Lys ε-NH₃⁺ groups in nucleosomal histones.
        Acetylation of these amino groups by histone acetyltransferases (HATs) diminishes the electrostatic charges on histones, reducing the affinity of the histone for the negatively charged sugar-phosphate backbone of DNA. The process is reversed by histone deacetylases (HDACs), which remove the N-acetyl groups. Although the overall effects are not straightforward, generally speaking, histone acetylation favors gene expression.
        Not only metabolic activity and cell division but complex patterns of embryonic development and cell differentiation must be coordinated through transcriptional regulation. All this coordinated regulation takes place in cells where the relative quantity (and diversity) of DNA is very great: A typical mammalian cell has 1500 times as much DNA as an E. coli cell. Eukaryotic genes have promoters and other regulatory elements analogous to those found in prokaryotic genes, but the structural genes of eukaryotes are rarely organized in clusters akin to operons. Each eukaryotic gene typically possesses a discrete set of regulatory sequences appropriate to the requirements for its expression. Certain of these sequences provide sites of interaction for general transcription factors, whereas others endow the gene with great specificity in expression by providing targets for specific transcription factors. Further, mRNA stability plays a greater role in eukaryotic gene expression; unlike prokaryotic mRNAs, eukaryotic mRNAs show a wide range in relative half-lives. The longer-lived an mRNA is, the greater the potential for its genetic information to be persistently expressed.

Eukaryotic Promoters, Enhancers, and Response Elements

Figure 31.29 · Promoter regions of several representative eukaryotic genes. (a) The SV40 early genes, the histone H2B gene, and the thymidine kinase gene. Note that these promoters contain different combinations of the various modules. In (b), the function of the modules within the thymidine kinase gene is shown.

 

Promoters

The promoters of eukaryotic genes encoding proteins are defined by modules of short conserved sequences, such as the TATA box, the CAAT box, and the GC box. The presence of a CAAT box, usually located around -80 relative to the transcription start site, signifies a strong promoter. One or more copies of the sequence GGGCGG or its complement (referred to as the GC box) have been found upstream from the transcription start sites of so-called “housekeeping genes.” Housekeeping genes encode proteins commonly present in all cells and essential to normal function; such genes are typically transcribed at more or less steady levels. Sets of the various sequence modules are embedded in the upstream region of such genes and collectively define the promoter. Figure 31.29 depicts the promoter regions of several representative eukaryotic genes. Table 31.4 lists transcription factors that bind to respective modules. These transcription factors typically behave as positive regulatory proteins essential to transcriptional activation by RNA polymerase II at these promoters.

Table 31.4
A Selection of Consensus Sequences That Define Various RNA Polymerase II Promoter Modules and the Transcription Factors That Bind to Them

Sequence Module   Consensus Sequence   DNA Bound   Factor    Size (kD)   Abundance (molecules/cell)
TATA box          TATAAAA              ~10 bp      TBP       27          ?
CAAT box          GGCCAATCT            ~22 bp      CTF/NF1   60          300,000
GC box            GGGCGG               ~20 bp      SP1       105         60,000
Octamer           ATTTGCAT             ~20 bp      Oct-1     76          ?
"                 "                    23 bp       Oct-2     52          ?
κB                GGGACTTTCC           ~10 bp      NFκB      44          ?
"                 "                    ~10 bp      H2-TF1    ?           ?
ATF               GTGACGT              ~20 bp      ATF       ?           ?

Source: Adapted from Lewin, B., 1994. Genes V. Cambridge, MA: Cell Press.
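As a rough illustration of how promoter modules might be located in an upstream region, the following sketch scans a sequence for exact matches to four of the consensus sequences in Table 31.4. Because a module such as the GC box can occur as GGGCGG or its complement, both strands are searched. The function names and the test sequence are hypothetical, and exact matching is a simplifying assumption: real modules often deviate from the consensus.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Consensus sequences taken from Table 31.4.
MODULES = {
    "TATA box": "TATAAAA",
    "CAAT box": "GGCCAATCT",
    "GC box": "GGGCGG",
    "Octamer": "ATTTGCAT",
}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_modules(upstream: str):
    """Return (module, strand, start) for every exact consensus match.

    start is a 0-based index on the strand as given; "-" marks a match
    to the reverse complement of the consensus.
    """
    upstream = upstream.upper()
    hits = []
    for name, consensus in MODULES.items():
        strands = [("+", consensus)]
        rc = reverse_complement(consensus)
        if rc != consensus:  # avoid reporting a palindromic motif twice
            strands.append(("-", rc))
        for strand, motif in strands:
            # A lookahead pattern also reports overlapping occurrences.
            for match in re.finditer(f"(?={motif})", upstream):
                hits.append((name, strand, match.start()))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))


if __name__ == "__main__":
    hypothetical_upstream = "AAGGGCGGTTCCGCCCAATATAAAAGG"
    for hit in find_modules(hypothetical_upstream):
        print(hit)
```

Positions here are simple string offsets; converting them to coordinates relative to the transcription start site would require knowing where that site lies in the input.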

 

Enhancers

Figure 31.30 · Enhancers are sequence elements located at varying positions and orientation relative to the promoter that act to enhance transcription initiation. Transcription factors (proteins) bind to enhancers and stimulate RNA polymerase II binding at a nearby promoter.

In addition to these promoter elements, eukaryotic genes are characterized by additional regulatory sequences known as enhancers. Enhancers (also called upstream activation sequences, or UASs) assist initiation. Enhancers differ from promoters in two fundamental ways. First, the location of enhancers relative to the transcription start site is not fixed. Enhancers may be several thousand nucleotides away from the promoter, and they act to enhance transcription initiation even if positioned downstream from the gene. Second, enhancer sequences are bidirectional in that they function in either orientation. That is, enhancers can be removed and then reinserted in the reverse sequence orientation without any diminution in their function. Like promoters, enhancers represent modules of consensus sequence. Enhancers are promiscuous, because they stimulate transcription from any promoter that happens to be in their vicinity. Nevertheless, enhancer function is dependent on recognition by a specific transcription factor. A specific transcription factor bound at an enhancer element interacts with RNA pol II at a nearby promoter via a looping mechanism (Figure 31.30).

Response Elements

Promoter modules in genes responsive to common regulation are termed response elements. Examples include the heat shock element (HSE), the glucocorticoid response element (GRE), and the metal response element (MRE). These various elements are found in the promoter regions of genes whose transcription is activated in response to a sudden increase in temperature (heat shock), glucocorticoid hormones, or toxic heavy metals, respectively (Table 31.5). HSE sequences are recognized by a specific transcription factor, HSTF (for heat shock transcription factor). HSEs are located about 15 bp upstream from the transcription start site of a variety of genes whose expression is dramatically enhanced in response to elevated temperature. Similarly, the response to steroid hormones depends on the presence of a GRE positioned 250 bp upstream of the transcription start point. Binding of a specific transcription factor, the steroid receptor, at a GRE occurs when certain steroids bind to the steroid receptor.

Table 31.5
Response Elements That Identify Genes Coordinately Regulated in Response to Particular Physiological Challenges

Physiological Challenge   Response Element   Consensus Sequence   DNA Bound   Factor     Size (kD)
Heat shock                HSE                CNNGAANNTCCNNG       27 bp       HSTF       93
Glucocorticoid            GRE                TGGTACAAATGTTCT      20 bp       Receptor   94
Cadmium                   MRE                CGNCCCGGNCNC         ?           ?          ?
Phorbol ester             TRE                TGACTCA              22 bp       AP1        39
Serum                     SRE                CCATATTAGG           20 bp       SRF        52

Source: Adapted from Lewin, B., 1994. Genes V. Cambridge, MA: Cell Press.
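Several consensus sequences in Table 31.5 contain the symbol N, which stands for any of the four bases. One way to search for such degenerate elements is to translate each consensus into a regular expression, as in the minimal sketch below. The function names and the test promoter are hypothetical; the sketch searches only the strand as written and assumes exact agreement with the consensus at every non-N position.

```python
import re

# Consensus sequences taken from Table 31.5 (N = any base).
RESPONSE_ELEMENTS = {
    "HSE": "CNNGAANNTCCNNG",
    "GRE": "TGGTACAAATGTTCT",
    "MRE": "CGNCCCGGNCNC",
    "TRE": "TGACTCA",
    "SRE": "CCATATTAGG",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn a consensus with N wildcards into a compiled regular expression."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in consensus)
    return re.compile(pattern)


def elements_present(promoter: str) -> list:
    """List the response elements whose consensus occurs in the promoter."""
    promoter = promoter.upper()
    return [
        name
        for name, consensus in RESPONSE_ELEMENTS.items()
        if consensus_to_regex(consensus).search(promoter)
    ]


if __name__ == "__main__":
    # Hypothetical promoter fragment carrying one HSE and one TRE.
    print(elements_present("TTCTAGAATGTCCAGGAATGACTCATT"))
```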

 

Figure 31.31 · The metallothionein gene possesses several constitutive elements in its promoter (the TATA and GC boxes) as well as specific response elements such as MREs and a GRE. The BLEs are elements involved in basal level expression (constitutive expression). TRE is a tumor response element activated in the presence of tumor-promoting phorbol esters such as TPA (tetradecanoyl phorbol acetate).

        Many genes are subject to a multiplicity of regulatory influences. Regulation of such genes is achieved through the presence of an array of different regulatory elements. The metallothionein gene is a good example (Figure 31.31). Metallothionein is a metal-binding protein that protects cells against metal toxicity by binding excess amounts of heavy metals and removing them from the cell. This protein is always present at low levels, but its concentration increases in response to heavy metal ions such as cadmium or in response to glucocorticoid hormones. The metallothionein gene promoter consists of two general promoter elements, namely, a TATA box and a GC box; two basal-level enhancers; four MREs; and one GRE. These elements function independently of one another; any one is able to activate transcription of the gene.
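The combinatorial logic described for the metallothionein promoter can be summarized in a toy model. The sketch below lists the elements named in the text with their copy numbers and treats the inducible elements as independent switches, so that any one signal suffices to raise expression above the basal level. The data layout, the signal labels, and the two-state output ("basal" or "induced") are simplifying assumptions made for illustration only.

```python
# (element, copies, signal that activates it); None marks a constitutive element.
METALLOTHIONEIN_PROMOTER = [
    ("TATA box", 1, None),
    ("GC box", 1, None),
    ("BLE", 2, None),
    ("MRE", 4, "heavy metal"),
    ("GRE", 1, "glucocorticoid"),
]


def active_elements(signals: set) -> list:
    """Names of elements that contribute to transcription under the given signals."""
    return [
        name
        for name, _copies, signal in METALLOTHIONEIN_PROMOTER
        if signal is None or signal in signals
    ]


def expression_state(signals: set) -> str:
    """Return "induced" if any inducible element is switched on, else "basal"."""
    induced = any(
        signal is not None and signal in signals
        for _name, _copies, signal in METALLOTHIONEIN_PROMOTER
    )
    return "induced" if induced else "basal"


if __name__ == "__main__":
    print(expression_state(set()))                # no signal present
    print(expression_state({"heavy metal"}))      # cadmium or another heavy metal
    print(active_elements({"glucocorticoid"}))
```

The model deliberately ignores quantitative effects such as the number of MRE copies occupied.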

DNA Looping

Figure 31.32 · Enhancer:promoter interaction via a protein-mediated DNA loop. Formation of a DNA loop delivers the enhancer-binding specific transcription factor to RNA polymerase II positioned at the promoter. Protein:protein interactions between the transcription factor and RNA pol II activate transcription.

Because transcription must respond to a variety of regulatory signals, multiple proteins are essential for appropriate regulation of gene expression. These regulatory proteins are the sensors of cellular circumstances, and they communicate this information to the genome by binding at specific nucleotide sequences. However, DNA is virtually a one-dimensional polymer, and there is little space for a lot of proteins to bind at (or even near) a transcription initiation site. DNA looping permits additional proteins to convene at the initiation site and to exert their influence on creating and activating an RNA pol II initiation complex (Figure 31.32). The repertoire of transcriptional regulation is greatly expanded by DNA looping. Further, DNA looping is greatly influenced by negative supercoiling.

31.5 · Structural Motifs in DNA-Binding Regulatory Proteins

Proteins that recognize nucleic acids do so by the basic rule of macromolecular recognition. That is, such proteins present a three-dimensional shape or contour that is structurally and chemically complementary to the surface of a DNA sequence. When the two molecules come into close contact, the numerous atomic interactions that underlie recognition and binding can take place between the two. Nucleotide sequence-specific recognition by the protein involves a set of atomic contacts with the bases and the sugar-phosphate backbone. Hydrogen bonding is critical for recognition, with amino acid side chains providing most of the critical contacts with DNA. Protein contacts with the bases of DNA usually occur within the major groove; protein contacts with the DNA backbone involve both H bonds and salt bridges with oxygen atoms of the phosphodiester linkages. Structural studies on regulatory proteins that bind to specific DNA sequences have revealed that roughly 80% of such proteins can be assigned to one of three principal classes based on their possession of three small, distinctive structural motifs: the helix-turn-helix