
“Monk Transcribing Manuscript,” ca. 1470. Jean Mielot (author of Miracles de Notre
Dame) at his desk using a quill and scraping knife (Bibliotheque National de
Paris/Mary Evans Picture Library/London)
Chapter 31
Transcription and the Regulation of Gene Expression
In
1958, Francis Crick enunciated the “central dogma of molecular biology” (Figure
31.1). This scheme outlined the residue-by-residue
transfer of biological information as encoded in the primary structure of the
informational biopolymers, nucleic acids and proteins. The predominant path
of information transfer, DNA → RNA →
protein, postulated that RNA was an information carrier between DNA and proteins,
the agents of biological function. In 1961, François Jacob and Jacques Monod
extended this hypothesis to predict that the RNA intermediate, which they dubbed
messenger RNA, or mRNA, would have the following properties:
Figure 31.1 · Crick’s 1958 view of the “central dogma of molecular biology”: directional flow of detailed sequence information includes DNA→DNA (replication), DNA→RNA (transcription), RNA→protein (translation), RNA→DNA (reverse transcription). Note that no pathway exists for the flow of information from proteins to nucleic acids, that is, protein→RNA or DNA. A possible path from DNA to protein has since been discounted. Interestingly, in 1958, mRNA had not yet been discovered.
1. Its base composition would reflect the base composition of DNA (a property consistent with genes as protein-encoding units).
2. It would be very heterogeneous with respect to molecular mass, yet the average molecular mass would be several hundred kD. (A 200-kD RNA contains roughly 750 nucleotides, which could encode a protein of about 250 amino acids—approximately 30 kD—a reasonable estimate for the average size of polypeptides.)
3. It would be able to associate with ribosomes because ribosomes are the site of protein synthesis.
4. It would have a high rate of turnover. (That is, mRNA would be rapidly degraded. Turnover of mRNA would allow the rate of mRNA synthesis to control the rate of protein synthesis.)
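The size estimate in property 2 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the average residue mass of about 120 Da is an assumption chosen to be consistent with the figures quoted above (250 amino acids, roughly 30 kD), and the function name is hypothetical.

```python
# Back-of-the-envelope check of the mRNA coding-capacity estimate.
NUCLEOTIDES_PER_CODON = 3      # one codon is three nucleotides
AVG_RESIDUE_MASS_KD = 0.12     # ASSUMPTION: ~120 Da per amino acid residue


def encoded_protein(nucleotides):
    """Return (amino acids encoded, approximate protein mass in kD)."""
    residues = nucleotides // NUCLEOTIDES_PER_CODON
    return residues, residues * AVG_RESIDUE_MASS_KD


residues, mass_kd = encoded_protein(750)
print(residues, "amino acids, about", round(mass_kd), "kD")
```

The calculation ignores untranslated regions and start and stop codons, so it gives an upper bound on coding capacity rather than an exact protein size.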
Since Jacob and
Monod’s 1961 hypothesis, it has been realized that cells contain three major
classes of RNA—mRNA, ribosomal RNA (rRNA), and transfer RNA (tRNA)—all of which
participate in protein synthesis (Chapter
11). All of these RNAs are synthesized from DNA templates by DNA-dependent
RNA polymerases in the process known as transcription. However, only
mRNAs direct the synthesis of proteins. Thus, not all genes encode proteins;
some encode rRNAs or tRNAs. Protein synthesis occurs via the process of translation,
wherein the instructions encoded in the sequence of bases in mRNA are translated
into a specific amino acid sequence by ribosomes, the “workbenches” of polypeptide
synthesis (Chapter 33).
Transcription
is tightly regulated in all cells. In prokaryotes, only about 3% of the genes
are undergoing transcription at any given time. The metabolic conditions and
the growth status of the cell dictate which gene products are needed at any
moment. In a differentiated eukaryotic cell, the figure is around 0.01%. Such
differentiated cells express only the information needed for their biological
functions, not the full genetic potential encoded in their chromosomes.
31.1 · Transcription in Prokaryotes
In prokaryotes, virtually all RNA is synthesized by a single species of DNA-dependent RNA polymerase. (The only exception is the short RNA primers formed by primase during DNA replication.) Like DNA polymerases, RNA polymerase links ribonucleoside 5'-triphosphates (ATP, GTP, CTP, and UTP, represented generically as NTPs) in an order specified by base pairing with a DNA template:
n NTP → (NMP)n + n PPi
The enzyme moves along a DNA strand in the 3' → 5' direction, joining the 5'-phosphate of an incoming ribonucleotide to the 3'-OH of the previous residue. Thus, the RNA chain grows 5' → 3' during transcription, just as DNA chains do during replication. The reaction is driven by subsequent hydrolysis of PPi to inorganic phosphate by ubiquitous pyrophosphatase activity.
The Structure and Function of Escherichia coli RNA Polymerase
The RNA polymerase of E. coli, so-called RNA polymerase holoenzyme, is a complex multimeric protein (450 kD) large enough to be visible in the electron microscope. Its subunit composition is α2ββ'σ. The largest subunit, β' (155 kD), functions in DNA binding; β (151 kD) binds the nucleoside triphosphate substrates and interacts with σ (70 kD). Any of a number of related proteins, the sigma (σ) factors, can serve as the σ subunit. Sigma subunits function in recognizing specific sequences on DNA called promoters that identify the location of transcription start sites, where transcription begins. Both β and β' contribute to formation of the catalytic site. The two α subunits (36.5 kD each) are essential for assembly of the enzyme and activation by some regulatory proteins. Dissociation of the σ subunit from the holoenzyme leaves the so-called core polymerase (α2ββ'), which is catalytically competent but unable to recognize promoters.
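As a quick consistency check, the subunit masses quoted above can be summed to recover the approximate mass of the holoenzyme. This is a minimal sketch; the dictionary keys and the function name are illustrative choices, not standard nomenclature.

```python
# Subunit masses (kD) and copy numbers as quoted in the text.
SUBUNIT_MASS_KD = {"alpha": 36.5, "beta": 151.0, "beta_prime": 155.0, "sigma": 70.0}
COPIES = {"alpha": 2, "beta": 1, "beta_prime": 1, "sigma": 1}


def complex_mass(copies, masses, exclude=()):
    """Sum subunit masses, optionally leaving out some subunits."""
    return sum(n * masses[name] for name, n in copies.items() if name not in exclude)


holoenzyme_kd = complex_mass(COPIES, SUBUNIT_MASS_KD)
core_kd = complex_mass(COPIES, SUBUNIT_MASS_KD, exclude=("sigma",))
print("holoenzyme:", holoenzyme_kd, "kD; core:", core_kd, "kD")
```

The holoenzyme total comes to 449 kD, in line with the quoted 450 kD; removing the 70-kD sigma subunit gives the mass of the core polymerase.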
The Steps of Transcription in Prokaryotes
Transcription can be divided into four stages: (a) binding of RNA polymerase holoenzyme at promoter sites, (b) initiation of polymerization, (c) chain elongation, and (d) chain termination. A discussion of these stages follows.
Figure 31.2 · Sequence
of events in the initiation and elongation phases of transcription as it occurs
in prokaryotes. Nucleotides in this region are numbered with reference to the
base at the transcription start site, which is designated +1.
Binding of RNA Polymerase to Template DNA
The process of transcription
begins when the s subunit of RNA polymerase recognizes a promoter sequence
(Figure 31.2), and RNA polymerase holoenzyme and the promoter form a so-called
closed promoter complex (Figure 31.2, Step 2). Dissociation constants
for RNA polymerase holoenzyme:closed promoter complexes range from 10⁻⁶
to 10⁻⁹ M. This stage in RNA polymerase:DNA interaction is referred
to as the closed promoter complex because the DNA strands have not yet been
unwound; they must be unwound so that the RNA polymerase can read and
transcribe the DNA template strand into a complementary RNA sequence.
Once the
closed promoter complex is established, the RNA polymerase holoenzyme unwinds
about 14 base pairs of DNA (base pairs located at positions -10 to +2, relative
to the transcription start site—see later), forming the very stable open
promoter complex (Figure 31.2, Step 3). In this complex, RNA polymerase
holoenzyme is bound very tightly to the DNA (KD ≈
10⁻¹⁴ M).
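To give a feel for how much tighter the open complex is, the dissociation constants can be converted to standard binding free energies with ΔG° = RT ln KD. The sketch below assumes a temperature of 298.15 K and a 1 M standard state, neither of which is specified in the text, and it reads the closed-complex range as 10⁻⁶ to 10⁻⁹ M and the open-complex value as about 10⁻¹⁴ M.

```python
import math

R = 8.314             # gas constant, J/(mol*K)
TEMPERATURE_K = 298.15  # ASSUMPTION: 25 degrees C


def binding_free_energy_kj(kd_molar, temperature_k=TEMPERATURE_K):
    """Standard free energy of binding, dG = RT ln(Kd), in kJ/mol.

    More negative values mean tighter binding.
    """
    return R * temperature_k * math.log(kd_molar) / 1000.0


for label, kd in [("closed complex (weak)", 1e-6),
                  ("closed complex (tight)", 1e-9),
                  ("open complex", 1e-14)]:
    print(f"{label}: Kd = {kd:g} M, dG = {binding_free_energy_kj(kd):.1f} kJ/mol")
```

Each tenfold decrease in KD contributes the same increment of free energy, so the five or more orders of magnitude separating the closed and open complexes correspond to a substantial additional stabilization.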
Promoter
sequences can be identified in vitro by DNA footprinting: RNA polymerase
holoenzyme is bound to a putative promoter sequence in a DNA duplex and the
DNA:protein complex is treated with DNase I. DNase I cleaves the DNA at sites
not protected by bound protein, and the set of DNA fragments left after DNase
I digestion reveals the promoter (by definition, the promoter is the RNA polymerase
holoenzyme binding site1).
1Promoters can also be defined genetically in terms of mutations (nucleotide changes) in this region that block gene expression because they inactivate the promoter.
RNA polymerase binding typically protects a nucleotide sequence spanning the region from -40 to +20, where the +1 position is defined as the transcription start site: that base in DNA that specifies the first base in the RNA transcript. The next base, +2, specifies the second base in the transcript. Bases in the 5' or “minus” direction from the transcript start site are numbered -1, -2, and so on. (Note that there is no zero.) Nucleotides in the “minus” direction are said to lie upstream of the transcription start site, whereas nucleotides in the 3' or “plus” direction are downstream of the transcription start site. The transcript start site on the template strand is almost always a pyrimidine, so almost all transcripts begin with a purine.
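Because the numbering skips zero, the length of a region that spans the start site is easy to miscount. The following sketch illustrates the convention and the purine rule described above; all function names are hypothetical.

```python
def position_label(offset_from_start):
    """Convert a 0-based offset from the start-site base to promoter numbering.

    Offset 0 is +1, offset 1 is +2, offset -1 is -1 (there is no position 0).
    """
    return offset_from_start + 1 if offset_from_start >= 0 else offset_from_start


def span_length(first, last):
    """Number of bases from position `first` to `last`, inclusive."""
    if first == 0 or last == 0:
        raise ValueError("there is no position 0 in promoter numbering")
    if first > last:
        raise ValueError("first must not exceed last")
    length = last - first + 1
    if first < 0 < last:
        length -= 1  # the span crosses the start site, so skip the missing zero
    return length


# Watson-Crick pairing of a template DNA base with the RNA base it specifies.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcript_first_base(template_base):
    """RNA base specified by the template base at position +1."""
    return TEMPLATE_TO_RNA[template_base]


print(span_length(-40, 20))        # bases protected by RNA polymerase
print(transcript_first_base("T"))  # a template pyrimidine gives a purine
```

With this convention the protected region from -40 to +20 covers 60 bases, not 61.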
PROPERTIES OF PROKARYOTIC
PROMOTERS. Prokaryotic promoters vary in size from 20 to 200
bp, but typically consist of a 40-bp region located on the 5'-side of the transcription
start site. Within the promoter are two consensus sequence
elements.
(A consensus sequence can be defined as the bases that appear with highest
frequency at each position when a series of sequences believed to have common
function are compared.) These two elements are the Pribnow box2
near -10, whose consensus sequence is the hexameric TATAAT, and a sequence
in the -35 region containing the hexameric consensus TTGACA (Figure 31.3). The
Pribnow box and the -35 region are separated by about 17 bp of nonconserved
sequence. RNA polymerase holoenzyme uses its σ subunit to bind to these
sequences, and the more closely the -35 region sequence corresponds to its consensus
sequence, the greater is the efficiency of transcription of the gene. The highly
expressed rrn genes in E. coli which encode ribosomal RNA (rRNA) have
a third sequence element in their promoters, the upstream element (UP
element), located about 20 bp immediately upstream of the -35 region. (Transcription
from the rrn genes accounts for more than 60% of total RNA synthesis in rapidly
growing E. coli cells.) Whereas the σ subunit recognizes the -10
and -35 elements, the C-terminal domains (CTD) of the α subunits of RNA polymerase
recognize and bind the UP element.
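The idea that promoter strength tracks similarity to the consensus can be illustrated by counting matches to the two hexamers. This is a deliberately simple sketch: it assumes a fixed 17-bp spacer, scores each position equally, and uses made-up test sequences; real promoter analysis weights positions by their observed base frequencies.

```python
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"   # Pribnow box


def count_matches(site, consensus):
    """Number of positions at which `site` agrees with `consensus`."""
    return sum(a == b for a, b in zip(site, consensus))


def score_promoter(nontemplate, start, spacer=17):
    """Score a candidate promoter on the nontemplate strand.

    `start` is the 0-based index of the first base of the candidate -35
    hexamer; the -10 hexamer is taken `spacer` bases after the -35 hexamer.
    Returns (matches to -35 consensus, matches to -10 consensus).
    """
    minus_35 = nontemplate[start:start + 6]
    ten_start = start + 6 + spacer
    minus_10 = nontemplate[ten_start:ten_start + 6]
    if len(minus_35) < 6 or len(minus_10) < 6:
        raise ValueError("sequence too short for both hexamers")
    return (count_matches(minus_35, MINUS_35_CONSENSUS),
            count_matches(minus_10, MINUS_10_CONSENSUS))


# Hypothetical sequences, for illustration only.
SPACER = "ACGT" * 4 + "A"                       # 17 nonconserved bases
perfect = "GG" + "TTGACA" + SPACER + "TATAAT" + "GCGCGCA"
weaker = "GG" + "TTGATA" + SPACER + "TATGAT" + "GCGCGCA"

print(score_promoter(perfect, 2))
print(score_promoter(weaker, 2))
```

A sequence matching both hexamers at all six positions scores (6, 6); each substitution away from the consensus lowers the corresponding count.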
Figure 31.3 · The nucleotide sequences of representative E. coli promoters. (In accordance with convention, these sequences are those of the nontemplate strand where RNA polymerase binds.) Consensus sequences for the -35 region, the Pribnow box, and the initiation site are shown at the bottom. The numbers represent the percent occurrence of the indicated base. (Note: the -35 region is only roughly 35 nucleotides from the transcription start site; the Pribnow box [the -10 region] likewise is located at approximately position -10.) In this figure, sequences are aligned relative to the Pribnow box.
In order for transcription
to begin, the DNA duplex must be “opened” so that RNA polymerase has access
to single-stranded template. The efficiency of initiation is inversely proportional
to the melting temperature, Tm, in the Pribnow box, suggesting that
the A:T-rich nature of this region is aptly suited for facile “melting” of the
DNA duplex and creation of the open promoter complex (Figure 31.2).
Negative supercoiling facilitates transcription initiation by favoring DNA unwinding.
The RNA polymerase
σ subunit is directly involved in melting the dsDNA.
Interaction of the σ subunit with the nontemplate
strand maintains the open complex formed between RNA polymerase and promoter
DNA, with the σ subunit acting as a sequence-specific
single-stranded DNA-binding protein. Such association of the σ
subunit with the nontemplate strand stabilizes the open promoter complex and
leaves the bases along the template strand available to the catalytic site of
the RNA polymerase.
Initiation of Polymerization
RNA polymerase has two
binding sites for NTPs—the initiation site and the elongation site. The initiation
site binds the purine nucleotides ATP and GTP preferentially; most RNAs
begin with a purine at the 5'-end. The first nucleotide binds at the initiation
site, H-bonding with the +1 base exposed within the open promoter complex (Figure
31.2, Step 4). The second incoming nucleotide binds at the elongation site,
H-bonding with the +2 base. The ribonucleotides are then united when the 3'-O
of the first nucleotide makes a nucleophilic attack on the α-phosphorus
atom of the second nucleotide. A phosphoester bond is formed, and PPi
is eliminated. Note that the 5'-end of the transcript starts out with a triphosphate
attached to it. Movement of RNA polymerase along the template strand (translocation)
to the next base prepares the RNA polymerase to add the next nucleotide (Figure
31.2, Step 5). Once an oligonucleotide 6 to 10 residues long has been formed,
the σ subunit dissociates from RNA polymerase, signaling the completion
of initiation (Figure 31.2, Step 6). The core RNA polymerase goes on to synthesize
the remainder of the mRNA. As the core RNA polymerase progresses, advancing
the 3'-end of the RNA chain, the DNA duplex is unwound just ahead of it. About
12 base pairs of the growing RNA remain base-paired to the DNA template at any
time, with the RNA strand becoming displaced as the DNA duplex rewinds behind
the advancing RNA polymerase.
2Named for David Pribnow, who, along with David Hogness, first recognized the importance
of this sequence element in transcription.
Figure 31.4 · The structures of rifamycin B and rifampicin, specific inhibitors of prokaryotic RNA polymerases. Because these compounds do not inhibit eukaryotic RNA polymerases, they have proven useful in the treatment of tuberculosis and infections caused by Gram-positive bacteria.
Rifamycin B and its analog, rifampicin, are inhibitors of initiation. Despite their structural similarity (Figure 31.4), they act in different ways. Rifamycin binds to the β subunit of RNA polymerase and blocks binding of incoming NTP at the initiation site. Rifampicin allows the first phosphodiester bond to be formed, but it prevents the translocation of RNA polymerase along the DNA template. However, once the second phosphodiester bond is formed, creating an RNA trinucleotide, rifampicin is without effect.
Chain Elongation
Figure 31.5 · Cordycepin is the name given to 3'-deoxyadenosine.
Elongation of the
RNA transcript is catalyzed by the core polymerase, because once a short oligonucleotide
chain has been synthesized, the σ subunit dissociates. Cordycepin (Figure
31.5) is an inhibitor of chain elongation in prokaryotes. This nucleoside can
be phosphorylated in vivo to give 3'-deoxyadenosine 5'-triphosphate, which can
bind to the core polymerase and add to the growing RNA. However, because cordycepin
lacks a 3'-OH, it aborts further elongation. The accuracy of transcription is
such that, about once every 10⁴ nucleotides, an error is made and
the wrong base is inserted. Because many transcripts are made per gene, this
error rate is acceptable. Also, the nature of the genetic code is such that
errors are often innocuous (Chapter
32).
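The consequence of an error frequency of one in 10⁴ can be made concrete for a transcript of a given length. The sketch below assumes that errors occur independently at each position, which is a simplification, and uses a 1000-nucleotide transcript purely as an illustrative length.

```python
ERROR_RATE = 1e-4   # one misincorporation per 10^4 nucleotides, from the text


def expected_errors(length):
    """Average number of wrong bases in a transcript of `length` nucleotides."""
    return length * ERROR_RATE


def fraction_error_free(length):
    """Fraction of transcripts with no errors, assuming independent errors."""
    return (1.0 - ERROR_RATE) ** length


length = 1000  # hypothetical transcript length
print(f"expected errors: {expected_errors(length):.2f}")
print(f"error-free fraction: {fraction_error_free(length):.3f}")
```

Under these assumptions roughly nine out of ten transcripts of this length carry no error at all, which is consistent with the statement that the error rate is acceptable when many transcripts are made per gene.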
Figure 31.6 · Supercoiling versus transcription. (a) If the RNA polymerase followed the
template strand around the axis of the DNA duplex, no supercoiling of the DNA
would occur, but the RNA chain would be wrapped around the double helix once
every 10 bp. This possibility seems unlikely because it would be difficult to
disentangle the transcript from the DNA duplex. (b) Alternatively, topoisomerases
could remove the supercoils. A topoisomerase capable of relaxing positive supercoils
situated ahead of the advancing transcription bubble would “relax” the DNA.
A second topoisomerase behind the bubble would remove the negative supercoils.
(Adapted from Futcher, B., 1988. Supercoiling and transcription, or vice
versa? Trends in Genetics 4:271-272)
Chain elongation does not proceed at a constant rate, but varies between 20 and 50 nucleotides per second. The RNA polymerase slows down and even pauses in G:C-rich regions due to the greater difficulty in unwinding G:C base pairs. As the RNA polymerase moves along the template, the DNA double helix is unwound ahead of it and recloses after the polymerase has passed by. Only a short stretch of RNA:DNA hybrid duplex exists at any time. Two possibilities can be envisioned for the course of the new RNA chain. In one, the RNA chain is wrapped around the DNA as the RNA polymerase follows the template strand around the axis of the DNA duplex, but this possibility seems unlikely due to its potential for tangling the nucleic acid strands (Figure 31.6a). The more likely possibility involves supercoiling of the DNA, so that positive supercoils are created ahead of the transcription bubble and negative supercoils are created behind it (Figure 31.6b). To prevent torsional stress from inhibiting transcription, topoisomerases act to remove these supercoils from the DNA segment undergoing transcription (Figure 31.6b).
Chain Termination
Two types of transcription termination mechanisms operate in bacteria: one that is dependent on a specific protein termination factor called ρ (pronounced “rho”) and another that is not dependent on this protein. In the latter, termination of transcription is determined by specific sequences in the DNA called termination sites. These sites are not characterized by a unique base where transcription halts. Instead, these sites consist of three structural features whose base-pairing possibilities lead to termination:
Figure 31.7 · The termination site for the
E. coli trp operon (the trp operon encodes the enzymes
of tryptophan biosynthesis). The inverted repeats give rise to a stem-loop or
“hairpin” structure ending in a series of U residues.
1. Inverted repeats, which are typically G:C-rich, so a stable stem-loop structure can form in the transcript via intrachain hydrogen bonding (Figure 31.7).
2. A nonrepeating segment that punctuates the inverted repeats.
3. A run of 6 to 8 As in the DNA template, coding for Us in the transcript.
Figure 31.8 · The ρ factor mechanism of transcription termination. ρ factor
(a) attaches to a recognition site on mRNA and (b) moves along it behind
RNA polymerase. (c) When RNA polymerase pauses at the termination site, ρ
factor unwinds the DNA:RNA hybrid in the transcription bubble, (d) releasing
the nascent mRNA.
Termination then occurs
as follows: A G:C-rich, stem-loop structure, or “hairpin,” forms in the transcript.
The hairpin apparently causes the RNA polymerase to pause, whereupon the A:U
base pairs between the transcript and the DNA template strand are displaced
through formation of somewhat more stable A:T base pairs between the template
and nontemplate strands of the DNA. The result is spontaneous dissociation of
the nascent transcript from DNA.
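The three structural features of a factor-independent termination site can be turned into a simple search of a transcript sequence: an inverted repeat that can pair into a stem, a short nonrepeating loop between the two halves, and a run of U residues immediately afterward. The sketch below is a toy illustration only. The stem length, loop range, and example sequence are assumptions; it requires perfect pairing and does not test for G:C richness or estimate hairpin stability.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that would base-pair with `rna` in an antiparallel stem."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_terminator(rna, stem=6, loop_range=(3, 8), min_u=6):
    """Return the index of the first stem base of a hairpin followed by a
    run of at least `min_u` U residues, or None if no such site is found."""
    shortest = 2 * stem + loop_range[0] + min_u
    for i in range(len(rna) - shortest + 1):
        left = rna[i:i + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                    # start of the right half
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_u]  # bases just after the stem
            if right == reverse_complement(left) and tail == "U" * min_u:
                return i
    return None


# Hypothetical transcript: stem GCCGCC / GGCGGC, loop GAAA, then seven Us.
example = "AAC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUU" + "GA"
print(find_terminator(example))
```

For the example sequence the search reports the hairpin beginning at index 3; a sequence with no inverted repeat followed by a U run returns None.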
The alternative mechanism of termination, factor-dependent termination,
is less common and mechanistically more complex. ρ factor is an
ATP-dependent helicase (hexamer of 50-kD subunits) that catalyzes the unwinding
of RNA:DNA hybrid duplexes (or RNA:RNA duplexes). The ρ
factor recognizes and binds to C-rich regions in the RNA transcript. These regions
must lack secondary structure and be unoccupied by translating ribosomes for
ρ factor to bind. Once bound, ρ
factor advances in the 5' → 3' direction until
it reaches the transcription bubble (Figure 31.8). There it catalyzes the unwinding
of the transcript and template, releasing the nascent RNA chain. It is likely
that the RNA polymerase stalls in a G:C-rich termination region, allowing ρ
factor to overtake it.
31.2 · Transcription in Eukaryotes
Eukaryotic cells have
three classes of RNA polymerase, each of which synthesizes a different class
of RNA. All three enzymes are found in the nucleus. RNA polymerase I
is localized to the nucleolus and transcribes the major ribosomal RNA genes.
RNA polymerase II transcribes protein-encoding genes, and thus it is
responsible for the synthesis of mRNA. RNA polymerase III transcribes
tRNA genes, the ribosomal RNA genes encoding 5S rRNA, and a variety of other
small RNAs, including several involved in mRNA processing and protein transport.
All three
RNA polymerase types are large, complex multimeric proteins (500 to 700 kD),
consisting of 10 or more types of subunits. Although the three differ in overall
subunit composition, they have several smaller subunits in common. Further,
all possess two large subunits (each 140 kD or greater) having sequence similarity
to the large β and β'
subunits of E. coli RNA polymerase, indicating that the fundamental catalytic
site of RNA polymerase is conserved among its various forms.
Figure 31.9 · The structure of α-amanitin,
one of a series of toxic compounds known as amatoxins that are found in the
mushroom Amanita phalloides.
In addition to
their different functions, the three classes of RNA polymerase can be distinguished
by their sensitivity to α-amanitin
(Figure 31.9), a bicyclic octapeptide produced by the poisonous mushroom Amanita
phalloides (the “death cap” mushroom). α-Amanitin
blocks RNA chain elongation. Although RNA polymerase I is resistant to this
compound, RNA polymerase II is very sensitive and RNA pol III is less sensitive.
The existence
of three classes of RNA polymerases acting on three distinct sets of genes implies
that at least three categories of promoters exist to maintain this specificity.
All three polymerases interact with their promoters via so-called transcription
factors, DNA-binding proteins that recognize and accurately initiate transcription
at specific promoter sequences. For RNA polymerase I, its templates are the
rRNA genes. Ribosomal RNA genes are present in multiple copies. Optimal expression
of these genes requires the first 150 nucleotides in the immediate 5'-upstream
region, but the precise locations and sequences of the promoter(s) are not known
with certainty.
RNA polymerase
III interacts with transcription factors TFIIIA, TFIIIB, and TFIIIC.
Interestingly, TFIIIA and/or TFIIIC bind to specific recognition sequences that
in some instances are located within the coding regions of the genes,
not in the 5'-untranscribed region upstream from the transcription start site.
TFIIIB associates with TFIIIA or TFIIIC already bound to the DNA and in turn
facilitates the association of RNA pol III to establish an initiation complex.
As the enzyme responsible
for the regulated synthesis of mRNA, RNA polymerase II has aroused greater interest
than RNA pol I and pol III. RNA pol II must be capable of transcribing a great
diversity of genes, yet it must carry out its function at any moment only on
those genes whose products are appropriate to the needs of the cell in its ever-changing
metabolism and growth. The RNA pol II from yeast (Saccharomyces cerevisiae)
has been extensively characterized. Yeast is viewed by molecular biologists
as an excellent eukaryotic prototype. The yeast RNA pol II consists of 10 different
polypeptides, designated RPB1 through RPB10, ranging in size from 220 to 10
kD (Table 31.1).3 RPB1 and RPB2
functions are homologous to those of the prokaryotic RNA polymerase β and β'
subunits: RPB1 has a DNA-binding site, RPB2 binds nucleotide substrates, and
both contribute to the catalytic site. RPB3 is the functional homolog of the
prokaryotic α; there are two RPB3 subunits per enzyme and RPB3 is essential
for assembly of the polymerase. RPB4 resembles the σ subunit in amino acid
sequence. RPB 3, 4, and 7 are unique to RNA pol II, whereas RPB 5, 6, 8, and
10 are common to all three eukaryotic RNA polymerases. RPB 4 and 7 readily dissociate
from RNA pol II.
The RPB1
subunit has an unusual structural feature not found in prokaryotes: Its C-terminal
domain (CTD) contains 27 repeats of the amino acid sequence PTSPSYS. (The
analogous subunit in RNA pol II enzymes of other eukaryotes has this heptapeptide
tandemly repeated as many as 52 times.) Note that the side chains of 5 of the
7 residues in this repeat have -OH groups, endowing the CTD with considerable
hydrophilicity and multiple sites for phosphorylation. This domain may
project more than 50 nm from the globular enzyme. The CTD is essential to RNA
pol II function. Only RNA pol II whose CTD is not phosphorylated can initiate
transcription. However, transcription elongation proceeds only after protein
phosphorylation within the CTD, suggesting that phosphorylation triggers the
conversion of an initiation complex into an elongation complex. Following termination
of transcription, a phosphatase recycles RNA pol II to its unphosphorylated
form.
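The count of hydroxyl-bearing residues in the heptapeptide repeat, and the number of potential phosphorylation sites this implies for the whole CTD, can be tallied directly. This is a minimal sketch using the repeat sequence and repeat numbers given above; treating every Ser, Thr, and Tyr as a potential site is a simplification, since not all of them need be phosphorylated in the cell.

```python
HEPTAD = "PTSPSYS"                 # CTD repeat as written in the text
HYDROXYL_RESIDUES = set("STY")     # Ser, Thr, and Tyr side chains carry -OH


def hydroxyl_count(peptide):
    """Number of residues in `peptide` whose side chain has an -OH group."""
    return sum(residue in HYDROXYL_RESIDUES for residue in peptide)


per_repeat = hydroxyl_count(HEPTAD)
yeast_sites = per_repeat * 27      # 27 repeats in yeast RPB1
max_sites = per_repeat * 52        # up to 52 repeats in other eukaryotes

print(per_repeat, yeast_sites, max_sites)
```

Five hydroxyl groups per repeat gives 135 candidate sites in the yeast CTD and as many as 260 in enzymes with 52 repeats.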
3RPB stands for RNA polymerase B; RNA pol I, II, and III are sometimes called RNA pol A, B, and C.
Transcription Initiation by RNA Polymerase II
Figure 31.10 · The TATA box in selected eukaryotic genes. The consensus sequence of a number of such promoters is presented in the lower part of the figure, the numbers giving the percent occurrence of various bases at the positions indicated.
Promoters
RNA polymerase II promoters commonly consist of two separate sequence features, the core element, near the transcription start site, where general transcription factors bind, and more distantly located regulatory elements, known variously as enhancers or silencers. These latter elements are recognized by specific DNA-binding proteins that activate transcription above basal levels (enhancers) or repress transcription (silencers). The core region often consists of a TATA box (a TATAAA consensus element) and the transcription start site; the TATA motif is usually located at position -25 (Figure 31.10). An important role of the TATA box is to indicate the site of the initiator element, or Inr, where transcription is initiated. The initiator element Inr encompasses the transcription start site. The sequence of Inr is not highly conserved between genes; a consensus Inr for one gene family is -3YYCAYYYYY+6 (where Y represents any pyrimidine). Regulatory elements occurring near the core promoter (within 50 to 200 bp), the so-called promoter proximal elements, possess one or more binding sites for interaction with DNA-binding regulatory proteins and show great variation in sequence. Other regulatory elements, so-called distal enhancer (or silencer) elements, where another group of DNA-binding regulatory proteins bind, can be located far from the core promoter, either upstream or, rarely, downstream.
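A consensus such as YYCAYYYYY translates naturally into a pattern search. The sketch below is illustrative: it assumes the sequence supplied is the nontemplate strand written 5' to 3', uses only the consensus quoted above for one gene family, and scans a made-up example sequence.

```python
import re

# Y is any pyrimidine (C or T). The consensus runs from -3 to +6, so the
# A in the fourth position of the pattern is the +1 base.
# A lookahead is used so that overlapping matches are all reported.
INR_PATTERN = re.compile(r"(?=([CT]{2}CA[CT]{5}))")


def find_inr_start_sites(nontemplate):
    """Return 0-based indices of the +1 base (the A) for each Inr match."""
    return [m.start() + 3 for m in INR_PATTERN.finditer(nontemplate.upper())]


sequence = "GGAGCTCACTTCCGA"       # hypothetical example
print(find_inr_start_sites(sequence))
```

Because the Inr sequence is not highly conserved between genes, an exact-match search like this will miss many genuine initiators; it is meant only to show how the position numbering maps onto the pattern.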
Initiation of Transcription in Eukaryotes
Figure 31.11 · Transcription initiation. (a)
Model of the yeast TATA-binding protein (TBP) in complex with a yeast DNA TATA
sequence. The sugar-phosphate backbone of the TATA box is shown in yellow; the
TATA base pairs are in red; adjacent DNA segments are in blue. The saddle-shaped
TBP (green) is unusual in that it binds in the minor groove of DNA, sitting
on the DNA like a saddle on a horse. TBP-binding pries open the minor groove,
creating a 100° bend in the DNA axis and unwinding the DNA within the TATA sequence.
The other components of the TFIID heteromer (Table 31.2) sit on TBP, like a
“cowboy on a saddle.” All known eukaryotic genes (those lacking a TATA box as
well as those transcribed by RNA polymerase I or III) rely on TBP. (Photo
courtesy of Paul B. Sigler of Yale University.) (b) Formation of a preinitiation
complex at a TATA-containing promoter. Binding of TFIID, the multisubunit protein
(>100 kD) consisting of the TATA-binding protein (TBP) and other polypeptides,
is stimulated by TFIIA. TFIID bound to the TATA motif recruits TFIIB, forming
a DB complex. In association with TFIIF, RNA pol IIA (the nonphosphorylated
form of RNA pol II) joins the DB complex to give the DBpol F complex. TFIIE
and TFIIH then associate to yield the preinitiation complex. Melting of the
DNA duplex around Inr generates the open complex and transcription
ensues. (Adapted from Weiss, L., and Reinberg, D., 1992. FASEB Journal
6:3300, Figure 1)
A universal set of proteins,
called the basal apparatus, binds the core promoter and initiates transcription.
The basal apparatus consists of RNA polymerase II and the general transcription
factors (GTFs). There are six GTFs (Table 31.2), five of which are
required for transcription: TFIIB, TFIID, TFIIE, TFIIF, and TFIIH.
One, TFIIA, stimulates transcription by stabilizing the interaction of
TFIID with the TATA box. TFIID consists of TBP (TATA-binding
protein), which directly recognizes the TATA box, and a set of TBP-associated
factors (TAFs or TAFIIs), which have positive
or negative effects on transcription; some are capable of recognizing core promoters
lacking a TATA box. TBP binds to the core promoter through contacts made with
the minor groove of the DNA, distorting and bending the DNA so that DNA sequences
upstream and downstream of the TATA box come into closer proximity (Figure 31.11a).
In one model of transcription initiation, once TBP binds to core promoter, TFIIB
joins it, followed by RNA polymerase IIA in association with TFIIF. Then other
factors join (Figure 31.11b), establishing a competent transcription preinitiation
complex. Another model for transcription initiation suggests that RNA polymerase
II holoenzyme (RNA polymerase IIA in association with
various general transcription factors other than TBP or TFIID) assembles in
the absence of any interaction with DNA and then binds to TBP/TFII. In either
case, once RNA polymerase IIA and the GTFs have assembled into a preinitiation
complex on DNA, an open complex then forms and transcription begins.
Figure 31.12a is an illustration of the preinitiation complex with the various
components drawn to scale; Figure 31.12b is a computer-modeled representation
of the TFIIA-TBP-TFIIB-promoter complex.
Figure 31.12 · (a) Structure of the preinitiation complex, showing the distortion of the DNA and the relative positions of RNA polymerase IIA (pol II), the TATA box, TBP, TFIIB (B), and the TFIIE dimer (E). Transcription initiation occurs at a site on DNA within the region encircled by RNA polymerase IIA next to TFIIE. (Adapted from Kornberg, R. D., 1996. RNA polymerase II transcription control. Trends in Biochemical Sciences 21:325-326.) (b) Computer-generated model of the TFIIA-TBP-TFIIB-promoter complex. Note the strong lateral displacement of upstream and downstream DNA segments induced by the proteins. ([a] From Figure 2 of Roeder, R. G., 1996. The role of general initiation factors in transcription by RNA polymerase II. Trends in Biochemical Sciences 21:327-335 and [b] from Figure 3 in Patikoglou, G., and Burley, S. K., 1997. Eukaryotic transcription factor-DNA complexes. Annual Review of Biophysics and Biomolecular Structure 26:289-325. Figure [b] courtesy of Stephen K. Burley of Rockefeller University.)
4A polycistronic mRNA is a single RNA transcript that encodes more than one polypeptide. “Cistron” is a genetic term for a DNA region representing a protein: “cistron” and “gene” are essentially equivalent terms.
Transcriptional Activation in Eukaryotes
The regulatory elements or enhancer components of eukaryotic promoters stimulate transcription above basal levels. DNA-binding regulatory proteins that bind to these elements influence transcription from the core promoter through interactions with RNA polymerase that are conveyed through the TAFs or through a set of proteins known as the mediator complex, which is intimately associated with the CTD of RNA polymerase II. Association of the mediator complex with RNA polymerase II is essential to formation of the RNA polymerase II holoenzyme that carries out transcription.
31.3 · Transcription Regulation in Prokaryotes
Figure 31.13 · The general organization of
operons. Operons consist of transcriptional control regions and a set of related
structural genes, all organized in a contiguous linear array along the chromosome.
The transcriptional control regions are the promoter and the operator,
which lie next to, or overlap, each other, upstream from the structural genes
they control. Operators may lie at various positions relative to the promoter,
either upstream or downstream. Expression of the operon is determined by access
of RNA polymerase to the promoter, and occupancy of the operator by regulatory
proteins influences this access. Induction activates transcription from the
promoter; repression prevents it.
In bacteria, genes encoding the enzymes of a particular metabolic pathway are often grouped adjacent to one another in a cluster on the chromosome. Such clusters, together with the regulatory sequences that control their transcription, are called operons. This pattern of organization allows all of the genes in the group to be expressed in a coordinated fashion through transcription into a single polycistronic mRNA encoding all the enzymes of the metabolic pathway.4 A regulatory sequence lying adjacent to this unit of transcription determines whether it is transcribed. This sequence is termed the operator (Figure 31.13). The operator is located next to the promoter. Interaction of a regulatory protein with the operator controls transcription of the operon by governing the accessibility of RNA polymerase to the promoter. Although this is the paradigm for prokaryotic gene regulation, it must be emphasized that many regulated prokaryotic genes do not contain operators and are regulated in ways that do not involve protein:operator interactions.
Transcription of Operons Is Controlled by Induction and Repression
Figure 31.14 · The structure of lactose, a β-galactoside.
In prokaryotes, regulation
is ultimately responsive to small molecules serving as signals of the nutritional
or environmental conditions confronting the cell. Increased synthesis of enzymes
in response to the presence of a particular substrate is termed induction.
For example, lactose (Figure 31.14) can serve as both carbon and energy source
for E. coli. Metabolism of this substrate depends on hydrolysis into
its component sugars, glucose and galactose, by the enzyme β-galactosidase.
In the absence of lactose, E. coli cells contain very little β-galactosidase
(fewer than 5 molecules per cell). However, lactose availability induces the
synthesis of β-galactosidase by activating transcription
of the lac operon. One of the genes in the lac operon, lacZ,
is the structural gene for β-galactosidase. When
its synthesis is fully induced, β-galactosidase
can amount to almost 10% of the total soluble protein in E. coli. When
lactose is removed from the culture, synthesis of β-galactosidase
halts.
The alternative
to induction, namely decreased synthesis of enzymes in response to a specific
metabolite, is termed repression. For example, the enzymes of tryptophan
biosynthesis in E. coli are encoded in the trp operon. If sufficient
Trp is available to the growing bacterial culture, the trp operon is not transcribed,
so the Trp biosynthetic enzymes are not made; that is, their synthesis is repressed.
Repression of the trp operon in the presence of Trp is an eminently logical
control mechanism: If the end product of the pathway is present, why waste cellular
resources making unneeded enzymes?
Figure 31.15 · The structure of IPTG (isopropyl β-thiogalactoside).
Induction and repression are two faces of the same phenomenon. In induction, a substrate activates enzyme synthesis. Substrates capable of activating synthesis of the enzymes that metabolize them are called co-inducers, or often simply inducers. Some substrate analogs can induce enzyme synthesis even though the enzymes are incapable of metabolizing them. These analogs are called gratuitous inducers. A number of thiogalactosides, such as IPTG (isopropylthiogalactoside, Figure 31.15), are excellent gratuitous inducers of β-galactosidase activity in E. coli. In repression, a metabolite, typically an end product, depresses synthesis of its own biosynthetic enzymes. Such metabolites are called co-repressors.
lac: The Paradigm of Operons
Figure 31.16 · The lac operon. The operon
consists of two transcription units. In one unit, there are three structural
genes, lacZ, lacY, and lacA, under control of the promoter,
plac, and the operator O. In the other unit, there is a regulator
gene, lacI, with its own promoter, placI. lacI
encodes a 360-residue, 38.6-kD polypeptide that forms a tetrameric lac
repressor protein. lacZ encodes β-galactosidase,
a tetrameric enzyme of 116-kD subunits. lacY is the β-galactoside
permease structural gene, a 46.5-kD integral membrane protein active in β-galactoside
transport into the cell. The remaining structural gene encodes a 22.7-kD polypeptide
that forms a dimer displaying thiogalactoside transacetylase activity in vitro,
transferring an acetyl group from acetyl-CoA to the C-6 OH of thiogalactosides,
but the metabolic role of this protein in vivo remains uncertain. lacA
mutants show no identifiable metabolic deficiency. Perhaps the lacA protein
acts to detoxify toxic analogs of lactose through acetylation.
In 1961, François Jacob
and Jacques Monod proposed the operon hypothesis to account for the coordinate
regulation of related metabolic enzymes. The operon was considered to be the
unit of gene expression, consisting of two classes of genes: the structural
genes for the enzymes, and regulatory elements or genes that controlled expression
of the structural genes. The two kinds of genes could be distinguished by mutation.
Mutations in a structural gene would abolish one particular enzymatic activity,
but mutations in a regulatory gene would affect all of the different enzymes
under its control. Mutations of both kinds were known in E. coli for
lactose metabolism. Bacteria with mutations in either the lacZ
gene or the lacY gene (Figure 31.16) could no longer metabolize
lactose—the lacZ mutants (lacZ- strains)
because β-galactosidase activity was absent, the
lacY mutants because lactose was no longer transported into the
cell. Lactose transport could still be induced in lacZ mutants,
and lacY mutants displayed lactose-inducible β-galactosidase
activity. Other mutations defined another gene, the lacI gene. lacI
mutants were different because they both expressed β-galactosidase
activity and immediately transported lactose, without prior exposure to an inducer.
That is, a single mutation led to the expression of lactose metabolic
functions independently of inducer. Expression of genes independently of regulation
is termed constitutive expression. Thus, lacI had the properties
of a regulatory gene. The lac operon includes the regulatory gene lacI,
its promoter placI, and three structural genes, lacZ, lacY, and lacA,
with their own promoter plac and operator O (Figure 31.16).
Figure 31.17 · The mode of action of lac repressor.
The structural
genes of the lac operon are controlled by negative regulation.
That is, they are transcribed to give an mRNA unless turned off by the lacI
gene product. This gene product is the lac repressor, a tetrameric protein
(Figure 31.17). The lac repressor has two kinds of binding sites—one
for inducer and another for DNA. In the absence of inducer, lac repressor
blocks lac gene expression. It accomplishes repression by binding to
the operator DNA site upstream from the lac structural genes. Despite
the presence of lac repressor, RNA polymerase can still initiate transcription
at the plac promoter, but lac repressor blocks elongation
of transcription, so initiation is aborted. In lacI mutants, the lac
repressor is absent or defective in binding to operator DNA, lac gene
transcription is not blocked, and the lac operon is constitutively expressed
in these mutants. Note that lacI is normally expressed constitutively
from its promoter, so that lac repressor protein is always available
to fill its regulatory role. About 10 molecules of lac repressor are
present in an E. coli cell.
Derepression
of the lac operon occurs when appropriate β-galactosides
occupy the inducer site on lac repressor, causing a conformational change
in the protein that lowers the repressor’s affinity for operator DNA. As a tetramer,
lac repressor has four inducer binding sites and its response to inducer
shows cooperative allosteric effects. Thus, as a consequence of the “inducer”-induced
conformational change, the inducer:lac repressor complex dissociates
from the DNA, and RNA polymerase transcribes the structural genes (Figure 31.17).
Induction reverses rapidly: lac mRNA has a half-life of only 3 minutes,
and once the inducer is used up through metabolism by the enzymes, free lac
repressor re-associates with the operator DNA, transcription of the operon is
halted, and the residual lac mRNA decays.
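The consequence of a 3-minute half-life can be made concrete with a short calculation. The sketch below assumes simple first-order decay with no new synthesis once the repressor has re-bound the operator; the function name and the sampled time points are illustrative choices, not values from the text.

```python
def fraction_remaining(minutes: float, half_life_min: float = 3.0) -> float:
    """Fraction of an mRNA pool left after `minutes` of first-order decay.

    Assumes no new transcripts are made (operon fully shut off), which is
    an idealization of the situation after repressor re-binds the operator.
    """
    return 0.5 ** (minutes / half_life_min)


if __name__ == "__main__":
    # Illustrative time points only.
    for t in (0, 3, 6, 12, 30):
        print(f"{t:>2} min after shutoff: {fraction_remaining(t):.4f} of lac mRNA remains")
```

Under these assumptions, each additional 3 minutes halves the residual message, so the pool is effectively gone within a few half-lives.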
The lac Operator
Figure 31.18 · The nucleotide sequence of the
lac operator. This sequence comprises 35 bp showing nearly palindromic
symmetry. The inverted repeats that constitute this approximate twofold symmetry
are shaded in rose. The bases are numbered relative to the +1 start site for
transcription. The G:C base pair at position +11 represents the axis of symmetry.
In vitro studies show that bound lac repressor protects a 26-bp
region from -5 to +21 against nuclease digestion. Oc mutants are
shown above the operator. Bases that are protected against methylation by dimethyl
sulfate or that undergo UV-induced cross-linking to bound lac repressor
are indicated below the operator. Note the symmetry of protection at +1 through
+4 TTAA to +18 through +21 AATT.
The lac operator is a palindromic DNA sequence (Figure 31.18). Palindromes, or “inverted repeats” (Chapter 12), provide a twofold, or dyad, symmetry, a structural feature common at sites in DNA where proteins specifically bind. Although the operator consists of 35 bp, 26 of which are protected from nuclease digestion when lac repressor is bound, a central core defined by 13 bp (from +5 to +17) is involved in specific contacts with lac repressor. Mutations at eight sites in this restricted region lead to constitutive expression of the lac operon because repressor can no longer bind (Figure 31.18). These mutants are so-called Oc, or operator-constitutive, mutants. Note that the distribution of Oc mutants is not symmetrical about the axis of symmetry. Further, certain Oc mutations, as in G:C→A:T changes at positions 7 or 9, actually render the palindrome more perfect. The distribution of Oc mutants indicates that repressor contacts with the left half of the palindrome may be more crucial than those with the right half. The operator and promoter (plac) sites overlap: lac repressor protects a region roughly covering nucleotides -5 to +21 from nuclease digestion, whereas RNA polymerase binding and nuclease protection define plac as falling within the -45 to +18 region.
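Dyad symmetry in a DNA sequence means that one strand reads the same as the reverse complement of itself. A minimal sketch of how such symmetry, and departures from it, could be checked is given below. The test sequences are toy examples (GAATTC is a perfect 6-bp palindrome), not the lac operator sequence, and the helper names are hypothetical. For odd-length sequences the central base pair lies on the symmetry axis, as the G:C pair at +11 does in the lac operator, so it is excluded from the comparison.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dyad_mismatches(seq: str) -> list[int]:
    """0-based positions where `seq` departs from perfect dyad symmetry.

    A base on the symmetry axis of an odd-length sequence can never equal
    its own complement, so that central position is skipped.
    """
    s = seq.upper()
    rc = reverse_complement(s)
    centre = len(s) // 2 if len(s) % 2 else None
    return [i for i in range(len(s)) if i != centre and s[i] != rc[i]]


if __name__ == "__main__":
    for toy in ("GAATTC", "GAATGC", "GAACTTC"):  # toy sequences, not lac DNA
        print(toy, "mismatch positions:", dyad_mismatches(toy))
```

A "nearly palindromic" site such as the lac operator would return a short list of mismatch positions rather than an empty one.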
Interactions of lac Repressor with DNA
Limited digestion of lac repressor with trypsin removes an N-terminal, 59-residue fragment from each subunit, leaving a “core” tetramer that is no longer capable of binding to operator DNA. IPTG binding by the “core” tetramer is unaffected. The N-terminal, 59-residue fragment retains DNA-binding ability. Thus, the protein is composed of an N-terminal, DNA-binding domain, with the rest of the protein functioning in inducer binding and tetramer formation. In the absence of inducer, intact lac repressor nonspecifically binds to duplex DNA with an association constant, KA, of 2 × 10⁶ M⁻¹ (Table 31.3), and to the lac operator DNA sequence with much higher affinity, KA = 2 × 10¹³ M⁻¹. Thus, lac repressor binds 10⁷ times better to lac operator DNA than to any random DNA sequence. IPTG binds to lac repressor with an association constant of about 10⁶ M⁻¹. The IPTG:lac repressor complex binds to operator DNA with an association constant, KA = 2 × 10¹⁰ M⁻¹. Although this affinity is high, it is 3 orders of magnitude less than the affinity of inducer-free repressor for lac operator. There is no difference in the affinity of free lac repressor and lac repressor with IPTG bound for nonoperator DNA. The lac repressor apparently acts by binding to DNA and sliding along it, testing sequences in a one-dimensional search until it finds the lac operator. The lac repressor then binds there with high affinity until inducer causes this affinity to drop by 3 orders of magnitude.
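The ratios quoted above follow directly from the association constants. The sketch below reproduces the arithmetic and also converts each KA to the corresponding dissociation constant (KD = 1/KA). The constant and function names are illustrative; only the three KA values come from the text.

```python
import math

# Association constants from the text (units: M^-1).
K_A_NONSPECIFIC = 2e6      # repressor with random duplex DNA
K_A_OPERATOR = 2e13        # inducer-free repressor with lac operator
K_A_OPERATOR_IPTG = 2e10   # IPTG:repressor complex with lac operator


def fold_difference(k_high: float, k_low: float) -> float:
    """Ratio of two association constants."""
    return k_high / k_low


def dissociation_constant(k_a: float) -> float:
    """K_D in M, the reciprocal of the association constant."""
    return 1.0 / k_a


specificity = fold_difference(K_A_OPERATOR, K_A_NONSPECIFIC)
induction_drop = fold_difference(K_A_OPERATOR, K_A_OPERATOR_IPTG)

if __name__ == "__main__":
    print(f"operator vs. random DNA: 10^{math.log10(specificity):.0f}-fold")
    print(f"loss of operator affinity on IPTG binding: 10^{math.log10(induction_drop):.0f}-fold")
    for label, k_a in (("operator", K_A_OPERATOR), ("operator + IPTG", K_A_OPERATOR_IPTG)):
        print(f"K_D ({label}): {dissociation_constant(k_a):.1e} M")
```

Expressed this way, induction does not abolish operator binding; it shifts KD upward a thousandfold, which is enough for the repressor to be competed away by the large excess of nonoperator DNA in the cell.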
Positive Control of the lac Operon by CAP
Transcription by RNA polymerase from some promoters proceeds with low efficiency unless assisted by an accessory protein that acts as a positive regulator. One such protein is CAP, or catabolite activator protein. Its name derives from the phenomenon of catabolite repression in E. coli. Catabolite repression is a global control that coordinates gene expression with the total physiological state of the cell: As long as glucose is available, E. coli catabolizes it in preference to any other energy source, such as lactose or galactose. Catabolite repression ensures that the operons necessary for metabolism of these alternative energy sources, that is, the lac and gal operons, remain repressed until the supply of glucose is exhausted. Catabolite repression overrides the influence of any inducers that might be present.
Figure 31.19 · The mechanism of catabolite
repression and CAP action. Glucose instigates catabolite repression by lowering
cAMP levels. cAMP is necessary for CAP binding near promoters of operons whose
gene products are involved in the metabolism of alternative energy sources such
as lactose, galactose, and arabinose. The binding sites for the CAP-(cAMP)2
complex are consensus DNA sequences containing the conserved pentamer TGTGA
and a less well conserved inverted repeat, TCANA (where N is any nucleotide).
Catabolite repression is mediated by cAMP levels, which in turn are regulated by glucose. Transport of glucose into the cell is accompanied by deactivation of E. coli adenylyl cyclase, leading to lower cAMP levels. The action of CAP as a positive regulator is cAMP-dependent. cAMP binding to CAP enhances its DNA-binding affinity. CAP, also referred to as CRP (for cAMP receptor protein), is a dimer of identical 210-residue (22.5-kD) polypeptides. The N-terminal domains bind cAMP; the C-terminal domains constitute the DNA-binding site. Two molecules of cAMP are bound per dimer. The CAP-(cAMP)2 complex binds to specific target sites near the promoters of operons (Figure 31.19). Its presence assists closed promoter complex formation by RNA polymerase holoenzyme. For example, CAP binding at the -72 to -52 region of lac DNA promotes formation of an RNA polymerase holoenzyme:plac DNA closed promoter complex. Analysis of the structure of the CAP:DNA complex reveals that the DNA is bent more than 90° about the center of dyad symmetry (Figure 31.20). This bend may be related to the ability of CAP to assist in transcription initiation.
Figure 31.20 · Binding of CAP-(cAMP)2 induces a severe bend in DNA about the center of dyad symmetry at the CAP-binding site. The CAP dimer with two molecules of cAMP bound interacts with 27 to 30 base pairs of duplex DNA. The cAMP-binding domain of CAP protein is shown in blue and the DNA-binding domain in purple. The two cAMP molecules bound by the CAP dimer are indicated in red. For DNA, the bases are shown in white and the sugar-phosphate backbone in yellow. DNA phosphates that interact with CAP are highlighted in red. Binding of CAP-(cAMP)2 to its specific DNA site involves H bonding and ionic interactions between protein functional groups and DNA phosphates, as well as H-bonding interactions in the DNA major groove between amino acid side chains of CAP and DNA base pairs. (Adapted from Schultz, S. C., Shields, G. C., and Steitz, T. A., 1991. Crystal structure of a CAP-DNA complex: The DNA is bent by 90°. Science 253:1001-1007. Photograph courtesy of Professor Thomas A. Steitz of Yale University)
Positive Versus Negative Control
Figure 31.21 · Control circuits governing the
expression of genes. These circuits can be either negative or positive, inducible
or repressible.
Negative- and positive-control
systems are fundamentally different. Genes under negative control are transcribed
unless they are turned off by the presence of a repressor protein. Often,
transcription activation is essentially anti-inhibition; that is, the
reversal of negative control. In contrast, genes under positive control are
expressed only if an active regulator protein is present. The lac
operon illustrates these differences. The action of lac repressor is
negative. It binds to operator DNA and blocks transcription; expression of the
operon only occurs when this negative control is lifted through release of the
repressor. In contrast, regulation of the lac operon by CAP is positive:
Transcription of the operon by RNA polymerase is stimulated by CAP’s action
as a positive regulator.
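The combined negative (lac repressor) and positive (CAP) control of the lac operon can be summarized as a small truth table. The sketch below is a deliberately coarse, all-or-none simplification: real expression levels are graded, and the labels "off", "low", and "high" are illustrative rather than quantitative. The function and parameter names are hypothetical.

```python
def lac_operon_state(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative transcription state of the lac operon.

    Negative control: without inducer, lac repressor occupies the operator.
    Positive control: glucose lowers cAMP, so CAP-(cAMP)2 cannot assist
    RNA polymerase at plac.
    """
    repressor_on_operator = not lactose_present
    cap_camp_bound = not glucose_present

    if repressor_on_operator:
        return "off"    # negative control not lifted
    if cap_camp_bound:
        return "high"   # repressor released and CAP assisting initiation
    return "low"        # repressor released, but catabolite repression applies


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            state = lac_operon_state(lactose, glucose)
            print(f"lactose={lactose!s:<5} glucose={glucose!s:<5} -> {state}")
```

Only one of the four combinations, lactose present and glucose absent, satisfies both controls and gives full expression.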
Operons
can also be classified as inducible or repressible, or both, depending
on how they respond to the small molecules that mediate their expression. Repressible
operons are expressed only in the absence of their co-repressors. Inducible
operons are transcribed only in the presence of small-molecule co-inducers (Figure
31.21).
Figure 31.22 · Regulation of the araBAD
operon by the combined action of CAP and AraC protein.
E. coli can use
the plant pentose L-arabinose as sole source of carbon and energy. Arabinose
is metabolized via conversion to D-xylulose-5-P (a pentose phosphate pathway
intermediate and transketolase substrate [Chapter
23]) by three enzymes encoded in the araBAD operon. Transcription
of this operon is regulated by both catabolite repression and arabinose-mediated
induction. CAP functions in catabolite repression; arabinose induction is achieved
via the product of the araC gene, which lies next to the araBAD operon
on the E. coli chromosome. The araC gene product, the protein
AraC,5 is a 292-residue protein consisting of an N-terminal
domain (residues 1-170) that binds arabinose and acts as a dimerization motif
and a C-terminal (residues 178-292) DNA-binding domain. Regulation of araBAD
by AraC is novel in that it acts both negatively and positively. The
ara operon has three binding sites for AraC: araO1,
located at nucleotides -106 to -144 relative to the araBAD transcription
start site; araO2 (spanning positions -265 to -294); and araI,
the araBAD promoter. The araI site consists of two “half-sites”;
araI1 (nucleotides -56 to -78) and araI2
(-35 to -51). (The araO1 site contributes minimally to ara
operon regulation.)
The details
of araBAD regulation are as follows: When AraC protein levels
are low, the araC gene is transcribed from its promoter pc
(adjacent to araO1) by RNA polymerase (Figure 31.22). araC
is transcribed in the direction away from araBAD. When cAMP levels are
low and arabinose is absent, an AraC protein dimer binds to two sites,
araO2 and the araI1 half-site, forming a DNA loop between
them and restricting transcription of araBAD (Figure 31.22). In the presence
of L-arabinose, the monomer of AraC bound to the araO2
site is released from that site; it then associates with the unoccupied araI
half-site, araI2. L-Arabinose thus behaves as an allosteric
effector that alters the conformation of AraC. In the arabinose-liganded
conformation, the AraC dimer interacts with CAP-(cAMP)2 to
activate transcription by RNA polymerase. Thus, AraC protein is both
a repressor and an activator.
Figure 31.23 · Introduction or removal of half
a helical turn in the DNA between araO2 and araI prevents AraC
protein from interacting with both sites and achieving araBAD repression.
(Adapted from Schleif, R., 1987. The L-arabinose operon. In Escherichia
coli and Salmonella typhimurium,
vol. 2. Edited by Neidhardt, F. C., et al. Washington, DC: American Society
for Microbiology.)
Deletion studies
reveal that both araO2 and araI must be present on the chromosome
in order for AraC protein to repress araBAD. The DNA loop created
when AraC binds both araO2 and araI1 consists
of some 210 bp. If 5 bp of DNA (one-half a helical turn) are added or deleted
in this intervening region, the two AraC-binding sites are rotated away
from each other, so that interaction of AraC with both sites is not possible,
and repression is no longer observed (Figure 31.23). The creation of DNA loops
by sequence-specific, DNA-binding proteins is a mechanism common to many regulatory
phenomena involving DNA. (DNA looping is considered in greater detail later
in this chapter.)
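The helical-phasing argument can be expressed as a rotation angle. The sketch below assumes the round figure of 10 bp per helical turn implied by "5 bp (one-half a helical turn)"; B-DNA in solution is closer to 10.5 bp per turn, so the helical repeat is left as a parameter. The function name is hypothetical.

```python
BP_PER_TURN = 10.0  # assumption matching the text's "5 bp = one-half turn"


def phase_shift_degrees(bp_change: int, bp_per_turn: float = BP_PER_TURN) -> float:
    """Angle by which one binding site rotates about the helix axis relative
    to the other when `bp_change` base pairs are inserted (positive) or
    deleted (negative) between them."""
    return (bp_change / bp_per_turn * 360.0) % 360.0


if __name__ == "__main__":
    for change in (-10, -5, 5, 10):
        print(f"{change:+3d} bp -> sites rotated by {phase_shift_degrees(change):.0f} degrees")
```

A half-turn change places the two AraC-binding sites on opposite faces of the helix, whereas a full-turn change restores their original relative orientation.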
Positive
control of the araBAD operon occurs in the presence of L-arabinose and
cAMP. Arabinose binding by AraC protein causes the release of araO2,
opening of the DNA loop, and association of AraC with araI2.
CAP-(cAMP)2 binds at a site between araO1 and araI,
and together the AraC-(arabinose)2 and CAP-(cAMP)2
complexes influence RNA polymerase, through protein:protein interactions, to
create an active transcription initiation complex. Supercoiling-induced DNA
looping may promote protein:protein interactions between DNA-binding proteins
by bringing them into juxtaposition.
The trp Operon: Attenuation as a Mechanism to Regulate Gene Expression
Figure 31.24 · The trp operon of E. coli.
The trp operon
of E. coli (and S. typhimurium) encodes a leader peptide sequence
(trpL) and five polypeptides, trpE through trpA (Figure
31.24). The five polypeptides comprise three enzymes that catalyze the formation
of tryptophan from chorismate (Chapter
26). Expression of the trp operon is under the control of Trp repressor,
a dimer of 108-residue polypeptide chains. When tryptophan is plentiful, Trp
repressor binds two molecules of tryptophan and associates with the trp
operator that is located within the trp promoter. Trp repressor binding excludes
RNA polymerase from the promoter, preventing transcription of the trp
operon. When Trp becomes limiting, repression is lifted because Trp repressor
lacking bound Trp (Trp apo-repressor) has a lowered affinity for the trp promoter.
Thus, the behavior of Trp repressor corresponds to a co-repressor-mediated,
negative control circuit (Figure 31.21). The Trp repressor regulates two other
operons: trpR and aroH
(Figure 31.25). Trp repressor is itself encoded by the trpR operon, and
its regulation of this operon serves as an example of autogenous regulation
(autoregulation), which is regulation of gene expression by the product
of the gene. The aroH operon encodes the Trp-sensitive DAHP synthase isozyme
of aromatic amino acid biosynthesis (Chapter
26).
Figure 31.25 · The three operators recognized by Trp repressor.
Attenuation
Figure 31.26 · Amino acid sequences of leader
peptides in various amino acid biosynthetic operons regulated by attenuation.
Color indicates amino acids synthesized in the pathway catalyzed by the operon’s
gene products. (The ilv operon encodes enzymes of isoleucine, leucine,
and valine biosynthesis.)
In addition to repression,
the trp operon is controlled by transcription attenuation. Unlike
the mechanisms discussed thus far, attenuation regulates transcription after
it has begun. Charles Yanofsky, the discoverer of this phenomenon, has defined
attenuation as any regulatory mechanism that manipulates transcription termination
or transcription pausing to regulate gene transcription downstream. In prokaryotes,
transcription and translation (Chapters
11 and 33) are coupled,
and the translating ribosome is affected by the formation and persistence of
pause and termination structures in the mRNA. Attenuation occurs under normal
conditions but is blocked when levels of specific charged tRNAs (aminoacyl-tRNAs)
are lowered on account of amino acid limitation. In many operons encoding enzymes
of amino acid biosynthesis, a transcribed 150- to 300-bp leader region is positioned
between the promoter and the first major structural gene. These
regions encode a short leader peptide containing multiple codons for the pertinent amino acid.
Figure 31.27 · Alternative secondary structures for the leader region (trpL mRNA) of the trp operon transcript.
For example, the leader peptide of the leu operon has four Leu
codons, the trp operon has two tandem Trp codons, and so forth (Figure
31.26). Translation of these codons depends on an adequate supply of the relevant
aminoacyl-tRNA, which in turn rests on the availability of the amino acid. When
Trp is scarce, the entire trp operon from trpL to trpA
is transcribed to give a polycistronic mRNA. But, as [Trp] increases, more and
more of the trp transcripts consist of only a 140-nucleotide fragment corresponding
to the 5'-end of trpL. Trp availability is causing premature termination
of trp transcription, that is, transcription attenuation. The secondary
structure of the 160-bp leader region transcript is the principal control element
in transcription attenuation (Figure 31.27). This RNA segment includes the coding
region for the 14-residue leader peptide. Three critical base-paired hairpins
can form in this RNA: the 1:2 pause structure, the 3:4 terminator,
and the 2:3 antiterminator. Obviously, the 1:2 pause, 3:4 terminator,
and the 2:3 antiterminator represent mutually exclusive alternatives. A significant
feature of this coding region is the tandem UGG Trp codons.
Figure 31.28 · The mechanism of attenuation in the trp operon.
Transcription by RNA polymerase begins and progresses until position 92 is reached, whereupon the 1:2 hairpin is formed, causing RNA polymerase to pause in its elongation cycle. While RNA polymerase is paused, a ribosome begins to translate the leader region of the transcript. Translation by the ribosome releases the paused RNA polymerase and transcription continues, with RNA polymerase and the ribosome moving in unison. As long as Trp is plentiful enough that Trp-tRNATrp is not limiting, the ribosome is not delayed at the two Trp codons and follows closely behind RNA polymerase, translating the message soon after it is transcribed. The presence of the ribosome atop segment 2 blocks formation of the 2:3 antiterminator hairpin, allowing the alternative 3:4 terminator hairpin to form (Figure 31.28). Stable hairpin structures followed by a run of Us are features typical of rho-independent transcription termination signals, so the RNA polymerase perceives this hairpin as a transcription stop signal and transcription is terminated at this point. On the other hand, a paucity of Trp and hence Trp-tRNATrp causes the ribosome to stall on segment 1. This leaves segment 2 free to pair with segment 3 and to form the 2:3 antiterminator hairpin in the transcript. Because this hairpin precludes formation of the 3:4 terminator, termination is prevented and the entire operon is transcribed. Thus, transcription attenuation is determined by the availability of charged tRNATrp and its transitory influence over the formation of alternative secondary structures in the mRNA.
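The decision logic of attenuation described above reduces to a single branch on the supply of charged tRNATrp. The sketch below encodes that branch as a two-state model; the class and function names are hypothetical, and the model ignores the graded, probabilistic nature of ribosome stalling in a real cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderOutcome:
    ribosome_position: str
    hairpin_formed: str
    operon_transcribed: bool


def trp_attenuation(charged_trna_trp_plentiful: bool) -> LeaderOutcome:
    """Outcome at the trp leader after RNA polymerase leaves the 1:2 pause."""
    if charged_trna_trp_plentiful:
        # Ribosome keeps pace with polymerase and sits on segment 2, so
        # 2:3 cannot form and the 3:4 terminator does.
        return LeaderOutcome("covering segment 2", "3:4 terminator", False)
    # Ribosome stalls at the tandem Trp codons in segment 1, leaving
    # segment 2 free to pair with segment 3.
    return LeaderOutcome("stalled on segment 1", "2:3 antiterminator", True)


if __name__ == "__main__":
    for plentiful in (True, False):
        print(f"Trp-tRNATrp plentiful={plentiful}: {trp_attenuation(plentiful)}")
```

The mutually exclusive hairpins are what make this a switch: whichever of segments 2 and 3 is sequestered first determines whether the terminator can form.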
Transcription Is Regulated by a Diversity of Mechanisms
A surprising variety of control mechanisms operate in transcriptional regulation, as we have just seen. Several organizing principles materialize. First, DNA:protein interactions are a central feature in transcriptional control, and the DNA sites where regulatory proteins bind commonly display at least partial dyad symmetry or inverted repeats. Further, DNA-binding proteins themselves are generally even-numbered oligomers (for example, dimers, tetramers) that have an innate twofold rotational symmetry. Second, protein:protein interactions are an essential component of transcriptional activation. We see this latter feature in the activation of RNA polymerase by CAP-(cAMP)2 or AraC-(arabinose)2, to select just two examples. Third, the regulator proteins receive cues that signal the status of the environment (for example, Trp, lactose, cAMP) and act to communicate this information to the genome, typically via the medium of conformational changes and DNA:protein interactions.
Transcriptional Activators Work Through Protein:Protein Contacts with RNA Polymerase
Although transcriptional control is governed by a variety of mechanisms, an underlying principle of transcriptional activation has emerged. Transcriptional activation can take place when a transcriptional activator protein (such as CAP-(cAMP)2 or AraC-(arabinose)2) bound to DNA makes protein:protein contacts with RNA polymerase, and the degree of transcriptional activation is proportional to the strength of the protein:protein interaction. Generally speaking, a nucleotide sequence that provides a binding site for a DNA-binding protein can serve as an activator site if the DNA-binding protein bound there can interact with promoter-bound RNA polymerase. DNA-binding proteins that activate transcription thus have a DNA-binding domain and an activation domain capable of interacting with RNA polymerase. Such activation domains activate transcription through protein:protein interactions with either the α, β, β′, or σ subunits of RNA polymerase. Further, if the DNA-bound transcriptional activator makes contacts with two different components of RNA polymerase, a synergistic effect takes place such that transcription is markedly elevated. Thus, transcriptional activation at specific genes relies on the presence of one or more activator sites where one or more transcriptional activator proteins can bind and make contacts with RNA polymerase bound at the promoter of the gene. Indeed, transcriptional activators may facilitate the recruitment and binding of RNA polymerase to the promoter. This general principle applies to transcriptional activation in both prokaryotic and eukaryotic cells.
31.4 · Transcription Regulation in Eukaryotes
In eukaryotes, the situation
is substantially more complicated. First, the DNA is organized into chromatin,
which represses transcription by severely limiting the access of transcriptional
regulatory proteins to promoters. Thus, eukaryotic transcription requires factors
that can reorganize the chromatin so that the transcriptional machinery can
gain access to promoters. One such factor is the yeast Swi/Snf complex,
which may occur as a subcomponent of the mediator complex of RNA polymerase
II holoenzyme. Swi/Snf is a highly conserved complex containing about 10 proteins
that becomes physically and functionally associated with the CTD of RNA pol
II. Swi/Snf disrupts nucleosomal arrays in chromatin in an ATP-dependent manner,
thereby facilitating the binding of TBP and activator proteins to the DNA template.
Another aspect of chromatin remodeling involves reversible acetylation of Lys
ε-NH3+ groups in nucleosomal
histones.
Acetylation
of these amino groups by histone acetyltransferases (HATs)
neutralizes positive charges on histones, reducing the affinity of the
histone for the negatively charged sugar-phosphate backbone of DNA. The process
is reversed by histone deacetylases (HDACs), which remove
the N-acetyl groups. Although the overall effects are not straightforward, generally
speaking, histone acetylation favors gene expression.
Not only
metabolic activity and cell division but complex patterns of embryonic development
and cell differentiation must be coordinated through transcriptional regulation.
All this coordinated regulation takes place in cells where the relative quantity
(and diversity) of DNA is very great: A typical mammalian cell has 1500 times
as much DNA as an E. coli cell. Eukaryotic genes have promoters and other
regulatory elements analogous to those found in prokaryotic genes, but the structural
genes of eukaryotes are rarely organized in clusters akin to operons. Each eukaryotic
gene typically possesses a discrete set of regulatory sequences appropriate
to the requirements for its expression. Certain of these sequences provide sites
of interaction for general transcription factors, whereas others endow the gene
with great specificity in expression by providing targets for specific transcription
factors. Further, mRNA stability plays a greater role in eukaryotic gene expression;
unlike prokaryotic mRNAs, eukaryotic mRNAs show a wide range in relative half-lives.
The longer-lived an mRNA is, the greater the potential for its genetic information
to be persistently expressed.
Eukaryotic Promoters, Enhancers, and Response Elements
Figure 31.29 · Promoter regions of several
representative eukaryotic genes. (a) The SV40 early genes, the histone H2B gene,
and the thymidine kinase gene. Note that these promoters contain different combinations
of the various modules. In (b), the function of the modules within the thymidine
kinase gene is shown.
Promoters
The promoters of eukaryotic genes encoding proteins are defined by modules of short conserved sequences, such as the TATA box, the CAAT box, and the GC box. The presence of a CAAT box, usually located around -80 relative to the transcription start site, signifies a strong promoter. One or more copies of the sequence GGGCGG or its complement (referred to as the GC box) have been found upstream from the transcription start sites of so-called “housekeeping genes.” Housekeeping genes encode proteins commonly present in all cells and essential to normal function; such genes are typically transcribed at more or less steady levels. Sets of the various sequence modules are embedded in the upstream region of such genes and collectively define the promoter. Figure 31.29 depicts the promoter regions of several representative eukaryotic genes. Table 31.4 lists transcription factors that bind to respective modules. These transcription factors typically behave as positive regulatory proteins essential to transcriptional activation by RNA polymerase II at these promoters.
Enhancers
Figure 31.30 · Enhancers are sequence elements located at varying positions and orientation relative to the promoter that act to enhance transcription initiation. Transcription factors (proteins) bind to enhancers and stimulate RNA polymerase II binding at a nearby promoter.
In addition to these promoter elements, eukaryotic genes are characterized by additional regulatory sequences known as enhancers. Enhancers (also called upstream activation sequences, or UASs) assist initiation. Enhancers differ from promoters in two fundamental ways. First, the location of enhancers relative to the transcription start site is not fixed. Enhancers may be several thousand nucleotides away from the promoter, and they act to enhance transcription initiation even if positioned downstream from the gene. Second, enhancer sequences are bidirectional in that they function in either orientation. That is, enhancers can be removed and then reinserted in the reverse sequence orientation without any diminution in their function. Like promoters, enhancers represent modules of consensus sequence. Enhancers are promiscuous, because they stimulate transcription from any promoter that happens to be in their vicinity. Nevertheless, enhancer function is dependent on recognition by a specific transcription factor. A specific transcription factor bound at an enhancer element interacts with RNA pol II at a nearby promoter via a looping mechanism (Figure 31.30).
Response Elements
Promoter modules
in genes responsive to common regulation are termed response elements.
Examples include the heat shock element (HSE), the glucocorticoid
response element (GRE), and the metal response element (MRE). These
various elements are found in the promoter regions of genes whose transcription
is activated in response to a sudden increase in temperature (heat shock), glucocorticoid
hormones, or toxic heavy metals, respectively (Table 31.5). HSE sequences are
recognized by a specific transcription factor, HSTF (for heat shock
transcription factor). HSEs are located about 15 bp upstream from the transcription
start site of a variety of genes whose expression is dramatically enhanced in
response to elevated temperature. Similarly, the response to steroid hormones
depends on the presence of a GRE positioned 250 bp upstream of the transcription
start point. The steroid receptor, a specific transcription factor, binds at a GRE
only when certain steroids have bound to the receptor.
Figure 31.31 · The metallothionein gene possesses several constitutive elements in its promoter (the TATA and GC boxes) as well as specific response elements such as MREs and a GRE. The BLEs are elements involved in basal level expression (constitutive expression). TRE is a tumor response element activated in the presence of tumor-promoting phorbol esters such as TPA (tetradecanoyl phorbol acetate).
Many genes are subject to a multiplicity of regulatory influences. Regulation of such genes is achieved through the presence of an array of different regulatory elements. The metallothionein gene is a good example (Figure 31.31). Metallothionein is a metal-binding protein that protects cells against metal toxicity by binding excess amounts of heavy metals and removing them from the cell. This protein is always present at low levels, but its concentration increases in response to heavy metal ions such as cadmium or in response to glucocorticoid hormones. The metallothionein gene promoter consists of two general promoter elements, namely, a TATA box and a GC box; two basal-level enhancers; four MREs; and one GRE. These elements function independently of one another; any one is able to activate transcription of the gene.
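The statement that these elements act independently, with any one sufficient for activation, amounts to OR logic over the signals present. The Python sketch below is a deliberately simplified model of that statement, not a quantitative description of the gene. The element counts follow the paragraph above; the dictionary layout, the function names, and the two-state "basal"/"induced" output are assumptions made for illustration, and the TRE shown in Figure 31.31 is left out.

```python
# Signal that triggers each response element, as described in the text.
RESPONSE_ELEMENTS = {
    "HSE": "heat shock",
    "GRE": "glucocorticoid",
    "MRE": "heavy metal",
}

# Element inventory of the metallothionein promoter (counts from the text).
METALLOTHIONEIN_PROMOTER = {"TATA": 1, "GC": 1, "BLE": 2, "MRE": 4, "GRE": 1}


def induced_by(promoter, signals):
    """Return the response elements in `promoter` whose signal is present."""
    return sorted(
        element
        for element, signal in RESPONSE_ELEMENTS.items()
        if promoter.get(element, 0) > 0 and signal in signals
    )


def expression_state(promoter, signals):
    """Report 'induced' if any response element is triggered, else 'basal'.

    Basal expression stands for the low constitutive level maintained by the
    general promoter elements and the basal-level enhancers.
    """
    return "induced" if induced_by(promoter, signals) else "basal"


print(expression_state(METALLOTHIONEIN_PROMOTER, set()))
print(expression_state(METALLOTHIONEIN_PROMOTER, {"heavy metal"}))
print(induced_by(METALLOTHIONEIN_PROMOTER, {"heavy metal", "glucocorticoid"}))
```

In this model a heat shock signal alone leaves the gene at its basal level, because the promoter as described in the text carries no HSE.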
DNA Looping
Figure 31.32 · Enhancer:promoter interaction via a protein-mediated DNA loop. Formation of a DNA loop delivers the enhancer-binding specific transcription factor to RNA polymerase II positioned at the promoter. Protein:protein interactions between the transcription factor and RNA pol II activate transcription.
Because transcription must respond to a variety of regulatory signals, multiple proteins are essential for appropriate regulation of gene expression. These regulatory proteins are the sensors of cellular circumstances, and they communicate this information to the genome by binding at specific nucleotide sequences. However, DNA is virtually a one-dimensional polymer, and there is little space for a lot of proteins to bind at (or even near) a transcription initiation site. DNA looping permits additional proteins to convene at the initiation site and to exert their influence on creating and activating an RNA pol II initiation complex (Figure 31.32). The repertoire of transcriptional regulation is greatly expanded by DNA looping. Further, DNA looping is greatly influenced by negative supercoiling.
31.5 · Structural Motifs in DNA-Binding Regulatory Proteins
Proteins that recognize nucleic acids do so by the basic rule of macromolecular recognition. That is, such proteins present a three-dimensional shape or contour that is structurally and chemically complementary to the surface of a DNA sequence. When the two molecules come into close contact, the numerous atomic interactions that underlie recognition and binding can take place between the two. Nucleotide sequence-specific recognition by the protein involves a set of atomic contacts with the bases and the sugar-phosphate backbone. Hydrogen bonding is critical for recognition, with amino acid side chains providing most of the critical contacts with DNA. Protein contacts with the bases of DNA usually occur within the major groove; protein contacts with the DNA backbone involve both H bonds and salt bridges with oxygen atoms of the phosphodiester linkages. Structural studies on regulatory proteins that bind to specific DNA sequences have revealed that roughly 80% of such proteins can be assigned to one of three principal classes based on their possession of one of three small, distinctive structural motifs: the helix-turn-helix