Chapter 29

DNA: Genetic Information, Recombination, and Mutation

Shakespeare's quote speaks to the ephemeral nature of life. Inheritance, throughout life's long history a matter of chance, now can be manipulated by scientists. Dolly, the first mammal to be cloned from an adult cell, is shown here with her naturally conceived first lamb, Bonnie. Hello, Dolly! (AP/Wide World Photos)

The fact that DNA is the material of heredity is common knowledge today, even though no one could have successfully defended such a proposition before the last half of the twentieth century. Heredity, which we can define generally as the tendency of an organism to possess the characteristics of its parent(s), was clearly evident throughout nature and since the dawn of history had served to justify the classification of organisms according to shared similarities. The molecular basis of heredity, however, was not obvious. Early geneticists demonstrated that genes, the elements or units carrying and transferring inherited characteristics from parent to offspring, are contained within the nuclei of cells in association with the chromosomes. Yet the chemical identity of genes remained unknown, and genetics was an abstract science. Even the realization that chromosomes are composed of proteins and nucleic acids did little to define the molecular nature of the gene because at the time no one understood either of these substances.

29.1 × The Discovery That DNA Carries Genetic Information

The material of heredity should have certain properties:

1. It must be very stable so that genetic information can be stored in it and transmitted countless times to subsequent generations.

2. It must be capable of precise copying or replication so that its information is not lost or altered.

3. Although stable, it must also be subject to change in order to account, in the short term, for the appearance of mutant forms and, in the long term, for evolution.

      The first evidence that deoxyribonucleic acid, or DNA, might be the material of heredity came from investigations on Streptococcus pneumoniae, one of the types of bacteria that cause pneumonia. In 1928, Frederick Griffith, an English microbiologist, was comparing the properties of two strains of pneumococcus bacteria. One strain, Type S (S for smooth colonial morphology), is virulent because it is enclosed within a slippery polysaccharide coat, or capsule, that protects it from the immune system of its host. The other strain, Type R (R for rough-looking colonies), lacks an enzyme for the biosynthesis of the polysaccharide coat and is not virulent because it cannot resist attack by the host's immune system. When Griffith injected Type S bacteria into mice, the blood became filled with S bacteria and the mice died. Heat-killed Type S bacteria had no effect on the mice, but if mice were injected with nonvirulent Type R bacteria that had been mixed with heat-killed Type S bacteria, the mice died and virulent Type S bacteria could be recovered from their blood. Somehow, the heat-killed Type S bacteria had transformed the nonvirulent R Type into the virulent S Type (Figure 29.1). In 1931, M.H. Dawson and R. H. P. Sia showed that extracts of heat-killed Type S cells could transform nonpathogenic R cells into genetically stable, pathogenic S cells.

Figure 29.1 × Griffith experiment on pneumococcal transformation: (a) Mice are resistant to Type R Streptococcus pneumoniae bacteria, but (b) are killed by injection with virulent Type S S. pneumoniae bacteria. (c) Injection with heat-killed virulent bacteria does not kill mice, but (d) if heat-killed Type S bacteria are mixed with nonvirulent Type R bacteria, they have the capacity to transform nonvirulent Type R bacteria into the virulent Type S form.

 

The "Transforming Principle" Is DNA

In 1944, Oswald T. Avery and his associates Colin M. MacLeod and Maclyn McCarty at the Rockefeller Institute made the discovery that the substance active in transforming Type R bacteria to virulence was, in fact, DNA. This finding was surprising and not immediately accepted because most scientists at the time thought that proteins, substances chemically more complex and diverse than nucleic acids, were the genetic material. Avery, MacLeod, and McCarty showed that highly purified preparations of "transforming principle" contained no detectable protein and were unaffected by trypsin or chymotrypsin (two proteolytic enzymes) or by pancreatic RNase (which hydrolyzes RNA). However, the transforming substance was readily inactivated by treatment with pancreatic DNase, an enzyme that specifically degrades DNA. Thus, DNA must have been the agent carrying the information that transforms R bacteria to virulence. Because transformation was stably inherited, DNA merited strong consideration as the actual material of heredity.

DNA Is the Hereditary Molecule of Bacteriophage

Further proof that DNA is the material of heredity came from the study of bacteriophage. In 1952, Alfred Hershey and Martha Chase devised an elegant experiment to trace the fates of the two major components of bacteriophage (coat protein and DNA) following infection. They took advantage of the fact that nucleic acids lack sulfur and proteins lack phosphorus to uniquely label bacteriophage DNA with 32P and bacteriophage protein with 35S. Bacteriophage labeled with either isotope were obtained from cultures of bacteriophage T2 grown on Escherichia coli in medium containing radioactive 32P-labeled inorganic phosphate or radioactive 35S-labeled methionine.

Figure 29.2 × Electron micrograph of bacteriophage particle attached to a bacterial cell. A single T4 bacteriophage weighs $5 \times 10^{-13}$ g and consists of 60% DNA and 40% protein. Its volume is about 1/1000 the volume of an E. coli cell. T4 phage heads are 100 nm × 65 nm icosahedra attached to tails 100 nm long by 25 nm in diameter. (J. Broek/Biozentrum, University of Basel Science Photo Library)

      Phage infection of bacteria involves attachment of the bacteriophage to the bacterial cell at specific attachment sites. The phage DNA enters the bacterial cell, leaving its protein coat behind on the surface of the bacterium (Figure 29.2). Hershey and Chase mixed labeled bacteriophage T2 with unlabeled E. coli cells, permitting sufficient time for the phage to attach. Then they vigorously agitated the culture in a blender to shear the phage coats from the bacterial surface. Following centrifugation of the culture, infected bacteria could be recovered in the pellet, whereas the phage coats containing most of the 35S label remained suspended in the supernatant. In contrast, when E. coli cells were infected with 32P-labeled T2 phage, the bacterial pellet contained most of the 32P. Furthermore, upon lysis, 30% of the original 32P but only 1% of the 35S was recovered in the bacteriophage progeny produced by the infection (Figure 29.3). Hershey and Chase surmised that the bacteriophage DNA was sufficient for bacteriophage reproduction. That is, DNA must be the material of heredity.

Figure 29.3 × The Hershey and Chase experiment demonstrated that the DNA component of bacteriophage T2 carried the requisite genetic information for bacteriophage reproduction.

29.2 × Genetic Information in Bacteria: Its Organization, Transfer, and Rearrangement

Bacteria are very useful organisms for genetic analysis: Under optimal conditions of growth and reproduction, some bacteria (such as E. coli) divide every 20 minutes, the progeny of each division being a new generation. A genetic experiment can be completed with bacteria in hours, whereas an analogous experiment with a multicellular organism would take months or years because the generation times of such organisms are months or even years in duration. Further, a single milliliter of bacterial culture can contain enormous numbers of bacteria (as many as $10^{10}$), all derived from a single parental bacterium:

A single bacterium growing with a generation time of 20 min can give rise to $10^{10}$ progeny in about 11 hr. $N$, the number of cells after $n$ generations, is given by $N = 2^n$. For $N = 10^{10} = 2^n$, $n = 33.22$ ($2^{33.22} = 10^{10}$). At 0.33 hr per generation, 33.22 generations (the time to accumulate $10^{10}$ cells from a single bacterium) occur in about 11 hours.
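The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation, assuming ideal, synchronous doubling with no lag or death; the function names are illustrative and do not come from the text.

```python
import math

def generations_needed(target_cells: float) -> float:
    """Generations n such that 2**n equals target_cells, starting from one cell."""
    return math.log2(target_cells)

def hours_to_reach(target_cells: float, generation_time_min: float = 20.0) -> float:
    """Elapsed time in hours, assuming every cell divides once per generation time."""
    return generations_needed(target_cells) * generation_time_min / 60.0

n = generations_needed(1e10)
t = hours_to_reach(1e10)
print(f"{n:.2f} generations, {t:.2f} hr")  # prints "33.22 generations, 11.07 hr"
```

Using exactly 20 min (rather than the rounded 0.33 hr) per generation gives just over 11 hr, which agrees with the "about 11 hours" quoted above.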

Because of these vast numbers, very rare genetic events can be observed. That is, a one-in-a-million occurrence could be present in thousands of bacteria in a culture. In addition, because bacteria are haploid organisms (organisms with only one chromosome or one set of chromosomes), each cell contains but one set of genetic instructions. Consequently, any mutation in a gene is not masked or corrected by a second, normal copy of the gene, as it usually is in diploid organisms (organisms having two, essentially duplicate, sets of chromosomes). In haploid organisms like bacteria, the phenotype, or perceptible characteristics of the organism, reflects its genotype, or genetic composition. In contrast, diploid organisms may exhibit a wild-type, or normal, phenotype for any trait, even though their genotype might contain one mutant copy and one wild-type copy of the gene responsible for the trait.

Figure 29.4 × The use of nutritional mutants to demonstrate sexuality in bacteria. The genetic markers are thr-, leu-, thi- (inability to grow in the absence of threonine, leucine, and thiamine, respectively) on one chromosome (Parent A), and phe-, cys-, bio- (defects in genes for phenylalanine, cystine, and biotin synthesis, respectively) on the other chromosome (Parent B). A very few of the bacteria from the mixed culture grew; these bacteria have become thr+ leu+ thi+ phe+ cys+ bio+ as a result of genetic recombination: the formation of chromosomes with different combinations of gene types than those found in the parental chromosomes. That is, genetic recombination is the process of forming new combinations of genes. The production of offspring through sex is one mechanism for genetic recombination.

Mapping the Structure of Bacterial Chromosomes

In 1946, Joshua Lederberg and Edward Tatum discovered that genetic information could be transferred between bacteria. They used two strains of E. coli that differed in their growth requirements due to mutations each carried (Figure 29.4). One strain (thr-, leu-, thi-) required threonine, leucine, and thiamine to grow; the other (phe-, cys-, bio-) required phenylalanine, cystine, and biotin. These two strains were mixed together and spread on the surface of a petri plate of minimal medium lacking any of the required supplements. After a day, a very small number of bacterial colonies were observed to be growing. Somehow, these growing bacteria had acquired functional (wild-type) copies of each of the mutant genes. This remarkable result suggested strongly that the chromosomes of the two different cell types were brought together in a process akin to sexual exchange. In order for the progeny cells (which contain but one chromosome) to have acquired genetic information from the parental strains, genetic recombination must have occurred. This represents, in the words of Lederberg and Tatum, "the assortment of genes in new combinations." Apparently, at some point in time, parental DNA molecules must have aligned along regions of homology (sequence similarity), and segments from one of these molecules must have been interchanged with similar segments from the other parent so that some DNA molecules (chromosomes) now carried wild-type thr+ leu+ thi+ phe+ cys+ bio+ genes (Figure 29.4). Lederberg and Tatum speculated that, in order for the various genes to have had the opportunity to recombine, the cells of one strain must have interacted with the cells of the other.

Sexual Conjugation in Bacteria

Figure 29.5 × Electron micrograph of two E. coli cells, one F+, the other F-, joined in sexual conjugation. The pilus joining them is indicated by the arrow. (Fred Marsik/Visuals Unlimited)

The transfer of DNA between bacteria takes place via a process known as sexual conjugation, a phenomenon unsuspected prior to the Lederberg-Tatum experiment. Bacterial cells sometimes contain, in addition to their chromosome, extrachromosomal DNA molecules called plasmids (see Chapter 13). Plasmids represent "extra" or auxiliary genetic information. Bacterial cells are capable of conjugation if they possess a particular plasmid called the F factor (F for fertility). Such F+, or donor, cells have thin, hollow tubes projecting from their surface known as sex pili or F pili (singular = pilus). One or more pili can bind to specific receptors on the surface of cells that lack an F factor (F-, or recipient, cells; Figure 29.5). The pilus provides a connection between the two cells. Upon conjugation, a single strand of the F factor is passed to the F- cell, where its complementary strand is synthesized (Figure 29.6). The recipient F- cell thus becomes F+ by virtue of now having a double-stranded F factor plasmid. The F factor plasmid consists of about 94,000 base pairs; about one-third of this DNA is devoted to about 25 genes that function specifically in the transfer of genetic material from F+ to F- cells. Among these genes are those necessary for the formation of pili. In reality, the F factor is an infectious agent.

 

 

Figure 29.6 × Diagram showing the transfer of F factor from an F+ to an F- cell. A single strand of the F factor is nicked and transferred into the recipient F- cell. The complementary strand is then synthesized within the recipient F- cell to create a new double-stranded F factor, transforming the F- cell into an F+ one.

 

Figure 29.7 × Transfer of segments of the bacterial chromosome from the donor Hfr to the recipient F- cell. Because complete transfer of the Hfr chromosome rarely happens and because replication begins at a site within the F factor, transfer of the entire F factor is seldom achieved. Thus, the recipient cell usually remains F-. The E. coli chromosome can be mapped by interrupted mating of Hfr strains with F- strains. The genetic markers here are thr-, leu- (requirement for threonine, or leucine, in order to grow), gal-, lac- (inability to grow on galactose, or lactose, as sole carbon source), aziR (resistance to azide), tonR (resistance to bacteriophage T1), and strR (resistance to streptomycin). The superscripts +, R, S denote wild-type, resistance, and sensitivity, respectively. (a) Ordered transfer of the chromosome during mating. Mating is interrupted by the shearing of the joined cells in a blender at chosen intervals; this separates cells at various stages in the transfer of the Hfr chromosome. The cells are then plated onto selective medium and scored for their sensitivity to bacteriophage T1 and azide and their ability to grow on galactose or lactose as sole carbon source. (b) The frequencies of genetic markers aziS, tonS, lac+, and gal+ among the recombinants as a function of mating time. Extrapolation to zero gives an indication of when the various markers enter the recipient cell. (Adapted from Jacob, F., and Wollman, E., 1961. Sexuality and the Genetics of Bacteria, New York: Academic Press, p. 135)

High Frequency of Recombination

In rare instances, the F factor will integrate into the bacterial host chromosome. (Plasmids capable of chromosomal integration are termed episomes.) Cells harboring F factor integrated into the chromosome show a much higher frequency of recombination of chromosomal genes upon conjugation, or "mating," with F- cells and so are referred to as Hfr cells, for "high frequency of recombination." In Hfr cells, the conjugal process determined by the F factor operates as it does when the F factor is acting autonomously (Figure 29.7). That is, a single strand is passed to the recipient F- cell, where its complementary strand is synthesized. However, because of its integrated position within the Hfr chromosome, the F factor carries along genes adjacent to it on the chromosome. If conjugation continues long enough, a single-stranded copy of the entire host chromosome is passed to the F- cell. However, conjugation rarely persists the 100 minutes or more required for complete transfer, so usually only part of the Hfr chromosome is transferred.
      Genes from the Hfr chromosomes are transferred into the F- cell in a fixed order. Therefore, the order of the genes along the Hfr chromosome must be fixed, and this order can be mapped by the technique of interrupted mating (Figure 29.7). Further, because the F factor is integrated at different sites in different Hfr strains of E. coli, genes difficult to map because they were transferred very late (and hence rarely) in one Hfr strain are readily mapped in another. The genetic map obtained by the interrupted mating method reveals a circular arrangement of genes, consistent with the circular organization of the E. coli chromosome (Figure 29.8). Other bacterial chromosomes show a similar circular organization.
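The extrapolation used in interrupted mating (Figure 29.7b) can be sketched in Python: fit a straight line to the early, rising part of each marker's frequency curve and take the point where the line crosses zero as the time of entry. The data points below are hypothetical values invented for illustration, not measurements from the text, and the least-squares fit is one reasonable way to extrapolate rather than the procedure used by the original investigators.

```python
def entry_time(points):
    """Fit a least-squares line to (minutes, percent recombinants) points
    and return the time at which the fitted line crosses zero frequency."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope

# Hypothetical (mating time in min, % of recombinants carrying the marker)
curves = {
    "gal+": [(27, 2), (29, 4), (31, 6)],
    "aziS": [(10, 4), (12, 12), (14, 20)],
    "lac+": [(20, 4), (22, 8), (24, 12)],
    "tonS": [(12, 3), (14, 9), (16, 15)],
}

# Markers sorted by extrapolated time of entry give the gene order
gene_order = sorted(curves, key=lambda marker: entry_time(curves[marker]))
for marker in gene_order:
    print(f"{marker}: enters at about {entry_time(curves[marker]):.1f} min")
```

Because transfer begins at a fixed origin and proceeds in one direction, the sorted list is a map of gene order, and the differences between entry times serve as map distances in minutes.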

Figure 29.8 × The genetic map of the E. coli chromosome. This circular map is divided into 100 minutes. The 100 minutes arose historically as the time period necessary for complete gene transfer in interrupted mating experiments. The marker thrL is arbitrarily chosen as minute 0. The complete sequence of the E. coli genome (Science (1997) 277:1453-1474) encompasses 4405 open reading frames (ORFs) encoding some 4289 proteins.

29.3 × The Molecular Mechanism of Recombination

Genetic recombination is the natural process by which genetic information is rearranged to form new associations. Such recombination is a powerful genetic and evolutionary force that reshapes the genomes of all organisms. At the molecular level, genetic recombination is the exchange (or incorporation) of one DNA sequence with (or into) another. For example, homologous recombination involves an exchange of DNA sequences between homologous chromosomes, resulting in the arrangement of genes into new combinations. The process underlying homologous recombination is termed general recombination because the enzymatic machinery that mediates the exchange can use essentially any pair of homologous DNA sequences as substrates. Homologous recombination occurs during the production of gametes (meiosis) in diploid organisms. In higher animals, that is, those with immune systems, recombination also occurs in the DNA of somatic cells responsible for expressing proteins of the immune response, such as the immunoglobulins. This somatic recombination rearranges the immunoglobulin genes, dramatically increasing the potential diversity of immunoglobulins available from a fixed amount of genetic information (see Section 29.4). Homologous recombination can also occur in bacteria. Indeed, even viral chromosomes undergo recombination. For example, if two mutant viral particles simultaneously infect a host cell, a recombination event between the two viral genomes can lead to the formation of a recombinant viral chromosome that is wild-type.
      Bacteriophage genomes can insert into bacterial chromosomes by a form of recombination, but because the integration of the bacteriophage DNA into the host DNA occurs only at a unique site on the host chromosome and involves specific DNA sequences on both phage DNA and bacterial DNA, the process is called site-specific recombination. Only a short length of homology (often less than 15 bp) is necessary for site-specific recombination events, and the enzymes involved act only on these sequences. In transposition, yet another type of recombination, particular DNA sequences known as transposons (see discussion later in this chapter) are inserted somewhat independently of any sequence homology on the DNA into which the insertion occurs. However, transposons themselves carry a specific sequence essential to insertion. Transposition serves as a mechanism by which genetic material may be moved from one chromosomal location to another. A fourth, rare form of recombination, illegitimate recombination, occurs between nonhomologous DNA independently of any unique sequence element.

Figure 29.9 × Meselson and Weigle's experiment demonstrated that a physical exchange of chromosome parts actually occurs during recombination. Density-labeled, "heavy" phage, symbolized as ABC phage in the diagram, was used to coinfect bacteria along with "light" phage, the abc phage. The progeny from the infection were collected and subjected to CsCl density gradient centrifugation. Parental-type ABC and abc phage were well-separated in the gradient, but recombinant phage (ABc, Abc, aBc, aBC, and so on) were distributed diffusely between the two parental bands because they contained chromosomes constituted from fragments of both "heavy" and "light" DNA. These recombinant chromosomes formed by breakage and reunion of parental "heavy" and "light" chromosomes.

General Recombination

Recombination occurs by the breakage and reunion of DNA strands, so that a physical exchange of parts takes place. Matthew Meselson and J. J. Weigle demonstrated this in 1961 by coinfecting E. coli with two genetically distinct bacteriophage λ strains, one of which had been density-labeled by growth in 13C- and 15N-containing media (Figure 29.9). The phage progeny were recovered and separated by CsCl density gradient centrifugation. Phage particles that displayed recombinant genotypes were distributed throughout the gradient while parental (nonrecombinant) genotypes were found within discrete "heavy" and "light" bands in the density gradient. The results showed that recombinant phage contained DNA derived in varying proportions from both parents. The obvious explanation is that these recombinant DNAs arose via the breakage and rejoining of DNA molecules.

Figure 29.10 × The generation of progeny bacteriophage of two different genotypes from a single phage particle carrying a heteroduplex DNA region within its chromosome. The heteroduplex DNA is composed of one strand that is genotypically XYZ (the + strand), and the other strand that is genotypically XyZ (the - strand). That is, the genotype of the two parental strands for gene Y is different (one is Y, the other y).

      A second important observation made during this type of experiment was that some of the plaques formed by the phage progeny contained phage of two different genotypes, even though each plaque was caused by a single phage infecting one bacterium. Therefore, some infecting phage chromosomes must have contained a region of heteroduplex DNA, duplex DNA in which a part of each strand is contributed by a different parent (Figure 29.10).

The Holliday Model

Figure 29.11 × The Holliday model for homologous recombination. The + signs and - signs label strands of like polarity. For example, assume that the two strands running 5' → 3' as read left to right are labeled +, and the two strands running 3' → 5' as read left to right are labeled -. Only strands of like polarity exchange DNA during recombination. (See text for detailed description.)

In 1964, Robin Holliday proposed a model for homologous recombination that has proven influential (Figure 29.11). The two homologous DNA duplexes are first juxtaposed so that their sequences are aligned. This process of chromosome pairing is called synapsis (Figure 29.11a). Holliday suggested that recombination begins by introduction of single-stranded nicks at homologous sites on the two paired chromosomes (Figure 29.11b). The two duplexes partially unwind, and the free, single-stranded end of one duplex begins to base-pair with its nearly complementary, single-stranded region along the intact strand in the other duplex, and vice versa (Figure 29.11c). This strand invasion is followed by ligation of the free ends from different duplexes to create a cross-stranded intermediate known as a Holliday junction (Figure 29.11d). The cross-stranded junction can now migrate in either direction (branch migration) by unwinding and rewinding of the two duplexes (Figure 29.11e). Branch migration results in strand exchange; heteroduplex regions of varying length are possible. In order for the joint molecule formed by strand exchange to be resolved into two DNA duplex molecules, another pair of nicks must be introduced. Resolution can be represented best if the duplexes are drawn with the chromosome arms bent "up" or "down" to give a planar representation (Figure 29.11f). Nicks then take place, either at E and W, that is, in the - strands that were originally nicked (see Figure 29.11b) or at N and S, that is, the + strands (the strands not previously nicked). Duplex resolution is most easily kept straight by remembering that + strands are complementary to - strands and any resultant duplex must have one of each. Nicks made in the strands originally nicked lead to DNA duplexes in which one strand of each remains intact. Although these duplexes contain heteroduplex regions, they are not recombinant for the markers (AZ, az) that flank the heteroduplex region; such heteroduplexes are called patch recombinants (Figure 29.11g). Nicks introduced into the two strands not previously nicked yield DNA molecules that are both heteroduplex and recombinant for the markers A/a and Z/z; these heteroduplexes are termed splice recombinants (Figure 29.11h). Although this Holliday model explains the outcome of recombination, it provides no mechanistic explanation for the strand exchange reactions and other molecular details of the process.
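The two ways of resolving a Holliday junction can be traced with a small bookkeeping sketch in Python. It is a deliberately simplified model, not a simulation of DNA chemistry: each strand is a list of three marker alleles, the flanking markers A/a and Z/z come from the text, and the middle marker Y/y (standing for the region crossed by branch migration) is an assumption added for illustration, as are all function names.

```python
def resolve(cut: str):
    """Return the two product duplexes as (plus_strand, minus_strand) pairs.

    cut="minus": nick the - strands again (the strands originally nicked).
    cut="plus":  nick the + strands (the strands not previously nicked).
    """
    if cut not in ("minus", "plus"):
        raise ValueError("cut must be 'minus' or 'plus'")
    plus1, minus1 = ["A", "Y", "Z"], ["A", "Y", "Z"]   # parent 1 duplex
    plus2, minus2 = ["a", "y", "z"], ["a", "y", "z"]   # parent 2 duplex

    # Nick both - strands between the left and middle markers, exchange the
    # free ends, and ligate: each - strand now crosses to the other duplex.
    minus1, minus2 = minus1[:1] + minus2[1:], minus2[:1] + minus1[1:]

    # Branch migration moves the junction past the middle marker, so the
    # crossover now lies between the middle and right markers (index 2).
    junction = 2
    if cut == "minus":
        minus1, minus2 = (minus1[:junction] + minus2[junction:],
                          minus2[:junction] + minus1[junction:])
    else:
        plus1, plus2 = (plus1[:junction] + plus2[junction:],
                        plus2[:junction] + plus1[junction:])
    return (plus1, minus1), (plus2, minus2)

def describe(duplex) -> str:
    """Classify a product by its flanking markers and its heteroduplex content."""
    plus, minus = duplex
    heteroduplex = any(p != m for p, m in zip(plus, minus))
    recombinant_flanks = plus[0].isupper() != plus[-1].isupper()
    if not heteroduplex:
        return "no heteroduplex"
    return "splice" if recombinant_flanks else "patch"

for cut in ("minus", "plus"):
    for plus, minus in resolve(cut):
        print(cut, "".join(plus), "".join(minus), describe((plus, minus)))
```

In this model, cutting the originally nicked strands leaves the flanking markers in their parental combinations (A with Z, a with z) around a Y/y heteroduplex, whereas cutting the other pair joins A with z and a with Z, matching the patch and splice outcomes described above.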

The Enzymology of General Recombination

To illustrate recombination mechanisms, we focus on general recombination as it occurs in E. coli. The principal players in the process are the RecBCD enzyme complex, which initiates recombination; the RecA protein, which binds single-stranded DNA, forming a nucleoprotein filament capable of strand invasion and homologous pairing; and the RuvA, RuvB, and RuvC proteins, which drive branch migration and process the Holliday junction into recombinant products. Eukaryotic homologs of these prokaryotic recombination proteins have been identified, indicating that the fundamental process of general recombination is conserved across all organisms.

Figure 29.12 × Model of RecBCD-dependent initiation of recombination. (a) RecBCD binds to a duplex DNA end and its helicase activity begins to unwind the DNA double helix. "Rabbit ears" of ssDNA loop out from RecBCD because the rate of DNA unwinding exceeds the rate of ssDNA release by RecBCD. (b) As RecBCD unwinds the DNA, SSB (and some RecA) bind to the single-stranded regions; the RecBCD endonuclease activity randomly cleaves the ssDNA, showing a greater tendency to cut the 3'-terminal strand rather than the 5'-terminal strand. (c) When RecBCD encounters a properly oriented χ site, the 3'-terminal strand is cleaved just below the 3'-end of χ. (d) RecBCD now directs the binding of RecA to the 3'-terminal strand, as RecBCD endonuclease activity now acts more often on the 5'-terminal strand. (e) A nucleoprotein filament consisting of RecA-coated 3'-strand ssDNA is formed. This nucleoprotein filament is capable of homologous pairing with a dsDNA and strand invasion (see Figure 29.14). (Adapted from Figure 2 in Eggleston, A. K., and West, S. C., 1996. Exchanging partners: recombination in E. coli. Trends in Genetics 12:20-25; and Figure 3 in Eggleston, A. K., and West, S. C., 1997. Recombination initiation: Easy as A, B, C, D . . . χ? Current Biology 7:R745-R749)

The RecBCD Enzyme Complex

The proteins RecB (140 kD; 1180 amino acids), RecC (130 kD; 1122 amino acids), and RecD (67 kD; 608 amino acids) form a multifunctional enzyme complex having both helicase and nuclease activity. The RecBCD complex initiates recombination by attaching to the end of a DNA duplex (or at a double-stranded break in the DNA) and using its ATP-dependent helicase function to unwind the dsDNA (Figure 29.12a). As RecBCD progresses along unwinding the duplex, its nuclease activity cleaves both of the newly formed single strands (although the strand that provided the 3'-end at the RecBCD entry site is cut more frequently than the 5'-terminal strand [Figure 29.12b]).
      Single-stranded DNA-binding protein (SSB) (and some RecA protein) readily binds to the emerging single strands. Sooner or later, RecBCD encounters a particular nucleotide sequence, a so-called Chi (or χ) site, characterized by the sequence 5'-GCTGGTGG-3'. These χ sites are recombinational "hot spots"; 1009 χ sites have been identified in the E. coli genome (on average, about one every 4.5 kb of DNA). When a χ sequence is encountered by a RecBCD complex approaching its 3'-side (the ..G-3'-side), RecBCD cleaves the χ-bearing DNA strand four to six bases to the 3' side of χ (Figure 29.12c). Interaction of RecBCD with the χ site causes the D subunit of RecBCD to become irreversibly altered such that the RecBCD complex no longer expresses nuclease activity against the 3'-terminal strand, but nuclease activity against the 5'-terminal strand increases (Figure 29.12d).
Resuming its helicase function, RecBCD unwinds the dsDNA, and collectively these processes generate a ssDNA tail bearing a χ site at its 3'-terminal end. This ssDNA may reach several kilobases in length. At the same time, the χ-altered RecBCD complex now promotes preferential binding of RecA protein, instead of SSB, to the 3'-terminal strand to form a nucleoprotein filament (Figure 29.12e), active in pairing and strand invasion with a homologous region in another dsDNA molecule.
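Because a Chi site is defined by the fixed octamer 5'-GCTGGTGG-3', locating candidate sites in a sequence is a simple string search on both strands. The sketch below is a minimal illustration: the function names and the short test sequence are invented for the example, and the search says nothing about whether RecBCD would actually approach a given site from the productive direction.

```python
CHI = "GCTGGTGG"  # Chi sequence, written 5' to 3'

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_chi_sites(seq: str):
    """Return sorted (position, strand) pairs for every Chi octamer.

    position is the 0-based index, on the given strand, of the leftmost base
    of the match; strand is "+" if Chi reads 5'->3' on the given strand and
    "-" if it reads 5'->3' on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", CHI), ("-", reverse_complement(CHI))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)

# Invented test sequence with one Chi on each strand
example = "AAGCTGGTGGTTCCACCAGCAA"
print(find_chi_sites(example))  # prints [(2, '+'), (12, '-')]
```

Reporting the strand matters because Chi is recognized only when RecBCD approaches it from its 3' side, so a site on the "-" strand is active for a complex travelling in the opposite direction along the duplex.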

 

The RecA Protein

Figure 29.13 × The structure of RecA protein. (a) Ribbon diagram of the RecA monomer. Note the ADP bound at the site near helices C and D. (b) RecA filament. Four turns of a helical filament that has six RecA monomers per turn. A RecA monomer is highlighted in red. (Adapted from Figures 2 and 3 in Roca, A. I., and Cox, M. M., 1997. RecA protein: Structure, function, and role in recombinational DNA repair. Progress in Nucleic Acid Research and Molecular Biology 56:127-223. Photos courtesy of Michael M. Cox, University of Wisconsin)

 

The RecA protein, or recombinase, is a multifunctional 352-residue (38 kD) enzyme that acts in general recombination to catalyze the ATP-dependent DNA strand exchange reaction, leading to formation of a Holliday junction (Figure 29.11b-f). RecA protein (Figure 29.13a) crystallizes in the absence of DNA to form a helical filament having six monomers per turn (Figure 29.13b). This filament has a deep spiral groove large enough to accommodate three strands of DNA. RecA binds single-stranded DNA with a stoichiometry of one RecA per three nucleotides, and the resultant nucleoprotein filament has a helical pitch of 8.5 to 10 nm and about six RecA monomers per turn. The DNA in both RecA:ssDNA and RecA:dsDNA filaments is extended to about 150% of its length in B-form DNA. The nucleoprotein filament formed by binding of RecA protein to the 3'-terminal ssDNA has affinity for other DNA molecules. In fact, binding of multiple DNA strands is the hallmark of RecA function.
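The filament dimensions quoted above can be cross-checked against the stated extension of the DNA. The sketch below takes the stoichiometry and pitch from the text; the B-form rise of 0.34 nm per nucleotide is a standard textbook value introduced here as an assumption, and the function name is illustrative.

```python
NT_PER_RECA = 3          # nucleotides bound per RecA monomer (from the text)
RECA_PER_TURN = 6        # RecA monomers per helical turn (from the text)
B_FORM_RISE_NM = 0.34    # assumed axial rise per nucleotide in B-form DNA

def rise_per_nucleotide(pitch_nm: float) -> float:
    """Axial rise per nucleotide in the filament for a given helical pitch."""
    return pitch_nm / (NT_PER_RECA * RECA_PER_TURN)

for pitch in (8.5, 10.0):
    rise = rise_per_nucleotide(pitch)
    ratio = rise / B_FORM_RISE_NM
    print(f"pitch {pitch} nm: {rise:.3f} nm per nt, {ratio:.2f} x B-form")
```

With 18 nucleotides per turn, the 8.5 to 10 nm pitch corresponds to roughly 0.47 to 0.56 nm per nucleotide, or about 1.4 to 1.6 times the B-form rise, which brackets the figure of about 150% given above.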


 
Figure 29.14 × Model for the strand exchange function of RecA, as based on the relative DNA affinities of the primary and secondary DNA-binding sites on RecA. (a) ssDNA is bound in the primary DNA-binding site of RecA. (b) dsDNA is bound weakly in the RecA secondary DNA-binding site, and RecA scans this dsDNA for homology. (c) Homology recognition leads to DNA strand exchange as the RecA-bound ssDNA forms a heteroduplex with a newly found complementary strand; this heteroduplex fills RecA's primary DNA-binding site. The strand displaced from the dsDNA now occupies the secondary DNA-binding site on RecA with higher affinity than dsDNA did. Subsequent base pairing between this strand and the 3' → 5' strand of the incoming DNA (greenish yellow) creates a Holliday junction. (Adapted from Figure 5 in Mazin, A. V., and Kowalczykowski, S. C., 1996. The specificity of the secondary DNA binding site of RecA protein defines its role in DNA strand exchange. Proceedings of the National Academy of Sciences, USA 93:10673-10678)

Figure 29.15 × Model for homologous recombination as promoted by RecA enzyme. (a) RecA protein (and SSB) aid strand invasion of the 3'-ssDNA into a homologous DNA duplex, (b) forming a D-loop. (c) The D-loop strand that has been displaced by strand invasion pairs with its complementary strand in the original duplex to form a Holliday junction as strand invasion continues.

     In recombination, RecA uses its so-called high-affinity primary DNA-binding site to bind ssDNA (Figure 29.14). This complex then interacts with other DNA molecules through a secondary DNA-binding site within RecA. This secondary site has higher affinity for ssDNA than dsDNA. The relative affinity of this secondary site suggests a mechanism for RecA in DNA strand exchange during recombination: a RecA:ssDNA nucleoprotein complex transiently binds dsDNA in its secondary site and scans along the minor groove of the dsDNA, searching it for sequence homology with its bound ssDNA. When homology is found, a hybrid DNA duplex is formed between the ssDNA in the primary site and the complementary strand found in the scanned dsDNA. Formation of this hybrid duplex displaces the strand of the scanned dsDNA that is most like the ssDNA brought in by RecA. This process is the essence of DNA strand exchange. The displaced DNA strand is bound with higher affinity in the RecA secondary site than the dsDNA that was occupying this site. High-affinity binding of ssDNA in the secondary site stabilizes the strand-exchange complex and ensures proper heteroduplex formation by RecA in homologously aligned DNA strands (Figure 29.14).
      Procession of base unpairing of dsDNA and re-pairing into hybrid strands along the DNA duplex initiates branch migration (Figure 29.15b). Branch migration drives the displacement of the homologous DNA strand from the DNA duplex and its replacement with the ssDNA strand, a process known as single-strand assimilation (or single-strand uptake). Strand assimilation does not occur if there is no homology between the ssDNA and the invaded DNA duplex. The DNA strand displaced by the invading 3'-terminal ssDNA is free to anneal with the 5'-terminal strand in the original DNA, a step that is also mediated by RecA protein and SSB (Figure 29.15c). The result is a Holliday junction, the classic intermediate in genetic recombination. Proteins that assist RecA in the formation of a Holliday junction include RecF, RecO, and RecR.
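The homology search and strand exchange described above can be sketched as a toy string model. This is only an illustration under simplifying assumptions: strands are written position for position rather than with explicit 5'/3' polarity, homology is treated as an exact match, and the names `complement` and `strand_exchange` and the example sequences are hypothetical.

```python
# Toy model of RecA-style homology search and strand exchange.
# Function names and sequences are illustrative assumptions, not from the text.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Base-by-base complement, written position for position (not reversed)."""
    return "".join(PAIR[base] for base in strand)


def strand_exchange(ssdna, duplex_top):
    """Scan one strand of a duplex for a stretch identical to the invading ssDNA.

    duplex_top is one strand of the duplex; its partner is complement(duplex_top).
    Returns None when no homology exists (no strand assimilation); otherwise
    returns the strand that pairs with the ssDNA and the strand it displaces.
    """
    start = duplex_top.find(ssdna)  # the homology search
    if start == -1:
        return None
    end = start + len(ssdna)
    bottom = complement(duplex_top)
    return {
        "position": start,
        "heteroduplex_partner": bottom[start:end],  # pairs with the ssDNA
        "displaced": duplex_top[start:end],         # strand most like the ssDNA
    }


print(strand_exchange("GATTACA", "CCGATTACAGG"))
print(strand_exchange("AAAAA", "CCGATTACAGG"))
```

The displaced strand has the same sequence as the invading ssDNA, which mirrors the statement that strand exchange displaces the duplex strand most like the incoming one.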

Resolution of the Holliday Junction by the RuvA, RuvB, and RuvC Proteins

Figure 29.16 × Model for the resolution of a Holliday junction in E. coli by the RuvA, RuvB, and RuvC proteins. (a) Ribbon diagram of the RuvA tetramer. RuvA monomers have an overall L shape (one of them is outlined by the dashed white line); four of them form a tetramer with fourfold rotational symmetry, a structure reminiscent of a four-petaled flower. (b) Model for RuvA/RuvB action (first suggested by Parsons, C. A., et al., 1995. Structure of a multisubunit complex that promotes DNA branch migration. Nature 374:375-378.) (left): The RuvA tetramer fits snugly within the Holliday junction point. (center): Oppositely facing RuvB hexameric rings assemble on the heteroduplexes, with the DNA passing through their centers. These RuvB hexamers act as motors to promote branch migration by driving the passage of the DNA duplexes through themselves. (right): Binding of RuvC at the Holliday junction and strand scission by its nuclease activity. The locations of the RuvC active sites are indicated by the scissors. (c) Charge distribution on the concave surface of an RuvA tetramer. Blue indicates positive charge and red, negative charge. Note the overall positive charge on this surface of (RuvA)4, with the exception of the four red (negatively charged) pins at its center. (d) Structural model for the interaction of (RuvA)4 with the hypothesized square-planar Holliday junction center. (Adapted from Figures 1, 2, and 3 in Rafferty, J. B., et al., 1996. Crystal structure of DNA recombination protein RuvA and a model for its binding to the Holliday junction. Science 274:415-421.)


The Holliday junction is then processed into recombination products by RuvA, RuvB, and RuvC. Specifically, RuvA (203 amino acids) and RuvB (336 amino acids) work together as a Holliday junction-specific helicase complex that dissociates the RecA filament and catalyzes branch migration. An RuvA tetramer (Figure 29.16a) fits precisely within the junction point (Figure 29.16b), which has a square-planar geometry, and this RuvA tetramer targets the assembly of RuvB around opposite arms of the DNA junction. The RuvB protein binds to form two oppositely oriented, hexameric [(RuvB)6] ring structures encircling the dsDNAs, one on each side of the Holliday junction. Rotation of the dsDNAs by the RuvB hexameric rings pulls the dsDNAs through (RuvB)6 and unwinds the DNA strands across the "spool" of RuvA, which threads the separated single strands into newly forming hybrid (recombinant) duplexes (Figure 29.16b). The RuvA tetramer is a disklike structure, one face of which has an overall positive charge (Figure 29.16c), with the exception of four negatively charged central pins, each contributed by an RuvA monomer. These four pins fit neatly into the hole at the center of the Holliday junction. The negatively charged sugar-phosphate backbones of the four DNA duplexes of the Holliday junction are threaded along grooves in the positively charged RuvA face, with the negatively charged central pins appropriately situated to transiently separate the dsDNA molecules into their component single strands through repulsive electrostatic interactions with the phosphate backbones of the DNA. The separated strands of each parental duplex are then channeled into grooves in the RuvA face, where they are led into hydrogen-bonding interactions with bases contributed by strands of the other parental DNA to form the two daughter hybrid duplexes flowing out from the RuvAB complex (Figure 29.16b). Figure 29.16d illustrates a model for the RuvA tetramer with the square-planar Holliday junction.
      Depending on how the strands in the Holliday junction are cleaved and resolved, patch or splice recombinant duplexes result (Figure 29.11g and h). RuvC (173 amino acids) is an endonuclease that resolves Holliday junctions into heteroduplex recombinant products (RuvC resolvase). An RuvC dimer binds at the Holliday junction and cuts pairs of DNA strands of similar polarity (Figure 29.16b); whether a patch or a splice recombinant results depends on which DNA pair is cleaved.
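The dependence of the outcome on which strand pair is cut can be shown with a minimal bookkeeping sketch. It tracks only the flanking markers on either side of the junction. It also assumes the standard textbook convention, not spelled out in the passage above, that cutting the two crossing (exchanged) strands gives patch products and cutting the other two gives splice products. The function name `resolve` and the marker labels are hypothetical.

```python
# Bookkeeping sketch of Holliday junction resolution.
# ASSUMPTION: cutting the crossed strands yields patch recombinants and cutting
# the uncrossed strands yields splice recombinants (standard convention).


def resolve(parent1, parent2, cut):
    """Return the flanking-marker arrangement of the two product duplexes.

    parent1 and parent2 are (left_marker, right_marker) tuples for the two
    parental duplexes; cut is "crossed" or "uncrossed".
    """
    (left1, right1), (left2, right2) = parent1, parent2
    if cut == "crossed":
        # Flanking markers stay parental; only a heteroduplex patch remains.
        return [(left1, right1), (left2, right2)], "patch"
    if cut == "uncrossed":
        # Flanking markers are exchanged between the duplexes.
        return [(left1, right2), (left2, right1)], "splice"
    raise ValueError("cut must be 'crossed' or 'uncrossed'")


print(resolve(("A", "B"), ("a", "b"), "crossed"))
print(resolve(("A", "B"), ("a", "b"), "uncrossed"))
```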
      Recombination is a fundamental process that is involved not only in generating genetic diversity but also in DNA repair and in chromosome segregation during cell division. Hexameric ring helicases such as RuvB are DNA-driving molecular motors; similar motors act during DNA replication to propel strand separation and initiate DNA synthesis. Thus, the RuvABC system for processing Holliday junctions may represent a general paradigm for DNA manipulation in all cells.

Transposons

Figure 29.17 × The archetypal transposon has inverted nucleotide-sequence repeats at its termini, represented here as the 12-bp sequence ACGTACGTACGT (a). It acts at a target sequence (shown here as the sequence CATGC) within host DNA by creating a staggered cut (b) whose protruding single-stranded ends are then ligated to the transposon (c). The gaps at the target site are then filled in, and the filled-in strands are ligated (d). Transposon insertion thus generates direct repeats of the target site in the host DNA, and these direct repeats flank the inserted transposon.

In 1950, Barbara McClintock reported the results of her studies on an activator gene in maize (Zea mays or, as it's usually called, corn) that was recognizable principally by its ability to cause mutations in a second gene. Activator genes were thus an internal source of mutation. A most puzzling property was their ability to move relatively freely about the genome. As we have seen, scientists had labored to establish that chromosomes consisted of genes arrayed in a fixed order, so most geneticists viewed as incredible this idea of genes moving around. The recognition that McClintock so richly deserved for her explanation of this novel phenomenon had to await verification by molecular biologists. In 1983, Barbara McClintock was finally awarded the Nobel Prize in physiology or medicine. By this time, it was appreciated that many organisms, from bacteria to humans, possessed similar "jumping genes" able to move from one site to another in the genome. This mobility led to their designation as mobile elements, transposable elements, or, simply, transposons.
      Transposons are segments of DNA that are moved enzymatically from place to place in the genome (Figure 29.17). That is, their location within the DNA is unstable. Transposons range in size from several hundred bp to more than 8 kbp. Transposons contain a gene encoding an enzyme necessary for insertion into a chromosome and for the remobilization of the transposon to different locations. These movements are termed transposition events. The smallest transposons are called insertion sequences, or ISs, signifying their ability to insert apparently at random in the genome. Insertion into a new site causes mutation because it disrupts the DNA sequence at that site. Insertion occurs at sites that show little homology to the insertion sequence or transposon. Although certain transposons (such as E. coli transposon Tn7) may undergo transposition once per cell generation, most transposition events are infrequent, taking place only once every 10⁴ to 10⁷ generations. Larger and more complex transposons also carry genes that are not involved in the enzymology of insertion and excision of the transposon, such as genes conferring resistance to antibiotics. Episomes, plasmids that can reversibly integrate into bacterial genomes, contain transposons.
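The insertion scheme of Figure 29.17 (staggered cut, ligation, gap fill-in) can be sketched on a single strand as follows. The inverted-repeat and target sequences are those used in the figure; the host sequence, the transposon body, and the function names are hypothetical, and only the net result of the staggered cut is modeled: the target site ends up duplicated on both sides of the insert.

```python
# Sketch of transposon insertion producing direct repeats of the target site.
# HOST, BODY, and the function names are illustrative assumptions.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement, i.e. the other strand read in its own 5'-to-3' sense."""
    return "".join(PAIR[base] for base in reversed(seq))


def insert_transposon(host, target, transposon):
    """Insert the transposon at the first target site, duplicating the target.

    The staggered cut leaves the target sequence single-stranded on both sides
    of the insert; gap filling then yields a direct repeat of the target
    flanking the transposon.
    """
    start = host.find(target)
    if start == -1:
        raise ValueError("target site not found in host DNA")
    end = start + len(target)
    return host[:end] + transposon + target + host[end:]


LEFT_REPEAT = "ACGTACGTACGT"   # 12-bp terminal repeat from Figure 29.17
BODY = "TTTTGGGG"              # hypothetical internal sequence
TRANSPOSON = LEFT_REPEAT + BODY + reverse_complement(LEFT_REPEAT)
HOST = "GGGCATGCTTT"           # hypothetical host DNA containing CATGC

print(insert_transposon(HOST, "CATGC", TRANSPOSON))
```

The figure's 12-bp repeat happens to be its own reverse complement, so the inverted repeat at the far terminus reads identically on the strand shown.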

29.4 × The Immunoglobulin Genes: Generating Protein Diversity Using Genetic Recombination

The immunoglobulin genes are a highly evolved system for maximizing protein diversity from a finite amount of genetic information. This diversity is essential for gaining immunity to the great variety of infectious organisms and foreign substances that cause disease.

The Immune Response

Only vertebrates show an immune response. If a foreign substance, called an antigen, gains entry to the bloodstream of a vertebrate, the animal responds via a protective system called the immune response. The immune response involves production of proteins capable of recognizing and destroying the antigen. This response is mounted by certain white blood cells: the B and T cell lymphocytes and the macrophages. B cells are so named because they mature in the bone marrow; T cells mature in the thymus gland. Each of these cell types is capable of gene rearrangement as a mechanism for producing proteins essential to the immune response. Antibodies, which can recognize and bind antigens, are immunoglobulin proteins secreted from B cells. Because antigens can be almost anything, the immune response must have an incredible repertoire of structural recognition. Thus, vertebrates must have the potential to produce immunoglobulins of great diversity in order to recognize virtually any antigen.

The Immunoglobulin G Molecule

Figure 29.18 × Diagram of the organization of the IgG molecule. Two identical L chains are joined with two identical H chains. Each L chain is held to an H chain via an interchain disulfide bond. The variable regions of the four polypeptides lie at the ends of the arms of the Y-shaped molecule. These regions are responsible for the antigen recognition function of the antibody molecules. The actual antigen-binding site is constituted from hypervariable residues within the VL and VH regions. For purposes of illustration, some features are shown on only one or the other L chain or H chain, but all features are common to both chains.

Immunoglobulin G (IgG or γ-globulin) is the major class of antibody molecules found circulating in the bloodstream. IgG is a very abundant protein, amounting to 12 mg per mL of serum. It is a 150-kD α2β2-type tetramer. The α or H (for heavy) chain is 50 kD; the β or L (for light) chain is 25 kD. A preparation of IgG from serum is heterogeneous in terms of the amino acid sequences represented in its L and H chains. However, the IgG L and H chains produced from any given B lymphocyte are homogeneous in amino acid sequence. L chains consist of 214 amino acid residues and are organized into two roughly equal segments, the VL and CL regions. The VL designation reflects the fact that L chains isolated from serum IgG show variations in amino acid sequence over the first 108 residues, VL symbolizing this "variable" region of the L polypeptide. The amino acid sequence for residues 109 to 214 of the L polypeptide is constant, as represented by its designation as the "constant light," or CL, region. The heavy, or H, chains consist of 446 amino acid residues. Like L chains, the amino acid sequence for the first 108 residues of H polypeptides is variable, ergo its designation as the VH region, while residues 109 to 446 are constant in amino acid sequence. This "constant heavy" region consists of three quite equivalent domains of homology designated CH1, CH2, and CH3. Each L chain has two intrachain disulfide bonds, one in the VL region and the other in the CL region. The C-terminal amino acid in L chains is cysteine, and it forms an interchain disulfide bond to a neighboring H chain. Each H chain has four intrachain disulfide bonds, one in each of the four regions. Figure 29.18 presents a diagram of IgG organization. Within the variable regions of the L and H chains, certain positions are hypervariable with regard to amino acid composition. These hypervariable residues occur at positions 24 to 34, 50 to 55, and 89 to 96 in the L chains and at positions 31 to 35, 50 to 65, 81 to 85, and 91 to 102 in the H chains. The hypervariable regions are also called complementarity-determining regions, or CDRs, because it is these regions that form the structural site that is complementary to some part of an antigen's structure, providing the basis for antibody:antigen recognition.
 

Figure 29.19 × The characteristic "collapsed β-barrel domain" known as the immunoglobulin fold. The β-barrel structures for both (a) variable and (b) constant regions are shown. (c) A schematic diagram of the 12 collapsed β-barrel domains that make up an IgG molecule. CHO indicates the carbohydrate addition site; Fab denotes one of the two antigen-binding fragments of IgG, and Fc, the proteolytic fragment consisting of the pairs of CH2 and CH3 domains.

     In the immunoglobulin genes, the arrangement of exons correlates with protein structure. In terms of its tertiary structure, the immunoglobulin G molecule is composed of 12 discrete collapsed β-barrel domains, each domain having a Greek key motif (see Figure 6.32). The characteristic structure of this domain is referred to as the immunoglobulin fold (Figure 29.19). Each of IgG's two heavy chains contributes four of these domains and each of its light chains contributes two. The four variable-region domains (one on each chain) are encoded by multiple exons, but the eight constant-region domains are each the product of a single exon. All of these constant-region exons are derived from a single ancestral exon encoding an immunoglobulin fold. The major variable-region exon probably derives from this ancestral exon also. Contemporary immunoglobulin genes are a consequence of multiple duplications of the ancestral exon.
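The subunit arithmetic behind these figures can be checked in a few lines. The chain masses, residue counts, and per-chain domain counts are the values quoted in this section; the variable names are illustrative, and the mean residue mass is only a rough consistency check that ignores the attached carbohydrate.

```python
# Consistency check of the IgG composition figures quoted in the text.
# Variable names are illustrative; numbers come from the section itself.

H_CHAIN_KD, L_CHAIN_KD = 50, 25            # chain masses in kD
H_CHAIN_RESIDUES, L_CHAIN_RESIDUES = 446, 214
H_CHAIN_DOMAINS, L_CHAIN_DOMAINS = 4, 2    # immunoglobulin-fold domains per chain

igg_mass_kd = 2 * H_CHAIN_KD + 2 * L_CHAIN_KD
total_residues = 2 * H_CHAIN_RESIDUES + 2 * L_CHAIN_RESIDUES
total_domains = 2 * H_CHAIN_DOMAINS + 2 * L_CHAIN_DOMAINS
variable_domains = 4                       # one per chain
constant_domains = total_domains - variable_domains

# Rough mean mass per residue in daltons (ignores carbohydrate).
mean_residue_mass = igg_mass_kd * 1000 / total_residues

print(igg_mass_kd, total_residues, total_domains, constant_domains)
print(round(mean_residue_mass, 1))
```

The tetramer mass and the 12-domain, 8-constant-domain breakdown agree with the text, and the mean residue mass comes out near the value typical of proteins.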
      The discovery of variability in amino acid sequence in otherwise identical polypeptide chains was surprising and almost heretical to protein chemists. For geneticists, it presented a genuine enigma. They noted that mammals, which can make millions of different antibodies, don't have millions of different antibody genes. How can the mammalian genome encode the diversity seen in L and H chains?

The Organization of Immunoglobulin Genes

The answer to the enigma of immunoglobulin sequence diversity is found in the organization of the immunoglobulin genes. The genetic information for an immunoglobulin polypeptide chain is scattered among multiple gene segments along a chromosome in germline cells (sperm and eggs). During vertebrate development and the formation of B lymphocytes, these segments are brought together and assembled by DNA rearrangement (that is, genetic recombination) into complete genes. DNA rearrangement, or gene reorganization, provides a mechanism for generating a variety of protein isoforms from a limited number of genes. DNA rearrangement occurs in only a few genes, namely, those encoding the antigen-binding proteins of the immune response: the immunoglobulins and the T cell receptors. The gene segments encoding the amino-terminal portion of the immunoglobulin polypeptides are also unusually susceptible to mutation events. The result is a population of B cells whose antibody-encoding genes collectively show great sequence diversity even though a given cell can make only a limited set of immunoglobulin chains. Hence, at least one cell among the B cell population will likely be capable of producing an antibody that will specifically recognize a particular antigen.

DNA Rearrangements Assemble an L-Chain Gene by Combining Three Separate Genes

Figure 29.20 × The organization of mouse immunoglobulin gene segments. The organization in germline cells is shown on the left, and the rearranged organization characteristic of mature B lymphocytes is shown to the right of the arrows. The rearranged states shown are but single examples of the many possibilities for each gene family. (Adapted from Tonegawa, S., 1983. Somatic generation of antibody diversity. Nature 302:575.)

The organization of various immunoglobulin gene segments in the mouse genome is shown in Figure 29.20. L-chain variable-region genes are assembled from two kinds of germline genes, VL and JL (J stands for joining). In mammals, there are two different families of L-chain genes, the κ, or kappa, gene family and the λ, or lambda, gene family; each family has V and J members. These families are on different chromosomes. In mice, 90% of the L chains are κ chains; λ L chains are a minor component. Mice have four functional Jκ genes (and a fifth nonfunctional one); these J genes lie 2.5 to 4 kb upstream from the single Cκ gene that encodes the L-chain constant region. There are at least 200 Vκ genes, each with its own Lκ segment for encoding the L-chain leader peptide that targets the L chain to the endoplasmic reticulum for IgG assembly and secretion. (This leader peptide is cleaved once the L chain reaches the ER lumen.) The λ family of L-chain genes is organized a little differently, with only two Vλ genes, each of which is followed downstream by a pair of Jλ-Cλ units (Figure 29.20). In different mature B lymphocyte cells, Vκ and Jκ genes have joined in different combinations and, along with the Cκ gene, form complete κ light chains with a variety of Vκ regions. However, any given B lymphocyte expresses only one Vκ-Jκ combination. Construction of the mature B lymphocyte L-chain gene has occurred by DNA rearrangements that combine three genes (L-Vκ,λ, Jκ,λ, Cκ,λ) to make one polypeptide!

DNA Rearrangements Assemble an H-Chain Gene by Combining Four Separate Genes

The first 98 amino acids of the 108-residue, H-chain variable region are encoded by a VH gene. Each VH gene has an accompanying LH gene that encodes its essential leader peptide. It is estimated that there are from 200 to 1000 VH genes and they can be subdivided into eight distinct families based on nucleotide sequence homology. The members of a particular VH family are grouped together on the chromosome, separated from one another by 10 to 20 bp. In assembling a mature H-chain gene, a VH gene is joined to a D gene (D for diversity), which encodes amino acids 99 to 113 of the H chain. These amino acids comprise the core of the third CDR in the variable region of H chains. The VH-D gene assemblage is linked in turn to a JH gene, which encodes the remaining part of the variable region of the H chain. The VH, D, and JH genes are grouped in three separate clusters on the same chromosome. The four JH genes lie 7 kb upstream of the eight C genes, the closest of which is Cμ. Any of four C genes may encode the constant region of IgG H chains: Cγ1, Cγ2a, Cγ2b, and Cγ3. Each C gene is composed of multiple exons (as shown in Figure 29.20 for Cμ, but not the other C genes). Ten to twenty D genes are found 1 to 80 kb farther upstream. The VH genes lie even farther upstream. In B lymphocytes, the variable region of a heavy-chain gene is composed of one each of the LH-VH genes, a D gene, and a JH gene joined head to tail. Because the H-chain variable region is encoded in three genes and the joinings can occur in various combinations, the heavy chains have a greater potential for diversity than the light-chain variable regions that are assembled from just two genes (for example, Lκ-Vκ and Jκ). In making heavy-chain genes, four genes have been brought together and reorganized by DNA rearrangement to produce a single polypeptide!

Figure 29.21 × Consensus elements are located above and below germline variable-region genes that recombine to form genes encoding immunoglobulin chains. These consensus elements are complementary and are arranged in a heptamer-nonamer, 12-bp to 23-bp spacer pattern. (Adapted from Tonegawa, S., 1983. Somatic generation of antibody diversity. Nature 302:575.)

 

The Mechanism of V-J and V-D-J Joining in Light- and Heavy-Chain Gene Assembly

Specific nucleotide sequences adjacent to the various variable-region genes suggest a mechanism in which these sequences act as joining signals. All germline V and D genes are followed by a consensus CACAGTG heptamer separated from a consensus ACAAAAACC nonamer by a short, nonconserved 23-bp spacer. Likewise, all germline D and J genes are immediately preceded by a consensus GGTTTTTGT nonamer separated from a consensus CACTGTG heptamer by a short nonconserved 12-bp spacer (Figure 29.21). Note that the consensus elements downstream of a gene are complementary to those upstream from the gene with which it recombines. Indeed, it is these complementary consensus sequences that serve as recombination signal sequences (RSSs) and determine the site of recombination between variable-region genes. Functionally meaningful recombination happens only between two RSSs where one has a 12-bp spacer and the other has a 23-bp spacer (Figure 29.21). Lymphoid cell-specific recombination-activating gene proteins 1 and 2 (RAG1 and RAG2) recognize and bind at these RSSs, presumably through looping out of the 12- and 23-bp spacers and alignment of the homologous heptamer and nonamer regions (Figure 29.22). RAG1 and RAG2 together function as the V(D)J recombinase. The similarity between the organization of flanking repeats in immunoglobulin genes and the reaction catalyzed by RAG1/RAG2 proteins suggests that these genes and the RAG recombinase may have evolved from an ancestral transposon.
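The complementarity of the consensus elements, and the requirement for one 12-bp and one 23-bp spacer, can be checked with a short sketch. The heptamer and nonamer sequences are those quoted above; the function names `reverse_complement` and `can_recombine` are hypothetical, and the spacer rule is reduced to a simple length test.

```python
# Check that the consensus joining signals are reverse complements, and
# encode the 12-bp/23-bp spacer requirement. Function names are assumptions.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(PAIR[base] for base in reversed(seq))


# Elements following V and D genes (23-bp spacer side).
HEPTAMER_23, NONAMER_23 = "CACAGTG", "ACAAAAACC"
# Elements preceding D and J genes (12-bp spacer side).
HEPTAMER_12, NONAMER_12 = "CACTGTG", "GGTTTTTGT"


def can_recombine(spacer_a, spacer_b):
    """True only when one signal has a 12-bp spacer and the other a 23-bp spacer."""
    return {spacer_a, spacer_b} == {12, 23}


print(reverse_complement(HEPTAMER_23) == HEPTAMER_12)
print(reverse_complement(NONAMER_23) == NONAMER_12)
print(can_recombine(12, 23), can_recombine(23, 23))
```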

Figure 29.22 × Model for V(D)J recombination. A RAG1:RAG2 complex is assembled on DNA in the region of recombination signal sequences (a), and this complex introduces double-stranded breaks in the DNA at the borders of protein-coding sequences and the recombination signal sequences (b). The products of RAG1:RAG2 DNA cleavage are novel: the DNA bearing the recombination signal sequences has blunt ends, whereas the coding DNA has hairpin ends. That is, the two strands of the V and J coding DNA segments are covalently joined as a result of transesterification reactions catalyzed by RAG1:RAG2. To complete the recombination process, the two RSS ends are precisely joined to make a covalently closed circular dsDNA, but the V and J coding ends undergo further processing before they are joined (c). Coding-end processing involves opening of the V and J hairpins and the addition or removal of nucleotides from the strands. This processing means that joining of the V and J coding ends is imprecise, providing an additional means for introducing antibody diversity. Finally, the V and J coding segments are then joined to create a recombinant immunoglobulin-encoding gene (d). The processing and joining reactions require RAG1:RAG2, DNA-dependent protein kinase (DNA-PK, which consists of three subunits: Ku70, Ku80, and DNA-PKcs), and DNA ligase. (Adapted from Figure 1 in Weaver, D. T., and Alt, F. W., 1997. From RAGs to stitches. Nature 388:428-429.)

 

Imprecise Joining

Figure 29.23 × Recombination between the Vκ and Jκ genes can vary by several nucleotides, giving rise to variations in amino acid sequence and hence diversity in immunoglobulin L chains.

Joining of the ends of the immunoglobulin-coding regions during gene reorganization is somewhat imprecise. This imprecision actually leads to even greater antibody diversity because new coding arrangements result. Position 96 in κ chains is typically encoded by the first triplet in the Jκ element. Most κ chains have one of four amino acids here, depending on which Jκ gene was recruited in gene assembly. However, occasionally only the second and third bases or just the third base of the codon for position 96 is contributed by the Jκ gene, with the other one or two nucleotides supplied by the Vκ segment (Figure 29.23). So, the precise point where recombination occurs during gene reorganization can vary over several nucleotides, creating even more diversity.
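The effect of a shifting crossover point on codon 96 can be illustrated with a small sketch using the standard genetic code. The triplets `CCC` (from V) and `TGG` (from J) are hypothetical inputs chosen for illustration rather than values taken from the text, and `codon_96_variants` is an assumed helper name.

```python
# Sketch: how the V-J crossover point changes the codon at position 96.
# The example triplets and the function name are illustrative assumptions.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order ("*" marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_96_variants(v_triplet, j_triplet):
    """Map the number of bases taken from V (0, 1, or 2) to (codon, amino acid).

    The remaining bases of the codon come from the first triplet of J.
    """
    variants = {}
    for bases_from_v in range(3):
        codon = v_triplet[:bases_from_v] + j_triplet[bases_from_v:]
        variants[bases_from_v] = (codon, CODON_TABLE[codon])
    return variants


for n, (codon, amino_acid) in codon_96_variants("CCC", "TGG").items():
    print(n, codon, amino_acid)
```

With these inputs, the three crossover points give three different amino acids at a single position, which is the kind of junctional variation the paragraph describes.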

Antibody Diversity

Taking as an example the mouse with perhaps 300 Vκ genes, 4 Jκ genes, 200 VH genes, 12 D genes, and 4 JH genes, the number of possible combinations is given by 300 × 4 × 200 × 12 × 4, or about 1.15 × 10⁷. Thus, more than 10⁷ different antibody molecules can be created from roughly 500 or so different mouse variable-region genes. Including the possibility for Vκ-Jκ joinings occurring within codons adds to this diversity, as does the high rate of somatic mutation associated with the variable-region genes. (Somatic mutations are mutations that arise in diploid cells and are transmitted to the progeny of these cells within the organism, but not to the offspring of the organism.) Clearly, gene rearrangement is a powerful mechanism for dramatically enhancing the protein-coding potential of genetic information.
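The combinatorial estimate can be reproduced directly. The segment counts are the ones quoted in the paragraph; the dictionary keys and variable names are illustrative, and the calculation deliberately ignores junctional imprecision and somatic mutation, so it is a lower bound on the diversity discussed.

```python
# Combinatorial antibody diversity from the segment counts quoted in the text.
# Names are illustrative; junctional and somatic variation are not modeled.
from math import prod

SEGMENTS = {"V_kappa": 300, "J_kappa": 4, "V_H": 200, "D": 12, "J_H": 4}

light_chain_combinations = SEGMENTS["V_kappa"] * SEGMENTS["J_kappa"]
heavy_chain_combinations = SEGMENTS["V_H"] * SEGMENTS["D"] * SEGMENTS["J_H"]
total_combinations = prod(SEGMENTS.values())
germline_segments = sum(SEGMENTS.values())

print(light_chain_combinations, heavy_chain_combinations)
print(total_combinations, germline_segments)
```

Because light and heavy chains pair independently, the total is the product of the two per-chain counts, while the number of germline segments needed grows only as their sum.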

29.5 × The Molecular Nature of Mutation

Genes are normally transmitted unchanged from generation to generation, owing to the great precision and fidelity with which genes are copied during chromosome duplication. However, on rare occasions, genetically heritable changes (mutations) occur that result in altered forms. Most mutated genes function less effectively than the unaltered, wild-type allele, but occasionally mutations arise that give the organism a selective advantage. When this occurs, they are propagated to many offspring. Together with recombination, mutation provides for genetic variability within species and, ultimately, the evolution of new species.
      Mutations change the sequence of bases in DNA, either by the substitution of one base pair for another (so-called point mutations), or by the insertion or deletion of one or more base pairs (insertions and deletions).

Point Mutations

Figure 29.24 × Point mutations due to base mispairings. (a) An example based on tautomeric properties. The rare imino tautomer of adenine base-pairs with cytosine rather than thymine. (1) The normal A-T base pair. (2) The A*-C base pair is possible for the adenine tautomer in which a proton has been transferred from the 6-NH2 of adenine to N-1. (3) Pairing of C with the imino tautomer of A (A*) leads to a transition mutation (A-T to G-C) appearing in the next generation. (b) A in the syn conformation pairing with G (G is in the usual anti conformation). (c) T and C form a base pair by H-bonding interactions mediated by a water molecule.

Point mutations are the class of mutations in which one base pair is substituted for another. The two possible kinds of point mutations are transitions, where one purine (or pyrimidine) is replaced by another, as in A → G (or T → C), and transversions, where a purine is substituted for a pyrimidine or vice versa.
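The distinction between the two kinds of point mutation reduces to a test on base class, sketched below. The function name `classify_substitution` is hypothetical.

```python
# Classify a single-base substitution as a transition or a transversion.
# The function name is an illustrative assumption.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return "transition" or "transversion" for the change ref -> alt."""
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("no substitution: bases are identical")
    both_purines = ref in PURINES and alt in PURINES
    both_pyrimidines = ref in PYRIMIDINES and alt in PYRIMIDINES
    return "transition" if (both_purines or both_pyrimidines) else "transversion"


for ref, alt in [("A", "G"), ("T", "C"), ("A", "T"), ("G", "C")]:
    print(ref, alt, classify_substitution(ref, alt))
```

Of the twelve possible single-base changes, four are transitions and eight are transversions.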
      Point mutations arise by the pairing of bases with inappropriate partners, by the introduction of base analogs into DNA, or by chemical mutagens. Bases may, on rare occasions, mispair (Figure 29.24), either because of their tautomeric properties (see Chapter 11), or because of other influences (such as purines flipping from anti to syn conformations, or H2O molecules serving as bridging H-bond donor/acceptors between two mispaired pyrimidines). Even in mispairing, the C1'-C1' distances between bases must still be close to that of a Watson-Crick base pair (1.1 nm or so; see Figure 11.20) to maintain the mismatched base pair in the double helix. In tautomerization, for example, an amino group (-NH2), usually an H-bond donor, can tautomerize to an imino form (=NH) and become an H-bond acceptor. Or a keto group (C=O), normally an H-bond acceptor, can tautomerize to an enol C-OH, an H-bond donor. Proofreading mechanisms operating during DNA replication catch most mispairings. The frequency of spontaneous mutation in both E. coli and fruit flies (Drosophila melanogaster) is about 10⁻¹⁰ per base pair per replication.
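To put the per-base-pair rate in perspective, it can be scaled to a whole genome. The rate is the one quoted above; the genome size of roughly 4.6 million base pairs for E. coli is an outside assumption not given in this section, so the result is an order-of-magnitude illustration only.

```python
# Scale the spontaneous mutation rate to a whole genome.
# ASSUMPTION: E. coli genome size of about 4.6e6 bp (not stated in the text).

MUTATION_RATE = 1e-10     # per base pair per replication (from the text)
GENOME_SIZE_BP = 4.6e6    # assumed approximate E. coli genome size

mutations_per_replication = MUTATION_RATE * GENOME_SIZE_BP
replications_per_mutation = 1 / mutations_per_replication

print(f"{mutations_per_replication:.1e} expected mutations per genome replication")
print(f"about one mutation every {replications_per_mutation:.0f} replications")
```

Under this assumption, a new point mutation is expected only about once in a couple of thousand genome replications.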

Figure 29.25 × 5-Bromouracil usually favors the keto tautomer that mimics the base-pairing properties of thymine, but it frequently shifts to the enol form, whereupon it can base-pair with guanine, causing a T-A to C-G transition.

 

Mutations Induced by Base Analogs

Figure 29.26 × (a) 2-Aminopurine normally base-pairs with T, but (b) may also pair with cytosine through a single hydrogen bond.

Base analogs that become incorporated into DNA can induce mutations through changes in base-pairing possibilities. Two examples are 5-bromouracil (5-BU) and 2-aminopurine (2-AP). 5-Bromouracil is a thymine analog and becomes inserted into DNA at sites normally occupied by T; its 5-Br group sterically resembles thymine's 5-methyl group. However, because 5-BU frequently assumes the enol tautomeric form and pairs with G instead of A, a point mutation of the transition type may be induced (Figure 29.25). Less often, 5-BU is inserted into DNA at cytosine sites, not T sites. Then, if it base-pairs in its keto form, mimicking T, a C-G to T-A transition ensues. The adenine analog, 2-aminopurine (recall adenine is 6-aminopurine), normally behaves like A and base-pairs with T. However, 2-AP can form a single H bond of sufficient stability with cytosine (Figure 29.26) that occasionally C replaces T in DNA replicating in the presence of 2-AP. Hypoxanthine (Figure 29.27) is an adenine analog that arises in situ in DNA through oxidative deamination of A. Hypoxanthine base-pairs with cytosine, creating an A-T to G-C transition.
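The two 5-bromouracil outcomes can be traced step by step with a minimal pairing model. The tautomer-dependent pairing rule (keto pairs with A, enol pairs with G) is taken from the text; the function name `bu_partner` and the string labels are illustrative.

```python
# Trace the two transitions induced by 5-bromouracil (5-BU).
# Function and variable names are illustrative assumptions.

WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bu_partner(tautomer):
    """Base that pairs with 5-BU: A for the keto form, G for the enol form."""
    return {"keto": "A", "enol": "G"}[tautomer]


# Case 1: 5-BU sits at a T site (opposite A), then templates in its enol form.
new_partner = bu_partner("enol")                 # G is inserted opposite 5-BU
replacement = WATSON_CRICK[new_partner]          # next round: C pairs with that G
case_t_site = f"T-A -> {replacement}-{new_partner}"

# Case 2: 5-BU is inserted at a C site (opposite G), then templates in its keto form.
new_partner = bu_partner("keto")                 # A is inserted opposite 5-BU
replacement = WATSON_CRICK[new_partner]          # next round: T pairs with that A
case_c_site = f"C-G -> {replacement}-{new_partner}"

print(case_t_site)
print(case_c_site)
```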

Figure 29.27 × Oxidative deamination of adenine in DNA yields hypoxanthine, which base-pairs with cytosine, resulting in an A-T to G-C transition.

 

Chemical Mutagens

Figure 29.28 × Chemical mutagens. (a) HNO2 (nitrous acid) converts cytosine to uracil and adenine to hypoxanthine. (b) Nitrosoamines, organic compounds that react to form nitrous acid, also lead to the oxidative deamination of A and C. (c) Hydroxylamine (NH2OH) reacts with cytosine, converting it to a derivative that base-pairs with adenine instead of guanine. The result is a C-G to T-A transition. (d) Alkylation of G residues to give O6-methylguanine, which base-pairs with T. (e) Alkylating agents include nitrosoamines, nitrosoguanidines, nitrosoureas, alkyl sulfates, and nitrogen mustards. Note that nitrosoamines are mutagenic in two ways: they can react to yield HNO2 or they can act as alkylating agents. The nitrosoguanidine, N-methyl-N'-nitro-N-nitrosoguanidine, is a very potent mutagen used in laboratories to induce mutations in experimental organisms such as Drosophila melanogaster. Ethyl methanesulfonate (EMS) and dimethyl sulfate are also favorite mutagens among geneticists.

Chemical mutagens are agents that chemically modify bases so that their base-pairing characteristics are altered. For instance, nitrous acid (HNO2) causes the oxidative deamination of primary amine groups, found in adenine and cytosine. Oxidative deamination of cytosine yields uracil, which base-pairs the way T does and gives a C-G to T-A transition (Figure 29.28a). Hydroxylamine specifically causes C-G to T-A transitions because it reacts specifically with cytosine, converting it to a derivative that base-pairs with adenine instead of guanine (Figure 29.28c). Alkylating agents are also chemical mutagens. Alkylation of reactive sites on the bases with methyl or ethyl groups alters their H-bonding and hence base pairing. For example, methylation of O6 on guanine (giving O6-methylguanine) causes this G to mispair with thymine, resulting in a G-C to A-T transition (Figure 29.28d). Alkylating agents can also induce point mutations of the transversion type. Alkylation of N7 of guanine labilizes its N-glycosidic bond, which leads to elimination of the purine ring, creating a gap in the base sequence. An enzyme, apurinic acid endonuclease, then cleaves the sugar-phosphate backbone of the DNA on the 5'-side, and the gap can be repaired by enzymatic removal of the 5'-sugar phosphate and insertion of a new nucleotide. A transversion results if a pyrimidine nucleotide is inserted in place of the purine during enzymatic repair of this gap. A number of alkylating agents are shown in Figure 29.28e.
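The bookkeeping behind these transitions can be traced in a few lines of code. The following is a minimal sketch, not a model of the chemistry: it assumes each modified base pairs only with the partner named in the text, and the labels "U", "Hx", and "O6-meG" and the function name are chosen here purely for illustration.

```python
# Watson-Crick partners for the four standard DNA bases.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}

# (original base, base it pairs with once modified), as stated in the text.
# The short labels are hypothetical names used only in this sketch.
MODIFIED_BASES = {
    "U": ("C", "A"),       # uracil, from deamination of C, pairs like T
    "Hx": ("A", "C"),      # hypoxanthine, from deamination of A, pairs with C
    "O6-meG": ("G", "T"),  # O6-methylguanine, from methylation of G, pairs with T
}

PURINES = {"A", "G"}


def mutation_from_lesion(lesion):
    """Return (original pair, mutant pair, mutation type) for a modified base."""
    original, partner = MODIFIED_BASES[lesion]
    # Replication places `partner` opposite the lesion. In the next round,
    # `partner` templates its own Watson-Crick complement, which now occupies
    # the position once held by the original base.
    replaced_by = WATSON_CRICK[partner]
    # Purine-for-purine or pyrimidine-for-pyrimidine is a transition.
    same_class = (original in PURINES) == (replaced_by in PURINES)
    kind = "transition" if same_class else "transversion"
    original_pair = f"{original}-{WATSON_CRICK[original]}"
    mutant_pair = f"{replaced_by}-{partner}"
    return original_pair, mutant_pair, kind


for lesion in MODIFIED_BASES:
    before, after, kind = mutation_from_lesion(lesion)
    print(f"{lesion}: {before} to {after} ({kind})")
```

Each of the three lesions replaces a purine with a purine or a pyrimidine with a pyrimidine, which is why all three are classed as transitions. The transversions described above arise by a different route, through loss of the purine and repair of the resulting gap, and are not covered by this sketch.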

Insertions and Deletions

The addition or removal of one or more base pairs leads to insertion or deletion mutations, respectively. Unless the number of base pairs added or removed is a multiple of three, either change shifts the triplet reading frame of codons, causing a frameshift mutation (misincorporation of all subsequent amino acids) in the protein encoded by the gene. Such mutations can arise when flat aromatic molecules such as acridine orange (see Figure 12.16) insert themselves between successive bases in one or both strands of the double helix. This insertion or, more aptly, intercalation, doubles the distance between the bases as measured along the helix axis. This distortion of the DNA results in bases being inappropriately inserted or deleted when the DNA is replicated. Disruptions that arise from the insertion of a transposon within a gene also fall in this category of mutation.
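The effect of a shifted reading frame is easy to see by translating a short sequence before and after a one-base change. The sketch below assumes the standard genetic code and reads the coding strand directly as DNA; the 18-base sequence and the function name are invented for illustration and do not correspond to any real gene.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base, stopping at a stop codon.

    A trailing incomplete codon (one or two bases) is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence: ATG GCT CAA AAA CTG TAA
original = "ATGGCTCAAAAACTGTAA"
plus_one = original[:3] + "C" + original[3:]      # one base inserted
minus_one = original[:3] + original[4:]           # one base deleted
plus_three = original[:3] + "CCC" + original[3:]  # one whole codon inserted

print(translate(original))    # MAQKL
print(translate(plus_one))    # MRSKTV
print(translate(minus_one))   # MLKNC
print(translate(plus_three))  # MPAQKL
```

A single inserted or deleted base changes every amino acid after the site of the change and, in this example, also removes the original stop codon from the reading frame. Inserting three bases adds one amino acid but leaves the rest of the sequence intact.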

Human Biochemistry

Prions: Proteins as Genetic Agents?
Prion is an acronym derived from the words "protein infectious particle." Prions are transmissible agents ("genetic material"?) that are apparently composed only of a protein that has adopted an abnormal conformation. The term prion was coined to distinguish such protein infectious particles capable of causing disease from nucleic acid-containing infectious agents such as viruses; attempts to show that prion infectivity depends on a nucleic acid-carrying agent have been unsuccessful. Prion diseases are novel in that they are genetic and infectious; their occurrence may be sporadic, dominantly inherited, or acquired by infection. Their inheritability questions the principle that nucleic acids are the sole genetic agents.
      PrP, the prion protein, comes in various forms, such as PrPc, the normal cellular prion protein, and PrPsc, the scrapie form of PrP, a conformational variant of PrPc that is protease-resistant. These two forms are thought to differ only in terms of their secondary structure, with PrPc dominated by α-helical elements (figure, a), and PrPsc having both α-helices and β-strands (figure, b). It has been hypothesized that the presence of PrPsc can cause PrPc to adopt the PrPsc conformation. The various diseases are a consequence of the accumulation of the abnormal PrPsc form, which accumulates as amyloid plaques (amyloid = starch-like), causing vacuolarization of tissues in the central nervous system. The 1997 Nobel Prize in Physiology or Medicine was awarded to Stanley B. Prusiner for his discovery of prions.
(Adapted from Figure 1 in Prusiner, S. B. (1996). Molecular biology and the pathogenesis of prion diseases. Trends in Biochemical Sciences 21:482-487.)

 

29.6 × RNA as Genetic Material

Whereas the genetic material of cells is double-stranded DNA, virtually all plant viruses, several bacteriophages, and many animal viruses have genomes consisting of RNA. In most cases, this RNA is single-stranded. Viruses with single-stranded genomes use the single strand as a template for synthesis of a complementary strand, which can then serve as template in replicating the original strand. Retroviruses are an interesting group of eukaryotic viruses having single-stranded RNA genomes that replicate through a double-stranded DNA intermediate. Further, the life cycle of retroviruses includes an obligatory step in which the dsDNA is inserted into the host cell genome in a transposition event. Retroviruses are responsible for many diseases, including tumors and other disorders. HIV-1, the human immunodeficiency virus that causes AIDS, is a retrovirus.
      Tobacco mosaic virus (TMV), an RNA virus infecting plants, was instrumental in establishing that nucleic acids are the substance of heredity. TMV has a molecular mass of 40 × 10^3 kD and consists of an RNA genome (3 × 10^3 kD) packaged in a protein coat made of 2130 identical protein chains of 18 kD each (see Figure 1.24). In 1956, Gierer and Schramm demonstrated that the RNA itself was able to produce viral lesions on the surfaces of tobacco leaves, if the leaf surface was lightly scratched so the RNA could gain access to the cells. In 1957, Fraenkel-Conrat and Singer used two different strains of the virus, HR and TMV, and reconstructed virus particles in vitro by mixing isolated proteins and RNAs in the four possible combinations:

TMV protein + HR RNA

TMV protein + TMV RNA

HR protein + HR RNA

HR protein + TMV RNA

These reconstituted virus particles were infective, and, when the virus progeny obtained after their infection of host plants were examined, it was found that the protein coat borne by the progeny virus particles was determined by the source of RNA in the virus infecting the plant: TMV RNA always yielded TMV protein coats in the progeny; HR RNA yielded HR protein coats. This experiment was early proof that nucleic acids, not proteins, are the repository of genetic information.
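Two points from the TMV discussion can be checked with a short calculation: the particle mass quoted above is consistent with its stated components, and the outcome of the reconstitution experiment follows a simple rule. The sketch below uses only the figures given in the text; the variable and function names are chosen for illustration.

```python
# Masses in kD, taken from the figures quoted in the text.
COAT_SUBUNITS = 2130
SUBUNIT_MASS_KD = 18
RNA_MASS_KD = 3_000

protein_mass = COAT_SUBUNITS * SUBUNIT_MASS_KD  # 38,340 kD of coat protein
particle_mass = protein_mass + RNA_MASS_KD      # 41,340 kD, roughly 40 x 10^3 kD


def progeny_coat(coat_source, rna_source):
    """Coat protein found on progeny virus after infection.

    Encodes the experimental result: the RNA alone determines the progeny
    coat, so the coat of the infecting particle is deliberately ignored.
    """
    return rna_source


STRAINS = ("TMV", "HR")
results = {
    (coat, rna): progeny_coat(coat, rna)
    for coat in STRAINS
    for rna in STRAINS
}

print(f"particle mass = {particle_mass} kD")
for (coat, rna), progeny in results.items():
    print(f"{coat} protein + {rna} RNA -> progeny with {progeny} coat")
```

The two mixed particles are the informative cases: if protein carried the genetic information, a particle with a TMV coat and HR RNA would yield progeny with TMV coats, which is not what was observed.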

29.7 Transgenic Animals

Figure 29.29 × Transfection can introduce new genes into animals. The rat growth hormone gene carried on a plasmid is injected into a mouse oocyte or fertilized egg that is then implanted in a receptive female mouse. Integration of the plasmid into the mouse genome can be ascertained by Southern analysis of DNA from the newborn mouse. Expression of the foreign gene can be determined by assaying for the gene product, in this case, rat growth hormone.

An exciting new advance in gene transfer techniques is the ability to introduce genes into animals by transfection. Transfection is defined as the uptake or injection of plasmid DNA into recipient cells. Animals that have acquired new genetic information as a consequence of the introduction of foreign genes are termed transgenic. The methodology involves the injection of plasmids carrying the gene of interest into the nucleus of an oocyte or fertilized egg, followed by implantation of the egg into a receptive female. The technique has been perfected for mice (Figure 29.29). In a small number of cases, 10% or so, the mice that develop from the injected eggs carry the transfected gene integrated into a single chromosomal site. The gene is subsequently inherited by the progeny of the transfected animal as if it were a normal gene. Expression of the donor gene in the transgenic animals is variable because the gene is randomly integrated into the host genome and gene expression is often influenced by chromosomal location. Nevertheless, transfection of animals has produced some startling results, as in the case of the transfection of mice with the gene encoding the rat growth hormone (rGH). The transgenic mice grew to nearly twice the normal size (Figure 29.30). Growth hormone levels in these animals were several hundred times greater than normal. Similar results were obtained in transgenic mice transfected with the human growth hormone (hGH) gene.
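The practical consequence of a 10% integration rate can be estimated with elementary probability. The sketch below assumes that each mouse developing from an injected egg independently has a 10% chance of carrying the integrated gene; the independence assumption, the 95% target, and the function names are introduced here for illustration and are not part of the published protocol.

```python
import math


def prob_at_least_one(n_mice, p_integration=0.10):
    """Probability that at least one of n_mice carries the transgene."""
    return 1.0 - (1.0 - p_integration) ** n_mice


def mice_needed(target_prob, p_integration=0.10):
    """Smallest number of mice giving at least target_prob of one transgenic."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - p_integration))


n = mice_needed(0.95)
print(n)                               # 29
print(round(prob_at_least_one(n), 3))  # 0.953
```

Under these assumptions, about 29 mice must develop from injected eggs to be 95% sure of obtaining at least one transgenic animal.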

 

Figure 29.30 × Photograph showing a transgenic mouse with an active rat growth hormone gene (left). This transgenic mouse is twice the size of a normal mouse (right). (Photo courtesy of Ralph L. Brinster, School of Veterinary Medicine, University of Pennsylvania.)


      The biotechnology of transfection has been extended to farm animals, and transgenic chickens, cows, pigs, rabbits, sheep, and even fish have been produced. The first animal cloned from an adult cell, a sheep named Dolly, represented a milestone in cloning technology. Subsequently, a transgene construct has been used to incorporate the human gene encoding blood coagulation Factor IX into fetal sheep fibroblast cells, and the nuclei from these cells were successfully transferred into sheep oocytes lacking nuclei. These transgenic oocytes were placed in the uterus of receptive female sheep, which subsequently gave birth to transgenic lambs. The Factor IX transgene construct was specifically designed so that Factor IX protein, a medically useful product for the treatment of hemophiliacs, would be expressed in the milk of the transgenic sheep. Similar successes in cows, which produce much more milk, have brought the potential for commercial production of virtually any protein into the realm of reality.
      Such genetic engineering is anticipated eventually to have a major impact on human health. The human genes encoding the α- and β-globin chains of hemoglobin have recently been microinjected into fertilized mouse eggs, and the transgenic mice that developed contained authentic human hemoglobin. Human Hb isolated from transgenic mouse erythrocytes had an oxygen-binding curve identical to that of human HbA, demonstrating that functional human hemoglobin can be synthesized in mice. Transgenic pigs producing human Hb are touted as a source of "human blood substitute" potentially useful during surgical procedures. Such transfection technology also holds promise as a mechanism for "gene therapy" by replacing defective genes in animals with functional genes (Chapter 13). Problems concerning integration and regulation of the transfected gene, including its appropriate expression in the right cells at the proper time during development and growth of the organism, must be brought under control before such therapy becomes commonplace.

A Deeper Look
"Knockout" Mice: A Method To Investigate the Essentiality of a Gene
Homologous recombination can be used to replace a gene with an inactivated equivalent of itself. Inactivation is accomplished by inserting a foreign gene, such as neo, a gene encoding resistance to the drug G418, within one of the exons of a copy of the gene of interest. Homologous recombination between the neo-bearing transgene and DNA in wild-type mouse embryonic stem (ES) cells replaces the target gene with the inactive transgene (figure). ES cells in which homologous recombination has occurred will be resistant to G418, and such cells can be selected. These recombinant ES cells can then be injected into early-stage mouse embryos, where they have a chance of becoming the germline cells of the newborn mouse. If they do, within the gametes of this mouse will be an inactivated target gene. Mating between male and female mice with inactive target genes yields a generation that includes homozygous "knockout" mice, that is, mice lacking a functional copy of the targeted gene. Characterization of these "knockout" mice reveals which physiological functions the gene directs.
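The expected yield of homozygous knockout mice from such a mating follows from simple Mendelian segregation. The sketch below assumes both parents are heterozygous carriers, with one wild-type allele (written "+") and one inactivated allele (written "-"), and that homozygous knockouts are viable; the notation and the function name are chosen for illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Expected genotype fractions among offspring of two diploid parents.

    Each parent is a two-character string of alleles and contributes either
    allele with equal probability.
    """
    offspring = Counter(
        "".join(sorted(allele1 + allele2))
        for allele1, allele2 in product(parent1, parent2)
    )
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


fractions = cross("+-", "+-")
print(fractions)  # {'++': 0.25, '+-': 0.5, '--': 0.25}
```

Under these assumptions only about one-quarter of the offspring are expected to be homozygous knockouts ("--"), so the litter must be genotyped to identify them. If the targeted gene is essential for development, the homozygous class may be missing altogether, which is itself evidence of the gene's essentiality.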