The Chimera of Arezzo, of Etruscan origin and probably from the 5th century B.C., was found near Arezzo, Italy, in 1553. Chimeric animals existed only in the imagination of the ancients. But the ability to create chimeric DNA molecules is a very real technology that has opened up a whole new field of scientific investigation. (Scala/Art Resource, Chimera, Museo Archeologico, Florence, Italy)

Chapter 13

Recombinant DNA: Cloning and Creation of Chimeric Genes

In the early 1970s, technologies for the laboratory manipulation of nucleic acids emerged. In turn, these technologies led to the construction of DNA molecules composed of nucleotide sequences taken from different sources. The products of these innovations, recombinant DNA molecules,1 opened exciting new avenues of investigation in molecular biology and genetics, and a new field was born—recombinant DNA technology. Genetic engineering is the application of this technology to the manipulation of genes. These advances were made possible by methods for amplification of any particular DNA segment, regardless of source, within bacterial host cells. Or, in the language of recombinant DNA technology, the cloning of virtually any DNA sequence became feasible.

13.1 · Cloning

In classical biology, a clone is a population of identical organisms derived from a single parental organism. For example, the members of a colony of bacterial cells that arise from a single cell on a petri plate are clones. Molecular biology has borrowed the term to mean a collection of molecules or cells all identical to an original molecule or cell. So, if the original cell on the petri plate harbored a recombinant DNA molecule in the form of a plasmid, the plasmids within the millions of cells in a bacterial colony represent a clone of the original DNA molecule, and these molecules can be isolated and studied. Furthermore, if the cloned DNA molecule is a gene (or part of a gene), that is, it encodes a functional product, a new avenue to isolating and studying this product has opened. Recombinant DNA methodology offers exciting new vistas in biochemistry.

Plasmids

Plasmids are naturally occurring, circular, extrachromosomal DNA molecules (see Chapter 12). Natural strains of the common colon bacterium Escherichia coli isolated from various sources harbor diverse plasmids. Often these plasmids carry genes specifying novel metabolic activities that are advantageous to the host bacterium. These activities range from catabolism of unusual organic substances to metabolic functions that endow the host cells with resistance to antibiotics, heavy metals, or bacteriophages. Plasmids that are able to perpetuate themselves in E. coli, the bacterium favored by bacterial geneticists and molecular biologists, have become the darlings of recombinant DNA technology. Because restriction endonuclease digestion of plasmids can generate fragments with overlapping or “sticky” ends, artificial plasmids can be constructed by ligating different fragments together. Such artificial plasmids were among the earliest recombinant DNA molecules. These recombinant molecules can be autonomously replicated, and hence propagated, in suitable bacterial host cells, provided they still possess a site signaling where DNA replication can begin (a so-called origin of replication or ori sequence).

Plasmids as Cloning Vectors

The idea arose that “foreign” DNA sequences could be inserted into artificial plasmids and that these foreign sequences would be carried into E. coli and propagated as part of the plasmid. That is, these plasmids could serve as cloning vectors to carry genes. (The word vector is used here in the sense of “a vehicle or carrier.”) Plasmids useful as cloning vectors possess three common features: a replicator, a selectable marker, and a cloning site (Figure 13.1). A replicator is an origin of replication, or ori. The selectable marker is typically a gene conferring resistance to an antibiotic. Only those cells containing the cloning vector will grow in the presence of the antibiotic. Therefore, growth on antibiotic-containing media “selects for” plasmid-containing cells. Typically, the cloning site is a sequence of nucleotides representing one or more restriction endonuclease cleavage sites. Cloning sites are located where the insertion of foreign DNA neither disrupts the plasmid’s ability to replicate nor inactivates essential markers.

Figure 13.1 One of the first widely used cloning vectors, the plasmid pBR322. This 4363-bp plasmid contains an origin of replication (ori) and genes encoding resistance to the drugs ampicillin (amp^r) and tetracycline (tet^r). The locations of restriction endonuclease cleavage sites are indicated.

 

Virtually Any DNA Sequence Can Be Cloned

Nuclease cleavage at a restriction site opens, or linearizes, the circular plasmid so that a foreign DNA fragment can be inserted. The ends of this linearized plasmid are joined to the ends of the fragment so that the circle is closed again, creating a recombinant plasmid (Figure 13.2).

Figure 13.2 Foreign DNA sequences can be inserted into plasmid vectors by opening the circular plasmid with a restriction endonuclease. The ends of the linearized plasmid DNA are then joined with the ends of a foreign sequence, reclosing the circle to create a chimeric plasmid.

Recombinant plasmids are hybrid DNA molecules consisting of plasmid DNA sequences plus inserted DNA elements (called inserts). Such hybrid molecules are also called chimeric constructs or chimeric plasmids. (The term chimera is borrowed from mythology and refers to a beast composed of the body and head of a lion, the heads of a goat and a snake, and the wings of a bat.) The presence of foreign DNA sequences does not adversely affect replication of the plasmid, so chimeric plasmids can be propagated in bacteria just like the original plasmid. Bacteria often harbor several hundred copies of common cloning vectors per cell. Hence, large amounts of a cloned DNA sequence can be recovered from bacterial cultures. The enormous power of recombinant DNA technology stems in part from the fact that virtually any DNA sequence can be selectively cloned and amplified in this manner. DNA sequences that are difficult to clone include inverted repeats, origins of replication, centromeres, and telomeres. Apart from such sequences, the only practical limitation is the size of the foreign DNA segment: most plasmids with inserts larger than about 10 kbp are not replicated efficiently.
            Bacterial cells may harbor one or many copies of a particular plasmid, depending on the nature of the plasmid replicator. That is, plasmids are classified as high copy number or low copy number. The copy number of most genetically engineered plasmids is high (200 or so), but some are lower.

Construction of Chimeric Plasmids

Creation of chimeric plasmids requires joining the ends of the foreign DNA insert to the ends of a linearized plasmid (Figure 13.2). This ligation is facilitated if the ends of the plasmid and the insert have complementary, single-stranded overhangs. Then these ends can base-pair with one another, annealing the two molecules together. One way to generate such ends is to cleave the DNA with restriction enzymes that make staggered cuts; many such restriction endonucleases are available (see Table 11.5). For example, if the sequence to be inserted is an EcoRI fragment and the plasmid is cut with EcoRI, the single-stranded sticky ends of the two DNAs can anneal (Figure 13.3).

 

Figure 13.3 Restriction endonuclease EcoRI cleaves double-stranded DNA. The recognition site for EcoRI is the hexameric sequence GAATTC:

5' .  . . NpNpNpNpGpApApTpTpCpNpNpNpNp . . . 3'

3' .  . . NpNpNpNpCpTpTpApApGpNpNpNpNp . . . 5'

Cleavage occurs at the G residue on each strand so that the DNA is cut in a staggered fashion, leaving 5'-overhanging single-stranded ends (sticky ends):

5' .  . . NpNpNpNpG     pApApTpTpCpNpNpNpNp . . . 3'

3' .  . . NpNpNpNpCpTpTpApAp     GpNpNpNpNp . . . 5'

An EcoRI restriction fragment of foreign DNA can be inserted into a plasmid having an EcoRI cloning site by (a) cutting the plasmid at this site with EcoRI, annealing the linearized plasmid with the EcoRI foreign DNA fragment, and (b) sealing the nicks with DNA ligase.

 

The interruptions in the sugar-phosphate backbone of DNA can then be sealed with DNA ligase to yield a covalently closed, circular chimeric plasmid. DNA ligase is an enzyme that covalently links adjacent 3'-OH and 5'-PO4 groups. An inconvenience of this strategy is that any pair of EcoRI sticky ends can anneal with each other. So, plasmid molecules can reanneal with themselves, as can the foreign DNA restriction fragments. These DNAs can be eliminated by selection schemes designed to identify only those bacteria containing chimeric plasmids.
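The staggered cut described above can be sketched in a few lines of Python. This is only an illustration: the input sequence is hypothetical, the function names are invented here, and only the top strand is tracked. Each cut is placed after the G of GAATTC, so every internal fragment begins with the AATT of the 5'-overhang.

```python
SITE = "GAATTC"   # EcoRI recognition sequence (from the text)
CUT_OFFSET = 1    # EcoRI cuts after the G on each strand

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def ecori_digest(top_strand: str) -> list[str]:
    """Return top-strand fragments of a linear DNA cut at every EcoRI site."""
    seq = top_strand.upper()
    fragments = []
    start = 0
    pos = seq.find(SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET          # position just after the G
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical sequence containing two EcoRI sites (illustration only).
dna = "CCGTGAATTCAGGCTTGAATTCTTAC"
fragments = ecori_digest(dna)
for fragment in fragments:
    print(fragment)
```

The overhang AATT is its own reverse complement, which is why any EcoRI end can anneal with any other EcoRI end, including the two ends of the same linearized plasmid.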
            Blunt-end ligation is an alternative method for joining different DNAs. This method depends on the ability of phage T4 DNA ligase to covalently join the ends of any two DNA molecules (even those lacking 3'- or 5'-overhangs) (Figure 13.4). Some restriction endonucleases cut DNA so that blunt ends are formed (see Table 11.5). Because there is no control over which pair of DNAs are blunt-end ligated by T4 DNA ligase, strategies to identify the desired products must be applied.

Figure 13.4 Blunt-end ligation using phage T4 DNA ligase, which catalyzes the ATP-dependent ligation of DNA molecules. AMP and PPi are by-products.
            A great number of variations on these basic themes have emerged. For example, short synthetic DNA duplexes whose nucleotide sequence consists of little more than a restriction site can be blunt-end ligated onto any DNA. These short DNAs are known as linkers. Cleavage of the ligated DNA with the restriction enzyme then leaves tailor-made sticky ends useful in cloning reactions (Figure 13.5). Similarly, many vectors contain a polylinker cloning site, a short region of DNA sequence bearing numerous restriction sites.

Figure 13.5 (a) The use of linkers to create tailor-made ends on cloning fragments. Synthetic oligonucleotide duplexes whose sequences represent EcoRI restriction sites are blunt-end ligated to a DNA molecule using T4 DNA ligase. Note that the ligation reaction can add multiple linkers on each end of the blunt-ended DNA. EcoRI digestion removes all but the terminal one, leaving the desired 5'-overhangs. (b) Cloning vectors often have polylinkers consisting of a multiple array of restriction sites at their cloning sites, so restriction fragments generated by a variety of endonucleases can be incorporated into the vector. Note that the polylinker is engineered not only to have multiple restriction sites but also to have an uninterrupted sequence of codons, so this region of the vector has the potential for translation into protein. The sequence shown is the cloning site for the vectors M13mp7 and pUC7; the colored amino acid residues are contiguous with the coding sequence of the lacZ gene carried by this vector (see Figure 13.18). (a, Adapted from Figure 3.16.3; b, adapted from Figure 1.14.2, in Ausubel, F. M., et al., 1987, Current Protocols in Molecular Biology. New York: John Wiley & Sons.)

 

Promoters and Directional Cloning

Note that the strategies discussed thus far create hybrids in which the orientation of the DNA insert within the chimera is random. Sometimes it is desirable to insert the DNA in a particular orientation. For example, an experimenter might wish to insert a particular DNA (a gene) in a vector so that its gene product is synthesized. To do this, the DNA must be placed downstream from a promoter. A promoter is a nucleotide sequence lying upstream of a gene that controls expression of the gene. RNA polymerase molecules bind specifically at promoters and initiate transcription of adjacent genes, copying template DNA into RNA products. One way to insert DNA so that it will be properly oriented with respect to the promoter is to create DNA molecules whose ends have different overhangs. Ligation of such molecules into the plasmid vector can only take place in one orientation, to give directional cloning (Figure 13.6).

Figure 13.6 Directional cloning. DNA molecules whose ends have different overhangs can be used to form chimeric constructs in which the foreign DNA can enter the plasmid in only one orientation. The foreign DNA is digested with two different restriction enzymes (HindIII and BamHI), and the plasmid is digested with the same two enzymes. Note that pUC19 has a polylinker or universal cloning site (see Figure 13.5b); pUC stands for universal cloning plasmid.
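The logic of directional cloning can be illustrated with a small Python sketch. It assumes a simplified model in which two sticky ends are compatible only when they carry the same overhang (overhangs from palindromic sites are self-complementary); the function name and the tuple representation of DNA ends are invented for this illustration.

```python
# 5'-overhangs left by each enzyme (standard values for these three enzymes).
OVERHANG = {"EcoRI": "AATT", "HindIII": "AGCT", "BamHI": "GATC"}


def orientations(vector_ends: tuple[str, str], insert_ends: tuple[str, str]) -> list[str]:
    """List the orientations in which an insert can ligate into a cut vector.

    Each argument names the enzymes that produced the (left, right) ends.
    """
    left_v, right_v = vector_ends
    left_i, right_i = insert_ends
    allowed = []
    if OVERHANG[left_v] == OVERHANG[left_i] and OVERHANG[right_v] == OVERHANG[right_i]:
        allowed.append("forward")
    if OVERHANG[left_v] == OVERHANG[right_i] and OVERHANG[right_v] == OVERHANG[left_i]:
        allowed.append("reverse")
    return allowed


# One enzyme at both ends: the insert can enter either way round.
single = orientations(("EcoRI", "EcoRI"), ("EcoRI", "EcoRI"))

# Two different enzymes, as in Figure 13.6: only one orientation fits.
double = orientations(("HindIII", "BamHI"), ("HindIII", "BamHI"))

print(single)
print(double)
```

In this model a single-enzyme digest permits both orientations, whereas the HindIII/BamHI double digest permits only one, which is the point of the strategy.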

 

Biologically Functional Chimeric Plasmids

The first biologically functional chimeric DNA molecules constructed in vitro were assembled from parts of different plasmids in 1973 by Stanley Cohen, Annie Chang, Herbert Boyer, and Robert Helling. These plasmids were used to transform recipient E. coli cells (transformation means the uptake and replication of exogenous DNA by a recipient cell; see Chapter 29). The bacterial cells were rendered somewhat permeable to DNA by Ca2+ treatment and a brief 42°C heat shock. Although less than 0.1% of the Ca2+-treated bacteria became competent for transformation following such treatment, transformed bacteria could be selected by their resistance to certain antibiotics (Figure 13.7). Consequently, the chimeric plasmids must have been biologically functional in at least two aspects: they replicated stably within their hosts and they expressed the drug resistance markers they carried.

 

 

 

Figure 13.7 A typical bacterial transformation experiment. Here the plasmid pBR322 is the cloning vector. (1) Cleavage of pBR322 with restriction enzyme BamHI, followed by (2) annealing and ligation of inserts generated by BamHI cleavage of some foreign DNA, (3) creates a chimeric plasmid. (4) The chimeric plasmid is then used to transform Ca2+-treated heat-shocked E. coli cells, and the bacterial sample is plated on a petri plate. (5) Following incubation of the petri plate overnight at 37°C, (6) colonies of amp^r bacteria are evident. (7) Replica plating of these bacteria on plates of tetracycline-containing media (8) reveals which colonies are tet^r and which are tetracycline sensitive (tet^s). Only the tet^s colonies possess plasmids with foreign DNA inserts.
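The screening logic of this experiment amounts to a two-question decision, sketched below in Python. The sketch assumes, as the figure implies, that an insert at the BamHI site inactivates tetracycline resistance while leaving ampicillin resistance intact. The colony names and growth results are hypothetical.

```python
def classify_colony(grows_on_amp: bool, grows_on_tet: bool) -> str:
    """Interpret replica-plating results for cells transformed with pBR322."""
    if not grows_on_amp:
        # Such cells never form a colony on the ampicillin plate.
        return "no plasmid"
    if grows_on_tet:
        # Both resistance genes are intact, so the vector closed without an insert.
        return "plasmid without insert"
    # Ampicillin resistant but tetracycline sensitive.
    return "chimeric plasmid"


# Hypothetical growth results: (grows on ampicillin, grows on tetracycline).
colonies = {"c1": (True, True), "c2": (True, False), "c3": (True, True)}
results = {name: classify_colony(*growth) for name, growth in colonies.items()}
print(results)
```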

            In general, plasmids used as cloning vectors are engineered to be small, 2.5 kbp to about 10 kbp in size, so that the size of the insert DNA can be maximized. These plasmids have only a single origin of replication, so the time necessary for complete replication depends on the size of the plasmid. Under selective pressure in a growing culture of bacteria, overly large plasmids are prone to delete any nonessential “genes,” such as any foreign inserts. Such deletion would thwart the purpose of most cloning experiments. The useful upper limit on cloned inserts in plasmids is about 10 kbp. Many eukaryotic genes exceed this size.

Figure 13.8 Electron micrograph of bacteriophage λ. (Robley C. Williams, University of California/BPS)

Bacteriophage λ as a Cloning Vector

The genome of bacteriophage λ (lambda) (Figure 13.8) is a 48.5-kbp linear DNA molecule that is packaged into the head of the bacteriophage. The middle one-third of this genome is not essential to phage infection, so λ phage DNA has been engineered so that foreign DNA molecules up to 16 kbp can be inserted into this region for cloning purposes. In vitro packaging systems are then used to package the chimeric DNA into phage heads which, when assembled with phage tails, form infective phage particles. Bacteria infected with these recombinant phage produce large numbers of phage progeny before they lyse, and large amounts of recombinant DNA can be easily purified from the lysate.

Cosmids

The DNA incorporated into phage heads by bacteriophage λ packaging systems must satisfy only a few criteria. It must possess a 14-bp sequence known as cos (which stands for cohesive end site) at each of its ends, and these cos sequences must be separated by no fewer than 36 kbp and no more than 51 kbp of DNA. Essentially any DNA satisfying these minimal requirements will be packaged and assembled into an infective phage particle. Other cloning features such as an ori, selectable markers, and a polylinker are joined to the cos sequence so that the cloned DNA can be propagated and selected in host cells. These features have been achieved by placing cos sequences on either side of cloning sites in plasmids to create cosmid vectors that are capable of carrying DNA inserts about 40 kbp in size (Figure 13.9). Because cosmids lack essential phage genes, they reproduce in host bacteria as plasmids.

Figure 13.9 Cosmid vectors for cloning large DNA fragments. (a) Cosmid vectors are plasmids that carry a selectable marker such as amp^r, an origin of replication (ori), a polylinker suitable for insertion of foreign DNA, and (b) a cos sequence. Both the plasmid and the foreign DNA to be cloned are cut with a restriction enzyme, and the two DNAs are then ligated together. (c) The ligation reaction leads to the formation of hybrid concatamers, molecules in which plasmid sequences and foreign DNAs are linked in series in no particular order. The bacteriophage λ packaging extract contains the restriction enzyme that recognizes cos sequences and cleaves at these sites. (d) DNA molecules of the proper size (36 to 51 kbp) are packaged into phage heads, forming infective phage particles. (e) The cos sequence is

5'-TACGGGGCGGCGACCTCGCG-3'

3'-ATGCCCCGCCGCTGGAGCGC-5'

Endonuclease cleavage at the sites indicated by arrows leaves 12-bp cohesive ends. (a-d, Adapted from Figure 1.10.7 in Ausubel, F. M., et al., eds., 1987. Current Protocols in Molecular Biology. New York: John Wiley & Sons; e, from Figure 4 in Murialdo, H., 1991. Annual Review of Biochemistry 60:136.)
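The size constraint on packaging translates directly into a range of acceptable insert sizes. The Python sketch below assumes that the DNA between two consecutive cos sites in a concatamer consists of one vector copy plus one insert. The 5-kbp vector size is a hypothetical value chosen for illustration, and the function names are invented here.

```python
MIN_COS_SPACING_KBP = 36   # lower packaging limit from the text
MAX_COS_SPACING_KBP = 51   # upper packaging limit from the text


def is_packageable(vector_kbp: float, insert_kbp: float) -> bool:
    """True if vector plus insert falls within the cos-to-cos packaging window."""
    total = vector_kbp + insert_kbp
    return MIN_COS_SPACING_KBP <= total <= MAX_COS_SPACING_KBP


def insert_range(vector_kbp: float) -> tuple[float, float]:
    """Smallest and largest insert (kbp) that a vector of this size can carry."""
    smallest = max(0, MIN_COS_SPACING_KBP - vector_kbp)
    largest = MAX_COS_SPACING_KBP - vector_kbp
    return smallest, largest


# Hypothetical 5-kbp cosmid vector.
print(insert_range(5))
print(is_packageable(5, 40), is_packageable(5, 10))
```

Under these assumptions a small insert is not merely tolerated poorly but excluded outright, so packaging itself selects for large inserts.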

Shuttle Vectors

Shuttle vectors are plasmids capable of propagating and transferring (“shuttling”) genes between two different organisms, one of which is typically a prokaryote (E. coli) and the other a eukaryote (for example, yeast). Shuttle vectors must have unique origins of replication for each cell type as well as different markers for selection of transformed host cells harboring the vector (Figure 13.10). Shuttle vectors have the advantage that eukaryotic genes can be cloned in bacterial hosts, yet the expression of these genes can be analyzed in appropriate eukaryotic backgrounds.

Figure 13.10 A typical shuttle vector. This vector has both yeast and bacterial origins of replication, amp^r (ampicillin resistance gene for selection in E. coli) and LEU2+, a gene in the yeast pathway for leucine biosynthesis. The recipient yeast cells are LEU2- (defective in this gene) and thus require leucine for growth. LEU2- yeast cells transformed with this shuttle vector can be selected on medium lacking any leucine supplement. (Adapted from Figure 19-5 in Watson, J. D., et al., 1987. The Molecular Biology of the Gene. Menlo Park, CA: Benjamin-Cummings.)

 

Artificial Chromosomes

DNA molecules 2 megabase pairs in length have been successfully propagated in yeast by creating yeast artificial chromosomes or YACs. Further, such YACs have been transferred into transgenic mice for the analysis of large genes or multigenic DNA sequences in vivo, that is, within the living animal. For these large DNAs to be replicated in the yeast cell, YAC constructs must include not only an origin of replication (known in yeast terminology as an autonomously replicating sequence or ARS) but also a centromere and telomeres. Recall that centromeres provide the site for attachment of the chromosome to the spindle during mitosis and meiosis, and telomeres are nucleotide sequences defining the ends of chromosomes. Telomeres are essential for proper replication of the chromosome.

13.2 · DNA Libraries

A DNA library is a set of cloned fragments that collectively represent the genes of a particular organism. Particular genes can be isolated from DNA libraries, much as books can be obtained from conventional libraries. The secret is knowing where and how to look.

Genomic Libraries

Any particular gene constitutes only a small part of an organism’s genome. For example, if the organism is a mammal whose entire genome encompasses some 10^6 kbp and the gene is 10 kbp, then the gene represents only 0.001% of the total nuclear DNA. It is impractical to attempt to recover such rare sequences directly from isolated nuclear DNA because of the overwhelming amount of extraneous DNA sequences. Instead, a genomic library is prepared by isolating total DNA from the organism, digesting it into fragments of suitable size, and cloning the fragments into an appropriate vector. This approach is called shotgun cloning because the strategy has no way of targeting a particular gene but instead seeks to clone all the genes of the organism at one time. The intent is that at least one recombinant clone will contain at least part of the gene of interest. Usually, the isolated DNA is only partially digested by the chosen restriction endonuclease so that not every restriction site is cleaved in every DNA molecule. Then, even if the gene of interest contains a susceptible restriction site, some intact genes might still be found in the digest. Genomic libraries have been prepared from hundreds of different species.
            Many clones must be created to be confident that the genomic library contains the gene of interest. The probability, P, that some number of clones, N, contains a particular fragment representing a fraction, f, of the genome is

P = 1 - (1 - f)^N

Thus,

N = ln(1 - P)/ln(1 - f)

For example, if the library consists of 10-kbp fragments of the E. coli genome (4640 kbp total), over 2000 individual clones must be screened to have a 99% probability (P = 0.99) of finding a particular fragment. Since f = 10/4640 = 0.0022 and P = 0.99, N = 2093. For a 99% probability of finding a particular sequence within the 3 × 10^6 kbp human genome, N would equal almost 1.4 million if the cloned fragments averaged 10 kbp in size. The need for cloning vectors capable of carrying very large DNA inserts becomes obvious from these numbers.
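The library-size calculation is easy to reproduce. The Python sketch below evaluates N = ln(1 - P)/ln(1 - f) for the two genomes discussed above; the function name is an invention for this illustration. Because it uses the unrounded value of f rather than 0.0022, the E. coli result differs slightly from the figure quoted in the text, although it remains "over 2000."

```python
import math


def clones_needed(p: float, fragment_kbp: float, genome_kbp: float) -> int:
    """Number of clones N giving probability p of including a given fragment."""
    f = fragment_kbp / genome_kbp            # fraction of the genome per clone
    return math.ceil(math.log(1 - p) / math.log(1 - f))


ecoli = clones_needed(0.99, 10, 4640)        # E. coli genome, 10-kbp fragments
human = clones_needed(0.99, 10, 3e6)         # human genome, 10-kbp fragments
human_40 = clones_needed(0.99, 40, 3e6)      # same genome, 40-kbp inserts

print(ecoli, human, human_40)
```

The last call illustrates why larger inserts matter: raising the average insert from 10 kbp to 40 kbp cuts the number of clones required roughly fourfold.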

Critical Developments in Biochemistry
Combinatorial Libraries

Specific recognition and binding of other molecules is a defining characteristic of any protein or nucleic acid. Often, target ligands of a particular protein are unknown, or, in other instances, a unique ligand for a known protein may be sought in the hope of blocking the activity of the protein or otherwise perturbing its function. Combinatorial libraries are the products of emerging strategies to facilitate the identification and characterization of possible ligands for a protein. These strategies are also applicable to the study of nucleic acids. Unlike genomic libraries, combinatorial libraries consist of synthetic oligomers. Arrays of synthetic oligonucleotides printed as tiny dots on miniature solid supports are known as DNA chips. Specifically, combinatorial libraries contain very large numbers of chemically synthesized molecules (such as peptides or oligonucleotides) with randomized sequences or structures. Such libraries are designed and constructed with the hope that one molecule among a vast number will be recognized as a ligand by the protein (or nucleic acid) of interest. If so, perhaps that molecule will be useful in a pharmaceutical application, for instance as a drug to treat a disease involving the protein to which it binds.
An example of this strategy is the preparation of a synthetic combinatorial library of hexapeptides. The maximum number of sequence combinations for hexapeptides is 20^6 or 64,000,000. One approach to simplify preparation and screening possibilities for such a library is to specify the first two amino acids in the hexapeptide while the next four are randomly chosen. In this approach, 400 libraries (20^2) are synthesized, each of which is unique in terms of the amino acids at positions 1 and 2 but random at the other four positions (as in AAXXXX, ACXXXX, ADXXXX, etc.) so that each of the 400 libraries contains 20^4 or 160,000 different sequence combinations. Screening these libraries with the protein of interest reveals which of the 400 libraries contains a ligand with high affinity. This library is then systematically expanded by specifying the first 3 amino acids (knowing from the chosen 1-of-400 libraries which amino acids are best as the first 2); only 20 synthetic libraries (each containing 20^3 or 8000 hexapeptides) are made here (one for each third-position possibility, the remaining three positions being randomized). Selection for ligand binding, again with the protein of interest, reveals the best of these 20, and this particular library is then varied systematically at the fourth position, creating 20 more libraries (each containing 20^2 or 400 hexapeptides). This cycle of synthesis, screening, and selection is repeated until all six positions in the hexapeptide are optimized to create the best ligand for the protein. A variation on this basic strategy using synthetic oligonucleotides rather than peptides identified a unique 15-mer (sequence GGTTGGTGTGGTTGG) with high affinity (KD = 2.7 nM) toward thrombin, a serine protease in the blood coagulation pathway. Thrombin is a major target for the pharmacological prevention of clot formation in coronary thrombosis.
(From Cortese, R., 1996. Combinatorial Libraries: Synthesis, Screening and Application Potential. Berlin: Walter de Gruyter.)
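The arithmetic behind the hexapeptide strategy can be tabulated with a short Python sketch. It assumes that the cycle described for positions 3 and 4 continues unchanged for positions 5 and 6, as the phrase "repeated until all six positions are optimized" suggests. The function name and output layout are invented for this illustration.

```python
def screening_rounds(length: int = 6, alphabet: int = 20) -> list[tuple[int, int, int]]:
    """Return (positions fixed, libraries made, peptides per library) per round."""
    rounds = []
    fixed = 2
    # First round: every combination of the first two residues is its own library.
    rounds.append((fixed, alphabet ** fixed, alphabet ** (length - fixed)))
    # Later rounds: fix one more position, so one library per candidate residue.
    for fixed in range(3, length + 1):
        rounds.append((fixed, alphabet, alphabet ** (length - fixed)))
    return rounds


rounds = screening_rounds()
total_sequences = 20 ** 6
total_libraries = sum(libraries for _, libraries, _ in rounds)

for fixed, libraries, per_library in rounds:
    print(f"{fixed} positions fixed: {libraries} libraries of {per_library} peptides")
print(total_sequences, total_libraries)
```

Under this assumption the full search requires 480 library syntheses and screens in place of 64,000,000 individual peptides.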

Screening Libraries

A common method of screening plasmid-based genomic libraries is to carry out a colony hybridization experiment. The protocol is similar for phage-based libraries except that bacteriophage plaques, not bacterial colonies, are screened. In a typical experiment, host bacteria containing either a plasmid-based or bacteriophage-based library are plated out on a petri dish and allowed to grow overnight to form colonies (or in the case of phage libraries, plaques) (Figure 13.11).

Figure 13.11 Screening a genomic library by colony hybridization (or plaque hybridization). Host bacteria transformed with a plasmid-based genomic library or infected with a bacteriophage-based genomic library are plated on a petri plate and incubated overnight to allow bacterial colonies (or phage plaques) to form. A replica of the bacterial colonies (or plaques) is then obtained by overlaying the plate with a nitrocellulose disc (1). Nitrocellulose strongly binds nucleic acids; single-stranded nucleic acids are bound more tightly than double-stranded nucleic acids. (Nylon membranes with similar nucleic acid- and protein-binding properties are also used.) Once the nitrocellulose disc has taken up an impression of the bacterial colonies (or plaques), it is removed and the petri plate is set aside and saved. The disc is treated with 2 M NaOH, neutralized, and dried (2). NaOH both lyses any bacteria (or phage particles) and dissociates the DNA strands. When the disc is dried, the DNA strands become immobilized on the filter. The dried disc is placed in a sealable plastic bag, and a solution containing heat-denatured (single-stranded), labeled probe is added (3). The bag is incubated to allow annealing of the probe DNA to any target DNA sequences that might be present on the nitrocellulose. The filter is then washed, dried, and placed on a piece of X-ray film to obtain an autoradiogram (4). The position of any spots on the X-ray film reveals where the labeled probe has hybridized with target DNA (5). The location of these spots can be used to recover the genomic clone from the bacteria (or plaques) on the original petri plate.

A replica of the bacterial colonies (or plaques) is then obtained by overlaying the plate with a nitrocellulose disc. The disc is removed, treated with alkali to dissociate bound DNA duplexes into single-stranded DNA, dried, and placed in a sealed bag with labeled probe (see the box on Southern blotting). If the probe DNA is duplex DNA, it must be denatured by heating at 70°C. The probe and target DNA complementary sequences must be in a single-stranded form if they are to hybridize with one another. Any DNA sequences complementary to probe DNA will be revealed by autoradiography of the nitrocellulose disc. Bacterial colonies (phage plaques) containing clones bearing target DNA are identified on the film and can be recovered from the master plate.

Probes for Southern Hybridization

Clearly, specific probes are essential reagents if the goal is to identify a particular gene against a background of innumerable DNA sequences. Usually, the probes that are used to screen libraries are nucleotide sequences that are complementary to some part of the target gene. To make useful probes requires some information about the gene’s nucleotide sequence. Sometimes such information is available. Alternatively, if the amino acid sequence of the protein encoded by the gene is known, it is possible to work backward through the genetic code to the DNA sequence (Figure 13.12).

Figure 13.12 Cloning genes using oligonucleotide probes designed from a known amino acid sequence. A radioactively labeled set of DNA (degenerate) oligonucleotides representing all possible mRNA coding sequences is synthesized. (In this case, there are 2^5, or 32.) The complete mixture is used to probe the genomic library by colony hybridization (see Figure 13.11). (Adapted from Figure 19-18 in Watson, J. D., et al., 1987. Molecular Biology of the Gene. Menlo Park, CA: Benjamin-Cummings.)

 

Because the genetic code is degenerate (that is, several codons may specify the same amino acid; see Chapter 32), probes designed by this approach are usually degenerate oligonucleotides about 17 to 50 residues long (such oligonucleotides are so-called 17- to 50-mers). The oligonucleotides are synthesized so that different bases are incorporated at sites where degeneracies occur in the codons. The final preparation thus consists of a mixture of equal-length oligonucleotides whose sequences vary to accommodate the degeneracies. Presumably, one oligonucleotide sequence in the mixture will hybridize with the target gene. These oligonucleotide probes are at least 17-mers because shorter degenerate oligonucleotides might hybridize with sequences unrelated to the target sequence.
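The size of a degenerate probe mixture is the product of the number of codons available for each amino acid. The Python sketch below uses the codon counts of the standard genetic code to compute that product and to find the least degenerate stretch of a protein. The peptide sequences are hypothetical and are not those of Figure 13.12, and the function names are invented for this illustration.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODONS_PER_AA = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "I": 3,
    "V": 4, "P": 4, "T": 4, "A": 4, "G": 4,
    "L": 6, "R": 6, "S": 6,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct oligonucleotides that could encode the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def best_window(protein: str, n_residues: int) -> tuple[str, int]:
    """Find the stretch of n_residues with the smallest probe mixture."""
    windows = [protein[i:i + n_residues] for i in range(len(protein) - n_residues + 1)]
    best = min(windows, key=probe_degeneracy)
    return best, probe_degeneracy(best)


# Hypothetical sequences; six residues correspond to an 18-mer probe.
print(probe_degeneracy("KHFNDM"))
print(probe_degeneracy("SLR"))
print(best_window("SLRKHFNDMWAG", 6))
```

Stretches rich in methionine and tryptophan give the smallest mixtures, whereas leucine, arginine, and serine, with six codons each, enlarge the mixture rapidly.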
            A piece of DNA from the corresponding gene in a related organism can also be used as a probe in screening a library for a particular gene. Such probes are termed heterologous probes because they are not derived from the homologous (same) organism.
            Problems arise if a complete eukaryotic gene is the cloning target; eukaryotic genes can be tens or even hundreds of kilobase pairs in size. Genes this size are fragmented in most cloning procedures. Thus, the DNA identified by the probe may represent a clone that carries only part of the desired gene. However, most cloning strategies are based on a partial digestion of the genomic DNA, a technique that generates an overlapping set of genomic fragments. This being so, DNA segments from the ends of the identified clone can now be used to probe the library for clones carrying DNA sequences that flanked the original isolate in the genome. Repeating this process ultimately yields the complete gene among a subset of overlapping clones.

cDNA Libraries

cDNAs are DNA molecules copied from mRNA templates. cDNA libraries are constructed by synthesizing cDNA from purified cellular mRNA. These libraries present an alternative strategy for gene isolation, especially eukaryotic genes. Because most eukaryotic mRNAs carry 3'-poly(A) tails, mRNA can be selectively isolated from preparations of total cellular RNA by oligo(dT)-cellulose chromatography (Figure 13.13).

Figure 13.13 Isolation of eukaryotic mRNA via oligo(dT)-cellulose chromatography. (a) In the presence of 0.5 M NaCl, the poly(A) tails of eukaryotic mRNA anneal with short oligo(dT) chains covalently attached to an insoluble chromatographic matrix such as cellulose. Other RNAs, such as rRNA (green), pass right through the chromatography column. (b) The column is washed with more 0.5 M NaCl to remove residual contaminants. (c) Then the poly(A) mRNA is recovered by washing the column with water because the base pairs formed between the poly(A) tails of the mRNA and the oligo(dT) chains are unstable in solutions of low ionic strength.

 

DNA copies of the purified mRNAs are synthesized by first annealing short oligo (dT) chains to the poly(A) tails. These oligo (dT) chains serve as primers for reverse transcriptase-driven synthesis of DNA (Figure 13.14). (Random oligonucleotides can also be used as primers, with the advantages being less dependency on poly(A) tracts and increased likelihood of creating clones representing the 5'-ends of mRNAs.) Reverse transcriptase is an enzyme that synthesizes a DNA strand, copying RNA as the template. DNA polymerase is then used to copy the DNA strand and form a double-stranded (duplex DNA) molecule. Linkers are then added to the DNA duplexes rendered from the mRNA templates, and the cDNA is cloned into a suitable vector. Once a cDNA derived from a particular gene has been identified, the cDNA becomes an effective probe for screening genomic libraries for isolation of the gene itself.

Figure 13.14 Reverse transcriptase-driven synthesis of cDNA from oligo (dT) primers annealed to the poly(A) tails of purified eukaryotic mRNA. (a) Oligo (dT) chains serve as primers for synthesis of a DNA copy of the mRNA by reverse transcriptase. Following completion of first-strand cDNA synthesis by reverse transcriptase, RNase H and DNA polymerase are added (b). RNase H specifically digests RNA strands in DNA:RNA hybrid duplexes. DNA polymerase copies the first-strand cDNA, using as primers the residual RNA segments after RNase H has created nicks and gaps (c). DNA polymerase has a 5'→3' exonuclease activity that removes the residual RNA as it fills in with DNA. The nicks remaining in the second-strand DNA are sealed by DNA ligase (d), yielding duplex cDNA. EcoRI adapters with 5'-overhangs are then ligated onto the cDNA duplexes (e) using phage T4 DNA ligase to create EcoRI-ended cDNA for insertion into a cloning vector.
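The base-pairing logic of cDNA synthesis can be illustrated with a short sketch. It is a simplification under stated assumptions: sequences are strings written 5' to 3', the oligo (dT) primer is assumed to anneal at the extreme 3' end of the poly(A) tail, and the function names and the `primer_len` parameter are hypothetical. RNase H digestion, nick sealing, and adapter ligation are not modeled.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna, primer_len=12):
    """First-strand cDNA (written 5'->3'), or None if no usable poly(A) tail."""
    tail_len = len(mrna) - len(mrna.rstrip("A"))
    if tail_len < primer_len:
        return None  # the oligo(dT) primer has nothing to anneal to
    # Reverse transcriptase copies the RNA template into complementary DNA.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

def second_strand(first_strand):
    """Second DNA strand (5'->3'), complementary to the first-strand cDNA."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))

mrna = "AUGGCC" + "A" * 15          # toy mRNA with a 15-residue poly(A) tail
cdna = first_strand_cdna(mrna)      # begins with the oligo(dT) primer residues
duplex_top = second_strand(cdna)    # DNA version of the original mRNA sequence
print(cdna)
print(duplex_top)
```

The second strand reproduces the mRNA sequence with T in place of U, which is why a duplex cDNA is an intron-free DNA copy of the message.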

            Because different cell types in eukaryotic organisms express selected subsets of genes, RNA preparations from cells or tissues in which genes of interest are selectively transcribed are enriched for the desired mRNAs. cDNA libraries prepared from such mRNA are representative of the pattern and extent of gene expression that uniquely define particular kinds of differentiated cells. cDNA libraries of many normal and diseased human cell types are commercially available, including cDNA libraries of many tumor cells. Comparison of normal and abnormal cDNA libraries, in conjunction with two-dimensional gel electrophoretic analysis (see Appendix to Chapter 5) of the proteins produced in normal and abnormal cells, is a promising new strategy in clinical medicine to understand disease mechanisms.

 

 

Critical Developments in Biochemistry

Identifying Specific DNA Sequences by Southern Blotting

(Southern Hybridization)

Any given DNA fragment is unique solely by virtue of its specific nucleotide sequence. The only practical way to find one particular DNA segment among a vast population of different DNA fragments (such as you might find in genomic DNA preparations) is to exploit its sequence specificity to identify it. In 1975, E.M. Southern invented a technique capable of doing just that.

Electrophoresis
Southern first fractionated a population of DNA fragments according to size by gel electrophoresis (see step 2 in figure). The electrophoretic mobility of a nucleic acid is inversely proportional to the logarithm of its molecular mass. Polyacrylamide gels are suitable for separation of nucleic acids of 25 to 2000 bp. Agarose gels are better if the DNA fragments range up to 10 times this size. Most preparations of genomic DNA show a broad spectrum of sizes, from less than 1 kbp to more than 20 kbp. Typically, no discrete-size fragments are evident following electrophoresis, just a “smear” of DNA throughout the gel.

Blotting
Once the fragments have been separated by electrophoresis (step 3), the gel is soaked in a solution of NaOH. Alkali denatures duplex DNA, converting it to single-stranded DNA. After the pH of the gel is adjusted to neutrality with buffer, a sheet of nitrocellulose soaked in a concentrated salt solution is placed over the gel, and salt solution is drawn through the gel in a direction perpendicular to the direction of electrophoresis (step 4). The salt solution is pulled through the gel in one of three ways: capillary action (blotting), suction (vacuum blotting), or electrophoresis (electroblotting). The movement of salt solution through the gel carries the DNA to the nitrocellulose sheet. Nitrocellulose binds single-stranded DNA molecules very tightly, effectively immobilizing them in place on the sheet.* Note that the distribution pattern of the electrophoretically separated DNA is maintained when the single-stranded DNA molecules bind to the nitrocellulose sheet (step 5 in figure).

Next, the nitrocellulose is dried by baking in a vacuum oven;† baking tightly fixes the single-stranded DNAs to the nitrocellulose. Then, in the prehybridization step, the nitrocellulose sheet is incubated with a solution containing protein (serum albumin, for example) and/or a detergent such as sodium dodecyl sulfate. The protein and detergent molecules saturate any remaining binding sites for DNA on the nitrocellulose. Thus, no more DNA can bind nonspecifically to the nitrocellulose sheet.

Hybridization
To detect a particular DNA within the electrophoretic smear of countless DNA fragments, the prehybridized nitrocellulose sheet is incubated in a sealed plastic bag with a solution of specific probe molecules (step 6 in figure). A probe is usually a single-stranded DNA of defined sequence that is distinctively labeled, either with a radioactive isotope (such as 32P) or some other easily detectable tag. The nucleotide sequence of the probe is designed to be complementary to the sought-for or target DNA fragment. The single-stranded probe DNA anneals with the single-stranded target DNA bound to the nitrocellulose through specific base pairing to form a DNA duplex. This annealing, or hybridization as it is usually called, labels the target DNA, revealing its position on the nitrocellulose. For example, if the probe is 32P-labeled, its location can be detected by autoradiographic exposure of a piece of X-ray film laid over the nitrocellulose sheet (step 7 in figure).
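The idea that a probe finds its target purely through sequence complementarity can be sketched as follows. This is a minimal illustration, assuming perfect base pairing over the full probe length; real hybridization tolerates some mismatches depending on the stringency of the conditions. The function names and the toy fragments are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the strand that base-pairs with seq, written 5'->3'."""
    return seq.translate(DNA_COMPLEMENT)[::-1]

def hybridizing_fragments(fragments, probe):
    """Indices of single-stranded fragments to which the probe can anneal."""
    # The probe anneals wherever a fragment contains the probe's complement.
    target = reverse_complement(probe)
    return [i for i, fragment in enumerate(fragments) if target in fragment]

# Toy "blot": three immobilized single-stranded fragments.
fragments = ["GGCATTACGGATCCTA", "TTTTGGGGCCCCAAAA", "ACGTACGTACGT"]
probe = "GATCCGTA"   # complementary to TACGGATC in the first fragment

print(hybridizing_fragments(fragments, probe))  # → [0]
```

Only the fragment carrying the complementary sequence is "labeled", just as only one position on the nitrocellulose sheet is revealed after hybridization.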
            Southern’s procedure has been extended to the identification of specific RNA and protein molecules. In a play on Southern’s name, the identification of particular RNAs following separation by gel electrophoresis, blotting, and probe hybridization is called Northern blotting. The analogous technique for identifying protein molecules is termed Western blotting. In Western blotting, the probe of choice is usually an antibody specific for the target protein.

The Southern blotting technique involves the transfer of electrophoretically separated DNA fragments to a nitrocellulose sheet and subsequent detection of specific DNA sequences. A preparation of DNA fragments [typically a restriction digest, (1)] is separated according to size by gel electrophoresis (2). The separation pattern can be visualized by soaking the gel in ethidium bromide to stain the DNA and then illuminating the gel with UV light (3). Ethidium bromide molecules intercalated between the hydrophobic bases of DNA are fluorescent under UV light. The gel is soaked in strong alkali to denature the DNA and then neutralized in buffer. Next, the gel is placed on a sheet of nitrocellulose (or DNA-binding nylon membrane), and concentrated salt solution is passed through the gel (4) to carry the DNA fragments out of the gel where they are bound tightly to the nitrocellulose (5). Incubation of the nitrocellulose sheet with a solution of labeled, single-stranded probe DNA (6) allows the probe to hybridize with target DNA sequences complementary to it. The location of these target sequences is then revealed by an appropriate means of detection, such as autoradiography (7).

*The underlying cause of DNA binding to nitrocellulose is not clear, but probably involves a combination of hydrogen bonding, hydrophobic interactions, and salt bridges.
†Vacuum drying is essential because nitrocellulose reacts violently with O2 if heated. For this reason, nylon-based membranes are preferable to nitrocellulose membranes.

 

Human Biochemistry

The Human Genome Project

The Human Genome Project is a collaborative international, government- and private-sponsored effort to map and sequence the entire human genome, some 3 billion base pairs distributed among the two sex chromosomes (X and Y) and 22 autosomes (chromosomes that are not sex chromosomes). Initial work identified and mapped at least 3000 genetic markers (genes or other recognizable loci on the DNA), evenly distributed throughout the chromosomes at roughly 100-kb intervals. At the same time, determination of the entire nucleotide sequence of the human genome began. The target date for completion is 2005. An ancillary part of the project is sequencing the genomes of other species (such as yeast, Drosophila melanogaster [the fruit fly], mice, and Arabidopsis thaliana [a plant]) to reveal comparative aspects of genetic and sequence organization (Table 13.1). Information about whole genome sequences of organisms has created a new branch of science called functional genomics. Functional genomics addresses global issues of gene expression, such as looking at all the genes that are activated during major metabolic shifts (as from growth under aerobic to growth under anaerobic conditions) or during embryogenesis and development of organisms. Functional genomics also provides new insights into evolutionary relationships between organisms.
            The Human Genome Project is also vital to medicine. A number of human diseases have been traced to genetic defects, whose positions within the human genome have been identified. Among these are

cystic fibrosis gene

Duchenne muscular dystrophy gene* (at 2.4 megabases, the largest known gene in any organism)

Huntington’s disease gene

neurofibromatosis gene

neuroblastoma gene (a form of brain cancer)

amyotrophic lateral sclerosis gene (Lou Gehrig’s disease)

fragile X-linked mental retardation gene*

as well as genes associated with the development of diabetes, breast cancer, colon cancer, and affective disorders such as schizophrenia and bipolar affective disorder (manic depression).

Table 13.1
Completed Genome Nucleotide Sequences

Genome                                                  Genome Size¹ (Year Completed)
Bacteriophage φX174                                     0.0054 (1977)
Bacteriophage λ                                         0.048 (1982)
Marchantia² chloroplast genome                          0.187 (1986)
Vaccinia virus                                          0.192 (1990)
Cytomegalovirus (CMV)                                   0.229 (1991)
Marchantia² mitochondrial genome                        0.187 (1992)
Variola (smallpox) virus                                0.186 (1993)
Hemophilus influenzae³ (Gram-negative bacterium)        1.830 (1995)
Mycoplasma genitalium (mycoplasma)                      0.58 (1995)
Methanococcus jannaschii (archaebacterium)              1.67 (1996)
Escherichia coli (Gram-negative bacterium)              4.64 (1996)
Saccharomyces cerevisiae (yeast)                        12.067 (1996)
Bacillus subtilis (Gram-positive bacterium)             4.21 (1997)
Arabidopsis thaliana (green plant)                      100 (?)
Caenorhabditis elegans (simple animal: nematode worm)   100 (1998?)
Drosophila melanogaster (fruit fly)                     165 (?)
Homo sapiens (human)                                    2900 (2005?)

¹ Genome size is given as millions of base pairs (Mb).
² Marchantia is a bryophyte (a nonvascular green plant).
³ The first complete sequence for the genome of a free-living organism.

*X-chromosome linked gene. As of 1992, more than 100 disease-related genes had been mapped to this chromosome.

Expression Vectors

Expression vectors are engineered so that any cloned insert can be transcribed into RNA, and, in many instances, even translated into protein. cDNA expression libraries can be constructed in specially designed vectors derived from either plasmids or bacteriophage λ. Proteins encoded by the various cDNA clones within such expression libraries can be synthesized in the host cells, and if suitable assays are available to identify a particular protein, its corresponding cDNA clone can be identified and isolated. Expression vectors designed for RNA expression or protein expression, or both, are available.

RNA Expression

A vector for in vitro expression of DNA inserts as RNA transcripts can be constructed by putting a highly efficient promoter adjacent to a versatile cloning site. Figure 13.15 depicts such an expression vector. Linearized recombinant vector DNA is transcribed in vitro using SP6 RNA polymerase. Large amounts of RNA product can be obtained in this manner; if radioactive ribonucleotides are used as substrates, labeled RNA molecules useful as probes are made.

Figure 13.15  Expression vectors carrying the promoter recognized by the RNA polymerase of bacteriophage SP6 are useful for making RNA transcripts in vitro. SP6 RNA polymerase works efficiently in vitro and recognizes its specific promoter with high specificity. These vectors typically have a polylinker adjacent to the SP6 promoter. Successive rounds of transcription initiated by SP6 RNA polymerase at its promoter lead to the production of multiple RNA copies of any DNA inserted at the polylinker. Before transcription is initiated, the circular expression vector is linearized by a single cleavage at or near the end of the insert so that transcription terminates at a fixed point.

Protein Expression

Because cDNAs are DNA copies of mRNAs, cDNAs are uninterrupted copies of the exons of expressed genes. Because cDNAs lack introns, it is feasible to express these cDNA versions of eukaryotic genes in prokaryotic hosts that cannot process the complex primary transcripts of eukaryotic genes. To express a eukaryotic protein in E. coli, the eukaryotic cDNA must be cloned in an expression vector that contains regulatory signals for both transcription and translation. Accordingly, a promoter where RNA polymerase initiates transcription as well as a ribosome binding site to facilitate translation are engineered into the vector just upstream from the restriction site for inserting foreign DNA.  The AUG initiation codon that specifies the first amino acid in the protein (the translation start site) is contributed by the insert (Figure 13.16).

Figure 13.16 A typical expression-cloning vector. Eukaryotic coding sequences are inserted at the restriction site just downstream from a promoter region where RNA polymerase binds and initiates transcription. Transcription proceeds through a region encoding a bacterial ribosome-binding site and into the cloned insert. The presence of the bacterial ribosome-binding site in the RNA transcript ensures that the RNA can be translated into protein by the ribosomes of the host bacteria. (Adapted from Figure 19-5 from Molecular Biology of the Gene, 4th edition. Copyright 1987 by James D. Watson. Reprinted by permission of Benjamin/Cummings Publishing Co., Inc.)

            Strong promoters have been constructed that drive the synthesis of foreign proteins to levels equal to 30% or more of total E. coli cellular protein. An example is the hybrid promoter, ptac, which was created by fusing part of the promoter for the E. coli genes encoding the enzymes of lactose metabolism (the lac promoter) with part of the promoter for the genes encoding the enzymes of tryptophan biosynthesis (the trp promoter) (Figure 13.17).

Figure 13.17 A ptac protein expression vector contains the hybrid promoter ptac derived from fusion of the lac and trp promoters. Expression from ptac is more than 10 times greater than expression from either the lac or trp promoter alone. Isopropyl-β-D-thiogalactoside, or IPTG, induces expression from ptac as well as lac.

In cells carrying ptac expression vectors, the ptac promoter is not induced to drive transcription of the foreign insert until the cells are exposed to inducers that lead to its activation. Analogs of lactose (a β-galactoside) such as isopropyl-β-D-thiogalactoside, or IPTG, are excellent inducers of ptac. Thus, expression of the foreign protein is easily controlled. (See Chapter 31 for detailed discussions of inducible gene expression.) The bacterial production of valuable eukaryotic proteins represents one of the most important uses of recombinant DNA technology. For example, human insulin for the clinical treatment of diabetes is now produced in bacteria.
            Analogous systems for expression of foreign genes in eukaryotic cells include vectors carrying promoter elements derived from mammalian viruses, such as simian virus 40 (SV40), the Epstein-Barr virus, and the human cytomegalovirus (CMV). A system for high-level expression of foreign genes uses insect cells infected with the baculovirus expression vector. Baculoviruses infect lepidopteran insects (butterflies and moths). In engineered baculovirus vectors, the foreign gene is cloned downstream of the promoter for polyhedrin, a major viral-encoded structural protein, and the recombinant vector is incorporated into insect cells grown in culture. Expression from the polyhedrin promoter can lead to accumulation of the foreign gene product to levels as high as 500 mg/L.

Screening cDNA Expression Libraries with Antibodies

Antibodies that specifically cross-react with a particular protein of interest are often available. If so, these antibodies can be used to screen a cDNA expression library to identify and isolate cDNA clones encoding the protein. The cDNA library is introduced into host bacteria, which are plated out and grown overnight, as in the colony hybridization scheme previously described. DNA-binding nylon membranes are placed on the plates to obtain a replica of the bacterial colonies. The nylon membrane is then incubated under conditions that induce protein synthesis from the cloned cDNA inserts, and the cells are treated to release the synthesized protein. The synthesized protein binds tightly to the nylon membrane, which can then be incubated with the specific antibody. Binding of the antibody to its target protein product reveals the position of any cDNA clones expressing the protein, and these clones can be recovered from the original plate. Like other libraries, expression libraries can be screened with oligonucleotide probes, too.

Fusion Protein Expression

Some expression vectors carry cDNA inserts cloned directly into the coding sequence of a vector-borne protein-coding gene (Figure 13.18).

Figure 13.18 A typical expression vector for the synthesis of a hybrid protein. The cloning site is located at the end of the coding region for the protein β-galactosidase. Insertion of foreign DNAs at this site fuses the foreign sequence to the β-galactosidase coding region (the lacZ gene). IPTG induces the transcription of the lacZ gene from its promoter plac, causing expression of the fusion protein. (Adapted from Figure 1.5.4 in Ausubel, F. M., et al., 1987. Current Protocols in Molecular Biology. New York: John Wiley & Sons.)

 

 

Translation of the recombinant sequence leads to synthesis of a hybrid protein or fusion protein. The N-terminal region of the fused protein represents amino acid sequences encoded in the vector, whereas the remainder of the protein is encoded by the foreign insert. Keep in mind that the triplet codon sequence within the cloned insert must be in phase with codons contributed by the vector sequences to make the right protein. The N-terminal protein sequence contributed by the vector can be chosen to suit purposes. Furthermore, adding an N-terminal signal sequence that targets the hybrid protein for secretion from the cell simplifies recovery of the fusion protein. A variety of gene fusion systems have been developed to facilitate isolation of a specific protein encoded by a cloned insert. The isolation procedures are based on affinity chromatography purification of the fusion protein through exploitation of the unique ligand-binding properties of the vector-encoded protein (Table 13.2).
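The requirement that the insert be "in phase" with the vector codons can be made concrete with a small check. This is a sketch under stated assumptions: the vector coding sequence is taken from its initiation codon up to the cloning site, the insert is assumed to begin at a codon boundary, and the sequences and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_phase(vector_cds):
    """True if the insert's first base starts a new codon in the vector's frame."""
    return len(vector_cds) % 3 == 0

def first_stop(seq):
    """Position of the first in-frame stop codon reading from base 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

vector_cds = "ATGACCATGATT"            # hypothetical vector codons up to the cloning site
insert = "GGATCTTAA"                   # hypothetical insert: Gly, Ser, stop
fused = vector_cds + insert            # junction falls on a codon boundary
shifted = vector_cds + "A" + insert    # one extra base at the junction

print(in_phase(vector_cds), first_stop(fused))
print(in_phase(vector_cds + "A"), first_stop(shifted))
```

In the shifted construct the insert is read as different codons and its stop codon is no longer recognized, which is why a single-base error at the junction yields the wrong protein.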

Table 13.2
Gene Fusion Systems for Isolation of Cloned Fusion Proteins

Gene Product                              Origin         Molecular Mass (kD)   Secreted?¹   Affinity Ligand
β-Galactosidase                           E. coli        116                   No           p-Aminophenyl-β-D-thiogalactoside (APTG)
Protein A                                 S. aureus      31                    Yes          Immunoglobulin G (IgG)
Chloramphenicol acetyltransferase (CAT)   E. coli        24                    Yes          Chloramphenicol
Streptavidin                              Streptomyces   13                    Yes          Biotin
Glutathione-S-transferase (GST)           E. coli        26                    No           Glutathione
Maltose-binding protein (MBP)             E. coli        40                    Yes          Starch

¹ This indicates whether combined secretion-fusion gene systems have led to secretion of the protein product from the cells, which simplifies its isolation and purification.
Adapted from Uhlen, M., and Moks, T., 1990. Gene fusions for purpose of expression: An introduction. Methods in Enzymology 185:129-143.

 

β-Galactosidase and Blue or White Selection

One version of these fusion protein expression vectors places the cloning site at the end of the coding region of the protein β-galactosidase, so that among other things the fusion protein is attached to β-galactosidase and can be recovered by purifying the β-galactosidase activity. Alternatively, placing the cloning site within the β-galactosidase coding region means that cloned inserts disrupt the β-galactosidase amino acid sequence, inactivating its enzymatic activity. This property has been exploited in developing a visual screening protocol that distinguishes those clones in the library that bear inserts from those that lack them.
            Cells that have been transformed with a plasmid-based β-galactosidase expression cDNA library (or infected with a similar library constructed in a bacteriophage λ-based β-galactosidase fusion vector) are plated on media containing 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside, or X-gal (Figure 13.19).

Figure 13.19 The structure of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside, or X-gal.

X-gal is a chromogenic substrate, a colorless substance that upon enzymatic reaction yields a colored product. Following induction with IPTG, bacterial colonies (or plaques) harboring vectors in which the β-galactosidase gene is intact (those vectors lacking inserts) express an active β-galactosidase that cleaves X-gal, liberating 5-bromo-4-chloro-indoxyl, which dimerizes to form an indigo blue product. These blue colonies (or plaques) represent clones that lack inserts. The β-galactosidase gene is inactivated in clones with inserts, so those colonies (or plaques) that remain “white” (actually, colorless) are recombinant clones.

Reporter Gene Constructs

Potential regulatory regions of genes (such as promoters) can be investigated by placing these regulatory sequences into plasmids upstream of a gene, called a reporter gene, whose expression is easy to measure. Such chimeric plasmids are then introduced into cells of choice (including eukaryotic cells) to assess the potential function of the nucleotide sequence in regulation because expression of the reporter gene serves as a report on the effectiveness of the regulatory element. A number of different genes have been used as reporter genes, such as the lacZ gene. A reporter gene with many inherent advantages is that encoding the green fluorescent protein (or GFP), described in Chapter 4. Unlike the protein expressed by other reporter gene systems, GFP does not require any substrate to measure its activity, nor is it dependent on any cofactor or prosthetic group. Detection of GFP requires only irradiation with near UV or blue light (400-nm light is optimal), and the green fluorescence (light of 500 nm) that results is easily observed with the naked eye, although it can also be measured precisely with a fluorometer. Figure 13.20 demonstrates the use of GFP as a reporter gene.

 

Figure 13.20 Green fluorescent protein (GFP) as a reporter gene. The promoter from the per gene was placed upstream of the GFP gene in a plasmid and transformed into Drosophila (fruit flies). The per gene encodes a protein involved in establishing the circadian (daily) rhythmic activity of fruit flies. The fluorescence shown here in an isolated fly head follows a 24-hour rhythmic pattern and occurs to a lesser extent throughout the entire fly, indicating that per gene expression can occur in cells throughout the animal. Such uniformity suggests that individual cells have their own independent clocks. (Image courtesy of Jeffrey D. Plautz and Steve A. Kay, Scripps Research Institute, La Jolla, California. See also Plautz, J. D., et al., 1997. Science 278:1632-1635.)

A Deeper Look
The Two-Hybrid System to Identify Proteins Involved in Specific
Protein-Protein Interactions

Specific interactions between proteins (so-called protein-protein interactions) lie at the heart of many essential biological processes. Stanley Fields, Cheng-Ting Chien, and their collaborators have invented a method to identify specific protein-protein interactions in vivo through expression of a reporter gene whose transcription is dependent on a functional transcriptional activator, the GAL4 protein. The GAL4 protein consists of two domains: a DNA-binding (or DB) domain and a transcriptional activation (or TA) domain. Even if expressed as separate proteins, these two domains will still work, provided they can be brought together. The method depends on two separate plasmids encoding two hybrid proteins, one consisting of the GAL4 DB domain fused to protein X, and the other consisting of the GAL4 TA domain fused to protein Y (figure, part a). If proteins X and Y interact in a specific protein-protein interaction, the GAL4 DB and TA domains are brought together so that transcription of a reporter gene driven by the GAL4 promoter can take place (figure, part b). Protein X, fused to the GAL4 DNA-binding domain (DB), serves as the “bait” to fish for the protein Y “target” and its fused GAL4 TA domain. This method can be used to screen cells for protein “targets” that interact specifically with a particular “bait” protein. To do so, cDNAs encoding proteins from the cells of interest are inserted into the TA-containing plasmid to create fusions of the cDNA coding sequences with the GAL4 TA domain coding sequences, so a fusion protein library is expressed. Identification of a target of the “bait” protein by this method also yields directly a cDNA version of the gene encoding the “target” protein.

13.3 · Polymerase Chain Reaction (PCR)

Polymerase chain reaction or PCR is a technique for dramatically amplifying the amount of a specific DNA segment. A preparation of denatured DNA containing the segment of interest serves as template for DNA polymerase, and two specific oligonucleotides serve as primers for DNA synthesis (as in Figure 13.21).

 

Figure 13.21 Polymerase chain reaction (PCR). Oligonucleotides complementary to a given DNA sequence prime the synthesis of only that sequence. Heat-stable Taq DNA polymerase survives many cycles of heating. Theoretically, the amount of the specific primed sequence is doubled in each cycle.

These primers, designed to be complementary to the two 3'-ends of the specific DNA segment to be amplified, are added in excess amounts of 1000 times or greater (Figure 13.21). They prime the DNA polymerase-catalyzed synthesis of the two complementary strands of the desired segment, effectively doubling its concentration in the solution. Then the DNA is heated to dissociate the DNA duplexes and then cooled so that primers bind to both the newly formed and the old strands. Another cycle of DNA synthesis ensues. The protocol has been automated through the invention of thermal cyclers that alternately heat the reaction mixture to 95°C to dissociate the DNA, followed by cooling, annealing of primers, and another round of DNA synthesis. The isolation of heat-stable DNA polymerases from thermophilic bacteria (such as the Taq DNA polymerase from Thermus aquaticus) has made it unnecessary to add fresh enzyme for each round of synthesis. Because the amount of target DNA theoretically doubles each round, 25 rounds would increase its concentration about 33 million times. In practice, the increase is actually more like a million times, which is more than ample for gene isolation. Thus, starting with a tiny amount of total genomic DNA, a particular sequence can be produced in quantity in a few hours.
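The arithmetic behind these figures can be checked with a few lines of code. The per-cycle `efficiency` parameter is an assumption introduced here for illustration (1.0 means perfect doubling); the text itself gives only the theoretical and practical totals.

```python
def fold_amplification(cycles, efficiency=1.0):
    """Fold increase in target DNA after a number of PCR cycles.

    efficiency is the fraction of templates copied per cycle
    (1.0 = perfect doubling); it is a simplifying assumption.
    """
    return (1.0 + efficiency) ** cycles

def efficiency_for(fold, cycles):
    """Average per-cycle efficiency that would give the observed fold increase."""
    return fold ** (1.0 / cycles) - 1.0

theoretical = fold_amplification(25)      # 2**25 = 33,554,432
practical_eff = efficiency_for(1e6, 25)   # efficiency implied by a millionfold yield

print(f"theoretical: {theoretical:,.0f}-fold")
print(f"implied average efficiency: {practical_eff:.2f}")
```

Under this simple model, a millionfold yield in 25 cycles corresponds to an average efficiency of roughly 0.74 per cycle rather than 1.0, which shows how a modest shortfall in each round compounds over many rounds.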
            PCR amplification is an effective cloning strategy if sequence information for the design of appropriate primers is available. Because DNA from a single cell can be used as a template, the technique has enormous potential for the clinical diagnosis of infectious diseases and genetic abnormalities. With PCR techniques, DNA from a single hair or sperm can be analyzed to identify particular individuals in criminal cases without ambiguity. RT-PCR, a variation on the basic PCR method, is useful when the nucleic acid to be amplified is an RNA (such as mRNA). Reverse transcriptase (RT) is used to synthesize a cDNA strand complementary to the RNA, and this cDNA serves as the template for further cycles of PCR.

In Vitro Mutagenesis

The advent of recombinant DNA technology has made it possible to clone genes, manipulate them in vitro, and express them in a variety of cell types under various conditions. The function of any protein is ultimately dependent on its amino acid sequence, which in turn can be traced to the nucleotide sequence of its gene. The introduction of purposeful changes in the nucleotide sequence of a cloned gene represents an ideal way to make specific structural changes in a protein. The effects of these changes on the protein’s function can then be studied. Such changes constitute mutations introduced in vitro into the gene. In vitro mutagenesis makes it possible to alter the nucleotide sequence of a cloned gene systematically, as opposed to the chance occurrence of mutations in natural genes.
            One efficient technique for in vitro mutagenesis is PCR-based mutagenesis. Mutant primers are added to a PCR reaction in which the gene (or segment of a gene) is undergoing amplification. The mutant primers are primers whose sequence has been specifically altered to introduce a directed change at a particular place in the nucleotide sequence of the gene being amplified (Figure 13.22). Mutant versions of the gene can then be cloned and expressed to determine any effects of the mutation on the function of the gene product.

 

Figure 13.22 One method of PCR-based site-directed mutagenesis. Template DNA strands are separated by increased temperature, and the single strands are amplified by PCR using mutagenic primers (represented as bent arrows) whose sequences introduce a single base substitution at site X (and its complementary base X'; thus the desired amino acid change in the protein encoded by the gene). Ideally, the mutagenic primers also introduce a unique restriction site into the plasmid that was not present before. Following many cycles of PCR, the DNA product can be used to transform E. coli cells. Single colonies of the transformed cells can be picked. The plasmid DNA within each colony can be isolated and screened for the presence of the mutation by screening for the presence of the unique restriction site by restriction endonuclease cleavage. For example, the nucleotide sequence GGATCT within a gene codes for amino acid residues Gly-Ser. Using mutagenic primers of nucleotide sequence AGATCT (and its complement AGATCT) changes the amino acid sequence from Gly-Ser to Arg-Ser and creates a BglII restriction site (see Table 11.5). Gene expression of the isolated mutant plasmid in E. coli allows recovery and analysis of the mutant protein.
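The GGATCT to AGATCT example in the caption can be verified directly. The sketch below builds the standard genetic code table, translates both sequences, and checks for the BglII recognition sequence; the helper names are hypothetical, and amino acids are reported in one-letter code (G = Gly, S = Ser, R = Arg).

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
BGLII_SITE = "AGATCT"

def translate(dna):
    """Translate a coding-strand DNA sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def reverse_complement(seq):
    """Sequence of the complementary strand, written 5'->3'."""
    return seq.translate(DNA_COMPLEMENT)[::-1]

wild_type = "GGATCT"
mutant = "AGATCT"   # single G-to-A substitution at the first position

print(translate(wild_type), "->", translate(mutant))  # → GS -> RS
print(BGLII_SITE in wild_type, BGLII_SITE in mutant)  # → False True
```

Because AGATCT reads the same on both strands, `reverse_complement(mutant)` returns the same sequence, which is why the caption gives AGATCT as its own complement.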

13.4 · Recombinant DNA Technology: An Exciting Scientific Frontier

The strategies and methodologies described in this chapter are but an overview of the repertoire of experimental approaches that have been devised by molecular biologists in order to manipulate DNA and the information inherent in it. The enormous success of recombinant DNA technology means that the molecular biologist’s task in searching genomes for genes is now akin to that of a lexicographer compiling a dictionary, a dictionary in which the “letters,” i.e., the nucleotide sequences, spell out not words, but genes and what they mean. Molecular biologists have no index or alphabetic arrangement to serve as a guide through the vast volume of information in a genome; nevertheless, this information and its organization are rapidly being disclosed by the imaginative efforts and diligence of these scientists and their growing arsenal of analytical schemes.
            Recombinant DNA technology now verges on the ability to engineer at will the genetic constitution of organisms for desired ends. The commercial production of therapeutic biomolecules in microbial cultures is already established (for example, the production of human insulin in quantity in E. coli cells). Agricultural crops with desired attributes, such as enhanced resistance to herbicides, are in cultivation. The rat growth hormone gene has been cloned and transferred into mouse embryos, creating transgenic mice that at adulthood are twice normal size (see Chapter 29). Already, transgenic versions of domestic animals such as pigs, sheep, and even fish have been developed for human benefit. Perhaps most important, in a number of instances, clinical trials have been approved for gene replacement therapy (or, more simply, gene therapy) to correct particular human genetic disorders.

Human Biochemistry
The Biochemical Defects in Cystic Fibrosis and ADA- SCID

The gene defective in cystic fibrosis codes for CFTR (cystic fibrosis transmembrane conductance regulator), a membrane protein that pumps Cl- out of cells. If this Cl- pump is defective, Cl- ions remain in cells, which then take up water from the surrounding mucus by osmosis. The mucus thickens and accumulates in various organs, including the lungs, where its presence favors infections such as pneumonia. Left untreated, children with cystic fibrosis seldom survive past the age of 5 years.
            ADA- SCID (adenosine deaminase-defective severe combined immunodeficiency) is a fatal genetic disorder caused by defects in the gene that encodes adenosine deaminase (ADA). The consequence of ADA deficiency is accumulation of adenosine and 2'-deoxyadenosine, substances toxic to lymphocytes, important cells in the immune response. 2'-Deoxyadenosine is particularly toxic because its presence leads to accumulation of its nucleotide form, dATP, an essential substrate in DNA synthesis. Elevated levels of dATP actually block DNA replication and cell division by inhibiting synthesis of the other deoxynucleoside 5'-triphosphates (see Chapter 27). Accumulation of dATP also leads to selective depletion of cellular ATP, robbing cells of energy. Children with ADA- SCID fail to develop normal immune responses and are susceptible to fatal infections, unless kept in protective isolation.

Human Gene Therapy

Human gene therapy seeks to repair the damage caused by a genetic deficiency through introduction of a functional version of the defective gene. To achieve this end, a cloned variant of the gene must be incorporated into the organism in such a manner that it is expressed only at the proper time and only in appropriate cell types. At this time, these conditions impose serious technical and clinical difficulties. Many gene therapies have received approval from the National Institutes of Health for trials in which gene constructs are introduced into human patients. Among these are constructs designed to cure ADA- SCID (severe combined immunodeficiency due to adenosine deaminase [ADA] deficiency), neuroblastoma, or cystic fibrosis, or to treat cancer through expression of the E1A and p53 tumor suppressor genes.
            A basic strategy in human gene therapy involves incorporation of a functional gene into target cells. The gene is typically in the form of an expression cassette consisting of a cDNA version of the gene downstream from a promoter that drives expression of the gene. A vector carrying such an expression cassette is introduced into target cells, either ex vivo via gene transfer into cultured cells in the laboratory and administration of the modified cells to the patient, or in vivo via direct incorporation of the gene into the cells of the patient. Because retroviruses can transfer their genetic information directly into the genome of host cells, retroviruses provide one route to permanent modification of host cells ex vivo. A replication-deficient version of Moloney murine leukemia virus can serve as a vector for expression cassettes up to 9 kb in size.

Figure 13.23 Retrovirus-mediated gene delivery ex vivo. Retroviruses are RNA viruses that replicate their RNA genome by first making a DNA intermediate. The Moloney murine leukemia virus (MMLV) is the retrovirus used in human gene therapy. Deletion of the essential genes gag, pol, and env from MMLV makes it replication-deficient (so it can’t reproduce) (a) and creates a space for insertion of an expression cassette (b). The modified MMLV acts as a vector for the expression cassette; although replication-defective, it is still infectious. Infection of a packaging cell line that carries intact gag, pol, and env genes allows the modified MMLV to reproduce (c), and the packaged retroviruses can be collected and used to infect a patient (d). In the cytosol of the patient’s cells, a DNA copy of the viral RNA is synthesized by viral reverse transcriptase, which accompanies the viral RNA into the cells. This DNA is then randomly integrated into the host cell genome, where its expression leads to production of the expression cassette product. (Adapted from Figure 1 in Crystal, R. G., 1995. Transfer of genes to humans: Early lessons and obstacles to success. Science 270:404.)

Figure 13.23 describes a strategy for retrovirus vector-mediated gene delivery. In this strategy, it is hoped that the expression cassette will become stably integrated into the DNA of the patient’s own cells and expressed to produce the desired gene product. Alternatively, adenovirus vectors that can carry expression cassettes up to 7.5 kb are a possible in vivo approach to human gene therapy (Figure 13.24).

Figure 13.24 Adenovirus-mediated gene delivery in vivo. Adenoviruses are DNA viruses. The adenovirus genome (36 kb) is divided into early genes (E1 through E4) and late genes (L1 to L5) (a). Adenovirus vectors are generated by deleting gene E1 (and sometimes E3 if more space for an expression cassette is needed) (b); deletion of E1 renders the adenovirus incapable of replication unless introduced into a complementing cell line carrying the E1 gene (c). Adenovirus progeny from the complementing cell line can be used to infect a patient. In the patient, the adenovirus vector with its expression cassette enters the cells via specific receptors (d). Its linear dsDNA ultimately gains access to the cell nucleus, where it functions extrachromosomally and expresses the product of the expression cassette (e). (Adapted from Figure 2 in Crystal, R. G., 1995. Transfer of genes to humans: Early lessons and obstacles to success. Science 270:404.)

Recombinant, replication-deficient adenoviruses enter target cells via specific receptors on the target cell surface; the transferred genetic information is expressed directly from the adenovirus recombinant DNA and is never incorporated into the host cell genome. Although many problems remain to be solved, human gene therapy as a clinical strategy is feasible.
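
The two vector systems described in this section differ in cassette capacity, route of delivery, and whether the transferred gene is integrated into the host genome. The Python sketch below collects those properties, using only the figures stated in the text (9 kb for the MMLV-derived retrovirus, 7.5 kb for adenovirus). The Vector class, the candidate_vectors function, and the cassette sizes passed to it are hypothetical and serve only to illustrate how cassette size narrows the choice of vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    name: str
    max_cassette_kb: float  # largest expression cassette, from the text
    route: str              # delivery route described in the text
    integrates: bool        # True if the gene enters the host genome


VECTORS = [
    Vector("replication-deficient MMLV retrovirus", 9.0, "ex vivo", True),
    Vector("replication-deficient adenovirus", 7.5, "in vivo", False),
]


def candidate_vectors(cassette_kb):
    """Return the vectors able to carry a cassette of the given size."""
    return [v for v in VECTORS if cassette_kb <= v.max_cassette_kb]


# Hypothetical cassette sizes, chosen only to exercise each case.
for size_kb in (5.0, 8.0, 10.0):
    names = [v.name for v in candidate_vectors(size_kb)]
    print(f"{size_kb} kb cassette: {names if names else 'no suitable vector'}")
```

Capacity is only one constraint. In practice the choice also depends on whether stable integration (retrovirus) or extrachromosomal expression (adenovirus) better suits the target cells.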