Nearly all biological processes involve the specialized functions of one or more protein molecules. Proteins function to produce other proteins, control all aspects of cellular metabolism, regulate the movement of various molecular and ionic species across membranes, convert and store cellular energy, and carry out many other activities. Essentially all of the information required to initiate, conduct, and regulate each of these functions must be contained in the structure of the protein itself. The previous chapter described the details of primary protein structure. However, proteins do not normally exist as fully extended polypeptide chains but rather as compact, folded structures, and the function of a given protein is rarely if ever dependent only on the amino acid sequence. Instead, the ability of a particular protein to carry out its function in nature is normally determined by its overall three-dimensional shape or conformation. This native, folded structure of the protein is dictated by several factors: (1) interactions with solvent molecules (normally water), (2) the pH and ionic composition of the solvent, and most important, (3) the sequence of the protein. The first two of these effects are intuitively reasonable, but the third, the role of the amino acid sequence, may not be. In ways that are just now beginning to be understood, the primary structure facilitates the development of short-range interactions among adjacent parts of the sequence and also long-range interactions among distant parts of the sequence. Although the resulting overall structure of the complete protein molecule may at first look like a disorganized and random arrangement, it is in nearly all cases a delicate and sophisticated balance of numerous forces that combine to determine the protein’s unique conformation. This chapter considers the details of protein structure and the forces that maintain these structures.

6.1 • Forces Influencing Protein Structure

Several different kinds of noncovalent interactions are of vital importance in protein structure. Hydrogen bonds, hydrophobic interactions, electrostatic bonds, and van der Waals forces are all noncovalent in nature, yet are extremely important influences on protein conformations. The stabilization free energies afforded by each of these interactions may be highly dependent on the local environment within the protein, but certain generalizations can still be made.

Hydrogen Bonds

Hydrogen bonds are generally made wherever possible within a given protein structure. In most protein structures that have been examined to date, component atoms of the peptide backbone tend to form hydrogen bonds with one another. Furthermore, side chains capable of forming H bonds are usually located on the protein surface and form such bonds primarily with the water solvent. Although each hydrogen bond may contribute an average of only about 12 kJ/mol in stabilization energy for the protein structure, the number of H bonds formed in the typical protein is very large. For example, in α-helices, the C=O and N–H groups of every residue participate in H bonds. The importance of H bonds in protein structure cannot be overstated.

Hydrophobic Interactions

Hydrophobic “bonds,” or, more accurately, interactions, form because nonpolar side chains of amino acids and other nonpolar solutes prefer to cluster in a nonpolar environment rather than to intercalate in a polar solvent such as water. The forming of hydrophobic bonds minimizes the interaction of nonpolar residues with water and is therefore highly favorable. Such clustering is entropically driven. The side chains of the amino acids in the interior or core of the protein structure are almost exclusively hydrophobic. Polar amino acids are almost never found in the interior of a protein, but the protein surface may consist of both polar and nonpolar residues.

Electrostatic Interactions

Ionic interactions arise either as electrostatic attractions between opposite charges or repulsions between like charges. Chapter 4 discusses the ionization behavior of amino acids. Amino acid side chains can carry positive charges, as in the case of lysine, arginine, and histidine, or negative charges, as in aspartate and glutamate. In addition, the NH2-terminal and COOH-terminal residues of a protein or peptide chain usually exist in ionized states and carry positive or negative charges, respectively. All of these may experience electrostatic interactions in a protein structure. Charged residues are normally located on the protein surface, where they may interact optimally with the water solvent. It is energetically unfavorable for an ionized residue to be located in the hydrophobic core of the protein. Electrostatic interactions between charged groups on a protein surface are often complicated by the presence of salts in the solution. For example, the ability of a positively charged lysine to attract a nearby negative glutamate may be weakened by dissolved NaCl (Figure 6.1). The Na+ and Cl- ions are highly mobile, compact units of charge, compared to the amino acid side chains, and thus compete effectively for charged sites on the protein. In this manner, electrostatic interactions among amino acid residues on protein surfaces may be damped out by high concentrations of salts. Nevertheless, these interactions are important for protein stability.

Figure 6.1 · An electrostatic interaction between the ε-amino group of a lysine and the γ-carboxyl group of a glutamate residue.

Van der Waals Interactions

Both attractive forces and repulsive forces are included in van der Waals interactions. The attractive forces are due primarily to instantaneous dipole-induced dipole interactions that arise because of fluctuations in the electron charge distributions of adjacent nonbonded atoms. Individual van der Waals interactions are weak (with stabilization energies of 1.2 to 4.0 kJ/mol), but many such interactions occur in a typical protein, and, by sheer force of numbers, they can represent a significant contribution to the stability of a protein. Peter Privalov and George Makhatadze have shown that, for pancreatic ribonuclease A, hen egg white lysozyme, horse heart cytochrome c, and sperm whale myoglobin, van der Waals interactions between tightly packed groups in the interior of the protein are a major contribution to protein stability.

6.2 • Role of the Amino Acid Sequence in Protein Structure

It can be inferred from the first section of this chapter that many different forces work together in a delicate balance to determine the overall three-dimensional structure of a protein. These forces operate both within the protein structure itself and between the protein and the water solvent. How, then, does nature dictate the manner of protein folding to generate the three-dimensional structure that optimizes and balances these many forces? All of the information necessary for folding the peptide chain into its “native” structure is contained in the  amino acid sequence of the peptide. This principle was first appreciated by C. B. Anfinsen and F. White, whose work in the early 1960s dealt with the chemical denaturation and subsequent renaturation of bovine pancreatic ribonuclease. Ribonuclease was first denatured with urea and mercaptoethanol, a treatment that cleaved the four covalent disulfide (S-S) cross-bridges in the protein. Subsequent air oxidation permitted random formation of disulfide cross-bridges, most of which were incorrect. Thus, the air-oxidized material showed little enzymatic activity. However, treatment of these inactive preparations with small amounts of mercaptoethanol allowed a reshuffling of the disulfide bonds and permitted formation of significant amounts of active native enzyme. In such experiments, the only road map for the protein, that is, the only “instructions” it has, are those directed by its primary structure, the linear sequence of its amino acid residues.
Just how proteins recognize and interpret the information that is stored in the polypeptide sequence is not yet well understood. It may be assumed that certain loci along the peptide chain act as nucleation points, which initiate folding processes that eventually lead to the correct structures. Regardless of how this process operates, it must take the protein correctly to the final native structure, without getting trapped in a local energy-minimum state which, although stable, may be different from the native state itself. A long-range goal of many researchers in the protein structure field is the prediction of three-dimensional conformation from the amino acid sequence. As the details of secondary and tertiary structure are described in this chapter, the complexity and immensity of such a prediction will be more fully appreciated. This area is perhaps the greatest uncharted frontier remaining in molecular biology.

6.3 • Secondary Structure in Proteins

Any discussion of protein folding and structure must begin with the peptide bond, the fundamental structural unit in all proteins. As we saw in Chapter 5, the resonance structures experienced by a peptide bond constrain the oxygen, carbon, nitrogen, and hydrogen atoms of the peptide group, as well as the adjacent α-carbons, to all lie in a plane. The resonance stabilization energy of this planar structure is approximately 88 kJ/mol, and substantial energy is required to twist the structure about the C–N bond. A twist of θ degrees involves a twist energy of 88 sin²θ kJ/mol.
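To get a feel for how steeply the twist energy rises, the expression 88 sin²θ kJ/mol can be evaluated for a few angles. The following is a minimal sketch; the function and constant names are illustrative and not part of the text.

```python
import math

# Resonance stabilization energy of the planar peptide group, as quoted in the text.
RESONANCE_ENERGY_KJ_PER_MOL = 88.0


def twist_energy(theta_degrees: float) -> float:
    """Energy (kJ/mol) to twist the peptide group by theta about the C-N bond."""
    return RESONANCE_ENERGY_KJ_PER_MOL * math.sin(math.radians(theta_degrees)) ** 2


if __name__ == "__main__":
    for theta in (0, 10, 30, 90):
        print(f"{theta:3d} degrees -> {twist_energy(theta):5.1f} kJ/mol")
```

A twist of 30° already costs 22 kJ/mol, a quarter of the full resonance energy, which is why the peptide group is treated as effectively planar.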

Consequences of the Amide Plane

The planarity of the peptide bond means that there are only two degrees of freedom per residue for the peptide chain. Rotation is allowed about the bond linking the α-carbon and the carbon of the peptide bond and also about the bond linking the nitrogen of the peptide bond and the adjacent α-carbon. As shown in Figure 6.2, each α-carbon is the joining point for two planes defined by peptide bonds. The angle about the Cα–N bond is denoted by the Greek letter φ (phi) and that about the Cα–Co bond (Co being the carbonyl carbon) is denoted by ψ (psi). For either of these bond angles, a value of 0° corresponds to an orientation with the amide plane bisecting the H–Cα–R (side chain) plane and a cis configuration of the main chain around the rotating bond in question (Figure 6.3).

Figure 6.2 · The amide or peptide bond planes are joined by the tetrahedral bonds of the α-carbon. The rotation parameters are φ and ψ. The conformation shown corresponds to φ = 180° and ψ = 180°. Note that positive values of φ and ψ correspond to clockwise rotation as viewed from Cα. Starting from 0°, a rotation of 180° in the clockwise direction (+180°) is equivalent to a rotation of 180° in the counterclockwise direction (−180°). (Irving Geis)

Figure 6.3 · Many of the possible conformations about an α-carbon between two peptide planes are forbidden because of steric crowding. Several noteworthy examples are shown here.
Note: The formal IUPAC-IUB Commission on Biochemical Nomenclature convention for the definition of the torsion angles φ and ψ in a polypeptide chain (Biochemistry 9:3471–3479, 1970) is different from that used here, where the Cα atom serves as the point of reference for both rotations, but the result is the same. (Irving Geis)

In any case, the entire path of the peptide backbone in a protein is known if the φ and ψ rotation angles are all specified. Some values of φ and ψ are not allowed due to steric interference between nonbonded atoms. As shown in Figure 6.4, values of φ = 180° and ψ = 0° are not allowed because of the forbidden overlap of the N–H hydrogens. Similarly, φ = 0° and ψ = 180° are forbidden because of unfavorable overlap between the carbonyl oxygens.
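In practice, φ and ψ are obtained from atomic coordinates as torsion (dihedral) angles defined by four consecutive backbone atoms. The sketch below uses the standard vector formula with the IUPAC sign convention, which, as the note to Figure 6.3 says, gives the same result as the convention used here. It assumes the usual atom quartets, C(i−1), N(i), Cα(i), C(i) for φ and N(i), Cα(i), C(i), N(i+1) for ψ. The function names and the toy coordinates are illustrative only.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    0 degrees is the cis (eclipsed) arrangement of p0 and p3,
    180 degrees is the trans (fully extended) arrangement.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


if __name__ == "__main__":
    # Toy coordinates (not from a real protein): four atoms with a right-angle twist.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this function to every residue of a structure yields the (φ, ψ) pairs that are plotted in a Ramachandran diagram.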

Figure 6.4 · A Ramachandran diagram showing the sterically reasonable values of the angles φ and ψ. The shaded regions indicate particularly favorable values of these angles. Dots in purple indicate actual angles measured for 1000 residues (excluding glycine, for which a wider range of angles is permitted) in eight proteins. The lines running across the diagram (numbered +5 through 2 and -5 through -3) signify the number of amino acid residues per turn of the helix; “+” means right-handed helices; “-” means left-handed helices. (After Richardson, J. S., 1981, Advances in Protein Chemistry 34:167–339.)

G. N. Ramachandran and his coworkers in Madras, India, first showed that it was convenient to plot φ values against ψ values to show the distribution of allowed values in a protein or in a family of proteins. A typical Ramachandran plot is shown in Figure 6.4. Note the clustering of φ and ψ values in a few regions of the plot. Most combinations of φ and ψ are sterically forbidden, and the corresponding regions of the Ramachandran plot are sparsely populated. The combinations that are sterically allowed represent the subclasses of structure described in the remainder of this section.

The Alpha-Helix

The discussion of hydrogen bonding in Section 6.1 pointed out that the carbonyl oxygen and amide hydrogen of the peptide bond could participate in H bonds either with water molecules in the solvent or with other H-bonding groups in the peptide chain. In nearly all proteins, the carbonyl oxygens and the amide protons of many peptide bonds participate in H bonds that link one peptide group to another, as shown in Figure 6.5.

Figure 6.5 · A hydrogen bond between the amide proton and carbonyl oxygen of adjacent peptide groups.

 

These structures tend to form in cooperative fashion and involve substantial portions of the peptide chain. Structures resulting from these interactions constitute secondary structure for proteins (see Chapter 5). When a number of hydrogen bonds form between portions of the peptide chain in this manner, two basic types of structures can result: α-helices and β-pleated sheets.

 

A Deeper Look

Knowing What the Right Hand and Left Hand Are Doing

Certain conventions related to peptide bond angles and the “handedness” of biological structures are useful in any discussion of protein structure. To determine the φ and ψ angles between peptide planes, viewers should imagine themselves at the Cα carbon looking outward and should imagine starting from the φ = 0°, ψ = 0° conformation. From this perspective, positive values of φ correspond to clockwise rotations about the Cα–N bond of the plane that includes the adjacent N–H group. Similarly, positive values of ψ correspond to clockwise rotations about the Cα–C bond of the plane that includes the adjacent C=O group.
            Biological structures are often said to exhibit “right-hand” or “left-hand” twists. For all such structures, the sense of the twist can be ascertained by holding the structure in front of you and looking along the polymer backbone. If the twist is clockwise as one proceeds outward and through the structure, it is said to be right-handed. If the twist is counterclockwise, it is said to be left-handed.

 

Evidence for helical structures in proteins was first obtained in the 1930s in studies of fibrous proteins. However, there was little agreement at that time about the exact structure of these helices, primarily because there was also lack of agreement about interatomic distances and bond angles in peptides. In 1951, Linus Pauling, Robert Corey, and their colleagues at the California Institute of Technology summarized a large volume of crystallographic data in a set of dimensions for polypeptide chains. (A summary of data similar to what they reported is shown in Figure 5.2.) With these data in hand, Pauling, Corey, and their colleagues proposed a new model for a helical structure in proteins, which they called the α-helix. The report from Caltech was of particular interest to Max Perutz in Cambridge, England, a crystallographer who was also interested in protein structure. By taking into account a critical but previously ignored feature of the X-ray data, Perutz realized that the α-helix existed in keratin, a protein from hair, and also in several other proteins. Since then, the α-helix has proved to be a fundamentally important peptide structure. Several representations of the α-helix are shown in Figure 6.6. One turn of the helix represents 3.6 amino acid residues. (A single turn of the α-helix involves 13 atoms from the O to the H of the H bond. For this reason, the α-helix is sometimes referred to as the 3.6₁₃ helix.) This is in fact the feature that most confused crystallographers before the Pauling and Corey α-helix model. Crystallographers were so accustomed to finding twofold, threefold, sixfold, and similar integral axes in simpler molecules that the notion of a nonintegral number of units per turn was never taken seriously before Pauling and Corey’s work.
Each amino acid residue extends 1.5 Å (0.15 nm) along the helix axis. With 3.6 residues per turn, this amounts to 3.6 × 1.5 Å or 5.4 Å (0.54 nm) of travel along the helix axis per turn. This is referred to as the translation distance or the pitch of the helix. If one ignores side chains, the helix is about 6 Å in diameter. The side chains, extending outward from the core structure of the helix, are removed from steric interference with the polypeptide backbone. As can be seen in Figure 6.6, each peptide carbonyl is hydrogen bonded to the peptide N–H group four residues farther up the chain. Note that all of the H bonds lie parallel to the helix axis and that all of the carbonyl groups are pointing in one direction along the helix axis while the N–H groups are pointing in the opposite direction. Recall that the entire path of the peptide backbone can be known if the φ and ψ twist angles are specified for each residue. The α-helix is formed if the values of φ are approximately −60° and the values of ψ are in the range of −45° to −50°. Figure 6.7 shows the structures of two proteins that contain α-helical segments. The number of residues involved in a given α-helix varies from helix to helix and from protein to protein. On average, there are about 10 residues per helix. Myoglobin, one of the first proteins in which α-helices were observed, has eight stretches of α-helix that form a box to contain the heme prosthetic group. The structures of the α and β subunits of hemoglobin are strikingly similar, with only a few differences at the C- and N-termini and on the surfaces of the structure that contact or interact with the other subunits of this multisubunit protein.

Figure 6.6 · Four different graphic representations of the α-helix. (a) As it originally appeared in Pauling’s 1960 The Nature of the Chemical Bond. (b) Showing the arrangement of peptide planes in the helix. (c) A space-filling computer graphic presentation. (d) A “ribbon structure” with an inlaid stick figure, showing how the ribbon indicates the path of the polypeptide backbone. (Irving Geis)

Figure 6.7 · The three-dimensional structures of two proteins that contain substantial amounts of α-helix in their structures. The helices are represented by the regularly coiled sections of the ribbon drawings. Myohemerythrin is the oxygen-carrying protein in certain invertebrates, including sipunculids, a phylum of marine worms. (Jane Richardson)
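The helix dimensions quoted in this section (1.5 Å of rise per residue and 3.6 residues per turn) are enough to work out the pitch, the rotation per residue, and the length of a helical segment. The following is a small sketch of that arithmetic; the constant and function names are illustrative.

```python
# Geometric parameters of the alpha-helix, as quoted in the text.
RISE_PER_RESIDUE_ANGSTROM = 1.5
RESIDUES_PER_TURN = 3.6


def helix_pitch() -> float:
    """Travel along the helix axis per full turn, in angstroms."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE_ANGSTROM


def rotation_per_residue() -> float:
    """Angle (degrees) by which each residue is rotated about the helix axis."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length(n_residues: int) -> float:
    """Approximate axial length (angstroms) of a helix of n_residues."""
    return n_residues * RISE_PER_RESIDUE_ANGSTROM


def helix_turns(n_residues: int) -> float:
    """Number of turns made by a helix of n_residues."""
    return n_residues / RESIDUES_PER_TURN


if __name__ == "__main__":
    print(f"pitch: {helix_pitch():.1f} angstroms per turn")
    print(f"rotation: {rotation_per_residue():.0f} degrees per residue")
    print(f"10-residue helix: {helix_length(10):.1f} angstroms, "
          f"{helix_turns(10):.1f} turns")
```

For the average helix of about 10 residues, this gives a segment roughly 15 Å long that makes a little under three turns.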
As shown in Figure 6.6, all of the hydrogen bonds point in the same direction along the α-helix axis. Each peptide bond possesses a dipole moment that arises from the polarities of the N–H and C=O groups, and, because these groups are all aligned along the helix axis, the helix itself has a substantial dipole moment, with a partial positive charge at the N-terminus and a partial negative charge at the C-terminus (Figure 6.8). Negatively charged ligands (e.g., phosphates) frequently bind to proteins near the N-terminus of an α-helix. By contrast, positively charged ligands are only rarely found to bind near the C-terminus of an α-helix.

Figure 6.8 · The arrangement of N–H and C=O groups (each with an individual dipole moment) along the helix axis creates a large net dipole for the helix. Numbers indicate fractional charges on respective atoms.

In a typical α-helix of 12 (or n) residues, there are 8 (or n − 4) hydrogen bonds. As shown in Figure 6.9, the first 4 amide hydrogens and the last 4 carbonyl oxygens cannot participate in helix H bonds. Also, nonpolar residues situated near the helix termini can be exposed to solvent. Proteins frequently compensate for these problems by helix capping: providing H-bond partners for the otherwise bare N–H and C=O groups and folding other parts of the protein to foster hydrophobic contacts with exposed nonpolar residues at the helix termini.

Figure 6.9 · Four N–H groups at the N-terminal end of an α-helix and four C=O groups at the C-terminal end cannot participate in hydrogen bonding. The formation of H bonds with other nearby donor and acceptor groups is referred to as helix capping. Capping may also involve appropriate hydrophobic interactions that accommodate nonpolar side chains at the ends of helical segments.
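The n − 4 count follows directly from the bonding pattern, in which the C=O of residue i pairs with the N–H of residue i + 4. A minimal sketch that enumerates the pairs and the groups left without partners is given below; residues are numbered from 1 at the N-terminus, and the function names are illustrative.

```python
def helix_hbonds(n_residues: int) -> list[tuple[int, int]]:
    """Pairs (i, i + 4): C=O of residue i bonded to N-H of residue i + 4."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


def unpaired_groups(n_residues: int) -> tuple[list[int], list[int]]:
    """Residues whose N-H (first list) or C=O (second list) has no helix partner."""
    bonds = helix_hbonds(n_residues)
    donors = {donor for _, donor in bonds}
    acceptors = {acceptor for acceptor, _ in bonds}
    residues = range(1, n_residues + 1)
    free_nh = [r for r in residues if r not in donors]
    free_co = [r for r in residues if r not in acceptors]
    return free_nh, free_co


if __name__ == "__main__":
    print(len(helix_hbonds(12)), "hydrogen bonds in a 12-residue helix")
    print("free N-H, free C=O:", unpaired_groups(12))
```

For a 12-residue helix, the free N–H groups are on residues 1 through 4 and the free C=O groups on residues 9 through 12; these are the groups that helix capping must satisfy.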

Careful studies of the polyamino acids, polymers in which all the amino acids are identical, have shown that certain amino acids tend to occur in α-helices, whereas others are less likely to be found in them. Polyleucine and polyalanine, for example, readily form α-helical structures. In contrast, polyaspartic acid and polyglutamic acid, which are highly negatively charged at pH 7.0, form only random structures because of strong charge repulsion between the R groups along the peptide chain. At pH 1.5 to 2.5, however, where the side chains are protonated and thus uncharged, these latter species spontaneously form α-helical structures. In similar fashion, polylysine is a random coil at pH values below about 11, where repulsion of positive charges prevents helix formation. At pH 12, where polylysine is a neutral peptide chain, it readily forms an α-helix.
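The pH dependence described above can be illustrated with the Henderson–Hasselbalch relation, which gives the fraction of a side chain that is ionized at a given pH. The sketch below is qualitative only. The pKa values are assumed, approximate values for isolated side chains and are not taken from this chapter; in a real polyamino acid, neighboring charges shift the effective pKa.

```python
# Assumed, approximate side-chain pKa values (not from this chapter).
ASSUMED_PKA = {"Glu": 4.3, "Lys": 10.5}


def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of the group that has lost its proton."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def charged_fraction(residue: str, ph: float) -> float:
    """Fraction of side chains carrying a charge at the given pH."""
    deprotonated = fraction_deprotonated(ph, ASSUMED_PKA[residue])
    # Glu is charged (-) when deprotonated; Lys is charged (+) when protonated.
    return deprotonated if residue == "Glu" else 1.0 - deprotonated


if __name__ == "__main__":
    for residue, ph in (("Glu", 7.0), ("Glu", 2.0), ("Lys", 7.0), ("Lys", 12.0)):
        print(f"{residue} at pH {ph:4.1f}: "
              f"{charged_fraction(residue, ph):.3f} charged")
```

Under these assumptions, glutamate side chains are almost fully charged at pH 7 and almost fully neutral at pH 2, while lysine side chains lose most of their charge only near pH 12, in line with the helix-forming behavior described for the polyamino acids.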

 

Critical Developments in Biochemistry
In Bed with a Cold, Pauling Stumbles onto the α-Helix and a Nobel Prize¹

As high technology continues to transform the modern biochemical laboratory, it is interesting to reflect on Linus Pauling’s discovery of the α-helix. It involved only a piece of paper, a pencil, scissors, and a sick Linus Pauling, who had tired of reading detective novels. The story is told in the excellent book The Eighth Day of Creation by Horace Freeland Judson:

From the spring of 1948 through the spring of 1951 . . . rivalry sputtered and blazed between Pauling’s lab and (Sir Lawrence) Bragg’s — over protein. The prize was to propose and verify in nature a general three-dimensional structure for the polypeptide chain. Pauling was working up from the simpler structures of components. In January 1948, he went to Oxford as a visiting professor for two terms, to lecture on the chemical bond and on molecular structure and biological specificity. “In Oxford, it was April, I believe, I caught cold. I went to bed, and read detective stories for a day, and got bored, and thought why don’t I have a crack at that problem of alpha keratin.” Confined, and still fingering the polypeptide chain in his mind, Pauling called for paper, pencil, and straightedge and attempted to reduce the problem to an almost Euclidean purity. “I took a sheet of paper — I still have this sheet of paper — and drew, rather roughly, the way that I thought a polypeptide chain would look if it were spread out into a plane.” The repetitious herringbone of the chain he could stretch across the paper as simply as this —

— putting in lengths and bond angles from memory. . . . He knew that the peptide bond, at the carbon-to-nitrogen link, was always rigid:

And this meant that the chain could turn corners only at the alpha carbons. . . . “I creased the paper in parallel creases through the alpha carbon atoms, so that I could bend it and make the bonds to the alpha carbons, along the chain, have tetrahedral value. And then I looked to see if I could form hydrogen bonds from one part of the chain to the next.” He saw that if he folded the strip like a chain of paper dolls into a helix, and if he got the pitch of the screw right, hydrogen bonds could be shown to form, N–H···O–C, three or four knuckles apart along the backbone, holding the helix in shape. After several tries, changing the angle of the parallel creases in order to adjust the pitch of the helix, he found one where the hydrogen bonds would drop into place, connecting the turns, as straight lines of the right length. He had a model.

 

¹The discovery of the α-helix structure was only one of many achievements that led to Pauling’s Nobel Prize in chemistry in 1954. The official citation for the prize was “for his research into the nature of the chemical bond and its application to the elucidation of the structure of complex substances.”

The tendencies of the amino acids to stabilize or destabilize α-helices are different in typical proteins than in polyamino acids. The occurrence of the common amino acids in helices is summarized in Table 6.1. Notably, proline (and hydroxyproline) act as helix breakers due to their unique structure, which fixes the value of the Cα–N–C bond angle. Helices can be formed from either D- or L-amino acids, but a given helix must be composed entirely of amino acids of one configuration. α-Helices cannot be formed from a mixed copolymer of D- and L-amino acids. An α-helix composed of D-amino acids is left-handed.

Table 6.1

Helix-Forming and Helix-Breaking Behavior of the Amino Acids

Amino Acid    Helix Behavior*    Amino Acid    Helix Behavior*
A  Ala        H (I)              M  Met        H
C  Cys        Variable           N  Asn        C (I)
D  Asp        Variable           P  Pro        B
E  Glu        H                  Q  Gln        H (I)
F  Phe        H                  R  Arg        H (I)
G  Gly        I (B)              S  Ser        C (B)
H  His        H (I)              T  Thr        Variable
I  Ile        H (C)              V  Val        Variable
K  Lys        Variable           W  Trp        H (C)
L  Leu        H                  Y  Tyr        H (C)

*H = helix former; I = indifferent; B = helix breaker; C = random coil; ( ) = secondary tendency.
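The primary tendencies listed in Table 6.1 can be held in a simple lookup table and used to annotate a sequence. The sketch below records only the primary tendency of each residue (secondary tendencies in parentheses are dropped). It is an illustration of how the table can be read, not a secondary-structure prediction method, and the example sequences are made up.

```python
# Primary helix tendencies transcribed from Table 6.1.
# H = helix former, I = indifferent, B = helix breaker, C = random coil.
HELIX_BEHAVIOR = {
    "A": "H", "C": "Variable", "D": "Variable", "E": "H", "F": "H",
    "G": "I", "H": "H", "I": "H", "K": "Variable", "L": "H",
    "M": "H", "N": "C", "P": "B", "Q": "H", "R": "H",
    "S": "C", "T": "Variable", "V": "Variable", "W": "H", "Y": "H",
}


def annotate(sequence: str) -> list[str]:
    """Return the Table 6.1 primary tendency for each residue in the sequence."""
    return [HELIX_BEHAVIOR[residue] for residue in sequence.upper()]


def helix_former_fraction(sequence: str) -> float:
    """Fraction of residues whose primary tendency is 'H'."""
    labels = annotate(sequence)
    return labels.count("H") / len(labels)


if __name__ == "__main__":
    for seq in ("AELLRQ", "GPSN"):  # hypothetical sequences
        print(seq, annotate(seq), f"{helix_former_fraction(seq):.2f}")
```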

Other Helical Structures

There are several other far less common types of helices found in proteins. The most common of these is the 3₁₀ helix, which contains 3.0 residues per turn (with 10 atoms in the ring formed by making the hydrogen bond three residues up the chain). It normally extends over shorter stretches of sequence than the α-helix. Other helical structures include the 2₇ ribbon and the π-helix, which has 4.4 residues and 16 atoms per turn and is thus called the 4.4₁₆ helix.
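The subscripts in these helix names can be related to the hydrogen-bond spacing by counting atoms. A ring closed by a bond from the C=O of residue i to the N–H of residue i + k contains the O and C of residue i, three backbone atoms (N, Cα, C) for each of the k − 1 intervening residues, and the N and H of residue i + k, for a total of 3k + 1 atoms. This counting rule is a sketch worked out from the two cases the text gives explicitly (k = 3 gives 10 atoms, k = 4 gives 13); the spacings it implies for the 2₇ ribbon and the π-helix are inferred from their ring sizes rather than stated in the text.

```python
def ring_atoms(k: int) -> int:
    """Atoms in the ring closed by a C=O(i) ... H-N(i + k) hydrogen bond."""
    # O and C of residue i, (N, CA, C) for k - 1 residues, N and H of residue i + k.
    return 2 + 3 * (k - 1) + 2


def hbond_spacing(atoms_in_ring: int) -> int:
    """Invert ring_atoms: residue spacing k implied by a given ring size."""
    k, remainder = divmod(atoms_in_ring - 1, 3)
    if remainder != 0:
        raise ValueError("ring size is not of the form 3k + 1")
    return k


if __name__ == "__main__":
    # Ring sizes as given in the helix names in the text.
    for name, ring in (("2_7 ribbon", 7), ("3_10 helix", 10),
                       ("alpha-helix (3.6_13)", 13), ("pi-helix (4.4_16)", 16)):
        print(f"{name}: C=O of residue i bonds to N-H of residue i + "
              f"{hbond_spacing(ring)}")
```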

The Beta-Pleated Sheet

Another type of structure commonly observed in proteins also forms because of local, cooperative formation of hydrogen bonds. That is the pleated sheet, or β-structure, often called the β-pleated sheet. This structure was also first postulated by Pauling and Corey in 1951 and has now been observed in many natural proteins. A β-pleated sheet can be visualized by laying thin, pleated strips of paper side by side to make a “pleated sheet” of paper (Figure 6.10). Each strip of paper can then be pictured as a single peptide strand in which the peptide backbone makes a zigzag pattern along the strip, with the α-carbons lying at the folds of the pleats. The pleated sheet can exist in both parallel and antiparallel forms. In the parallel β-pleated sheet, adjacent chains run in the same direction (N → C or C → N). In the antiparallel β-pleated sheet, adjacent strands run in opposite directions.

Figure 6.10 · A “pleated sheet” of paper with an antiparallel β-sheet drawn on it. (Irving Geis)
Each single strand of the β-sheet structure can be pictured as a twofold helix, that is, a helix with two residues per turn. The arrangement of successive amide planes has a pleated appearance due to the tetrahedral nature of the Cα atom. It is important to note that the hydrogen bonds in this structure are essentially interstrand rather than intrastrand. The peptide backbone in the β-sheet is in its most extended conformation (sometimes called the ε-conformation). The optimum formation of H bonds in the parallel pleated sheet results in a slightly less extended conformation than in the antiparallel sheet. The H bonds thus formed in the parallel β-sheet are bent significantly. The distance between residues is 0.347 nm for the antiparallel pleated sheet, but only 0.325 nm for the parallel pleated sheet. Figure 6.11 shows examples of both parallel and antiparallel β-pleated sheets. Note that the side chains in the pleated sheet are oriented perpendicular or normal to the plane of the sheet, extending out from the plane on alternating sides.

Figure 6.11 · The arrangement of hydrogen bonds in (a) parallel and (b) antiparallel β-pleated sheets.
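The per-residue distances quoted in this chapter make it easy to compare how far a stretch of polypeptide reaches in each conformation. The sketch below treats the quoted distance between residues as the advance per residue along the strand or helix axis, which is an assumption made for illustration, and the names used are not from the text.

```python
# Advance per residue in nanometers, as quoted in this chapter.
RISE_NM = {
    "alpha-helix": 0.15,
    "parallel beta-sheet": 0.325,
    "antiparallel beta-sheet": 0.347,
}


def segment_length_nm(n_residues: int, structure: str) -> float:
    """Approximate length (nm) spanned by n_residues in the given structure."""
    return n_residues * RISE_NM[structure]


if __name__ == "__main__":
    for structure in RISE_NM:
        print(f"10 residues as {structure}: "
              f"{segment_length_nm(10, structure):.2f} nm")
```

Ten residues span about 1.5 nm as an α-helix but well over 3 nm as a β-strand, which is what is meant by calling the β-strand the most extended conformation.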

Parallel β-sheets tend to be more regular than antiparallel β-sheets. The range of φ and ψ angles for the peptide bonds in parallel sheets is much smaller than that for antiparallel sheets. Parallel sheets are typically large structures; those composed of fewer than five strands are rare. Antiparallel sheets, however, may consist of as few as two strands. Parallel sheets characteristically distribute hydrophobic side chains on both sides of the sheet, while antiparallel sheets are usually arranged with all their hydrophobic residues on one side of the sheet. This requires an alternation of hydrophilic and hydrophobic residues in the primary structure of peptides involved in antiparallel β-sheets because alternate side chains project to the same side of the sheet (see Figure 6.10).
Antiparallel pleated sheets are the fundamental structure found in silk, with the polypeptide chains forming the sheets running parallel to the silk fibers. The silk fibers thus formed have properties consistent with those of the β-sheets that form them. They are quite flexible but cannot be stretched or extended to any appreciable degree. Antiparallel structures are also observed in many other proteins, including immunoglobulin G, superoxide dismutase from bovine erythrocytes, and concanavalin A. Many proteins, including carbonic anhydrase, egg lysozyme, and glyceraldehyde phosphate dehydrogenase, possess both α-helices and β-pleated sheet structures within a single polypeptide chain.

The Beta-Turn

Most proteins are globular structures. The polypeptide chain must therefore possess the capacity to bend, turn, and reorient itself to produce the required compact, globular structures. A simple structure observed in many proteins is the β-turn (also known as the tight turn or β-bend), in which the peptide chain forms a tight loop with the carbonyl oxygen of one residue hydrogen-bonded with the amide proton of the residue three positions down the chain. This H bond makes the β-turn a relatively stable structure. As shown in Figure 6.12, the β-turn allows the protein to reverse the direction of its peptide chain. This figure shows the two major types of β-turns, but a number of less common types are also found in protein structures. Certain amino acids, such as proline and glycine, occur frequently in β-turn sequences, and the particular conformation of the β-turn sequence depends to some extent on the amino acids composing it. Due to the absence of a side chain, glycine is sterically the most adaptable of the amino acids, and it accommodates conveniently to other steric constraints in the β-turn. Proline, however, has a cyclic structure and a fixed φ angle, so, to some extent, it forces the formation of a β-turn, and in many cases this facilitates the turning of a polypeptide chain upon itself. Such bends promote formation of antiparallel β-pleated sheets.

Figure 6.12 · The structures of two kinds of β-turns (also called tight turns or β-bends). (Irving Geis)

The Beta-Bulge

One final secondary structure, the β-bulge, is a small piece of nonrepetitive structure that can occur by itself, but most often occurs as an irregularity in antiparallel β-structures. A β-bulge occurs between two normal β-structure hydrogen bonds and comprises two residues on one strand and one residue on the opposite strand. Figure 6.13 illustrates typical β-bulges. The extra residue on the longer side, which causes additional backbone length, is accommodated partially by creating a bulge in the longer strand and partially by forcing a slight bend in the β-sheet. Bulges thus cause changes in the direction of the polypeptide chain, but to a lesser degree than tight turns do. Over 100 examples of β-bulges are known in protein structures.

Figure 6.13 · Three different kinds of β-bulge structures involving a pair of adjacent polypeptide chains. (Adapted from Richardson, J. S., 1981. Advances in Protein Chemistry 34:167–339.)
            The secondary structures we have described here are all found commonly in proteins in nature. In fact, it is hard to find proteins that do not contain one or more of these structures. The energetic (mostly H-bond) stabilization afforded by α-helices, β-pleated sheets, and β-turns is important to proteins, and they seize the opportunity to form such structures wherever possible.

6.4 • Protein Folding and Tertiary Structure

The folding of a single polypeptide chain in three-dimensional space is referred to as its tertiary structure. As discussed in Section 6.2, all of the information needed to fold the protein into its native tertiary structure is contained within the primary structure of the peptide chain itself. With this in mind, it was disappointing to the biochemists of the 1950s when the early protein structures did not reveal the governing principles in any particular detail. It soon became apparent that the proteins knew how they were supposed to fold into tertiary shapes, even if the biochemists did not. Vigorous work in many laboratories has slowly brought important principles to light.
            First, secondary structures (helices and sheets) form whenever possible as a consequence of the formation of large numbers of hydrogen bonds. Second, α-helices and β-sheets often associate and pack close together in the protein. No protein is stable as a single-layer structure, for reasons that become apparent later. There are a few common methods for such packing to occur. Third, because the peptide segments between secondary structures in the protein tend to be short and direct, the peptide does not execute complicated twists and knots as it moves from one region of a secondary structure to another. A consequence of these three principles is that protein chains are usually folded so that the secondary structures are arranged in one of a few common patterns. For this reason, there are families of proteins that have similar tertiary structure, with little apparent evolutionary or functional relationship among them. Finally, proteins generally fold so as to form the most stable structures possible. The stability of most proteins arises from (1) the formation of large numbers of intramolecular hydrogen bonds and (2) the reduction in the surface area accessible to solvent that occurs upon folding.

Fibrous Proteins

In Chapter 5, we saw that proteins can be grouped into three large classes based on their structure and solubility: fibrous proteins, globular proteins, and membrane proteins. Fibrous proteins contain polypeptide chains organized approximately parallel along a single axis, producing long fibers or large sheets. Such proteins tend to be mechanically strong and resistant to solubilization in water and dilute salt solutions. Fibrous proteins often play a structural role in nature (see Chapter 5).

α-Keratin

As their name suggests, the structure of the α-keratins is dominated by α-helical segments of polypeptide. The amino acid sequence of α-keratin subunits is composed of central α-helix-rich rod domains about 311 to 314 residues in length, flanked by nonhelical N- and C-terminal domains of varying size and composition (Figure 6.14a). The structure of the central rod domain of a typical α-keratin is shown in Figure 6.14b. It consists of four helical strands arranged as twisted pairs of two-stranded coiled coils. X-ray diffraction patterns show that these structures resemble α-helices, but with a pitch of 0.51 nm rather than the expected 0.54 nm. This is consistent with a tilt of the helix relative to the long axis of the fiber, as in the two-stranded “rope” in Figure 6.14.
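The size of the tilt implied by these two pitch values can be estimated with a short calculation. The sketch below assumes a simple projection model, in which the observed 0.51-nm repeat is the 0.54-nm pitch of a straight helix projected onto the fiber axis; this model and the function name are illustrative assumptions, not something stated in the text.

```python
import math

OBSERVED_PITCH_NM = 0.51  # pitch seen in alpha-keratin diffraction patterns (from the text)
STRAIGHT_PITCH_NM = 0.54  # pitch expected for an untilted alpha-helix (from the text)


def tilt_angle_deg(observed_nm: float, straight_nm: float) -> float:
    """Angle whose cosine equals observed/straight (assumed projection model)."""
    return math.degrees(math.acos(observed_nm / straight_nm))


tilt = tilt_angle_deg(OBSERVED_PITCH_NM, STRAIGHT_PITCH_NM)
print(f"Implied tilt of the helix relative to the fiber axis: {tilt:.1f} degrees")
```

Under this assumption the helix axis is inclined by somewhat less than 20 degrees, a modest tilt of the kind expected for helices wound into a coiled coil.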

Figure 6.14 · (a) Both type I and type II α-keratin molecules have sequences consisting of long, central rod domains with terminal cap domains. The numbers of amino acid residues in each domain are indicated. Asterisks denote domains of variable length. (b) The rod domains form coiled coils consisting of intertwined right-handed α-helices. These coiled coils then wind around each other in a left-handed twist. Keratin filaments consist of twisted protofibrils (each a bundle of four coiled coils). (Adapted from Steinert, P., and Parry, D., 1985. Annual Review of Cell Biology 1:41–65; and Cohlberg, J., 1993. Trends in Biochemical Sciences 18:360–362.)

 

The primary structure of the central rod segments of α-keratin consists of quasi-repeating seven-residue segments of the form (a-b-c-d-e-f-g)n. These units are not true repeats, but residues a and d are usually nonpolar amino acids. In α-helices, with 3.6 residues per turn, these nonpolar residues are arranged in an inclined row or stripe that twists around the helix axis. These nonpolar residues would make the helix highly unstable if they were exposed to solvent, but the association of hydrophobic strips on two coiled coils to form the two-stranded rope effectively buries the hydrophobic residues and forms a highly stable structure (Figure 6.14). The helices clearly sacrifice some stability in assuming this twisted conformation, but they gain stabilization energy from the packing of side chains between the helices. In other forms of keratin, covalent disulfide bonds form between cysteine residues of adjacent molecules, making the overall structure rigid, inextensible, and insoluble: important properties for structures such as claws, fingernails, hair, and horns in animals. How and where these disulfides form determines the amount of curling in hair and wool fibers. When a hairstylist creates a permanent wave (simply called a “permanent”) in a hair salon, disulfides in the hair are first reduced and cleaved, then reorganized and reoxidized to change the degree of curl or wave. In contrast, a “set” that is created by wetting the hair, setting it with curlers, and then drying it represents merely a rearrangement of the hydrogen bonds between helices and between fibers. (On humid or rainy days, the hydrogen bonds in curled hair may rearrange, and the hair becomes “frizzy.”)
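The geometry behind the inclined nonpolar stripe can be checked numerically. The sketch below assumes an ideal α-helix in which each residue is rotated 100° from the previous one (360° divided by 3.6 residues per turn) and lists the angular positions of the a and d residues over three heptads; the function name is a hypothetical helper introduced only for this illustration.

```python
DEGREES_PER_RESIDUE = 100  # 360 degrees / 3.6 residues per turn in an ideal alpha-helix
HEPTAD_LABELS = "abcdefg"


def heptad_angles(n_heptads: int) -> dict[str, list[int]]:
    """Angular position (degrees around the helix axis) of each heptad position."""
    angles: dict[str, list[int]] = {label: [] for label in HEPTAD_LABELS}
    for i in range(7 * n_heptads):
        angles[HEPTAD_LABELS[i % 7]].append((i * DEGREES_PER_RESIDUE) % 360)
    return angles


angles = heptad_angles(3)
for label in "ad":
    print(label, angles[label])
```

Because seven residues span 700° rather than exactly two full turns (720°), each successive a or d residue falls 20° behind the one in the preceding heptad. The a and d positions therefore stay on one face of the helix, but that face drifts slowly around the axis, which is the twisting stripe described above.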

Fibroin and β-Keratin: β-Sheet Proteins

The fibroin proteins found in silk fibers represent another type of fibrous protein. These are composed of stacked antiparallel β-sheets, as shown in Figure 6.15. In the polypeptide sequence of silk proteins, there are large stretches in which every other residue is a glycine. As previously mentioned, the residues of a β-sheet extend alternately above and below the plane of the sheet. As a result, the glycines all end up on one side of the sheet and the other residues (mainly alanines and serines) compose the opposite surface of the sheet. Pairs of β-sheets can then pack snugly together (glycine surface to glycine surface or alanine-serine surface to alanine-serine surface). The β-keratins found in bird feathers are also made up of stacked β-sheets.

Figure 6.15 · Silk fibroin consists of a unique stacked array of β-sheets. The primary structure of fibroin molecules consists of long stretches of alternating glycine and alanine or serine residues. When the sheets stack, the more bulky alanine and serine residues on one side of a sheet interdigitate with similar residues on an adjoining sheet. Glycine hydrogens on the alternating faces interdigitate in a similar manner, but with a smaller intersheet spacing. (Irving Geis)
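The segregation of glycines onto one face of the sheet follows directly from the alternation of side-chain directions along a β-strand. As a minimal sketch, the code below splits a strand sequence into the residues pointing to each face; the 12-residue stretch is a hypothetical example of the alternating pattern, not a quoted fibroin sequence.

```python
def sheet_faces(strand: str) -> tuple[str, str]:
    """Residues at even positions point to one face of the sheet, odd positions to the other."""
    return strand[0::2], strand[1::2]


# Hypothetical stretch in which every other residue is glycine (illustration only).
stretch = "GAGAGSGAGAGS"
face_one, face_two = sheet_faces(stretch)
print("Face 1:", face_one)
print("Face 2:", face_two)
```

One face carries only glycine and the other only alanine and serine, so like faces can stack against like faces as described above.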

Collagen: A Triple Helix

Collagen is a rigid, inextensible fibrous protein that is a principal constituent of connective tissue in animals, including tendons, cartilage, bones, teeth, skin, and blood vessels. The high tensile strength of collagen fibers in these structures makes possible the various animal activities such as running and jumping that put severe stresses on joints and skeleton. Broken bones and tendon and cartilage injuries to knees, elbows, and other joints involve tears or hyperextensions of the collagen matrix in these tissues.
            The basic structural unit of collagen is tropocollagen, which has a molecular weight of 285,000 and consists of three intertwined polypeptide chains, each about 1000 amino acids in length. Tropocollagen molecules are about 300 nm long and only about 1.4 nm in diameter. Several kinds of collagen have been identified. Type I collagen, which is the most common, consists of two identical peptide chains designated α1(I) and one different chain designated α2(I). Type I collagen predominates in bones, tendons, and skin. Type II collagen, found in cartilage, and type III collagen, found in blood vessels, consist of three identical polypeptide chains.
            Collagen has an amino acid composition that is unique and is crucial to its three-dimensional structure and its characteristic physical properties. Nearly one residue out of three is a glycine, and the proline content is also unusually high. Three unusual modified amino acids are also found in collagen: 4-hydroxyproline (Hyp), 3-hydroxyproline, and 5-hydroxylysine (Hyl) (Figure 6.16). Proline and Hyp together compose up to 30% of the residues of collagen.

 

A Deeper Look
Charlotte’s Web Revisited: Helix-Sheet Composites in Spider Dragline Silk

E.B. White’s endearing story Charlotte’s Web centers around the web-spinning feats of Charlotte the spider. Although the intricate designs of spiderwebs are eye- (and fly-) catching, it might be argued that the composition of web silk itself is even more remarkable. Spider silk is synthesized in special glands in the spider’s abdomen. The silk strands produced by these glands are both strong and elastic. Dragline silk (that from which the spider hangs) has a tensile strength of 200,000 psi (pounds per square inch), stronger than steel and similar to Kevlar, the synthetic material used in bulletproof vests! This same silk fiber is also flexible enough to withstand strong winds and other natural stresses.
            This combination of strength and flexibility derives from the composite nature of spider silk. As keratin protein is extruded from the spider’s glands, it endures shearing forces that break the H bonds stabilizing keratin α-helices. These regions then form microcrystalline arrays of β-sheets. These microcrystals are surrounded by the keratin strands, which adopt a highly disordered state composed of α-helices and random coil structures.
            The β-sheet microcrystals contribute strength, and the disordered array of helix and coil make the silk strand flexible. The resulting silk strand resembles modern human-engineered composite materials. Certain tennis racquets, for example, consist of fiberglass polymers impregnated with microcrystalline graphite. The fiberglass provides flexibility, and the graphite crystals contribute strength. Modern high technology, for all its sophistication, is merely imitating nature (and Charlotte’s web) after all.

 

Figure 6.16 · The hydroxylated residues typically found in collagen.

 


Interestingly, these three amino acids are formed from normal proline and lysine after the collagen polypeptides are synthesized. The modifications are effected by two enzymes: prolyl hydroxylase and lysyl hydroxylase. The prolyl hydroxylase reaction (Figure 6.17) requires molecular oxygen, α-ketoglutarate, and ascorbic acid (vitamin C) and is activated by Fe²⁺. The hydroxylation of lysine is similar. These processes are referred to as posttranslational modifications because they occur after genetic information from DNA has been translated into newly formed protein.

Figure 6.17 · Hydroxylation of proline residues is catalyzed by prolyl hydroxylase. The reaction requires α-ketoglutarate and ascorbic acid (vitamin C).

            Because of their high content of glycine, proline, and hydroxyproline, collagen fibers are incapable of forming traditional structures such as α-helices and β-sheets. Instead, collagen polypeptides intertwine to form a unique triple helix, with each of the three strands arranged in a helical fashion (Figure 6.18). Compared to the α-helix, the collagen helix is much more extended, with a rise per residue along the triple helix axis of 2.9 Å, compared to 1.5 Å for the α-helix. There are about 3.3 residues per turn of each of these helices. The triple helix is a structure that forms to accommodate the unique composition and sequence of collagen. Long stretches of the polypeptide sequence are repeats of a Gly-x-y motif, where x is frequently Pro and y is frequently Pro or Hyp. In the triple helix, every third residue faces or contacts the crowded center of the structure. This area is so crowded that only Gly can fit, and thus every third residue must be a Gly (as observed). Moreover, the triple helix is a staggered structure, such that Gly residues from the three strands stack along the center of the triple helix and the Gly from one strand lies adjacent to an x residue from the second strand and to a y from the third. This allows the N-H of each Gly residue to hydrogen bond with the C=O of the adjacent x residue. The triple helix structure is further stabilized and strengthened by the formation of interchain H bonds involving hydroxyproline.

Figure 6.18 · Poly(Gly-Pro-Pro), a collagen-like right-handed triple helix composed of three left-handed helical chains. (Adapted from Miller, M. H., and Scheraga, H. A., 1976, Calculation of the structures of collagen models. Role of interchain interactions in determining the triple-helical coiled-coil conformation. I. Poly(glycyl-prolyl-prolyl). Journal of Polymer Science Symposium 54:171–200.)
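How much more extended the collagen helix is can be made concrete by combining the rise per residue with the number of residues per turn. The sketch below uses only values given in this chapter (2.9 Å and about 3.3 residues per turn for collagen, 1.5 Å and 3.6 residues per turn for the α-helix, and chains of about 1000 residues); treating the whole chain as ideal triple helix is a simplifying assumption.

```python
def pitch_angstrom(rise_per_residue: float, residues_per_turn: float) -> float:
    """Distance advanced along the helix axis in one full turn."""
    return rise_per_residue * residues_per_turn


collagen_pitch = pitch_angstrom(2.9, 3.3)  # collagen chain, values from the text
alpha_pitch = pitch_angstrom(1.5, 3.6)     # alpha-helix, values from the text

# Approximate axial length of a ~1000-residue chain, assuming it is entirely
# triple-helical (a simplification); 10 angstroms = 1 nm.
chain_length_nm = 1000 * 2.9 / 10

print(f"Collagen helix pitch: {collagen_pitch:.2f} angstroms per turn")
print(f"Alpha-helix pitch:    {alpha_pitch:.2f} angstroms per turn")
print(f"Estimated tropocollagen length: {chain_length_nm:.0f} nm")
```

The estimated length of roughly 290 nm agrees well with the quoted tropocollagen length of about 300 nm, and the α-helix pitch comes out at 5.4 Å, the same 0.54-nm value cited for α-keratin.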
            Collagen types I, II, and III form strong, organized fibrils, consisting of staggered arrays of tropocollagen molecules (Figure 6.19). The periodic arrangement of triple helices in a head-to-tail fashion results in banded patterns in electron micrographs. The banding pattern typically has a periodicity (repeat distance) of 68 nm. Because collagen triple helices are 300 nm long, 40-nm gaps occur between adjacent collagen molecules in a row along the long axis of the fibrils and the pattern repeats every five rows (5 × 68 nm = 340 nm). The 40-nm gaps are referred to as hole regions, and they are important in at least two ways. First, sugars are found covalently attached to 5-hydroxylysine residues in the hole regions of collagen (Figure 6.20). The occurrence of carbohydrate in the hole region has led to the proposal that it plays a role in organizing fibril assembly. Second, the hole regions may play a role in bone formation. Bone consists of microcrystals of hydroxyapatite, Ca₅(PO₄)₃OH, embedded in a matrix of collagen fibrils. When new bone tissue forms, the formation of new hydroxyapatite crystals occurs at intervals of 68 nm. The hole regions of collagen fibrils may be the sites of nucleation for the mineralization of bone.

Figure 6.19 · In the electron microscope, collagen fibers exhibit alternating light and dark bands. The dark bands correspond to the 40-nm gaps or “holes” between pairs of aligned collagen triple helices. The repeat distance, d, for the light- and dark-banded pattern is 68 nm. The collagen molecule is 300 nm long, which corresponds to 4.41d. The molecular repeat pattern of five staggered collagen molecules corresponds to 5d. (J. Gross, Biozentrum/Science Photo Library)

Figure 6.20 · A disaccharide of galactose and glucose is covalently linked to the 5-hydroxyl group of hydroxylysines in collagen by the combined action of the enzymes galactosyl transferase and glucosyl transferase.
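The banding arithmetic can be laid out explicitly. The sketch below takes the 68-nm repeat distance and the 300-nm molecular length from the text and derives the five-row repeat, the size of the hole region, and the length of the molecule in units of the repeat distance; the variable names are illustrative.

```python
REPEAT_NM = 68      # banding periodicity d (from the text)
MOLECULE_NM = 300   # length of a collagen triple helix (from the text)
ROWS_PER_REPEAT = 5

molecular_repeat_nm = ROWS_PER_REPEAT * REPEAT_NM  # distance between the starts of successive molecules in a row
gap_nm = molecular_repeat_nm - MOLECULE_NM         # hole region left between those molecules
length_in_d = MOLECULE_NM / REPEAT_NM              # molecular length in units of d

# Each neighboring row is shifted by one repeat distance relative to the previous one.
row_offsets_nm = [row * REPEAT_NM for row in range(ROWS_PER_REPEAT)]

print(f"Molecular repeat: {molecular_repeat_nm} nm")
print(f"Hole region:      {gap_nm} nm")
print(f"Molecule length:  {length_in_d:.2f} d")
print(f"Row offsets (nm): {row_offsets_nm}")
```

Because the molecule is 4.41 repeat distances long rather than a whole number of them, the remaining 0.59 of a repeat distance (about 40 nm) in every fifth period is left unfilled, which is the hole region.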

Human Biochemistry
Collagen-Related Diseases

Collagen provides an ideal case study of the molecular basis of physiology and disease. For example, the nature and extent of collagen cross-linking depends on the age and function of the tissue. Collagen from young animals is predominantly un-cross-linked and can be extracted in soluble form, whereas collagen from older animals is highly cross-linked and thus insoluble. The loss of flexibility of joints with aging is probably due in part to increased cross-linking of collagen.
            Several serious and debilitating diseases involving collagen abnormalities are known. Lathyrism occurs in animals due to the regular consumption of seeds of Lathyrus odoratus, the sweet pea, and involves weakening and abnormalities in blood vessels, joints, and bones. These conditions are caused by β-aminopropionitrile (see figure), which covalently inactivates lysyl oxidase and leads to greatly reduced intramolecular cross-linking of collagen in affected animals (or humans).

Scurvy results from a dietary vitamin C deficiency and involves the inability to form collagen fibrils properly. This is the result of reduced activity of prolyl hydroxylase, which is vitamin C–dependent, as previously noted. Scurvy leads to lesions in the skin and blood vessels, and, in its advanced stages, it can lead to grotesque disfiguration and eventual death. Although rare in the modern world, it was a disease well known to sea-faring explorers in earlier times who did not appreciate the importance of fresh fruits and vegetables in the diet.  
          A number of rare genetic diseases involve collagen abnormalities, including Marfan's syndrome and the Ehlers–Danlos syndromes, which result in hyperextensible joints and skin. The formation of atherosclerotic plaques, which cause arterial blockages in advanced stages, is due in part to the abnormal formation of collagenous structures in blood vessels.

           

β-Aminopropionitrile (present in sweet peas) covalently inactivates lysyl oxidase, preventing intramolecular cross-linking of collagen and causing abnormalities in joints, bones, and blood vessels.

            The collagen fibrils are further strengthened and stabilized by the formation of both intramolecular (within a tropocollagen molecule) and intermolecular (between tropocollagen molecules in the fibril) cross-links. Intramolecular cross-links are formed between lysine residues in the (nonhelical) N-terminal region of tropocollagen in a unique pair of reactions shown in Figure 6.21. The enzyme lysyl oxidase catalyzes the formation of aldehyde groups at the lysine side chains in a copper-dependent reaction. The aldehyde groups of two such side chains then link covalently in a spontaneous nonenzymatic aldol condensation. The intermolecular cross-linking of tropocollagens involves the formation of a unique hydroxypyridinium structure from one lysine and two hydroxylysine residues (Figure 6.22). These cross-links form between the N-terminal region of one tropocollagen and the C-terminal region of an adjacent tropocollagen in the fibril.

Figure 6.21 · Collagen fibers are stabilized and strengthened by Lys-Lys cross-links. Aldehyde moieties formed by lysyl oxidase react in a spontaneous nonenzymatic aldol reaction.


Figure 6.22 · The hydroxypyridinium structure formed by the cross-linking of a Lys and two hydroxy Lys residues.

Globular Proteins

Fibrous proteins, although interesting for their structural properties, represent only a small percentage of the proteins found in nature. Globular proteins, so named for their approximately spherical shape, are far more numerous.

 

Helices and Sheets in Globular Proteins

Globular proteins exist in an enormous variety of three-dimensional structures, but nearly all contain substantial amounts of the α-helices and β-sheets that form the basic structures of the simple fibrous proteins. For example, myoglobin, a small, globular, oxygen-carrying protein of muscle (17 kD, 153 amino acid residues), contains eight α-helical segments, each containing 7 to 26 amino acid residues. These are arranged in an apparently irregular (but invariant) fashion (see Figure 5.7). The space between the helices is filled efficiently and tightly with (mostly hydrophobic) amino acid side chains. Most of the polar side chains in myoglobin (and in most other globular proteins) face the outside of the protein structure and interact with solvent water. Myoglobin’s structure is unusual because most globular proteins contain a relatively small amount of α-helix. A more typical globular protein (Figure 6.23) is bovine ribonuclease A, a small protein (13.7 kD, 124 residues) that contains a few short helices, a broad section of antiparallel β-sheet, a few β-turns, and several peptide segments without defined secondary structure.

Figure 6.23 · The three-dimensional structure of bovine ribonuclease A, showing the α-helices as ribbons. (Jane Richardson)

            Why should the cores of most globular and membrane proteins consist almost entirely of α-helices and β-sheets? The reason is that the highly polar N-H and C=O moieties of the peptide backbone must be neutralized in the hydrophobic core of the protein. The extensively H-bonded nature of α-helices and β-sheets is ideal for this purpose, and these structures effectively stabilize the polar groups of the peptide backbone in the protein core.

            In globular protein structures, it is common for one face of an α-helix to be exposed to the water solvent, with the other face toward the hydrophobic interior of the protein. The outward face of such an amphiphilic helix consists mainly of polar and charged residues, whereas the inward face contains mostly nonpolar, hydrophobic residues. A good example of such a surface helix is that of residues 153 to 166 of flavodoxin from Anabaena (Figure 6.24). Note that the helical wheel presentation of this helix readily shows that one face contains four hydrophobic residues and that the other is almost entirely polar and charged.


Figure 6.24 · (a) The α-helix consisting of residues 153–166 (red) in flavodoxin from Anabaena is a surface helix and is amphipathic. (b) The two helices (yellow and blue) in the interior of the citrate synthase dimer (residues 260–270 in each monomer) are mostly hydrophobic. (c) The exposed helix (residues 74–87, red) of calmodulin is entirely accessible to solvent and consists mainly of polar and charged residues.

 

            Less commonly, an α-helix can be completely buried in the protein interior or completely exposed to solvent. Citrate synthase is a dimeric protein in which α-helical segments form part of the subunit-subunit interface. As shown in Figure 6.24, one of these helices (residues 260 to 270) is highly hydrophobic and contains only two polar residues, as would befit a helix in the protein core. On the other hand, Figure 6.24 also shows the solvent-exposed helix (residues 74 to 87) of calmodulin, which consists of 10 charged residues, 2 polar residues, and only 2 nonpolar residues.
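The difference between an amphiphilic surface helix and a uniformly hydrophobic buried helix can be expressed as a single number using a helical wheel projection. The sketch below is a simplified, binary version of the hydrophobic moment: each residue is placed at 100° intervals around the helix axis and weighted +1 if nonpolar and -1 otherwise. The two-class residue grouping and both 14-residue sequences are assumptions made for illustration; they are not the flavodoxin, citrate synthase, or calmodulin sequences.

```python
import math

# Assumed, simplified grouping of nonpolar residues (one-letter codes).
NONPOLAR = set("AVLIMFWC")


def mean_hydrophobic_moment(sequence: str, degrees_per_residue: float = 100.0) -> float:
    """Length of the summed +1/-1 residue vectors on a helical wheel, per residue."""
    x = y = 0.0
    for i, residue in enumerate(sequence):
        weight = 1.0 if residue in NONPOLAR else -1.0
        theta = math.radians(i * degrees_per_residue)
        x += weight * math.cos(theta)
        y += weight * math.sin(theta)
    return math.hypot(x, y) / len(sequence)


amphiphilic = "LKELLKELKEKLEK"  # hypothetical: leucines cluster on one face of the wheel
buried = "L" * 14               # hypothetical: nonpolar on every face

moment_amphiphilic = mean_hydrophobic_moment(amphiphilic)
moment_buried = mean_hydrophobic_moment(buried)
print(f"Amphiphilic helix: {moment_amphiphilic:.2f}")
print(f"Uniform helix:     {moment_buried:.2f}")
```

A large value indicates that nonpolar residues are concentrated on one side of the wheel, as in a surface helix. A value near zero indicates no sidedness, which is what a fully buried or fully exposed helix would give.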

Packing Considerations

The secondary and tertiary structures of myoglobin and ribonuclease A illustrate the importance of packing in tertiary structures. Secondary structures pack closely to one another and also intercalate with (insert between) extended polypeptide chains. If the sum of the van der Waals volumes of a protein’s constituent amino acids is divided by the volume occupied by the protein, packing densities of 0.72 to 0.77 are typically obtained. This means that, even with close packing, approximately 25% of the total volume of a protein is not occupied by protein atoms. Nearly all of this space is in the form of very small cavities. Cavities the size of water molecules or larger do occasionally occur, but they make up only a small fraction of the total protein volume. It is likely that such cavities provide flexibility for proteins and facilitate conformation changes and a wide range of protein dynamics (discussed later).
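The unoccupied fraction quoted above follows directly from the packing density. As a brief check, using only the range of packing densities given in the text (the function name is illustrative):

```python
def void_fraction(packing_density: float) -> float:
    """Fraction of the protein volume not occupied by protein atoms."""
    return 1.0 - packing_density


low_density_void = void_fraction(0.72)
high_density_void = void_fraction(0.77)
midpoint_void = (low_density_void + high_density_void) / 2

print(f"Void fraction at 0.72: {low_density_void:.0%}")
print(f"Void fraction at 0.77: {high_density_void:.0%}")
print(f"Midpoint:              {midpoint_void:.1%}")
```

The range of 23% to 28% brackets the figure of approximately 25% given in the text.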

Ordered, Nonrepetitive Structures

In any protein structure, the segments of the polypeptide chain that cannot be classified as defined secondary structures, such as helices or sheets, have been traditionally referred to as coil or random coil. Both these terms are misleading. Most of these segments are neither coiled nor random, in any sense of the words. These structures are every bit as highly organized and stable as the defined secondary structures. They are just more variable and difficult to describe. These so-called coil structures are strongly influenced by side-chain interactions. Few of these interactions are well understood, but a number of interesting cases have been described. In his early studies of myoglobin structure, John Kendrew found that the -OH group of threonine or serine often forms a hydrogen bond with a backbone NH at the beginning of an α-helix. The same stabilization of an α-helix by a serine is observed in the three-dimensional structure of pancreatic trypsin inhibitor (Figure 6.25). Also in this same structure, an asparagine residue adjacent to a β-strand is found to form H bonds that stabilize the β-structure.

Figure 6.25 · The three-dimensional structure of bovine pancreatic trypsin inhibitor. Note the stabilization of the α-helix by a hydrogen bond to Ser47 and the stabilization of the β-sheet by Asn43.

 

 

            Nonrepetitive but well-defined structures of this type form many important features of enzyme active sites. In some cases, a particular arrangement of “coil” structure providing a specific type of functional site recurs in several functionally related proteins. The peptide loop that binds iron-sulfur clusters in both ferredoxin and high potential iron protein is one example. Another is the central loop portion of the E-F hand structure that binds a calcium ion in several calcium-binding proteins, including calmodulin, carp parvalbumin, troponin C, and the intestinal calcium-binding protein. This loop, shown in Figure 6.26, connects two short α-helices. The calcium ion nestles into the pocket formed by this structure.

Figure 6.26 · A representation of the so-called E–F hand structure, which forms calcium-binding sites in a variety of proteins. The stick drawing shows the peptide backbone of the E–F hand motif. The “E” helix extends along the index finger, a loop traces the approximate arrangement of the curled middle finger, and the “F” helix extends outward along the thumb. A calcium ion (Ca²⁺) snuggles into the pocket created by the two helices and the loop. Kretsinger and coworkers originally assigned letters alphabetically to the helices in parvalbumin, a protein from carp. The E–F hand derives its name from the letters assigned to the helices at one of the Ca²⁺-binding sites.

Flexible, Disordered Segments

In addition to nonrepetitive but well-defined structures, which exist in all proteins, genuinely disordered segments of polypeptide sequence also occur. These sequences either do not show up in electron density maps from X-ray crystallographic studies or give diffuse or ill-defined electron densities. These segments either undergo actual motion in the protein crystals themselves or take on many alternate conformations in different molecules within the protein crystal. Such behavior is quite common for long, charged side chains on the surface of many proteins. For example, 16 of the 19 lysine side chains in myoglobin have uncertain orientations beyond the δ-carbon, and five of these are disordered beyond the β-carbon. Similarly, a majority of the lysine residues are disordered in trypsin, rubredoxin, ribonuclease, and several other proteins. Arginine residues, however, are usually well ordered in protein structures. For the four proteins just mentioned, 70% of the arginine residues are highly ordered, compared to only 26% of the lysines.

Motion in Globular Proteins

Although we have distinguished between well-ordered and disordered segments of the polypeptide chain, it is important to realize that even well-ordered side chains in a protein undergo motion, sometimes quite rapid. These motions should be viewed as momentary oscillations about a single, highly stable conformation. Proteins are thus best viewed as dynamic structures. The allowed motions may be motions of individual atoms, groups of atoms, or even whole sections of the protein. Furthermore, they may arise from either thermal energy or specific, triggered conformational changes in the protein. Atomic fluctuations such as vibrations typically are random, very fast, and usually occur over small distances (less than 0.5 Å), as shown in Table 6.2. These motions arise from the kinetic energy within the protein and are a function of temperature. These very fast motions can be modeled by molecular dynamics calculations and studied by X-ray diffraction.

Table 6.2
Motion and Fluctuations in Proteins

Type of Motion | Spatial Displacement (Å) | Characteristic Time (sec) | Source of Energy
Atomic vibrations | 0.01 – 1 | 10⁻¹⁵ – 10⁻¹¹ | Kinetic energy
Collective motions (1. Fast: Tyr ring flips; methyl group rotations. 2. Slow: hinge bending between domains) | 0.01 – 5 or more | 10⁻¹² – 10⁻³ | Kinetic energy
Triggered conformation changes | 0.5 – 10 or more | 10⁻⁹ – 10³ | Interactions with triggering agent

Adapted from Petsko and Ringe (1984).

            A class of slower motions, which may extend over larger distances, is collective motions. These are movements of groups of atoms covalently linked in such a way that the group moves as a unit. Such groups range in size from a few atoms to hundreds of atoms. Whole structural domains within a protein may be involved, as in the case of the flexible antigen-binding domains of immunoglobulins, which move as relatively rigid units to selectively bind separate antigen molecules. Such motions are of two types: (1) those that occur quickly but infrequently, such as tyrosine ring flips, and (2) those that occur slowly, such as cis-trans isomerizations of prolines. These collective motions also arise from thermal energies in the protein and operate on a time scale of 10⁻¹² to 10⁻³ sec. These motions can be studied by nuclear magnetic resonance (NMR) and fluorescence spectroscopy.
            Conformational changes involve motions of groups of atoms (individual side chains, for example) or even whole sections of proteins. These motions occur on a time scale of 10⁻⁹ to 10³ sec, and the distances covered can be as large as 1 nm. These motions may occur in response to specific stimuli or arise from specific interactions within the protein, such as hydrogen bonding, electrostatic interactions, and ligand binding. More will be said about conformational changes when enzyme catalysis and regulation are discussed (see Chapters 14 and 15).

Forces Driving the Folding of Globular Proteins

As already pointed out, the driving force for protein folding and the resulting formation of a tertiary structure is the formation of the most stable structure possible. Two forces are at work here. The peptide chain must both (1) satisfy the constraints inherent in its own structure and (2) fold so as to “bury” the hydrophobic side chains, minimizing their contact with solvent. The polypeptide itself does not usually form simple straight chains. Even in chain segments where helices and sheets are not formed, an extended peptide chain, being composed of L-amino acids, has a tendency to twist slightly in a right-handed direction. As shown in Figure 6.27, this tendency is apparently the basis for the formation of a variety of tertiary structures having a right-handed sense. Principal among these are the right-handed twists in arrays of β-sheets and right-handed cross-overs in parallel β-sheet arrays. Right-handed twisted β-sheets are found at the center of a number of proteins and provide an extended, highly stable structural core. Phosphoglycerate mutase, adenylate kinase, and carbonic anhydrase, among others, exist as smoothly twisted planes or saddle-shaped structures. Triose phosphate isomerase, soybean trypsin inhibitor, and domain 1 of pyruvate kinase contain right-handed twisted cylinders or barrel structures at their cores.

Figure 6.27 · The natural right-handed twist exhibited by polypeptide chains, and the variety of structures that arise from this twist.
            Connections between b-strands are of two types—hairpins and cross-overs. Hairpins, as shown in Figure 6.27, connect adjacent antiparallel b-strands. Cross-overs are necessary to connect adjacent (or nearly adjacent) parallel b-strands. Nearly all cross-over structures are right-handed. Only in subtilisin and phosphoglucoisomerase have isolated left-handed cross-overs been identified. In many cross-over structures, the cross-over connection itself contains an a-helical segment. This is referred to as a bab-loop. As shown in Figure 6.27, the strong tendency in nature to form right-handed cross-overs, the wide occurrence of a-helices in the cross-over connection, and the right-handed twists of b-sheets can all be understood as arising from the tendency of an extended polypeptide chain of L-amino acids to adopt a right-handed twist structure. This is a chiral effect. Proteins composed of D-amino acids would tend to adopt left-handed twist structures.
            The second driving force that affects the folding of polypeptide chains is the need to bury the hydrophobic residues of the chain, protecting them from solvent water. From a topological viewpoint, then, all globular proteins must have an “inside” where the hydrophobic core can be arranged and an “outside” toward which the hydrophilic groups must be directed. The sequestration of hydrophobic residues away from water is the dominant force in the arrangement of secondary structures and nonrepetitive peptide segments to form a given tertiary structure. Globular proteins can be classified mainly on the basis of the particular kind of core or backbone structure they use to accomplish this goal. The term hydrophobic core, as used here, refers to a region in which hydrophobic side chains cluster together, away from the solvent. Backbone refers to the polypeptide backbone itself, excluding the particular side chains. Globular proteins can be pictured as consisting of “layers” of backbone, with hydrophobic core regions between them. Over half the known globular protein structures have two layers of backbone (separated by one hydrophobic core). Roughly one-third of the known structures are composed of three backbone layers and two hydrophobic cores. There are also a few known four-layer structures and one known five-layer structure. A few structures are not easily classified in this way, but it is remarkable that most proteins fit into one of these classes. Examples of each are presented in Figure 6.28.



Figure 6.28 · Examples of protein domains with different numbers of layers of backbone structure. (a) Cytochrome c' with two layers of α-helix. (b) Domain 2 of phosphoglycerate kinase, composed of a β-sheet layer between two layers of helix, three layers overall. (c) An unusual five-layer structure, domain 2 of glycogen phosphorylase, a β-sheet layer sandwiched between four layers of α-helix. (d) The concentric “layers” of β-sheet (inside) and α-helix (outside) in triose phosphate isomerase. Hydrophobic residues are buried between these concentric layers in the same manner as in the planar layers of the other proteins. The hydrophobic layers are shaded yellow. (Jane Richardson)

Classification of Globular Proteins

In addition to classification based on layer structure, proteins can be grouped according to the type and arrangement of secondary structure. There are four such broad groups: antiparallel α-helix, parallel or mixed β-sheet, antiparallel β-sheet, and the small metal- and disulfide-rich proteins.

            It is important to note that the similarities of tertiary structure within these groups do not necessarily reflect similar or even related functions. Instead, functional homology usually depends on structural similarities on a smaller and more intimate scale.

Antiparallel α-Helix Proteins

Antiparallel α-helix proteins are structures heavily dominated by α-helices. The simplest way to pack helices is in an antiparallel manner, and most of the proteins in this class consist of bundles of antiparallel helices. Many of these exhibit a slight (15°) left-handed twist of the helix bundle. Figure 6.29 shows a representative sample of antiparallel α-helix proteins. Many of these are regular, uniform structures, but in a few cases (uteroglobin, for example) one of the helices is tilted away from the bundle. Tobacco mosaic virus protein has small, highly twisted antiparallel β-sheets on one end of the helix bundle with two additional helices on the other side of the sheet. Notice in Figure 6.29 that most of the antiparallel α-helix proteins are made up of four-helix bundles.

Figure 6.29 · Several examples of antiparallel α-proteins. (Jane Richardson)

 

            The so-called globin proteins are an important group of α-helical proteins. These include hemoglobins and myoglobins from many species. The globin structure can be viewed as two layers of helices, with one of these layers perpendicular to the other and the polypeptide chain moving back and forth between the layers.

Parallel or Mixed β-Sheet Proteins

The second major class of protein structures contains structures based around parallel or mixed β-sheets. Parallel β-sheet arrays, as previously discussed, distribute hydrophobic side chains on both sides of the sheet. This means that neither side of parallel β-sheets can be exposed to solvent. Parallel β-sheets are thus typically found as core structures in proteins, with little access to solvent.

            One important parallel β-array is the eight-stranded parallel β-barrel, exemplified in the structures of triose phosphate isomerase and pyruvate kinase (Figure 6.30). Each β-strand in the barrel is flanked by an antiparallel α-helix. The α-helices thus form a larger cylinder of parallel helices concentric with the β-barrel. Both cylinders thus formed have a right-handed twist. Another parallel β-structure consists of an internal twisted wall of parallel or mixed β-sheet protected on both sides by helices or other substructures. This structure is called the doubly wound parallel β-sheet because the structure can be imagined to have been wound by strands beginning in the middle and going outward in opposite directions. The essence of this structure is shown in Figure 6.31. Whereas the barrel structures have four layers of backbone structure, the doubly wound sheet proteins have three major layers and thus two hydrophobic core regions.

Figure 6.30 · Parallel β-array proteins: the eight-stranded β-barrels of triose phosphate isomerase (a, side view, and b, top view) and (c) pyruvate kinase. (Jane Richardson)

 

 

Figure 6.31 · Several typical doubly wound parallel β-sheet proteins. (Jane Richardson)

 

 

 

 

  

A Deeper Look
The Coiled Coil Motif in Proteins

The coiled coil motif was first identified in 1953 by Linus Pauling, Robert Corey, and Francis Crick as the main structural element of fibrous proteins such as keratin and myosin. Since that time, many proteins have been found to contain one or more coiled coil segments or domains. A coiled coil is a bundle of α-helices that are wound into a superhelix. Two, three, or four helical segments may be found in the bundle, and they may be arranged parallel or antiparallel to one another. Coiled coils are characterized by a distinctive and regular packing of side chains in the core of the bundle. This regular meshing of side chains requires that they occupy equivalent positions turn after turn. This is not possible for undistorted α-helices, which have 3.6 residues per turn. The positions of side chains on their surface shift continuously along the helix surface (see figure). However, giving the right-handed α-helix a left-handed twist reduces the number of residues per turn to 3.5, and, because 3.5 times 2 equals 7.0, the positions of the side chains repeat after two turns (seven residues). Thus, a heptad repeat pattern in the peptide sequence is diagnostic of a coiled coil structure. The figure shows a sampling of coiled coil structures (highlighted in color) in various proteins.
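The arithmetic behind the heptad repeat can be checked with a short calculation. The sketch below is only an illustration, not part of the original treatment: it assumes an idealized helix in which each residue advances the side chain by a fixed angle of 360° divided by the residues per turn, and the function names `side_chain_angle` and `repeat_period` are hypothetical helpers introduced here. Exact fractions are used so that the comparison with zero is not affected by floating-point rounding.

```python
from fractions import Fraction


def side_chain_angle(i, residues_per_turn):
    """Angular position (degrees, 0 <= angle < 360) of residue i around the
    helix axis, assuming an idealized helix with a constant rotation per residue."""
    return (Fraction(360) * i / residues_per_turn) % 360


def repeat_period(residues_per_turn, max_residues=100):
    """Smallest number of residues after which a side chain returns to
    exactly the same angular position, or None if none is found."""
    for n in range(1, max_residues + 1):
        if side_chain_angle(n, residues_per_turn) == 0:
            return n
    return None


undistorted = Fraction(18, 5)  # 3.6 residues per turn
supercoiled = Fraction(7, 2)   # 3.5 residues per turn

for label, rpt in (("3.6 residues/turn", undistorted),
                   ("3.5 residues/turn", supercoiled)):
    angles = [float(side_chain_angle(i, rpt)) for i in range(8)]
    print(label, [round(a, 1) for a in angles],
          "repeat every", repeat_period(rpt), "residues")
```

Under these assumptions the 3.5-residue helix returns to its starting position after 7 residues (two turns), whereas the undistorted 3.6-residue helix does not do so until 18 residues (five turns), which is why side chains drift across the surface of an ordinary α-helix but line up every heptad in a coiled coil.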

Antiparallel β-Sheet Proteins

Another important class of tertiary protein conformations is the antiparallel β-sheet structures. Antiparallel β-sheets, which usually arrange hydrophobic residues on just one side of the sheet, can exist with one side exposed to solvent. The minimal structure for an antiparallel β-sheet protein is thus a two-layered structure, with hydrophobic faces of the two sheets juxtaposed and the opposite faces exposed to solvent. Such domains consist of β-sheets arranged in a cylinder or barrel shape. These structures are usually less symmetric than the singly wound parallel barrels and are not as efficiently hydrogen bonded, but they occur much more frequently in nature. Barrel structures tend to be either all parallel or all antiparallel and usually consist of even numbers of β-strands. Good examples of antiparallel structures include soybean trypsin inhibitor, rubredoxin, and domain 2 of papain (Figure 6.32).

 

 

Figure 6.32 · Examples of antiparallel β-sheet structures in proteins. (Jane Richardson)

 

 


Topology diagrams of antiparallel β-sheet barrels reveal that many of them arrange the polypeptide sequence in an interlocking pattern reminiscent of patterns found on ancient Greek vases (Figure 6.33) and are thus referred to as a Greek key topology. Several of these, including concanavalin A and γ-crystallin, contain an extra swirl in the Greek key pattern (see Figure 6.33).

Figure 6.33 · Examples of the so-called Greek key antiparallel β-barrel structure in proteins.

 

Antiparallel arrangements of β-strands can form sheets as well as barrels. Glyceraldehyde-3-phosphate dehydrogenase, Streptomyces subtilisin inhibitor, and glutathione reductase are examples of single-sheet, double-layered topology (Figure 6.34).