Chapter 5

Proteins: Their Biological Functions and Primary Structure

 

 

 

 

Although helices are uncommon in manmade architecture, they are a common structural theme in biological macromolecules: proteins, nucleic acids, and even polysaccharides. (Loretto Chapel, Santa Fe, NM/© Sarbo)

 

Proteins are a diverse and abundant class of biomolecules, constituting more than 50% of the dry weight of cells. This diversity and abundance reflect the central role of proteins in virtually all aspects of cell structure and function. An extraordinary diversity of cellular activity is possible only because of the versatility inherent in proteins, each of which is specifically tailored to its biological role. The pattern by which each is tailored resides within the genetic information of cells, encoded in a specific sequence of nucleotide bases in DNA. Each such segment of encoded information defines a gene, and expression of the gene leads to synthesis of the specific protein encoded by it, endowing the cell with the functions unique to that particular protein. Proteins are the agents of biological function; they are also the expressions of genetic information.

5.1 • Proteins Are Linear Polymers of Amino Acids

Chemically, proteins are unbranched polymers of amino acids linked head to tail, from carboxyl group to amino group, through formation of covalent peptide bonds, a type of amide linkage (Figure 5.1).

Figure 5.1  Peptide formation is the creation of an amide bond between the carboxyl group of one amino acid and the amino group of another amino acid. R1 and R2 represent the R groups of two different amino acids.

Peptide bond formation results in the release of H2O. The peptide “backbone” of a protein consists of the repeated sequence -N-Cα-C-, where the N represents the amide nitrogen, the Cα is the α-carbon atom of an amino acid in the polymer chain, and the final C is the carbonyl carbon of the amino acid, which in turn is linked to the amide N of the next amino acid down the line. The geometry of the peptide backbone is shown in Figure 5.2. Note that the carbonyl oxygen and the amide hydrogen are trans to each other in this figure. This conformation is favored energetically because it results in less steric hindrance between nonbonded atoms in neighboring amino acids. Because the α-carbon atom of the amino acid is a chiral center (in all amino acids except glycine), the polypeptide chain is inherently asymmetric. Only L-amino acids are found in proteins.




Figure 5.2   The peptide bond is shown in its usual trans conformation of carbonyl O and amide H. The Cα atoms are the α-carbons of two adjacent amino acids joined in peptide linkage. The dimensions and angles are the average values observed by crystallographic analysis of amino acids and small peptides. The peptide bond is the light gray bond between C and N. (Adapted from Ramachandran, G. N., et al., 1974. Biochimica et Biophysica Acta 359:298–302.)

The Peptide Bond Has Partial Double Bond Character

The peptide linkage is usually portrayed by a single bond between the carbonyl carbon and the amide nitrogen (Figure 5.3a). Therefore, in principle, rotation may occur about any covalent bond in the polypeptide backbone because all three kinds of bonds (N–Cα, Cα–Co, and the Co–N peptide bond) are single bonds. In this representation, the C and N atoms of the peptide grouping are both in planar sp2 hybridization and the C and O atoms are linked by a π bond, leaving the nitrogen with a lone pair of electrons in a 2p orbital. However, another resonance form for the peptide bond is feasible in which the C and N atoms participate in a π bond, leaving a lone electron pair on the oxygen (Figure 5.3b).

Figure 5.3   The partial double bond character of the peptide bond. Resonance interactions among the carbon, oxygen, and nitrogen atoms of the peptide group can be represented by two resonance extremes (a and b). (a) The usual way the peptide atoms are drawn. (b) In an equally feasible form, the peptide bond is now a double bond; the amide N bears a positive charge and the carbonyl O has a negative charge. (c) The actual peptide bond is best described as a resonance hybrid of the forms in (a) and (b). Significantly, all of the atoms associated with the peptide group are coplanar, rotation about Co–N is restricted, and the peptide is distinctly polar. (Irving Geis)

This structure prevents free rotation about the Co–N peptide bond because it becomes a double bond. The real nature of the peptide bond lies somewhere between these extremes; that is, it has partial double bond character, as represented by the intermediate form shown in Figure 5.3c.

Peptide bond resonance has several important consequences. First, it restricts free rotation around the peptide bond and leaves the peptide backbone with only two degrees of freedom per amino acid group: rotation around the N–Cα bond and rotation around the Cα–Co bond.1 Second, the six atoms composing the peptide bond group tend to be coplanar, forming the so-called amide plane of the polypeptide backbone (Figure 5.4). Third, the Co–N bond length is 0.133 nm, which is shorter than normal C–N bond lengths (for example, the Cα–N bond of 0.145 nm) but longer than typical C=N bonds (0.125 nm). The peptide bond is estimated to have 40% double-bond character.

1The angle of rotation about the N–Cα bond is designated φ (phi), whereas the Cα–Co angle of rotation is designated ψ (psi).

Figure 5.4   The coplanar relationship of the atoms in the amide group is highlighted as an imaginary shaded plane lying between two successive α-carbon atoms in the peptide backbone.

The Polypeptide Backbone Is Relatively Polar

Peptide bond resonance also causes the peptide backbone to be relatively polar. As shown in Figure 5.3b, the amide nitrogen represents a protonated or positively charged form, and the carbonyl oxygen becomes a negatively charged atom in the double-bonded resonance state. In actuality, the hybrid state of the partially double-bonded peptide arrangement gives a net positive charge of 0.28 on the amide N and an equivalent net negative charge of 0.28 on the carbonyl O. The presence of these partial charges means that the peptide bond has a permanent dipole. Nevertheless, the peptide backbone is relatively unreactive chemically, and protons are gained or lost by the peptide groups only at extreme pH conditions.

Peptide Classification

Peptide is the name assigned to short polymers of amino acids. Peptides are classified by the number of amino acid units in the chain. Each unit is called an amino acid residue, the word residue denoting what is left after the release of H2O when an amino acid forms a peptide link upon joining the peptide chain. Dipeptides have two amino acid residues, tripeptides have three, tetrapeptides four, and so on. After about 12 residues, this terminology becomes cumbersome, so peptide chains of more than 12 and less than about 20 amino acid residues are usually referred to as oligopeptides, and, when the chain exceeds several dozen amino acids in length, the term polypeptide is used. The distinctions in this terminology are not precise.

Proteins Are Composed of One or More Polypeptide Chains

The terms polypeptide and protein are used interchangeably in discussing single polypeptide chains. The term protein broadly defines molecules composed of one or more polypeptide chains. Proteins having only one polypeptide chain are monomeric proteins. Proteins composed of more than one polypeptide chain are multimeric proteins. Multimeric proteins may contain only one kind of polypeptide, in which case they are homomultimeric, or they may be composed of several different kinds of polypeptide chains, in which instance they are heteromultimeric. Greek letters and subscripts are used to denote the polypeptide composition of multimeric proteins. Thus, an α2-type protein is a dimer of identical polypeptide subunits, or a homodimer. Hemoglobin (Table 5.1) consists of four polypeptides of two different kinds; it is an α2β2 heteromultimer.

Table 5.1
Size of Protein Molecules*

Protein                                        Mr         Number of Residues per Chain                      Subunit Organization
Insulin (bovine)                               5,733      21 (A), 30 (B)
Cytochrome c (equine)                          12,500     104                                               α1
Ribonuclease A (bovine pancreas)               12,640     124                                               α1
Lysozyme (egg white)                           13,930     129                                               α1
Myoglobin (horse)                              16,980     153                                               α1
Chymotrypsin (bovine pancreas)                 22,600     13 (α), 132 (β), 97 (γ)                           αβγ
Hemoglobin (human)                             64,500     141 (α), 146 (β)                                  α2β2
Serum albumin (human)                          68,500     550                                               α1
Hexokinase (yeast)                             96,000     200                                               α4
γ-Globulin (horse)                             149,900    214 (α), 446 (β)                                  α2β2
Glutamate dehydrogenase (liver)                332,694    500                                               α6
Myosin (rabbit)                                470,000    1800 (heavy, h), 190 (α), 149 (α'), 160 (β)       h2α1α'2β2
Ribulose bisphosphate carboxylase (spinach)    560,000    475 (α), 123 (β)                                  α8β8
Glutamine synthetase (E. coli)                 600,000    468                                               α12

*Illustrations of selected proteins listed in Table 5.1 are drawn to constant scale. Adapted from Goodsell and Olson, 1993. Trends in Biochemical Sciences 18:65–68.
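The subunit notation in Table 5.1 can be turned into simple arithmetic. The short Python sketch below is only an illustration: the helper names are hypothetical, the numbers are copied from Table 5.1, and insulin is assumed to contain one A chain and one B chain. It totals the residues in a multimeric protein and divides Mr by that total to estimate the mean mass contributed by each residue.

```python
# Subunit data copied from Table 5.1 as {chain label: (copies, residues per chain)}.
# The function names are hypothetical helpers introduced for this illustration.

def total_residues(subunits):
    """Total number of amino acid residues summed over all chains."""
    return sum(copies * residues for copies, residues in subunits.values())


def mean_residue_mass(mr, subunits):
    """Mr divided by the total residue count (a rough per-residue average)."""
    return mr / total_residues(subunits)


hemoglobin = {"alpha": (2, 141), "beta": (2, 146)}  # alpha2 beta2
insulin = {"A": (1, 21), "B": (1, 30)}              # assumed one A and one B chain

print(total_residues(hemoglobin))                        # 574
print(round(mean_residue_mass(64_500, hemoglobin), 1))   # 112.4
print(round(mean_residue_mass(5_733, insulin), 1))       # 112.4
```

Both proteins give a value a little above 110 per residue, which is why Mr and chain length can be interconverted roughly when only one of them is known.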


Polypeptide chains of proteins range in length from about 100 amino acids to 1800, the number found in each of the two polypeptide chains of myosin, the contractile protein of muscle. However, titin, another muscle protein, has nearly 27,000 amino acid residues and a molecular weight of 2.8 × 10^6. The average molecular weight of polypeptide chains in eukaryotic cells is about 31,700, corresponding to about 270 amino acid residues. Table 5.1 is a representative list of proteins according to size. The molecular weights (Mr) of proteins can be estimated by a number of physicochemical methods such as polyacrylamide gel electrophoresis or ultracentrifugation (see Chapter Appendix). Precise determinations of protein molecular masses are best obtained by simple calculations based on knowledge of their amino acid sequence. No simple generalizations correlate the size of proteins with their functions. For instance, the same function may be fulfilled in different cells by proteins of different molecular weight. The Escherichia coli enzyme responsible for glutamine synthesis (a protein known as glutamine synthetase) has a molecular weight of 600,000, whereas the analogous enzyme in brain tissue has a molecular weight of just 380,000.
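The "simple calculation" of molecular mass from sequence can be sketched in a few lines of Python. This is a minimal illustration, not a method given in the text: the residue masses are standard average values (rounded to two decimals) supplied here as an assumption, the function name is hypothetical, and the sketch ignores disulfide bonds, prosthetic groups, and any covalent modification.

```python
# Average masses (daltons) of amino acid residues, i.e. the amino acid minus
# the H2O lost on peptide bond formation. Values are standard averages,
# rounded; they are not taken from this chapter.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one H2O is retained by the free N- and C-terminal ends


def molecular_mass(sequence):
    """Mass of a single unmodified polypeptide given its one-letter sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


# A hypothetical tripeptide, Gly-Ala-Gly, used only to exercise the function.
print(round(molecular_mass("GAG"), 2))
```

For a multimeric protein, the same sum would be repeated for each chain and the chain masses added together.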

Acid Hydrolysis of Proteins

Peptide bonds of proteins are hydrolyzed by either strong acid or strong base. Because acid hydrolysis proceeds without racemization and with less destruction of certain amino acids (Ser, Thr, Arg, and Cys) than alkaline treatment, it is the method of choice in analysis of the amino acid composition of proteins and polypeptides. Typically, samples of a protein are hydrolyzed with 6 N HCl at 110°C for 24, 48, and 72 hr in sealed glass vials. Tryptophan is destroyed by acid and must be estimated by other means to determine its contribution to the total amino acid composition. The OH-containing amino acids serine and threonine are slowly destroyed, but the data obtained for the three time points (24, 48, and 72 hr) allow extrapolation to zero time to estimate the original Ser and Thr content (Figure 5.5a).

Figure 5.5   (a) The hydroxy amino acids serine and threonine are slowly destroyed during the course of protein hydrolysis for amino acid composition analysis. Extrapolation of the data back to time zero allows an accurate estimation of the amount of these amino acids originally present in the protein sample. (b) Peptide bonds involving hydrophobic amino acid residues such as valine and isoleucine resist hydrolysis by HCl. With time, these amino acids are released and their free concentrations approach a limiting value that can be approximated with reliability.
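The extrapolation to zero time described for serine and threonine can be illustrated with a straight-line fit through the three time points. The sketch below is an assumption-laden example: the recovery values are invented for illustration, the function name is hypothetical, and a linear fit is assumed even though the actual loss kinetics may not be strictly linear.

```python
def extrapolate_to_zero(times_hr, amounts):
    """Least-squares line through (time, amount); returns (intercept, slope).

    The intercept is the estimated amount present at zero hydrolysis time.
    """
    n = len(times_hr)
    mean_t = sum(times_hr) / n
    mean_a = sum(amounts) / n
    sxx = sum((t - mean_t) ** 2 for t in times_hr)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_hr, amounts))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_t
    return intercept, slope


# Hypothetical serine recoveries (residues per molecule) after 24, 48, and 72 hr.
times = [24, 48, 72]
serine = [9.4, 8.8, 8.2]

original, loss_per_hr = extrapolate_to_zero(times, serine)
print(round(original, 2), round(loss_per_hr, 4))  # 10.0 -0.025
```

With these invented numbers the fit suggests about 10 serine residues were present before any destruction occurred.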

In contrast, peptide bonds involving hydrophobic residues such as valine and isoleucine are only slowly hydrolyzed in acid. Another complication arises because the β- and γ-amide linkages in asparagine (Asn) and glutamine (Gln) are acid labile. The amide nitrogen is released as free ammonium, and all of the Asn and Gln residues of the protein become aspartic acid (Asp) and glutamic acid (Glu), respectively. The amount of ammonium released during acid hydrolysis gives an estimate of the total number of Asn and Gln residues in the original protein, but not the amounts of either. Accordingly, the concentrations of Asp and Glu determined in amino acid analysis are expressed as Asx and Glx, respectively. Because the relative contributions of [Asn + Asp] or [Gln + Glu] cannot be derived from the data, this information must be obtained by alternative means.

Amino Acid Analysis of Proteins

The complex amino acid mixture in the hydrolysate obtained after digestion of a protein in 6 N HCl can be separated into the component amino acids by either ion exchange chromatography (see Chapter 4) or by reversed-phase high-pressure liquid chromatography (HPLC) (see Chapter Appendix). The amount of each amino acid can then be determined. In ion exchange chromatography, the amino acids are separated and then quantified following reaction with ninhydrin (so-called postcolumn derivatization). In HPLC, the amino acids are converted to phenylthiohydantoin (PTH) derivatives via reaction with Edman’s reagent (see Figure 5.19) prior to chromatography (precolumn derivatization). Both of these methods of separation and analysis are fully automated in instruments called amino acid analyzers. Analysis of the amino acid composition of a 30-kD protein by these methods requires less than 1 hour and only 6 µg (0.2 nmol) of the protein.

Table 5.2 gives the amino acid composition of several selected proteins: ribonuclease A, alcohol dehydrogenase, myoglobin, histone H3, and collagen. Each of the 20 naturally occurring amino acids is usually represented at least once in a polypeptide chain. However, some small proteins may not have a representative of every amino acid. Note that ribonuclease (12.6 kD, 124 amino acid residues) does not contain any tryptophan. Amino acids almost never occur in equimolar ratios in proteins, indicating that proteins are not composed of repeating arrays of amino acids. There are a few exceptions to this rule. Collagen, for example, contains large proportions of glycine and proline, and much of its structure is composed of (Gly-x-Pro) repeating units, where x is any amino acid. Other proteins show unusual abundances of various amino acids. For example, histones are rich in positively charged amino acids such as arginine and lysine. Histones are a class of proteins found associated with the anionic phosphate groups of eukaryotic DNA.

Table 5.2
Amino Acid Composition of Some Selected Proteins
Values expressed are percent representation of each amino acid.

                 Proteins*
Amino Acid       RNase    ADH      Mb       Histone H3    Collagen
Ala              6.9      7.5      9.8      13.3          11.7
Arg              3.7      3.2      1.7      13.3          4.9
Asn              7.6      2.1      2.0      0.7           1.0
Asp              4.1      4.5      5.0      3.0           3.0
Cys              6.7      3.7      0        1.5           0
Gln              6.5      2.1      3.5      5.9           2.6
Glu              4.2      5.6      8.7      5.2           4.5
Gly              3.7      10.2     9.0      5.2           32.7
His              3.7      1.9      7.0      1.5           0.3
Ile              3.1      6.4      5.1      5.2           0.8
Leu              1.7      6.7      11.6     8.9           2.1
Lys              7.7      8.0      13.0     9.6           3.6
Met              3.7      2.4      1.5      1.5           0.7
Phe              2.4      4.8      4.6      3.0           1.2
Pro              4.5      5.3      2.5      4.4           22.5
Ser              12.2     7.0      3.9      3.7           3.8
Thr              6.7      6.4      3.5      7.4           1.5
Trp              0        0.5      1.3      0             0
Tyr              4.0      1.1      1.3      2.2           0.5
Val              7.1      10.4     4.8      4.4           1.7
Acidic           8.4      10.2     13.7     8.1           7.5
Basic            15.0     13.1     21.8     24.4          8.8
Aromatic         6.4      6.4      7.2      5.2           1.7
Hydrophobic      18.0     30.7     27.6     23.0          6.5

*Proteins are as follows:
RNase: Bovine ribonuclease A, an enzyme; 124 amino acid residues. Note that RNase lacks tryptophan.
ADH: Horse liver alcohol dehydrogenase, an enzyme; dimer of identical 374 amino acid polypeptide chains. The amino acid composition of ADH is reasonably representative of the norm for water-soluble proteins.
Mb: Sperm whale myoglobin, an oxygen-binding protein; 153 amino acid residues. Note that Mb lacks cysteine.
Histone H3: Histones are DNA-binding proteins found in chromosomes; 135 amino acid residues. Note the very basic nature of this protein due to its abundance of Arg and Lys residues. It also lacks tryptophan.
Collagen: Collagen is an extracellular structural protein; 1052 amino acid residues. Collagen has an unusual amino acid composition; it is about one-third glycine and is rich in proline. Note that it also lacks Cys and Trp and is deficient in aromatic amino acid residues in general.


            Amino acid analysis itself does not directly give the number of residues of each amino acid in a polypeptide, but it does give amounts from which the percentages or ratios of the various amino acids can be obtained (Table 5.2). If the molecular weight and the exact amount of the protein analyzed are known (or the number of amino acid residues per molecule is known), the molar ratios of amino acids in the protein can be calculated. Amino acid analysis provides no information on the order or sequence of amino acid residues in the polypeptide chain. Because the polypeptide chain is unbranched, it has only two ends, an amino-terminal or N-terminal end and a carboxyl-terminal or C-terminal end.
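As a rough illustration of how percentages become residue counts once the chain length is known, the following Python sketch converts a few entries from Table 5.2. It assumes the table values are mole percent (percent of residues), and the function name is a hypothetical helper, not part of any analysis software.

```python
def residues_from_percent(percent, chain_length):
    """Nearest whole number of residues for a given mole percent of the chain."""
    return round(percent / 100 * chain_length)


# Percentages from Table 5.2; chain lengths from the table footnotes.
print(residues_from_percent(32.7, 1052))  # glycine in collagen -> 344
print(residues_from_percent(12.2, 124))   # serine in RNase -> 15
print(residues_from_percent(0, 124))      # tryptophan in RNase -> 0
```

Because the published percentages are rounded, counts recovered this way are estimates and may differ by a residue from the true composition.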

The Sequence of Amino Acids in Proteins

The unique characteristic of each protein is the distinctive sequence of amino acid residues in its polypeptide chain(s). Indeed, it is the amino acid sequence of proteins that is encoded by the nucleotide sequence of DNA. This amino acid sequence, then, is a form of genetic information. By convention, the amino acid sequence is read from the N-terminal end of the polypeptide chain through to the C-terminal end. As an example, every molecule of ribonuclease A from bovine pancreas has the same amino acid sequence, beginning with N-terminal lysine at position 1 and ending with C-terminal valine at position 124 (Figure 5.6). Given the possibility of any of the 20 amino acids at each position, the number of unique amino acid sequences is astronomically large. The astounding sequence variation possible within polypeptide chains provides a key insight into the incredible functional diversity of protein molecules in biological systems, which is discussed shortly.
 

Figure 5.6  Bovine pancreatic ribonuclease A contains 124 amino acid residues, none of which are tryptophan. Four intrachain disulfide bridges (S–S) form cross-links in this polypeptide between Cys26 and Cys84, Cys40 and Cys95, Cys58 and Cys110, and Cys65 and Cys72. These disulfides are depicted by yellow bars.  

5.2 • Architecture of Protein Molecules

Protein Shape

As a first approximation, proteins can be assigned to one of three global classes  on the basis of shape and solubility: fibrous, globular, or membrane (Figure 5.7). Fibrous proteins tend to have relatively simple, regular linear structures. These proteins often serve structural roles in cells. Typically, they are insoluble in water or in dilute salt solutions.


Figure 5.7   (a) Proteins having structural roles in cells are typically fibrous and often water insoluble. Collagen is a good example. Collagen is composed of three polypeptide chains that intertwine. (b) Soluble proteins serving metabolic functions can be characterized as compactly folded globular molecules, such as myoglobin. The folding pattern puts hydrophilic amino acid side chains on the outside and buries hydrophobic side chains in the interior, making the protein highly water soluble. (c) Membrane proteins fold so that hydrophobic amino acid side chains are exposed in their membrane-associated regions. The portions of membrane proteins extending into or exposed at the aqueous environments are hydrophilic in character, like soluble proteins. Bacteriorhodopsin is a typical membrane protein; it binds the light-absorbing pigment, cis-retinal, shown here in red. (a, b, Irving Geis)


In contrast, globular proteins are roughly spherical in shape. The polypeptide chain is compactly folded so that hydrophobic amino acid side chains are in the interior of the molecule and the hydrophilic side chains are on the outside exposed to the solvent, water. Consequently, globular proteins are usually very soluble in aqueous solutions. Most soluble proteins of the cell, such as the cytosolic enzymes, are globular in shape. Membrane proteins are found in association with the various membrane systems of cells. For interaction with the nonpolar phase within membranes, membrane proteins have hydrophobic amino acid side chains oriented outward. As such, membrane proteins are insoluble in aqueous solutions but can be solubilized in solutions of detergents. Membrane proteins characteristically have fewer hydrophilic amino acids than cytosolic proteins.

 

A Deeper Look
The Virtually Limitless Number of Different Amino Acid Sequences
Given 20 different amino acids, a polypeptide chain of n residues can have any one of 20^n possible sequence arrangements. To portray this, consider the number of tripeptides possible if there were only three different amino acids, A, B, and C (tripeptide = 3 = n; 3^n = 3^3 = 27):

AAA    BBB    CCC
AAB    BBA    CCA
AAC    BBC    CCB
ABA    BAB    CBC
ACA    BCB    CAC
ABC    BAA    CBA
ACB    BCC    CAB
ABB    BAC    CBB
ACC    BCA    CAA

For a polypeptide chain of 100 residues in length, a rather modest size, the number of possible sequences is 20^100, or, because 20 = 10^1.3, 10^130 unique possibilities. These numbers are more than astronomical! Because an average protein molecule of 100 residues would have a mass of 13,800 daltons (average molecular mass of an amino acid residue = 138), 10^130 such molecules would have a mass of 1.38 × 10^134 daltons. The mass of the observable universe is estimated to be 10^80 proton masses (about 10^80 daltons). Thus, the universe lacks enough material to make just one molecule of each possible polypeptide sequence for a protein only 100 residues in length.
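The counting argument in this box is easy to check numerically. The Python sketch below is illustrative only and uses just the standard library: it enumerates the tripeptides of a three-letter alphabet, then works in base-10 logarithms to compare the mass of one copy of every 100-residue sequence with the 10^80 dalton estimate quoted above.

```python
import math
from itertools import product

# Every ordered arrangement of three residues drawn from A, B, and C.
tripeptides = ["".join(p) for p in product("ABC", repeat=3)]
print(len(tripeptides))  # 27

# Number of 100-residue sequences built from 20 amino acids, as a power of 10.
n = 100
exponent = n * math.log10(20)
print(round(exponent, 1))  # 130.1

# Mass of one molecule of each sequence (13,800 daltons apiece), as a power of 10,
# compared with the estimate of about 10^80 daltons for the observable universe.
total_mass_exponent = exponent + math.log10(13_800)
print(total_mass_exponent > 80)  # True
```

Working with exponents rather than the numbers themselves keeps the comparison readable; the gap between the two quantities is more than 50 orders of magnitude.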

The Levels of Protein Structure

The architecture of protein molecules is quite complex. Nevertheless, this complexity can be resolved by defining various levels of structural organization.

Primary Structure

The amino acid sequence is the primary (1°) structure of a protein, such as that shown in Figure 5.6, for example.

Secondary Structure

Through hydrogen bonding interactions between adjacent amino acid residues (discussed in detail in Chapter 6), the polypeptide chain can arrange itself into characteristic helical or pleated segments. These segments constitute structural conformities, so-called regular structures, that extend along one dimension, like the coils of a spring. Such architectural features of a protein are designated secondary (2°) structures (Figure 5.8). Secondary structures are just one of the higher levels of structure that represent the three-dimensional arrangement of the polypeptide in space.

 

Figure 5.8   Two structural motifs that arrange the primary structure of proteins into a higher level of organization predominate in proteins: the α-helix and the β-pleated strand. Atomic representations of these secondary structures are shown here, along with the symbols used by structural chemists to represent them: the flat, helical ribbon for the α-helix and the flat, wide arrow for β-structures. Both of these structures owe their stability to the formation of hydrogen bonds between N–H and O=C functions along the polypeptide backbone (see Chapter 6).

Tertiary Structure

When the polypeptide chains of protein molecules bend and fold in order to assume a more compact three-dimensional shape, a tertiary (3°) level of structure is generated (Figure 5.9). It is by virtue of their tertiary structure that proteins adopt a globular shape. A globular conformation gives the lowest surface-to-volume ratio, minimizing interaction of the protein with the surrounding environment.

 

 

 

Figure 5.9   Folding of the polypeptide chain into a compact, roughly spherical conformation creates the tertiary level of protein structure. (a) The primary structure and (b) a representation of the tertiary structure of chymotrypsin, a proteolytic enzyme, are shown here. The tertiary representation in (b) shows the course of the chymotrypsin folding pattern by successive numbering of the amino acids in its sequence. (Residues 14 and 15 and 147 and 148 are missing because these residues are removed when chymotrypsin is formed from its larger precursor, chymotrypsinogen.) The ribbon diagram depicts the three-dimensional track of the polypeptide in space.  

 

Quaternary Structure

Many proteins consist of two or more interacting polypeptide chains of characteristic tertiary structure, each of which is commonly referred to as a subunit of the protein. Subunit organization constitutes another level in the hierarchy of protein structure, defined as the protein’s quaternary (4°) structure (Figure 5.10). Questions of quaternary structure address the various kinds of subunits within a protein molecule, the number of each, and the ways in which they interact with one another.

Figure 5.10   Hemoglobin, which consists of two α and two β polypeptide chains, is an example of the quaternary level of protein structure. In this drawing, the β-chains are the two uppermost polypeptides and the two α-chains are the lower half of the molecule. The two closest chains (darkest colored) are the β2-chain (upper left) and the α1-chain (lower right). The heme groups of the four globin chains are represented by rectangles with spheres (the heme iron atom). Note the symmetry of this macromolecular arrangement. (Irving Geis)

             Whereas the primary structure of a protein is determined by the covalently linked amino acid residues in the polypeptide backbone, secondary and higher orders of structure are determined principally by noncovalent forces such as hydrogen bonds and ionic, van der Waals, and hydrophobic interactions. It is important to emphasize that all the information necessary for a protein molecule to achieve its intricate architecture is contained within its 1° structure, that is, within the amino acid sequence of its polypeptide chain(s). Chapter 6 presents a detailed discussion of the 2°, 3°, and 4° structure of protein molecules.

Protein Conformation

The overall three-dimensional architecture of a protein is generally referred to as its conformation. This term is not to be confused with configuration, which denotes the geometric possibilities for a particular set of atoms (Figure 5.11). In going from one configuration to another, covalent bonds must be broken and rearranged. In contrast, the conformational possibilities of a molecule are achieved without breaking any covalent bonds. In proteins, rotations about each of the single bonds along the peptide backbone have the potential to alter the course of the polypeptide chain in three-dimensional space. These rotational possibilities create many possible orientations for the protein chain, referred to as its conformational possibilities. Of the great number of theoretical conformations a given protein might adopt, only a very few are favored energetically under physiological conditions. At this time, the rules that direct the folding of protein chains into energetically favorable conformations are still not entirely clear; accordingly, they are the subject of intensive contemporary research.

Figure 5.11   Configuration and conformation are not synonymous. (a) Rearrangements between configurational alternatives of a molecule can be achieved only by breaking and remaking bonds, as in the transformation between the D- and L-configurations of glyceraldehyde. No possible rotational reorientation of bonds linking the atoms of D-glyceraldehyde yields geometric identity with L-glyceraldehyde, even though they are mirror images of each other. (b) The intrinsic free rotation around single covalent bonds creates a great variety of three-dimensional conformations, even for relatively simple molecules. Consider 1,2-dichloroethane. Viewed end-on in a Newman projection, three principal rotational orientations or conformations predominate. Steric repulsion between eclipsed and partially eclipsed conformations keeps the possibilities at a reasonable number. (c) Imagine the conformational possibilities for a protein in which two of every three bonds along its backbone are freely rotating single bonds.

            Later we return to an analysis of the 1° structure of proteins and the methodology used in determining the amino acid sequence of polypeptide chains, but let’s first consider the extraordinary variety and functional diversity of these most interesting macromolecules.

5.3 • The Many Biological Functions of Proteins

Proteins are the agents of biological function. Virtually every cellular activity is dependent on one or more particular proteins. Thus, a convenient way to classify the enormous number of proteins is by the biological roles they fill. Table 5.3 summarizes the classification of proteins by function and gives examples of representative members of each class.

Table 5.3
Biological Functions of Proteins and Some Representative Examples

Functional Class                      Examples
Enzymes                               Ribonuclease
                                      Trypsin
                                      Phosphofructokinase
                                      Alcohol dehydrogenase
                                      Catalase
                                      “Malic” enzyme
Regulatory proteins                   Insulin
                                      Somatotropin
                                      Thyrotropin
                                      lac repressor
                                      NF1 (nuclear factor 1)
                                      Catabolite activator protein (CAP)
                                      AP1
Transport proteins                    Hemoglobin
                                      Serum albumin
                                      Glucose transporter
Storage proteins                      Ovalbumin
                                      Casein
                                      Zein
                                      Phaseolin
                                      Ferritin
Contractile and motile proteins       Actin
                                      Myosin
                                      Tubulin
                                      Dynein
                                      Kinesin
Structural proteins                   α-Keratin
                                      Collagen
                                      Elastin
                                      Fibroin
                                      Proteoglycans
Scaffold proteins                     Grb 2
                                      crk
                                      shc
                                      stat
                                      IRS-1
Protective and exploitive proteins    Immunoglobulins
                                      Thrombin
                                      Fibrinogen
                                      Antifreeze proteins
                                      Snake and bee venom proteins
                                      Diphtheria toxin
                                      Ricin
Exotic proteins                       Monellin
                                      Resilin
                                      Glue proteins

Enzymes

By far the largest class of proteins is enzymes. More than 3000 different enzymes are listed in Enzyme Nomenclature, the standard reference volume on enzyme classification. Enzymes are catalysts that accelerate the rates of biological reactions. Each enzyme is very specific in its function and acts only in a particular metabolic reaction. Virtually every step in metabolism is catalyzed by an enzyme. The catalytic power of enzymes far exceeds that of synthetic catalysts. Enzymes can enhance reaction rates in cells as much as 10^16 times the uncatalyzed rate. Enzymes are systematically classified according to the nature of the reaction that they catalyze, such as the transfer of a phosphate group (phosphotransferase) or an oxidation-reduction (oxidoreductase). The formal names of enzymes come from the particular reaction within the class that they catalyze, as in ATP:D-fructose-6-phosphate 1-phosphotransferase and alcohol:NAD+ oxidoreductase. Often, enzymes have common names in addition to their formal names. ATP:D-fructose-6-phosphate 1-phosphotransferase is more commonly known as phosphofructokinase (kinase is a common name given to ATP-dependent phosphotransferases). Similarly, alcohol:NAD+ oxidoreductase is casually referred to as alcohol dehydrogenase. The reactions catalyzed by these two enzymes are shown in Figure 5.12. Other enzymes are known by trivial names that have historical roots, such as catalase (systematic name, hydrogen-peroxide:hydrogen-peroxide oxidoreductase), and sometimes these trivial names have descriptive connotations as well, as in malic enzyme (systematic name, L-malate:NADP+ oxidoreductase).


Figure 5.12   Enzymes are classified according to the specific biological reaction that they catalyze. Cells contain thousands of different enzymes. Two common examples drawn from carbohydrate metabolism are phosphofructokinase (PFK), or, more precisely, ATP:D-fructose-6-phosphate 1-phosphotransferase, and alcohol dehydrogenase (ADH), or alcohol:NAD+ oxidoreductase, which catalyze the reactions shown here.

 

Regulatory Proteins

A number of proteins do not perform any obvious chemical transformation but nevertheless can regulate the ability of other proteins to carry out their physiological functions. Such proteins are referred to as regulatory proteins. A well-known example is insulin, the hormone regulating glucose metabolism in animals. Insulin is a relatively small protein (5.7 kD) and consists of two polypeptide chains held together by disulfide cross-bridges. Other hormones that are also proteins include pituitary somatotropin (21 kD) and thyrotropin (28 kD), which stimulates the thyroid gland. Another group of regulatory proteins is involved in the regulation of gene expression. These proteins characteristically act by binding to DNA sequences that are adjacent to coding regions of genes, either activating or inhibiting the transcription of genetic information into RNA. Examples include repressors, which, because they block transcription, are considered negative control elements.
            A prokaryotic representative is lac repressor (37 kD), which controls expression of the enzyme system responsible for the metabolism of lactose (milk sugar); a mammalian example is NF1 (nuclear factor 1, 60 kD), which inhibits transcription of the gene encoding the β-globin polypeptide chain of hemoglobin. Positively acting control elements are also known. For example, the E. coli catabolite gene activator protein (CAP) (44 kD), under appropriate metabolic conditions, can bind to specific sites along the E. coli chromosome and increase the rate of transcription of adjacent genes. The mammalian AP1 is a heterodimeric transcription factor composed of one polypeptide from the Jun family of gene-regulatory proteins and one polypeptide from the Fos family of gene-regulatory proteins. AP1 activates expression of the β-globin gene. These various DNA-binding regulatory proteins often possess characteristic structural features, such as helix-turn-helix, leucine zipper, and zinc finger motifs (see Chapter 31).

Transport Proteins

A third class of proteins is the transport proteins. These proteins function to transport specific substances from one place to another. One type of transport is exemplified by the transport of oxygen from the lungs to the tissues by hemoglobin (Figure 5.13a) or by the transport of fatty acids from adipose tissue to various organs by the blood protein serum albumin. A very different type is the transport of metabolites across permeability barriers such as cell membranes, as mediated by specific membrane proteins. These membrane transport proteins take up metabolite molecules on one side of a membrane, transport them across the membrane, and release them on the other side. Examples include the transport proteins responsible for the uptake of essential nutrients into the cell, such as glucose or amino acids (Figure 5.13b). All naturally occurring membrane transport proteins studied thus far form channels in the membrane through which the transported substances are passed.

Figure 5.13   Two basic types of biological transport are (a) transport within or between different cells or tissues and (b) transport into or out of cells. Proteins function in both of these phenomena. For example, the protein hemoglobin transports oxygen from the lungs to actively respiring tissues. Transport proteins of the other type are localized in cellular membranes, where they function in the uptake of specific nutrients, such as glucose (shown here) and amino acids, or the export of metabolites and waste products. (© Irving Geis)

 

 Storage Proteins

Proteins whose biological function is to provide a reservoir of an essential nutrient are called storage proteins. Because proteins are amino acid polymers and because nitrogen is commonly a limiting nutrient for growth, organisms have exploited proteins as a means to provide sufficient nitrogen in times of need. For example, ovalbumin, the protein of egg white, provides the developing bird embryo with a source of nitrogen during its isolation within the egg. Casein is the most abundant protein of milk and thus the major nitrogen source for mammalian infants. The seeds of higher plants often contain as much as 60% storage protein to make the germinating seed nitrogen-sufficient during this crucial period of plant development. In corn (Zea mays or maize), a family of low molecular weight proteins in the kernel called zeins serves this purpose; beans (the seeds of Phaseolus vulgaris) contain a storage protein called phaseolin. The use of proteins as a reservoir of nitrogen is more efficient than storing an equivalent amount of amino acids. Not only is the osmotic pressure minimized, but the solvent capacity of the cell is taxed less in solvating one molecule of a polypeptide than in dissolving, for example, 100 molecules of free amino acids. Proteins can also serve to store nutrients other than the more obvious elements composing amino acids (N, C, H, O, and S). As an example, ferritin is a protein found in animal tissues that binds iron, retaining this essential metal so that it is available for the synthesis of important iron-containing proteins such as hemoglobin. One molecule of ferritin (460 kD) binds as many as 4500 atoms of iron (35% by weight).
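
The two quantitative points in this paragraph can be checked with a few lines of arithmetic. The sketch below rests on stated assumptions: the 460 kD figure is taken to be the protein alone, only the iron atoms themselves are counted as bound mass, and osmotic pressure is estimated with the ideal dilute-solution (van 't Hoff) relation. The concentrations and function names are hypothetical, chosen only for illustration.

```python
FE_ATOMIC_MASS = 55.845   # g/mol, standard atomic weight of iron
R_L_ATM = 0.082057        # gas constant in L·atm/(mol·K)


def iron_weight_fraction(protein_kd, n_iron_atoms):
    """Fraction of the loaded complex's mass contributed by iron.

    Assumes protein_kd is the mass of the protein without iron and
    that each bound atom adds only the atomic mass of iron.
    """
    iron_kd = n_iron_atoms * FE_ATOMIC_MASS / 1000.0
    return iron_kd / (protein_kd + iron_kd)


def osmotic_pressure_atm(molar_conc, temp_k=298.15):
    """Ideal van 't Hoff estimate: pressure = concentration * R * T."""
    return molar_conc * R_L_ATM * temp_k


# Ferritin figures from the text: 460 kD and up to 4500 iron atoms.
print(f"iron by weight: {iron_weight_fraction(460.0, 4500):.0%}")

# Assumed concentrations: 1 mM of a 100-residue polypeptide versus the
# same residues dissolved as 100 mM of free amino acids.
polymer = osmotic_pressure_atm(0.001)
monomers = osmotic_pressure_atm(0.100)
print(f"osmotic pressure ratio, free amino acids : polypeptide = {monomers / polymer:.0f}")
```

With these assumptions the iron fraction comes out close to the 35% quoted for ferritin, and the free amino acids exert 100 times the osmotic pressure of the equivalent polypeptide, since in the ideal case the pressure depends only on the number of dissolved particles.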

Contractile and Motile Proteins

Certain proteins endow cells with unique capabilities for movement. Cell division, muscle contraction, and cell motility represent some of the ways in which cells execute motion. The contractile and motile proteins underlying these motions share a common property: they are filamentous or polymerize to form filaments. Examples include actin and myosin, the filamentous proteins forming the contractile systems of cells, and tubulin, the major component of microtubules (the filaments involved in the mitotic spindle of cell division as well as in flagella and cilia). Another class of proteins involved in movement includes dynein and kinesin, so-called motor proteins that drive the movement of vesicles, granules, and organelles along microtubules serving as established cytoskeletal “tracks.”

Structural Proteins

An apparently passive but very important role of proteins is their function in creating and maintaining biological structures. Structural proteins provide strength and protection to cells and tissues. Monomeric units of structural proteins typically polymerize to generate long fibers (as in hair) or protective sheets of fibrous arrays, as in cowhide (leather). α-Keratins are insoluble fibrous proteins making up hair, horns, and fingernails. Collagen, another insoluble fibrous protein, is found in bone, connective tissue, tendons, cartilage, and hide, where it forms inelastic fibrils of great strength. One-third of the total protein in a vertebrate animal is collagen. A structural protein having elastic properties is, appropriately, elastin, an important component of ligaments. Because of the way elastin monomers are cross-linked in forming polymers, elastin can stretch in two dimensions. Certain insects make a structurally useful protein, fibroin (a β-keratin), the major constituent of cocoons (silk) and spider webs. An important protective barrier for animal cells is the extracellular matrix containing collagen and proteoglycans, covalent protein-polysaccharide complexes that cushion and lubricate.

Scaffold Proteins (Adapter Proteins)

Some proteins play a recently discovered role in the complex pathways of cellular response to hormones and growth factors. These proteins, the scaffold or adapter proteins, have a modular organization in which specific parts (modules) of the protein’s structure recognize and bind certain structural elements in other proteins through protein-protein interactions. For example, SH2 modules bind to proteins in which a tyrosine residue has become phosphorylated on its phenolic -OH, and SH3 modules bind to proteins having a characteristic grouping of proline residues. Others include PH modules, which bind to membranes, and PDZ-containing proteins, which bind specifically to the C-terminal amino acid of certain proteins. Because scaffold proteins typically possess several of these different kinds of modules, they can act as a scaffold onto which a set of different proteins is assembled into a multiprotein complex. Such assemblages are typically involved in coordinating and communicating the many intracellular responses to hormones or other signalling molecules (Figure 5.14). Anchoring (or targeting) proteins are proteins that bind other proteins, causing them to associate with other structures in the cell. A family of anchoring proteins, known as AKAP or A kinase anchoring proteins, exists in which specific AKAP members bind the regulatory enzyme protein kinase A (PKA) to particular subcellular compartments. For example, AKAP100 targets PKA to the endoplasmic reticulum, whereas AKAP79 targets PKA to the plasma membrane.
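
The modular logic of scaffold proteins can be pictured as a lookup from module type to the feature it recognizes. The sketch below is a deliberately crude model: the module-to-target pairs follow the text, but the scaffold composition, the partner names, and the function name are all hypothetical, and real recognition depends on sequence context and affinity rather than a simple match.

```python
# Module types and the structural feature each recognizes (from the text).
MODULE_TARGETS = {
    "SH2": "phosphotyrosine",
    "SH3": "proline-rich motif",
    "PH": "membrane",
    "PDZ": "C-terminal residue",
}


def recruited_partners(scaffold_modules, candidates):
    """Names of candidates that display a feature some module recognizes.

    scaffold_modules: list of module names present on the scaffold.
    candidates: dict mapping a partner name to the set of features it displays.
    """
    recognized = {MODULE_TARGETS[module] for module in scaffold_modules}
    return [name for name, features in candidates.items()
            if recognized & set(features)]


# Hypothetical scaffold carrying one SH2 and one SH3 module.
scaffold = ["SH2", "SH3"]

# Hypothetical binding partners and the features they display.
partners = {
    "partner_A": {"phosphotyrosine"},
    "partner_B": {"proline-rich motif"},
    "partner_C": {"C-terminal residue"},
    "partner_D": set(),
}

print(recruited_partners(scaffold, partners))
```

Adding a PDZ module to the hypothetical scaffold would bring partner_C into the complex as well, which is the sense in which several modules on one polypeptide allow a set of different proteins to be assembled together.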
 

 

Figure 5.14   Diagram of the N → C sequence organization of the adapter protein insulin receptor substrate-1 (IRS-1) showing the various amino acid sequences (in one-letter code) that contain tyrosine (Y) residues that are potential sites for phosphorylation. The other adapter proteins that recognize various of these sites are shown as Grb2, SHPTP-2, and p85αPIK. Insulin binding to the insulin receptor activates the enzymatic activity that phosphorylates these Tyr residues on IRS-1. (Adapted from White, M. F., and Kahn, C. R., 1994. Journal of Biological Chemistry 269:1–4.)

 

 

 

Protective and Exploitive Proteins

In contrast to the passive protective nature of some structural proteins, another group can be more aptly classified as protective or exploitive proteins because of their biologically active role in cell defense, protection, or exploitation. Prominent among the protective proteins are the immunoglobulins or antibodies produced by the lymphocytes of vertebrates. Antibodies have the remarkable ability to “ignore” molecules that are an intrinsic part of the host organism, yet they can specifically recognize and neutralize “foreign” molecules resulting from the invasion of the organism by bacteria, viruses, or other infectious agents. Another group of protective proteins is the blood-clotting proteins, thrombin and fibrinogen, which prevent the loss of blood when the circulatory system is damaged. Arctic and Antarctic fishes have antifreeze proteins to protect their blood against freezing in the below-zero temperatures of high-latitude seas. In addition, various proteins serve defensive or exploitive roles for organisms, including the lytic and neurotoxic proteins of snake and bee venoms and toxic plant proteins, such as ricin, whose apparent purpose is to thwart predation by herbivores. Another class of exploitive proteins includes the toxins produced by bacteria, such as diphtheria toxin and cholera toxin.

Exotic Proteins

Some proteins display rather exotic functions that do not quite fit the previous classifications. Monellin, a protein found in an African plant, has a very sweet taste and is being considered as an artificial sweetener for human consumption. Resilin, a protein having exceptional elastic properties, is found in the hinges of insect wings. Certain marine organisms such as mussels secrete glue proteins, allowing them to attach firmly to hard surfaces. It is worth repeating that the great diversity of function in proteins, as reflected in this survey, is attained using just 20 amino acids.

5.4 • Some Proteins Have Chemical Groups Other Than Amino Acids

Many proteins consist of only amino acids and contain no other chemical groups. The enzyme ribonuclease and the contractile protein actin are two such examples. Such proteins are called simple proteins. However, many other proteins contain various chemical constituents as an integral part of their structure. These proteins are termed conjugated proteins (Table 5.4). If the nonprotein part is crucial to the protein’s function, it is referred to as a prosthetic group. If the nonprotein moiety is not covalently linked to the protein, it can usually be removed by denaturing the protein structure. However, if the conjugate is covalently joined to the protein, it may be necessary to carry out acid hydrolysis of the protein into its component amino acids in order to release it. Conjugated proteins are typically classified according to the chemical nature of their non-amino acid component; a representative selection of them is given here and in Table 5.4. (Note that comparisons of Tables 5.3 and 5.4 reveal two distinctly different ways of considering the nature of proteins: function versus chemistry.)

Table 5.4
Representative Conjugated Proteins

Class                                                 Prosthetic Group                                Percent by Weight (approx.)

Glycoproteins contain carbohydrate
  Fibronectin
  γ-Globulin
  Proteoglycan

Lipoproteins contain lipid
  Blood plasma lipoproteins:
    High density lipoprotein (HDL) (α-lipoprotein)    Triacylglycerols, phospholipids, cholesterol    75
    Low density lipoprotein (LDL) (β-lipoprotein)     Triacylglycerols, phospholipids, cholesterol    67

Nucleoprotein complexes contain nucleic acid
  Ribosomes                                           RNA                                             60
  Tobacco mosaic virus                                RNA                                             5
  Adenovirus                                          DNA
  HIV-1 (AIDS virus)                                  RNA

Phosphoproteins contain phosphate
  Casein                                              Phosphate groups
  Glycogen phosphorylase a                            Phosphate groups

Metalloproteins contain metal atoms
  Ferritin                                            Iron                                            35
  Alcohol dehydrogenase                               Zinc
  Cytochrome oxidase                                  Copper and iron
  Nitrogenase                                         Molybdenum and iron
  Pyruvate carboxylase                                Manganese

Hemoproteins contain heme
  Hemoglobin
  Cytochrome c
  Catalase
  Nitrate reductase
  Ammonium oxidase

Flavoproteins contain flavin
  Succinate dehydrogenase                             FAD
  NADH dehydrogenase                                  FMN
  Dihydroorotate dehydrogenase                        FAD and FMN
  Sulfite reductase                                   FAD and FMN


Glycoproteins.    Glycoproteins are proteins that contain carbohydrate. Proteins destined for an extracellular location are characteristically glycoproteins. For example, fibronectin and proteoglycans are important components of the extracellular matrix that surrounds the cells of most tissues in animals. Immunoglobulin G molecules are the principal antibody species found circulating free in the blood plasma. Many membrane proteins are glycosylated on their extracellular segments.

Lipoproteins.    Blood plasma lipoproteins are prominent examples of the class of proteins conjugated with lipid. The plasma lipoproteins function primarily in the transport of lipids to sites of active membrane synthesis. Serum levels of low density lipoproteins (LDLs) are often used as a clinical index of susceptibility to vascular disease.

Nucleoproteins.    Nucleoprotein conjugates have many roles in the storage and transmission of genetic information. Ribosomes are the sites of protein synthesis. Virus particles and even chromosomes are protein-nucleic acid complexes.

Phosphoproteins.    These proteins have phosphate groups esterified to the hydroxyls of serine, threonine, or tyrosine residues. Casein, the major protein of milk, contains many phosphates and serves to bring essential phosphorus to the growing infant. Many key steps in metabolism are regulated between states of activity or inactivity, depending on the presence or absence of phosphate groups on proteins, as we shall see in Chapter 15. Glycogen phosphorylase a is one well-studied example.
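
Because phosphate is esterified only to hydroxyl-bearing side chains, the candidate sites in any sequence written in one-letter code are simply its Ser (S), Thr (T), and Tyr (Y) residues. The sketch below lists them for a short peptide. The peptide sequence and the function name are hypothetical, and the result is only a list of candidates: whether a given residue is actually phosphorylated depends on the kinases present and on the surrounding sequence.

```python
# One-letter codes of the residues whose hydroxyl groups can carry phosphate.
PHOSPHO_ACCEPTORS = {"S": "Ser", "T": "Thr", "Y": "Tyr"}


def candidate_phosphosites(sequence):
    """Return (position, residue) for every Ser, Thr, or Tyr.

    Positions are 1-based and counted from the N-terminus.
    """
    return [(position, PHOSPHO_ACCEPTORS[residue])
            for position, residue in enumerate(sequence.upper(), start=1)
            if residue in PHOSPHO_ACCEPTORS]


# Hypothetical eight-residue peptide, written N-terminus to C-terminus.
peptide = "GSAYMTPK"

for position, residue in candidate_phosphosites(peptide):
    print(f"{residue}{position}")
```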

Metalloproteins.    Metalloproteins are either metal storage forms, as in the case of ferritin, or enzymes in which the metal atom participates in a catalytically important manner. We encounter many examples throughout this book of the vital metabolic functions served by metalloenzymes.

Figure 5.15   Heme consists of protoporphyrin IX and an iron atom. Protoporphyrin, a highly conjugated system of double bonds, is composed of four 5-membered heterocyclic rings (pyrroles) fused together to form a tetrapyrrole macrocycle. The specific isomeric arrangement of methyl, vinyl, and propionate side chains shown is protoporphyrin IX. Coordination of an atom of ferrous iron (Fe²⁺) by the four pyrrole nitrogen atoms yields heme.

  

 

 

Hemoproteins.    These proteins are actually a subclass of metalloproteins because their prosthetic group is heme, the name given to iron protoporphyrin IX (Figure 5.15). Because heme-containing proteins enjoy so many prominent biological functions, they are considered a class by themselves.

Flavoproteins.    Flavin is an essential substance for the activity of a number of important oxidoreductases. We discuss the chemistry of flavin and its derivatives, FMN and FAD, in the chapter on electron transport and oxidative phosphorylation (Chapter 21).

5.5 • Reactions of Peptides and Proteins

 The chemical properties of peptides and proteins are most easily considered in terms of the chemistry of their component functional groups. That is, they possess reactive amino and carboxyl termini and they display reactions characteristic of the chemistry of the R groups of their component amino acids. These reactions are familiar to us from Chapter 4 and from the study of organic chemistry and need not be repeated here.

5.6 • Purification of Protein Mixtures

Cells contain thousands of different proteins. A major problem for protein chemists is to purify a chosen protein so that they can study its specific properties in the absence of other proteins. Proteins have been separated and purified on the basis of their two prominent physical properties: size and electrical charge. A more direct approach is to employ affinity purification strategies that take advantage of the biological function or similar specific recognition properties of a protein (see Table 5.5 and Chapter Appendix).