computationally designing enzymatic catalysts for any chemical reaction. Despite recent progress (10, 11), creating enzymes for chemical transformations not efficiently catalyzed by naturally occurring enzymes remains a major challenge. Here, we describe (i) general computational methods for constructing active sites for multistep reactions consisting of superimposed reaction intermediates and transition states (TS) surrounded by protein functional groups in orientations optimal for catalysis (Fig. 1) and (ii) the use of this methodology to design novel catalysts for a retro-aldol reaction in which a carbon-carbon bond is broken in a nonnatural (i.e., not found in biological systems) substrate: 4-hydroxy-4-(6-methoxy-2-naphthyl)-2-butanone (Fig. 2A) (12).
The first step in the computational design of an enzyme is to define one or more potential catalytic mechanisms for the desired reaction. For the retro-aldol reaction, we focused on mechanisms involving enamine catalysis by lysine via a Schiff base or imine intermediate (13, 14). As shown in simplified form in Fig. 2B, the reaction proceeds in several distinct steps, involving acid-base catalysis by either amino acid side chains or water molecules. First, nucleophilic attack of lysine on the ketone of the substrate forms a carbinolamine intermediate, which eliminates a water molecule to form the imine/iminium species. Next, carbon-carbon bond cleavage is triggered by the deprotonation of the β-alcohol, with the iminium acting as an electron sink. Finally, the enamine tautomerizes to an imine that is then hydrolyzed to release the covalently bound product and free the enzyme for another round of catalysis.
The second step of the design process is the identification of protein scaffolds that can accommodate the designed TS ensemble described above. To account for the multistep reaction pathway, we extended our enzyme design methodology (15) to allow the design of composite TS sites that are simultaneously compatible with multiple TS and reaction intermediates (16). Using this method, we generated design models using the four catalytic motifs shown schematically in Fig. 2C, which apply different constellations of catalytic residues to facilitate carbinolamine formation and water elimination, carbon-carbon bond cleavage, and release of bound product.
Because the probability of accurately reconstructing a given three-dimensional (3D) active site in an input protein scaffold is extremely small, it is essential to consider a very large set of active-site possibilities. We generated such a set by simultaneously varying (i) the internal degrees of freedom of the composite TS (fig. S1B), (ii) the orientation of the catalytic side chains with respect to the composite TS (fig. S3), within ranges that are consistent with catalysis, and (iii) the conformations of the catalytic side chains (fig. S3). For example, in a representative calculation for motif III, we searched for placements of a total of 1.4 × 10^18 possible 3D active sites (table S3) at all triples or quadruples of backbone positions surrounding binding pockets in 71 different protein scaffolds (table S4). This combinatorial matching resulted in a total of 181,555 distinct solutions for the placement of the composite TS and the surrounding catalytic residues. Through extensive pruning at multiple levels, and by breaking the combinatorial explosion via hashing, the RosettaMatch algorithm (15) is able to rapidly eliminate, at very little computational cost, most active-site possibilities in a given scaffold that are unfavorable as a result of poor catalytic geometry or significant steric clashes. After optimization of the composite TS rigid body orientation and the identities and conformations of the surrounding residues, a total of 72 designs with 8 to 20 amino acid identity changes in 10 different scaffolds were selected for experimental characterization based on the predicted TS binding energy, the extent of satisfaction of the catalytic geometry, the packing around the active lysine, and the consistency of side-chain conformation after side-chain repacking in the presence and absence of the TS model (16).
Genes encoding the designs were synthesized and the proteins were expressed and purified
NIH-PA Author Manuscript
from Escherichia coli; soluble purified protein was obtained for 70 of the 72 expressed designs.
Retro-aldolase activity was monitored via a fluorescence-based assay of product formation (12) for each of the designs, and the results are summarized in Table 1. Our initial 12 designs used the first active site shown in Fig. 2C, which involves a charged side-chain (Lys-Asp-Lys)–mediated proton transfer scheme resembling that in D-2-deoxyribose-5-phosphate aldolase (13). Of these designs, two showed slow enaminone formation with 2,4-pentanedione (17), which is indicative of a nucleophilic lysine, but none displayed retro-aldolase activity (16). Ten designs were made for the second, much simpler active site shown in Fig. 2C, which involves a single imine-forming lysine in a hydrophobic pocket similar to aldolase catalytic antibodies; of these designs, one formed the enaminone, but none were catalytically active. The third active site incorporates a His-Asp dyad as a general base to abstract a proton from the β-alcohol; of the 14 designs tested, 10 exhibited stable enaminone formation, and 8 had detectable retro-aldolase activity. In the final active site, we experimented with the explicit modeling of a water molecule, positioned via side-chain hydrogen-bonding groups, which shuttles between stabilizing the carbinolamine and abstracting the proton from the hydroxyl. Of the 36 designs tested, 20 formed the enaminone and 23 (with 11 distinct positions for the catalytic lysine) had significant retro-aldolase activity, with rate enhancements up to four orders of magnitude over the uncatalyzed reaction (18).
The active designs occur on five different protein scaffolds belonging to the triose phosphate isomerase (TIM)–barrel and jelly-roll folds. The most active designs exhibited multiple turnover kinetics; the linear progress curves for designs RA60 and RA61, for example, continue unchanged for more than 20 turnovers. Progress curves [Fig. 3A and supporting online material (SOM)] show a range of kinetic behaviors: In some cases (RA45), there is a pronounced lag phase, likely associated with slow imine formation, whereas in others (RA61), there is little or no lag, and for a third set, there is an initial burst followed by a
slower steady-state rate (RA22). Notably, simple linear kinetics are observed for the designs in the relatively open jelly-roll scaffold, whereas more complex kinetics are observed for the TIM-barrel designs, which have more enclosed active-site pockets that may restrict substrate access and product release. To obtain k_cat and K_M estimates for several of the best enzymes (Fig. 3B), we extracted reaction velocities from the steady-state portions of the progress curves and assumed simple Michaelis-Menten kinetics. Given these simplifications, the values are best viewed as phenomenological; future characterization will be required to define rate constants in a particular kinetic model. The apparent k_cat and K_M values are given in Table 2; k_uncat was determined from measurements of the reaction progression in the absence of enzyme and is close to previously determined values (18). k_cat/k_uncat for the most active designs is 2 × 10^4. The catalytic proficiency of the designs, which have a k_cat/K_M of about 1 M^-1 s^-1 (Table 2), is far from that of naturally occurring enzymes; the very low k_cat value is probably associated with low reactivity of the imine-forming lysine. Rates for all active designs with 270 µM substrate are reported in table S1. For each of the 11 catalytic lysine positions, a “knockout” mutation to methionine dramatically decreased the activity or, more commonly, abolished catalysis completely, verifying that the observed activity was due to the designed active site.
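The steady-state analysis described above can be illustrated with a minimal sketch. It assumes that initial velocities have already been extracted from the progress curves, and it fits them to the Michaelis-Menten equation through a Hanes-Woolf linearization, a textbook technique swapped in here for illustration and not necessarily the fitting procedure behind Table 2. The function names, the enzyme concentration, and every numerical value are hypothetical and synthetic; none are taken from Table 2 or table S1.

```python
def fit_michaelis_menten(substrate, velocity):
    """Estimate (Vmax, K_M) by Hanes-Woolf linearization.

    Michaelis-Menten: v = Vmax * S / (K_M + S)
    Rearranged:       S / v = S / Vmax + K_M / Vmax
    so a straight-line fit of S/v against S has slope 1/Vmax
    and intercept K_M/Vmax.
    """
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(substrate)
    mean_x = sum(substrate) / n
    mean_y = sum(y) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in substrate)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(substrate, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def apparent_kcat(vmax, enzyme_conc):
    """Apparent turnover number: k_cat = Vmax / [E]_total."""
    return vmax / enzyme_conc


def rate_enhancement(k_cat, k_uncat):
    """Ratio of catalyzed to uncatalyzed first-order rate constants."""
    return k_cat / k_uncat


# Synthetic, noise-free data (hypothetical values, NOT from the paper).
TRUE_KCAT = 1.0e-4   # s^-1
TRUE_KM = 5.0e-4     # M
ENZYME = 1.0e-5      # M, assumed total enzyme concentration
substrate = [50e-6, 100e-6, 270e-6, 500e-6, 1000e-6]  # M
velocity = [TRUE_KCAT * ENZYME * s / (TRUE_KM + s) for s in substrate]

vmax, km = fit_michaelis_menten(substrate, velocity)
kcat = apparent_kcat(vmax, ENZYME)
print(f"apparent k_cat = {kcat:.3e} s^-1, apparent K_M = {km:.3e} M")
```

With real progress curves, lag or burst phases of the kind noted above would have to be excluded before velocities are extracted, and a direct nonlinear fit is generally preferable to a linearization when the data are noisy.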
Design models for several of the most active designs with catalytic motif IV are shown in Fig. 4, A to C. Design RA60 (Fig. 4A) is on a jelly-roll scaffold, and RA45 (Fig. 4C) and RA46 (Fig. 4B) are on a TIM-barrel scaffold. The imine-forming lysine, the hydrogen-bonding residues coordinating the bridging water molecules, and the designed hydrophobic pocket (which binds the aromatic portion of the substrate) are clearly evident in all three designs.
To evaluate the accuracy of the design models, we solved the structures of two of the designs by x-ray crystallography (Fig. 4, D and E). The 2.2 Å resolution structure of the Ser210→Ala210 (S210A) variant of RA22 (Fig. 4D) (19) shows that the designed catalytic residues Lys159, His233, and Asp53 superimpose well on the original design model, and the remainder of the active site is nearly identical to the design. The 1.9 Å resolution structure of the M48K variant of RA61 likewise reveals an active site very close to that of the design model, with only His46 and Trp178 in alternative rotamer conformations, perhaps resulting from the absence of substrate in the crystal structure (Fig. 4E). Both crystal structures differ most significantly from the designs in the loops surrounding the active site; explicitly incorporating backbone flexibility in these regions during the design process could yield improved enzymes in the future.
Each proposed catalytic mechanism can be treated as an experimentally testable hypothesis to be tested by multiple independent design experiments. Our lack of success with the first active sites that were tested contrasts markedly with our relatively high success rate with the active site in which proton shuffling is carried out by a bound water molecule rather than by amino acid side chains acting as acid-base catalysts. The charged polar networks in highly optimized naturally occurring enzymes require exquisite control over functional group positioning and protonation states, as well as the satisfaction of the hydrogen-bonding potential of the buried polar residues, which leads to still more extended hydrogen-bond networks. Computational design of such extended polar networks is exceptionally challenging because of the difficulty of accurately computing the free energies of buried polar interactions, particularly the influence of polarizability on electrostatic free energies and the delicate balance between the cost of desolvation and the gain in favorable intraprotein electrostatic and hydrogen-bonding interactions. The sampling problem also becomes increasingly formidable for more complex sites: The side-chain identity and conformation combinatorics dealt with by hashing in RosettaMatch become intractable for sites consisting of five or more long polar side chains, which for accurate representation may require as many as 1000 rotamer conformations each. At the other extreme, bound water molecules offer considerable versatility, because they can readily reorient to switch between acting as hydrogen-bond acceptors and donors and involve neither delicate free-energy tradeoffs nor intricate interaction networks.
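The scale of the sampling problem, and the way hashing sidesteps explicit enumeration, can be made concrete with a toy sketch. It is not the RosettaMatch implementation: as a simplifying assumption, each catalytic side-chain rotamer is reduced to the single 3D point where it would place the composite TS, and points are binned into cubic voxels, whereas a real matcher would also have to compare orientations. The residue labels, coordinates, and bin size are hypothetical.

```python
from collections import defaultdict
from itertools import product


def brute_force_count(rotamers_per_residue, n_residues):
    """Combinations an exhaustive enumeration would have to visit."""
    return rotamers_per_residue ** n_residues


def voxel(point, bin_size=1.0):
    """Map a 3D point to an integer voxel key (the hash bin)."""
    return tuple(int(c // bin_size) for c in point)


def match_by_hashing(placements_per_residue, bin_size=1.0):
    """Find rotamer combinations that place the TS in the same voxel.

    placements_per_residue holds one list per catalytic residue; each
    list contains (label, ts_position) pairs. Every placement is hashed
    once, so the work grows with the SUM of the list lengths rather
    than with their PRODUCT.
    """
    table = defaultdict(lambda: defaultdict(list))
    for res_idx, placements in enumerate(placements_per_residue):
        for label, ts_pos in placements:
            table[voxel(ts_pos, bin_size)][res_idx].append(label)

    n = len(placements_per_residue)
    matches = []
    for by_residue in table.values():
        # A voxel is a hit only if every catalytic residue reaches it.
        if len(by_residue) == n:
            matches.extend(product(*(by_residue[i] for i in range(n))))
    return matches


# Hypothetical placements for a three-residue motif.
example = [
    [("K159/r1", (0.2, 0.3, 0.1)), ("K159/r2", (5.1, 5.2, 5.3))],
    [("H233/r1", (0.7, 0.9, 0.4)), ("H233/r2", (9.0, 9.0, 9.0))],
    [("D53/r1", (0.5, 0.5, 0.5))],
]
print(brute_force_count(1000, 5))   # five residues, 1000 rotamers each
print(match_by_hashing(example))
```

One limitation of this simplified binning is that two placements lying close together but on opposite sides of a voxel boundary are not matched; handling such cases would need overlapping bins or a neighbor search, which the sketch omits.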
It is tempting to speculate that our computationally designed enzymes resemble primordial enzymes more than they resemble highly refined modern-day enzymes. The ability to design simultaneously only three to four catalytic residues parallels the infinitesimal probability that, early in evolution, more than three to four residues would have happened to be positioned appropriately for catalysis; some of the functions played by exquisitely positioned side chains in modern enzymes may have been played by water molecules earlier in evolutionary history.
Although our results demonstrate that novel enzyme activities can be designed from scratch and indicate the catalytic strategies that are most accessible to nascent enzymes, there is still a significant gap between the activities of our designed catalysts and those of naturally occurring enzymes. Narrowing this gap presents an exciting prospect for future work: What additional features have to be incorporated into the design process to achieve catalytic activities approaching those of naturally occurring enzymes? The close agreement between the two crystal structures and the design models gives credence to our strategy of testing hypotheses about catalytic mechanisms by generating and testing the corresponding designs; indeed, almost any idea about catalysis can be readily tested by incorporation into the computational design procedure. Determining what is missing from the current generation of designs and how it can be incorporated into a next generation of more active designed
catalysts will be an exciting challenge that should unite the fields of enzymology and computational protein design in the years to come.
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Acknowledgments
Kinetic parameters of the designs reported here were determined at the University of Washington. For selected designs, the kinetic parameters were confirmed by independent experiments performed at the Scripps Research Institute. We thank R. Fuller for technical assistance. Thorough testing of the four catalytic motifs was made possible through gene synthesis by Codon Devices. We thank Rosetta@Home participants for their valuable contributions of computer time. E.A.A. is funded by a Ruth L. Kirschstein National Research Service Award. This work was supported by the Defense Advanced Research Projects Agency and HHMI. Coordinates and structure factors for the crystal structures of RA22 variant S210A and RA61 variant M48K were deposited with the Research Collaboratory for Structural Bioinformatics Protein Data Bank (PDB) under the accession numbers 3B5V and 3B5L, respectively. The xyz coordinates of the designs RA22, RA34, RA45, RA46, RA60, and RA61 are included with the SOM as a zipped archive.
References and Notes
1. Ro DK, et al. Nature. 2006; 440:940. [PubMed: 16612385]
2. Kirk O, Borchert TV, Fuglsang CC. Curr. Opin. Biotechnol. 2002; 13:345. [PubMed: 12323357]
3. Janssen DB, Dinkla IJ, Poelarends GJ, Terpstra P. Environ. Microbiol. 2005; 7:1868. [PubMed: 16309386]
4. Hilvert D. Annu. Rev. Biochem. 2000; 69:751. [PubMed: 10966475]
5. Seelig B, Szostak JW. Nature. 2007; 448:828. [PubMed: 17700701]
6. Arnold FH, Volkov AA. Curr. Opin. Chem. Biol. 1999; 3:54. [PubMed: 10021399]
7. Khersonsky O, Roodveldt C, Tawfik DS. Curr. Opin. Chem. Biol. 2006; 10:498. [PubMed: 16939713]
8. Kuhlman B, et al. Science. 2003; 302:1364. [PubMed: 14631033]
9. Looger LL, Dwyer MA, Smith JJ, Hellinga HW. Nature. 2003; 423:185. [PubMed: 12736688]
10. Bolon DN, Mayo SL. Proc. Natl. Acad. Sci. U.S.A. 2001; 98:14274. [PubMed: 11724958]
11. Kaplan J, DeGrado WF. Proc. Natl. Acad. Sci. U.S.A. 2004; 101:11566. [PubMed: 15292507]
12. Tanaka F, Fuller R, Shim H, Lerner RA, Barbas CF III. J. Mol. Biol. 2004; 335:1007. [PubMed: 14698295]
13. Heine A, et al. Science. 2001; 294:369. [PubMed: 11598300]
14. Fullerton SW, et al. Bioorg. Med. Chem. 2006; 14:3002. [PubMed: 16403639]
15. Zanghellini A, et al. Protein Sci. 2006; 15:2785. [PubMed: 17132862]
16. Materials and methods are available as supporting material on Science Online.
17. Wagner J, Lerner RA, Barbas CF III. Science. 1995; 270:1797. [PubMed: 8525368]
18. Tanaka F, Barbas CF III. J. Am. Chem. Soc. 2002; 124:3510. [PubMed: 11929232]
19. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.
20. Dantas G, Kuhlman B, Callender D, Wong M, Baker D. J. Mol. Biol. 2003; 332:449. [PubMed: 12948494]
21. Meiler J, Baker D. Proteins. 2006; 65:538. [PubMed: 16972285]
22. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in FORTRAN: The Art of Scientific Computing. 2nd ed. Cambridge: Cambridge Univ. Press; 1992.
23. Clemente FR, Houk KN. J. Am. Chem. Soc. 2005; 127:11294. [PubMed: 16089458]
24. Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004; 32:D129. [PubMed: 14681376]
25. Zhong G, et al. Angew. Chem. Int. Ed. Engl. 1998; 37:2481.
Fig. 1.
Computational enzyme design protocol for a multistep reaction. The first step is to generate ensembles of models of each of the key intermediates and transition states in the reaction pathway in the context of a specific catalytic motif composed of protein functional groups. The models are then superimposed, based on the protein functional group positions, to create an initial composite active-site description. Large ensembles of distinct 3D realizations of the composite active sites are then generated by simultaneously varying the degrees of freedom of the composite TS, the orientation of the catalytic side chains relative to the composite TS, and the internal conformation of the catalytic side chains. For each composite active-site description, candidate catalytic sites are generated in an input scaffold set by