© The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals
Structural bioinformatics
meta-PPISP: a Meta Web Server for Protein-Protein Interaction
Site Prediction
Sanbo Qin 1,2 and Huan-Xiang Zhou 1-3*
1Institute of Molecular Biophysics, 2School of Computational Science, and 3Department of Physics, Florida State University, Tallahassee, Florida 32306, USA.
Associate Editor: Prof. Anna Tramontano
ABSTRACT
Summary: A number of complementary methods have been developed for predicting protein-protein interaction sites. We sought to increase prediction robustness and accuracy by combining results from different predictors, and report here a meta web server, meta-PPISP, that is built on three individual web servers: cons-PPISP (pipe.scs.fsu.edu/ppisp.html), Promate (bioportal.weizmann.ac.il/promate), and PINUP (sparks.informatics.iupui.edu/PINUP/). A linear regression method, using the raw scores of the three servers as input, was trained on a set of 35 nonhomologous proteins. Cross validation showed that meta-PPISP outperforms all three individual servers. At coverages identical to those of the individual methods, the accuracy of meta-PPISP is higher by 4.8 to 18.2 percentage points. Similar improvements in accuracy are also seen on CAPRI and other targets.
Availability: meta-PPISP can be accessed at pipe.scs.fsu.edu/meta-ppisp.html.
Contact: zhou@sb.fsu.edu; phone: (850) 645-1336; fax: (850) 644-7244.
Supplementary information: Data sets, linear regression coefficients, and details of prediction results are shown at the site of the meta-PPISP server.
1 INTRODUCTION
It is increasingly recognized that proteins function in the context of multi-component complexes. Interfaces formed in protein complexes carry important structural and functional information. Since the publication of the first automated method for predicting residues in protein-protein interfaces in 2001 (Zhou and Shan, 2001), there have been intensive efforts to develop such methods (e.g., Fariselli et al., 2002; Ofran and Rost, 2003; Neuvirth et al., 2004; Chen and Zhou, 2005a; Liang et al., 2006; for a review, see Zhou and Qin, 2007). We are now presented with the opportunity to combine different approaches to increase prediction robustness and accuracy. Such metamethods have been found to be very effective in structure prediction. We have found enhanced accuracy in predicting solvent accessibility by combining several methods (Chen and Zhou, 2005b).
Here we report a metamethod, meta-PPISP, for predicting protein-protein interaction sites. meta-PPISP is built on cons-PPISP (Chen and Zhou, 2005a), Promate (Neuvirth et al., 2004), and PINUP (Liang et al., 2006). The methods were chosen for two reasons. First, they are accessible through web servers (at pipe.scs.fsu.edu/ppisp.html, bioportal.weizmann.ac.il/promate, and sparks.informatics.iupui.edu/PINUP/, respectively). Second, they are based on very different approaches, all using sequence conservation but along with different other attributes as input, and hence may present synergy. cons-PPISP is a neural network predictor that uses sequence profiles and solvent accessibilities of spatially neighboring residues as input. Promate uses a composite probability calculated from properties such as secondary structure, atom distribution, amino-acid pairing, and sequence conservation. PINUP is based on an empirical energy function consisting of a side-chain energy term, a term proportional to solvent accessible area, and a term accounting for sequence conservation. In meta-PPISP, the three methods are combined in a linear regression analysis with the raw scores as input. The metamethod is found to consistently outperform the three individual methods.
*To whom correspondence should be addressed.
2 METHODS
For each protein, interface predictions were first obtained from the three individual servers. The Promate and PINUP results have raw scores for each residue, ranging from 0 to 100; higher scores correspond to higher chances of being predicted as an interface residue. cons-PPISP gives consensus results from clustering predictions of a set of neural network models. In the original paper (Chen and Zhou, 2005a), 68 models were used.
Here we used a reduced set of 17 models, involving training on small heterodimers. No scores were given in the original cons-PPISP method. Here we assigned scores based on the values of the interface-state output node. The consensus results of cons-PPISP were also taken into consideration: if a residue was predicted to be an interface residue by the consensus, the highest output value among the 17 models was taken as the score; otherwise the lowest among the 17 models was used. As with cons-PPISP, only surface residues (defined as those with at least 10% solvent accessibility) were considered for interface prediction. cons-PPISP scores ranged from 0 to 1; to be consistent, the original Promate and PINUP scores were scaled down by a factor of 100. Promate and PINUP also have their own clustering procedures for final predictions. These final predictions are not used for building meta-PPISP. However, the final predictions of the three individual methods are used below for benchmarking the performance of meta-PPISP.
For each surface residue, the predictor scores (s_ij) for itself and its 8 nearest spatial neighbors were used to define a linear function:
$$ S = \sum_{i=1}^{3} \sum_{j=0}^{8} c_{ij} s_{ij} + c_0 $$
where i refers to the three predictors, j = 0 is for the residue itself, and increasing j refers to successively farther neighbors. The coefficients c_0 and c_ij were optimized on a set of 35 proteins (Enz35), consisting of enzymes and inhibitors in Docking Benchmark 2.0 (Mintseris et al., 2005) filtered at 35% sequence identity. The optimization aimed to generate an S value close to 1 if the residue is known to be an interface residue and close to 0 otherwise. Only unbound structures were used for interface prediction. Optimization and assessment were made against the actual interface residues, defined as those with < 5 Å contacts across the interface in the bound complex. Performance reported on Enz35 was based on a 7-fold cross validation. For other predictions, such as those on CAPRI targets, all 35 proteins were used for optimization. The optimized coefficients are listed at pipe.scs.fsu.edu/meta-ppisp-SI.pdf.
Bioinformatics Advance Access published September 25, 2007
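The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's actual code: the function names, the use of one representative coordinate per residue to find spatial neighbors, and the ordinary least-squares fit via NumPy are all hypothetical choices. The text states only that a linear regression was used, that scores come from the residue and its 8 nearest neighbors, and that the cons-PPISP score is the highest or lowest model output depending on the consensus.

```python
import numpy as np


def cons_ppisp_score(model_outputs, in_consensus):
    """Score rule from the text: highest model output if the consensus calls
    the residue an interface residue, otherwise the lowest."""
    return max(model_outputs) if in_consensus else min(model_outputs)


def neighbor_features(coords, scores, n_neighbors=8):
    """Build the s_ij inputs for every surface residue.

    coords: (N, 3) array, one representative point per residue (assumption).
    scores: (N, P) array, one column per predictor, all on a 0-1 scale.
    Requires N > n_neighbors. Returns an (N, P * (n_neighbors + 1)) matrix
    whose column i * (n_neighbors + 1) + j holds s_ij.
    """
    coords = np.asarray(coords, dtype=float)
    scores = np.asarray(scores, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, -1.0)  # guarantees each residue sorts first (j = 0)
    order = np.argsort(dist, axis=1)[:, : n_neighbors + 1]
    # scores[order] has shape (N, n_neighbors + 1, P); move predictor index first.
    return scores[order].transpose(0, 2, 1).reshape(len(coords), -1)


def fit_coefficients(features, is_interface):
    """Least-squares fit of S = c_0 + sum_ij c_ij * s_ij to 0/1 targets."""
    design = np.hstack([np.ones((features.shape[0], 1)), features])
    target = np.asarray(is_interface, dtype=float)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1:]


def meta_score(features, c0, c):
    """Evaluate the linear function S for each residue."""
    return c0 + features @ c
```

With three predictors and eight neighbors, each residue contributes 27 inputs, so the fit has 28 coefficients including c_0. Promate and PINUP scores would need to be divided by 100 before being placed in `scores`, as described above.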
Performance was assessed by simultaneously considering two parameters: coverage and accuracy. Coverage is the fraction of the actual interface residues that are predicted as such. Accuracy is the fraction of correct interface predictions among all such predictions.
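As a small illustration of these two definitions, the sketch below computes both quantities from sets of residue identifiers. The function name and the choice to return 0.0 when a set is empty are assumptions made for the example.

```python
def coverage_and_accuracy(predicted, actual):
    """Return (coverage, accuracy) for sets of residue identifiers.

    coverage = true positives / number of actual interface residues
    accuracy = true positives / number of predicted interface residues
    """
    predicted = set(predicted)
    actual = set(actual)
    true_positives = len(predicted & actual)
    coverage = true_positives / len(actual) if actual else 0.0
    accuracy = true_positives / len(predicted) if predicted else 0.0
    return coverage, accuracy
```

For example, if 4 residues are predicted, 5 are actual interface residues, and 2 are shared, the coverage is 2/5 and the accuracy is 2/4.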
3 RESULTS
Figure 1 shows the performance of meta-PPISP and the three individual predictors on Enz35 and 25 CAPRI targets. The curve of coverage versus accuracy for each method was obtained by setting the threshold for positive predictions at different levels. Movement toward the upper right corner of the coverage versus accuracy plot signifies better performance. It can be seen that meta-PPISP outperforms all three individual methods.
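A coverage versus accuracy curve of this kind can be traced by sweeping the positive-prediction threshold over per-residue scores. The sketch below is a minimal version under assumptions: the function name is hypothetical, a residue is counted as predicted when its score is greater than or equal to the threshold, and thresholds that yield no predictions are skipped because accuracy is undefined there.

```python
import numpy as np


def coverage_accuracy_curve(scores, is_interface, thresholds):
    """Return a list of (threshold, coverage, accuracy) tuples."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(is_interface, dtype=bool)
    n_actual = int(truth.sum())
    if n_actual == 0:
        raise ValueError("at least one actual interface residue is required")
    curve = []
    for t in thresholds:
        predicted = scores >= t
        n_predicted = int(predicted.sum())
        if n_predicted == 0:
            continue  # accuracy is undefined with no positive predictions
        true_positives = int((predicted & truth).sum())
        curve.append((float(t), true_positives / n_actual,
                      true_positives / n_predicted))
    return curve
```

Lowering the threshold generally raises coverage at the cost of accuracy, which is why each method appears as a curve rather than a single point.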
The final predictions of cons-PPISP, Promate, and PINUP, based on clustering, have coverages of 37.4%, 17.0%, and 45.2%, respectively, on Enz35; the corresponding accuracies are 40.9%, 59.2%, and 50.0%. At the same coverages as the three individual methods, the accuracy of meta-PPISP is 59.1%, 69.5%, and 54.8%, respectively. The accuracy levels are higher than those of the individual methods by 18.2, 10.2, and 4.8 percentage points. On the CAPRI targets, the final predictions of the three individual methods have coverages of 39.0%, 16.7%, and 34.7%, respectively; the corresponding accuracies are 38.2%, 47.0%, and 41.2%. The uniformly worse predictions on the CAPRI targets are partly due to the fact that about one third of the targets are antibody-antigen and other immune system complexes. Nevertheless, meta-PPISP is able to improve the predictions of the individual methods, increasing their accuracies by 3.4, 4.0, and 7.5 percentage points, respectively. On a third set of targets, consisting of 32 enzymes and inhibitors more recently deposited in the Protein Data Bank, accuracy improvements by meta-PPISP fall within the range spanned by the Enz35 and CAPRI results.
As final predictions of meta-PPISP, we recommend a threshold of S_th = 0.34 for positive prediction. With this threshold, the numbers of predicted and actual interface residues are roughly equal, and consequently prediction coverage and accuracy are roughly equal. For example, with this threshold, the coverage and accuracy of meta-PPISP for Enz35 are 50.5% and 49.5%, respectively. The numbers of actual and predicted interface residues and true positives for the three sets of targets are listed at pipe.scs.fsu.edu/meta-ppisp-SI.pdf. An illustrative comparison between the three individual methods and meta-PPISP on one protein can also be found at pipe.scs.fsu.edu/meta-ppisp-SI.pdf.
meta-PPISP is accessible at pipe.scs.fsu.edu/meta-ppisp.html. Interface predictions are returned to the user by e-mail. For each surface residue, the raw scores of the three individual methods and meta-PPISP are given. In addition, the user is provided with a link to a PDB file in which the meta-PPISP scores are stored as B-factors. The user may raise or lower the positive-prediction threshold for different targets or for different purposes.
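To illustrate how per-residue scores might be thresholded and stored as B-factors, the sketch below rewrites the temperature-factor field (columns 61-66 of ATOM and HETATM records in the fixed-column PDB format). It is not the server's code: the function names, the (chain, residue number) dictionary keys, the default value for residues without a score, and the use of "greater than or equal to" for the threshold are assumptions. Insertion codes are ignored for brevity.

```python
def predicted_interface(scores, threshold=0.34):
    """Return the residue keys whose score meets the threshold."""
    return {key for key, value in scores.items() if value >= threshold}


def write_scores_as_bfactors(pdb_lines, scores, default=0.0):
    """Return PDB lines with the B-factor field replaced by per-residue scores.

    pdb_lines: iterable of lines without trailing newlines.
    scores: dict mapping (chain_id, residue_number) to a score.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain_id = line[21]
            residue_number = int(line[22:26])
            value = scores.get((chain_id, residue_number), default)
            padded = line.ljust(66)  # make sure the B-factor columns exist
            line = f"{padded[:60]}{value:6.2f}{padded[66:]}"
        out.append(line)
    return out
```

Storing scores in the B-factor column lets standard molecular viewers color the surface by predicted interface propensity, and a user who wants higher accuracy at lower coverage can simply pass a larger `threshold`.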
Fig. 1. Performance of meta-PPISP on Enz35 (left) and on CAPRI targets (right). Curves are drawn using raw scores; solid circles with matching colors show the final predictions, based on clustering, of the three individual methods. Promate distinguishes itself among the three individual methods in achieving a significant boost in accuracy by clustering.
In conclusion, we have shown that a metamethod built on three web servers achieves significant improvements in accuracy. Interface predictions have a broad range of applications, including assisting in protein docking (van Dijk et al., 2005; Tjong et al., 2007). The meta-PPISP web server is thus expected to have wide usage and to motivate the development of other metamethods.
ACKNOWLEDGEMENTS
This work was supported in part by NIH Grant GM058187.
REFERENCES
Chen, H. and Zhou, H.-X. (2005a) Prediction of interface residues in protein-protein complexes by a consensus neural network method: Test against NMR data, Proteins, 61, 21-35.
Chen, H. and Zhou, H.-X. (2005b) Prediction of solvent accessibility and sites of deleterious mutations from protein sequence, Nucl. Acids Res., 33, 3193-3199.
Fariselli, P., Pazos, F., Valencia, A. and Casadio, R. (2002) Prediction of protein-protein interaction sites in heterocomplexes with neural networks, Eur. J. Biochemistry, 269, 1356-1361.
Liang, S., Zhang, C., Liu, S. and Zhou, Y. (2006) Protein binding site prediction using an empirical scoring function, Nucl. Acids Res., 34, 3698-3707.
Mintseris, J., Wiehe, K., Pierce, B., Anderson, R., Chen, R., Janin, J. and Weng, Z. (2005) Protein-protein docking benchmark 2.0: an update, Proteins, 60, 214-216.
Neuvirth, H., Raz, R. and Schreiber, G. (2004) ProMate: A structure based prediction program to identify the location of protein-protein binding sites, J. Mol. Biol., 338, 181-199.
Ofran, Y. and Rost, B. (2003) Predicted protein-protein interaction sites from local sequence information, FEBS Lett., 544, 236-239.
Tjong, H., Qin, S.B. and Zhou, H.-X. (2007) PI2PE: protein interface/interior prediction engine, Nucl. Acids Res., 35, W357-W362.
van Dijk, A.D.J., de Vries, S.J., Dominguez, C., Chen, H., Zhou, H.-X. and Bonvin, A.M.J.J. (2005) Data-driven docking: HADDOCK's adventures in CAPRI, Proteins, 60, 232-238.
Zhou, H.-X. and Qin, S.B. (2007) Interaction-site prediction for protein complexes: a critical assessment, Bioinformatics, in press.
Zhou, H.-X. and Shan, Y. (2001) Prediction of protein interaction sites from sequence profile and residue neighbor list, Proteins, 44, 336-343.