PERT: an integrated informatics platform for thermostable protein design
Keywords: support vector machine (SVM)
National Chung Hsing University
References:
何承偉 (2012) "KStable: predicting changes in protein thermostability after single-point mutation using Kstar with mRMR feature selection." Master's thesis, Graduate Institute of Genomics and Bioinformatics, National Chung Hsing University (in Chinese).
Altschul, S.F., et al. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Research 25:3389-3402.
An, J., et al. (1998) "3DinSight: an integrated relational database and search tool for the structure, function and properties of biomolecules." Bioinformatics 14(2):188-195.
Argos, P., et al. (1979) "Thermal stability and protein structure." Biochemistry 18(25):5698-5703.
Bairoch, A. and Apweiler, R. (1997) "The SWISS-PROT protein sequence data bank and its supplement TrEMBL." Nucleic Acids Research 25:31-36.
Berman, H.M., et al. (2000) "The Protein Data Bank." Nucleic Acids Research 28(1):235-242.
Brown, M.P., et al. (2000) "Knowledge-based analysis of microarray gene expression data by using support vector machines." Proceedings of the National Academy of Sciences 97(1):262-267.
Burbidge, R., Trotter, M., Buxton, B. and Holden, S. (2001) "Drug design by machine learning: support vector machines for pharmaceutical data analysis." Computers & Chemistry 26(1):5-14.
Capriotti, E., Fariselli, P. and Casadio, R. (2005) "I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure." Nucleic Acids Research 33:306-310.
Chang, C.C. and Lin, C.J. (2011) "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology 2(3):27:1-27:27.
Chen, J., et al. (2001) "Packing is a key selection factor in the evolution of protein hydrophobic cores." Biochemistry 40:15280-15289.
Cheng, J., Randall, A. and Baldi, P. (2006) "Prediction of protein stability changes for single-site mutations using support vector machines." Proteins 62(4):1125-1132.
Cheng, J., Randall, A.Z., Sweredoski, M.J. and Baldi, P. (2005) "SCRATCH: a protein structure and structural feature prediction server." Nucleic Acids Research 33:72-76.
Cheng, J., Sweredoski, M.J. and Baldi, P. (2005) "Accurate prediction of protein disordered regions by mining protein structure data." Data Mining and Knowledge Discovery 11(3):213-222.
Cheng, J., Sweredoski, M.J. and Baldi, P. (2006) "DOMpro: protein domain prediction using profiles, secondary structure, relative solvent accessibility, and recursive neural networks." Data Mining and Knowledge Discovery 13(1):1-20.
Cortes, C. and Vapnik, V. (1995) "Support-vector networks." Machine Learning 20:273-297.
Dehouck, Y., et al. (2011) "Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0." BMC Bioinformatics 12:151-163.
Ding, Y., et al. (2004) "The influence of dipeptide composition on protein thermostability." FEBS Letters 569(1-3):284-288.
Drucker, H., Wu, D. and Vapnik, V. (1999) "Support vector machines for spam categorization." IEEE Transactions on Neural Networks 10(5):1048-1054.
Du, P. and Li, Y. (2006) "Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence." BMC Bioinformatics 7:518-526.
Facchiano, A.M., Colonna, G. and Ragone, R. (1998) "Helix stabilizing factors and stabilization of thermophilic proteins: an X-ray based study." Protein Engineering 11(9):753-760.
Gao, L., et al. (2011) "On-the-spot lung cancer differential diagnosis by label-free, molecular vibrational imaging and knowledge-based classification." Journal of Biomedical Optics 16(9):096004.
George, D.G., et al. (1997) "The Protein Information Resource (PIR) and the PIR-International Protein Sequence Database." Nucleic Acids Research 25:24-27.
Gilis, D. and Rooman, M. (2000) "PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins." Protein Engineering 13(12):849-856.
Grimsley, G.R., et al. (1999) "Increasing protein stability by altering long-range coulombic interactions." Protein Science 8(9):1843-1849.
Gromiha, M.M., et al. (1999) "ProTherm: thermodynamic database for proteins and mutants." Nucleic Acids Research 27(1):286-288.
Gromiha, M.M. (2001) "Important inter-residue contacts for enhancing the thermal stability of thermophilic proteins." Biophysical Chemistry 91(1):71-77.
Gromiha, M.M. and Xavier-Suresh, M. (2008) "Discrimination of mesophilic and thermophilic proteins using machine learning algorithms." Proteins 70(4):1274-1279.
Gruber, C., Gruber, T., Krinninger, S. and Sick, B. (2009) "Online signature verification with support vector machines based on LCSS kernel functions." IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 40(4):1088-1100.
Hall, M., et al. (2009) "The WEKA data mining software: an update." SIGKDD Explorations 11(1):10-18.
He, Z.S., et al. (2012) "A novel sequence-based method for phosphorylation site prediction with feature selection and analysis." Protein & Peptide Letters 19(1):70-78.
Huang, L.T., et al. (2007) "Prediction of protein mutant stability using classification and regression tool." Biophysical Chemistry 125(2-3):462-470.
Imanaka, T., Shibazaki, M. and Takagi, M. (1986) "A new way of enhancing the thermostability of proteases." Nature 324(6098):695-697.
Kawashima, S. and Kanehisa, M. (2000) "AAindex: amino acid index database." Nucleic Acids Research 28(1):374.
Kawashima, S., Ogata, H. and Kanehisa, M. (1999) "AAindex: amino acid index database." Nucleic Acids Research 27:368-369.
Kumar, S., Tsai, C.J. and Nussinov, R. (2000) "Factors enhancing protein thermostability." Protein Engineering 13(3):179-191.
Kyte, J. and Doolittle, R.F. (1982) "A simple method for displaying the hydropathic character of a protein." Journal of Molecular Biology 157:105-132.
Lehmann, M., et al. (2000) "From DNA sequence to improved functionality: using protein sequence comparisons to rapidly design a thermostable consensus phytase." Protein Engineering 13:49-57.
Li, Y.H. and Jain, A.K. (1998) "Classification of text documents." The Computer Journal 41(8):537-546.
Li, Y., Middaugh, C.R. and Fang, J. (2010) "A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants." BMC Bioinformatics 11:62-73.
Liu, B., et al. (2007) "Predicting the protein SUMO modification sites based on Properties Sequential Forward Selection (PSFS)." Biochemical and Biophysical Research Communications 358(1):136-139.
Liu, L., Dong, H., Wang, S., Chen, H. and Shao, W. (2006) "Computational analysis of di-peptides correlated with the optimal temperature in G/11 xylanase." Process Biochemistry 41(2):305-311.
Magnan, C.N., Randall, A. and Baldi, P. (2009) "SOLpro: accurate sequence-based prediction of protein solubility." Bioinformatics 25(17):2200-2207.
Masso, M. and Vaisman, I.I. (2010) "AUTO-MUTE: web-based tools for predicting stability changes in proteins due to single amino acid replacements." Protein Engineering, Design and Selection 23:683-687.
Matsumura, M., Signor, G. and Matthews, B.W. (1989) "Substantial increase of protein stability by multiple disulfide bonds." Nature 342:291-293.
McGuffin, L.J., Bryson, K. and Jones, D.T. (2000) "The PSIPRED protein structure prediction server." Bioinformatics 16(4):404-405.
Menendez-Arias, L. and Argos, P. (1989) "Engineering protein thermal stability. Sequence statistics point to residue substitutions in alpha-helices." Journal of Molecular Biology 206(2):397-406.
Miyazaki, J., et al. (2001) "Ancestral residues stabilizing 3-isopropylmalate dehydrogenase of an extreme thermophile: experimental evidence supporting the thermophilic common ancestor hypothesis." Journal of Biochemistry 129:777-782.
Nelson, D.L. and Cox, M.M. (2008) Lehninger Principles of Biochemistry, 5th edition. W.H. Freeman.
Parthiban, V., Gromiha, M.M. and Schomburg, D. (2006) "CUPSAT: prediction of protein stability upon point mutations." Nucleic Acids Research 34:239-242.
Pfeil, W. (2001) Protein Stability and Folding, Supplement 1: A Collection of Thermodynamic Data. New York: Springer.
Pickett, S.D. and Sternberg, M.J.E. (1993) "Empirical scale of side-chain conformational entropy in protein folding." Journal of Molecular Biology 231(3):825-839.
Ponnuswamy, P.K. and Gromiha, M.M. (1994) "On the conformational stability of folded proteins." Journal of Theoretical Biology 166(1):63-74.
Saraboji, K., Gromiha, M.M. and Ponnuswamy, M.N. (2005) "Importance of main-chain hydrophobic free energy to the stability of thermophilic proteins." International Journal of Biological Macromolecules 35(3-4):211-220.
Sebastiani, F. (1999) "A tutorial on automated text categorization." Proceedings of the 1st Argentinian Symposium on Artificial Intelligence 7-35.
Somol, P. and Pudil, P. (2002) "Feature Selection Toolbox." Pattern Recognition 35(12):2749-2759.
Szilagyi, A. and Zavodszky, P. (2000) "Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey." Structure 8(5):493-504.
Tay, F.E.H. and Cao, L. (2001) "Application of support vector machines in financial time series forecasting." Omega: The International Journal of Management Science 29:309-317.
Teng, S., Srivastava, A.K. and Wang, L. (2010) "Sequence feature-based prediction of protein stability changes upon amino acid substitutions." BMC Genomics 11:5-13.
Turunen, O., Vuorio, M., Fred, F. and Leisola, M. (2002) "Engineering of multiple arginines into the Ser/Thr surface of Trichoderma reesei endo-1,4-β-xylanase II increases the thermotolerance and shifts the pH optimum towards alkaline pH." Protein Engineering 15:141-145.
Vogt, G., Woell, S. and Argos, P. (1997) "Protein thermal stability, hydrogen bonds, and ion pairs." Journal of Molecular Biology 269(4):631-643.
Worth, C.L., Preissner, R. and Blundell, T.L. (2011) "SDM--a server for predicting effects of mutations on protein stability and malfunction." Nucleic Acids Research 39:215-222.
Xiao, X.H. and Dai, R.W. (1997) "A metasynthetic approach for handwritten Chinese character recognition." Acta Automatica Sinica 23(5):621-627.
Yin, S., Ding, F. and Dokholyan, N.V. (2007) "Eris: an automated estimator of protein stability." Nature Methods 4(6):466-467.
Zhou, H. and Zhou, Y. (2002) "Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction." Protein Science 11(11):2714-2726.
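Abstract: Designing proteins with high thermostability is one of the central problems in protein engineering. Selecting suitable mutation sites usually requires structural information, but in most cases the structure is still unknown. We therefore propose PERT (Protein Engineering Research Tool), a sequence-based integrated web platform. To help researchers understand the structural properties of a target protein, PERT integrates commonly used bioinformatics analyses: secondary structure prediction, sequence conservation, solvent accessibility prediction, protein surface entropy calculation, disordered region prediction, domain and domain linker prediction, net charge calculation, and hydrophobicity index calculation. In addition, based on published mutation strategies for increasing protein stability, PERT suggests suitable mutation sites and estimates the resulting change in stability. To predict the stability of mutants more accurately, we also developed PERTsvm, a classifier based on the support vector machine (SVM). Its features are drawn from sequence information, predicted structural information, and amino acid properties in the AAindex database. To improve SVM classification performance, the sequential forward selection (SFS) method in the Feature Selection Toolbox (FST) was used to choose the features that contribute to classification. Current results show that our predictions are more accurate than those of the existing tools I-Mutant and MUpro. This work should help researchers understand the structural properties of target proteins and reduce the number and cost of experiments.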
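PERT's hydrophobicity index and net charge calculations need only the amino acid sequence. The sketch below is a minimal illustration under stated assumptions, not PERT's implementation: it assumes the Kyte-Doolittle hydropathy scale (cited in the reference list) and one approximate set of pKa values for a Henderson-Hasselbalch charge estimate. Published pKa sets differ, so absolute charges shift somewhat with the set chosen. The names `gravy`, `hydropathy_profile`, and `net_charge` are hypothetical.

```python
"""Minimal sketch of two sequence-only descriptors (hydropathy, net charge).

Not PERT's code. The Kyte-Doolittle scale is standard; the pKa set below is
one approximate choice among several published sets (an assumption).
"""
from __future__ import annotations

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate pKa values (assumed) for ionizable side chains and termini.
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def gravy(seq: str) -> float:
    """Grand average of hydropathy (GRAVY): mean Kyte-Doolittle value.

    Raises KeyError for non-standard residue letters.
    """
    return sum(KD_SCALE[aa] for aa in seq.upper()) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full sliding window (len(seq) - window + 1 values)."""
    values = [KD_SCALE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Approximate net charge at `ph` via the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    # Basic groups: fraction protonated (charge +1) is 1 / (1 + 10**(pH - pKa)).
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    positive += sum(seq.count(aa) / (1.0 + 10 ** (ph - pka)) for aa, pka in PKA_BASIC.items())
    # Acidic groups: fraction deprotonated (charge -1) is 1 / (1 + 10**(pKa - pH)).
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    negative += sum(seq.count(aa) / (1.0 + 10 ** (pka - ph)) for aa, pka in PKA_ACIDIC.items())
    return positive - negative


if __name__ == "__main__":
    demo = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    print(round(gravy(demo), 3), round(net_charge(demo), 2))
```

The sliding-window profile gives the kind of per-position curve often used to spot hydrophobic stretches; the default window of 9 residues is an arbitrary choice here.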
Designing highly thermostable proteins is a central problem in protein engineering. In many cases, however, it requires detailed structural information about the protein being modified, which is not always available. In this work we present PERT (Protein Engineering Research Tool), a service platform that gives researchers comprehensive structural information by gathering sequence-based prediction results, including secondary structure, sequence conservation, solvent accessibility, disordered regions, domains, and domain linkers. Based on current strategies for increasing protein stability, PERT suggests candidate mutation sites and predicts the stability change after mutation. To improve stability prediction, we also developed a new classification model based on the support vector machine (SVM) technique. The feature set is derived from sequence information, predicted structural information, and amino acid properties from the AAindex database, and the features were selected by sequential forward selection (SFS) using the Feature Selection Toolbox (FST). Current results show that the new predictor, PERTsvm, is more accurate than I-Mutant and MUpro. This study should help researchers understand the structural properties of a target protein and reduce the cost of trial and error.
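Sequential forward selection is a greedy wrapper method: start with no features, repeatedly add the single feature whose inclusion gives the best classifier score, and stop when no addition helps. The sketch below shows that loop under stated assumptions. PERTsvm scored feature subsets with an SVM inside FST; to keep the example dependency-free, a leave-one-out nearest-centroid classifier stands in for the SVM here, and the six-sample dataset is synthetic. This variant stops at the first non-improving step, whereas some SFS implementations keep adding features up to a target size and return the best subset seen along the way. All function and variable names are hypothetical.

```python
"""Minimal sketch of sequential forward selection (SFS); not PERTsvm's code.

A leave-one-out nearest-centroid classifier stands in for the SVM so the
example runs with the standard library only. The toy data are synthetic.
"""
from __future__ import annotations

from typing import Callable, Sequence


def loo_nearest_centroid_accuracy(
    X: Sequence[Sequence[float]], y: Sequence[int], features: list[int]
) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without the held-out sample i.
        centroids = {}
        for label in set(y):
            members = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if members:
                centroids[label] = [
                    sum(row[f] for row in members) / len(members) for f in features
                ]

        def sq_dist(centroid: list[float]) -> float:
            return sum((X[i][f] - c) ** 2 for f, c in zip(features, centroid))

        predicted = min(centroids, key=lambda label: sq_dist(centroids[label]))
        correct += int(predicted == y[i])
    return correct / len(X)


def sequential_forward_selection(
    n_features: int,
    score: Callable[[list[int]], float],
    max_features: int | None = None,
) -> tuple[list[int], float]:
    """Greedily add the feature that most improves `score`; stop when none does."""
    selected: list[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every one-feature extension of the current subset.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no single addition improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Synthetic toy data: feature 0 separates the classes; features 1 and 2 are noise.
TOY_X = [
    [0.10, 5.0, 0.30], [0.20, 1.0, 0.10], [0.15, 9.0, 0.20],
    [0.90, 4.0, 0.25], [0.80, 2.0, 0.15], [0.95, 7.0, 0.30],
]
TOY_Y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    chosen, acc = sequential_forward_selection(
        n_features=3,
        score=lambda feats: loo_nearest_centroid_accuracy(TOY_X, TOY_Y, feats),
    )
    print(chosen, acc)  # [0] 1.0
```

Because `score` is just a callable on a list of feature indices, swapping in cross-validated SVM accuracy (for example, from LIBSVM, which appears in the reference list) would change only that argument, not the selection loop.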
Appears in Collections: Graduate Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)