Title: KStable: Predicting protein stability changes by K-star with regular-mRMR feature selection
Original title: KStable:使用Kstar搭配regular-mRMR特徵選擇法預測蛋白質單點突變後熱穩定性之改變
Author: Ho, Cheng-Wei (何承偉)
Keywords: protein thermostability; machine learning; mRMR feature selection; regular-mRMR
Publisher: Institute of Genomics and Bioinformatics
Abstract: Protein thermostability has a wide range of potential applications, including improving protein activity, studying the structural properties of protein interaction sites, and drug development. To date, most tools for predicting protein stability changes rely on 3D structural information, yet for many proteins only the primary sequence is known. This study proposes KStable, a sequence-based system for predicting changes in protein thermostability upon single point mutation. From 58 machine learning methods in seven categories, we selected the best-performing one, K-star, and combined it with regular-mRMR, a feature selection method first proposed in this work, training on a dataset drawn from the ProTherm database. Under 10-fold cross-validation, KStable reached a prediction accuracy of 0.83. Compared with existing web tools (AUTO-MUTE, I-Mutant, MUpro, PoPMuSiC and CUPSAT), KStable achieved a higher prediction accuracy and Matthews correlation coefficient. These results indicate that KStable, with its new feature selection method, reduces prediction time while matching or exceeding the performance of tools that use protein structure information.
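The evaluation protocol named in the abstract (10-fold cross-validation scored by accuracy and Matthews correlation coefficient) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes binary labels (for example 1 for stabilizing and 0 for destabilizing mutations, an encoding the abstract does not specify), pools predictions across folds before scoring, and takes the classifier as a hypothetical `train_and_predict` callback standing in for K-star with regular-mRMR, neither of which is reproduced here.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that match the true label."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cross_validate(X, y, train_and_predict, k=10, seed=0):
    """Run k-fold cross-validation and score the pooled held-out predictions.

    train_and_predict(X_train, y_train, X_test) is a hypothetical callback
    that must return one predicted label per row of X_test.
    """
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        preds = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in held_out],
        )
        y_true.extend(y[i] for i in held_out)
        y_pred.extend(preds)
    counts = confusion_counts(y_true, y_pred)
    return accuracy(*counts), mcc(*counts)
```

Reporting MCC alongside accuracy is useful when the two classes are unbalanced, since a classifier that always predicts the majority class can score a high accuracy while its MCC stays at zero.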
Other identifiers: U0005-3007201216025300
Appears in Collections: Institute of Genomics and Bioinformatics
