Title: Integrated off-the-shelf Predictor for Protein Stability Changes upon Single Mutation by Various Modules Using Machine Learning (透過機器學習與模組化建置整合蛋白質單點突變後穩定性預測工具)
Author: Meng-Han Lin (林孟函)
Keywords: Protein stability change; Mutation of a single amino acid; Machine learning; Integration strategy; Feature selection
A mutation of a single amino acid residue may alter protein structure, which in turn can affect protein function and even cause disease. In protein engineering, drug design, and industrial optimization, mutations are often introduced to increase protein stability, or to change a protein's properties while keeping it stable. Many tools exist for predicting protein stability changes upon mutation, but they are built on different algorithms and features and may give conflicting predictions, leaving users unsure which result to trust. This study therefore uses machine learning to integrate the results of 11 prediction tools, adding protein sequence properties as encoded features. The best model is selected through six combinations of feature selection methods, which improves accuracy and reduces the time complexity of model training. The system comprises three modules: a website module, a sequence module, and a stand-alone module. The stand-alone and sequence modules allow the system to maintain prediction performance when the integrated online tools are unavailable. The MCC (Matthews correlation coefficient) of the structure-based classification model rises from 0.547 to 0.708, and the PCC (Pearson correlation coefficient) of the regression model reaches 0.697. The sequence model is also more accurate than single-method prediction tools that take structural information as input, with an MCC higher by 0.105. The system thus not only integrates existing predictors but also improves on their accuracy. In the stand-alone test, the MCC of the classification model differs by only 0.019 and the PCC by only 0.04, so system performance remains stable even when the integrated online tools are not working.
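The two evaluation metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MCC (for binary classification labels) and PCC (for predicted versus measured stability changes); the function names `mcc` and `pcc` and the toy data are hypothetical and are not taken from the thesis.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 or 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def pcc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Convention: return 0.0 when either sequence is constant.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

# Toy data (hypothetical): 1 = stabilizing, 0 = destabilizing.
labels = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(round(mcc(labels, predicted), 3))
```

Both metrics range from -1 to 1, so the reported gain from 0.547 to 0.708 in MCC and the stand-alone differences of 0.019 (MCC) and 0.04 (PCC) are on the same scale.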
Rights: Authorized for electronic full-text browsing and printing; publicly available from 2022-01-31.
Appears in Collections: Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)

Files in This Item:
File: nchu-108-7105019014-1.pdf; Size: 2.12 MB; Format: Adobe PDF (available only on the university internal network)