Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/60851
Title: GasPhos: Protein phosphorylation sites prediction by GA-aided ant colony system and evaluation strategy of machine learning algorithms
Author: Liao, Chia-Feng (廖家逢)
Keywords: kinase
phosphorylation
ant colony system
genetic algorithms
feature selection
Publisher: Institute of Genomics and Bioinformatics
Abstract: Protein phosphorylation is an important post-translational modification. Many biological processes, such as DNA repair, transcriptional regulation, and signal transduction, depend on phosphorylation, so its dysregulation often leads to disease. Accurately predicting human phosphorylation sites could therefore help in addressing related diseases, and this work aims to improve the accuracy of human phosphorylation site prediction and to build a prediction tool specific to humans. This study developed GasPhos, a kinase-specific phosphorylation site prediction system, and proposed Gas, a feature selection method based on an ant colony system and a genetic algorithm (GA). A performance evaluation strategy is used to choose the best learning model for each kinase. Gas uses MDGI as the heuristic value in path selection, adopts a binary transition strategy, and introduces a new transition rule. GasPhos can predict phosphorylation sites for 20 kinases; this thesis, however, focuses on six kinases that are common and comparatively well represented. Under 5-fold cross-validation, the overall average Matthews correlation coefficient (MCC) of GasPhos is at least 10% higher than that of five other phosphorylation prediction systems. The system analysis covers different heuristic values, the role of the GA, three transition rules, different feature selection methods, and the biological properties of frequently selected features; it also examines the correlation between WebLogo and the features selected by Gas. To help users apply GasPhos more precisely, we analyzed the performance of each prediction system on proteins of different functions and examined two cases of phosphorylation associated with human disease. Finally, Gas can also be applied to other problems that require feature selection, where it may help improve prediction performance.
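The abstract reports its results as the Matthews correlation coefficient (MCC) averaged under 5-fold cross-validation. As a minimal sketch of how that metric is computed from confusion counts, the Python below may help; the function names and the per-fold counts are illustrative assumptions and are not taken from the thesis.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Return the MCC for one confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives, and false negatives. When any marginal total
    is zero the MCC is undefined; this sketch returns 0.0 then.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mean_mcc(fold_counts):
    """Average the MCC over cross-validation folds.

    fold_counts is a list of (tp, tn, fp, fn) tuples, one per fold.
    """
    return sum(matthews_corrcoef(*c) for c in fold_counts) / len(fold_counts)


if __name__ == "__main__":
    # Hypothetical counts for five folds (made up for illustration).
    folds = [(40, 45, 5, 10), (42, 44, 6, 8), (38, 46, 4, 12),
             (41, 43, 7, 9), (39, 45, 5, 11)]
    print(round(mean_mcc(folds), 3))
```

MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when positive and negative sites are unbalanced, which is a common reason for reporting it in site prediction work.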
URI: http://hdl.handle.net/11455/60851
Other identifiers: U0005-2808201314203900
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-2808201314203900
Appears in Collections: Institute of Genomics and Bioinformatics

Files in this item:

For the full text, please visit the Airiti Library (華藝線上圖書館).



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.