Title: GasPhos: Protein phosphorylation sites prediction by GA-aided ant colony system and evaluation strategy of machine learning algorithms
Author: Liao, Chia-Feng (廖家逢)
Keywords: kinase
ant colony system
genetic algorithms
feature selection
Institute of Genomics and Bioinformatics
Abstract: Protein phosphorylation is an important post-translational modification. Many biological processes, such as DNA repair, transcriptional regulation, and signal transduction, depend on phosphorylation, so its dysregulation often leads to disease. Accurate prediction of human phosphorylation sites could therefore help in addressing related diseases, and this study aims to improve prediction accuracy by building a prediction tool specific to human phosphorylation sites. We developed GasPhos, a kinase-specific phosphorylation site prediction system, and propose Gas, a feature selection method based on an ant colony system and a genetic algorithm (GA); a performance evaluation strategy is used to choose the best learning model for each kinase. Gas uses MDGI as the heuristic value in path selection, adopts a binary transition strategy, and introduces a new transition rule. GasPhos can predict phosphorylation sites for 20 kinases; this thesis focuses on six common, more abundantly represented kinases. Under 5-fold cross-validation, the overall average Matthews correlation coefficient (MCC) of GasPhos is at least 10% higher than that of five other phosphorylation prediction systems. In the system analysis, we examine different heuristic values, the role of the GA, three transition rules, different feature selection methods, and the biological properties of frequently selected features, and we also examine the correlation between WebLogo and the features selected by Gas. To help users apply GasPhos more precisely, we analyze the performance of each prediction system on proteins with different functions and explore phosphorylation related to two human diseases. Finally, Gas can also be applied to other problems that require feature selection, where it may help improve prediction performance.
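For readers unfamiliar with the metric used in the comparison above, the following is a minimal sketch of how MCC can be computed from confusion-matrix counts and averaged over cross-validation folds. The function names and the per-fold counts are hypothetical illustrations and are not taken from the thesis; averaging per-fold MCC values is one common convention and is only assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Return MCC from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    By convention, 0.0 is returned when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mean_mcc(fold_counts):
    """Average MCC over folds; each item is a (tp, tn, fp, fn) tuple."""
    return sum(matthews_corrcoef(*c) for c in fold_counts) / len(fold_counts)


# Hypothetical counts for five folds (illustration only, not thesis data).
folds = [
    (40, 45, 5, 10),
    (42, 44, 6, 8),
    (38, 46, 4, 12),
    (41, 43, 7, 9),
    (39, 47, 3, 11),
]
print(round(mean_mcc(folds), 3))
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it remains informative when positive and negative sites are imbalanced, which is typical for phosphorylation site data.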
Other identifier: U0005-2808201314203900
