Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/18172
Title: Disulfide Bonding Patterns Prediction Using Support Vector Machine with Parameters Tuned by Multiple Trajectory Search (original title: 利用多軌跡搜尋法調校支援向量機參數以預測雙硫鍵之鍵結型態)
Author: Lin, Hsuan-Hung (林宣宏)
Keywords: disulfide bond; disulfide bonding state; disulfide bonding pattern; support vector machine; multiple trajectory search
Publisher: Department of Applied Mathematics
Abstract:
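Protein structure prediction is a well-known problem in computational biology and remains a major challenge in structural biology. Disulfide bonding patterns play an important role in stabilizing protein structures, and for protein-folding prediction, correctly predicting the disulfide bonding pattern can greatly reduce the search space. Correctly predicting the positions of disulfide bonds therefore helps to solve the protein-folding problem. Accordingly, a method that accurately predicts disulfide bonding patterns can effectively advance the prediction of a protein's three-dimensional structure and its function.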
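In this study, we first use the position-specific scoring matrix (PSSM), normalized disulfide bond lengths, the predicted protein secondary structure, and physicochemical indices of the amino acids as input features for a support vector machine (SVM), and train a prediction model that computes the probability that a cysteine pair forms a bond. In addition, we use the multiple trajectory search (MTS) to tune the SVM parameters and the window sizes of the features, and then apply the maximum weight perfect matching algorithm to the bonding probabilities output by the SVM to find the disulfide bonding pattern. When the bonding states of the cysteines are known in advance, experimental results on dataset SP39 show that the proposed method achieves a best accuracy of 79.8% for predicting the disulfide bonding pattern (QP) and a best accuracy of 80.9% for predicting whether a cysteine pair forms a bond (QC). When the bonding states of the cysteines are not known in advance, on dataset SPX the method raises the prediction accuracy from the best previously published results of 51% (QP) and 52% (QC) to 54.5% (QP) and 60% (QC), respectively.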
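The matching step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: it enumerates all perfect matchings exhaustively, which is only practical for a handful of cysteines, and the positions and pair probabilities below are hypothetical values standing in for SVM outputs.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_perfect_matching(nodes: List[int],
                          weight: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total weight, pairs) of the maximum weight perfect matching.

    Exhaustive recursion: pair the first node with each remaining node,
    then solve the rest. Assumes `nodes` is sorted, has even length, and
    `weight` holds an entry (i, j) with i < j for every pair.
    """
    if not nodes:
        return 0.0, []
    first, rest = nodes[0], nodes[1:]
    best_total, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_perfect_matching(remaining, weight)
        total = weight[(first, partner)] + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Hypothetical bonding probabilities for four bonded cysteines
# at sequence positions 5, 12, 30 and 41 (illustrative values only).
cysteines = [5, 12, 30, 41]
prob = {
    (5, 12): 0.20, (5, 30): 0.85, (5, 41): 0.40,
    (12, 30): 0.30, (12, 41): 0.90, (30, 41): 0.10,
}
total, pattern = best_perfect_matching(cysteines, prob)
print(pattern)  # the pairing with the largest summed probability
```

For proteins with many cysteines, a polynomial-time matching algorithm would be the natural replacement for this enumeration.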
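Next, we use features related to protein tertiary structure. MODELLER is used to predict the coordinates of the Cα (alpha carbon) atom of each amino acid in the protein sequence; the Euclidean distances between amino acids are computed first and then extended into a normalized pair distance (NPD) vector, which serves as the input feature. MTS is used to tune the SVM parameters and the window size of the NPD feature, and a modified maximum weight perfect matching algorithm is applied to the bonding probabilities output by the SVM to find the disulfide bonding pattern. Experiments show that, when the bonding states of the cysteines are known in advance, on dataset SP39 QP rises substantially to 92.2% and QC rises substantially to 94.2%. When the bonding states are not known in advance, on dataset SPX QP reaches 84.4% and QC reaches 94.6%. These results show that the proposed method effectively improves the accuracy of disulfide bond prediction.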
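The distance computation behind the NPD feature can be sketched as follows. The abstract does not define the normalization, so dividing by the largest pairwise distance is purely an assumption made for illustration, and the Cα coordinates are hypothetical rather than MODELLER output.

```python
import math
from itertools import combinations


def pair_distances(ca_coords):
    """Euclidean distance between the C-alpha atoms of every residue pair."""
    return {
        (i, j): math.dist(ca_coords[i], ca_coords[j])
        for i, j in combinations(sorted(ca_coords), 2)
    }


def normalize(distances):
    """ASSUMPTION: scale by the largest distance so values lie in (0, 1].

    The thesis may normalize differently; this is only a placeholder.
    """
    longest = max(distances.values())
    return {pair: d / longest for pair, d in distances.items()}


# Hypothetical C-alpha coordinates (angstroms), keyed by residue position.
ca = {5: (0.0, 0.0, 0.0), 12: (3.0, 4.0, 0.0), 30: (0.0, 0.0, 10.0)}
npd = normalize(pair_distances(ca))
for pair, value in npd.items():
    print(pair, round(value, 3))
```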

Prediction of protein structure is one of the most important problems in computational biology, and it remains one of the biggest challenges in structural biology. Disulfide bonds play an important structural role in stabilizing protein conformations. For protein-folding prediction, a correct prediction of disulfide bridges can greatly reduce the search space. Predicting the disulfide bonding pattern helps, to a certain degree, to predict the 3D structure of a protein and hence its function, since disulfide bonds impose geometrical constraints on the protein backbone.
In this dissertation, we first used the position-specific scoring matrix (PSSM), normalized bond lengths, the predicted secondary structure of the protein, and physicochemical property indices of the amino acids as features for a classifier based on the support vector machine (SVM). The classifier was trained to compute the connectivity probabilities of cysteine pairs. In addition, an evolutionary algorithm called the multiple trajectory search (MTS) was integrated with the SVM model to tune the SVM parameters and the window sizes of the features. The maximum weight perfect matching algorithm was then used to find the disulfide connectivity pattern. With prior knowledge of the bonding states of cysteines, the experimental results on dataset SP39 show an accuracy of 79.8% for predicting the overall disulfide connectivity pattern (QP) and 80.9% for predicting individual disulfide bridges (QC). Without prior knowledge of the bonding states of cysteines, the accuracy on dataset SPX reaches 54.5% (QP) and 60% (QC), respectively.
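The two accuracy measures can be made concrete with a short Python sketch. It assumes the usual reading of these measures in the disulfide connectivity literature: QP is the fraction of proteins whose entire pattern is predicted correctly, and QC is the fraction of true disulfide bridges that are predicted. The function name and the example patterns are hypothetical.

```python
def qp_qc(true_patterns, predicted_patterns):
    """Return (QP, QC) for a list of proteins.

    Each pattern is a set of (i, j) cysteine pairs with i < j.
    QP: share of proteins whose whole pattern matches exactly.
    QC: share of true bridges that appear in the prediction.
    """
    correct_patterns = 0
    correct_bonds = 0
    total_bonds = 0
    for truth, pred in zip(true_patterns, predicted_patterns):
        correct_patterns += int(truth == pred)
        correct_bonds += len(truth & pred)
        total_bonds += len(truth)
    return correct_patterns / len(true_patterns), correct_bonds / total_bonds


# Three hypothetical proteins: fully correct, fully wrong, partly correct.
truth = [{(1, 3), (2, 4)}, {(1, 2), (3, 4)}, {(1, 2), (3, 4), (5, 6)}]
pred = [{(1, 3), (2, 4)}, {(1, 4), (2, 3)}, {(1, 2), (3, 5), (4, 6)}]
qp, qc = qp_qc(truth, pred)
print(f"QP = {qp:.3f}, QC = {qc:.3f}")
```

Because a single wrong bridge invalidates the whole pattern, QP is never higher than QC on the same predictions, which matches the ordering of the figures reported in the abstract.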
Next, features related to protein 3D structure, called the normalized pair distance (NPD) vector, were introduced. Experiments showed good performance on four problems in disulfide bond prediction. With prior knowledge of the bonding states of cysteines, the accuracy on dataset SP39 reaches 92.2% (QP) and 94.2% (QC), respectively. Without prior knowledge of the bonding states of cysteines, the accuracy on dataset SPX reaches 84.4% (QP) and 94.6% (QC), respectively.
URI: http://hdl.handle.net/11455/18172
Other identifiers: U0005-0707201000020700
Appears in Collections: Department of Applied Mathematics

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.