Title: 利用多軌跡搜尋法調校支援向量機參數以預測雙硫鍵之鍵結型態
Disulfide Bonding Patterns Prediction Using Support Vector Machine with Parameters Tuned by Multiple Trajectory Search
Author: 林宣宏
Lin, Hsuan-Hung
Keywords: disulfide bond; disulfide bonding state; disulfide bonding pattern; support vector machine; multiple trajectory search
Publisher: Department of Applied Mathematics
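In this study, we first use the position-specific scoring matrix (PSSM), normalized bond lengths, predicted protein secondary structure, and physicochemical property indices of amino acids as input features of a support vector machine (SVM), and train a prediction module that computes the probability that a cysteine pair forms a bond. We also use the multiple trajectory search (MTS) to tune the SVM parameters and the window sizes of the features, and then apply the maximum weight perfect matching algorithm to the bonding probabilities output by the SVM to find the disulfide bonding pattern. When the bonding states of the cysteines are known in advance, experiments on dataset SP39 show that the proposed method reaches a best accuracy of 79.8% for predicting the disulfide bonding pattern (QP) and a best accuracy of 80.9% for predicting whether a cysteine pair forms a bond (QC). When the bonding states are not known in advance, on dataset SPX the proposed method raises the accuracy from the best previously published results of 51% (QP) and 52% (QC) to 54.5% (QP) and 60% (QC), respectively.
Next, we use features related to the protein tertiary structure. MODELLER is used to predict the coordinates of the Cα (alpha carbon) atom of each amino acid in the protein sequence; we first compute the Euclidean distances between amino acids and from them derive the normalized pair distance (NPD) vector as an input feature. MTS is used to tune the SVM parameters and the window size of the NPD feature, and a modified maximum weight perfect matching algorithm is applied to the bonding probabilities output by the SVM to find the disulfide bonding pattern. Experiments show that, when the bonding states of the cysteines are known in advance, on dataset SP39 QP rises substantially to 92.2% and QC to 94.2%. When the bonding states are not known in advance, on dataset SPX QP reaches 84.4% and QC reaches 94.6%. These results indicate that the proposed methods effectively improve the accuracy of disulfide bond prediction.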
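The distance step behind the NPD feature can be illustrated with a minimal Python sketch. It computes Euclidean distances between the Cα coordinates of cysteines and rescales them to [0, 1]. The abstract does not state how the distances are normalized, so dividing by the largest pairwise distance is an assumption made only for illustration; the function names `pairwise_ca_distances` and `normalize_distances` and the coordinates are hypothetical, and the windowing tuned by MTS is not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def pairwise_ca_distances(ca: Dict[int, Coord]) -> Dict[Pair, float]:
    """Euclidean distance between the C-alpha atoms of every residue pair.

    `ca` maps a residue position to its predicted (x, y, z) coordinates.
    """
    return {
        (i, j): math.dist(ca[i], ca[j])
        for i, j in combinations(sorted(ca), 2)
    }


def normalize_distances(dist: Dict[Pair, float]) -> Dict[Pair, float]:
    """Rescale distances to [0, 1] by the largest one.

    ASSUMPTION: the source does not give the normalization; max-scaling
    is used here purely as a placeholder.
    """
    longest = max(dist.values())
    if longest == 0:
        raise ValueError("all residues share the same coordinates")
    return {pair: d / longest for pair, d in dist.items()}


# Made-up coordinates for three cysteines at positions 1, 2 and 3.
example = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 10.0)}
npd = normalize_distances(pairwise_ca_distances(example))
```

In this toy input the pair (2, 3) is the most distant, so its normalized value is 1 and the other pairs are expressed relative to it.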

Prediction of protein structure is one of the most important problems in computational biology, and it remains one of the biggest challenges in structural biology. Disulfide bonds play an important structural role in stabilizing protein conformations. For protein-folding prediction, a correct prediction of disulfide bridges can greatly reduce the search space. Because disulfide bonds impose geometrical constraints on the protein backbone, predicting the disulfide bonding pattern helps, to a certain degree, to predict the 3D structure of a protein and hence its function.
In this dissertation, we first used the position-specific scoring matrix (PSSM), normalized bond lengths, the predicted secondary structure of the protein, and the physicochemical property indices of the amino acids as features for a classifier based on the support vector machine (SVM). The classifier was trained to compute the connectivity probabilities of cysteine pairs. In addition, an evolutionary algorithm called the multiple trajectory search (MTS) was integrated with the SVM model to tune the parameters of the SVM and the window sizes of the features. The maximum weighted perfect matching algorithm was then used to find the disulfide connectivity pattern. With prior knowledge of the bonding states of cysteines, the experimental results show that the accuracy reaches 79.8% for the prediction of the overall disulfide connectivity pattern (QP) and 80.9% for disulfide bridge prediction (QC) on dataset SP39. Without prior knowledge of the bonding states of cysteines, the accuracy reaches 54.5% (QP) and 60% (QC) on dataset SPX.
We then introduced features related to the protein 3D structure, called the normalized pair distance (NPD) vector. In experiments, we obtained good performance on four problems in disulfide bond prediction. With prior knowledge of the bonding states of cysteines, the accuracy reaches 92.2% (QP) and 94.2% (QC) on dataset SP39. Without prior knowledge of the bonding states of cysteines, the accuracy reaches 84.4% (QP) and 94.6% (QC) on dataset SPX.
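As a rough illustration of the matching step, the sketch below picks the pairing of cysteines that maximizes the sum of pairwise bonding probabilities. It is not the dissertation's implementation: it replaces a polynomial-time maximum weight perfect matching algorithm with exhaustive enumeration, which is practical only for proteins with a handful of bonds. The function name `best_pattern`, the residue positions, and the probabilities are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]


def best_pattern(
    cysteines: List[int], prob: Dict[FrozenSet[int], float]
) -> Tuple[float, List[Pair]]:
    """Return (total weight, pairs) of the heaviest perfect matching.

    `cysteines` lists the positions of the bonded cysteines and `prob`
    maps each unordered pair to its predicted bonding probability.
    Exhaustive search: every way of pairing the first cysteine is tried.
    """
    if len(cysteines) % 2:
        raise ValueError("a perfect matching needs an even number of cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_pattern(remaining, prob)
        score += prob[frozenset((first, partner))]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


# Made-up classifier outputs for four cysteines.
example_prob = {
    frozenset((5, 12)): 0.2, frozenset((5, 30)): 0.9, frozenset((5, 44)): 0.1,
    frozenset((12, 30)): 0.3, frozenset((12, 44)): 0.8, frozenset((30, 44)): 0.4,
}
score, pattern = best_pattern([5, 12, 30, 44], example_prob)
```

With these made-up probabilities the selected pattern is 5-30 and 12-44, the pairing with the largest total weight among the three possible ones.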
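For readers unfamiliar with the two reported measures, the following sketch shows one way to compute them. It assumes the definitions commonly used in the disulfide connectivity literature, which the abstract does not spell out: QP is the fraction of proteins whose whole predicted pattern matches the true one, and QC is the fraction of true disulfide bridges that are predicted correctly. The helper names and the toy patterns are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Pattern = Set[FrozenSet[int]]


def bonds(*pairs: Tuple[int, int]) -> Pattern:
    """Build a pattern (set of unordered cysteine pairs) from tuples."""
    return {frozenset(p) for p in pairs}


def qp_qc(true_patterns: List[Pattern], pred_patterns: List[Pattern]) -> Tuple[float, float]:
    """Return (QP, QC) under the assumed definitions.

    QP: proteins with an exactly correct pattern / number of proteins.
    QC: correctly predicted bridges / number of true bridges.
    """
    paired = list(zip(true_patterns, pred_patterns))
    exact = sum(t == p for t, p in paired)
    correct_bridges = sum(len(t & p) for t, p in paired)
    total_bridges = sum(len(t) for t in true_patterns)
    return exact / len(true_patterns), correct_bridges / total_bridges


# Two toy proteins: the first is predicted perfectly, the second has
# only one of its three bridges right.
truth = [bonds((1, 2), (3, 4)), bonds((1, 4), (2, 3), (5, 6))]
guess = [bonds((1, 2), (3, 4)), bonds((1, 2), (3, 4), (5, 6))]
qp, qc = qp_qc(truth, guess)
```

In this toy case one of two patterns is fully correct and three of five bridges are recovered, which shows why QC can be noticeably higher than QP on the same predictions.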
