Title: Predicting conformational B-cell epitopes from protein sequence through novel feature design and ensemble learning
Author: Po-feng chem
Keywords: B-cell epitope; machine learning; tertiary structure of proteins; structural epitope; B-cell epitope prediction
A B-cell epitope is an active site on a protein that can bind a specific antibody. Antigen-antibody binding activates the immune system and induces a systemic immune response, an important part of immune defence that allows harmful viruses and bacteria to be eliminated effectively. B-cell epitopes are classified as linear or structural epitopes. Epitope prediction has been widely studied, but the work has concentrated on linear epitopes; predicting structural epitopes remains difficult, in part because three-dimensional structural information is lacking for many proteins. Mainstream epitope prediction methods analyze the three-dimensional structure of the protein, but such structures are hard to resolve, so few are available for research, which narrows the range of proteins that can be studied. This study therefore uses the primary sequence of a protein to explore, analyze, and predict the distribution of B-cell epitopes along the sequence. Machine learning methods were applied to examine a range of protein features, and multiple experimental designs were used to screen for features correlated with antibody-antigen binding. The features examined include physicochemical properties, secondary structure, and exposed surface area. In addition, new features were designed from an analysis of epitope characteristics: differences in molecular weight, amino acid distribution, and functional group distribution. These features were combined in an ensemble learning model, which on an independent test set achieved an accuracy of 60.201%, a true positive rate (TPR) of 70.698%, and a specificity (SPC) of 59.678%.
For the case studies, the model was applied to two proteins with resolved structures, phosphocarrier HPr and IL-10, giving accuracies of 69.333% and 65.824%, TPRs of 61.1% and 80%, and SPCs of 70.4% and 62.8%, respectively.

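A B-cell epitope is an active site on a protein that can be bound by a specific antibody. Antibody-antigen binding activates the immune system and induces a systemic immune response; it is an important mechanism of defence against invasion and an important part of the immune system, allowing the host to clear harmful foreign pathogens effectively. B-cell epitopes are divided into linear and structural epitopes. Epitope prediction has been widely studied, but only for linear epitopes; structural epitopes still present difficulties that are hard to overcome, such as the lack of three-dimensional structural information for the proteins themselves. Mainstream epitope prediction research analyzes protein tertiary structure, but tertiary structures are hard to resolve and those available for research are limited, which narrows the scope of what can be studied. This study therefore aims to develop a system built on analysis of the protein primary sequence to explore, analyze, and predict the distribution of B-cell epitopes along the protein sequence. Using machine learning, the study examines different protein properties and, through multiple experimental designs, screens for properties related to antibody-antigen binding. The properties examined include widely studied ones such as physicochemical properties, secondary structure, and exposed surface area, together with properties designed from this study's own analysis of epitopes, such as differences in molecular group size, the amino acid distribution tendencies of epitopes, and functional group distribution. From these properties an ensemble learning model was built; in independent testing its accuracy was 60.201%, with a TPR of 70.698% and an SPC of 59.678%. In the case studies, predictions for two proteins with solved structures, phosphocarrier HPr and IL-10, gave accuracies of 69.333% and 65.824%, TPRs of 61.1% and 80%, and SPCs of 70.4% and 62.8%, respectively.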
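The accuracy, TPR, and SPC figures quoted in the abstract are standard confusion-matrix metrics. As an illustration only, the sketch below shows how they could be computed from per-residue labels, assuming each residue is labelled 1 (epitope) or 0 (non-epitope). The function name `residue_metrics` and the toy labels are hypothetical and are not taken from the study, which does not describe its evaluation code.

```python
import math
from typing import Dict, Sequence


def residue_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, TPR (sensitivity) and SPC (specificity).

    Assumption: labels are per residue, 1 = epitope, 0 = non-epitope.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one residue is required")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        "accuracy": (tp + tn) / len(y_true),
        # TPR: fraction of true epitope residues that were predicted as epitope.
        "tpr": tp / (tp + fn) if (tp + fn) else math.nan,
        # SPC: fraction of non-epitope residues that were predicted as non-epitope.
        "spc": tn / (tn + fp) if (tn + fp) else math.nan,
    }


# Toy labels for a 10-residue sequence (made up for illustration).
toy_true = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
toy_pred = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
toy_metrics = residue_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Because epitope residues are usually a small minority of a sequence, accuracy is dominated by the non-epitope class, which is why TPR and SPC are worth reporting alongside it.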
