Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/92367
Title: 基於蛋白質序列透過新穎的特徵設計與整體學習預測結構型B細胞抗原表位
Predicting conformational B-cell epitope based on protein sequence by new feature designing and ensemble learning
Author: Po-feng Chen
陳柏逢
Keywords: B-cell epitope; machine learning; tertiary structure of proteins; structural epitope; B-cell epitope prediction
Abstract:
A B-cell epitope is an active site on a protein that can be bound by a specific antibody. Antigen-antibody binding activates the immune system and induces a systemic immune response, an important defense mechanism that allows the host to effectively eliminate harmful viruses and bacteria. B-cell epitopes are classified into linear epitopes and structural (conformational) epitopes. Epitope prediction has been widely studied, but mainly for linear epitopes; predicting structural epitopes remains difficult, in part because three-dimensional structural information is lacking for many proteins. Mainstream epitope prediction methods analyze the tertiary structure of proteins. However, tertiary structures are not easy to resolve, so only a limited number are available for research, which narrows the scope of what can be explored. This study therefore develops a system based on the primary sequence of the protein to explore, analyze, and predict the distribution of B-cell epitopes along the sequence. Machine learning methods were applied to examine different protein features, and multiple experimental designs were used to screen for features correlated with antibody-antigen binding. The features include widely studied ones, such as the physicochemical properties of the protein, its secondary structure, and its exposed surface area, as well as features designed in this study from an analysis of epitope characteristics, such as differences in molecular weight (size of molecular groups), the amino acid distribution tendencies of epitopes, and the distribution of functional groups. These features were combined in an ensemble learning model. On an independent test, the model achieved an accuracy of 60.201%, a true positive rate (TPR) of 70.698%, and a specificity (SPC) of 59.678%.
For the case studies, predictions were made for two proteins with solved structures, phosphocarrier HPr and IL-10. The accuracy was 69.333% and 65.824%, the TPR was 0.611 and 0.8, and the SPC was 0.704 and 0.628, respectively.
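The abstract reports accuracy, TPR, and SPC without defining them. The minimal Python sketch below uses the standard confusion-matrix definitions of these three metrics. The function name `classification_metrics` and the residue counts in the usage lines are hypothetical illustrations; they are not taken from the thesis, whose actual evaluation code is not shown in this record.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (accuracy, TPR, SPC) from per-residue confusion-matrix counts.

    tp: epitope residues predicted as epitope
    fn: epitope residues predicted as non-epitope
    tn: non-epitope residues predicted as non-epitope
    fp: non-epitope residues predicted as epitope
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one residue is required")
    accuracy = (tp + tn) / total
    # TPR (sensitivity): fraction of true epitope residues that were recovered.
    tpr = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    # SPC (specificity): fraction of non-epitope residues correctly rejected.
    spc = tn / (tn + fp) if (tn + fp) > 0 else math.nan
    return accuracy, tpr, spc


if __name__ == "__main__":
    # Hypothetical counts for a 100-residue antigen (not data from the thesis).
    acc, tpr, spc = classification_metrics(tp=8, fp=30, tn=60, fn=2)
    print(f"accuracy={acc:.3f} TPR={tpr:.3f} SPC={spc:.3f}")
```

Because epitope residues are usually a small minority of an antigen, accuracy is dominated by the non-epitope class, so TPR and SPC are worth reading alongside it.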
URI: http://hdl.handle.net/11455/92367
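Other Identifiers: U0005-0202201522012100
Rights: Authorization granted for the electronic full-text browsing/printing service; publicly available from 2018-02-05.
Appears in Collections: Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)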

Files in This Item:
File                       Size     Format
nchu-104-7101019002-1.pdf  3.37 MB  Adobe PDF (this file is only available in the university internal network)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.