Title: Protein Citrullination Sites Prediction Using Machine Learning and Model Selection
Author: Chen, Chi-Wei (陳啟瑋)
Keywords: Peptidylarginine Deiminases
Support Vector Machine
Genetic Algorithm
Publisher: Institute of Genomics and Bioinformatics
Abstract: Protein citrullination is catalyzed by peptidylarginine deiminase (PAD), which converts positively charged arginine residues in substrate proteins into neutral citrulline residues. Abnormal citrullination has been implicated in rheumatoid arthritis, multiple sclerosis, and Alzheimer's disease. Because little data is currently available, tools for predicting citrullination have been slow to develop. This study therefore designs an experimental approach for building machine-learning prediction models when data is scarce, and applies it to develop a predictor of protein citrullination sites. Features for encoding were selected from eight candidate feature types, including catalytic rules collected from the literature, sequence similarity, evolutionary conservation information, physicochemical and biochemical properties of amino acids, secondary structure, protein disorder, and surface accessibility. Departing from the conventional way of selecting the best prediction model, we evaluated candidate models by their learning process and by feature importance. The selected model achieved accuracies of 99% on histone proteins, 79% on BMP proteins, 66% on CXL proteins, and 68% on other proteins. In a case study on known PAD substrates that participate in regulatory pathways, the model reached 88% accuracy. Finally, in cooperation with a biological laboratory, we predicted sites on a previously uncharacterized protein, and experimental verification confirmed an accuracy of 90%.
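The abstract does not say how candidate sites are turned into feature vectors, so the following is only a minimal sketch of one common way to represent sequence-based sites before they are passed to a classifier such as the support vector machine named in the keywords. The flank size of 3, the one-hot encoding, and the function names `arginine_windows` and `one_hot` are all assumptions made for illustration, not details taken from the thesis.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PADDING = "-"  # placeholder for positions beyond the sequence ends


def arginine_windows(seq: str, flank: int = 3) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every arginine (R) in seq.

    Each window holds `flank` residues on either side of the arginine.
    The flank size is an illustrative assumption.
    """
    padded = PADDING * flank + seq + PADDING * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "R":
            # In `padded`, residue i sits at index i + flank, so a slice
            # starting at i puts the arginine at the center of the window.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def one_hot(window: str) -> List[int]:
    """Binary-encode a window with 20 bits per position.

    Padding and non-standard residues are encoded as all zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector
```

For the toy sequence `"MARTKQTARK"` (not taken from the thesis data), `arginine_windows` returns windows centered on the arginines at positions 3 and 9, and `one_hot` turns each 7-residue window into a 140-element binary vector. Other feature types listed in the abstract, such as predicted secondary structure or surface accessibility, would need external predictors and are not covered by this sketch.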
Other Identifiers: U0005-2508201204262900
Appears in Collections: Institute of Genomics and Bioinformatics


