Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/60850
Title: Gelatinase Substrate Cleavage Sites Prediction Using Machine Learning and Feature Selection (使用機器學習與特徵選擇預測明膠酶的受質切位)
Author: Chang, Hao-Chen (張浩禎)
Keywords: gelatinase; matrix metalloproteinase-2; matrix metalloproteinase-9; support vector machine; fold change; GelCut
Publisher: Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)
Abstract: The gelatinase family has two members, gelatinase A (MMP-2) and gelatinase B (MMP-9), both of which degrade the extracellular matrix. Recent studies indicate that matrix metalloproteinases (MMPs) play diverse regulatory roles in physiological and pathological processes such as the immune response, tumor development, and stem cell differentiation. MMP-2 and MMP-9 promote tumor metastasis, and inhibitors targeting them have entered clinical trials; these drugs suppressed metastasis but did not improve patient survival. Two explanations have been proposed: (1) MMP family members are highly homologous and structurally similar, so inhibitors have low specificity; and (2) the regulatory pathways involving MMP-2 and MMP-9 substrates are not yet fully understood. Accurate prediction of MMP cleavage sites would therefore help clarify their mechanisms of action. This study built a prediction system, GelCut, with a two-layer architecture. The first layer trains support vector machine (SVM) models on four types of features: binary encoding, physicochemical properties, protein disorder, and solvent accessibility with secondary structure. It also uses fold change features, which this study is the first to apply to this problem. The predicted probabilities output by the first layer serve as features for the second layer, for which several machine learning methods were compared. GelCut achieved a Matthews correlation coefficient (MCC) of 0.894 for MMP-2 substrate prediction and 0.644 for MMP-9 substrate prediction. Feature selection on the physicochemical properties showed that the active sites of MMP-2 and MMP-9 differ, which may serve as a reference for drug design. Finally, on proteins associated with MMP-related diseases, GelCut's prediction accuracy was roughly 13% to 14% higher than that of the SitePrediction web tool. GelCut can help biologists identify new candidate substrates of MMP-2 and MMP-9, which in turn may point to undiscovered regulatory pathways.
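The abstract names three building blocks without giving details: binary encoding of sequence features, first-layer probabilities reused as second-layer features, and evaluation by MCC. The following is a minimal sketch of those ideas in plain Python and is not the GelCut implementation. The 8-residue window, the model names in `FIRST_LAYER_MODELS`, and all function names are hypothetical choices made for illustration only.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical labels for the four first-layer feature types in the abstract.
FIRST_LAYER_MODELS = ("binary", "physicochemical", "disorder",
                      "accessibility_structure")


def binary_encode(window):
    """One-hot encode a peptide window: 20 bits per residue."""
    vector = []
    for residue in window:
        bits = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # unknown residues (e.g. 'X') stay all-zero
            bits[index] = 1
        vector.extend(bits)
    return vector


def second_layer_features(probabilities):
    """Order first-layer probabilities into one second-layer feature vector.

    `probabilities` maps each name in FIRST_LAYER_MODELS to the cleavage
    probability predicted by that first-layer model.
    """
    return [float(probabilities[name]) for name in FIRST_LAYER_MODELS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:  # conventional value when a margin is empty
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Assumed window of 8 residues around a candidate cleavage site.
    encoded = binary_encode("PLGLAGAA")
    print(len(encoded))  # 8 residues x 20 bits
    print(second_layer_features({
        "binary": 0.9, "physicochemical": 0.7,
        "disorder": 0.4, "accessibility_structure": 0.6,
    }))
    print(mcc(tp=50, tn=50, fp=0, fn=0))
```

MCC ranges from -1 to 1 and stays informative when cleavage sites are much rarer than non-cleavage sites, which is one common reason to report it instead of plain accuracy.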
URI: http://hdl.handle.net/11455/60850
Other identifiers: U0005-2508201217594000
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-2508201217594000
Appears in Collections: Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)

Files in this item:

Full text is available from the Airiti Library (華藝線上圖書館).
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.