Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/35820
Title: QuaBingo: Quaternary Structure Prediction by the Composition of Block and Functional Domain for Oligomeric Proteins
Author: Guo, Ren-Chao (郭任超)
Keywords: Protein quaternary structure classification
oligomer
block composition
functional domain composition
Publisher: Graduate Institute of Biotechnology (生物科技學研究所)
Abstract: The quaternary structure of a protein is closely related to gene regulation, signal transduction, and many other biological functions. Protein oligomers are usually symmetric and morphologically diverse, which allows them to exhibit allosteric effects and perform more complex functions. This study proposes a feature-encoding method called block composition, which is based on conserved motif sequence blocks. Block composition captures not only the specific sequence patterns of conserved residues but also functional and structural information, which helps to uncover the hidden mechanisms of protein aggregation. By combining block composition with functional domain composition, we developed QuaBingo, a system that predicts the quaternary structure type of monomers, homo-oligomers, and hetero-oligomers. QuaBingo is built from three layers of classifiers. The first layer uses SVM classifiers whose features encode the block composition and functional domain composition of the protein sequence. To improve on the first layer, a second-layer SVM integrates the first-layer outputs, and its result is passed to a third-layer random forest classifier that assigns the protein to one of 17 quaternary structure types. Ten-fold cross-validation shows that block composition achieves better overall accuracy for monomers, homo-oligomers, and hetero-oligomers than functional domain composition and than the pseudo amino acid composition method based on solvent accessibility. Integrating the feature methods improves accuracy by 1% to 6% for most classes. In the functional analysis, QuaBingo shows better predictive ability for proteins involved in enzymatic activity, gene regulation, and signal transduction, which may be useful for research on those proteins.
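As a rough illustration of composition-style feature encoding, the sketch below turns a protein's block matches and functional domain matches into one fixed-length vector. The binary presence/absence scheme, the function names, and all identifiers are assumptions made for illustration; the abstract does not specify the exact encoding used by QuaBingo.

```python
from typing import Iterable, List


def composition_vector(hits: Iterable[str], vocabulary: List[str]) -> List[float]:
    """Mark which vocabulary entries (blocks or domains) a protein matches.

    Assumption: binary presence/absence. The thesis may instead use
    match scores or counts.
    """
    hit_set = set(hits)
    return [1.0 if entry in hit_set else 0.0 for entry in vocabulary]


def encode_protein(block_hits: Iterable[str], domain_hits: Iterable[str],
                   block_vocab: List[str], domain_vocab: List[str]) -> List[float]:
    """Concatenate block composition and functional domain composition."""
    return (composition_vector(block_hits, block_vocab)
            + composition_vector(domain_hits, domain_vocab))


# Hypothetical identifiers, not real Blocks or CDD accessions.
BLOCK_VOCAB = ["BLOCK_A", "BLOCK_B", "BLOCK_C"]
DOMAIN_VOCAB = ["DOMAIN_X", "DOMAIN_Y"]

features = encode_protein(["BLOCK_C", "BLOCK_A"], ["DOMAIN_Y"],
                          BLOCK_VOCAB, DOMAIN_VOCAB)
print(features)
```

Vectors of this kind would be the sort of input a first-layer SVM classifier could consume; hits outside the vocabulary are simply ignored in this sketch.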
URI: http://hdl.handle.net/11455/35820
Other identifiers: U0005-2808201301240900
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-2808201301240900
Appears in Collections: Graduate Institute of Biotechnology (生物科技學研究所)

Files in this item: the full text is available from Airiti Library (華藝線上圖書館).



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.