Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/92370
Title: iMADS 2.0: an extended iMADS for MADS-box gene classification in orchid by machine learning approach
Author: Kuan-Chun Chen (陳冠群)
Keywords: MADS-box transcription factor
ABCDE model
Phylogenetic tree
SVMs
iMADS
Abstract: Plant MIKC-type MADS-box transcription factors play an important role in controlling floral organ development, and the ABCDE model is the basic model describing how angiosperm MADS-box genes regulate floral organ identity. Phylogenetic trees are the most common method for gene classification. However, we found in previous work that when a phylogenetic tree is applied to large, multi-species, or incomplete sequence sets, the analysis becomes time-consuming, the classification becomes ambiguous, and the rate of human misjudgment rises. Our laboratory (NCB lab) previously developed iMADS, a web-based tool that classifies angiosperm MADS-box genes with a machine learning method. However, the training dataset of that system was outdated and had not been properly filtered. In addition, the conventional five-class ABCDE model cannot adequately describe the few angiosperms that have unique floral organs.
In this study, we first used phylogenetic analysis to group the collected MADS-box gene data by unsupervised clustering. Genes assigned to the wrong class were then corrected according to the literature, and genes expressed specifically in floral organs were selected as the training dataset. The learning model was built in two stages. Besides trying various feature encodings, we extended the conventional five-class ABCDE model to eight classes for orchids, which have unique floral organs. We then used support vector machines to build multiple separate prediction models according to the characteristics of the MADS-box gene domains; together these form iMADS2.0. In the results, BLAST features achieved the best accuracy, ahead of BindN and COILS features. An independent dataset and a dataset of sequences that the phylogenetic tree classified ambiguously were submitted to the prediction models and compared with the results of iMADS and of the phylogenetic tree, respectively. iMADS2.0 not only improved the prediction accuracy but also reassigned each sequence to its proper class. Finally, we used bioinformatics tools to discuss the relationship between the physicochemical properties of the C-terminal domain and the regulation mechanism of the transcription activation region.
Given a sequence fragment from one of the domains, supplied by the user, iMADS2.0 provides the predicted MADS-box gene class, the most similar sequences, and visualized expression patterns from species whose expression data are currently relatively complete. The web-based tool is freely available at http://predictor.nchu.edu.tw/iMADS2.
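The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say what each stage decides, so the sketch assumes that stage one picks a coarse class and stage two refines it into a subclass. To stay runnable with the Python standard library alone, it swaps the support vector machine for a nearest-reference rule and swaps BLAST for a shared k-mer score. All names, class labels, and sequences below are hypothetical toy values.

```python
from collections import Counter


def kmer_profile(seq, k=2):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def similarity(a, b, k=2):
    """Shared k-mer fraction; a crude stand-in for a BLAST score (assumption)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())  # multiset intersection
    total = max(sum(pa.values()), sum(pb.values()))
    return shared / total if total else 0.0


def nearest_label(query, references):
    """Label of the most similar reference; stands in for a trained SVM."""
    return max(references, key=lambda item: similarity(query, item[1]))[0]


# Hypothetical toy data: made-up strings, not real MADS-box proteins,
# and made-up labels, not the eight classes used by iMADS2.0.
STAGE1_REFS = [
    ("X", "MKRGKIEIKR"),
    ("Y", "LLQQEENNAA"),
]
STAGE2_REFS = {
    "X": [("X1", "MKRGKIEIKRWW"), ("X2", "MKRGKIEIKRPP")],
}


def classify(query):
    """Stage one picks a coarse class; stage two refines it if subclasses exist."""
    coarse = nearest_label(query, STAGE1_REFS)
    fine_refs = STAGE2_REFS.get(coarse)
    if fine_refs is None:
        return coarse
    return nearest_label(query, fine_refs)


if __name__ == "__main__":
    for q in ("MKRGKIEIKRWW", "LLQQEENN", "GKIEIKRPP"):
        print(q, "->", classify(q))
```

Splitting the decision this way keeps each stage small: only classes that need finer resolution carry a second set of references, which mirrors the motivation for extending a five-class scheme for species with unique floral organs.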
URI: http://hdl.handle.net/11455/92370
Other Identifiers: U0005-0408201514571400
Date made public: 2018-08-14
Appears in Collections: Institute of Genomics and Bioinformatics

Files in This Item:

For the full text, please visit the Airiti Library (華藝線上圖書館).


