Please use this Handle URI to cite this document: http://hdl.handle.net/11455/19355
Title: The integrated bioinformatics study for computational systems biology (Chinese title: 計算系統生物學之整合性生物資訊研究)
Author: Liu, Chun-Chi (劉俊吉)
Keywords: systems biology; gene network
Publisher: Department and Graduate Institute of Computer Science (資訊科學系所)
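Abstract (translated from the Chinese original): Systems biology is a discipline that integrates many aspects of biological systems, including genomics, transcriptomics, proteomics, gene regulatory networks, and pathways, and uses bioinformatics as the integrating tool for systematic research, including data analysis and the construction of biological models. This dissertation covers specific oligonucleotide sequence analysis, microarray probe design, microarray data analysis, promoter analysis, genome context analysis, transcription factor regulation prediction, microRNA (miRNA) regulation prediction, pathway analysis, gene regulatory network construction, and prediction of the complex combined regulation by transcription factors and miRNAs. The overall architecture has three parts: prediction of specific oligonucleotide sequences, development of methods for microarray data analysis, and integration of comprehensive and diverse biological databases.
The first part studies genome-scale prediction of specific oligonucleotide sequences for microarray probe design, primer design, and siRNA design. An artificial neural network (ANN) is an important machine-learning technique that can handle large, highly noisy, and complex data; this dissertation uses an ANN to optimally integrate the densities of 10-mer to 26-mer unique subsequences. On this basis we propose a novel algorithm that integrates the ANN with BLAST (the IAB algorithm) to predict specific oligonucleotide sequences. It runs 5-7 times faster than BLAST alone.
The second part studies microarray data analysis for cancer classification and pathway analysis. Combining microarray-based cancer classification with pathway analysis is a leading trend in functional genomic medicine, yet no published study had linked molecular classification and pathway analysis through gene networks. Because the gene network of a given class has topological properties that are conserved within that class, we used this property to develop techniques based on gene network topology. We found that gene networks contain the information needed for cancer classification and pathway analysis, and we therefore developed gene network-based techniques for cancer classification and pathway analysis. This work also produced a novel gene network construction technique called the ordering network. Based on the accuracy and stability of cancer classification, the limitations of linear relationships, the power-law property, the time complexity of network construction, and a review of the literature, we show that ordering-based gene network construction is a superior technique.
The third part integrates comprehensive and diverse biological databases and studies the complex regulatory mechanisms of transcription factors and miRNAs. Transcription factors and miRNAs play important roles in regulating gene expression, and considering the two together to study the complex regulation of gene expression is a new research field. We built a comprehensive bioinformatics web server (CRSD) that integrates UniGene, miRNA, promoter, transcription factor, pathway, gene function classification, and genome databases. To provide comprehensive microarray data analysis, we developed a novel integrated analysis framework: CRSD includes microarray data preprocessing, statistical analysis, clustering analysis, enrichment analysis, and motif detection. We further extended and enhanced the functions and architecture of CRSD to build a comprehensive bioinformatics web server for plants (PlantCRSD), which contains comprehensive annotations of plant miRNAs for two species, Arabidopsis thaliana and rice (Oryza sativa).
This dissertation has the integrated architecture of systems biology and covers bioinformatics techniques at multiple levels. Microarray probe design can support biological experiments that generate large amounts of data. The gene network topology-based techniques for cancer classification and pathway analysis give gene networks a new application and also offer a possible way to assess the accuracy of gene networks. This opens a new direction for effectively extracting important information from artificially constructed gene networks. Biological systems are highly complex and diverse, so integrating comprehensive biological databases, sequence analysis, and microarray data analysis is an inevitable trend. We therefore built comprehensive bioinformatics web servers as an integrated research and development platform for systems biology; with genome-scale predictions of biological phenomena and a user-friendly interface, they should concretely benefit international academic exchange, knowledge-base construction, and mutual validation.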
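The first part describes combining 10-mer to 26-mer unique-subsequence densities with an ANN and then BLAST. The abstract does not give the network architecture, the trained weights, or the exact density definition, so the following is only a minimal, hypothetical sketch of that general idea, not the IAB implementation. Assumptions (all labeled in the code): a "unique density" is the fraction of an oligo's k-mer windows that occur exactly once in the reference; the ANN is reduced to a single logistic neuron with made-up, untrained weights; and `confirm` is a placeholder where a real BLAST-based check would go. The helper names (`kmer_counts`, `unique_density`, `ann_score`, `prefilter_then_confirm`, `random_dna`) are hypothetical.

```python
"""Hypothetical sketch of an ANN-gated specificity check; not the thesis's IAB code."""
import math
import random
from collections import Counter

K_RANGE = range(10, 27)  # 10-mer ... 26-mer, the range named in the abstract


def kmer_counts(sequences, k):
    """Count how often each k-mer occurs across all reference sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def unique_density(oligo, counts, k):
    """Assumed definition: fraction of the oligo's k-mer windows seen exactly once."""
    windows = [oligo[i:i + k] for i in range(len(oligo) - k + 1)]
    if not windows:
        return 0.0
    return sum(counts[w] == 1 for w in windows) / len(windows)


def ann_score(oligo, tables, weights, bias):
    """Single logistic neuron over the 17 density features (stand-in for a trained ANN)."""
    features = [unique_density(oligo, tables[k], k) for k in K_RANGE]
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def prefilter_then_confirm(candidates, reference, weights, bias,
                           threshold=0.5, confirm=None):
    """Keep candidates that pass the cheap ANN gate and then the expensive check.

    `confirm` is a placeholder for a BLAST-based specificity check; it is only
    called for candidates that pass the gate, which is where a speed-up over
    running BLAST on every candidate would come from.
    """
    tables = {k: kmer_counts(reference, k) for k in K_RANGE}
    kept = []
    for oligo in candidates:
        if ann_score(oligo, tables, weights, bias) < threshold:
            continue  # rejected without running the expensive search
        if confirm is None or confirm(oligo):
            kept.append(oligo)
    return kept


def random_dna(n, rng):
    """Toy helper for the demo: a random A/C/G/T string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    repeat = random_dna(40, rng)  # segment shared by two toy "genes"
    reference = [random_dna(200, rng),
                 random_dna(80, rng) + repeat + random_dna(80, rng),
                 random_dna(80, rng) + repeat + random_dna(80, rng)]
    unique_oligo = reference[0][50:80]
    repeated_oligo = repeat[5:35]
    weights, bias = [1.0] * len(K_RANGE), -8.5  # hypothetical, untrained values
    print(prefilter_then_confirm([unique_oligo, repeated_oligo],
                                 reference, weights, bias))
```

In the demo, the oligo taken from the repeated segment has a density of 0 at every k and is rejected by the gate, so the placeholder check never runs for it. Building exact k-mer tables for all 17 values of k is only practical for toy data; a genome-scale version would need a more compact index.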
Systems biology is the integrative study of various biological systems, such as genomics, transcriptomics, proteomics, gene regulatory networks, and pathways, in which bioinformatics serves as the integrating tool for systematic data analysis and the construction of biological models. This study covers specific oligonucleotide (oligo) identification, microarray probe design, microarray data analysis, promoter analysis, genome context analysis, transcription factor (TF) regulation prediction, microRNA (miRNA) regulation prediction, pathway analysis, gene network construction, and prediction of the complex combined regulation by miRNAs and TFs. The architecture of this dissertation consists of three major components: (i) identification of specific oligos; (ii) development of methods for microarray data analysis; and (iii) integration of comprehensive biological databases.
The first component is the genome-wide identification of specific oligos, which can be employed in microarray probe design, primer design, and siRNA design. An artificial neural network (ANN) is a popular learning approach that handles noise and complex relationships robustly. In this dissertation, an ANN is used to integrate the densities of unique 10-mer to 26-mer subsequences. We present a novel and efficient algorithm that integrates the ANN with BLAST, named the IAB algorithm, to identify specific oligos. In our experiments, the IAB algorithm was about 5-7 times faster than a BLAST search without the ANN.
The second component is the development of novel methods for microarray data analysis, which can be employed in cancer classification and pathway analysis. Cancer classification and pathway analysis are promising approaches for uncovering the underlying molecular mechanisms from microarray data. However, linking molecular classification and pathway analysis through a gene network approach had not yet been discussed.
Through further investigation, we found that gene networks themselves contain information useful for cancer classification and pathway analysis. We developed a novel framework that constructs class-specific gene networks for classification and pathway analysis, including a novel network construction method named ordering networks; on this basis, topology-based classification and pathway analysis methods were developed. We investigated the accuracy and stability of classification performance, the limitations of linear relationships, the power-law property, the time complexity of network construction, and the related literature. Our results suggest that ordering network construction offers outstanding performance.
The third component is the integration of comprehensive biological databases and the study of the complex regulation by TFs and miRNAs. TFs and miRNAs play important roles in the regulation of gene expression, and the study of their combinatorial regulation of gene expression is a new research field. We constructed a comprehensive web server, named the composite regulatory signature database (CRSD), that integrates UniGene, miRBase, promoter, TRANSFAC, pathway, gene ontology, and genome databases. To support one-stop microarray data analysis, several methods, including microarray data preprocessing, statistical and clustering analysis, iterative enrichment analysis, and motif discovery, are closely integrated in the web server. We further extended and enhanced the CRSD framework to develop the plant composite regulatory signature database (PlantCRSD), which includes comprehensive annotations of plant miRNAs for Arabidopsis thaliana and Oryza sativa.
This study provides a framework of integrated and comprehensive knowledge for computational systems biology, spanning multiple levels and various fields of bioinformatics.
The novel microarray probe design method can help biological experiments obtain microarray data. Topology-based cancer classification and pathway analysis open a new application of gene networks and may provide a criterion for evaluating the accuracy of gene networks. Our comprehensive web server is an integrative platform for systems biology research. With genome-wide predictions of biological behaviors and a user-friendly interface, it may contribute to worldwide academic interconnection, knowledge-base establishment, and mutual validation.
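The third component lists iterative enrichment analysis among the microarray tools integrated in CRSD, but the abstract does not say which statistic CRSD uses. A common choice for this kind of analysis is the one-sided hypergeometric over-representation test, so the sketch below shows that standard test under the assumption that CRSD's enrichment step is of this kind. The function names and the toy gene and term names are hypothetical; the iterative procedure and any multiple-testing correction are not modeled.

```python
"""Standard one-sided hypergeometric over-representation test (illustrative)."""
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n selected)."""
    upper = min(K, n)
    # Exact integer arithmetic for the tail, one division at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment(selected, background, annotations):
    """Rank annotation terms by over-representation among the selected genes.

    selected:    genes of interest (e.g. a cluster from microarray analysis)
    background:  all genes measured on the array
    annotations: term -> genes (e.g. GO terms or pathways; names hypothetical)
    Returns (term, overlap, p_value) tuples sorted by p-value, without
    multiple-testing correction.
    """
    universe = set(background)
    chosen = set(selected) & universe
    N, n = len(universe), len(chosen)
    results = []
    for term, genes in annotations.items():
        members = set(genes) & universe
        overlap = len(members & chosen)
        if overlap == 0:
            continue  # a term with no selected genes cannot be enriched
        results.append((term, overlap, hypergeom_sf(overlap, N, len(members), n)))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    background = [f"g{i}" for i in range(20)]
    cluster = background[:5]
    annotations = {"termA": background[:5], "termB": background[10:15]}
    print(enrichment(cluster, background, annotations))
```

With 20 background genes, a 5-gene cluster that exactly matches a 5-gene term gets p = 1/C(20, 5) = 1/15504, the probability of drawing all five annotated genes by chance. In a real analysis the p-values would normally be adjusted for the number of terms tested.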
URI: http://hdl.handle.net/11455/19355
Other identifiers: U0005-2001200703323100
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-2001200703323100

