Title: Prediction of Ovarian Cancer Stages and Construction of Gene Network by Integrating Machine Learning and Bioinspired Algorithms
Author: 林守羿
Shou-Yi Lin
Keywords: Ovarian cancer
Support vector machine
Abstract: Ovarian cancer is a common gynecological cancer in both Eastern and Western countries. According to statistics from Taiwan's Ministry of Health and Welfare, it was among the ten leading causes of cancer death for women in Taiwan in 2012 (ROC year 101). The main reason is that ovarian cancer shows no obvious symptoms at its early stage, so most patients are already at a terminal stage when they are diagnosed. In addition, the serum tumor marker CA125 lacks sufficient specificity and sensitivity for ovarian cancer screening, which contributes to the persistently high mortality rate. Finding biomarkers that can detect ovarian cancer precisely is therefore an urgent topic. Microarray data analysis and data mining methods have recently been widely used in cancer research with good results. This research therefore proposes a method that uses data mining algorithms to build a model for detecting the stage of ovarian cancer. The model has three steps. In the first step, ovarian cancer microarray data are used as samples, and information gain, the ID3 algorithm, and the C4.5 algorithm are applied for a preliminary screening of target genes capable of detecting ovarian cancer. In the second step, the classification accuracy of the target genes is tested with four SVM algorithms improved by bioinspired (meta-heuristic) algorithms, GA-SVM, PSO-SVM, ABC-SVM, and DFABC-SVM, and the target genes with the highest classification accuracy are selected. In the final step, the online biological database and analysis software IPA is used to construct the gene network of the ovarian cancer target genes, in order to understand how the target genes relate to one another across the different pathological stages. This study selects target genes including PAPPA, STAT2, and BCL2. We expect that, after validation by biological experiments, the results can serve as a basis for future ovarian cancer screening and clinical research, and can help doctors diagnose ovarian cancer early enough for early treatment, thereby increasing patient survival.
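The information-gain screening named in the first step can be illustrated with a minimal sketch. It assumes expression values have already been discretized into "low" and "high" levels; the gene names (GENE_A, GENE_B, GENE_C), the stage labels, and the helper functions are hypothetical and are not taken from the thesis. ID3 ranks attributes by this quantity, while C4.5 normalizes it into a gain ratio, which is not shown here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_genes(expression, labels):
    """Return (gene, gain) pairs sorted from most to least informative."""
    gains = {gene: information_gain(values, labels) for gene, values in expression.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)

# Toy, made-up data (assumption): discretized expression for six samples.
stages = ["early", "early", "early", "late", "late", "late"]
expression = {
    "GENE_A": ["low", "low", "low", "high", "high", "high"],   # separates the stages perfectly
    "GENE_B": ["low", "high", "low", "high", "low", "high"],   # only weakly informative
    "GENE_C": ["low", "high", "high", "low", "high", "high"],  # carries no stage information
}
ranking = rank_genes(expression, stages)
for gene, gain in ranking:
    print(f"{gene}: {gain:.3f}")
```

In a pipeline of the kind described, the top-ranked genes would then be passed to the SVM-based classifiers of the second step; that hand-off is not shown in this sketch.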
