Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/18161
Title: Prediction by using selected top genes (利用選取的顯著基因去做預測)
Author: Lin, Yu-Nong (林雨農)
Keywords: DNA microarray; Cox scores; principal component analysis; supervised principal component analysis; partial least squares
Publisher: Department of Applied Mathematics (應用數學系所)
Abstract:
DNA microarrays are a widely used and important tool in biological research, especially cancer research. In addition to survival data, gene expression data from DNA microarrays can be used to identify factors that may be associated with cancer. Gene expression data typically contain a very large number of genes but only a small number of samples, so constructing a prediction model and estimating its parameters is computationally difficult. To address this problem, we evaluated the effect of the number of top-ranked genes used. We ranked genes with two methods, p-values and Cox scores, and applied three dimension reduction methods, principal component analysis, supervised principal component analysis, and partial least squares, to build prediction models from the top-ranked genes of the training data. We then used the same top genes to compare the results on the training data with those on the test data. Finally, we compared the performance of these methods.
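The workflow described in the abstract (rank genes on the training data, keep the top-ranked ones, reduce dimension, then apply the same genes to the test data) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation. The function names `cox_scores` and `top_gene_components`, the synthetic data, and the choice of k = 20 genes are all assumptions made for the example. The Cox score here is the standard univariate score statistic of the Cox partial likelihood evaluated at zero, with no special handling of tied event times. Only the Cox-score ranking and the supervised principal component step are shown.

```python
import numpy as np

def cox_scores(X, time, event):
    """Univariate Cox score statistic for each gene (column of X).

    X: (n_samples, n_genes) expression matrix.
    time: (n_samples,) follow-up times.
    event: (n_samples,) True/1 if the event was observed, False/0 if censored.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # Row i marks the samples still at risk at sample i's time;
    # keep only the rows that belong to observed events.
    at_risk = (time[None, :] >= time[:, None]).astype(float)[event]
    n_at_risk = at_risk.sum(axis=1, keepdims=True)
    mean = (at_risk @ X) / n_at_risk           # risk-set mean per event
    mean_sq = (at_risk @ X**2) / n_at_risk     # risk-set mean of squares
    U = (X[event] - mean).sum(axis=0)          # score
    V = (mean_sq - mean**2).sum(axis=0)        # information
    return U / np.sqrt(V + 1e-12)              # small constant avoids 0/0

def top_gene_components(X_train, X_test, scores, k, n_comp=1):
    """Keep the k genes with the largest |score| and project both data sets
    onto the leading principal components fitted on the training data."""
    top = np.argsort(-np.abs(scores))[:k]
    mu = X_train[:, top].mean(axis=0)          # center with training means only
    Z_train = X_train[:, top] - mu
    Z_test = X_test[:, top] - mu
    _, _, Vt = np.linalg.svd(Z_train, full_matrices=False)
    W = Vt[:n_comp].T                          # loadings, shape (k, n_comp)
    return top, Z_train @ W, Z_test @ W

# Synthetic data (an assumption for illustration): many genes, few samples,
# with only the first 5 genes influencing the hazard.
rng = np.random.default_rng(0)
n, p = 80, 200
X = rng.normal(size=(n, p))
risk = X[:, :5].sum(axis=1)
time = rng.exponential(scale=np.exp(-risk))
event = rng.random(n) < 0.8

train, test = np.arange(50), np.arange(50, n)
scores = cox_scores(X[train], time[train], event[train])
top, pc_train, pc_test = top_gene_components(X[train], X[test], scores, k=20)
print("selected genes:", np.sort(top))
print("component shapes:", pc_train.shape, pc_test.shape)
```

The gene ranking and the component loadings are computed from the training samples only, and the test samples are projected with those same genes and loadings, which mirrors the train/test comparison the abstract describes. Varying `k` is one way to examine how the number of top-ranked genes affects the result.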
URI: http://hdl.handle.net/11455/18161
Other identifiers: U0005-2907200918580900
Appears in Collections: Department of Applied Mathematics (應用數學系所)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.