Title: Evaluation of the top gene number (最佳基因個數的評估)
Author: Li, Chia-Chin (李佳瑾)
Keywords: Gene expression profiles
Gene ranking
Dimension reduction
Proportional hazards model
Publisher: Department of Applied Mathematics (應用數學系所)
Abstract: An important application of microarray gene expression data is predicting patients' clinical outcomes. Accurately selecting significant genes is a crucial step in building a good prediction model. In this study, we rank the genes of lung cancer patients by two statistics, the p-value and the Cox score, each used separately. We then examine how the number of top-ranked genes affects prediction, using principal component analysis, supervised principal components, and partial least squares, each combined with the Cox proportional hazards model, and from this we select the optimal number of genes. Finally, we rebuild the prediction models from the selected significant genes, compare them with methods from the literature, and assess predictive performance under three different evaluation criteria. The results show that the methods built on our gene selection procedure achieve better predictive performance in this comparison.
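The gene-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code: it assumes the Cox score is the univariate score statistic of the Cox partial likelihood evaluated at a zero coefficient, with each event using the full risk set at its time (Breslow-style tie handling). The function names `cox_score` and `top_genes` and the toy data are hypothetical, chosen only for illustration.

```python
import math


def cox_score(x, time, event):
    """Univariate Cox score statistic for one gene, evaluated at beta = 0.

    x: expression values per patient; time: follow-up times;
    event: 1 if the event was observed, 0 if censored.
    Assumption: each event uses every patient still at risk at its time.
    """
    u = 0.0  # score: sum over events of (x_i - risk-set mean of x)
    v = 0.0  # information: sum over events of the risk-set variance of x
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        risk = [x[j] for j in range(len(x)) if time[j] >= t_i]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        v += sum((r - mean) ** 2 for r in risk) / len(risk)
    return u / math.sqrt(v) if v > 0 else 0.0


def top_genes(expr, time, event, k):
    """Return indices of the k genes with the largest absolute Cox score.

    expr is a list of genes, each a list of per-patient expression values.
    """
    scores = [abs(cox_score(row, time, event)) for row in expr]
    order = sorted(range(len(expr)), key=lambda g: scores[g], reverse=True)
    return order[:k]


# Toy data (hypothetical): 3 genes measured on 6 patients.
expr = [
    [6, 5, 4, 3, 2, 1],  # high expression goes with early events
    [1, 1, 1, 1, 1, 1],  # constant gene, carries no information
    [2, 1, 2, 1, 2, 1],  # weakly related to survival
]
time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 1, 0, 1]

print(top_genes(expr, time, event, 2))
```

In a full analysis along the lines of the abstract, the top k genes would then be passed to a dimension-reduction step and a Cox proportional hazards model, and k would be varied to see how prediction changes; those steps are omitted here.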
Other identifiers: U0005-2307200917254000
Appears in Collections: Department of Applied Mathematics (應用數學系所)


