Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/17728
Title: Parsimonious Gaussian Mixture Modelling With Missing Information (具遺失訊息下混合高斯分佈的精簡建模)
Author: Chen, Chia-Hua (陳甲樺)
Keywords: GMM; Gaussian mixture model; Missing; Parsimonious; EM algorithms; BIC; Bayesian information criterion
Publisher: Department of Applied Mathematics
References:
Banfield, J.D., Raftery, A.E. (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803-821.
Celeux, G., Govaert, G. (1995) Gaussian parsimonious clustering models. Pattern Recognition 28, 781-793.
Day, N.E. (1969) Estimating the Components of a Mixture of Normal Distributions. Biometrika 56(3), 463-474.
Dempster, A., Laird, N., Rubin, D. (1977) Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B 39, 1-38.
Diebolt, J., Robert, C.P. (1994) Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society. Series B 56, 363-375.
Fraley, C., Raftery, A.E. (1998) How many clusters? Which clustering methods? Answers via model-based cluster analysis. Computer Journal 41, 578-588.
Fraley, C., Raftery, A.E. (2002) Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97, 611-612.
Ghahramani, Z., Hinton, G.E. (1997) The EM algorithm for mixtures of factor analyzers (Tech. Report No. CRG-TR-96-1), University of Toronto.
Healy, M.J.R. (1968) Multivariate normal plotting. Applied Statistics 17, 157-161.
Jain, A.K., Duin, R.P.W., Mao, J. (2000) Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), 4-48.
Kim, J.O., Curry, J. (1977) The treatment of missing data in multivariate analysis. Social. Meth. Res. 6, 215-240.
Lin, T.I. (2009) Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis 100, 257-265.
Lin, T.I., Lee, J.C., Ho, H.J. (2006) On fast supervised learning for normal mixture models with missing information. Pattern Recognition 39, 1177-1187.
Lin, T.C., Lin, T.I. (2010) Supervised learning of multivariate skew normal mixture models with missing information. Computational Statistics 25, 183-201.
Liu, C.H., Rubin, D.B. (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 633-648.
Liu, C.H., Rubin, D.B. (1995) ML estimation of the t distribution using EM and its extensions, ECM and ECME. Statistica Sinica 5, 19-39.
Little, R.J.A., Rubin, D.B. (1987) Statistical analysis with missing data. Wiley, New York.
McLachlan, G.J., Basford, K.E. (1988) Mixture models: Inference and applications to clustering. Marcel Dekker, New York.
McLachlan, G.J., Peel, D. (2000) Finite Mixture Models. John Wiley and Sons, New York.
McLachlan, G.J., Krishnan, T. (2008) The EM Algorithm and Extensions, 2nd edn. John Wiley and Sons, New York.
McNicholas, P.D., Murphy, T.B. (2008) Parsimonious Gaussian mixture models. Statistics and Computing 18, 285-296.
McNicholas, P.D. (2010) Model-based classification using latent Gaussian mixture models. Journal of Statistical Planning and Inference 140, 1175-1181.
McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D. (2010) Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics and Data Analysis 54, 711-723.
Meng, X.L., Rubin, D.B. (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80, 267-278.
Meng, X.L., van Dyk, D. (1997) The EM algorithm: an old folk-song sung to a fast new tune. Journal of the Royal Statistical Society. Series B 59, 511-567.
Nishida, M., Kawahara, T. (2005) Speaker Model Selection Based on the Bayesian Information Criterion Applied to Unsupervised Speaker Indexing. IEEE Transactions on Speech and Audio Processing 13(4).
Schwarz, G. (1978) Estimating the dimension of a model. The Annals of Statistics 6, 461-464.
Tipping, M.E., Bishop, C.M. (1999) Mixtures of probabilistic principal component analyzers. Neural Computation 11, 443-482.
Ueda, N., Nakano, R., Ghahramani, Z., Hinton, G.E. (2000) SMEM algorithm for mixture models. Neural Computation 12, 2109-2128.
Zhao, J.H., Yu, P.L.H. (2008) Fast ML Estimation for the Mixture of Factor Analyzers via an ECM Algorithm. IEEE Transactions on Neural Networks 19, 1956-1961.
Zhao, J.H., Yu, P.L.H., Jiang, Q. (2008) ML estimation for factor analysis: EM or non-EM? Statistics and Computing 18, 109-123.
Abstract:
Celeux and Govaert (1995, Pattern Recognition, 28, pp. 781-793) presented a new class of Gaussian mixture models (GMM) in which the within-group covariance matrices are structured parsimoniously in a geometrically interpretable way, an idea originally introduced by Banfield and Raftery (1993, Biometrics, 49, pp. 803-821). In this thesis, we establish computationally flexible EM-type algorithms for estimating the parameters of ten parsimonious forms of GMM under a missing at random (MAR) mechanism. For ease of computation and theoretical development, two auxiliary indicator matrices are incorporated into the estimating procedure to identify exactly the positions of the observed and missing components of each observation. Computational aspects, including the choice of starting values and the uncertainty of the clustering assessment, are also discussed. Candidate models are selected using the Bayesian information criterion (BIC), which is a reliable approximation to the Bayes factor. The practical usefulness of the proposed methodology is illustrated with real examples and with simulation studies under varying proportions of missing values.
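The abstract does not define the two auxiliary indicator matrices, so the following is only a minimal sketch of one common construction, not the thesis's own formulation. It assumes that missing entries are coded as NaN, that the matrices (here called O and M, hypothetical names) are the rows of the identity matrix that pick out the observed and missing coordinates, and that the quantity of interest is the standard Gaussian conditional mean of the missing part given the observed part, as would be needed in an E-step for a single mixture component.

```python
import numpy as np

def indicator_matrices(x):
    """Return (O, M): identity rows selecting observed / missing entries of x.

    Assumption: missing values in x are coded as NaN.
    """
    miss = np.isnan(x)
    eye = np.eye(x.size)
    return eye[~miss], eye[miss]

def conditional_mean_missing(x, mu, sigma):
    """Conditional mean of the missing part of x given its observed part,
    for one Gaussian component with mean mu and covariance sigma."""
    O, M = indicator_matrices(x)
    # Replace NaN by 0 so that O @ x is well defined (0 * NaN would be NaN).
    x_filled = np.where(np.isnan(x), 0.0, x)
    x_o = O @ x_filled                 # observed values
    s_oo = O @ sigma @ O.T             # covariance of the observed block
    s_mo = M @ sigma @ O.T             # cross-covariance missing vs observed
    return M @ mu + s_mo @ np.linalg.solve(s_oo, x_o - O @ mu)

# Usage: second coordinate missing, correlated 0.5 with the first.
x = np.array([1.0, np.nan, 3.0])
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
print(conditional_mean_missing(x, mu, sigma))
```

Writing the observed and missing blocks as products with O and M keeps every observation in the same p-dimensional notation regardless of its missingness pattern, which is presumably the computational convenience the abstract refers to.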
URI: http://hdl.handle.net/11455/17728
Other identifiers: U0005-1906201120480700
Appears in Collections: Department of Applied Mathematics

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.