Title: Parsimonious Gaussian Mixture Modelling With Missing Information
Author: Chen, Chia-Hua (陳甲樺)
Keywords: GMM; Gaussian mixture model; Missing; Parsimonious; EM algorithms; BIC; Bayesian information criterion
Publisher: Department of Applied Mathematics
Celeux and Govaert (1995, Pattern Recognition, 28, pp. 781-793) presented a new class of Gaussian mixture models (GMM) in which the within-group covariance matrices are structured parsimoniously in a geometrically interpretable way, as originally introduced by Banfield and Raftery (1993, Biometrics, 49, pp. 803-821). In this thesis, we establish computationally flexible EM-type algorithms for estimating the parameters of ten parsimonious forms of GMM under a missing at random (MAR) mechanism. For ease of computation and theoretical development, two auxiliary indicator matrices are incorporated into the estimating procedure to extract exactly the locations of the observed and missing components of each observation. Computational aspects, including the choice of starting values and the uncertainty of the clustering assessment, are also discussed. Candidate models are selected using the Bayesian information criterion (BIC), which is a reliable approximation to the Bayes factor. The practical usefulness of the proposed methodology is illustrated with real examples and simulation studies with varying proportions of missing values.
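The role of the two indicator matrices can be illustrated with a minimal sketch. It assumes that missing entries are coded as NaN and that the two matrices are built from rows of the identity matrix; the function names `indicator_matrices` and `conditional_mean_fill` are hypothetical and are not taken from the thesis. The fill step uses the standard conditional mean of a multivariate normal, which is only one ingredient of an EM-type E-step and not the full algorithm described above.

```python
import numpy as np

def indicator_matrices(x):
    """Return (O, M): identity rows selecting the observed / missing entries of x.

    Assumption: missing values are coded as NaN.
    """
    eye = np.eye(x.shape[0])
    observed = ~np.isnan(x)
    return eye[observed], eye[~observed]

def conditional_mean_fill(x, mu, sigma):
    """Fill the missing entries of x with E[x_m | x_o] under N(mu, sigma)."""
    O, M = indicator_matrices(x)
    if M.shape[0] == 0:          # nothing is missing
        return x.copy()
    if O.shape[0] == 0:          # everything is missing: fall back to the mean
        return mu.copy()
    x0 = np.where(np.isnan(x), 0.0, x)   # placeholder so products are defined
    x_o = O @ x0                          # observed sub-vector
    mu_o, mu_m = O @ mu, M @ mu           # matching pieces of the mean
    s_oo = O @ sigma @ O.T                # observed-observed covariance block
    s_mo = M @ sigma @ O.T                # missing-observed covariance block
    x_m = mu_m + s_mo @ np.linalg.solve(s_oo, x_o - mu_o)
    # Scatter both pieces back to their original positions.
    return O.T @ x_o + M.T @ x_m

x = np.array([1.0, np.nan])
mu = np.zeros(2)
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
filled = conditional_mean_fill(x, mu, sigma)
```

Writing the selection as matrix products keeps the bookkeeping uniform across observations with different missingness patterns, at the cost of some extra multiplications compared with direct indexing.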

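BIC-based model selection can be sketched as follows. The sign convention used here (smaller is better) is an assumption, since some software reports the negated quantity; the candidate labels, log-likelihoods and parameter counts are hypothetical placeholders, not results from the thesis.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC in the 'smaller is better' form: -2 * loglik + n_params * log(n_obs)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def select_by_bic(candidates, n_obs):
    """Return the label with the smallest BIC.

    candidates maps a label to (maximized log-likelihood, free parameter count).
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_obs))

# Hypothetical fitted models, for illustration only.
candidates = {
    "model_A": (-520.0, 8),    # more constrained covariance structure
    "model_B": (-515.0, 20),   # less constrained covariance structure
}
best = select_by_bic(candidates, n_obs=150)
```

With these placeholder numbers the more constrained model is preferred, because the gain in log-likelihood of the richer model does not offset its larger parameter penalty.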