Title: 赤池信息準則與循序的靴環概度比測驗應用於檢驗稻米之新/舊米混合狀態的比較
Comparison Between Akaike Information Criteria and Sequential Bootstrap Likelihood Ratio Test in Examining the Blending Status of Fresh/Old Grains
Author: 莊凱恩
Chuang, Kai-En
Keywords: freshness;新鮮度;mixture distribution;Akaike information criterion;sequential bootstrap likelihood ratio test;混合分布;赤池信息準則;循序的靴環概度比測驗
Publisher: Department of Agronomy (農藝學系所)
References:
Aitkin, M., D. Anderson, and J. Hinde. 1981. Statistical modeling of data on teaching styles (with discussion). J. R. Statist. Soc. B 144: 419-461.
Akaike, H. 1974. A new look at the statistical model identification. IEEE Trans. Autom. Contr. 19: 716-723.
Butler, R. W. 1986. Predictive likelihood inference with applications. J. R. Statist. Soc. B 48: 1-38.
Bhattacharya, C. G. 1967. A simple method for resolution of a distribution into its Gaussian components. Biometrics 23: 115-135.
Bozdogan, H. 1987. Model selection and Akaike information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52: 345-370.
Burnham, K. P. and D. R. Anderson. 2004. Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33: 261-304.
Celeux, G. and G. Soromenho. 1996. An entropy criterion for assessing the number of clusters in a mixture model. J. Classification 13: 195-212.
Chen, T. F. and C. L. Chen. 2003. Analysing the freshness of intact rice grains by color determination of peroxidase activity. J. Sci. Food Agric. 81: 1214-1218.
Chernoff, H. 1954. On the distribution of the likelihood ratio. Ann. Math. Statist. 25: 573-578.
Cohen, A. C. 1967. Estimation in mixtures of two normal distributions. Technometrics 9: 15-28.
Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc. B 39: 1-38.
Efron, B. 1979. Bootstrap methods: another look at the jackknife. Ann. Statist. 7: 1-26.
Feng, Z. and C. E. McCulloch. 1992. Statistical inference using maximum likelihood estimation and the generalized likelihood ratio when the true parameter is on the boundary of the parameter space. Statist. Probab. Lett. 13: 325-332.
Fowlkes, E. B. 1979. Some methods for studying the mixture of two normal (lognormal) distributions. J. Amer. Statist. Assoc. 74: 561-575.
Fryer, J. G. and C. A. Robertson. 1972. A comparison of some methods for estimating mixed normal distributions. Biometrika 59: 639-648.
Ghosh, J. K. and P. K. Sen. 1985. On the asymptotic performance of the log-likelihood ratio statistic for the mixture model and related results. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer. Ed. L. M. Le Cam and R. A. Olshen. 2: 789-806. Wadsworth, Monterey.
Harding, J. P. 1949. The use of probability paper for graphical analysis of polymodal frequency distribution. J. Marine Biological Assoc. 28: 141-153.
Hathaway, R. J. 1986. A constrained EM algorithm for univariate normal mixtures. Comput. Statist. Simul. 23: 211-230.
Hope, A. C. 1968. A simplified Monte Carlo significance test procedure. J. R. Statist. Soc. 30: 582-598.
Hurvich, C. M. and C. L. Tsai. 1989. Regression and time series model selection in small samples. Biometrika 76: 297-307.
Leroux, M. 1992. Consistent estimation of a mixing distribution. Ann. Statist. 20: 1350-1360.
Lii, L. J., C. Y. Wang, and H. S. Lur. 1999. A novel means of analyzing the soluble acidity of rice grains. Crop Sci. 39: 1160-1164.
Lin, J. L. and C. L. Chen. 2008. Statistical analysis of the freshness measurements at single-grain level when examining the blending status of fresh/old grains in a batch of rice grains. Crop, Environment & Bioinformatics 5: 281-296.
McLachlan, G. J. 1987. On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Biometrika 88: 767-778.
McLachlan, G. J. and K. E. Basford. 1988. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York.
Meng, X.-L. and D. van Dyk. 1997. The EM algorithm: an old folk-song sung to a fast new tune. J. R. Statist. Soc. B 59: 511-567.
Newcomb, S. 1886. A generalized theory of the combination of observations so as to obtain the best result. Amer. J. Math. 8: 343-366.
Ng, S.-K. and G. J. McLachlan. 2003. On some variants of the EM algorithm for fitting finite mixture models. Aust. J. Stat. 32: 143-161.
Nityasuddhi, D. and D. Böhning. 2003. Asymptotic properties of the EM algorithm estimate for normal mixture models with component specific variances. Comput. Statist. Data Anal. 41: 591-601.
Oliveira-Brochado, A. and F. V. Martins. 2005. Assessing the number of components in mixture models: a review. FEP Working Papers 194.
Pearson, K. 1894. Contributions to the mathematical theory of evolution. Phil. Trans. R. Soc. A 185: 71-110.
Rao, C. R. 1948. The utilization of multiple measurements in problems of biological classification. J. R. Statist. Soc. B 10: 159-203.
Redner, R. A. and H. F. Walker. 1984. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review 26: 195-239.
Sclove, S. L. 1983. Application of the conditional population-mixture model to image segmentation. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-5: 428-433.
Self, S. G. and K. Y. Liang. 1987. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Amer. Statist. Assoc. 82: 605-610.
Shono, H. 2005. Is model selection using Akaike's information criterion appropriate for catch per unit effort standardization in large samples? Fish. Sci. 71: 978-986.
Tan, W. Y. and W. C. Chang. 1972. Some comparison of the method of moments and the method of maximum likelihood in estimating parameters of a mixture of two normal densities. J. Amer. Statist. Assoc. 67: 702-708.
Titterington, D. M., A. F. M. Smith, and U. E. Makov. 1985. Statistical analysis of finite mixture distributions. Wiley, New York.
Wolfe, J. H. 1971. A Monte Carlo study of the sampling distribution of the likelihood ratio for mixture of multinormal distributions. Technical STB 72-2. San Diego: U. S. Naval Personnel and Training Research Laboratory.
Wu, C. F. 1983. On the convergence properties of the EM algorithm. Ann. Statist. 11: 95-103.
檢驗一批米是否有新/舊米混合的狀態,在統計上的意義即是在測驗一個混合分布的成分數。Lin and Chen (2008)使用循序的靴環概度比測驗來測驗成分數,但是循序的靴環概度比測驗計算過程繁複,電算速率低下。因此,我們希望以算程較為簡便的赤池信息準則來簡化分析程序。
我們以模擬試驗比較兩個方法的表現。模擬的範圍包括純新米(成分數g = 1)與有舊米混入(成分數g = 2, 3, 4),以及不同舊米混入率等數種情境。結果顯示,在純新米的情形下,赤池信息準則的經驗第一型錯誤率隨抽驗米粒數增加而降低;而循序的靴環概度比測驗則是與預設的顯著水準沒有顯著差異。而在混有舊米的情形,兩個方法的經驗檢驗力皆良好;並且均隨抽驗米粒數增加而上升。在正確估計成分數的部份,赤池信息準則在抽驗米粒數增加時,容易高估成分數。而循序的靴環概度比測驗在只有前一期舊米混入時(成分數g = 2),正確估計成分數的經驗機率高於赤池信息準則;但在有兩期舊米以上混入時(成分數g = 3, 4),則表現不如赤池信息準則。
若是只關心一批米是否有舊米混入的情形(H0: g = 1 vs. H1: g ≥ 2),赤池信息準則在抽驗米粒數n > 672時,經驗檢驗力便高達95%。而循序的靴環概度比測驗則在抽驗米粒數n > 480就可達95%以上。此外赤池信息準則無法估計各成分混入率的信賴區間,必須仰賴循序的靴環概度比測驗。並且在估計成分數的部份,赤池信息準則也有高估的疑慮。雖然赤池信息準則的電算時間較短,但是實際時間的差距不大,無法成為取代循序的靴環概度比測驗的理由。

Examining the blending status of old/fresh rice grains amounts, in statistical terms, to testing the number of components of a mixture distribution. Lin and Chen (2008) employed the sequential bootstrap likelihood ratio test (sequential bootstrap LRT) to test the number of components, that is, of storage durations, in a rice batch. The computation required by the sequential bootstrap LRT, however, is complicated and slow. The current study therefore examined whether the Akaike information criterion (AIC), which is much simpler to compute, could simplify the analytical procedure.
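As an illustration of what AIC-based selection of the number of components involves, the following is a minimal sketch and not the procedure used in the thesis. It assumes a univariate normal mixture fitted by plain EM, with AIC = -2 log L + 2k and k = 3g - 1 free parameters (g means, g variances, g - 1 mixing proportions). The function names, the quantile-based starting values, and the variance floor are all hypothetical choices made for this example.

```python
import numpy as np

def mixture_loglik(x, weights, means, variances):
    """Log-likelihood of a univariate normal mixture at the given parameters."""
    log_dens = -0.5 * (np.log(2 * np.pi * variances)
                       + (x[:, None] - means) ** 2 / variances)
    return np.logaddexp.reduce(np.log(weights) + log_dens, axis=1).sum()

def fit_mixture_em(x, g, n_iter=300):
    """Fit a g-component normal mixture by plain EM; return the final log-likelihood."""
    x = np.asarray(x, dtype=float)
    # Assumed starting values: means at evenly spaced quantiles, equal weights.
    means = np.quantile(x, (np.arange(g) + 0.5) / g)
    variances = np.full(g, x.var())
    weights = np.full(g, 1.0 / g)
    floor = 1e-6 * x.var()  # assumed variance floor to avoid degenerate components
    for _ in range(n_iter):
        # E-step: posterior probability that each grain belongs to each component.
        log_dens = -0.5 * (np.log(2 * np.pi * variances)
                           + (x[:, None] - means) ** 2 / variances)
        log_w = np.log(weights) + log_dens
        resp = np.exp(log_w - np.logaddexp.reduce(log_w, axis=1)[:, None])
        # M-step: weighted updates of mixing proportions, means and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = np.maximum(
            (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk, floor)
    return mixture_loglik(x, weights, means, variances)

def select_g_by_aic(x, max_g=4):
    """Return the g in 1..max_g with the smallest AIC, plus all AIC values."""
    aic = {}
    for g in range(1, max_g + 1):
        n_params = 3 * g - 1  # g means + g variances + (g - 1) free proportions
        aic[g] = -2.0 * fit_mixture_em(x, g) + 2.0 * n_params
    best_g = min(aic, key=aic.get)
    return best_g, aic

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated freshness scores: 70% "fresh" and 30% "old" grains (made-up values).
    scores = np.concatenate([rng.normal(0, 1, 420), rng.normal(6, 1, 180)])
    print(select_g_by_aic(scores))
```

A single EM run from fixed starting values can stop at a local maximum, so a practical analysis would normally use several starts per value of g.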
We compared the performance of the two methods by simulation. The simulated scenarios ranged from pure fresh rice (number of components g = 1) to mixtures of old and fresh rice (g = 2, 3, 4) with various mixing proportions. For pure fresh rice, the empirical type I error rate of the AIC method decreases as the number of grains examined increases, whereas that of the sequential bootstrap LRT does not differ significantly from the preset significance level. For mixtures of old and fresh rice, both methods show good empirical power, and the power of both rises as the number of grains examined increases. As for estimating the number of components correctly, the AIC method tends to overestimate it as the number of grains examined increases. The sequential bootstrap LRT estimates the number of components correctly with higher empirical probability than the AIC method when only old rice from the preceding season is mixed in (g = 2), but it performs worse than the AIC method when old rice from two or more seasons is mixed in (g = 3 or 4).
If the only concern is whether a batch of rice contains any old rice at all (H0: g = 1 vs. H1: g ≥ 2), the empirical power of the AIC method reaches 95% when the number of grains examined exceeds 672, whereas that of the sequential bootstrap LRT reaches 95% or more once the number of grains examined exceeds 480. In addition, the AIC method cannot provide confidence intervals for the mixing proportions, which must be obtained with the sequential bootstrap LRT, and it carries the risk of overestimating the number of components. Although the AIC method requires less computing time than the sequential bootstrap LRT, the difference in practice is small. There is therefore no good reason to replace the sequential bootstrap LRT with the AIC method.
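The test of H0: g = 1 against H1: g ≥ 2 can be illustrated with a parametric bootstrap of the likelihood ratio statistic. The sketch below is an assumption-laden illustration and not the procedure of Lin and Chen (2008): it covers only the first step of a sequential test (one normal against a two-component normal mixture), uses a plain EM fit with hypothetical starting values, and all function names are invented for this example. A sequential version would repeat the same idea for g against g + 1 until the null is no longer rejected.

```python
import numpy as np

def loglik_one_normal(x):
    """Maximized log-likelihood of a single normal distribution (g = 1)."""
    var = x.var()  # maximum likelihood variance (divides by n)
    return -0.5 * x.size * (np.log(2 * np.pi * var) + 1.0)

def log_joint(x, weights, means, variances):
    """Log of weight times normal density, one column per component."""
    return np.log(weights) - 0.5 * (np.log(2 * np.pi * variances)
                                    + (x[:, None] - means) ** 2 / variances)

def loglik_two_normals(x, n_iter=200):
    """Log-likelihood of a two-component normal mixture after a plain EM fit."""
    means = np.quantile(x, [0.25, 0.75])  # assumed starting values
    variances = np.full(2, x.var())
    weights = np.array([0.5, 0.5])
    floor = 1e-6 * x.var()  # assumed variance floor
    for _ in range(n_iter):
        lj = log_joint(x, weights, means, variances)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = np.maximum(
            (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk, floor)
    return np.logaddexp.reduce(log_joint(x, weights, means, variances), axis=1).sum()

def lrt_statistic(x):
    """Likelihood ratio statistic for g = 1 vs. g = 2, truncated at zero."""
    return max(0.0, 2.0 * (loglik_two_normals(x) - loglik_one_normal(x)))

def bootstrap_lrt(x, n_boot=99, seed=0):
    """Parametric bootstrap p-value: resample from the fitted single normal."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    observed = lrt_statistic(x)
    mu0, sd0 = x.mean(), x.std()
    boot = np.array([lrt_statistic(rng.normal(mu0, sd0, x.size))
                     for _ in range(n_boot)])
    p_value = (1 + np.sum(boot >= observed)) / (n_boot + 1)
    return observed, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated freshness scores for a half-and-half blend (made-up values).
    scores = np.concatenate([rng.normal(0, 1, 100), rng.normal(6, 1, 100)])
    print(bootstrap_lrt(scores))
```

Each bootstrap replicate requires a fresh EM fit, which is where the extra computing cost of the bootstrap approach relative to a single AIC comparison comes from.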
Other Identifiers: U0005-2501201112350000
Appears in Collections: Department of Agronomy (農藝學系)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.