Please use this identifier to cite or link to this item:
Title: 改善自我組織圖群聚分析效率,準確度與視覺化的新方法
New Schemes on Improving Clustering Efficiency, Accuracy, and Visualization of Self-Organizing Maps.
Author: 謝淑玲
Shieh, Shu-Ling
Keywords: Neural Network
Self-Organizing Maps
Unsupervised Learning
Reference Point
Clustering Method
Clustering Validity Index
Visualization
Publisher: Department of Computer Science and Engineering
Abstract: A self-organizing map (SOM) is a popular unsupervised neural network that maps high-dimensional data onto a two-dimensional or one-dimensional coordinate space to support data clustering and visualization. SOMs are powerful tools for exploratory cluster analysis. Clustering is arguably the most important task in unsupervised learning, and clustering validity is a major issue in cluster analysis. In this dissertation, we propose three SOM-based algorithms that improve the clustering efficiency, accuracy, and visualization of SOMs.

In the first algorithm, we propose an efficient SOM algorithm based on a reference point and filters, called Reference Point SOM (RPSOM), which improves execution time by filtering the data with two thresholds. The first threshold serves as the search boundary when finding the winner neuron for an input vector. The second threshold bounds the region within which the winner neuron searches for its neighbors. The proposed algorithm reduces the time complexity of finding the initial neurons from O(n^2) to O(n). RPSOM thus dramatically improves computational complexity, especially for large data sets. The experimental results show that, compared with traditional methods, RPSOM constructs a better initial map and achieves better execution time.

In the second algorithm, a new clustering validity index is proposed to generate the clustering result of a two-level SOM. The index combines the inter-cluster separation rate, the inter-cluster relative density, and the intra-cluster cohesion rate. It is used to find the optimal number of clusters and to determine which two neighboring clusters can be merged in the hierarchical clustering of a two-level SOM. Experiments show that the proposed algorithm clusters data more accurately than classical SOM-based clustering algorithms and is better able to find an optimal number of clusters by maximizing the clustering validity index.

In the third method, we develop a novel SOM-based methodology for data clustering and visualization. The main process can be summarized as follows. If the data set is unlabeled, we calculate the best number of clusters in advance and then assign the resulting cluster labels to the neurons. Once the labels are assigned, we apply the proposed method to enhance the visualization. When the data set is labeled, our visualization method directly represents the cluster structure in an enhanced visual form. The experimental results show that the proposed visualization method efficiently and effectively presents the data distribution, inter-neuron distances, and cluster boundaries. The proposed visualization scheme therefore makes clustering results intuitively easy to understand and also visualizes unlabeled data sets well.
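For orientation, the winner search and neighborhood update mentioned in the abstract can be sketched for a classical SOM as follows. This is a minimal illustration and not RPSOM itself: the reference-point filtering and the exact definitions of the two thresholds are not given in this record. The function names (`find_winner`, `som_step`, `train`), the Gaussian neighborhood, the linear decay schedule, and the `radius` cutoff are all assumptions made for the example.

```python
import math
import random


def find_winner(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def som_step(weights, grid, x, lr, radius):
    """Apply one classical SOM update for input x and return the winner index.

    weights: list of weight vectors (lists of floats), updated in place
    grid:    list of (row, col) map coordinates, one per neuron
    lr:      learning rate in (0, 1]
    radius:  neighborhood cutoff on the map grid (hypothetical parameter)
    """
    winner = find_winner(weights, x)
    for i, pos in enumerate(grid):
        d = math.dist(pos, grid[winner])
        if d > radius:
            continue  # neurons outside the cutoff are left unchanged
        h = math.exp(-(d * d) / (2.0 * radius * radius))  # Gaussian neighborhood
        weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
    return winner


def train(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols map with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)  # keep the radius strictly positive
        for x in data:
            som_step(weights, grid, x, lr, radius)
    return weights, grid
```

Calling `train(data, rows, cols)` on a list of equal-length numeric vectors returns the trained weight vectors together with their grid coordinates. Note that the winner search here scans every neuron for every input; the abstract indicates that RPSOM instead bounds this search with a threshold, which this sketch does not attempt to reproduce.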
Other Identifiers: U0005-1901201013132900
Appears in Collections: Department of Computer Science and Engineering


