Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/98251
Title: Implementing an Efficient Coral Reefs Optimization Algorithm with Substrate Layers on Spark for the Clustering Problem
A High Performance Coral Reefs Optimization with Substrate Layers for Clustering Problem on Spark
Author: Yi-Chung Wang (王翊仲)
Keywords: data clustering; metaheuristic algorithm; coral reef optimization
Abstract:
Many results in recent years show that valuable information can be extracted from the large volumes of data collected every day, and that this information can help people make more appropriate decisions. Developing a good data analysis system for the data deluge has therefore become a popular research topic, and how to analyze such data is regarded as a promising line of research in data mining. Data clustering is a representative topic in this area because its solutions can be used to classify unlabeled data without prior knowledge. Several recent studies have applied metaheuristic algorithms to the clustering problem, and most of them obtain higher-quality results than traditional clustering algorithms. In this paper, we improve on the coral reefs optimization with substrate layers (CRO-SL) and present a high-performance clustering algorithm based on it. To reduce the computation time of the proposed algorithm, we implement it on Apache Spark. In the experiments, we use the sum of squared errors (SSE) as the measure and compare the proposed algorithm with the k-means algorithm, the genetic k-means algorithm (GKA), particle swarm optimization (PSO), and the simple coral reefs optimization (CRO) algorithm. The simulation results show that the proposed algorithm significantly reduces computation time and finds better clustering results than the other clustering algorithms.
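The abstract evaluates every algorithm by the sum of squared errors (SSE). The sketch below illustrates that measure under stated assumptions; it is not the thesis implementation. It assumes that a candidate solution (for example, one coral in the reef) is a flat vector encoding k centroids of dimension `dim`, and that each point is assigned to its nearest centroid. The helper names `decode`, `partition_sse`, and `sse` are hypothetical. The data is split into partitions so that the partial sums can be computed independently in a "map" step and then added in a "reduce" step, which roughly mirrors how such a fitness evaluation could be distributed on Spark.

```python
from functools import reduce
from typing import Iterable, List, Sequence

Point = Sequence[float]


def decode(solution: Sequence[float], k: int, dim: int) -> List[List[float]]:
    """Split a flat vector of length k * dim into k centroids (hypothetical encoding)."""
    if len(solution) != k * dim:
        raise ValueError("solution length must equal k * dim")
    return [list(solution[i * dim:(i + 1) * dim]) for i in range(k)]


def squared_distance(p: Point, c: Point) -> float:
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def partition_sse(points: Iterable[Point], centroids: List[List[float]]) -> float:
    """Partial SSE of one partition: each point adds its squared distance to the nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def sse(partitions: Iterable[Iterable[Point]], solution: Sequence[float], k: int, dim: int) -> float:
    """SSE of a candidate solution over all partitions (lower is better)."""
    centroids = decode(solution, k, dim)
    # "map" step: compute one partial SSE per partition
    partials = map(lambda part: partition_sse(part, centroids), partitions)
    # "reduce" step: add the partial sums together
    return reduce(lambda a, b: a + b, partials, 0.0)


if __name__ == "__main__":
    data = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(sse(data, [0.0, 0.5, 10.0, 10.5], k=2, dim=2))
```

With the two centroids (0, 0.5) and (10, 10.5), each of the four points lies 0.5 from its nearest centroid, so the SSE is 4 × 0.25 = 1.0. On Spark, one might express the same split with `RDD.mapPartitions` followed by `RDD.sum`, but that mapping is an assumption about one possible parallelization rather than a description of the proposed algorithm.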
URI: http://hdl.handle.net/11455/98251
Rights: Authorized for online browsing and printing of the full text; publicly available from 2018-11-29.
Appears in Collections: Department of Computer Science and Engineering

Files in This Item: nchu-107-7105056095-1.pdf (779.29 kB, Adobe PDF), available only on the university internal network.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.