Please use this identifier to cite or link to this item:
標題: 一個有效的由上而下探勘頻繁項目的方法
An Efficient Top-Down Approach for Mining Frequent Patterns
作者: 林克仲
Lin, Ke-Chung
關鍵字: data mining
decomposition method
frequent itemset
出版社: 資訊科學與工程學系
引用: [References] [1]. Ramesh C. Agarwal, Charu C. Aggarwal, and V. V. V. Prasad, “Depth First Generation of Long Patterns,” Proceedings of the 6th ACM SIGKDD international conference on Knowledge discovery and data mining, 2000, pp. 108-118. [2]. R. Agrawal, T. Imielinski and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, 1993, pp. 207-216. [3]. R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 20th VLDB Conference, 1994, pp. 487-499. [4]. R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proceedings of the 11th International Conference on Data Engineering, 1995, pp. 3-14. [5]. R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proceedings of 5th International Conference on Extending Database Technology, 1996, pp. 3-17. [6]. R. Agrawal and J. C. Shafer, “Parallel Mining of Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, Issue 6, 1996, pp. 962-969. [7]. J. Berry, “Database Marketing: A Poent New Tools for Selling”, Business Week, September 5, 1994, pp. 56. [8]. D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, “MAFIA: A Maximal Frequent Itemset Algorithm,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 11, 2005, pp. 1490-1504. [9]. P. Cabeba, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi, “Discovering Data Mining from Concept to Implementation,” Prentice Hall PTR, 1998. [10]. C. J. Chen, “A Pre-fetch Mechanism for Proxy Servers: Using Association Rules,” Master Thesis, Institute of Computer Science, National Chung-Hsing University, 1998. [11]. Y. L. Chen, S. S. Chen, and P. Y. Hsu, “Mining Hybrid Sequential Patterns and Sequential Rules,” Information Systems, Vol. 27, No. 5, 2002, pp. 345-362. [12]. D. Cheung, “A Fast Distributed Algorithm for Mining Association Rules,” Proceedings of the 4th International Conference on Parallel and Distributed Information Systems, 1996, pp. 48-60. [13]. A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim, A. Nguyen, Y. Chen, and P. Dubey, “Cache-Conscious Frequent Pattern Mining on Modern and Emerging Processors,” The International Journal on Very Large Data Bases, Vol. 16, Issue 1, 2007, pp. 77-96. [14]. B. Goethals, “Memory Issues in Frequent Itemset Mining,” Proceedings of the ACM Symposium on Applied Computing, 2004, pp. 530-534. [15]. Karam Gouda, Mosab Hassaan, and Mohammed J. Zaki, “PRISM: An Effective Approach for Frequent Sequence Mining via Prime-Block Encoding,” Journal of Computer and Systems Sciences, 2010, 76(1):88-102. [16]. Go¨sta Grahne and Jianfei Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 10, 2005, pp. 1347-1362. [17]. J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns Without Candidate Generation,” Proceedings of the 2000 ACM SIGMOD International Conference on Management of data, 2000, pp. 1-12. [18]. J. Han, J. Pei, Y. Yin, and R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Vol. 8, Issue 1, 2004, pp. 53-87. [19]. J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. -C. Hsu, “Freespan: Frequent Pattern-Projected Sequential Pattern Mining,” Proceedings of International Conference on Knowledge Discovery and Data Mining, 2000, pp. 355-359. [20]. E. H. Han, G. Karypis, and V. Kumar, “Scalable Parallel Data Mining for Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, Issue 6, 2000, pp. 337-352. [21]. Jen-Wei Huang, Su-Chen Lin, and Ming-Syan Chen, “DPSP: Distributed Progressive Sequential Pattern Mining on the Cloud,” Proceedings of PAKDD (2), 2010, pp. 27-34. [22]. Kuen-Fang Jea, Ke Chung Lin, and I-En Liao, “Mining Hybrid Sequential Patterns by Hierarchical Mining Technique,” International Journal of Innovative Computing, Information and Control, Vol. 5, No. 8, August 2009, pp.2351-2367. [23]. H. F. Li and S. Y. Lee, “Mining Frequent Itemsets over Data Streams using Efficient Window Sliding Techniques,” Expert Systems with Applications, 2009, pp. 1466-1477. [24]. I-En Liao, Ke-Chung Lin, and Hong-Bin Chen, “Mining Frequent Itemsets by Transaction Decomposition with Itemset Clustering,” Proceedings of DMIN''2009, pp. 18~24. [25]. Dao-I Lin and Zvi M. Kedem, “Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set,” IEEE Transactions on Knowledge and Data Engineerin, Vol. 14, No. 3, 2002, pp. 553-566. [26]. Ke-Chung Lin, I-En Liao, and Zhi-Sheng Chen, “An Improved Frequent Pattern Growth Method for Mining Association Rules,” Expert Systems with Applications, 38, 2011, pp. 5154-5161. [27]. Ming-Yen Lin and Suh-Yin Lee, “Fast discovery of sequential patterns through memory indexing and database partitioning,” Journal Information Science and Engineering, Vol. 21, No. 1, 2005, pp. 109-128. [28]. J. Liu, Y. Pan, K. Wang, and J. Han, “Mining frequent item sets by opportunistic projection,” Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, 2002, pp. 229-238. [29]. J. S. Park, M. S. Chen, and P. S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules,” Proceedings of the ACM SIGMOD Conference, 1995, pp. 175-186. [30]. J. S. Park, M. S. Chen, and P. S. Yu, “Efficient Parallel Data Mining for Association Rules,” Proceedings of the ACM Information and Knowledge Management International Conference, 1995, pp. 31-36. [31]. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal and M.-C. Hsu, Mining sequential patterns by pattern-growth: the prefixspan approach, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, 2004, pp. 1424-1440. [32]. S. Orlando, C. Lucchese, P. Palmerini, R. Perego, and F. Silvestri, “kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,” Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003. [33]. B. Racz, “nonordfp: An FP-growth variation without rebuilding the FP-tree,” Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2004. [34]. G. T. Raju, P. S. Satyanarayana and L. M. Patnaik, Knowledge discovery from web usage data:extraction and applications of sequential and clustering patterns - a survey, International Journal of Innovative Computing, Information and Control, Vol. 4, No. 2, 2008, pp. 381-389. [35]. Mingjun Song and Sanguthevar Rajasekaran, “Finding Frequent Itemsets by Transaction Mapping,” Proceedings of the 2005 ACM symposium on Applied computing, 2005, pp. 498-492. [36]. Ja-Hwung Su and Wen-Yang Lin, “CBW: An Efficient Algorithm for Frequent Itemset Mining,” Proceedings of the 37th Annual Hawaii International Conference on System science, 2004, pp. 30064c. [37]. Y. J. Tsay and Y. W. Chang-Chien, “An efficient cluster and decomposition algorithm for mining association rules,” Information Sciences, Vol. 160, Issue 1-4, 2004, pp. 161-171. [38]. Y. K. Woon and W. K. Ng, “A Support-Ordered Trie for Fast Frequent Itemset Discovery,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 7, 2004, pp. 875-879. [39]. Y. K. Woon, W. K. Ng, and A. Das, “Fast Online Dynamic Association Rule Mining,” Proceeding of the Second Int'l Conf. Web Information System Engineering, 2001, pp. 278-287. [40]. S. Y. Wru and Y. Leu, “An effective boolean algorithm for mining association rules in large databases,” Proceedings of the 6th International Conference on Database System for Advanced Applications, 1999, pp. 179-186. [41]. D. L. Yang, C. T. Pan, and Y. C. Chung, “An Efficient Hash-Based Method for Discovering the Maximal Frequent Set,” Proceedings of the 25th Annual International Computer Software and Applications Conference, 2001, pp. 511-516. [42]. M. J. Zaki, “Scalable Algorithms for Association Mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, Issue 3, 2000, pp. 372-390. [43]. M. J. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning Journal, Vol. 42, No. 1, 2001, pp. 31-60. [44]. [45].
摘要: 在此篇論文中我們提出兩個重要的方式去解決混合式高頻序列探勘及高頻項目集探勘的問題。混合式高頻序列及高頻項目集這兩種技術的重要性在於可以提供市場分析裡所需要的重要資訊。而當所需要處理的資料及複雜度增加時,高效率的計算則成為了這兩種技術不可或缺的考量,故在此論文中,我們首先提出一個階層式的探勘技術來解決效率上的問題。在這個階層式探勘技術中主要包含兩種技術,第一是利用計數一個序列來取代計數一個群組中所有序列的概念,第二是利用交易分解的方式來計數支持度。而後,我們又提出另一種改良交易分解的演算法來探勘高頻項目集,利用分群的方式將資料庫內的所有項目集分成不同的群組,再在這些群組內執行交易分解的方法來探勘出所有的高頻項目集。從實驗數據顯示,我們提出的方法在執行效率上都有很優良的表現。 我們在此論文中一共有四個貢獻,第一,我們提出利用一個序列來取代一整個群組中的所有序列,當此一序列被計數時,同時也計數到了群組中的所有序列;第二,我們提出了一種交易分解的探勘方式,利用此種探勘方式,我們可以降低I/O的成本輸出;第三,我們在論文中証明了交易分解的正確性;最後,我們提出了改良交易分解的另一演算法,並實作其他演算法來和我們所提出的演算法進行效率上的比較。
In this thesis, we propose two important methods to deal with the problems of mining hybrid sequential patterns and mining frequent itemsets. Hybrid sequential patterns and frequent itemsets are two mining techniques which provide useful information for the improvement of the analysis of marketing strategies. High-performance computing is crucial for these two techniques in order to ensure system scalability when the size and complexity of the database increase. We first propose a hierarchical mining technique to deal with the performance issue. The unique features of this new technique include the following: counting hybrid sequential patterns by class and examining database transactions in a top-down manner. Further, in order to improve the space problem in the transaction decomposition method, we propose another algorithm that combines the transaction decomposition method and the clustering technique. The improved algorithm separates the lattice into small pieces so that each portion can be solved by the decomposition method independently. Experimental results show that this improved algorithm has a fast execution time with respect to the mining of a transaction database. In summary, this thesis makes four major contributions to the field of data mining. First, a new method to count a group of patterns simultaneously is provided by the proposed pattern-class concept. Second, a novel decomposition model to lower the I/O cost for counting the patterns from a large database is proposed. Third, the correctness of counting the patterns in the decomposition method is proved. Lastly, an algorithm for improving the decomposition method is introduced, and the performance comparison of the proposed algorithm with the existing algorithms is presented in this thesis.
其他識別: U0005-0902201109083300
Appears in Collections:資訊科學與工程學系所



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.