Please use this identifier to cite or link to this item:
標題: 一個有效的由上而下探勘頻繁項目的方法
An Efficient Top-Down Approach for Mining Frequent Patterns
作者: 林克仲
Lin, Ke-Chung
關鍵字: data mining;資料探勘;decomposition method;frequent itemset;交易分解;高頻項目集
出版社: 資訊科學與工程學系
引用: [References] [1]. Ramesh C. Agarwal, Charu C. Aggarwal, and V. V. V. Prasad, “Depth First Generation of Long Patterns,” Proceedings of the 6th ACM SIGKDD international conference on Knowledge discovery and data mining, 2000, pp. 108-118. [2]. R. Agrawal, T. Imielinski and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, 1993, pp. 207-216. [3]. R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 20th VLDB Conference, 1994, pp. 487-499. [4]. R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proceedings of the 11th International Conference on Data Engineering, 1995, pp. 3-14. [5]. R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proceedings of 5th International Conference on Extending Database Technology, 1996, pp. 3-17. [6]. R. Agrawal and J. C. Shafer, “Parallel Mining of Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, Issue 6, 1996, pp. 962-969. [7]. J. Berry, “Database Marketing: A Poent New Tools for Selling”, Business Week, September 5, 1994, pp. 56. [8]. D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, “MAFIA: A Maximal Frequent Itemset Algorithm,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 11, 2005, pp. 1490-1504. [9]. P. Cabeba, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi, “Discovering Data Mining from Concept to Implementation,” Prentice Hall PTR, 1998. [10]. C. J. Chen, “A Pre-fetch Mechanism for Proxy Servers: Using Association Rules,” Master Thesis, Institute of Computer Science, National Chung-Hsing University, 1998. [11]. Y. L. Chen, S. S. Chen, and P. Y. Hsu, “Mining Hybrid Sequential Patterns and Sequential Rules,” Information Systems, Vol. 27, No. 5, 2002, pp. 345-362. [12]. D. Cheung, “A Fast Distributed Algorithm for Mining Association Rules,” Proceedings of the 4th International Conference on Parallel and Distributed Information Systems, 1996, pp. 48-60. [13]. A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim, A. Nguyen, Y. Chen, and P. Dubey, “Cache-Conscious Frequent Pattern Mining on Modern and Emerging Processors,” The International Journal on Very Large Data Bases, Vol. 16, Issue 1, 2007, pp. 77-96. [14]. B. Goethals, “Memory Issues in Frequent Itemset Mining,” Proceedings of the ACM Symposium on Applied Computing, 2004, pp. 530-534. [15]. Karam Gouda, Mosab Hassaan, and Mohammed J. Zaki, “PRISM: An Effective Approach for Frequent Sequence Mining via Prime-Block Encoding,” Journal of Computer and Systems Sciences, 2010, 76(1):88-102. [16]. Go¨sta Grahne and Jianfei Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 10, 2005, pp. 1347-1362. [17]. J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns Without Candidate Generation,” Proceedings of the 2000 ACM SIGMOD International Conference on Management of data, 2000, pp. 1-12. [18]. J. Han, J. Pei, Y. Yin, and R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Vol. 8, Issue 1, 2004, pp. 53-87. [19]. J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. -C. Hsu, “Freespan: Frequent Pattern-Projected Sequential Pattern Mining,” Proceedings of International Conference on Knowledge Discovery and Data Mining, 2000, pp. 355-359. [20]. E. H. Han, G. Karypis, and V. Kumar, “Scalable Parallel Data Mining for Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, Issue 6, 2000, pp. 337-352. [21]. Jen-Wei Huang, Su-Chen Lin, and Ming-Syan Chen, “DPSP: Distributed Progressive Sequential Pattern Mining on the Cloud,” Proceedings of PAKDD (2), 2010, pp. 27-34. [22]. Kuen-Fang Jea, Ke Chung Lin, and I-En Liao, “Mining Hybrid Sequential Patterns by Hierarchical Mining Technique,” International Journal of Innovative Computing, Information and Control, Vol. 5, No. 8, August 2009, pp.2351-2367. [23]. H. F. Li and S. Y. Lee, “Mining Frequent Itemsets over Data Streams using Efficient Window Sliding Techniques,” Expert Systems with Applications, 2009, pp. 1466-1477. [24]. I-En Liao, Ke-Chung Lin, and Hong-Bin Chen, “Mining Frequent Itemsets by Transaction Decomposition with Itemset Clustering,” Proceedings of DMIN''2009, pp. 18~24. [25]. Dao-I Lin and Zvi M. Kedem, “Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set,” IEEE Transactions on Knowledge and Data Engineerin, Vol. 14, No. 3, 2002, pp. 553-566. [26]. Ke-Chung Lin, I-En Liao, and Zhi-Sheng Chen, “An Improved Frequent Pattern Growth Method for Mining Association Rules,” Expert Systems with Applications, 38, 2011, pp. 5154-5161. [27]. Ming-Yen Lin and Suh-Yin Lee, “Fast discovery of sequential patterns through memory indexing and database partitioning,” Journal Information Science and Engineering, Vol. 21, No. 1, 2005, pp. 109-128. [28]. J. Liu, Y. Pan, K. Wang, and J. Han, “Mining frequent item sets by opportunistic projection,” Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, 2002, pp. 229-238. [29]. J. S. Park, M. S. Chen, and P. S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules,” Proceedings of the ACM SIGMOD Conference, 1995, pp. 175-186. [30]. J. S. Park, M. S. Chen, and P. S. Yu, “Efficient Parallel Data Mining for Association Rules,” Proceedings of the ACM Information and Knowledge Management International Conference, 1995, pp. 31-36. [31]. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal and M.-C. Hsu, Mining sequential patterns by pattern-growth: the prefixspan approach, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, 2004, pp. 1424-1440. [32]. S. Orlando, C. Lucchese, P. Palmerini, R. Perego, and F. Silvestri, “kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,” Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003. [33]. B. Racz, “nonordfp: An FP-growth variation without rebuilding the FP-tree,” Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2004. [34]. G. T. Raju, P. S. Satyanarayana and L. M. Patnaik, Knowledge discovery from web usage data:extraction and applications of sequential and clustering patterns - a survey, International Journal of Innovative Computing, Information and Control, Vol. 4, No. 2, 2008, pp. 381-389. [35]. Mingjun Song and Sanguthevar Rajasekaran, “Finding Frequent Itemsets by Transaction Mapping,” Proceedings of the 2005 ACM symposium on Applied computing, 2005, pp. 498-492. [36]. Ja-Hwung Su and Wen-Yang Lin, “CBW: An Efficient Algorithm for Frequent Itemset Mining,” Proceedings of the 37th Annual Hawaii International Conference on System science, 2004, pp. 30064c. [37]. Y. J. Tsay and Y. W. Chang-Chien, “An efficient cluster and decomposition algorithm for mining association rules,” Information Sciences, Vol. 160, Issue 1-4, 2004, pp. 161-171. [38]. Y. K. Woon and W. K. Ng, “A Support-Ordered Trie for Fast Frequent Itemset Discovery,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 7, 2004, pp. 875-879. [39]. Y. K. Woon, W. K. Ng, and A. Das, “Fast Online Dynamic Association Rule Mining,” Proceeding of the Second Int'l Conf. Web Information System Engineering, 2001, pp. 278-287. [40]. S. Y. Wru and Y. Leu, “An effective boolean algorithm for mining association rules in large databases,” Proceedings of the 6th International Conference on Database System for Advanced Applications, 1999, pp. 179-186. [41]. D. L. Yang, C. T. Pan, and Y. C. Chung, “An Efficient Hash-Based Method for Discovering the Maximal Frequent Set,” Proceedings of the 25th Annual International Computer Software and Applications Conference, 2001, pp. 511-516. [42]. M. J. Zaki, “Scalable Algorithms for Association Mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, Issue 3, 2000, pp. 372-390. [43]. M. J. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning Journal, Vol. 42, No. 1, 2001, pp. 31-60. [44]. [45].

In this thesis, we propose two important methods to deal with the problems of mining hybrid sequential patterns and mining frequent itemsets. Hybrid sequential patterns and frequent itemsets are two mining techniques which provide useful information for the improvement of the analysis of marketing strategies. High-performance computing is crucial for these two techniques in order to ensure system scalability when the size and complexity of the database increase. We first propose a hierarchical mining technique to deal with the performance issue. The unique features of this new technique include the following: counting hybrid sequential patterns by class and examining database transactions in a top-down manner. Further, in order to improve the space problem in the transaction decomposition method, we propose another algorithm that combines the transaction decomposition method and the clustering technique. The improved algorithm separates the lattice into small pieces so that each portion can be solved by the decomposition method independently. Experimental results show that this improved algorithm has a fast execution time with respect to the mining of a transaction database.
In summary, this thesis makes four major contributions to the field of data mining. First, a new method to count a group of patterns simultaneously is provided by the proposed pattern-class concept. Second, a novel decomposition model to lower the I/O cost for counting the patterns from a large database is proposed. Third, the correctness of counting the patterns in the decomposition method is proved. Lastly, an algorithm for improving the decomposition method is introduced, and the performance comparison of the proposed algorithm with the existing algorithms is presented in this thesis.
其他識別: U0005-0902201109083300
Appears in Collections:資訊科學與工程學系所

Show full item record

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.