Please use this identifier to cite or link to this item:
標題: 基於項目集分群的交易拆解方式進行高頻項目集探勘
Mining Frequent Itemsets by Transaction Decomposition with Itemset Clustering
作者: 陳宏賓
Chen, Hong-Bin
關鍵字: Frequent Itemsets Mining
Transaction Decomposition
Top-Down Mining
Itemset Clustering
出版社: 資訊科學與工程學系所
引用: [1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. of the ACM SIGMOD Conference on Management of Data, pp. 207-216, 1993. [2] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rules in Large Databases,” Proc. of 20th VLDB Conference, pp. 487-499, 1994. [3] R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. of the 5th International Conference on Extending Database Technology, pp. 3-14, 1996. [4] D. Lin and Z. M. Kedem, “Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, pp. 553-566, 2002. [5] Kuen-Fang Jea, Ke-Chung Lin, and I-En Liao, “Mining Hybrid Sequential Patterns by Hierarchical Mining Technique,” International Journal of Innovative Computing, Information and Control (IJICIC), Vol. 5, No.8, August 2009 [6] Jiawei Han, Jian Pei, and Yiwen Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of the ACM-SIGMOD Conference Management of Data, pp. 1-12, 2000. [7] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Vol. 8, Issue 1, pp. 53-87, 2004. [8] J. S. Park, M. S. Chen, and P.S. Yu, “Using a Hash-based Method with Transaction Trimming for Mining Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, pp. 813-825, 1997. [9] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” ACM-SIGMOD Conference Management of Data, pp. 255-264, 2005. [10] Mingjun Song and Sanguthevar Rajasekaran, “Finding Frequent Itemsets by Transaction Mapping,” Proc. of the 2005 ACM symposium on Applied computing, pp. 498-492, 2005. [11] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “Adaptive and Resource-Aware Mining of Frequent Sets,” In Proc. The 2002 IEEE International Conference on Data Mining(ICDM' 02), pp.338-345, 2002. [12] S. Orlando, C. Lucchese, P. Palmerini, R. Perego, and F. Silvestri, “kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003. [13] B. Racz, “nonordfp: An FP-growth variation without rebuilding the FP-tree,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2004. [14] A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim, A. Nguyen, Y. Chen, and P. Dubey, “Cache-Conscious Frequent Pattern Mining on Modern and Emerging Processors,” The International Journal on Very Large Data Bases, Vol. 16, Issue 1, pp. 77-96, 2007. [15] Li Liu, Eric Li, Yimin Zhang, and Zhizhong Tang, “Optimization of Frequent Itemset Mining on Multiple-Core Processor,” In Proc. of VLDB'2007, pp 1275-1285, 2007. [16] Gosta Grahne and Jianfei Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 10, pp. 1347-1362, 2005. [17] Y. J. Tsay and Y. W. Chang-Chien, “An efficient cluster and decomposition algorithm for mining association rules,” Information Sciences-Informatics and Computer Science,Vol. 160, Issue 1-4, pp. 161-171, 2004. [18] G. D. Mulligan and D. G. Corneil, “Corrections to Bierstone's Algorithm for Generating Cliques,” In J. Association of Computing Machinery, Vol. 19, No. 2, pp. 244-247, 1972. [19] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New Algorithms for Fast Discovery of Association Rules,” Proc. of 3rd Knowledge Discovery & Data Mining Conference, pp. 283-286, 1997. [20] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, “MAFIA: A Maximal Frequent Itemset Algorithm,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 11, pp. 1490-1504, 2005. [21], Accessed 10 Nov. 2008 [22] , Accessed 10 Nov. 2008
摘要: 關聯規則探勘(Association Rule Mining),是目前最重要的資料挖掘問題之一,它的目的是要從銷售的交易資料庫中,發現商品項目間的關聯性,進而提供決策者作為參考的依據。而在探勘關聯規則的問題中最重要且最複雜的一個步驟是從資料庫中找出所有的高頻項目集。過去本實驗室曾經提出一個探勘高頻項目集的演算法,稱為TDM(Top-Down Mining)。其TDM主要特性是由上而下地利用交易拆解及傳遞支持度的方式來探勘高頻項目集,可以將掃瞄資料庫的次數降低到只需要掃瞄兩次,並有效率的探勘出高頻項目集。然而TDM在拆解交易時可能會拆解出過多的候選項目集而造成空間上的問題,而導致TDM無法有效地在資料庫中進行支持度傳遞方式的探勘。 在本論文中,我們提出了一個改進TDM的演算法,稱為TDC(Transaction Decomposition with Itemset Clustering)。其主要的特色是將原始的絡(lattice)切割成較小的部分,使其每個較小的部分可以被獨立的利用項目集拆解方式所處理,如此可以改進TDM所造成的空間問題,進而提高整體探勘高頻項目集的效率。 根據我們的實驗結果顯示,TDC能解決TDM中所產生的空間問題,且TDC在最小支持度較小的情況下與資料庫的交易筆數龐大時,仍能有效地從資料庫中探勘出所有的高頻項目集。
Association rules mining is one of the most important data mining techniques. It is useful for discovering relationships among different items to provide information for policy-maker. The most important and complex step in mining association rules is to discover all frequent itemsets form transaction database. In our previous studies, we had proposed an association rules mining algorithm called TDM (Top-Down Mining). TDM has a unique feature that examines database itemsets in a top-down manner using transaction decomposition. Moreover, the number of database scanning is reduced to at most two to efficiently discover frequent itemsets. However, the transaction decomposition method may incur a space problem while it decomposes too many sub-patterns. Thus, TDM can't efficiently discover frequent itemsets if the length of a transaction is too long. In this thesis, we propose an algorithm called TDC (Transaction Decomposition with Itemset Clustering) to remedy the problem of TDM. The major feature of TDC is that it divides the original transaction lattice into smaller pieces such that each portion can be solved by decomposition method independently. Thus, it can alleviate the space problem to discover frequent itemsets efficiently. According to our experimental results, TDC does alleviate the space problem incurred in TDM, and it can discover all frequent itemsets efficiently when the minimum support is low and database is huge.
其他識別: U0005-2707200913090000
Appears in Collections:資訊科學與工程學系所



Show full item record
TAIR Related Article

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.