Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19695
標題: 一個改善利用排容原理探勘高頻項目集的演算法
An Algorithm for Improving the Efficiency of Frequent Itemset Mining Based on Principle of Inclusion-Exclusion
作者: 戴昱銓
Tai, Yu-Chuan
關鍵字: Frequent Itemsets Mining;高頻項目集探勘;Principle of Inclusion and Exclusion;Coded Pattern Tree;Ordered tree;排容原理;樣式編碼樹;已排序樹
出版社: 資訊科學與工程學系所
引用: [1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. of the ACM SIGMOD Conference on Management of Data, pp. 207-216, 1993. [2] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rules in Large Databases,” Proc. of 20th VLDB Conference, pp. 487-499, 1994. [3] R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. of the 5th International Conference on Extending Database Technology, pp. 3-14, 1996. [4] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” ACM-SIGMOD Conference Management of Data, pp. 255-264, 2005. [5] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, “MAFIA: A Maximal Frequent Itemset Algorithm,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 11, pp. 1490-1504, 2005. [6] A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim, A. Nguyen, Y. Chen, and P. Dubey, “Cache-Conscious Frequent Pattern Mining on Modern and Emerging Processors,” The International Journal on Very Large Data Bases, Vol. 16, Issue 1, pp. 77-96, 2007. [7] Gösta Grahne and Jianfei Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 10, pp. 1347-1362, 2005. [8] Jiawei Han, Jian Pei, and Yiwen Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of the ACM-SIGMOD Conference Management of Data, pp. 1-12, 2000. [9] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Vol. 8, Issue 1, pp. 53-87, 2004. [10] Kuen-Fang Jea, Ke-Chung Lin, and I-En Liao, “Mining Hybrid Sequential Patterns by Hierarchical Mining Technique,” International Journal of Innovative Computing, Information and Control (IJICIC), Vol. 5, No.8, pp.1349-4198 2009 [11] D. Lin and Z. M. Kedem, “Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, pp. 553-566, 2002. [12] Ke-Chung Lin, I-En Liao, Yu-Chuan Tai, “A Novel Method for Mining Assiciation Rules”, in preparation. [13] Li Liu, Eric Li, Yimin Zhang, and Zhizhong Tang, “Optimization of Frequent Itemset Mining on Multiple-Core Processor,” Proc. of VLDB'2007, pp 1275-1285, 2007. [14] G. D. Mulligan and D. G. Corneil, “Corrections to Bierstone's Algorithm for Generating Cliques,” In J. Association of Computing Machinery, Vol. 19, No. 2, pp. 244-247, 1972. [15] S. Orlando, C. Lucchese, P. Palmerini, R. Perego, and F. Silvestri, “kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003. [16] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “Adaptive and Resource-Aware Mining of Frequent Sets,” Proc. The 2002 IEEE International Conference on Data Mining(ICDM' 02), pp.338-345, 2002. [17] J. S. Park, M. S. Chen, and P.S. Yu, “Using a Hash-based Method with Transaction Trimming for Mining Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, pp. 813-825, 1997. [18] B. Racz, “nonordfp: An FP-growth variation without rebuilding the FP-tree,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2004. [19] Ashok Savasere, Edward Omiecinski, Shamkant Navathe, “An Efficient Algorithm for Mining Association Rules in Large Database,” Proc. of the 21st VLDB Conference, Zurich, Swizerland, 1995. [20] Mingjun Song and Sanguthevar Rajasekaran, “Finding Frequent Itemsets by Transaction Mapping,” Proc. of the 2005 ACM symposium on Applied computing, pp. 498-492, 2005. [21] Y. J. Tsay and Y. W. Chang-Chien, “An efficient cluster and decomposition algorithm for mining association rules,” Information Sciences-Informatics and Computer Science,Vol. 160, Issue 1-4, pp. 161-171, 2004. [22] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New Algorithms for Fast Discovery of Association Rules,” Proc. of 3rd Knowledge Discovery & Data Mining Conference, pp. 283-286, 1997. [23] http://www.cs.rpi.edu/~zaki/software/IBM-datagen.tar.gz [24] http://dmlab.cs.nchu.edu.tw/modules/wfdownloads/visit.php?cid=5&lid=9
摘要: 
關聯規則探勘(Association Rule Mining),是目前最重要的資料挖掘問題之一,它的目的是要從銷售的交易資料庫中,發現商品項目間的關聯性,進而提供決策者作為參考的依據。而在探勘關聯規則的問題中最重要且最複雜的一個步驟是從資料庫中找出所有的高頻項目集。先前,我們曾提出一個新的探勘高頻項目集的演算法,稱做Principle of Inclusion-Exclusion Method(PIE) 演算法。PIE 演算法的特色在於其利用排容原理的公式,並結合了Aprior方法的簡易性及FP-growth演算法的樹狀結構探勘以完成所有的高頻項目集探勘。然而,在PIE演算法的探勘過程中,樹狀結構裡會採用編碼方式進行搜尋,當樹中的節點過多時,有可能造成記憶過使用量過多的問題。
因此,本論文提出了一種新的演算法來改進PIE演算法,我們稱為PIE+(Principle of Inclusion-Exclusion Method plus)演算法,其主要特色有二,第一,PIE+演算法提出利用節點排序的特性,而建立一新的樹狀結構稱為Ordered tree,與PIE演算法的CP-tree相比,Ordered tree建樹的成本更小且建立速度也更快。第二,PIE+演算法只需在Ordered tree中進行簡易的DFS搜尋,便可取代PIE演算法中利用編碼方式來進行搜尋比對的方式,在進行探勘時其記憶體空間可大幅減少。在分析實驗數據後,我們提出的PIE+演算法可達到和PIE演算法相同的探勘速率,但記憶體使用量只需PIE演算法的約二分之一。由此可知,PIE+演算法確實可以解決PIE演算法中記憶體使用量過多的問題,當記憶體空間有限時,PIE+演算法將會比PIE演算法更適合從資料庫中進行高頻項目集探勘。

Association rules mining is one of the most important data mining techniques. It is useful for discovering relationships among different items to provide information for policy-maker. The most important and complex step in mining association rules is to discover all frequent itemsets form transaction database. In our previous studies, we had proposed an association rules mining algorithm called Principle of Inclusion-Exclusion Method(PIE) algorithm. PIE employs the Principle of Inclusion and Exclusion to calculate the support of each candidate itemset. PIE combines Apriori's buttom-up mining and one new tree structure inspired by FP-tree called Coded Pattern (CP) tree. However, CP-tree consumes more memory space than FP-tree due to the addition of Dewey code for each node.
In this thesis, we propose an improved algorithm called PIE+ to alleviate the memory space problem of PIE. In PIE+, a new tree structure called Ordered tree is used instead. Ordered tree takes less time and memory space to build, and it uses depth first search (DFS) to traverse the tree. The experimental results show that the memory space consumed by PIE+ is about one half of that of PIE, and the performance is almost the same.
URI: http://hdl.handle.net/11455/19695
其他識別: U0005-1808201017021300
Appears in Collections:資訊科學與工程學系所

Show full item record
 

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.