Please use this identifier to cite or link to this item:
標題: 一個基於排容原理與交易映射的頻繁項目集探勘演算法
A Frequent Itemset Mining Algorithm based on Principle of Inclusion-Exclusion and Transaction Mapping
作者: 林書帆
Lin, Shu-Fan
關鍵字: Association Rule Mining
Frequent Itemset Mining
Principle of Inclusion-Exclusion
Transaction Mapping
Interval List
出版社: 資訊網路多媒體研究所
引用: [1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. of the ACM SIGMOD Conf. on Management of Data, 1993, pp. 207-216. [2] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rules in Large Databases,” Proc. of 20th VLDB Conf., 1994, pp. 487-499. [3] R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. of the 5th International Conference on Extending Database Technology, pp. 3-14, 1996. [4] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” ACM-SIGMOD Conference Management of Data, pp. 255-264, 2005. [5] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, “MAFIA: A Maximal Frequent Itemset Algorithm,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 11, pp. 1490-1504, 2005. [6] A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim, A. Nguyen, Y. Chen, and P. Dubey, “Cache-Conscious Frequent Pattern Mining on Modern and Emerging Processors,” The International Journal on Very Large Data Bases, Vol. 16, Issue 1, pp. 77-96, 2007. [7] Gosta Grahne and Jianfei Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 10, pp. 1347-1362, 2005. [8] Jiawei Han, Jian Pei, and Yiwen Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of the ACM-SIGMOD Conf. Management of Data, 2000, pp. 1-12. [9] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Vol. 8, Issue 1, 2004, pp. 53-87. [10] Kuen-Fang Jea, Ke-Chung Lin, and I-En Liao, “Mining Hybrid Sequential Patterns by Hierarchical Mining Technique,” International Journal of Innovative Computing, Information and Control (IJICIC), Vol. 5, No.8, pp.1349-4198 2009 [11] D. Lin and Z. M. Kedem, “Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, pp. 553-566, 2002. [12] Ke-Chung Lin, I-En Liao and Zhi-Sheng Chen, “An improved frequent pattern growth method for mining association rules,” Expert Systems with Applications, Volume 38, Issue 5, May 2011, pp. 5154-5161. [13] G. D. Mulligan and D. G. Corneil, “Corrections to Bierstone's Algorithm for Generating Cliques,” In J. Association of Computing Machinery, Vol. 19, No. 2, pp. 244-247, 1972. [14] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “Adaptive and Resource-Aware Mining of Frequent Sets,” In Proc. The 2002 IEEE International Conference on Data Mining(ICDM' 02), pp.338-345, 2002. [15] S. Orlando, C. Lucchese, P. Palmerini, R. Perego, and F. Silvestri, “kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003. [16] J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-based Method with Transaction Trimming for Mining Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, 1997, pp. 813-825. [17] B. Racz, “nonordfp: An FP-growth variation without rebuilding the FP-tree,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2004. [18] Ashok Savasere, Edward Omiecinski, Shamkant Navathe, “An Efficient Algorithm for Mining Association Rules in Large Database,” Proc. of the 21st VLDB Conference, Zurich, Swizerland, 1995. [19] Mingjun Song and Sanguthevar Rajasekaran, “A Transaction Mapping Algorithm for Frequent Itemsets Mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 4, 2006, pp. 472-481. [20] Mingjun Song and Sanguthevar Rajasekaran, “Finding Frequent Itemsets by Transaction Mapping,” Proc. of the 2005 ACM symposium on Applied computing, 2005, pp. 498-492. [21] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New Algorithms for Fast Discovery of Association Rules,” Proc. of 3rd Knowledge Discovery & Data Mining Conference, pp. 283-286, 1997. [22] Shiwei Zhu, Renqian Zhang and Guoping Xia, “APT-Structure: Efficient Mining of Frequent Patterns,” Proc. of 2010 International Conference on E-Business and E-Government, 2010, pp. 1395-1398. [23] IBM Synthetic Data Generator, [24] Frequent Itemset Mining Implementations Repository,
摘要: 關聯規則探勘(Association Rule Mining)是目前最重要的資料挖掘問題之一,它的目的是要從銷售的交易資料庫中發現商品項目間的關聯性,進而提供決策者作為參考的依據。而在探勘關聯規則的問題中最重要且複雜的一個步驟是從資料庫中找出所有的頻繁項目集。 本論文提出了一種嶄新的探勘頻繁項目集演算法,我們稱為PIETM(Principle of Inclusion-Exclusion and Transaction Mapping)演算法。其特色為有效地結合了知名演算法Apriori與FP-growth的優點,並導入排容原理與交易映射來計算項目集支持度。其中我們也利用FP-Growth所提出的樹狀結構之特性有效地輔助排容原理的計算。此外,我們更提出了重組排容原理公式提升計算效能的方法,可以提升排容原理的計算速度約四倍。實驗結果顯示,PIETM能在屬性複雜的資料庫中有效地探勘,當最小支持度極小時,PIETM仍能有效地從資料庫中探勘出所有的頻繁項目集,其原因在於PIETM演算法在探勘時不需重覆建立與追蹤條件樹,另外也不需重複掃描資料庫便可以得到候選項目集之支持度。
Association rules mining is one of the most important data mining techniques. It is useful for discovering relationships among different items to provide information for policy-maker. The most important and complex step in mining association rules is to discover all frequent itemsets from transaction database. In this paper, we present a novel algorithm for mining frequent itemsets, and this algorithm is referred as the PIETM(Principle of Inclusion-Exclusion and Transaction Mapping) algorithm. The characteristics of this algorithm are that it combines the advantages of two famous algorithms - Apriori and FP-Growth, and that it caculates the support of itemset using by Principle of Inclusion-Exclusion and Transaction Mapping. In this algorithm, we also employ the characteristics of tree structure in FP-Tree to effectively help Principle of Inclusion-Exclusion Theory for caculating support of itemset. Besides, we reformulate the Principle of Inclusion-Inclusion and present a method that can speed up the caculation of Principle of Inclusion-Exclusion 4 times compared to the original formulation. The experimental results show that the PIETM algorithm is effcient even in the database that has very large number of differnt items. This is because we are able to mine frequent itemset without scaning database repeatly, nor traversing any tree structure recursively.
其他識別: U0005-2207201120200300
Appears in Collections:資訊網路與多媒體研究所



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.