Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19490
標題: 一個改進頻繁樣式成長法的關聯規則探勘方法
An Improved Frequent Pattern Growth Method for Mining Association Rules
作者: 陳志勝
Chen, Chih-Sheng
關鍵字: Association Rules
關聯規則探勘方法
FP-growth Algorithm
FP-tree
FP-growth演算法
FP-tree
出版社: 資訊科學與工程學系所
引用: 一、中文部分 (一) 期刊論文 [1] 何達明, “排序對關聯演算法速度的影響,” 中原大學,碩士論文, 2002. [2] 范明,李川, “在FP—tree中挖掘頻繁模式而不生成條件FP—tree,” 計算機研究與發展, Vol.40, No.8, pp.1216-1222, Aug 2003. [3] 陳彥良,趙書榮,陳禹辰, “幾個快速挖掘關聯規則的資料探勘方法,” 電子商務學報, Vol. 5, No.2, September 2003. [4] 趙豔鐸,宋斌恒,”基於逆向FP—tree的頻繁模式挖掘演算法,” 計算機應用,Vol.25, No.6, pp.1385-1387, June 2005. 二、西文部分 (一) Books [5] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, “Introduction to Data Mining,” Addison-Wesley, 2006. (二) Journal Articles [6] Rakesh Agrawal, Tonasz Imielinski, and Arun Swami, “Mining Association Rules Between Sets of Items in Large Databases,” ACM SIGMOD Conf. Management of Data, pp. 207-216, May 1993. [7] Rakesh Agrawal and Ramakrishnan Srikant, “Fast Algorithm for Mining Association Rules in Large Databases,” In Proc. 1994 Int’l Conf. VLDB, pp. 487-499, 1994. [8] Sergey Brin, Rajeev Motwani, Jeffery D. Ullman, and Shalom Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” ACM SIGMOD Conference on Management of Data, pp. 255-264, 1997. [9] Christian Borgelt, “An Implementation of the FP-growth Algorithm,” ACM Press, New York, NY, USA, August 21, 2005. [10] Amol Ghoting, Gregory Buehrer, Srinivasan Parthasarathy, Daehyun Kim, Anthony Nguyen, Yen-Kuang Chen, and Pradeep Dubey, “Cache-Conscious Frequent Pattern Mining on Modern and Emerging Processors,” In International Journal on Very Large Data Bases (VLDBJ), pp. 77–96, 2007. [11] Go¨sta Grahne and Jianfei Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, Vol.17, No.10, pp.1347-1362, October 2005. [12] Jiawei Han, Jian Pei, and Yiwen Yin, “Mining Frequent Patterns without Candidate Generation,” In Proc. ACM-SIGMOD Int’l Conf. Management of Data, pp. 1-12, May 2000. [13] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, 8, pp. 53-87, 2004. [14] Jochen Hipp, Ulrich Güntzer, and Gholamreza Nakhaeizadeh, “Mining Association Rules: Deriving a Superior Algorithm by Analyzing Today’s Approaches,” In Proc. of the 4th European Symposium on Principles of Data Mining and Knowledge Discovery, pp. 159-168, 2000. [15] Li Liu, Eric Li, Yimin Zhang, and Zhizhong Tang, “Optimization of Frequent Itemset Mining on Multiple-Core Processor,” In Proc. of VLDB''2007, pp. 1275-1285, 2007. [16] Salvatore Orlando, Paolo Palmerini, Raffaele Perego, Claudio Lucchese and Fabrizio Silvestri, “kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,” In Proc. of ICDM Workshop on Frequent Itemset Mining Implementations, 2003. [17] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “Adaptive and Resource-Aware Mining of Frequent Sets,” In Proc. The 2002 IEEE International Conference on Data Mining (ICDM’02), pp. 338–345, 2002. [18] Jong Soo Park, Ming-Syan Chen, and Philip S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules,” ACM SIGMOD Int''l Conf. Management of Data, pp. 175-186, May 1995. [19] Balázs Rácz, “nonordfp: An FP-growth variation without rebuilding the FP-tree,” In Proc. of IEEE ICDM ’04.Workshop on Frequent Itemset Mining Implementations, 2004. [20] Ashok Savasere, Edward Omiecinski, and Shamkant Navathe, “An Efficient Algorithm for Mining Association Rule in Large Databases,” In Proc. of 21th VLDB, pp. 432-444, 1995. [21] Hannu Toivonen, “Sampling Large Databases for Association Rules,” In 22nd VLDB Conference, pp. 134-145, 1996. [22] Mohammed Javeed Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li, “New Algorithms for Fast Discovery of Association Rules,” The 3rd Int’l. Conf. On Knowledge Discovery & Data Mining(KDD), pp. 283-286, 1997. (三) Electronic Resources [23] Frequent Itemset Mining Implementations Repository, http://fimi.cs.helsinki.fi/ [24] IBM generator, http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data_mining/datasets/syndata.html
摘要: 關聯規則探勘是目前最重要的資料挖掘問題之一,在過去已經有相當多的關聯規則探勘演算法被提出來,其中FP-growth為最主要的演算法之一,其高執行效率令其成為關聯規則探勘領域中重要的技術。它的主要概念為不產生候選項目集(candidate itemsets),將資料壓縮於FP-tree的結構中,以避免多次的高成本資料庫掃瞄。然而,時代的進步,電腦設備不斷的提高,資料也不斷的增長,故需要一個更有效利用新環境且更快的挖掘方法來配合目前的需求。因此,我們針對原本的演算法,提出更有效率的方法以提高挖掘效率。 在本研究中,我們提出IFPG(Improved Frequent Pattern Growth)演算法用以改進FP-growth演算法,其主要特點有二,第一,我們利用輔助資料結構的方法,以加速建立FP-tree;第二,我們提出改良FP-growth的探勘方式以節省記憶體空間,我們保留原來的FP-tree結構,並以不產生條件FP-tree的混合式方法進行資料挖掘。經由我們的實驗數據顯示,我們提出的改良演算法在不同最小支持度門檻值的條件下,可以提升FP-growth的執行速度約一至三百倍,此外,我們的演算法可有效地使用記憶體空間而不造成浪費,故採用我們的方式能更有效率地從資料庫中挖掘出所有的頻繁項目集。
Association rules mining is an important issue in data mining. In the past, many algorithms are proposed for efficiently mining association rules and one of the most important approaches is FP-growth. Without using a candidate generation method, FP-growth constructs the FP-tree to store frequent 1-items and recursively counts all itemsets in the tree for mining all frequent itemsets. However, as the database size increases, the performance of FP-growth may become worse. In the paper, we propose IFPG(Improved Frequent Pattern Growth) algorithm to improve the performance of FP-growth. There are two major features in IFPG: first, it employs an Address-Table structure to speed up building the whole FP-tree. Secondly, it uses a hybrid method in FP-growth to discover all frequent itemsets efficiently. The experimental results show that our method outperforms FP-growth in execution time ranging from one to three hundred times under different minimal supports. Besides, our algorithm costs relatively little memory space as mining process proceeds, which, makes it suitable for high performance applications.
URI: http://hdl.handle.net/11455/19490
其他識別: U0005-0107200815200000
文章連結: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-0107200815200000
Appears in Collections:資訊科學與工程學系所

文件中的檔案:

取得全文請前往華藝線上圖書館



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.