Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19770
Title: A Method for Mining Frequent Itemsets over Data Streams Based on Time-fading Model (基於Time-fading Model的資料串流頻繁項目集探勘方法)
Author: Hsu, Wei-Cheng (徐偉誠)
Keywords: data stream
data mining
frequent itemsets
time-fading model
Publisher: Department of Computer Science and Engineering (資訊科學與工程學系所)
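Abstract: Mining data streams is more challenging than mining static databases: the volume of incoming data is unbounded, the memory of the mining system is limited, and a mining algorithm can scan the stream only once. In data stream mining under the time-fading model, the existing Lossy Counting algorithm uses an error parameter to reduce the memory burden. However, the error parameter produces false positives, which lowers mining accuracy. To address this problem, this study proposes TFSM, a frequent-itemset mining algorithm based on the time-fading model that applies a pincer (夾擊) strategy to improve mining accuracy. The pincer strategy uses the downward closure property among itemsets to estimate the maximum loss in an itemset's count, and thereby reduces false positives. Experimental results show that TFSM with the pincer strategy improves the accuracy of stream mining by 1% to 7%. It also deletes nodes from the data structure earlier, which speeds up the processing of subsequent batches.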
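The two ideas the abstract relies on, time-faded counting and the downward closure property, can be illustrated with a minimal sketch. This is not the TFSM algorithm and not Lossy Counting. The function names `decayed_counts` and `closure_upper_bound`, the per-batch decay factor, and the sample batches are all assumptions made for illustration. The sketch fades every stored count once per batch, then shows that the faded count of an itemset never exceeds the smallest faded count among its immediate subsets.

```python
from itertools import combinations


def decayed_counts(batches, decay, max_size=2):
    """Return time-faded counts for all itemsets of size 1..max_size.

    Assumption: counts are multiplied by `decay` once per incoming batch,
    so older occurrences weigh less than recent ones.
    """
    counts = {}
    for batch in batches:
        # Fade every existing count before reading the new batch.
        for itemset in counts:
            counts[itemset] *= decay
        # Add one unit for each occurrence in the new batch.
        for transaction in batch:
            items = sorted(set(transaction))
            for k in range(1, max_size + 1):
                for itemset in combinations(items, k):
                    counts[itemset] = counts.get(itemset, 0.0) + 1.0
    return counts


def closure_upper_bound(itemset, counts):
    """Upper bound on an itemset's count, from downward closure.

    Every occurrence of an itemset is also an occurrence of each of its
    subsets, so its count cannot exceed the minimum count of its
    immediate (size k-1) subsets.
    """
    if len(itemset) < 2:
        raise ValueError("need an itemset with at least two items")
    subsets = combinations(itemset, len(itemset) - 1)
    return min(counts.get(s, 0.0) for s in subsets)


# Hypothetical stream of two batches, with decay factor 0.5.
batches = [
    [("a", "b"), ("a", "c")],
    [("a", "b"), ("b",)],
]
counts = decayed_counts(batches, decay=0.5)
bound_ab = closure_upper_bound(("a", "b"), counts)
```

In this example the faded count of `("a", "b")` is 1.5 and the bound from its subsets is 2.0. A bound of this kind is what lets an algorithm cap how much of an itemset's count could have been lost, rather than relying on a fixed error parameter alone.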
URI: http://hdl.handle.net/11455/19770
Appears in Collections: Department of Computer Science and Engineering (資訊科學與工程學系所)

Files in This Item:

For the full text, please visit Airiti Library (華藝線上圖書館).



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.