Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19800
Title: A Model for Predicting Sequential Pattern Changes in Data Streams (在資料串流中預測序列樣式變化的模型)
Author: Huang, Jyun-Yao (黃俊堯)
Keywords: Data Streams
Sequential Pattern Mining
Change Pattern
Prediction
Publisher: Institute of Networking and Multimedia (資訊網路多媒體研究所)
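Abstract: With the development of information technology, data in stream form are growing rapidly. Stream data flow continuously and in large volume, and can usually be read only once. Under limited hardware resources, this makes traditional sequential pattern mining methods poorly suited to the data stream environment. In recent years some studies have examined mining in data stream environments, but none has predicted changes in sequential patterns, and there is no work that integrates the data stream environment, sequential pattern analysis, and prediction of the type of change in sequential pattern support. In this thesis, we develop a sequential pattern mining model for the data stream environment. Besides mining sequential patterns, the model uses the root mean square to define the change degree of a sequential pattern and, according to the magnitude of the change degree of the pattern's support, adjusts the prediction method to estimate how the cumulative support of a sequential pattern changes between two adjacent batches. The experimental results show that the proposed prediction method is more accurate than traditional linear regression analysis and can provide users with information on the type of sequential pattern change.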
With the development of information technology, stream data are growing rapidly in many applications. Unlike traditional data sets, stream data are temporally ordered, fast changing, and massive. Because of their tremendous volume, multiple scans of the entire stream may not be possible. As a result, traditional sequential pattern mining algorithms are not suitable for data streams. In this thesis, we propose a sequential pattern mining model for stream data. The proposed model supports mining sequential patterns and predicting pattern changes based on change degree. The experimental results show that the proposed model achieves about 90% accuracy in predicting pattern changes, about 5% higher than that of linear regression.
URI: http://hdl.handle.net/11455/19800
Other identifiers: U0005-2407201021484800
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-2407201021484800
Appears in Collections: Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in this item:

For the full text, please visit Airiti Library (華藝線上圖書館).



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.