Title: A Sequence Data Classification Based on Sequential Pattern Length (Chinese title: 一個以序列樣式長度為考量的序列資料分類模型)
Author: Lin, Meng-Chiu (林孟秋)
Keywords: sequential pattern mining
Publisher: Department of Computer Science and Engineering
Abstract: Classification assigns data items of interest to the categories they belong to. With the development of information technology, the volume of sequence data has grown rapidly, and many real-world applications involve predicting decisions from such data. A sequence is an ordered list of elements: its items stand in a before/after relationship to one another. Because of this ordering, traditional classification methods are not well suited to sequence data. This thesis therefore proposes a sequence data classification model based on sequential pattern length. Besides integrating sequential pattern mining with classification, the model uses a pattern (classification rule) selection mechanism and computes pattern scores to predict the class label of a sequence. Experimental results show that the proposed classifier performs well on both synthetic and real sequence data.
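The abstract does not state the thesis's exact rule selection criterion or scoring formula, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions. It uses the standard subsequence-containment test from sequential pattern mining (a sequence is a list of itemsets; a pattern matches if its elements are subsets of elements of the sequence, in order). Everything else is hypothetical and not the author's method: the rule format `(pattern, label, confidence)`, the helper names `seq`, `select_rules`, and `classify`, the `min_conf` threshold standing in for the selection mechanism, pattern length defined as the total number of items, and a class score equal to the sum of confidence × pattern length over matched rules.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A sequence is an ordered list of elements; each element is a set of items.
Seq = List[frozenset]
# Hypothetical rule format: (pattern, class label, confidence).
Rule = Tuple[Seq, str, float]


def seq(*elements: str) -> Seq:
    """Build a sequence from strings, e.g. seq("ab", "c") has elements {a, b} then {c}."""
    return [frozenset(e) for e in elements]


def contains(sequence: Seq, pattern: Seq) -> bool:
    """True if `pattern` is a subsequence of `sequence`: each pattern element is a
    subset of a distinct element of `sequence`, matched in the same order.
    Greedy earliest matching is sufficient for this containment test."""
    i = 0
    for element in sequence:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def pattern_length(pattern: Seq) -> int:
    # Assumption: length = total number of items across all elements.
    return sum(len(e) for e in pattern)


def select_rules(rules: List[Rule], min_conf: float) -> List[Rule]:
    # Hypothetical selection step: drop rules below a confidence threshold.
    return [r for r in rules if r[2] >= min_conf]


def classify(sequence: Seq, rules: List[Rule], default: str) -> str:
    # Hypothetical scoring: each matched rule adds confidence * pattern length
    # to its class, so longer matched patterns carry more weight.
    scores: Dict[str, float] = defaultdict(float)
    for pattern, label, confidence in rules:
        if contains(sequence, pattern):
            scores[label] += confidence * pattern_length(pattern)
    if not scores:
        return default  # no rule matched: fall back to a default class
    return max(scores, key=scores.get)


rules = [
    (seq("a", "b"), "pos", 0.8),   # length 2
    (seq("ab", "c"), "neg", 0.6),  # length 3
    (seq("d"), "neg", 0.9),        # length 1
    (seq("x"), "neg", 0.3),        # removed by min_conf=0.5
]
selected = select_rules(rules, min_conf=0.5)

print(classify(seq("a", "x", "b"), selected, default="pos"))   # pos
print(classify(seq("ab", "b", "c"), selected, default="pos"))  # neg
print(classify(seq("x", "y"), selected, default="pos"))        # pos (no match)
```

Under this hypothetical scoring, the second sequence is labeled `neg` even though the matched `pos` rule has the higher confidence (0.8 vs 0.6): the longer `neg` pattern (3 items vs 2) scores 1.8 against 1.6. This is one simple way to let pattern length influence the prediction; the thesis may weight length differently.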