Please use this identifier to cite or link to this item:
標題: 利用雲端計算於不確定性序列之型樣探勘
Uncertainty Sequence Patterns Mining by Using Cloud Computing
作者: 孫子云
Sun, Zi-Yun
關鍵字: 軌跡探勘;Cloud Computing;雲端計算;機率後置樹;Hadoop/MapReduce;Probabilistic Suffix Tree (PST);Uncertainty Data Mining
出版社: 電機工程學系所
引用: [1] V. R. Jain, R. Bagree, A. Kumar, P. Ranjan, "wildCENSE: GPS based Animal Tracking System," ISSNIP, 2008 [2] Journey North''s Monarch Butterfly Migration Tracking Project, [3] C.Roux, R.T.F.Bernard, "Home range size, spatial distribution and habitat use of elephants in two enclosed game reserves in the eastern cape province, south Africa," African J. of Ecology, 2007. [4] G. Shannon, B. Page, K. Duffy, R. Slotow, "African elephant home range and habitat selection in Pongola game reserve, south Africa,” African Zoology, vol. 41, no. 1, pp.37–44,2006. [5] "黑面琵鷺衛星追蹤計畫始末," 台灣濕地90 年5 月號第24 期, [6] J.-F. JIN, B.-F. LIU, X. YU, C.-H. LU, "Wintering and Migration of Black-faced Spoonbill in Xinghua Bay,Fujian Province," Chinese Journal of Zoology, issue 1, pp. 47-53,2009. [7] H. Nakahara, S. Ohsumi, H. Maeda, T. Hayashi, "Development on advanced whale migration tracking system," MTS/IEEE OCEANS, 1995. [8] Whale migration, [9] W.-C. Peng, M.-S. Chen, "Developing data allocation schemes by incremental mining of user moving patterns in a mobile computing system," IEEE TKDE, vol. 15, pp. 70-85, 2003. [10] Y. Wang, E.-P. Lim, S.-Y. Hwang, "Efficient mining of group patterns from user movement data," Data Knowledge Engineering, vol. 57, no. 3, pp. 240-282, 2006. [11] Y. Li, J. Han, J. Yang, “Clustering moving objects,” ACM SIGKDD, pp. 617-622, 2004. [12] R. Agrawal and R. Srikant, "Mining sequential patterns," ICDE, 1995. [13] M.-S. Chen, J. S. Park, P. S. Yu, "Efficient data mining for path traversal patterns,"Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221, 1998. [14] C. C. Aggarwal, P. S. Yu, "A Survey of Uncertain Data Algorithms and Applications," IEEE TKDE, 2008 [15] S. Gezici, "A Survey on Wireless Position Estimation," Wireless Personal Communications, 2007. [16] H. Liu, H. Darabi, P. Banerjee, J. Liu, "Survey of Wireless Indoor Positioning Techniques and Systems," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 6, pp. 1067-1080, 2007. [17] G. Sun, J. Chen, W. Guo, K.J.R. Liu, "Signal processing techniques in network-aided positioning: a survey of state-of-the-art positioning designs," IEEE Signal Processing Magazine, vol. 22, no. 4, pp. 12-23, 2005. [18] Y. Gu, A. Lo, I. Niemegeers, "A survey of indoor positioning systems for wireless personal networks," IEEE Communications Surveys & Tutorials, vol. 11, no. 1, pp. 13-32,2009. [19] F. Dime, B. Music, R. Osredkar, "Attaining Required Positioning Accuracy in Archeo-Geophysical Surveying by GPS," EPE-PEMC, 2006. [20] F. Giannotti, M. Nanni, F. Pinelli, D. Pedreschi, "Trajectory pattern mining," CM SIGKDD, pp. 330-339, 2007. [21] M. Morzy, "Mining frequent trajectories of moving objects for location prediction," MLDM, pp. 667-680, 2007. [22] M. Nanni, D. Pedreschi, "Time-focused clustering of trajectories of moving objects," J. Intelligent Information Systems, vol. 27, no. 3, pp. 267-289, 2006. [23] V. S. Tseng, K. W. Lin, "Energy efficient strategies for object tracking in sensor networks: A data mining approach," J. Systems and Software, vol. 80, no. 10, pp. 1678-1698, 2007. [24] J.-G. Lee, J. Han, K.-Y. Whang, "Trajectory clustering: a partition-and-group framework," ACM SIGMOD, pp. 593-604, 2007. [25] J.-G. Lee, J. Han, X. Li, "Trajectory Outlier Detection: A Partition-and-Detect Framework",ICDE, 2008. [26] Y. Wang, E.-P. Lim, S.-Y. Hwang, "Efficient mining of group patterns from user movement data," Data Knowledge Engineering, vol. 57, no. 3, pp. 240-282, 2006. [27] H.-Y. Kang, J.-S. Kim, K.-J. Li, “Similarity measures for trajectory of moving objects in cellular space”, ACM symposium on Applied Computing, 2009. [28] G. Trajcevski, O. Wolfson, S. Chamberlain, F. Zhang, "The Geometry of Uncertainty in Moving Objects Databases," EDBT, 2002. [29] G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain, "Managing uncertainty in moving objects databases, " ACM Trans. Database Syst., vol. 29, no. 3, pp. 463-507, 2004. [30] B. Kuijpers, W. Othman, “Modeling uncertainty of moving objects on road networks via space-time prisms”, Int. J. of Geographical Information Science, vol. 23, no. 9, pp. 1095-1117,2009. [31] B. Kuijpers, W. Othman, "Trajectory databases: data models, uncertainty and complete query languages," accepted by J. of Computer and System Sciences, 2009. [32] J. Yang, M. Hu, "Trajpattern: Mining sequential patterns from imprecise trajectories of mobile objects,” EDBT, pp. 664-681, 2006. [33] N. Pelekis, I. Kopanakis, E. E. Kotsifakos, E. Frentzos, Y. Theodoridis, "Clustering Trajectories of Moving Objects in an Uncertain World,” ICDM, 2009. [34] S. Qiao, C. Tang, H. Jin, T. Long, S. Dai, Y. Ku, M. Chau, "PutMode: Prediction of Uncertain Trajectories in Moving Objects Databases,” Applied Intelligence, 2009. [35] H.-P. Kriegel, M. Pfeifle, "Clustering moving objects via medoid clusterings," SSDBM, 153–162, 2005. [36] Zhiming Ding, "UTR-Tree: An Index Structure for the Full Uncertain Trajectories of Network-Constrained Moving Objects," MDM, 2008. [37] C. K.-S. Leung and D. A. Brajczuk, "Efficient algorithms for mining constrained frequent patterns from uncertain data," ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data, 2009. [38] C. K.-S. Leung, C. Carmichael, B. Hao, “Efficient Mining of Frequent Patterns from Uncertain Data,” ICDM-DUNE, 2007 [39] S. Papadimitriou, Jimeng Sun, "DisCo: Distributed Co-clustering with Map-Reduce," ICDM, 2008. [40] Hadoop, [41] Dana Ron, Yoram Singer, and Naftali Tishby. Learning probabilistic automata with variable memory length. In Proceedings of the 7th Annual Conference on Computational learning theory, pages 35–46, New York, NY, USA, 1994. ACM. [42] H.-P. Tsai, D.-N. Yang, M.-S. Chen,, "Exploring Group Moving Pattern for an Energy-Constrained Object Tracking Sensor Network," PAKDD, 2007. [43] H.-P. Tsai, D.-N. Yang, M.-S. Chen, "Mining Group Relationship in an Object Tracking Sensor Network," TJCCT, 2007. [44] H.-P. Tsai, D.-N. Yang, M.-S. Chen, "Mining Group Movement Patterns for Tracking Moving Objects Efficiently," accepted by IEEE TKDE, 2009. [45] H.-P. Tsai, D.-N. Yang, M.-S. Chen, "Exploring Application Level Semantics for Data Compression," accepted by IEEE TKDE, 2009. [46] D. Katsaros, Y. Manolopoulos, "A suffix tree based prediction scheme for pervasive computing environments," Panhellenic Conf. on Informatics, 2005. [47] "Privacy-preserving trajectory data publishing by local suppression" Rui Chen , Benjamin C.M. Fung , Noman Mohammed, Bipin C. Desai , Ke Wang, 2011 [48] "Hiding Sensitive Trajectory Patterns" Osman Abul, Maurizio Atzori, Francesco Bonchi, Fosca Giannotti, 2007 [49] "Pattern-Preserving k-Anonymization of Sequences and its Application to Mobility Data Mining" Ruggero G. Pensa, Anna Monreale, Fabio Pinelli, and Dino Pedreschi, 2008 [50] A Performance Bound for Manoeuvring Target Tracking Using Best-Fitting Gaussian Distrbutions, 2005 [51] "Poisson Versus Gaussian Distribution for Object Tracking in Wireless Sensor Networks", Yun Wang, Fei Li, Fang Fang, 2010 [52] "Study on Nerve-Cell Based Real-time Estimation Method for GPS Multi-path Eerror", ZHUANG-SHENG ZHU, DE-JIAN LU, QING WANG, 2008 [53] "Determination of GPS Positioning Errors due to Multi-Path in Civil Aviation" A. Franchois, L. Roelens Department of Information Technology, 2005 [54] "An Implementation of A Geolocation Information-sharing System", Takanobu Otsuka, Ryo Suzuki, Shogo Kawaguchi, and Takayuki Ito, 2011 [55] [56] [57] " A Survey on Privacy Preserving Data Mining" Jian Wang, Yongcheng Luo, Yan Zhao, Jiajin Le, 2009
序列資料在我們的日常生活中隨處可見,像是生物的基因序列、蛋白質序列、動物季節性的遷徙路徑、車輛移動軌跡等,而序列型樣探勘(sequence pattern mining) 主要是挖掘隱藏在序列資料中特殊、重要、具代表性的特徵(feature)。近來,序列型樣探勘吸引了大量的關注,尤其是在生物資訊領域與時空(spatio-temporal)軌跡探勘領域中。許多序列天生具有一些不確定性,其不確定性可能由許多原因所造成,例如:測量技術的限制、取樣的誤差、隱私保護等等。本論文主要針對這些不確定性序列資料加以研究,以探勘隱藏其中的規律性或特徵。Probabilistic Suffix Tree (PST) 是Variable-length Markov的一種實作,廣泛被應用於各種序列資料的規律性之探勘。由於傳統PST建構演算法處理的是一般確定性序列資料,不適用於探勘不確定性之序列,且由於不確性序列資料的特性,其計算法複雜度更高,而傳統的PST建構演算為集中式、單機環境的演算法,無法負荷大量資料的計算。因此,我們提出適用於雲端Hadoop平台的演算法,利用雲端的平行運算能力以應付大量不確定性資料的型樣探勘。
本論文提出了兩種適用於雲端Hadoop平台的平行分散式之PST建構演算法,分別為CloudPST+、CloudPST+_OneScan,具體來說,此兩種演算法是使用Hadoop/MapReduce的程式模型來建構PST。而我們所提出的CloudPST+ 演算法以漸進、一次多層、迭代的方式建構PST,因此可以避免過度探勘型樣,同時能平衡分散式計算的overhead。此外,為避免多次掃描整個序列資料而拖累整體效能,我們進一步採取以空間換取時間的策略,利用一個新設計的資料結構以暫存中間統計的結果,因此每次迭代只需要掃描整個序列一次,即可建構該次迭代所需建構的PST層數。
為驗證CloudPST+ 和CloudPST+_OneScan的效能,我們進行多項實驗,試驗結果顯示CloudPST+_OneScan除了會耗費多一點記憶體,其他方面的表現都比CloudPST+ 來得好。而且CloudPST+_OneScan擁有較好的效能,且具有良好的可擴充性與穩定性。

Sequence data are ubiquitous in our daily life, such as animals’ seasonal migration, DNA/protein sequences, Web browsing sequences. Sequence pattern mining is to discover special, important, and representative features hidden in sequence data. It attracts a lot of attention especially in the domains of bioinformatics and spatio-temporal trajectory data mining. Sequence data are inherently of some uncertainty, and the uncertainty may be caused by many reasons, such as limitations of the measuring technology, sampling error, privacy preserving. In this thesis, we focus on the mining of uncertain sequences to discover hidden patterns by using Probabilistic Suffix Tree (PST). PST is an implementation of Variable-length Markov Model (VMM) that is wildly used in sequence pattern mining in many domains. However, traditional PST building algorithm is designed to mine certain data and inapplicable of mining uncertain sequences. In addition, traditional PST building algorithm is a centralized algorithm such that it is incapable of handling huge amounts of accumulated uncertain data. Therefore, we propose a distributed and parallel algorithm on Hadoop platform to fully utilize the computing power of cloud computing in the uncertain sequence mining.
In the thesis, we propose two distributed and parallel PST building algorithms, named CloudPST+ and CloudPST+_OneScan respectively on the Hadoop platform to speed up the learning process. Specifically, CloudPST+ is of Map/Reduce framework that builds a PST in an iterative and levels by levels manner to avoid learning excessive patterns and trade off the overhead of distributed computing. CloudPST+_OneScan extends CloudPST+ and involves a new data structure to store the intermediate statistics so that the One-Scan algorithm only scans the entire sequence data once in each iteration.
To evaluate the performance of CloudPST+ and CloudPST+_OneScan, we implement a naïve approach derived from the well-known Wordcount example of Hadoop/MapReduce and conduct several experiments with a synthetic dataset that is re-generated from a real trajectory dataset. According to our experimental results, CloudPST+ and CloudPST+_OneScan significantly outperform the naïve approach. In addition, sacrificing an extra memory cost, CloudPST+_OneScan shows better efficiency, scalability, and stability than CloudPST+.
其他識別: U0005-0402201317294100
Appears in Collections:電機工程學系所

Show full item record

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.