Please use this identifier to cite or link to this item:
標題: 一個改善發佈/訂閱系統效能的XML串流文件過濾方法
An Improved XML Document Filtering Method for Publish/Subscribe Systems
作者: 吳東烈
Wu, Dung-Lie
關鍵字: 發佈/訂閱系統
publish / subscribe systems
XML Stream
XML Filtering
Twig query
出版社: 資訊科學與工程學系所
引用: 一、 中文部份 [1] 李嘉芳, “一個有效支援分支結構查詢的XML文件過濾系統,”中興大學資訊科學與工程研究所,碩士論文, 2012。 二、 西文部份 [2] M. Altinel, and M. J. Franklin, “Efficient filtering of XML Documents for selective dissemination of information,” Proc. 26th International Conference on Very Large Data Bases (VLDB ''00), pp. 53-64. 2000, [3] P. Antonellis, and C. Makris, “XFIS: An XML filtering system based on string representation and matching,” International Journal of Web Engineering and Technology, Vol. 4:1, pp. 70-94. 2008. [4] L. Dai, C.-H. Lung, and S. Majumdar, “BFilter- A XML Message Filtering and Matching Approach in Publish/Subscribe Systems,” Proceedings of the Global Telecommunications Conference (GLOBECOM 2010), IEEE, pp. 1-6, 6-10 Dec. 2010. [5] Y. Diao, and M. J. Franklin, “High-performance XML filtering: an overview of YFilter,” IEEE Data Engineering Bulletin, Vol. 26, pp.41-48. 2003. [6] Y. Diao, P. Fischer, M. Franklin, and R. To., “YFilter: Efficient and scalable filtering of xml documents,” Proc. 18th International Conference on Data Engineering, pp.341-342. 2002. [7] E. Grummt, “Fine-grained Parallel XML Filtering for Content-based Publish/Subscribe Systems,” Proc. the 5th ACM international conference on Distributed event-based system. ACM, p. 219-228, 2011. [8] J. Kwon, P. Rao, B. Moon, and S. Lee, “FiST: Scalable XML Document Filtering by Sequencing Twig Patterns,” Proceedings of the 31st International Conference on Very Large Data Bases (VLDB ''05), pp. 217-228, 2005. [9] A. Mitra, M. R. Vieira, P. Bakalov, V. J. Tsotras, and W. A. Najjar, “Boosting XML Filtering with a Scalable FPGA-based Architecture,” Proc. 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 2009. [10] R. Moussalli, R. Halstead, M. Salloum,W.Najjar, and V. J. Tsotras. “Efficient XML Path Filtering Using GPUs,” In ADMSVLDB Workshops, 2011. [11] B. Ning, and C. Liu, “XML filtering with XPath expressions containing parent and ancestor axes, ” Information Sciences, Vol. 210, pp. 41-54, 2012. [12] A. Nizar M, G .S. Babu, and P S. Kumar, “SFilter: A Simple and Scalable Filter for XML Streams,” Proc. 15th International Conference on Management of Data (COMAD), December 2009. [13] M. Salloum , and V. J. Tsotras, “Efficient and Scalable Sequence-based XML Filtering,” Proceedings of the 12th International Workshop on the Web and Databases (WebDB), 2009. [14] Yu Xiaochuan; Alvin, CHAN.T.S., “A time/space efficient XML filtering system for mobile environment, ” Mobile Data Management (MDM), 2011 12th IEEE International Conference on , vol.1, pp.184,193, 6-9 June 2011 [15] Y. Zhang, Y. Psm, K. Chiu, “A Parallel XPath Engine Based on Concurrent NFA Execution, ” Proc. Parallel and Distributed Systems (ICPADS),IEEE 16th International Conference on , vol., no., pp.314,321, 8-10 Dec. 2010 [16] International Press Telecommunications Council. Accessed: 19, December 2012. [17] SAX, Accessed 19 December 2011. [18] W3C, Extensible Markup Language (XML) 1.0 (Fifth Edition),, Accessed: 10,July 2012. [19] W3C Recommendation, “XML Path Language(XPath) Version 2.0,” , 2005. Accessed: 24, December 2012. [20] YFilter 1.0 release. release.htm. Accessed 19 December 2011.
摘要: 現今網際網路的快速發展以及3C產品的普及化,使得資料在網路上傳遞與交換變得越來越頻繁,所以統一交換格式成為必然的趨勢。例如,許多網路的應用,如發佈/訂閱系統,使用XML作為標準的格式。在發佈/訂閱系統中,出版商的訊息以XML文件儲存,而使用者的喜好則是以XML查詢語句(如XPath)來表示。隨著文件和查詢的數目日益增長,如何從大量的文件中過濾出符合使用者的喜好,而衍生出XML串流文件過濾的議題。其中如何減少查詢索引所需要的空間,以及提升過濾的性能,是此議題中最重要的課題。 本實驗室於2012年提出一個XML文件過濾方法,稱為EFilter[1],實驗結果顯示,相對於FiST[8]與SFilter[12],EFilter確實增進了文件過濾的速度,但EFilter主要有以下幾個缺點:(1)雖然透過分支對照表(BranchIDset)可以處理分支查詢,但最後必須經過後處理才能確定最後的結果,增加了查詢所需的時間;(2)為了支援由下而上的比對方式,EFilter使用查詢對照表(QLinkedList)記錄查詢索引中的所有查詢葉節點,增加了查詢索引所需空間。 有鍳於此,本研究提出了一個改善EFilter的方法,稱為E+-Filter。E+-Filter沿用了EFilter的查詢索引架構,並參考BFilter[4]處理分支查詢的概念。E+-Filter捨棄使用QLinkedList,改為採用由上而下的比對程序,以縮小查詢索引所需的空間。同時過濾時,在遇到分支點時即檢查各分支符合的情況,以避免後處理的步驟。另一項特點是,E+-Filter在比對時只需使用兩個暫時的stack記錄XML目前解析的標籤名稱以及查詢索引相對應的節點位置,簡化了過濾比對流程,進一步提升過濾效率。透過實驗的結果顯示,E+-Filter在查詢索引的空間和執行過濾時間上,均優於與EFilter 和BFilter。
With the rapid growth of Internet and the popularization of 3C products, data is transferred and exchanged more frequently across the Internet. Thus, unifying exchange format has become an inevitable trend. For example, many Internet-based applications use XML as the standard format, such as publish/subscribe systems. In publish/subscribe systems, the messages generated by publishers are encoded as XML documents, and the query patterns of subscribers are defined as XML query statements. As the number of documents and query requests grow, the question of how to find out the appropriate documents that match users’ preferences drives the issue of streaming XML document filtering methods. Basically, the common goal of XML document filtering methods is to improve the space required for query index and the performance of the matching phase. Our research team has proposed an XML document filtering method, called EFilter, in 2012. Experimental results showed that EFilter is more efficient than FiST and SFilter in terms of filtering speed. However, there are some drawbacks of EFilter. Firstly, although the BranchIDset records the connectedness property of branch nodes in twig patterns, post-processing is needed for final matching which inevitably increases the execution time of matching process. Secondly, an additional structure, the QLinkedList, is used to record the position of leaves with the same tag type clustered in a linked list for quickly determining the positions of leaf nodes of each query. The efficiency of EFilter is improved by increasing some space for QLinkedList. In this paper, we overcome aforementioned problems by proposing an improved XML document filtering method, called the E+-Filter, which is based on the Query Guide of E-Filter and adopts the concept of branch-point matching policy of BFilter. E+-Filter abandons the use of QLinkedList by a top-down search process. To avoid the post-processing step, matching result is checked while the branch point is met during filtering phase. Another feature of E+-Filter is that only two temporary stacks are required to simplify the matching process of filtering phase. The experimental evaluation shows that E+-Filter performed better than both EFilter and BFilter in terms of the space of query index and the execution time of filtering.
其他識別: U0005-2607201316110700
Appears in Collections:資訊科學與工程學系所



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.