Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19455
Title: An XML Document Compression Technique Supporting Stream Query (一個支援串流查詢之XML文件壓縮技巧)
Author: Ding, Jen-Wen (丁正文)
Keywords: XML; stream; stream XML query; compression; XPath; XML encoding
Publisher: Department of Computer Science
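Abstract:
XML has become the standard data exchange format on the Internet, and streaming is a widely used way of transmitting large volumes of continuous data. Combining the two, that is, transmitting large amounts of XML data as a stream, has applications in areas such as e-commerce and sensor networks. It also raises research issues that differ from those of traditional database query processing, for example how to shorten the transmission time of large volumes of data and how to speed up queries over streaming XML documents. The characteristics of streaming XML documents stem mainly from the tree-shaped, semi-structured nature of XML and from the particular transmission pattern of the stream itself. When a tree-structured document is transmitted as a stream, the client cannot predict the content that follows while it evaluates a query; it must wait until the entire XML document has been transmitted before it can be certain that no further results match. An overly long transmission time therefore directly affects query efficiency, as well as the client's receiving time and power consumption. Moreover, under streaming transmission, many traditional query acceleration techniques, such as indexes, do not apply to queries over streaming XML documents.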
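The waiting behavior described above can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `stream_query`, the sample document, and the restriction to simple child-axis paths without predicates are all assumptions made for illustration. It uses Python's standard `xml.etree.ElementTree.XMLPullParser` to match a path against fragments as they arrive.

```python
import xml.etree.ElementTree as ET

def stream_query(chunks, path):
    """Yield the text of elements whose root-to-node tag path equals `path`.

    `chunks` is any iterable of string fragments, standing in for a network
    stream (hypothetical input). `path` is a simple child-axis path such as
    "/site/item/name"; predicates and wildcards are not supported.
    """
    target = [step for step in path.split("/") if step]
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # tag names of the elements that are currently open
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                stack.append(elem.tag)
            else:  # "end": the element's text is complete at this point
                if stack == target:
                    yield elem.text
                stack.pop()
    parser.close()  # tell the parser that the stream has ended

# Hypothetical stream, deliberately split in the middle of tags.
chunks = [
    "<site><item><na",
    "me>pen</name></item><item><name>ink</na",
    "me></item></site>",
]
results = list(stream_query(chunks, "/site/item/name"))
```

Matches are produced as soon as the closing tag of a matching element arrives, but the generator only finishes after the last fragment has been consumed. In this sketch, that is the point at which the client knows no further results exist, which is why a shorter stream shortens the whole query.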
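This study proposes a technique for compressing XML documents in order to speed up queries over streaming XML documents. We also design an encoding scheme suited to streaming XML documents to further accelerate such queries.
The proposed compression method effectively shortens XML documents and simplifies their structure, so that data transmission time is reduced and clients receive the XML data sooner. Because the document structure is simplified, query complexity is reduced and client-side queries run faster. In addition, because our query algorithm decompresses only the parts relevant to the client's query, the cost of decompression is reduced.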
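The abstract does not specify the compression or encoding scheme, so the following is only a generic sketch of the general idea, not the thesis's method. It assumes a simple tag dictionary: each tag name is replaced by a small integer, and a query is evaluated directly on the encoded tokens so that only text under the matching path is touched. The names `encode`, `query`, and `END`, and the sample document, are hypothetical.

```python
import xml.etree.ElementTree as ET

END = 0  # reserved code that marks the end of an element (assumption)

def encode(xml_text):
    """Replace each tag name with a small integer; return (codebook, tokens)."""
    codes, tokens = {}, []

    def walk(elem):
        # Assign the next unused code the first time a tag name is seen.
        code = codes.setdefault(elem.tag, len(codes) + 1)
        tokens.append(code)
        if elem.text and elem.text.strip():
            tokens.append(elem.text.strip())
        for child in elem:
            walk(child)
        tokens.append(END)

    walk(ET.fromstring(xml_text))
    return codes, tokens

def query(codes, tokens, path):
    """Return text found under `path`, comparing integer codes, not names."""
    try:
        target = [codes[step] for step in path.split("/") if step]
    except KeyError:
        return []  # a tag in the path never occurs in the document
    stack, hits = [], []
    for tok in tokens:
        if isinstance(tok, str):      # text token
            if stack == target:
                hits.append(tok)
        elif tok == END:              # element closed
            stack.pop()
        else:                         # start of an element with code `tok`
            stack.append(tok)
    return hits

doc = ("<site><item><name>pen</name><price>3</price></item>"
       "<item><name>ink</name></item></site>")
codes, tokens = encode(doc)
names = query(codes, tokens, "/site/item/name")
```

In this sketch the path is translated into codes once, and tag names are never reconstructed while scanning; text outside the matching path is skipped without being inspected. A real scheme would also compress the text values and work on an incoming stream rather than a fully parsed document.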
URI: http://hdl.handle.net/11455/19455
Other Identifiers: U0005-2008200709264900
Appears in Collections: Department of Computer Science and Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.