Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19779
標題: 一個壓縮UCIS-X索引檔及改善XML查詢效率的方法
A Compression Method of UCIS-X Index File for Efficient XML Query Processing
作者: 沈漢傑
Sin, Sanji
關鍵字: XML索引
XML查詢
XML Update
UCIS-X
Branch Map
出版社: 資訊科學與工程學系所
引用: [1] S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu, “Structural joins: a primitive for efficient XML query pattern matching,” Proc. 18th International Conference on Data Engineering (ICDE ''02), IEEE Press, pp. 141-152. 2002. [2] R. Bača, and M. Krátký, “A cost-based join selection for XML twig content-based queries,” Proceedings of the 2008 EDBT Workshop on Database Technologies for Handling XML information on the Web (DataX ''08), 353, Nantes, France, March 25, 2008, ACM, New York, NY, 2008, pp.13-20. [3] M. Benedikt, and C. Koch, “XPath leashed,” ACM Computing Surveys, 41, 1, 2008, pp.1-54. [4] S. Brenes, “Structural summaries for efficient XML query processing,” Proceedings of the 2008 EDBT Ph.D. Workshop, Nantes, France, March 25 - 25, 2008, S. Muller, W. Lindner, and G. Raschia, Eds. Ph.D. ''08, 326, ACM, New York, NY, 2008. pp. 65-73. [5] N. Bruno, N. Koudas, and D. Srivastava, “Holistic twig joins: optimal XML pattern matching,” Proceedings of the 2002 ACM SIGMOD international Conference on Management of Data (SIGMOD ''02), Madison, Wisconsin, June 03 - 06, 2002, pp.310-321. [6] B. Catania, A. Maddalena, and A. Vakali, “XML document indexes: a classification,” IEEE Internet Computing, vol. 9, no. 5, pp. 64-71, 2005. [7] B. Catania, B. C. Ooi, W. Wang, and X. Wang, “Lazy XML updates: laziness as a virtue of update and structural join efficiency,” Proceedings of the 2005 ACM SIGMOD international Conference on Management of Data (SIGMOD ''05), Baltimore, Maryland, June 14 - 16, 2005, ACM, New York, NY, 2005, pp.515-526. [8] D. Chamberlin, J. Robie, and D. Florescu, “Quilt: an XML query language for heterogeneous data sources,” Proceedings of the 3rd International Workshop on the Web and Databases (WebDB'2000), Dallas, TX, May 2000, pp.53-62. [9] Q. Chen, A. Lim, and K. W. Ong, “Enabling structural summaries for efficient update and workload adaptation,” Data and Knowledge Engineering, 64, 3, 2008, pp.558-579. [10] Q. Chen, A. Lim, and K.W. Ong, “D(K)-Index: an adaptive structural summary for graph-structured data,” Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD ''03) , ACM Press, pp. 134-144. 2003. [11] S. Chen, H. G. Li, J. Tatemura, W. P. Hsiung, D. Agrawal, and K. S. Candan, “Twig2Stack: bottom-up processing of generalized tree-pattern queries over XML documents,” Proc. 32nd International Conference on Very Large Data Bases (VLDB ''06), pp. 283-294, 2006. [12] Z. Chen, J. Gehrke, F. Korn, N. Koudas, J. Shanmugasundaram, and D. Srivastava, “Index structures for matching XML twigs using relational query processors,” Data & Knowledge Engineering, vol. 60, no. 2, pp. 283-302, 2007. [13] S.Y. Chien, Z. Vagena, D.Zhang, V.J. Tsotras, and C.Zaniolo, “Efficient structural joins on indexed XML documents, ”Proceedings of the 28th international Conference on Very Large Data Bases (VLDB), Hong Kong, China, August 20 - 23, 2002, Endowment, pp.263-274. [14] C.-W. Chung, J.-K. Min, and K. Shim, “APEX: an adaptive path index for XML data,” Proc. 2002 ACM SIGMOD International Conference on Management of Data, pp.121-132. September 2002. [15] B. Cooper, N. Sample, M.J. Franklin, G.R. Hjaltason, and M. Shadmon, “A fast index for semistructured data,” Proceedings of the 27th international Conference on Very Large Data, September 2001, pp.341-350 [16] R. Goldman, and J. Widom, “DataGuides: enabling query formulation and optimization in semistructured databases,” Proc. 23rd International Conference on Very Large Data Bases, pp. 436-445. 1997. [17] S.-C. Haw, and C.-S. Lee, “Data storage practices and query processing in XML databases: a survey,” Knowledge-Based Systems, Vol. 24, pp.1317-1340. June 2011. [18] Held, Gilbert, “Data compression: techniques and applications, hardware and software considerations, second edition,” John Wiley & Sons, New York, NY, 1987. [19] W.-C. Hsu, I-E. Liao, and S. Sin, “An updatable compressing and indexing scheme for efficient XML query processing” (in preparation). [20] H. Jiang, H. Lu, W. Wang, & B. C. Ooi, “XR-Tree: indexing XML data for efficient structural joins,” Proceedings of the 19nd international Conference on Data Engineering(ICDE), IEEE Computer Society, Washington, DC, 2003, pp.253-264. [21] R. Kaushik, D. Shenoy, P. Bohannon, and E. Gudes, “Exploiting local similarity to efficiently ordex paths in graph-structured data,” Proceedings of the 18nd international Conference on Data Engineering, ICDE, IEEE Computer Society, Washington, DC, 2002, pp.129-140. [22] I-E. Liao, W.-C. Hsu, and Y.-L. Chen, “An efficient indexing and compressing scheme for XML query processing,” Proceedings of the Second International Conference on Networked Digital Technologies (NDT 2010), July 7-9, 2010, pp.70-84. [23] T. Milo, and D. Suciu, “Index structures for path expressions,” Proceedings of the 7th international Conference on Database theory, Lecture Notes in Computer Science, 1540, Springer-Verlag, London, 1999, pp.277-295. [24] P. O''Neil, E. O''Neil, S. Pal, I. Cseri, G. Schaller, and N. Westbury, “ORDPATHs: insert-friendly XML node labels,” Proc. 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD ''04) ACM Press, pp. 903-908. 2004. [25] L. Qin, X. J. Yu, and B. Ding, “TwigList: make twig pattern matching fast,” Proc. 12th International conference on Database systems for advanced applications (DASFAA ''07), Springer press, pp. 850-862. 2007. [26] P. Rao, and B. Moon, “PRIX: indexing and querying XML using prufer sequences,” Proc. 20th International Conference on Data Engineering (ICDE ''04), IEEE Computer Society, pp.288-299. 2004. [27] J. Robie, J. Lapp, and D. Schach, “XML query language (XQL),” http://www.ibiblio.org/xql/xql-proposal.html ,1999, Accessed 19 Dec. 2012. [28] S. Sheu & N. Wu, “XCut: indexing XML data for efficient twig evaluation,” Proceedings of the 22nd international Conference on Data Engineering (ICDE), April 03 - 07, 2006, IEEE Computer Society, Washington, DC, 2006, pp.127-127. [29] K. Thompson, “Programming techniques: regular expression search algorithm,” Communications of the ACM, Vol.11: 6, pp. 419-422. June 1968. [30] W3C Recommendation, “XML path language (XPath) version 2.0,” http://www.w3.org/TR/2005/WD-xpath20-20050404 , 2005, Accessed 19 Dec. 2012. [31] W3C Recommendation, “XQuery 1.0: an XML query language,” http://www.w3.org/TR/2005/WD-xquery-20050404 , 2005, Accessed 19 Dec. 2012. [32] W3C Recommendation, “XQuery update facility 1.0,” http://www.w3.org/TR/2009/CR-xquery-update-10-20090609/, Accessed 19 Dec. 2012. [33] W3C Recommendation, “XSL transformations (XSLT) version 2.0,” http://www.w3.org/TR/2005/WD-xslt20-20050404 , 2005, Accessed 19 Dec. 2012. [34] H. Wang, and X. Meng, “On the sequencing of tree structures for XML indexing,” Proc. 21st International Conference on Data Engineering (ICDE ''05), IEEE Computer Society, pp.372-383. 2005. [35] H. Wang, J. Li , and H. Wang, “Clustered chain path index for XML document: efficiently processing branch queries,” World Wide Web, 11, 1, 2008, pp.153-168. [36] H. Wang, S. Park, W. Fan, and P. S. Yu, “ViST: a dynamic index method for querying XML data by tree structures,” Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD ''03), ACM Press, pp.110-121. 2003. [37] B. Zhang, Z. Geng, and A. Zhou, “SIMP: efficient XML structural index for multiple query processing,” Proc. 9th International Conference on Web-Age Information Management (WAIM ''08), IEEE Computer Society, pp.113-118. 2008.
摘要: XML目前已成為網際網路上資料溝通和資料交換的標準,面對越來越多的文件以XML格式編寫,如何支援使用者快速查詢的需求,已是一個重要的研究課題。文獻上有許多XML索引方法和技術被提出,不過這些索引研究中,或多或少都有以下的缺點:許多索引方法需要很大的索引空間,有些研究在使用特定的資料集時,其索引結構甚至比原始XML文件還大。部份的方法需要花費很長的時間去建構索引,如果每次查詢都要等待索引的建構,並不符合實際所需。有些方法只適用於某些型態的查詢結構,部份查詢、分支查詢、指定內容值的查詢,或更複雜的查詢組合,就無法有效的被處理。更重要的是,大多數的索引技術都不支援文件的更新。 UCIS-X (An Updatable Compressing and Indexing Scheme for XML) 是一個可以支援更新機制的XML文件壓縮與索引方法,先以Structural summary index的方法將XML文件壓縮,再群聚相同標籤名稱的節點,而XML節點之間相對應的結構關係則以Branch Map的編碼方法表示,利用較小的空間記錄,達到節省空間及快速取得資料的優點。這個索引結構可以取代原始XML文件,支援各種型態的查詢時,不需要再參考原始文件。同時在處理W3C所定義的更新運算時,只需要花費極小的成本即可達到更新的目的。雖然UCIS-X有效節省索引空間,且有良好的執行效率,但是會產生大量重複及連續的 "1" 或 "0" 字串,形成空間的浪費。 本研究採用byte資料型態,記錄 "1" 和 "0 " 重複出現的次數,以減少Branch Map所需的空間。基於UCIS-X的索引結構,再加上Level-Tag-Index,讓Dewey ID編碼可以對應出路徑標籤名稱,減少查詢時走訪的節點。實驗結果顯示,本方法比起原UCIS-X方法,在查詢時間、更新速度、以及空間的使用上,有不錯的改進。
The UCIS-X (An Updatable Compressing and Indexing Scheme for XML) is an efficient indexing and compressing scheme for XML update and query processing. UCIS-X adopts an adaptive labeling scheme, called Branch Map, which uses less space to record the structural relationships between nodes. Although the UCIS-X saves index space and has good performance in query evaluation, it may produce a large number of repeated and continuous strings of "1" or "0" that consumes of lots of space. In this study, the Branch Map is compressed by using bytes to record the repetition frequencies of "1" or "0". Based on the index scheme of UCIS-X, a Level-Tag-Index is built to retrieve corresponding tag names in a path. The number of visited nodes is reduced during query processing. The experimental results show that the proposed method is better than the original UCIS-X in terms of space of index and speed of query and update processing.
URI: http://hdl.handle.net/11455/19779
Appears in Collections:資訊科學與工程學系所

文件中的檔案:

取得全文請前往華藝線上圖書館

Show full item record
 
TAIR Related Article
 
Citations:


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.