Title: A Study of Search Schemes for XML Document Schemata (XML文件概要搜尋機制之研究)
Author: Shih, Mei-Tsun (施美存)
Keywords: XML Repository
Tree Edit Distance
Search Scheme
DTD Tree
Publisher: Department and Graduate Institute of Information Management
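Abstract (translated from the Chinese): With the growth of the Internet, business transactions can be completed quickly and conveniently online. Trading online requires enterprises to exchange data, so they need a good message exchange format to trade smoothly. XML is currently widely regarded as the best data exchange format. However, because XML lets users define their own tags and attributes, and because organizations differ in what they need from document content, data exchange easily becomes difficult; well-known international organizations therefore use XML repositories to address this problem. As a repository grows, finding document schemata that meet a user's needs correctly and efficiently becomes an important problem. Because no fully satisfactory search engine for document schemata exists so far, this thesis proposes a document schema search scheme, TEDSearch, which takes DTD documents as its search target. TEDSearch converts the queried DTD document into a DTD tree, uses the tree edit distance to compute the similarity between two document schemata, and ranks the search results by edit distance in ascending order. Because computing similarity against every item in the database would make the search inefficient, this thesis also proposes a filtering mechanism that effectively reduces the number of items to compare. The experimental results show that TEDSearch performs very well in both precision and result ranking, that its design makes search fast, that it supports sub-structure queries, and that it outperforms XDSearch.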
Because XML lets developers define elements and attributes to fit their own needs, it is one of the best formats for exchanging messages over the Internet, but this same flexibility makes it difficult to exchange XML documents between organizations. To resolve the problem, well-known international organizations have established XML repositories in the hope of increasing the reusability of collected document schemata. XML repositories can provide developers with similar document schemata that they can modify and reuse to fit their needs. It is therefore necessary to develop an efficient search scheme so that developers can find and retrieve the schemata they want. However, hardly any satisfactory search mechanism has been established for XML repositories. In this paper we therefore propose a new search scheme for XML document schemata, called TEDSearch. TEDSearch transforms a queried DTD document into a DTD tree, calculates the edit distance between two DTD trees, and ranks the search results by edit distance. To further improve the efficiency of TEDSearch, we also design a filtering method that significantly decreases the number of DTD trees whose distance must be calculated. The experimental results show that precision is greatly improved and that sub-structure search is supported. Additionally, TEDSearch outperforms XDSearch in searching document schemata.
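The pipeline described in the abstract (build a tree per DTD, compute a tree edit distance, filter, rank by ascending distance) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the thesis's implementation: the trees are written by hand rather than parsed from DTD files, the distance is the unit-cost edit distance between ordered labeled trees, and the filter is a simple size-difference lower bound standing in for the filtering method of TEDSearch, which this record does not specify. The names `node`, `size`, `forest_distance`, `tree_edit_distance`, and `search` are hypothetical.

```python
from functools import lru_cache

# A tree is (label, children); children is a tuple of trees (hashable).
def node(label, *children):
    return (label, tuple(children))

def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        return sum(size(t) for t in g)   # insert every node of g
    if not g:
        return sum(size(t) for t in f)   # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    match = (forest_distance(f[:-1], g[:-1])
             + forest_distance(v_kids, w_kids)
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))

def search(query, repository, max_distance):
    """Return (name, distance) pairs ranked by ascending edit distance."""
    hits = []
    for name, tree in repository.items():
        # Assumed filter: with unit costs, the size difference is a lower
        # bound on the distance, so such candidates can be skipped cheaply.
        if abs(size(query) - size(tree)) > max_distance:
            continue
        d = tree_edit_distance(query, tree)
        if d <= max_distance:
            hits.append((d, name))
    return [(name, d) for d, name in sorted(hits)]

# Hypothetical hand-built DTD trees.
query = node("order", node("customer"), node("item", node("price")))
repository = {
    "po.dtd": node("order", node("customer"),
                   node("item", node("price"), node("qty"))),
    "invoice.dtd": node("invoice", node("buyer"), node("item", node("price"))),
    "catalog.dtd": node("catalog", node("product", node("name"), node("price"),
                                        node("desc"), node("sku"), node("image"))),
}
results = search(query, repository, max_distance=2)
```

In this toy repository, `catalog.dtd` differs from the query by three nodes in size, so the filter skips it without running the more expensive distance computation; the remaining candidates are ranked with the closest schema first.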
Other Identifiers: U0005-1708201013323300
Appears in Collections: Department of Information Management


