Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/90830
Title: A Topic-Based Ontology Learning Method and Its Application to News Events Modeling (一個主題式本體論學習方法與新聞事件模型之應用)
Author: Lun-Chi Chen (陳倫奇)
Keywords: ontology learning
knowledge acquisition
news event detection
recommender system
collaborative filtering
Abstract: News datasets are one of the most abundant data sources for social events. Examining major events in large news datasets is difficult because the amount of data has grown quickly with the rapid development of the Web, and because news articles are unstructured. Discovering events from unstructured articles has become a critical issue. This dissertation proposes an algorithm based on a topic-relationship graph and the random-walk method for detecting events within news articles, and proposes an ontology learning method for constructing a topic hierarchy from detected events. In the proposed event detection algorithm, a probabilistic topic model is used to generate a topic-based graph. News terms are then categorized into five predefined named entities by exploring Wikipedia, thereby generating more distinctive features for each news article. The news articles are aggregated according to their similarities using a random-walk-with-restart (RWR) clustering algorithm. For ontology learning, we propose two approaches, the coverage approach and the lexical relation approach, for the automatic construction of terminological ontologies based on the subsumption relationship among concepts. The subsumption relationship between two concepts is defined based on their 'related' and 'broader' relationships, which are in turn measured by their cosine similarity and Kullback-Leibler (KL) divergence, respectively. The difference between the coverage approach and the lexical relation approach is whether the semantic relations of the WordNet taxonomies are employed in calculating the cosine similarity for a subsumption relationship. The ontology learning method can be applied to various applications in recommender systems. This dissertation also presents a case study on a personal ontology-based recommender system and the use of MapReduce to overcome the computational problem inherent in ontology similarity calculation.
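As a rough illustration of the 'related' and 'broader' measures described in the abstract, the following Python sketch treats each concept as a probability distribution over terms. The threshold value, the smoothing constant, the function names, and the rule used to decide which concept is broader are all assumptions made for this example; the abstract does not specify them, and this is not the dissertation's implementation.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small smoothing constant (eps is an assumption)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def is_related(p, q, threshold=0.5):
    """'Related' test: cosine similarity at or above an assumed threshold."""
    return cosine_similarity(p, q) >= threshold


def is_broader(p, q):
    """'Broader' test (assumed rule): p is broader than q when q is
    explained by p more cheaply than p is by q, i.e. KL(q||p) < KL(p||q)."""
    return kl_divergence(q, p) < kl_divergence(p, q)


def subsumes(p, q, threshold=0.5):
    """p subsumes q when the two are related and p is the broader one."""
    return is_related(p, q, threshold) and is_broader(p, q)


# Hypothetical concept distributions over four terms.
general = [0.25, 0.25, 0.25, 0.25]   # spread over all terms
specific = [0.5, 0.5, 0.0, 0.0]      # concentrated on a subset

print(subsumes(general, specific))   # general is the broader, related concept
print(subsumes(specific, general))
```

The asymmetry of KL divergence is what lets a single pair of distributions yield a direction for the hierarchy, while the symmetric cosine similarity only filters out unrelated pairs.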
The experimental results of ontology learning for news event modeling show that the F-measure of the proposed method is 0.69, whereas those of the k-means and LDA-Gibbs methods are 0.63 and 0.51, respectively. We also validated the proposed ontology learning method using research papers from the Artificial Intelligence and Image Processing categories of the ACM Digital Library. The experimental results show that the highest precisions for these two categories are 0.574 and 0.532, respectively.
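For readers unfamiliar with the metrics quoted above, the F-measure is conventionally the harmonic mean of precision $P$ and recall $R$. The abstract does not state which weighting was used, so the balanced form ($F_1$) shown here is an assumption; $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```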
URI: http://hdl.handle.net/11455/90830
Other identifier: U0005-2608201509564500
Full-text release date: 2018-08-27
Appears in Collections: Department of Computer Science and Engineering
