Title: Extraction of Chinese Organization and Company Names (中文機構及公司名稱擷取)
Author: Shih, Ren-Bin (施仁斌)
Keywords: Organization and Company Names; Extraction; Chinese Frequent Strings; Identification; Natural Language Processing (NLP)
Publisher: Department of Computer Science and Engineering (資訊科學與工程學系所)
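Proper nouns fall into many categories, and organization and company names (OCN) are one kind that is very hard to collect completely in a dictionary. Extracting Chinese OCNs is therefore a relatively difficult problem in natural language processing. This thesis addresses Chinese OCN extraction by using part-of-speech combinations, the preceding and following parts of speech, and Chinese Frequent Strings (CFS) to analyze parts of speech and structure respectively, and compiles the results of these analyses into probability tables.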

Proper nouns are classified into many categories. Among these, organization and company names (abbreviated as OCN) in Chinese cannot be completely listed in dictionaries, so extracting the name of an organization or a company is a difficult problem in Natural Language Processing (NLP). This thesis discusses the recognition of Chinese organization and company names using three analyses: (1) the combination of parts of speech within an OCN, (2) the parts of speech surrounding an OCN, and (3) the structure of an OCN in terms of Chinese frequent strings. Each analysis yields a probability table that can be used to estimate the probability of a candidate OCN. Using the probability tables from all three analyses together can give better predictions. The extraction of Chinese organization and company names thus benefits from judging how likely a candidate OCN is to be a real organization or company name.
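The idea of scoring a candidate OCN with three probability tables can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy training examples, the part-of-speech tag names, the representation of CFS structure as a tuple of segment lengths, the smoothing floor, and the choice to combine the three table lookups by simple multiplication. The names `build_table` and `score` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each known OCN is described by
#   pos     - the POS tags of the words inside the name
#   context - the POS tags just before and just after the name
#   cfs     - an assumed structural key (lengths of its frequent-string segments)
TRAINING = [
    {"pos": ("Nc", "Na", "Nc"), "context": ("P", "VC"), "cfs": (2, 2)},
    {"pos": ("Nc", "Na", "Nc"), "context": ("P", "DE"), "cfs": (2, 2)},
    {"pos": ("Nb", "Nc"), "context": ("VC", "DE"), "cfs": (3, 2)},
    {"pos": ("Nc", "Na", "Nc"), "context": ("P", "VC"), "cfs": (2, 2)},
]


def build_table(observations):
    """Convert a list of observed keys into a relative-frequency table."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# One probability table per analysis.
pos_table = build_table([ex["pos"] for ex in TRAINING])
context_table = build_table([ex["context"] for ex in TRAINING])
cfs_table = build_table([ex["cfs"] for ex in TRAINING])


def score(pos, context, cfs, floor=1e-6):
    """Estimate how OCN-like a candidate is.

    Assumption: the three table lookups are multiplied together, and
    unseen keys receive a small floor value instead of zero.
    """
    return (
        pos_table.get(tuple(pos), floor)
        * context_table.get(tuple(context), floor)
        * cfs_table.get(tuple(cfs), floor)
    )


seen = score(("Nc", "Na", "Nc"), ("P", "VC"), (2, 2))
unseen = score(("VC", "D"), ("Na", "Na"), (1, 1))
print(seen, unseen)
```

In this sketch a candidate whose tag pattern, context, and structure all appear in the tables scores far higher than one that matches none of them, which is the kind of judgment the abstract describes. A real system would derive the tables from a tagged corpus and would likely need a more careful combination and smoothing scheme.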
Other Identifiers: U0005-2907200815194500
Appears in Collections: Department of Computer Science and Engineering (資訊科學與工程學系所)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.