Title: Extraction of Chinese Organization and Company Names (中文機構及公司名稱擷取)
Author: Shih, Ren-Bin (施仁斌)
Keywords: Organization and Company Names; Extraction; Chinese Frequent Strings; Identification; Natural Language Processing (NLP)
Proper nouns fall into many categories. Among them, organization and company names (OCNs) are one kind that is hard to collect completely in a dictionary, so extracting Chinese OCNs is a comparatively difficult problem in natural language processing (NLP). This thesis recognizes Chinese OCNs with three methods, which analyze their parts of speech and their structure: (1) the combination of parts of speech within an OCN; (2) the parts of speech surrounding an OCN; and (3) the structure of an OCN in terms of Chinese frequent strings (CFS). Each analysis produces a probability table that can be used to estimate how likely a candidate string is to be an OCN, and combining the tables from all three methods can give better predictions. Extraction of Chinese OCNs therefore benefits from judging how likely each candidate is to be a genuine organization or company name.
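The abstract does not say how the probability tables are estimated or combined, so the following is only a minimal sketch under stated assumptions. It assumes that part-of-speech tags and the CFS segmentation of each candidate are already supplied by some upstream tagger, estimates each table as a Laplace-smoothed relative frequency P(OCN | feature), keys the CFS table on the candidate's final frequent string only (a simplification), and combines the three tables with a weighted average. The names `Candidate`, `ProbTable`, `FEATURES`, `train`, `score`, the tag labels, and the toy data are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    tags: tuple        # POS tags of the words inside the candidate (assumed given)
    left_pos: str      # POS tag of the word just before the candidate
    right_pos: str     # POS tag of the word just after the candidate
    cfs: tuple         # candidate split into Chinese frequent strings (assumed given)
    is_ocn: bool = False


class ProbTable:
    """Estimates P(OCN | key) from labeled counts (hypothetical helper)."""

    def __init__(self, smoothing=1.0):
        self.hits = defaultdict(int)   # times the key occurred on a true OCN
        self.total = defaultdict(int)  # times the key occurred on any candidate
        self.smoothing = smoothing

    def add(self, key, is_ocn):
        self.total[key] += 1
        if is_ocn:
            self.hits[key] += 1

    def prob(self, key):
        # Laplace smoothing over {OCN, not OCN}; an unseen key scores 0.5.
        s = self.smoothing
        return (self.hits.get(key, 0) + s) / (self.total.get(key, 0) + 2 * s)


# One feature extractor per method described in the abstract.
FEATURES = {
    "pos_combination": lambda c: c.tags,
    "surrounding_pos": lambda c: (c.left_pos, c.right_pos),
    "cfs_structure": lambda c: c.cfs[-1],  # assumption: key on the final frequent string
}


def train(examples):
    tables = {name: ProbTable() for name in FEATURES}
    for c in examples:
        for name, feature in FEATURES.items():
            tables[name].add(feature(c), c.is_ocn)
    return tables


def score(tables, c, weights=None):
    # Assumed combination rule: weighted average of the three table lookups.
    weights = weights or {name: 1 / len(FEATURES) for name in FEATURES}
    return sum(weights[n] * tables[n].prob(FEATURES[n](c)) for n in FEATURES)


# Toy labeled candidates with illustrative tags.
TRAIN = [
    Candidate(("Nc", "Na"), "P", "VC", ("台北", "銀行"), True),
    Candidate(("Nb", "Na"), "P", "VC", ("宏碁", "公司"), True),
    Candidate(("Na", "Na"), "VC", "DE", ("電腦", "零件"), False),
]

if __name__ == "__main__":
    tables = train(TRAIN)
    print(score(tables, Candidate(("Nc", "Na"), "P", "VC", ("台中", "銀行"))))
```

With equal weights, the new candidate 台中銀行 shares all three feature keys with known OCNs and scores 25/36 (about 0.69), while a string whose features were only ever seen on non-OCNs scores 1/3; a threshold on this score would then decide extraction. A product of probabilities or learned weights would be equally plausible choices.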
