Title: A Natural Language Query System Based on Two-Step Machine Learning (使用兩階段機器學習的自然語言查詢系統)
Author: Ting-An Ou (歐庭銨)
Keywords: Natural Language Query; Entity Mapping; SPARQL Generation; Machine Learning
With the rapid development of artificial intelligence, a growing number of researchers are looking for ways to make machines understand natural language so that these technologies can be put to good use, and natural language query systems have become an active research field. A natural language query system faces a different problem at each step of processing a question. In the Entity Mapping step, it must find the entity that corresponds to each word of the question; in the SPARQL Generation step, it must determine the relationship between each pair of entities.

Traditionally, Entity Mapping is done either by matching words to different kinds of entities according to their part of speech, or by converting the question into a graph (a DAG) and matching entities with lexical rules. SPARQL Generation is traditionally driven by graph structures such as dependency trees and parse trees. We instead train an MEMM in two stages, with the two stages of training intended to address the problems of the Entity Mapping step and the SPARQL Generation step respectively.

We train our model on the multilingual training set of QALD-7, from the Question Answering over Linked Data (QALD) challenge, and evaluate it on the corresponding test set. Compared with other studies that were evaluated on the same QALD-7 multilingual test set, our system performs well: its Precision, Recall, and F1-measure all reach 68%, the best result among the English query systems tested on that set.
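
The Precision, Recall, and F1-measure named in the results can be illustrated with a minimal sketch. It assumes set-based scoring of a system's answers against the gold answers for each question, followed by an unweighted average over questions; the abstract does not state the exact evaluation procedure, so this scoring scheme, the function names, and the toy answers are all assumptions made for illustration. The sketch also shows that equal Precision and Recall of 0.68 give an F1 of 0.68 under the usual harmonic-mean definition.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_question(gold, predicted):
    """Set-based precision, recall and F1 for one question (assumed scheme)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


def macro_average(per_question):
    """Unweighted mean of each metric over all questions."""
    n = len(per_question)
    return tuple(sum(scores[i] for scores in per_question) / n for i in range(3))


# Toy data, invented for illustration only (not taken from QALD-7).
results = [
    score_question({"dbr:Berlin"}, {"dbr:Berlin"}),
    score_question({"dbr:A", "dbr:B"}, {"dbr:A", "dbr:C"}),
]
precision, recall, f1 = macro_average(results)
print(precision, recall, f1)
```

Averaging per-question scores is only one possible design; pooling all answers before computing the metrics would weight questions with many answers more heavily.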
