Title: A Disease Code Prediction System Based on Multi-Label Association Rules Mining Method
Original title: 一個植基於多類別關聯規則演算法之疾病分類代碼預測系統
Author: Wu, Chun-Ching (巫俊卿)
Keywords: Multi-label association rule mining
Publisher: Department of Computer Science and Engineering
References:
[1] 總校閱: 范碧玉, 主編: 鄭茉莉、張樹棠, "ICD-9-CM分類規則彚編", 台灣病歷管理協會, 2006.
[2] 賴憲堂、楊志良、范碧玉, "全民健康保險下疾病分類編碼品質與相關影響因素研究", 中華公共衛生雜誌, 17卷4期, pp. 337-348, 1998.
[3] 陳鎮揮、楊鎮嘉、吳效文, "以資料探勘技術建立ICD-9-CM編碼決策系統", 光田醫學雜誌, vol. 3, no. 5, pp. 31-41, 2008.
[4] 《全民健康保險手術、處置支付標準與ICD-9-CM手術及處置代碼對應檔》, 中央健康保險局, 2010.
[5] 《全民健康保險支付標準》, 中央健康保險局, 2011.
[6] 《全民健康保險-用藥品項》, 中央健康保險局, 2011.
[7] 《全民健康保險-特殊材料》, 中央健康保險局, 2011.
[8] 《全民健康保險-藥品代碼與ATC對照》, 中央健康保險局, 藥品代碼與ATC對照_1000117.xls, 2011.
[9] Alan R. Aronson, Olivier Bodenreider, Dina Demner-Fushman, Kin Wah Fung, Vivian K. Lee, James G. Mork, Aurelie Neveol, Lee Peters, Willie J. Rogers, "From Indexing the Biomedical Literature to Coding Clinical Text: Experience with MTI and Machine Learning Approaches", ACL Workshop BioNLP, pp. 105-112, 2007.
[10] Bing Liu, Wynne Hsu, Yiming Ma, "Integrating Classification and Association Rule Mining", In Proceedings of Conference Knowledge Discovery and Data Mining (KDD'98), pp. 80-86, 1998.
[11] Farkas Richard, Szarvas Gyorgy, "Automatic construction of rule-based ICD-9-CM coding systems", BMC Bioinformatics, 9(Suppl 3):S10, 2008.
[12] Grigorios Tsoumakas, Ioannis Katakis, Ioannis Vlahavas, "Mining multi-label data", Data Mining and Knowledge Discovery Handbook, 2nd edition, part 6, pp. 667-685, 2010.
[13] Hsia DC, Krushat WM, Fagan AB, Tebbutt JA, Kusserow RP, "Accuracy of diagnostic coding for Medicare patients under the prospective-payment system", N Engl J Med, vol. 318, p. 352, 1988.
[14] Koby Crammer, Mark Dredze, Kuzman Ganchev, Partha Pratim Talukdar, "Automatic Code Assignment to Medical Text", ACL Workshop BioNLP, pp. 129-136, 2007.
[15] Leah S. Larkey, W. Bruce Croft, "Combining classifiers in text categorization", In Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 289-297, 1996.
[16] Pakhomov SV, Buntrock JD, Chute CG, "Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques", Journal of the American Medical Informatics Association, vol. 13, no. 5, pp. 516-525, 2006.
[17] Patrick Ruch, Julien Gobeill, Imad Tbahriti, Antoine Geissbuhler, "From Episodes of Care to Diagnosis Codes: Automatic Text Categorization for Medico-Economic Encoding", AMIA Annu Symp Proc., pp. 636-640, 2008.
[18] Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, Rudiger Wirth, "CRISP-DM 1.0: Step-by-step data mining guide", 2000.
[19] S. Kannan, R. Bhaskaran, "Role of Interestingness Measures in CAR Rule Ordering for Associative Classifier: An Empirical Approach", Journal of Computing, vol. 2, no. 1, pp. 8-15, 2010.
[20] Wenmin Li, Jiawei Han, Jian Pei, "CMAR: Accurate and Efficient Classification based on Multiple Class-association Rules", In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM-01), pp. 369-376, 2001.
Abstract: Under the Tw-DRG (Taiwan Diagnosis Related Group) medical payment system, the National Health Insurance Bureau (NHIB) sets a standard payment for each Tw-DRG group as the basis for reimbursing inpatient medical expenses. Before paying for an inpatient stay, the NHIB assigns the patient to a DRG according to the patient's disease classification codes, and it pays the standard rate for that DRG regardless of the amount of medical resources used during hospitalization. Ensuring the quality of disease coding is therefore an important issue. Hospitals need to avoid coding errors that would place patients in a lower-paying DRG, while the NHIB wants to prevent hospitals from obtaining higher payments by reporting false disease codes.

This study assumes that a patient's disease codes are associated with the medical orders issued by physicians during hospitalization, the diagnoses from emergency care before admission, and the diagnoses from the previous hospitalization. We aim to discover these associations with data mining techniques and to build an associative ruleset for disease classification that serves as the basis of a disease code prediction and checking system. We analyzed the frequency of ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) codes among 74,356 inpatients at a medical center. Codes assigned fewer than 10 times (support below 10/74,356) account for 62% of all ICD-9-CM codes, which means the dataset is characterized by low support. Related studies suggest that low support is common to almost all disease coding datasets. To address this characteristic, we designed MLCARFLS (Multi-Label Class Association Rules mining For Low Support dataset), a multi-label class association rule mining algorithm suited to low-support datasets, and used it to mine the ruleset.

We tested the ruleset produced by MLCARFLS on 7,345 inpatient records. With at most 13 ICD-9-CM diagnosis codes and at most 7 ICD-9-CM procedure codes recommended, recall was 82% for ICD-9-CM procedure codes and 68% for all ICD-9-CM codes. Patrick Ruch et al. applied natural language processing to the electronic medical records of the University Hospital of Geneva and obtained a recall of 63% when recommending up to 20 ICD-10 codes; the recall in this study is 5 percentage points higher. The NHIB manually established mapping rules between NHI order codes and ICD-9-CM procedure codes; recommending procedure codes with these rules gives a recall of 69%, which this study (82%) exceeds by 13 percentage points. We conclude that the ruleset mined by MLCARFLS is a better basis for a disease code prediction and checking system than the rules provided by the NHIB.
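As a rough illustration of the two quantities the abstract relies on, the Python sketch below computes the share of codes that fall under a minimum occurrence count, and recall under a cap on the number of recommended codes. It is not the thesis's MLCARFLS implementation: the function names, the toy codes A to C, and the choice of micro-averaged recall (hits pooled over all records) are assumptions made for illustration. With the abstract's figures, the threshold would be a count of 10 in 74,356 records, a support of roughly 0.013%.

```python
from collections import Counter
from typing import Iterable, Sequence


def low_support_fraction(records: Iterable[Sequence[str]], min_count: int) -> float:
    """Share of distinct codes that occur in fewer than min_count records."""
    # Count each code at most once per record, so counts are record frequencies.
    counts = Counter(code for codes in records for code in set(codes))
    if not counts:
        return 0.0
    rare = sum(1 for n in counts.values() if n < min_count)
    return rare / len(counts)


def recall_at_k(
    recommended: Sequence[Sequence[str]],
    actual: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Micro-averaged recall when at most k ranked codes are kept per record."""
    hits = 0
    total = 0
    for ranked, truth in zip(recommended, actual):
        truth_set = set(truth)
        hits += len(truth_set & set(ranked[:k]))  # true codes found in the top k
        total += len(truth_set)                   # all true codes for the record
    return hits / total if total else 0.0


# Hypothetical toy data: three records with placeholder codes.
toy_records = [["A", "B"], ["A"], ["C"]]
print(low_support_fraction(toy_records, min_count=2))

# Hypothetical ranked recommendations and the codes actually assigned.
toy_recommended = [["A", "B", "C"], ["C", "A"]]
toy_actual = [["A", "C"], ["A"]]
print(recall_at_k(toy_recommended, toy_actual, k=2))
```

Applying separate caps to diagnosis and procedure codes, as the abstract describes (13 and 7), would amount to calling `recall_at_k` once per code type with the corresponding `k`.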
Appears in Collections: Department of Computer Science and Engineering



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.