dc.description.abstractProtein citrullination is catalyzed by peptidylarginine deiminase (PAD), which converts charged arginine residues in proteins to neutral citrulline residues. Abnormal citrullination has been shown to play a role in rheumatoid arthritis, multiple sclerosis and Alzheimer’s disease. Because little data is available, the development of citrullination prediction tools has been slow. For this reason, this study designed a novel experimental approach that applies machine learning to small amounts of data and can be used to predict citrullination sites. Feature selection is important for solving the citrullination prediction problem; it draws on eight features, such as rules, sequence similarity, evolutionary information, physicochemical and biochemical properties of amino acids, secondary structure, disorder and surface accessibility. Moreover, unlike previous work, we evaluated the models in terms of the learning process and the importance of features. Finally, the prediction model achieves accuracies of up to 99% for histones, 79% for BMP, 66% for CXL and 68% for other proteins. A case study showed that accuracy can reach 88% for peptidylarginine deiminase substrates that participate in regulatory pathways. Furthermore, in cooperation with a biological laboratory, we made and verified predictions for an unknown protein, with accuracy as high as 90%.en_US
dc.description.tableofcontentsAcknowledgments; Abstract (in Chinese); Abstract; Content; List of Tables; List of Figures; 1. Introduction; 2. Materials and Methods; 2.1 Compilation of Citrullination Dataset; 2.2 Feature Models; 2.2.1 Rule; 2.2.2 Binary; 2.2.3 PSSM; 2.2.4 Amino Acid Factors; 2.2.5 Secondary Structure; 2.2.6 Disorder; 2.2.7 Surface Accessibility and Secondary Structure; 2.3 Experimental Setup; 2.4 Feature Model Selection; 2.5 Model Selection; 3. Result and Discussion; 3.1 Evaluation of Feature Models; 3.2 Evaluation of Feature Model Selection; 3.3 Evaluation of Model Selection; 3.4 Performance of Different Types of Proteins; 3.5 Case Study; 4. Conclusion; Reference; Supplementary datazh_TW
dc.subjectPeptidylarginine Deiminasesen_US
dc.subjectSupport Vector Machineen_US
dc.subjectGenetic Algorithmen_US
dc.titleProtein Citrullination Sites Prediction Using Machine Learning and Model Selectionen_US
