School of Computing: Conference and Workshop Papers

Date of this Version

12-1-2012

Document Type

Article

Citation

P. Z. Revesz, C. Assi, Data mining of pancreatic cancer protein databases, In: Advances in Environment, Computational Chemistry and Bioscience (includes Proc. 3rd International Conference on Bioscience and Bioinformatics), S. Oprisan et al., eds., WSEAS Press, pp. 320-325, 2012.

Comments

OPEN ACCESS

Christopher Assi, M.S. in Computer Science, August 2012.

Abstract

Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume the input data to be a type of relational database that is also representable as an ARFF file. We developed a method to restructure protein databases so that they become amenable to various data mining and machine learning tools. Our restructuring method enabled us to apply both decision tree and support vector machine (SVM) classifiers to a pancreatic protein database. The SVM classifier that used both GO terms and PFAM families to characterize proteins gave us over 73% accuracy in predicting whether a protein is involved in pancreatic cancer.
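
To illustrate the kind of restructuring the abstract refers to, the following is a minimal sketch, not the authors' actual method. It assumes each protein carries a set of annotation terms (GO terms and PFAM families) plus a class label, and flattens them into one binary attribute per term in ARFF syntax. The function name `to_arff`, the protein identifiers, the annotations, and the labels are all hypothetical and made up for illustration.

```python
def to_arff(relation, proteins):
    """Flatten set-valued annotations into a binary-attribute ARFF string.

    proteins: dict mapping a protein id to (set of annotation terms, class label).
    Hypothetical helper for illustration; not taken from the paper.
    """
    # One binary attribute per distinct annotation term, in sorted order.
    terms = sorted({t for annotations, _ in proteins.values() for t in annotations})
    labels = sorted({label for _, label in proteins.values()})

    lines = [f"@RELATION {relation}", ""]
    for term in terms:
        # Quote attribute names because identifiers such as GO:0005515 contain a colon.
        lines.append(f"@ATTRIBUTE '{term}' {{0,1}}")
    lines.append(f"@ATTRIBUTE class {{{','.join(labels)}}}")
    lines += ["", "@DATA"]

    # One data row per protein: 1 if the protein has the term, else 0, then the label.
    for pid in sorted(proteins):
        annotations, label = proteins[pid]
        row = ["1" if term in annotations else "0" for term in terms]
        lines.append(",".join(row + [label]))
    return "\n".join(lines)


# Made-up toy input: the ids, annotations, and labels are illustrative only.
toy = {
    "P1": ({"GO:0005515", "PF00069"}, "yes"),
    "P2": ({"GO:0005515"}, "no"),
    "P3": ({"PF00069", "GO:0006915"}, "yes"),
}
arff = to_arff("pancreatic_proteins", toy)
print(arff)
```

A file in this form has a fixed set of columns, which is what tools expecting relational input require. The binary encoding is only one possible choice; the paper itself should be consulted for the restructuring the authors used.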
