Computing, School of

School of Computing: Dissertations, Theses, and Student Research

Data Mining of Protein Databases

Christopher Assi, University of Nebraska-Lincoln

First Advisor

Peter Z. Revesz

Date of this Version

7-27-2012

Document Type

Thesis

Citation

Christopher Assi, Data Mining of Protein Databases, M.S. Thesis, University of Nebraska-Lincoln, August 2012.

Comments

A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science

Major: Computer Science

Under the supervision of Professor Peter Z. Revesz. Lincoln, Nebraska, August 2012

Abstract

Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume that the input is a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that restructure non-relational protein databases so that they become relational and amenable to various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we applied two classification methods, decision tree and SVM classifiers, and compared their accuracy in predicting whether particular pancreatic proteins are involved in pancreatic cancer. In our predictions, the SVM gave not only the highest accuracy, about 73%, but also the most consistency among the GO terms and PFAM family proteins.

Advisor: Peter Z. Revesz
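
The abstract does not spell out the restructuring algorithms, so the following is only a minimal sketch of one common way to turn set-valued attributes into relational columns: each distinct set member becomes a 0/1 indicator column. The record layout, the field names `go_terms` and `pfam`, the function `flatten_sets`, and the sample identifiers are all illustrative assumptions, not the thesis's method or data.

```python
from typing import Dict, List, Tuple

# Hypothetical records: each protein carries set-valued attributes,
# which is what makes the source data non-relational.
proteins = [
    {"id": "P1", "go_terms": {"GO:0005515", "GO:0006508"}, "pfam": {"PF00089"}},
    {"id": "P2", "go_terms": {"GO:0005515"}, "pfam": set()},
]


def flatten_sets(records: List[dict], set_fields: List[str]) -> Tuple[List[str], List[Dict[str, object]]]:
    """Expand set-valued fields into one 0/1 indicator column per member.

    This is an assumed encoding, chosen for illustration only.
    """
    # Build the column vocabulary from every member seen in any record.
    columns = sorted(
        {f"{field}={value}" for r in records for field in set_fields for value in r[field]}
    )
    rows = []
    for r in records:
        present = {f"{field}={value}" for field in set_fields for value in r[field]}
        row: Dict[str, object] = {"id": r["id"]}
        # Each row now has the same fixed set of atomic-valued columns.
        row.update({c: int(c in present) for c in columns})
        rows.append(row)
    return columns, rows


columns, rows = flatten_sets(proteins, ["go_terms", "pfam"])
```

Under these assumptions, every row has the same fixed columns with atomic values, which is the shape that tabular learners such as decision tree or SVM implementations typically expect as input.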

Included in

Computer Engineering Commons, Databases and Information Systems Commons, Organic Chemistry Commons
