Department of Computer Science and Engineering


P. Z. Revesz, C. Assi, Data mining the functional characterizations of proteins to predict their cancer-relatedness, International Journal of Biology and Biomedical Engineering, 7 (1), 7-14, 2013.


Open-access journal.

Christopher Assi, M.S. in Computer Science, University of Nebraska-Lincoln, 2012.


This paper considers two types of protein data. The first describes protein function in several ways, such as GO terms and PFAM families. The second records whether individual proteins are experimentally associated with cancer through an anomalous elevation or lowering of their expression in cancerous cells. We combine these two types of data and test whether the first, the functional descriptors, can predict the second, cancer-relatedness. Using data mining and machine learning, we derive a classifier that uses only the GO term and PFAM family descriptions of a protein and predicts with over 73 percent accuracy whether the protein is associated with pancreatic cancer.
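The abstract does not name the learning algorithm, so the following Python sketch only illustrates the general setup under stated assumptions. Each protein is represented as the set of its GO terms and PFAM families, and a Bernoulli naive Bayes classifier with add-one smoothing predicts a cancer-related/not-related label. Naive Bayes is a swapped-in technique and is not necessarily the method used in the paper. The descriptor IDs and labels in DATA are hypothetical toy values, not data from the study. The helper names train, predict, and leave_one_out_accuracy are likewise illustrative.

```python
import math
from typing import Dict, List, Set, Tuple

# A labeled protein: (set of GO term / PFAM family IDs, is_cancer_related).
Protein = Tuple[Set[str], bool]
Model = Tuple[List[str], Dict[bool, int], Dict[bool, Dict[str, int]]]

# Hypothetical toy data for illustration only; labels are invented.
DATA: List[Protein] = [
    ({"GO:0006915", "PF00069"}, True),
    ({"GO:0006915", "PF00071"}, True),
    ({"GO:0008283", "PF00069"}, True),
    ({"GO:0005975", "PF00128"}, False),
    ({"GO:0005975", "PF00106"}, False),
    ({"GO:0006412", "PF00128"}, False),
]


def train(data: List[Protein]) -> Model:
    """Count, per class, how many proteins carry each descriptor."""
    vocab = sorted(set().union(*(d for d, _ in data)))
    class_counts = {True: 0, False: 0}
    feat_counts = {label: {f: 0 for f in vocab} for label in (True, False)}
    for descriptors, label in data:
        class_counts[label] += 1
        for f in descriptors:
            feat_counts[label][f] += 1
    return vocab, class_counts, feat_counts


def predict(model: Model, descriptors: Set[str]) -> bool:
    """Return the class with the highest Bernoulli naive Bayes log-score."""
    vocab, class_counts, feat_counts = model
    total = class_counts[True] + class_counts[False]
    best_label, best_score = False, -math.inf
    for label in (True, False):
        # Smoothed log prior of the class.
        score = math.log((class_counts[label] + 1) / (total + 2))
        for f in vocab:
            # Add-one smoothed probability that descriptor f is present.
            p = (feat_counts[label][f] + 1) / (class_counts[label] + 2)
            score += math.log(p if f in descriptors else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(data: List[Protein]) -> float:
    """Train on all proteins but one, test on the held-out one, repeat."""
    correct = 0
    for i, (descriptors, label) in enumerate(data):
        model = train(data[:i] + data[i + 1:])
        if predict(model, descriptors) == label:
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    model = train(DATA)
    print(predict(model, {"GO:0006915", "PF00069"}))
    print(leave_one_out_accuracy(DATA))
```

Leave-one-out evaluation is used here only because the toy set is tiny. Any accuracy printed by this sketch reflects the invented data and says nothing about the 73 percent figure reported above.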