Date of this Version
P. Z. Revesz, C. Assi, Data mining the functional characterizations of proteins to predict their cancer-relatedness, International Journal of Biology and Biomedical Engineering, 7 (1), 7-14, 2013.
This paper considers two types of protein data. The first is data about protein function, described in a number of ways, such as GO terms and PFAM families. The second is data about whether individual proteins are experimentally associated with cancer through an anomalous elevation or lowering of their expression within cancerous cells. We combine these two types of data and test whether the first type, the functional descriptors, can predict the second, cancer-relatedness. Using data mining and machine learning, we derive a classifier algorithm that, using only the GO term and PFAM family descriptions of a protein, can predict with over 73 percent accuracy whether it is associated with pancreatic cancer.
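To make the idea of predicting cancer-relatedness from functional descriptors concrete, the following is a minimal sketch, not the classifier derived in the paper. It assumes each protein is represented as a set of descriptor strings and uses a Bernoulli naive Bayes model with Laplace smoothing, which is a stand-in technique chosen for brevity. The descriptor names ("GO:A", "PFAM:X", and so on), the toy training set, and the function names train and predict are all hypothetical placeholders, not data or code from the paper.

```python
import math
from collections import Counter


def train(examples):
    """Fit a Bernoulli naive Bayes model (a stand-in, not the paper's classifier).

    examples: list of (descriptors, label) pairs, where descriptors is a set of
    strings (hypothetical GO term / PFAM family identifiers) and label is a bool
    meaning "cancer-related".
    """
    class_counts = Counter(label for _, label in examples)
    feature_counts = {label: Counter() for label in class_counts}
    vocabulary = set()
    for descriptors, label in examples:
        feature_counts[label].update(descriptors)
        vocabulary |= descriptors
    return class_counts, feature_counts, vocabulary


def predict(model, descriptors):
    """Return the label with the highest log-posterior for one protein."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        for term in vocabulary:
            # Laplace-smoothed probability that a protein of this class
            # carries the descriptor; smoothing keeps p strictly in (0, 1).
            p = (feature_counts[label][term] + 1) / (n + 2)
            score += math.log(p if term in descriptors else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: placeholder descriptors, not real annotations.
training = [
    ({"GO:A", "PFAM:X"}, True),
    ({"GO:A", "PFAM:Y"}, True),
    ({"GO:B", "PFAM:Z"}, False),
    ({"GO:B", "PFAM:Y"}, False),
]
model = train(training)
print(predict(model, {"GO:A", "PFAM:X"}))
print(predict(model, {"GO:B", "PFAM:Z"}))
```

In practice, the accuracy of such a model would be estimated on proteins held out from training; the figure of over 73 percent reported above refers to the paper's own classifier and data, not to this sketch.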
Analytical, Diagnostic and Therapeutic Techniques and Equipment Commons, Congenital, Hereditary, and Neonatal Diseases and Abnormalities Commons, Databases and Information Systems Commons, Disease Modeling Commons, Health Information Technology Commons, Medical Genetics Commons, Oncology Commons