Computer Science and Engineering, Department of


Nucleic Acids Research, 2004, Vol. 32, No. 21 6437–6444


Copyright Oxford University Press 2004



The function of a protein that has no sequence homolog of known function is difficult to assign on the basis of sequence similarity. The same problem may arise for homologous proteins of different functions if one is newly discovered and the other is the only known protein of similar sequence. It is therefore desirable to explore methods that are not based on sequence similarity. One approach is to assign a protein to a functional family, which provides a useful hint about its function. Several groups have employed a statistical learning method, support vector machines (SVMs), to predict protein functional family directly from sequence, irrespective of sequence similarity. These studies showed that SVM prediction accuracy is at a level useful for functional family assignment, but the capability of SVMs to assign distantly related proteins and homologous proteins of different functions has not been critically and adequately assessed. Here, SVM is tested for functional family assignment of two groups of enzymes. The first consists of 50 enzymes that have no homolog of known function according to a PSI-BLAST search of protein databases. The second contains eight pairs of homologous enzymes of different families. SVM correctly assigns 72% of the enzymes in the first group and 62% of the enzyme pairs in the second group, suggesting that it is potentially useful for facilitating functional study of novel proteins.