Plant Science Innovation, Center for
Document Type
Article
Date of this Version
2020
Citation
Dai X, Xu Z, Liang Z, et al. Non-homology-based prediction of gene functions in maize (Zea mays ssp. mays). Plant Genome. 2020;13:e20015. https://doi.org/10.1002/tpg2.20015
Abstract
Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non-homology gene features. Among the eight supervised classification algorithms evaluated, random forest-based prediction consistently provided the most accurate gene function prediction. Non-homology-based functional annotation provides complementary strengths to homology-based annotation, with higher average performance in Biological Process GO terms, the domain where homology-based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology-based functional annotation is highest. GO prediction models trained with homology-based annotations were able to successfully predict annotations from a manually curated “gold standard” GO annotation set. Non-homology-based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology-based functional annotations.
Dai TPG 2020 Non-homology-based prediction SUPPL 2.xlsx (15 kB)
Dai TPG 2020 Non-homology-based prediction SUPPL 3.xlsx (233 kB)
Dai TPG 2020 Non-homology-based prediction SUPPL 4.xlsx (1941 kB)
Dai TPG 2020 Non-homology-based prediction SUPPL 5.xlsx (162 kB)
Dai TPG 2020 Non-homology-based prediction SUPPL 6.png (61 kB)
Dai TPG 2020 Non-homology-based prediction SUPPL 7.png (170 kB)
Comments
This is an open access article under the terms of the Creative Commons Attribution License. © 2020 The Authors