Date of this Version
Darby et al. BMC Bioinformatics 2011, 12:183
Background: Hybridization of heterologous (non-specific) nucleic acids onto arrays designed for model-organisms has been proposed as a viable genomic resource for estimating sequence variation and gene expression in nonmodel organisms. However, conventional methods of normalization that assume equivalent distributions (such as quantile normalization) are inappropriate when applied to non-specific (heterologous) hybridization. We propose an algorithm for normalizing and centering intensity data from heterologous hybridization that makes no prior assumptions of distribution, reduces the false appearance of homology, and provides a way for researchers to confirm whether heterologous hybridization is suitable.
Results: Data are normalized by adjusting for Gibbs free energy binding, and centered by adjusting for the median of a common set of control probes assumed to be equivalently dissimilar for all species. This procedure was compared to existing approaches and found to be as successful as Loess normalization at detecting sequence variations (deletions) and even more successful than quantile normalization at reducing the accumulation of false positive probe matches between two related nematode species, Caenorhabditis elegans and C. briggsae. Despite the improvements, we still found that probe fluorescence intensity was too poorly correlated with sequence similarity to result in reliable detection of matching probe sequence.
Conclusions: Cross-species hybridizations can be a way to adapt genome-enabled tools for closely related nonmodel organisms, but data must be appropriately normalized and centered in a way that accommodates hybridization of nucleic acids with diverged sequence. For short, 25-mer probes, hybridization intensity alone may be insufficiently correlated with sequence similarity to allow reliable inference of homology at the probe level.