RNA-Seq is a recently developed technology that reveals RNA expression profiles by taking advantage of deep sequencing. This new technology has many advantages over microarray technologies. Although RNA-Seq is expected to overtake microarray experiments, the massive amounts of data it produces present many challenges to bioinformatics research in efficient data processing and storage and in accurate data interpretation. One of these challenges, and an important aspect of expression profiling, is detecting differentially expressed genes between experimental conditions. Several statistical methods have been developed for this purpose over the past few years. In this study, we chose two representative methods: one parametric method, DESeq, and one nonparametric method, NOISeq. We compared the performance of these two methods using simulated and real datasets. We showed that both DESeq and NOISeq identified over-expressed genes more accurately than under-expressed ones. While DESeq was more likely to call longer genes as differentially expressed, NOISeq did not show such a bias. When the underlying variation increased, both methods showed higher false-positive rates at the same threshold. When replicates were not available, both methods showed lower true-positive and higher false-positive rates. Finally, we explored a strategy for combining the results from DESeq and NOISeq when replicates are available. We showed that it is possible to improve differential gene-calling results by combining the results obtained from the two methods. NOISeq is recommended when no replicates are available.
Advisor: Etsuko Moriyama