Computer Science and Engineering, Department of



Xu, Z.; Yan, S.; Yuan, S.; Wu, C.; Chen, S.; Guo, Z.; Li, Y. Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data. Stats 2023, 6, 468–481. https://


Open access.


Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing depth (typically available for only a small number of individuals) or via computationally intensive multi-sample linkage disequilibrium (LD)-aware methods. We propose a computationally efficient two-stage combination approach for association analysis. In the first stage, single-nucleotide polymorphisms (SNPs) are screened via a rapid maximum likelihood (ML)-based method applied directly to the sequence data, without first calling genotypes. In the second stage, the selected SNPs are evaluated by performing association tests on genotypes from multi-sample LD-aware calling. Extensive simulation and real-data studies show that the proposed two-stage approach can save 80% of the computational cost while retaining more than 90% of the power of the classical approach of genotyping all markers, across sequencing depths d ≥ 2.
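The control flow of such a two-stage design can be sketched in a few lines of Python. This is only an illustrative skeleton, not the authors' implementation: the function name `two_stage_scan`, the callables `screen_stat` and `full_test`, and the `keep_fraction` parameter are all hypothetical, and the abstract does not say how SNPs are selected after the first stage (ranking by a screening statistic and keeping a fixed fraction is an assumption made here).

```python
import math
from typing import Callable, Dict, Hashable, Iterable


def two_stage_scan(
    snp_ids: Iterable[Hashable],
    screen_stat: Callable[[Hashable], float],
    full_test: Callable[[Hashable], float],
    keep_fraction: float = 0.2,
) -> Dict[Hashable, float]:
    """Run a cheap screen on every SNP, then an expensive test on the top ones.

    screen_stat: hypothetical stand-in for a fast ML-based statistic computed
        directly from sequence data (larger = more promising).
    full_test: hypothetical stand-in for an association test on genotypes
        from multi-sample LD-aware calling (returns, e.g., a p-value).
    keep_fraction: assumed tuning parameter; the source does not specify
        the selection rule.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Stage 1: cheap screening statistic for every SNP.
    scores = {snp: screen_stat(snp) for snp in snp_ids}

    # Keep the highest-scoring fraction (at least one SNP if any exist).
    n_keep = max(1, math.ceil(keep_fraction * len(scores)))
    selected = sorted(scores, key=scores.get, reverse=True)[:n_keep]

    # Stage 2: expensive test only on the selected SNPs.
    return {snp: full_test(snp) for snp in selected}
```

The computational saving comes from `full_test` being called only on the selected SNPs; how much power is retained depends on how well the screening statistic ranks truly associated SNPs, which this sketch does not model.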