Computer Science and Engineering, Department of


Date of this Version



Xu, Z.; Yan, S.; Wu, C.; Duan, Q.; Chen, S.; Li, Y. Next-Generation Sequencing Data-Based Association Testing of a Group of Genetic Markers for Complex Responses Using a Generalized Linear Model Framework. Mathematics 2023, 11, 2560. math11112560


Open access.


Association testing is used to study the relationship between genetic variants and phenotypes; however, most association studies rely on genotype-based testing. Testing methods based on next-generation sequencing (NGS) data without genotype calling have an advantage over genotype-based methods when genotype estimation is inaccurate. Our objective was to develop NGS data-based methods for association studies to fill this gap in the literature. Single-variant testing methods based on NGS data have been proposed, including our previously proposed single-variant NGS data-based testing method, the UNC combo method. We have also previously proposed an NGS data-based group testing method using a linear model (LM) framework, which can handle continuous responses. In this paper, we extend that LM-based framework to a generalized linear model (GLM)-based framework so that the methods can handle other types of responses, especially binary responses, which are common in association studies. To evaluate and compare the performance of the various estimators, we performed simulation studies. We found that all methods control the Type I error, and that our NGS data-based methods perform better than genotype-based methods for other types of responses, including binary responses (logistic regression) and count responses (Poisson regression), especially when sequencing depth is low. In summary, we have derived NGS data-based methods for a group of genetic variables within the GLM framework. Compared with our previously proposed LM-based methods, the new GLM-based methods can handle more complex responses (for example, binary responses and count responses) in addition to continuous responses. Our methods fill the gap in the literature and show an advantage over the corresponding genotype-based methods.
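For readers unfamiliar with group testing in a GLM, the following is a minimal sketch of a generic likelihood ratio test for a group of markers with a binary or count response. It is not the NGS data-based estimator developed in the paper; it only illustrates the kind of genotype-based (or dosage-based) baseline the abstract refers to. The function names `fit_glm` and `group_lrt`, the use of iteratively reweighted least squares, and the choice of a likelihood ratio statistic are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2


def fit_glm(X, y, family="binomial", n_iter=50, tol=1e-8):
    """Fit a canonical-link GLM by Newton/IRLS steps.

    Returns the coefficients and the log-likelihood up to an additive
    constant that does not depend on the coefficients.
    Illustrative helper only, not taken from the paper.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        if family == "binomial":      # logistic regression, logit link
            mu = 1.0 / (1.0 + np.exp(-eta))
            w = mu * (1.0 - mu)
        else:                         # "poisson", log link
            mu = np.exp(eta)
            w = mu
        # Newton step for a canonical link: (X' W X)^{-1} X' (y - mu)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    if family == "binomial":
        loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    else:
        loglik = np.sum(y * eta - np.exp(eta))  # the -log(y!) term is dropped
    return beta, loglik


def group_lrt(y, G, family="binomial"):
    """Likelihood ratio test of H0: all marker effects in G are zero.

    G is an (n, m) matrix of marker scores, for example called genotypes
    (0/1/2) or expected dosages (assumed inputs, not the paper's NGS model).
    """
    n = len(y)
    X0 = np.ones((n, 1))              # null model: intercept only
    X1 = np.column_stack([X0, G])     # full model: intercept plus m markers
    _, ll0 = fit_glm(X0, y, family)
    _, ll1 = fit_glm(X1, y, family)
    stat = 2.0 * (ll1 - ll0)
    df = G.shape[1]
    # Asymptotic chi-square reference distribution with m degrees of freedom
    return stat, chi2.sf(stat, df)
```

Calling `group_lrt(y, G)` with a 0/1 response tests the whole group through logistic regression, and `group_lrt(y, G, family="poisson")` does the same for counts. Covariates are omitted to keep the sketch short; in practice they would be added to both the null and the full design matrices.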