Large Cohort Study
Study on genetic structure differences and adjustment strategies in different areas of China
Zhu Meng, Lyu Jun, Yu Canqing, Jin Guangfu, Guo Yu, Bian Zheng, Robin Walters, Iona Millwood, Chen Zhengming, Shen Hongbing, Hu Zhibin, Li Liming, for the China Kadoorie Biobank Collaborative Group
Published 2019-01-10
Cite as Chin J Epidemiol, 2019, 40(1): 20-25. DOI: 10.3760/cma.j.issn.0254-6450.2019.01.006
Abstract
Objective: To describe the genetic structure of populations in different areas of China, and to explore how well different strategies control confounding by genetic structure in cohort studies.
Methods: Using genome-wide association study (GWAS) data on 4 500 samples from 10 areas of the China Kadoorie Biobank (CKB), we performed principal components analysis, plotted the first two principal components of the samples against each other, and compared the plot with each sample's area of origin to characterize the genetic structure of samples from different areas of China. Based on the CKB cohort data, we then generated a simulated data set with the characteristics of cluster sampling, such as genetic structure differences and extensive kinship, and evaluated the effects of different analysis strategies, including the traditional analysis scheme and a mixed linear model, on the inflation factor (λ).
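As an illustration of the principal components step, the following is a minimal sketch, not the authors' pipeline. It assumes a samples-by-SNPs matrix of allele counts coded 0/1/2, standardizes each SNP by its allele frequency (a common convention, assumed here), and takes the leading components from a singular value decomposition. The function name `genotype_pcs` and its parameters are hypothetical.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Return the leading principal component scores of a genotype matrix.

    genotypes: array of shape (n_samples, n_snps) with allele counts 0, 1 or 2.
    This is an illustrative sketch; real analyses usually also prune SNPs in
    linkage disequilibrium and handle missing calls, which is omitted here.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0              # estimated allele frequency per SNP
    keep = (freq > 0.0) & (freq < 1.0)       # drop monomorphic SNPs (zero variance)
    g, freq = g[:, keep], freq[keep]
    # Center each SNP at its mean (2 * freq) and scale by the binomial SD.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    # SVD of the standardized matrix: columns of u are sample-level components.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Plotting the first column of the result against the second, colored by each sample's area of origin, would give the kind of two-dimensional diagram described above.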
Results: There were significant differences in genetic structure among areas of China. The distribution of the principal components of population genetic structure was broadly consistent with the geographical distribution of the project areas: the first principal component corresponded to latitude and the second to longitude. The simulated data showed a high false positive rate (λ=1.16). Even when the principal components of genetic structure were adjusted for, or area-specific subgroup analysis was performed, λ could not be effectively controlled (λ>1.05). In contrast, with a mixed linear model adjusting for the kinship matrix, λ was effectively controlled (λ=0.99) regardless of whether the principal components were further adjusted for.
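For readers unfamiliar with λ, the sketch below shows how a genomic inflation factor is conventionally computed from association p-values: each p-value is converted to a 1-degree-of-freedom chi-square statistic, and the observed median is divided by the median expected under the null (about 0.455). This is the standard definition, assumed here rather than taken from the paper's methods. The function name `inflation_factor` is hypothetical.

```python
from statistics import NormalDist, median


def inflation_factor(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1].

    A value near 1 suggests well-calibrated test statistics; values clearly
    above 1 suggest inflation, for example from uncontrolled population
    structure or relatedness.
    """
    nd = NormalDist()
    # A two-sided p-value p corresponds to z = inv_cdf(p / 2), and z**2 is the
    # matching chi-square statistic with 1 degree of freedom.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of the chi-square(1) distribution: the square of the 75th
    # percentile of the standard normal distribution.
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median
```

Applied to the p-values from each analysis strategy, a function of this kind yields the quantity being compared above (for example 1.16 versus 0.99).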
Conclusions: Genetic structure differs substantially among populations in different areas of China. In molecular epidemiology studies, bias caused by population genetic structure needs to be handled carefully. For large cohort data with complex genetic structure and extensive kinship, a mixed linear model is needed for association analysis.
Key words:
Molecular epidemiology; Population genetic structure; Area differences; Linear mixed model
Contributor Information
Zhu Meng
Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Lyu Jun
Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
Key Laboratory of Molecular Cardiovascular Sciences, Ministry of Education, Peking University, Beijing 100191, China
Yu Canqing
Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
Jin Guangfu
Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Guo Yu
Chinese Academy of Medical Sciences, Beijing 100730, China
Bian Zheng
Chinese Academy of Medical Sciences, Beijing 100730, China
Robin Walters
Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Iona Millwood
Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Chen Zhengming
Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Shen Hongbing
Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Hu Zhibin
Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Li Liming
Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
for the China Kadoorie Biobank Collaborative Group