Evaluating the Efficacy of Type 2 Diabetes Polygenic Risk Scores in an Independent European Population

Numerous type 2 diabetes (T2D) polygenic risk scores (PGSs) have been developed to predict individuals' predisposition to the disease. Independent assessment and verification of the best-performing PGSs are warranted to allow rapid application of the developed models. To date, only 3% of T2D PGSs have been independently evaluated. In this study, we assessed all (n = 102) currently published T2D PGSs in an independent cohort of 3718 individuals that has not been used in the construction or fine-tuning of any T2D PGS so far. We further selected the best-performing PGS, assessed its performance across major population principal component analysis (PCA) clusters, and compared it with a newly developed population-specific T2D PGS. Our findings revealed that 88% of the published PGSs were significantly associated with T2D; however, their performance was lower than previously reported. We found a positive trend of PGS improvement over the years (p-value = 8.01 × 10⁻⁴), with PGS002771 currently showing the best discriminatory power (area under the receiver operating characteristic (AUROC) = 0.669) and PGS003443 exhibiting the strongest association (odds ratio (OR) = 1.899). Further investigation revealed no difference in PGS performance across major population PCA clusters, nor between PGS002771 and the newly developed population-specific PGS. Our findings reveal a positive trend in T2D PGS performance, with published scores consistently identifying high-T2D-risk individuals in an independent European population.


Introduction
Large-scale multi-cohort genome-wide association study (GWAS) meta-analyses have enabled the construction of polygenic risk scores (PGSs) with increasingly higher performance in predicting individuals' genetic liability to a phenotype [1]. Many PGS models have accumulated in the PGS Catalog repository, allowing rapid application of new genetic risk scores [1]. However, highly variable model validation approaches, especially with respect to the ancestry of validation samples, do not allow a reliable direct comparison of model efficacy [1]. Furthermore, assessing the applicability of a PGS in a heterogeneous population may require further evaluation in groups stratified by principal component analysis (PCA) [2]. Finally, if published PGS models are not sufficient for the target application, fine-tuning a new population-specific PGS using established GWAS meta-analyses can further improve disease risk prediction [3,4].
As of now, the PGS Catalog has compiled 102 type 2 diabetes (T2D) PGSs from 26 studies. Larger GWASs tend to produce more robust and accurate effect weights for disease-associated variants; thus, most developed models employ increasingly large multi-cohort meta-analyses produced by dedicated consortia such as DIAGRAM [5] and DIAMANTE [6]. Both these meta-analyses and independent GWASs used for weight estimation often involve national biobanks, most notably the UK Biobank (UKB) [7], a large-scale biomedical database of genetic and health information.
For variable selection from summary statistics and posterior weight estimation, a study-specific, genotype-level target dataset is often employed. Fine-tuning over T2D cases and controls from the study population then allows better-performing models to be found, notably for genetically distant target populations that were not included in the original GWAS weight estimation [8]. Recent advances in PGS development methods, however, show that automatically estimated weights can produce results similar to fine-tuning [9,10].
The choice of PGS construction method also has a crucial effect on model applicability. Current approaches can be distinguished by their prior assumption about the distribution of variant effect weights. The most common and simplest approach, linkage disequilibrium (LD) pruning + p-value thresholding (P + T), assumes zero effect for variants with a p-value above a set threshold and applies no shrinkage to the remaining weights. In contrast, more sophisticated Bayesian methods apply no hard threshold for variant exclusion and adjust variant weights based on a chosen prior and local LD patterns. The LDpred method [10] assumes a point-normal prior distribution, setting most non-causal effect weights to zero and iterating over the proportion of causal variants. On the other hand, the more recently developed PRS-CS method [9] assumes a continuous shrinkage prior and iterates over differing contributions of the tail-end effect weights. Studies have shown T2D to be highly polygenic, with many variants contributing to its etiology. In this context, PRS-CS has been shown to perform better than other methods [9].
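As an illustration of the P + T logic described above, the following minimal Python sketch keeps genome-wide significant variants after greedy LD clumping and sums effect-allele dosages weighted by the unshrunk GWAS effect sizes. The data layout, the helper names (clump_and_threshold, polygenic_score), the precomputed pairwise r² lookup, and the default thresholds are illustrative assumptions; in practice, clumping is performed genome-wide with dedicated tools such as PLINK.

```python
# Illustrative sketch only: names, thresholds and toy data are hypothetical.
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical summary-statistic record: (variant_id, effect_weight, p_value)
SumStat = Tuple[str, float, float]


def clump_and_threshold(
    sumstats: List[SumStat],
    ld_r2: Dict[FrozenSet[str], float],
    r2_max: float = 0.1,
    p_max: float = 5e-8,
) -> Dict[str, float]:
    """Greedy LD clumping followed by p-value thresholding (P + T).

    Variants are visited from most to least significant. A variant is kept
    only if it passes the p-value threshold and is not in LD (r^2 > r2_max)
    with an already kept index variant. Every other variant implicitly gets
    a weight of zero, and kept weights are used as-is (no shrinkage).
    """
    kept: Dict[str, float] = {}
    for vid, beta, p in sorted(sumstats, key=lambda s: s[2]):
        if p > p_max:
            break  # all remaining variants are less significant
        in_ld = any(ld_r2.get(frozenset((vid, k)), 0.0) > r2_max for k in kept)
        if not in_ld:
            kept[vid] = beta
    return kept


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0-2); missing variants add 0 here."""
    return sum(w * dosages.get(vid, 0.0) for vid, w in weights.items())


# Toy example: rs2 is in strong LD with rs1, rs3 fails the p-value threshold
sumstats = [("rs1", 0.20, 1e-12), ("rs2", 0.15, 1e-10), ("rs3", 0.05, 1e-3)]
ld = {frozenset(("rs1", "rs2")): 0.8}
weights = clump_and_threshold(sumstats, ld)
score = polygenic_score({"rs1": 2.0, "rs2": 1.0, "rs3": 1.0}, weights)
print(weights, score)
```

Bayesian methods such as LDpred and PRS-CS replace the hard p-value cut-off in this sketch with posterior shrinkage of every weight, so variants that P + T discards can still contribute small effects.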
Validation in an independent population, both across and within major continental population clusters (for example, between countries with different population histories), benefits the application of differing PGS construction methods [4,8]. Currently, only 3% of the T2D PGSs in the PGS Catalog have been independently assessed with evaluation metrics deposited in the Catalog [1]. Moreover, no T2D PGS has been evaluated for T2D outside its target population; the two studies that assessed T2D PGSs in separate populations did so for a phenotype other than T2D [11,12]. The Genome Database of Latvian Population (LGDB) holds one of the largest genotyped T2D cohorts in Europe relative to population size [13]. Furthermore, the currently published T2D PGSs were developed using 31 populations and 75 cohorts [1], none of which included samples from Latvia or the LGDB. This presents a unique opportunity to conduct an independent assessment of T2D PGSs within Europe, which could contribute to understanding the applicability and accuracy of these scores across diverse European populations.
In this study, we evaluated all T2D PGSs currently available in the PGS Catalog (end date for data retrieval: 19 September 2023) using the reporting framework recommended for PGS evaluation [14]. We compared their performance in an independent European cohort not previously included in the weight calculation of base summary statistics or in PGS evaluation. We then chose the best-performing PGS and assessed its performance in major population clusters within the LGDB cohort. Finally, based on our findings, we fine-tuned effect weights from the Mahajan et al. 2018 study using data from the population of Latvia by applying the PRS-CS method. We demonstrate that 88% of published PGSs are significantly associated with T2D and that the best-performing model, PGS002771, performs equally well within major population clusters and when compared with the population-specific fine-tuned PGS. We recommend further evaluation of PGS applicability for inclusion in individual T2D risk assessments.

Cohort Characteristics and Genotype-Based Quality Control
The dataset selected from the Genome Database of Latvian Population (LGDB) for our study initially comprised 3990 genotyped samples and 28,200,578 imputed variants, each with an imputation quality score (Rsq) above 0.3. During quality control, 42 individuals were removed because of excess heterozygosity, 24 because of second-degree relatedness, and 206 as principal component analysis (PCA) outliers. At the variant level, we excluded 1,969,016 variants with more than two alleles, 18,438,822 rare polymorphisms (minor allele frequency (MAF) < 1%), and 393 heterozygosity outliers. After these exclusions, the main analysis dataset comprised 7,792,347 variants and 3718 individuals (Table S1), of whom 1496 (40.2%) were diagnosed with type 2 diabetes (T2D). The main demographic characteristics of the participants are shown in Table S1. Most participants were female (62.5%), the mean age was 51.5 years (standard deviation (SD) = 14.9), and age differed significantly between the T2D and control groups (p-value < 0.001). The prevailing self-reported ethnic background was Latvian (59.4% of the cohort), and the cohort clustered among other European populations. All individuals projected within the European (EUR) genetic ancestry PCA space, with the 1000 Genomes Project (1000G) dataset as the reference (Figure S1, Table S1). To construct and validate the polygenic risk score (PGS), we randomly divided the cohort into two groups. The first group, comprising 70% of the samples (2603 individuals), was used to construct the PGS model. The remaining 30% (1115 individuals) formed the validation set, maintaining the same proportion of cases to controls as in the original cohort.
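The case-control-preserving 70/30 split described above can be sketched as a stratified random split. The seed, the function name stratified_split, and rounding within each class are illustrative assumptions, so the resulting set sizes may differ by one or two samples from those reported.

```python
# Illustrative sketch only: the seed, names and toy cohort are hypothetical.
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], train_frac: float = 0.7, seed: int = 2023
) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into training and validation sets.

    Cases (label 1) and controls (label 0) are shuffled and split separately,
    so both sets keep approximately the cohort's case:control ratio.
    """
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return sorted(train), sorted(valid)


# Toy cohort with 40% cases, close to the 40.2% T2D proportion in the cohort
labels = [1] * 40 + [0] * 60
train, valid = stratified_split(labels)
print(len(train), len(valid), sum(labels[i] for i in valid) / len(valid))
```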

Evaluation of Published T2D PGS Models
In total, 102 T2D polygenic scores were evaluated, and 88% showed a significant association with T2D status (Bonferroni-adjusted significance threshold = 4.9 × 10⁻⁴). The performance of the evaluated models is compared in Figure 1 and Tables S2-S4, and the characteristics of the 10 best-performing PGSs are summarized in Table 1. PGS002771 reached the highest area under the receiver operating characteristic (AUROC) value of 0.669 (95% confidence interval (CI) = 0.651-0.686) and explained 10.8% of T2D variance (Nagelkerke's R²). The classification performance of PGS002771 improved notably, reaching an AUROC of 0.757 (95% CI = 0.74-0.773), when conventional T2D risk factors such as body mass index (BMI), sex, and age were accounted for. This covariate-adjusted PGS002771 model also performed significantly better than the model incorporating only the conventional T2D risk factors (delta AUROC = 0.046). PGS003443 showed the highest odds ratio (OR) per SD increase (OR = 1.89, 95% CI = 1.76-2.05). We also compared the proportion of overlapping variants among the 10 PGSs with the highest AUROC values (Table S5). The overlap for PGS002771, the model with the highest AUROC, was notably low, ranging from 0.01% (with PGS002720) to 0.63% (with PGS003103 and PGS003118). In contrast, other models exhibited higher overlap: PGS003443, the model with the highest OR per SD, overlapped most with PGS003103 (85.09%) and least with PGS000729 (8.15%), excluding PGS002771 (0.63% overlap). The AUROC values measured in our study were lower than those reported in the source studies (mean measured AUROC = 0.597, SD = 0.033; mean reported AUROC = 0.693, SD = 0.068; p-value = 1.18 × 10⁻⁶), although the two measures were significantly correlated (r = 0.553, p-value = 7.94 × 10⁻⁶). When reported AUROC values above 0.65 were excluded, the correlation increased to r = 0.895 (Figure 1). There was no significant association between the year of PGS development and measured AUROC (p-value = 0.103); however, incremental yearly improvement in PGS performance became evident when the ten highest-AUROC polygenic scores for each year were selected (p-value = 8.01 × 10⁻⁴) (Figure 1).
PGS: polygenic risk score; GWAS: genome-wide association study; AUROC: area under the receiver operating characteristic; OR: odds ratio; SD: standard deviation; CI: confidence interval. * See Table S8 for a description of the methods applied to develop the evaluated PGSs.
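The "top ten per year" trend analysis can be sketched as follows: keep the k highest measured AUROC values for each development year and fit an ordinary least-squares line of AUROC on year. The exact regression model behind the reported p-value is not stated in this section; using OLS is an assumption consistent with the linear trendlines (method = 'lm') used for Figure 1, and the p-value computation is omitted. Function names and toy values are hypothetical.

```python
# Illustrative sketch only: OLS is an assumed model, and the data are hypothetical.
from collections import defaultdict
from typing import Dict, List, Tuple


def top_k_per_year(records: List[Tuple[int, float]], k: int = 10) -> List[Tuple[int, float]]:
    """Keep the k highest AUROC values for each development year."""
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, auroc in records:
        by_year[year].append(auroc)
    selected: List[Tuple[int, float]] = []
    for year in sorted(by_year):
        best = sorted(by_year[year], reverse=True)[:k]
        selected.extend((year, value) for value in best)
    return selected


def ols_slope(points: List[Tuple[int, float]]) -> float:
    """Least-squares slope of AUROC on year (AUROC change per year)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical (year, measured AUROC) pairs; k = 1 keeps one score per year
records = [(2018, 0.60), (2018, 0.55), (2020, 0.62), (2020, 0.50), (2022, 0.66)]
top = top_k_per_year(records, k=1)
slope = ols_slope(top)
print(top, round(slope, 4))
```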
Non-genetic predictors alone showed a relatively high classification ability (AUROC = 0.711, 95% CI = 0.693-0.728); however, their performance was significantly lower when evaluated within separate age groups (comparison of the AUROC of the non-genetic covariate model in the whole LGDB cohort with that of the same model in the second age tertile alone: D = 6.018, df = 1846, p-value = 2.1 × 10⁻⁹), with the lowest AUROC values in the first (AUROC = 0.538, 95% CI = 0.497-0.578) and third age tertiles (AUROC = 0.558, 95% CI = 0.522-0.593) (Table S4). The inclusion of PGS significantly improved T2D classification in all age groups (comparison of the AUROC of the non-genetic covariate model with that of PGS002771 adjusted for non-genetic covariates: Z = −3.3217, p-value = 8.94 × 10⁻⁴), with the largest gain in the third age tertile (delta AUROC between the non-genetic covariate model and PGS002771 adjusted for non-genetic covariates = 0.146, both calculated for the third age tertile) (Table S4).

PGS Performance in Ancestry Clusters
Using the best-performing PGS002771, we further investigated PGS applicability across the major ancestries characteristic of the population of Latvia. We first defined the major population groups using the hclust hierarchical clustering algorithm. With self-assigned ethnicity (SAE) as the ground truth, the eight-cluster model achieved the highest adjusted Rand index (ARI = 0.413) (Figure S3; Table S6), with 89.9% of the population assigned to cluster 3 (N = 1934) and cluster 2 (N = 1411) (Table 2; Figure 2). Cluster 3 was characterized by a majority of Latvian SAE (90.0%), whereas only 23.6% of cluster 2 had Latvian SAE. The distribution of SAEs among these clusters is illustrated in Figure 2. The clusters differed significantly in the proportion of T2D cases (X-squared = 111.43, df = 1, p-value < 2.2 × 10⁻¹⁶) as well as in age (W = 1.04 × 10⁶, p-value = 1.88 × 10⁻⁹) and BMI (W = 1.05 × 10⁶, p-value = 9.08 × 10⁻⁸) distributions (Table 2). However, despite the divergence of non-genetic factors between the two clusters, there was no significant difference in PGS002771 classification performance (comparison of the AUROC in cluster 3 with that in cluster 2: D = −1.609, df = 3095.7, p-value = 0.108) (Figure 2).
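The X-squared statistic above comes from a 2 × 2 comparison of case and control counts between the two clusters (df = 1). A minimal Python sketch of that test is shown below, assuming the default Yates continuity correction that R's chisq.test applies to 2 × 2 tables; the per-cluster counts are not reported in this section, so the table in the example is hypothetical.

```python
# Illustrative sketch only: assumes Yates' correction; the counts are hypothetical.
import math
from typing import Sequence, Tuple


def chisq_2x2_yates(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-squared test (df = 1) for a 2x2 table with Yates' correction.

    Returns (statistic, p-value). As in R's chisq.test, the correction is
    min(0.5, |O - E|); in a 2x2 table |O - E| is the same for every cell.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            stat += (diff - min(0.5, diff)) ** 2 / expected
    # Upper tail of a chi-squared distribution with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical rows: clusters; columns: T2D cases, controls
stat, p = chisq_2x2_yates([[10, 20], [30, 40]])
print(round(stat, 4), round(p, 3))
```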

Discussion
Before risk prediction models are applied, it is important to evaluate and calibrate the magnitude of association of common genetic variants in studies that are independent of the original genome-wide association studies (GWAS). In this study, we performed a comprehensive, independent assessment of all 102 currently published type 2 diabetes (T2D) polygenic risk scores (PGSs) using genotype and phenotype data of the Latvian population from the Genome Database of Latvian Population (LGDB). Our sample consisted of 3718 genotyped individuals, representing a broad cross-section of the Latvian population. We found that 88% of these PGSs were significantly associated with T2D in our population, indicating that the developed PGSs perform robustly across different populations, at least within Europe. However, the measured performance of the tested PGSs was generally lower than reported in their respective source studies. This difference might be explained by differences in the study populations' genetic backgrounds and in the covariates included, as well as by the limited size of our sample, supporting the need for a standardized approach to PGS reporting and assessment [4,14]. Notably, our analysis highlighted PGS002771, which outperformed 98% of the other scores. It achieved an area under the receiver operating characteristic (AUROC) of 0.669, was equally effective in major population clusters, and was comparable with the newly developed, fine-tuned, population-specific PGSs. Importantly, for most models, we observed a strong correlation between the source-reported AUROC values and those obtained in our study (Figure 1C). While some studies have reported unusually high AUROC values for T2D, we were not able to replicate them, most probably indicating inflated source values. We thus suggest that such a comparison may be used to exclude models that do not follow the correlation trend for a particular population.
Remarkably, our findings reveal a consistent improvement in T2D PGSs over time, as illustrated in Figure 1D. Because more PGSs have been developed over the last few years, there is also greater variability in reported AUROC values. However, when only the top ten scores of each year are considered, the association with development year becomes highly significant. This positive trajectory in PGS performance coincides with methodological developments: the first Bayesian regression-based T2D model was published in 2018 [24] and the first PGS with continuous shrinkage priors in 2022 [23]. It is therefore important to continuously evaluate new PGS models and to regularly update the methodology when implementing these approaches in practice.
Most of the evaluated scores were developed using European ancestry GWAS summary statistics, derived predominantly from large-scale meta-analyses. Notably, DIAGRAM [5] and DIAMANTE [6] served as full or partial sources of variant effect weights in nine of the top ten models. Unlike other base GWASs, the Mahajan et al. 2018 study, which was used in three of the best-performing PGSs, incorporated pancreatic islet-specific regulatory information into its weight estimates. This may explain the enhanced discriminatory power these models exhibited for the T2D phenotype. Conversely, the models based on Mahajan et al. 2018 that underperformed [11,28,29] typically employed non-Bayesian methods, such as pruning and p-value thresholding (P + T) or selection of genome-wide significant variants, approaches previously shown to substantially impede PGS performance [4].
Furthermore, all of the top five models employed Bayesian regression with continuous shrinkage priors, implemented in PRS-CS and its multi-ancestry extension, PRS-CSx [9]. Four of these five models used the PRS-CS 'auto' option. The exception was the model described by Huerta-Chagoya et al., 2023, which instead conducted a grid search over the global shrinkage parameter phi [9]. Although the top three models relied on identical summary statistics and PGS development methods, PGS002771 stood out in our study. This model was constructed using a target set from the geographically close Finnish population [15], enabling a more refined selection of region-specific variants and most likely enhancing its applicability and relevance for the Latvian population.
An essential consideration in the effective application of PGSs is potential ethnic heterogeneity within the population. Latvia's demographic structure changed substantially owing to the massive wave of immigration between 1945 and 1989. To assess how population subgroup-specific variants influence PGS performance, we analyzed its applicability across major population groups identified with a hierarchical clustering algorithm. Clustering based on principal component analysis (PCA) loadings revealed two primary ancestry clusters in Latvia, distinguished mainly by the predominance of Latvian or Slavic-speaking self-assigned ethnicity (SAE). Despite notable differences in non-genetic factors between these clusters and the known susceptibility of PGSs to ancestral or local regional stratification [8,38], our findings indicate that the PGS performed consistently well across both groups. This agrees with results from other studies in highly heterogeneous populations [2]. However, our use of PCA loadings to adjust for population stratification may have reduced the apparent differences between these groups [39]. Nonetheless, this finding supports the broader applicability of PGSs, particularly for T2D, across the entire Latvian population.
Building on previous research suggesting potential benefits of population-specific, fine-tuned PGSs [3], we adopted the practices of the three models with the highest T2D discriminatory power: PGS002771, PGS003443, and PGS002308. All these models used the same source of GWAS association weights [6], a construction method incorporating a continuous shrinkage prior via PRS-CS, and, except for PGS003443, an automatically estimated global shrinkage parameter phi. Our exploration of a broader range of phi values revealed that phi = 1 × 10⁻³ yielded the best performance, suggesting a less polygenic T2D etiology than indicated by previous findings such as those of Huerta-Chagoya et al., 2023 (PGS003443) [17], where phi = 1 × 10⁻² was identified as the most effective. Although our population-specific PGS did not significantly improve performance metrics, integrating a local linkage disequilibrium (LD) reference and population-specific GWAS weights could potentially improve results [3,4].
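To make the role of phi concrete, the continuous shrinkage prior of PRS-CS [9] can be summarized as a global-local scale mixture of normals. The form below is our summary of the published prior, written with phi factored out as an explicit global scale; β_j is the effect of variant j, N the GWAS sample size, σ² the residual variance, ψ_j a variant-specific local scale, δ_j an auxiliary variable, and G(shape, rate) a gamma distribution. The PRS-CS authors' default hyperparameters are a = 1 and b = 1/2.

```latex
\beta_j \mid \psi_j \sim \mathcal{N}\!\left(0,\ \frac{\sigma^2}{N}\,\phi\,\psi_j\right),
\qquad
\psi_j \mid \delta_j \sim \mathcal{G}(a,\ \delta_j),
\qquad
\delta_j \sim \mathcal{G}(b,\ 1)
```

Because phi multiplies the prior variance of every effect, a smaller value such as phi = 1 × 10⁻³ shrinks all weights more strongly toward zero and leaves the signal to fewer, larger effects, which is why the selected value is interpreted here as indicating a less polygenic architecture than phi = 1 × 10⁻².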
Our study had some limitations. Demographic and anthropometric characteristics were missing for some study subjects (2.2-3.3%); nevertheless, we decided not to exclude these samples in order to retain statistical power. In addition, median age differed between cases and controls, which may have affected PGS performance.
Although most PGSs were strongly associated with T2D status, their ability to discriminate high-T2D-risk individuals does not currently reach clinical utility [40] and might even exacerbate health disparities [41]. Further improvements, however, might be achieved by incorporating T2D pathway-specific information into weight estimation [6,42] and by using weight priors tailored specifically to T2D etiology [9,10].

Cohort Description and Data Selection
This study included 1630 patients with type 2 diabetes (T2D) and 2360 control subjects, all selected from the Genome Database of Latvian Population (LGDB) [13]. The T2D group comprised participants with a clinical diagnosis of T2D, as indicated by International Classification of Diseases-10 (ICD-10) code E11, and available genotype data. For the control group, we applied two exclusion criteria: (1) we excluded any LGDB participant with a clinically reported or self-reported diagnosis of diabetes (ICD-10 codes E10-E14) and (2) those with a history of antidiabetic treatment. In addition to genotype data, we collected information on anthropometric measures and self-reported ethnicity at the time of each participant's enrolment in the LGDB. Written broad consent was obtained from every subject during recruitment to the LGDB (approval by the Central Medical Ethics Committee No. 01-29.1.2/6407). This study was conducted in accordance with the Declaration of Helsinki. The study protocol was approved by the Central Medical Ethics Committee of Latvia (Approval No. 01-29.1/2223).

Genotype Quality Control
The selected genotypes comprised six batches genotyped with the Infinium Global Screening Array (Illumina, California, USA) on the iScan System microarray scanner (Illumina, USA) between 2016 and 2022, with 192 to 768 samples per batch. Each batch underwent quality control and harmonization, and the merged set contained 3990 individuals (1630 cases and 2360 controls) and 115,362 autosomal variants with genotype missingness < 0.01. LGDB samples were subsequently imputed with the GRCh38 TOPMed R2 1.0.0 [43] imputation panel using the Michigan Imputation Server, and variants with an imputation quality score Rsq > 0.3 were retained.
For post-imputation quality control, we used the parameters recommended for polygenic risk score (PGS) analysis [44,45]. Variants were retained with genotyping rate > 0.99, heterozygosity P > 1 × 10⁻⁶, and minor allele frequency (MAF) > 1%. We did not remove variants with an allele count below 100 in the control and case groups, as an external linkage disequilibrium (LD) reference of 1000G European ancestry individuals was used in this study [44]. Individuals were excluded if sample missingness exceeded 0.05, if their heterozygosity (PLINK --het) fell outside three standard deviations (SD) of the mean, or if pi_hat exceeded the 0.1875 threshold set for second-degree relatives. Filtering was performed with PLINK [46] and Bcftools [47].
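The sample-level exclusion rules above can be expressed as a short Python sketch. It assumes that per-sample missingness, a per-sample heterozygosity measure (for example, the F coefficient reported by PLINK --het), and pairwise pi_hat values have already been computed; which member of a related pair is dropped, the function name sample_qc, and the toy values are illustrative assumptions rather than the study's exact procedure.

```python
# Illustrative sketch only: inputs are assumed precomputed; names and data are hypothetical.
from statistics import mean, stdev
from typing import Dict, Set, Tuple


def sample_qc(
    missingness: Dict[str, float],
    het: Dict[str, float],
    pi_hat: Dict[Tuple[str, str], float],
    max_missing: float = 0.05,
    het_sd: float = 3.0,
    max_pi_hat: float = 0.1875,
) -> Set[str]:
    """Return the IDs of samples that fail sample-level QC."""
    # 1) Call-rate filter: sample missingness above 5%
    excluded = {s for s, m in missingness.items() if m > max_missing}
    # 2) Heterozygosity outliers: more than 3 SD from the cohort mean
    mu, sd = mean(het.values()), stdev(het.values())
    excluded |= {s for s, f in het.items() if abs(f - mu) > het_sd * sd}
    # 3) Relatedness: for each pair above the second-degree threshold whose
    #    members are both still retained, drop the second one (arbitrary choice)
    for (s1, s2), value in pi_hat.items():
        if value > max_pi_hat and s1 not in excluded and s2 not in excluded:
            excluded.add(s2)
    return excluded


ids = [f"s{i}" for i in range(20)]
missingness = {s: (0.20 if s == "s3" else 0.01) for s in ids}
het = {s: (0.5 if s == "s5" else 0.0) for s in ids}
pi_hat = {("s1", "s2"): 0.5, ("s7", "s8"): 0.05}
print(sorted(sample_qc(missingness, het, pi_hat)))
```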

Polygenic Risk Score Calculation
To calculate the T2D PGSs available in the PGS Catalog, genotype dosages were supplied to the PGScatalog/pgsc_calc v2.0.0-alpha.3 Nextflow pipeline [1]. To correct for population stratification, that is, differences in PGS mean and variance among genetic ancestry groups, and to identify population structure from sample principal component analysis (PCA) loadings, the --run_ancestry parameter was set. This approach adjusts each PGS using the first 10 PCA loadings and normalizes the score based on the standard deviation in the merged 1000G [48] and Human Genome Diversity Project (HGDP) reference population [49].
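The ancestry adjustment above is performed internally by the pipeline; the Python sketch below only illustrates the general idea under simplifying assumptions: regress the score on principal components in a reference panel, subtract the PC-predicted mean from each score, and divide by the SD of the reference residuals. It uses two PCs instead of 10 for brevity, and all names and toy data are hypothetical; it is not the pipeline's implementation.

```python
# Illustrative sketch only: not the pgsc_calc code; names and data are hypothetical.
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_pc_model(
    ref_scores: Sequence[float], ref_pcs: Sequence[Sequence[float]]
) -> Tuple[List[float], float]:
    """OLS of reference scores on an intercept plus PCs; returns (coefficients, residual SD)."""
    design = [[1.0] + list(pcs) for pcs in ref_pcs]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ref_scores)) for i in range(k)]
    coef = _solve(xtx, xty)
    resid = [y - sum(c * x for c, x in zip(coef, row)) for row, y in zip(design, ref_scores)]
    return coef, pstdev(resid)


def adjust_score(score: float, pcs: Sequence[float], coef: List[float], resid_sd: float) -> float:
    """Remove the PC-predicted mean and scale by the reference residual SD."""
    predicted = coef[0] + sum(c * p for c, p in zip(coef[1:], pcs))
    return (score - predicted) / resid_sd


# Hypothetical reference panel: score = 1 + 2*PC1 - PC2 + small deterministic noise
ref_pcs = [(float(x), float(y)) for x in range(-2, 3) for y in range(-1, 2)]
noise = [((i * 7) % 5 - 2) * 0.1 for i in range(len(ref_pcs))]
ref_scores = [1 + 2 * x - y + e for (x, y), e in zip(ref_pcs, noise)]
coef, resid_sd = fit_pc_model(ref_scores, ref_pcs)
adjusted = [adjust_score(s, p, coef, resid_sd) for s, p in zip(ref_scores, ref_pcs)]
print([round(c, 2) for c in coef], round(mean(adjusted), 6), round(pstdev(adjusted), 6))
```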

PGS Evaluation
To evaluate the performance of the 102 available T2D PGSs, standard quality metrics were calculated for each score using R (v4.2.1, Vienna, Austria) [50]. The area under the receiver operating characteristic curve (AUC) was assessed with the pROC v1.18.5 package [51] by comparing PGS values with the LGDB-defined T2D phenotype. The R package psych v2.3.9 [52] was used to calculate correlation coefficients, and epiDisplay v3.5.0.2 [53] was used to extract the odds ratio (OR) per SD increase from the generalized linear model (glm) results. These metrics were subsequently compared with the corresponding values reported by each PGS source study. As no AUC value was reported for PGS002771, we predicted its AUC from the reported OR, given the 0.962 correlation between AUC and OR values across the other included PGSs. For the evaluation stratified by age, the cohort was divided into tertiles (first tertile: 5-47 years; second: 47-59 years; third: 59-94 years).
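For readers outside R, the two core metrics can be sketched in Python: AUROC as the probability that a randomly chosen case has a higher score than a randomly chosen control (ties counted as one half), which equals the empirical trapezoidal AUC, and the OR per SD increase as the exponentiated slope of a logistic regression on the standardized score, fitted here by Newton-Raphson. Omitting covariate adjustment and confidence intervals, and the function names, are simplifying assumptions of this sketch.

```python
# Illustrative sketch only: univariable, no CIs; names and data are hypothetical.
import math
from statistics import mean, stdev
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of AUROC: P(case score > control score), ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


def or_per_sd(scores: Sequence[float], labels: Sequence[int], iterations: int = 25) -> float:
    """Odds ratio per 1-SD increase from a univariable logistic regression.

    The score is standardized with the sample SD, and the intercept and slope
    are fitted by a fixed number of Newton-Raphson steps; the OR is exp(slope).
    """
    mu, sd = mean(scores), stdev(scores)
    z = [(s - mu) / sd for s in scores]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(z, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p  # gradient of the log-likelihood, X'(y - p)
            g1 += (y - p) * x
            h00 += w  # information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: (X'WX)^-1 X'(y - p)
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)


# Hypothetical scores and T2D status (1 = case, 0 = control)
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5, 0.6, 0.3]
labels = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1]
auc = auroc(scores, labels)
odds_ratio = or_per_sd(scores, labels)
print(round(auc, 3), round(odds_ratio, 3))
```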
Additionally, the performance of the best-performing PGS was evaluated separately in major PCA clusters. Clusters were formed from the Euclidean distances between samples on the first 10 principal components using hierarchical clustering as implemented in the R function hclust [50]. The optimal number of clusters was selected manually based on clustering accuracy, measured by the adjusted Rand index (ARI) as implemented in the R package mclust v6.0.1 [54]. Participants' self-assigned ethnicity was used as the ground truth labels.
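The adjusted Rand index used to choose the number of clusters compares a clustering with reference labels by counting sample pairs, corrected for chance agreement (the Hubert-Arabie formulation). A self-contained Python sketch follows; the function name and toy labels are illustrative, and in the study the index was computed with mclust.

```python
# Illustrative sketch only: names and toy labels are hypothetical.
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Hubert-Arabie adjusted Rand index between two partitions of the same samples.

    1.0 means identical partitions (up to relabeling); values near 0 mean
    chance-level agreement. Undefined when both partitions are trivial
    (e.g. a single cluster), where the denominator is zero.
    """
    n = len(labels_true)
    sum_pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_pairs_true * sum_pairs_pred / comb(n, 2)
    max_index = (sum_pairs_true + sum_pairs_pred) / 2
    return (sum_pairs_joint - expected) / (max_index - expected)


# Hypothetical self-assigned ethnicity labels versus cluster numbers
sae = ["Latvian", "Latvian", "Latvian", "Russian", "Russian", "Russian"]
clusters = [3, 3, 2, 2, 1, 1]
print(round(adjusted_rand_index(sae, clusters), 4))
```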
We used multiple methods to assess the relationships of genetic and non-genetic factors with T2D status. For non-parametric comparisons, we used the two-sample Wilcoxon test, and differences between receiver operating characteristic (ROC) curves were assessed using the roc.test function [51]. Association with T2D status was tested using a generalized linear model for binomial outcomes as implemented in the glm function, and Nagelkerke's R², a pseudo-R² statistic, was calculated as implemented in the fmsb v0.7.5 package [55]. P-values were adjusted according to the Benjamini-Hochberg (BH) procedure by applying the p.adjust function to the vector of glm-derived p-values in R [50]. The family-wise error rate was controlled using the Bonferroni method, dividing 0.05 by the number of evaluated PGSs (n = 102). Trendlines for Figure 1C,D were produced with the R function ggplot2::geom_smooth (method = 'lm') [50].
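The multiple-testing corrections and the pseudo-R² described above can be sketched in Python. The Benjamini-Hochberg step follows the step-up logic of R's p.adjust(method = "BH"), and Nagelkerke's R² rescales the Cox-Snell R² by its maximum attainable value, computed here from the log-likelihoods of the null and fitted models; the function names and toy inputs are illustrative assumptions.

```python
# Illustrative sketch only: names and toy inputs are hypothetical.
import math
from typing import List

N_PGS = 102
BONFERRONI_ALPHA = 0.05 / N_PGS  # about 4.9e-4, the per-score significance threshold


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted p-values, returned in the original order of the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def nagelkerke_r2(loglik_null: float, loglik_model: float, n: int) -> float:
    """Nagelkerke's pseudo-R^2 from null and fitted model log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
ll_null = 100 * math.log(0.5)  # intercept-only model for 50 cases and 50 controls
print(round(nagelkerke_r2(ll_null, -60.0, 100), 4))
```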

Conclusions
In conclusion, our study provides the first comprehensive evaluation of T2D PGSs within the Latvian population, identifying the top-performing model, PGS002771, and affirming the general robustness of most assessed PGSs in this previously unstudied European cohort. Notably, our findings underscore a positive trajectory in the continuous improvement of T2D PGSs over time, emphasizing the need for ongoing research and updates to enhance their precision and utility in diverse populations and clinical settings. Applying the best practices from previously developed PGSs, we constructed a population-specific T2D PGS that matched the performance of PGS002771. Finally, our analysis revealed consistent and effective PGS performance across the major LGDB population ancestry clusters, supporting the broader applicability of T2D PGSs across the entire Latvian population.
Funding: This study was supported by the European Regional Development Fund under project No. 1.1.1.1/20/A/126, "An integrated population-based Latvian genome reference and its applicability to personal risk estimation for metabolic traits".

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Central Medical Ethics Committee of Latvia (Approval No. 01-29.1/2223).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Figure 2. Evaluation of PGS002771 in major population clusters of the LGDB cohort. (A) PCA of population clusters; (B) proportion of individuals in each cluster by self-assigned ethnicity; (C) T2D OR comparison across PGS002771 quartiles with the 1st quartile as the reference; (D) comparison of T2D classification ROC curves between the clusters. LGDB: Genome Database of Latvian Population, PCA: principal component analysis, T2D: type 2 diabetes, OR: odds ratio.


Figure 3. Comparison of the population-specific PGSlvbmc1e_3 and PGS002771. (A) Effect of the global shrinkage parameter phi on PGS AUROC; (B) OR comparison across PGS quartiles with the first quartile as the reference; (C) ROC curves comparing discriminatory power. PGS: polygenic risk score, AUROC: area under the receiver operating characteristic, OR: odds ratio, TPR: true positive rate, FPR: false positive rate.


Table 1. Development methods and performance metrics of the 10 polygenic risk scores with the highest type 2 diabetes discriminatory power in the cohort of the Latvian population.

Table 2. Cohort characteristics of the two major hclust clusters based on the first 10 PCs in the population of Latvia.