Spline Analysis of Biomarker Data Pooled from Multiple Matched/Nested Case–Control Studies

Simple Summary This paper presents methods to pool continuous biomarker measurements from multiple studies to estimate the dose–response curves that allow for the nonlinear association between biomarker values and disease risks in matched/nested case–control studies. The approach can be easily applied to pooling projects of cancer studies and user-friendly software for implementing the method can be found on the corresponding author’s website. Abstract Pooling biomarker data across multiple studies enables researchers to obtain precise estimates of the association between biomarker measurements and disease risks due to increased sample sizes. However, biomarker measurements often vary significantly across different assays and laboratories; therefore, calibration of the local laboratory measurements to a reference laboratory is necessary before pooling data. We propose two methods for estimating the dose–response curves that allow for a nonlinear association between the continuous biomarker measurements and log relative risk in pooling projects of matched/nested case–control studies. Our methods are based on full calibration and internalized calibration methods. The full calibration method uses calibrated biomarker measurements for all subjects, even for people with reference laboratory measurements, while the internalized calibration method uses the reference laboratory measurements when available and otherwise uses the calibrated biomarker measurements. We conducted simulation studies to compare these methods, as well as a naive method, where data are pooled without calibration. Our simulation and theoretical results suggest that, in estimating the dose–response curves for biomarker-disease relationships, the internalized and full calibration methods perform substantially better than the naive method, and the full calibration approach is the preferred method for calibrating biomarker measurements. We apply our methods in a pooling project of nested case–control studies to estimate the association of circulating Vitamin D levels with risk of colorectal cancer.


Introduction
Pooling biomarker data from multiple studies can result in more precise estimates of the associations between the biomarker levels and disease risks due to increased sample sizes and can also facilitate subgroup analysis [1][2][3]. Examples of pooling project examining biomarker-disease associations include the Circulating 25-Hydroxyvitamin D (25(OH)D) and Colorectal Cancer [4], the 25(OH)D and the Risk of Rarer Cancers [5], the Endogenous Hormones, Nutritional Biomarkers and Prostate Cancer Collaborative Group Studies [6,7] and the NCI Breast and Prostate Cancer Cohort Consortium [8].
A statistical challenge in the pooling analysis is potential between-study variation in the biomarker measurements stemmed from assay or laboratory differences. For instance, the measurements of serum 25(OH)D concentration can vary from 20-40% across different assays, laboratories and even seasons of a year [9][10][11]. Also, hormones such as estradiol and testosterone have high variation across assays and laboratories [1][2][3]. Sloan et al. [12] proposed methods for pooling biomarker data from matched/nested case-control studies based on the regression calibration approach [13,14], which is widely used for handling the covariate measurement error problems. To be specific, a calibration procedure is performed by first selecting a reference laboratory, to which a subset of biospecimens from each study are sent for re-assaying. A calibration model is then fitted for each study based on the study-specific local laboratory and the reference laboratory measurements among the subset of biospecimens selected for re-assaying. The models are used to impute the reference laboratory measurements for the remaining observations with only local laboratory measurements available. Due to their rarity, cases are not usually used for assay calibration. Instead, controls are usually chosen for re-assaying in a reference laboratory [15].
However, the existing methods assume that biomarker measurements have a linear relationship with the log relative risk (RR) of diseases [12]. The nonlinear dose-response relationship between biomarker measurement and disease risk is often observed in practice [16,17]. Restricting the biomarker-disease relationship to be linear may lead to biased estimates when the true association is nonlinear. In this paper, we proposed the method for pooling matched/nested case-control studies to allow for a nonlinear relationship between biomarker levels and log relative risk of diseases by using spline functions [12,18]. We first calibrate biomarker measurements across multiple studies. We then obtain estimates for the coefficients of the spline functions based on an approximate conditional likelihood. We also propose an analytic variance formula for the estimated parameters of interest, which takes account of the uncertainty due to estimating the calibration parameters.
In Section 2, we present the models and statistical methods. In Section 3, we evaluate the performance of our methods in simulation studies. In Section 4, we apply the methods to a pooling project investigating the relationship between circulating Vitamin D levels and colorectal cancer, and finally, we present a discussion in Section 5.

Notation and Model
The logistic regression model with spline functions can be written as: where s ∈ {1, . . . , S} index the studies in the pooling projects, where the first Q studies only use local laboratories for the measurement of biomarkers and thus require calibration; n sj and m sj denote the number of cases and controls, respectively, in the j-th stratum of the s-th study, and within each stratum, i ∈ {1, 2, . . . , m sj } denotes controls and i ∈ {m sj + 1, m sj + 2, . . . , m sj + n sj } denotes cases. Y sji denotes the binary disease status; X sji is the biomarker measurements from the reference laboratory, which is included in the model through a nonlinear function f (·), and Z sji is a p-dimensional vector of potential confounders. β 0sj is the study and stratum specific intercept, and β X and β Z are vectors of the corresponding regression coefficients. Without further specification, all vectors are column vectors in this paper. Note that, in nested case-control studies with density sampling [19], β T X [ f (X * ) − f (X * * )] represents the log relative risk (RR) comparing two distinct biomarker levels X * and X * * . We focus on methods for point and interval estimates of β X in this paper. To model the possible nonlinear relationship between the biomarker and disease risk, we propose to use the restricted cubic spline method [18]. Our approach works for other spline functions as well. The restricted cubic spline method has advantages of being parsimonious, while allowing for flexibility in characterizing nonlinear curves. Typically, the model fit is not heavily affected by the location of knots, but depends more on the number of knots selected. The recommended numbers of knots are 3, 4, 5, 6, and 7 [20].
Let W sji be the biomarker measurements from local laboratories. For each study using a local laboratory, a subset of samples were selected and sent to the reference laboratory for re-assaying to obtain reference biomarker measurements; we refer to this subset as the calibration subset. Therefore, for studies that use local laboratories for biomarker measurements, X sji are available only in the calibration subsets, and W sji are available for all individuals. Since the local laboratory measurements can vary systematically across different studies, using W sji instead of X sji in data analysis can lead to biased estimates of the biomarker-disease relationship.

Approximate Conditional Likelihood and Calibration Methods
Let vectors X sj and W sj and matrix Z sj contain the corresponding measurements of individuals from the j-th stratum of the s-th study. The conditional likelihood contribution from the j-th stratum of the s-th study using the reference laboratory measurements is: where A sj is the set of all subsets of indices of size n sj from {1, 2, . . . , m sj , m sj + 1, . . . , m sj + n sj }; {i 1 , i 2 , . . . , i n sj } corresponds to one specific such subset in A sj , and A sj is A sj with the subset {i 1 = m sj + 1, i 2 = m sj + 2, . . . , i n sj = m sj + n sj } excluded [19]. However, the conditional likelihood function based on L * sj cannot be calculated directly, since X sj is not available to all individuals in the studies that only use local laboratories to measure the biomarkers. To derive an approximate observed conditional likelihood under the matched/nested case-control study design, we make the following 'surrogacy' assumption [12] that takes into account of the study design: This assumption implies that the local laboratory measurements W sj do not contain additional information about the outcome, given the reference laboratory measurements, other covariates of interest, and the matched/nested case-control study design scheme.
Under this surrogacy assumption, the observed likelihood contribution from a stratum based on local laboratory biomarker measurements L sj =P Y sj1 = 0, . . . , Y sj,m sj = 0, Y sj,m sj +1 = 1, . . . , Y sj,m sj +n sj = 1|W sj , Z sj , can be written as: where L * sj is defined in Equation (1). For Equation (2), we further expand L * sj in Taylor series around X sj = E X sj |W sj , Z sj , ∑ m sj +n sj i=1 Y sji = n sj , yielding the following approximate likelihood contribution from the j-th stratum of the s-th study: This approximation performs best when the conditional variance and covariance of X sj are small or when the biomarker effect is not strong. Section S1 of the Supplementary Materials provides a detailed derivation of the approximate observed conditional likelihood and the conditions when the approximation works well.
To obtain an estimate of X sji , for the studies that use local laboratories, the studyspecific calibration models can be fitted in the calibration subsets where subjects were selected for re-assaying in the reference laboratory; these subjects therefore have biomarker measurements from both the local and reference laboratories. The calibration models can thus be used to impute the reference biomarker measurements for the remaining subjects of each study that only have local laboratory measurements.
We make the calibration assumption [12] that, given the local laboratory measurements, the mean reference laboratory measurements are approximately independent from other covariates, that is: We further assume a linear relationship between the reference laboratory measurements and local laboratory measurements, leading to the following calibration model: where a s and b s are study-specific model parameters. Note that these calibration parameters are the same across different strata in each study. However, we can relax this constraint by including matching factors in the calibration model; that is, assuming where M sji is the vector of matching factors. Sloan et al. [15] suggested that Model (3) is sufficient in most study settings. In (3), although a linear term of W sji is typically sufficient to model the X sji -W sji relationship, nonlinear terms in W sji can also be included if appropriate.
The study-specific calibration models are usually fitted only among controls because case biospecimens are often not available in the calibration study subsets. Therefore, the calibration model in practice is typically: where we use a s,co and b s,co to denote the parameters in the calibration model fitted among controls only. Note that a s,co and b s,co are generally not consistent estimates of a s and b s in (3). Sloan et al. [12] provide conditions for a s,co ≈ a s and b s,co ≈ b s under the bivariate normality of X sj and W sj in a 1:1 matched/nested case-control study. It is straightforward to generalize their results to the n sj : m sj matching scenarios. Specifically, b s,co ≈ b s when In addition, if the biomarker effect is small (i.e., β X ≈ 0), a s,co and b s,co will also be close to a s and b s .
In studies that used the local laboratory for measurement, for the internalized calibration method,the biomarker value in the approximate likelihood L sj is X sji = X sji if reference laboratory measurement X sji is available, and X sji = E(X sji |W sji , Y sji = 0) otherwise. For the full calibration method, X sji = E(X sji |W sji , Y sji = 0) regardless of whether reference laboratory measurement X sji is available or not [12]. Therefore, for studies using local laboratories for biomarker measurement, all participants' biomarker measurements are calibrated under the full calibration method while, under the internalized calibration method, the biomarker measurements are calibrated for participants who only have local laboratory measurements available.

Parameter Estimation
are the estimating functions for their corresponding parameters. Section S2 in the Supplementary Materials contains the technical details.
We can obtain the point estimate of β using a two-step pseudo-maximum likelihood method [21]. In the first step, the estimates of a and b of the calibration models are obtained by fitting linear regressions on the subset of controls chosen for re-assaying in the reference laboratory, and in the second step, β is obtained using pseudo-maximum conditional likelihood method by solving the estimating equations ψ β X ( a, b), ψ β Z ( a, b) = 0, where a and b are the estimates obtained in the first step. We can use the sandwich variance formula over the joint estimating equations . See Section S3 of the Supplementary Materials for technical details.

Simulations
We performed a simulation study for a 1:1 matched case-control study design. Define sji as the error term in the linear model of X sji on W sji . We assumed a similar multivariate normal distribution of X sji , W sji and sji , as in Sloan et al. [12], such that: This distribution yields the calibration model E(X sji |W sji ) = a s + b s W sji and Cov(W sji , sji ) = 0. The data were generated for each stratum of each study first, and then a case and a control were randomly chosen in each stratum. In the simulation, we set µ x = 0 and σ 2 x = 1. We assumed four studies in the pooled analysis with 500 case-control pairs (i.e., 1000 total subjects) in each study, and the calibration parameters were set to be a = [−3, 1, −1, 3] and b = [0.5, 0.75, 1.25, 1.5]. We set Var(W sji ) = σ 2 ws = [3.8, 1.7, 0.6, 0.4] to ensure a wide range of variation in local laboratory measurements. The stratum-specific intercept β 0sj was assumed to follow a normal distribution with a mean of 0 and a variance of 0.01, which was chosen to save computer time in the data generation process.
The spline functions were chosen to be restricted cubic splines and for presentational simplicity, we chose three knots, fixed at the (25th, 50th, 75th) quantiles of N(0, 1). We assumed the following risk model without additional covariates: t 3 −t 2 , and t 1 , t 2 , t 3 are the three knots mentioned above [18]. Note that β X 2 = 0 implies a linear relationship between the biomarker level and the log relative risk of disease.
The simulations were performed 1000 times for different combinations of (β X 1 , β X 2 ), and calibration proportions, defined as the proportion of controls re-assayed in the reference laboratory. We set |β X 1 | at relatively large values to evaluate the performance of our method, even when the effect of the biomarker is relatively strong, and β X 2 was determined so that the nonlinear relationships across the ranges of β X 1 considered in the tables is moderate. The calibration proportions were 5%, 10%, or 30% in each contributing study, which represents a reasonable range for the size of the calibration subsets in practice.
We compared the performance of both the internalized (IN) and full (FC) calibration methods in terms of percent bias ( β−β β × 100%) over the simulation replicates and coverage rate, which is defined as the proportion of the estimated 95% confidence intervals containing the true value. We also included the naive (N) method for comparison, where no calibration was performed, and the conditional logistic regression was fitted using the local laboratory measurements directly.
The simulation results in Tables 1 and 2 are for a biomarker that had an inverse effect on the disease risk and the Supplementary Tables S1 and S2 in Supplementary Materials are for a biomarker that had a positive association with the disease risk. The naive method performed poorly in all scenarios regardless of the calibration proportions. The percent biases of the naive estimates were typically larger than 30%, and the coverage rates were typically below 70%. The internalized and full calibration estimates had consistently better performance than the naive method. Full calibration estimates were robust over many combinations of coefficients and calibration proportions, where the percent biases were typically below 10% when the calibration proportion was 5% and below 5% when calibration proportions were 15% and 30%. The coverage rates ranged from 93% to 97%, which were close to the 95% nominal level. The internalized calibration estimates were less robust than the full calibration estimates, and they tended to be more biased when the calibration proportion was large. Section S4 of the Supplementary Materials contains a mathematical justification for the relative performance of the internalized and full calibration methods.
We also plotted the curves reflecting the biomarker-disease association. The x-axis represents the biomarker values, and the y-axis is the log RR. Figure 1 is for the scenario where β X 1 = − log(1.5) ≈ −0.41 and β X 2 = 0.14, and the calibration proportions were set to be 5% and 30%, respectively. We can see that the curve estimated using the naive method deviated from the true curve substantially, while the curves estimated using the internalized and full calibration methods were closer to the true curve. As the calibration proportion increased, the curve estimated using the full calibration method was closer to the true curve than the internalized calibration method. Supplementary Figure S1 in the Supplementary Materials is for the scenario when β X 1 = log(1.75) ≈ 0.56 and β X 2 = −0.16, and the calibration proportions were also set to be 5% and 30%, respectively. We can see similar behaviors of the three estimated curves, where the full calibration method led to the estimated curve that was closest to the true curve and was robust over all calibration proportions.
In addition, we changed σ 2 ws in the simulation setup to vary the ratio of σ 2 ws σ 2 x . This ratio for each study was chosen among 0.75, 0.85, 0.90, and 0.95. The simulation results in Table 3 shows that the performance of the calibration models improved when this variance ratio increased; that is, when the error term in the calibration model X|W became smaller. The full calibration method was more robust than the internalized calibration method with smaller percent biases and coverage rates closer to the 95% nominal level for all the variance ratios considered.  Coverage rate is the proportion of simulations that yield a 95% confidence interval covering the true parameter. Standard deviation is the square root of the empirical variance of parameter estimates over all replicates; we report 10 3 times the standard deviation. The calibration proportions (denoted as Calib. size in the table) were set to be 5%, 15%, and 30%. β X 2 is fixed at 0.08.

Calib. Size
calibration (FC), and Naive methods (N). Relative bias is computed using β−β β , and the reported value in the table is the average over the 1000 simulation replicates. Coverage rate is the proportion of simulations that yield a 95% confidence interval covering the true parameter. Standard deviation is the square root of the empirical variance of parameter estimates over all replicates; we report 10 3 times the standard deviation. The calibration proportions (denoted as Calib. size in the table) were set to be 5%, 15%, and 30%. β X 1 is fixed at − log(1.5).  . Relative bias is computed using β−β β , and the reported value is the average over the 1000 simulation replicates. Coverage rate is the proportion of simulations that yield a 95% confidence interval covering the true parameter. Standard deviation is the square root of the empirical variance of parameter estimates over all replicates; we report 10 3 times the standard deviation. The calibration proportions (denoted as Calib. size in the table) were set to be 5%, 15%, and 30%. β X 1 = −0.25, β X 2 = 0.08.  Lastly, we set the coefficient of the nonlinear term to 0 (i.e., β X 2 = 0) and evaluated the performance of our methods for testing the null hypothesis of no nonlinear effect. Simulation results are reported in Supplementary Tables S4 and S5. Both the full and internalized calibration methods have coverage rates of 95% confidence intervals close to 95% for β X 2 = 0, suggesting that when there is no nonlinear effect, the Wald test based on the point and variance estimates in the full and internalized calibration methods will have a type-I error rate close to 0.05. However, the naive method has coverage rates ranging from 8% to 70%, indicating that directly pooling biomarker measurements together without calibration may lead to false positive evidence for a nonlinear effect.

Applied Example
We applied our methods to evaluate the association of 25(OH)D with colorectal cancer incidence. This example was based on two large cohort studies in the United States: Nurses' Health Study (NHS) [22] and Health Professionals Follow-up Study (HPFS) [23]. In the NHS, 121,701 female nurses aged 30 to 55 were enrolled in 1976, while in the HPFS, 51,529 male health professionals aged 40 to 75 were enrolled in 1986. To account for assay differences or laboratory drift over time, within each study, all measurements were calibrated to a common assay prior to the analyses. In all, our pooling analysis consisted of 1876 subjects with a nested case-control study design. The matching factors mainly included age at blood collection and date of blood collection. For the calibration subsets, controls in each study were divided into 10 deciles based on the 25(OH)D levels, and three subjects were randomly sampled in each decile, except for one decile, where only two subjects were selected [4]. A total of twenty-nine controls in each nested case-control study were selected to have their blood samples re-assayed at Heartland Assays, LLC (Ames, IA, USA), the reference laboratory, from 2011 to 2013. We refer readers to the paper by McCullough et al. [4] for detailed patient selection criteria. Table 4 presents sample sizes and parameter estimates along with standard errors of the study-specific calibration models. The potential confounders that were adjusted for in the conditional logistic regression model included smoking (yes/no), BMI (greater or less than 25), physical activity (continuous), and family history of myocardial infarction (yes/no). We used the restricted cubic spline with three knots at the 25%, 50%, and 75% quantiles of the reference 25(OH)D measurements to estimate how the log RR changes with the Vitamin D levels. Table 5 presents the coefficient estimates along with corresponding 95% confidence intervals. As shown in Table 5, we obtained similar point and confidence interval estimates of coefficients from the internalized and full calibration methods. We observed a significant linear relationship between 25(OH)D measurements and log RR of colorectal cancer (p-value = 0.0211 and 0.0217, for the internalized and full calibration methods, respectively), while the nonlinear term was not significant (p-value = 0.2162 and 0.2219, for the internalized and full calibration methods, respectively). Therefore, we concluded that circulating Vitamin D level had a significant linear association with the log relative risk of colorectal cancer. Table 5. Point estimates and 95% confidence intervals for the nonlinear (and linear) association of circulating 25(OH)D (nmol/L) with colorectal cancer after adjusting for BMI (overweight or not), physical activity (continuous), smoking (never/ever), and family history of colorectal cancer (yes/no). After dropping the nonlinear term from the conditional logistic regression model and refitting the model using the linear method proposed by Sloan et al. [12], in Table 5, the point estimate of the biomarker effect on the log RR of colorectal cancer based on full calibration method was −0.0059 (RR = 0.9941) for a 1 nmol/L increase in 25(OH)D, with a p-value of 0.0177, and the 95% confidence interval was (−0.0108, −0.0010), suggesting a significant negative linear relationship between levels of circulating 25(OH)D measurements and the log relative risk of colorectal cancer.
In Figure 2, we plotted the log RR of colorectal cancer on circulating 25(OH)D measurements under both models with and without the nonlinear spline term. We set the reference level to be individuals with the minimum 25(OH)D measurement 9.734 nmol/L in the aggregated study.

Discussion
In this paper, we propose statistical methods for analyzing pooled matched/nested case-control studies. To apply our method, an assumption is that, for each study, the relationship between the measurements from the local laboratory and those from the reference laboratory estimated in the calibration subset represents that in the entire study. This assumption is likely to be violated for laboratories that do not follow good laboratory practices, even if the calibration subset is selected randomly. Our methods can estimate a possibly nonlinear dose-response curve between biomarker measurements and the diseases and can evaluate whether the relationship is linear or not. We focus on the common situation in which only controls are selected for re-assaying in the reference laboratory. We derived an analytic expression for the variance-covariance matrix of the estimated coefficients in the conditional logistic regression model that takes account of the uncertainty from fitting the study-specific calibration models.
Several remarks and recommendations can be drawn from our work. The full calibration method led to estimates with smaller percent biases in all simulation scenarios, and coverage rates were closer to the 95% nominal level. As the calibration proportion increased, the internalized calibration method became more biased than the full calibration method. Since the calibration model was fitted among controls only, estimates of the model parameters were slightly biased. The bias in the intercept was canceled out in the approximate likelihood function in the full calibration method but not in the internalized calibration method. Therefore, we recommend using the full calibration method for analyses that require the calibration of local laboratory measurements to a reference laboratory. When designing a new study, as discussed in Sloan et al. [15], the calibration set sample size should be sufficiently large so that the estimates in the calibration models are stable. Based on our simulation study, the full calibration method can perform reasonably well when the calibration proportion is at least 5% in studies with 500 case-control pairs and the laboratory errors are moderate.
Our method can be used to estimate dose-response curves between biomarkers and disease risks. Based on the proposed analytic variance estimators that account for the calibration process, the method can also be used to perform statistical tests to evaluate the existence of nonlinear trends. When, in fact, β X 2 = 0, the estimated coefficient of the linear term from our method including both linear and nonlinear terms should be similar to that from the linear model method by Sloan et al. [12], but the variance in the estimated coefficient of the linear term from our model may be slightly larger than that from the model including the linear term alone due to estimating additional coefficients for the nonlinear terms in our method. If there is not enough evidence to reject the null hypothesis of no nonlinear effects, we recommend re-fitting the model using the linear method proposed by Sloan et al. [12].