Correlation of Repeat Measurements of 27 Candidate Protein Markers for Colorectal Cancer Screening Taken Three Years and Multiple Freeze–Thaw Cycles Apart

In recent years the blood proteome has been increasingly investigated as a source of biomarkers for the early detection of colorectal cancer (CRC). Blood samples from screening studies are often subject to preanalytical variability and repeated freeze–thaw cycles. We aimed to assess the correlation of repeat measurements of 27 candidate protein markers for CRC screening taken three years and multiple freeze–thaw cycles apart. The concentrations of 27 protein markers were measured in plasma samples of 39 newly detected CRC cases from a cohort of 9245 participants of screening colonoscopies. The proteins were measured twice in the same set of samples using proximity extension assays (PEA), three years apart, with an average of three freeze–thaw cycles between the two measurements. Pearson's product-moment correlation coefficients were calculated. Correlation coefficients ranged from +0.43 to +0.97, with a median of +0.67 and an interquartile range of +0.58 to +0.84; all p-values for the correlations were <0.01 (<0.0005 for 22 proteins and <0.001 for 4 proteins). Repeat measurements of the 27 protein biomarkers for CRC screening, performed three years and on average three freeze–thaw cycles apart, showed moderate to high levels of correlation. Apart from the effects of freeze–thaw cycles, slightly different preprocessing of the data in the two rounds may have contributed to the observed differences between measurements.


Introduction
In recent years the blood proteome has been increasingly investigated as a source of biomarkers for the early detection of various cancers, including colorectal cancer (CRC) [1][2][3][4][5]. In cancer screening studies, blood samples are often stored in freezers and used for multiple measurements of candidate biomarkers for early detection. If, because of space constraints in large-scale screening studies, the samples are stored in larger aliquots, they may undergo several freeze-thaw cycles before being used for such measurements. A commonly expressed concern in this context is whether, and to what extent, the measurements might be affected by freeze-thaw cycles. Although adverse effects of freeze-thaw cycles are plausible, to the best of our knowledge no previous study has empirically assessed their possible impact in cancer screening studies. Therefore, the objective of this study was to assess the effects of preanalytical freeze-thaw cycles on plasma proteins measured as potential biomarkers for the early detection of CRC among CRC cases recruited from a screening colonoscopy cohort.

Study Design
Blood samples were selected from screening colonoscopy participants enrolled in the BLITZ study. Details of the BLITZ study design have been reported previously [6][7][8][9][10][11]. Briefly, BLITZ is an ongoing prospective study of participants undergoing colonoscopy as a primary screening exam. The German screening colonoscopy program, introduced in October 2002, offers up to two screening colonoscopies at least 10 years apart to men and women aged 55 years or older [12] (the starting age was lowered to 50 years for men in 2019) and includes strict quality assurance measures. Screening colonoscopies are mostly conducted in gastroenterology practices by accredited, highly qualified and experienced endoscopists. In the BLITZ study, participants have been recruited from 20 gastroenterology practices since the end of 2005. Participants are invited to donate prediagnostic blood and stool samples and to fill out self-administered questionnaires during preparatory visits at these practices, typically one week before the screening colonoscopy. By the end of June 2016, CRC and advanced adenoma (AA) had been detected in 64 and 633 of the 9425 BLITZ participants, respectively. In a previous study from our group [8,9], 254 plasma samples from BLITZ participants, including 41 plasma samples from participants diagnosed with CRC at screening colonoscopy, were selected for protein measurements in 2015. In a later study conducted in 2018, plasma samples from 270 partly overlapping BLITZ participants, including 56 participants with CRC, were selected for another set of protein measurements [13]. Overall, 39 participants with CRC and 27 protein markers were included in both studies. The paired 2015 and 2018 measurements from the same blood samples of these 39 CRC patients were used to evaluate the correlation of protein measurements performed three years and, on average, three freeze-thaw cycles apart.
In addition to the paired measurements in the same plasma samples of 39 CRC cases, the same 27 protein markers were measured in plasma samples from 181 and 102 controls free of neoplasms in the 2015 and 2018 studies, respectively [8,9,13]. However, in contrast to the cases, there was no overlap between the control samples included in the two studies, so correlations between repeat measurements taken several freeze-thaw cycles and several years apart could not be calculated for the controls. Instead, to evaluate the potential impact of freeze-thaw cycles and storage time on diagnostic performance, we assessed, for each of the 27 protein markers, the differences between the 39 CRC cases and the 102 controls measured in 2018, and compared them with the differences between the same 39 CRC cases and the 181 controls measured in 2015. The STARD diagram showing the selection of participants from the BLITZ study is presented in Figure 1. The BLITZ study has been approved by the ethics committees of the Medical Faculty of the University of Heidelberg (S-178/2005) and of the physicians' boards of Baden-Wuerttemberg (M118-05-f), Rhineland-Palatinate (837.047.06(5145)) and Saarland (217/13). The BLITZ study adheres to the standards set by the Declaration of Helsinki, and all study participants provided voluntary written informed consent.

Sample Collection and Storage and Lab Assay
Blood was drawn at recruitment into the BLITZ study, before diagnosis. After the blood draw, ethylenediaminetetraacetic acid (EDTA) plasma samples were transported to the laboratory in a cold chain, centrifuged at 2000–2500 × g for 10 min at 4 °C, and then stored at −80 °C until retrieved for the protein measurements.
Figure 1. STARD (Standards for Reporting Diagnostic Accuracy Studies) flow diagram of the BLITZ study. Abbreviation: CRC, colorectal cancer. * The exclusion criteria for selection of CRC cases were not applicable after this point. # Participants selected from research study [13]. § Participants selected from research studies [8,9].
Protein concentrations in plasma samples were measured using the proximity extension assay (PEA) offered by Olink. Olink's multiplex panels allow simultaneous analysis of 92 biomarkers and 4 internal controls in 1 µL samples [14]. Briefly, 96 pairs of oligonucleotide-labelled antibodies bind pairwise to their target proteins; when the two antibodies of a pair are in close proximity, a PCR reporter sequence is formed by DNA polymerization and is quantified by real-time PCR. The first measurement was performed in 2015 for 92 proteins using Olink's Proseek® Multiplex Oncology I panel [9] in plasma samples of 41 CRC cases. Subsequently, in 2018, 92 proteins were measured using Olink's Proseek® Multiplex Oncology II panel in plasma samples of 56 CRC cases [13]. In total, 27 proteins were included in both Olink panels, and 39 CRC cases were included in both studies. The two measurements were, on average, three years and three freeze-thaw cycles apart. In addition to the paired measurements for the 39 CRC cases, the 27 proteins were also measured in 181 and 102 non-overlapping samples of controls free of neoplasms in 2015 and 2018, respectively. Laboratory analyses were performed in the laboratory of the panels' manufacturer, blinded to colonoscopy findings. The workflow of the current study is presented in Figure 2.
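Restricting the data to the 39 cases and 27 proteins present in both rounds amounts to intersecting the two datasets on participant and protein. The following is a minimal, hypothetical sketch: the function name `pair_measurements`, the long-format `(participant_id, protein, value)` layout, and all IDs and values are illustrative assumptions, not the study's data or code.

```python
def pair_measurements(round1, round2):
    """Pair two measurement rounds given as (participant_id, protein, value) rows.

    Returns {protein: [(value_round1, value_round2), ...]} restricted to the
    (participant, protein) combinations present in BOTH rounds.
    """
    # Index each round by (participant, protein) for constant-time lookup.
    idx1 = {(pid, prot): val for pid, prot, val in round1}
    idx2 = {(pid, prot): val for pid, prot, val in round2}
    paired = {}
    # The key intersection drops proteins that are on only one panel and
    # participants that were measured in only one round.
    for pid, prot in sorted(idx1.keys() & idx2.keys()):
        paired.setdefault(prot, []).append((idx1[(pid, prot)], idx2[(pid, prot)]))
    return paired


# Hypothetical rows; IDs and values are placeholders, not study data.
round_2015 = [("P01", "CEA", 5.1), ("P01", "AREG", 2.0), ("P02", "CEA", 6.3), ("P03", "CEA", 4.0)]
round_2018 = [("P01", "CEA", 7.9), ("P02", "CEA", 9.0), ("P02", "AREG", 3.1), ("P04", "CEA", 8.8)]
print(pair_measurements(round_2015, round_2018))  # {'CEA': [(5.1, 7.9), (6.3, 9.0)]}
```

Intersecting on the combined key handles both overlaps at once: a protein measured on only one panel, or a participant measured in only one round, simply drops out.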

Statistical Analyses
Normalization was performed to minimize both inter- and intra-assay variation. Nevertheless, the data were preprocessed slightly differently in the two rounds. The Cq (quantification cycle) value is the x-value (cycle number) at which the qPCR amplification curve crosses the threshold line. Normalization of the first measurement in 2015 was a two-step procedure: first, the raw Cq value was normalized against the extension control to correct for technical variation, yielding a dCq value; second, the dCq value was further normalized against the negative control determined in the measurement, which yielded ddCq values (hereafter referred to as Cq values) on a log2 scale [8].
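A minimal sketch of the two-step 2015 normalization for one analyte in one sample is shown below. The function name `ddcq` and, importantly, the sign convention (background dCq minus sample dCq, so that larger values mean more protein) are our assumptions; the text above does not state the direction of either subtraction.

```python
def ddcq(cq_sample, cq_extension_ctrl, dcq_negative_ctrl):
    """Hypothetical two-step normalization of one analyte in one sample (log2 scale)."""
    # Step 1: express the raw Cq relative to the sample's extension control,
    # removing technical (well-to-well) variation.
    dcq = cq_sample - cq_extension_ctrl
    # Step 2 (ASSUMED sign): subtract from the negative-control (background)
    # dCq, so an analyte amplifying earlier than background gets a positive value.
    return dcq_negative_ctrl - dcq


# Illustrative values: dCq = 20.0 - 15.0 = 5.0; background dCq = 8.0.
print(ddcq(20.0, 15.0, 8.0))  # 3.0, i.e. about 2**3 = 8-fold above background
```

Because qPCR roughly doubles the product each cycle, a value of 3 on this log2 scale corresponds to reaching the threshold about three cycles earlier than background after extension-control correction.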
For the second measurement, however, normalization was a three-step procedure yielding an arbitrary, relative quantification unit called Normalized Protein Expression (NPX). NPX is derived from the Ct (threshold cycle, equivalent to Cq) values obtained from qPCR. First, the Ct value of the extension control is subtracted from the raw Ct value of an analyte, yielding the dCt of the analyte. Second, the dCt of the interplate control is subtracted from the dCt of the analyte, yielding its ddCt. Finally, the ddCt is adjusted against a predetermined correction factor (NPX = correction factor − ddCt), so that higher NPX values indicate higher protein concentrations. NPX is an arbitrary unit representing the relative signal on a log2 scale; a difference of one NPX unit represents a two-fold difference in protein concentration.
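The three NPX steps can be written out as a short function. This is a sketch under stated assumptions: the function names and input values are hypothetical, the correction factor is treated as a given per-assay constant, and the sign convention NPX = correction factor − ddCt follows Olink's documentation as we understand it. The second function illustrates the log2 interpretation, in which a difference of d NPX units corresponds to a 2^d-fold difference in concentration.

```python
def npx(ct_analyte, ct_extension_ctrl, dct_interplate_ctrl, correction_factor):
    """Hypothetical NPX calculation for one analyte in one sample (log2 scale)."""
    # Step 1: correct for technical variation using the sample's extension control.
    dct = ct_analyte - ct_extension_ctrl
    # Step 2: correct for plate-to-plate variation using the interplate control.
    ddct = dct - dct_interplate_ctrl
    # Step 3: adjust against the assay-specific correction factor; subtracting
    # ddCt flips the direction so that fewer cycles (more protein) -> higher NPX.
    return correction_factor - ddct


def fold_difference(npx_a, npx_b):
    """Fold difference in concentration implied by two NPX values (log2 scale)."""
    return 2.0 ** (npx_a - npx_b)


# Illustrative values only: dCt = 6.0, ddCt = 2.0, NPX = 10.0 - 2.0 = 8.0
sample_npx = npx(22.0, 16.0, 4.0, 10.0)
print(sample_npx)                        # 8.0
print(fold_difference(sample_npx, 7.0))  # 2.0 (one NPX unit = two-fold)
```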
To assess concordance between the first (2015) and second (2018) measurements of the 27 proteins included in both, Pearson's product-moment correlation coefficients were calculated for the 39 CRC cases with paired measurements. In addition, means, standard deviations and standard errors were calculated for the first and second measurements of the 27 protein biomarkers.
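For a single protein, these statistics reduce to a few lines. The sketch below uses only the Python standard library and hypothetical input vectors; it computes the t statistic for the null hypothesis of zero correlation but not the p-value, which R's `cor.test()` or `scipy.stats.pearsonr` would obtain from the t distribution with n − 2 degrees of freedom.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between paired measurements."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired sequences of equal length >= 3")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: rho = 0 (n - 2 degrees of freedom)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def mean_sd_se(values):
    """Mean, sample SD (n - 1 denominator) and standard error of the mean."""
    sd = statistics.stdev(values)
    return statistics.fmean(values), sd, sd / math.sqrt(len(values))


# Hypothetical paired values for one protein (first vs. second round).
first = [1.0, 2.0, 3.0, 4.0]
second = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(first, second)
print(round(r, 3), round(correlation_t(r, len(first)), 3))
print(mean_sd_se(first))
```

Pearson's r is unchanged by shifting or positively rescaling either variable, which is why it can compare the Cq-based 2015 values with the NPX-based 2018 values (provided both are oriented so that higher means more protein), even though the means themselves are not directly comparable.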
Life 2022, 12, 359
Complementary to the direct comparisons of the 39 paired CRC samples, we assessed, for each of the 27 protein markers, the differences between the 39 CRC cases and the 102 controls measured in 2018, and compared them with the differences between the same 39 CRC cases and the 181 controls measured in 2015. Differences between cases and controls for each of the 27 proteins were tested for statistical significance using the Wilcoxon rank-sum test, with adjustment for multiple testing by the Benjamini-Hochberg method [15]. Furthermore, for each individual protein biomarker, a logistic regression model was used to construct a prediction algorithm for the presence of CRC, and predictive performance was assessed using AUCs and their 95% confidence intervals (95% CI). The correlation of the AUCs between the 2015 and 2018 measurements across the 27 proteins was quantified by Pearson's correlation coefficient, and the DeLong test was used to determine the statistical significance of differences between the 2015 and 2018 AUCs for each protein [16]. All statistical analyses were performed with the R statistical software language and environment (version 3.5.0, R Core Team) [17], and p-values < 0.05 in two-sided testing were considered statistically significant.
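Two of the building blocks named above can be sketched without a statistics package. This is a minimal illustration under assumptions, not the authors' R code, and the inputs are hypothetical: `benjamini_hochberg` mirrors the step-up adjustment of R's `p.adjust(method = "BH")`, and `empirical_auc` computes the ROC AUC as the proportion of case-control pairs in which the case has the higher value, with ties counting one half. That proportion equals the Wilcoxon rank-sum (Mann-Whitney) statistic divided by the number of case-control pairs, which links the two analyses. Moreover, because a single-marker logistic regression is a monotone function of the marker, its AUC equals the empirical AUC of the marker itself (or 1 minus it when the fitted coefficient is negative).

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Step-up: walk from the largest p-value to the smallest, keeping the
    # adjusted values monotone via a running minimum of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def empirical_auc(cases, controls):
    """ROC AUC = P(case > control) + 0.5 * P(case == control)."""
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical inputs for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
print(empirical_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
```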

Characteristics of the Study Population
The eligibility criteria used for selection of study participants enrolled in the Begleitende Evaluierung innovativer Testverfahren zur Darmkrebs-Früherkennung (BLITZ) study are presented in the STARD (Standards for Reporting Diagnostic Accuracy Studies) diagram in Figure 1. The main characteristics of the study population are presented in Table 1. The 39 CRC patients whose plasma samples were measured twice for the same 27 proteins, in 2015 and 2018 (with an average of three freeze-thaw cycles in between), comprised 28 men and 11 women with a median age of 67 years. CRC was detected at stages I-III in 36 of the 39 cases. Controls whose samples were measured in 2015 and 2018 were on average slightly younger, with median ages of 62 and 65.5 years, respectively.
For all protein biomarkers, the intra- and inter-assay coefficients of variation were <20% for all measurements. The limit of detection was set at three standard deviations above the background for each protein. The internal and external controls developed by Olink enable monitoring of assay performance and sample quality: the detection control monitors the readout, the extension control enables normalization across samples, and the inter-plate control enables normalization between plates.
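The two quality criteria above can be made concrete as follows. This is a hypothetical sketch: the function names and values are illustrative, coefficients of variation are assumed to be computed on linear-scale (not log2) values, and the background is assumed to be summarized by its mean, with the limit of detection set three standard deviations above it as stated in the text.

```python
import statistics


def coefficient_of_variation(replicates):
    """CV in percent for repeated measurements of one sample (linear scale assumed)."""
    return statistics.stdev(replicates) / statistics.fmean(replicates) * 100.0


def limit_of_detection(background):
    """Hypothetical LOD: mean background signal plus three standard deviations."""
    return statistics.fmean(background) + 3.0 * statistics.stdev(background)


# Illustrative replicate and background values (not study data).
cv = coefficient_of_variation([10.0, 12.0, 11.0, 9.0])
lod = limit_of_detection([1.0, 1.2, 0.8, 1.0])
print(f"CV = {cv:.1f}% (meets <20%: {cv < 20.0}); LOD = {lod:.3f}")
```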

Analyses of Diagnostic Performance
Results of the univariate analyses comparing the expression of each marker in plasma samples from the 39 CRC cases with 181 controls measured in 2015, and from the same 39 CRC cases with 102 controls measured in 2018, are reported in Table 3. In the 2015 measurements, statistically significant expression differences between CRC cases and controls without neoplasms were found for six proteins, five of which remained significant after correction for multiple testing. Two proteins, AREG and CEA, had areas under the ROC curve (AUCs) ≥0.7 for discriminating CRC cases from controls. Similarly, in the 2018 measurements comparing the 39 CRC cases with 102 controls free of neoplasms, adjusted p-values ≤0.05 were observed for six protein biomarkers, and AUCs ≥0.7 were observed for the same two biomarkers, AREG and CEA. The AUCs from the 2015 and 2018 measurements were positively correlated (Pearson correlation coefficient +0.58); only 1 of the 27 proteins (WFDC2) showed a statistically significant difference in AUC between the 2015 and 2018 measurements. The mean Cq values (first measurement) and NPX values (second measurement) for all proteins are presented in Table 2. Because the two datasets were normalized slightly differently, the absolute values are not directly comparable. Nevertheless, the standard deviations and standard errors of the two measurements were largely similar.
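The per-protein comparison of 2015 and 2018 AUCs can be sketched with DeLong's structural components. This is an illustration under assumptions, not a reproduction of the authors' analysis: the function names and inputs are hypothetical, and the two AUCs are treated as independent (unpaired) because the control groups did not overlap, even though the 39 cases were shared between rounds; how the original analysis handled the shared cases is not stated in the text. In R, the pROC package's `roc.test()` offers DeLong tests for paired and unpaired ROC curves.

```python
import math
import statistics


def _placement(case, control):
    """Kernel: 1 if the case scores higher, 0.5 for a tie, else 0."""
    if case > control:
        return 1.0
    return 0.5 if case == control else 0.0


def delong_auc(cases, controls):
    """Return (AUC, DeLong variance) for one marker; needs >= 2 cases and controls."""
    # Structural components: each case's mean kernel over all controls, and
    # each control's mean kernel over all cases.
    v10 = [statistics.fmean([_placement(c, k) for k in controls]) for c in cases]
    v01 = [statistics.fmean([_placement(c, k) for c in cases]) for k in controls]
    auc = statistics.fmean(v10)
    var = statistics.variance(v10) / len(cases) + statistics.variance(v01) / len(controls)
    return auc, var


def delong_unpaired(cases_a, controls_a, cases_b, controls_b):
    """Two-sided z-test for AUC_a vs. AUC_b, ASSUMING independent samples."""
    auc_a, var_a = delong_auc(cases_a, controls_a)
    auc_b, var_b = delong_auc(cases_b, controls_b)
    se = math.sqrt(var_a + var_b)
    if se == 0.0:
        raise ValueError("zero variance: AUCs are degenerate (e.g. perfect separation)")
    z = (auc_a - auc_b) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


# Hypothetical marker values for two measurement rounds.
print(delong_unpaired([3.0, 4.0, 5.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [1.0, 3.0, 5.0]))
```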

Discussion
To determine the effect of freeze-thaw cycles and prolonged sample storage on the measured levels of candidate plasma proteins for CRC early detection, concentrations of 27 proteins were directly compared in paired samples measured by proximity extension assays (PEA) in 2015 and, after on average three additional freeze-thaw cycles, in 2018. Paired measurements were available for 39 CRC cases detected at screening colonoscopy. Moderate to very strong correlations were seen for all 27 proteins, with correlation coefficients ranging from 0.43 to 0.97 (p < 0.001 for 26 of the 27 proteins). Furthermore, analyses of diagnostic performance based on samples from the two measurement rounds yielded rather similar results, despite the inclusion of different control samples.
Large-scale screening and cohort studies are an invaluable resource for evaluating novel biomarkers for cancer screening. However, they often require many years of participant recruitment and follow-up. For efficiency and logistical reasons, and to minimize batch effects, laboratory analyses are typically performed in single batches after completion of recruitment and/or follow-up, and the precious biospecimens collected in such studies are typically stored frozen at very low temperatures for long periods so that they remain available for analyses of novel emerging biomarkers or with novel emerging technologies. Because of limited storage capacity for very small aliquots, larger blood aliquots must sometimes be used multiple times, and samples consequently undergo multiple freeze-thaw cycles before analysis. Research on protein biomarkers is hampered by several factors, such as the wide dynamic range of plasma protein concentrations [18], inconsistencies in sample handling and in preanalytical and analytical procedures [19,20], and varying times required for sample fixation after sample processing [21]. Although there is consensus that storage times might be of concern for some markers [22,23] and that freeze-thaw cycles should be minimized wherever possible, there is very limited empirical evidence on the degree to which prolonged storage times and freeze-thaw cycles compromise the reliability and validity of measurements of candidate diagnostic markers, and there are few estimates of their impact on diagnostic performance.
Our study aimed to help fill these gaps for plasma measurements of 27 candidate diagnostic proteins performed in two batches three years apart. Overall, rather high correlations were seen between repeat measurements, despite large differences in sample handling in terms of storage times and freeze-thaw cycles. It is worth noting in this context that even repeat measurements of the same samples without any additional storage time or freeze-thaw cycle would not be expected to yield perfect correlations; hence, differences between the two measurements cannot be fully ascribed to sample storage times and freeze-thaw cycles. Although our results are reassuring in that measurement correlations were moderate to high for all 27 candidate protein markers, they also illustrate that vulnerability to preanalytical handling may vary strongly between proteins. Therefore, neither general rejection of samples that have been stored for long periods and have undergone multiple freeze-thaw cycles, nor general neglect of preanalytical conditions, is warranted. Rather, a differentiated view of specific analytical conditions and their relevance for specific biomarkers is needed. In this respect, our study aims to contribute to the very limited empirical evidence available thus far.
Our study has specific strengths and limitations. A major strength is that it was conducted in a true screening setting, in the target population of CRC screening, for whom potential inclusion of the candidate proteins in a blood-based screening test would be of immediate practical relevance. Many studies on the role of preanalytics for diagnostic measurements have relied on small convenience samples. In our study, the presence of CRC in cases and the absence of colorectal neoplasms in controls were confirmed by screening colonoscopy. The study was based on a very large cohort of screening colonoscopy participants, which enabled inclusion of a reasonably sized sample of screening-detected CRC patients despite the very low prevalence of CRC in the screening population. In addition to providing quantitative evidence on the correlation between repeat plasma protein measurements, our study also provides empirical evidence on the potential impact of preanalytical history on estimates of diagnostic performance. However, a number of limitations also need to be addressed. Although the same proteins were measured in both rounds, they were included in different protein panels in the first and second measurements, which may have affected measurement quality. Furthermore, different normalization procedures were employed in the two measurement rounds. Finally, direct comparisons of measurements were restricted to screening participants in whom CRC was detected at colonoscopy; no such comparisons were possible for controls because there was no overlap between the control samples of the two rounds.
Despite these limitations, our results provide valuable empirical evidence on the stability, against major differences in preanalytics, of a number of plasma proteins that may be relevant for a potential CRC multi-protein screening panel. In particular, very high correlations were found for several markers that showed the highest diagnostic potential in this and previous analyses, such as AREG and CEA [8,9,13,24], although correlations varied widely across proteins. Although biomarkers such as AREG and CEA are not ready for clinical use, combining these potentially promising proteins with other genetic, epigenetic, protein or metabolomic biomarkers could contribute to the development of a powerful blood-based screening tool. Such multi-omics development may gain further momentum through the application of artificial intelligence tools [25,26]. Given that preanalytics may often be imperfect in routine screening practice, the robustness of markers against various preanalytical conditions is an additional important criterion in biomarker research that should receive increased attention in future studies.