1. Introduction
Chronic kidney disease (CKD) and its progression to end-stage renal disease (ESRD) necessitate the implementation of hemodialysis, a life-sustaining treatment that filters waste products and excess fluid from the blood [1]. In this context, accurate and timely monitoring of hemoglobin (Hb) levels is paramount to managing anemia, a common complication among dialysis patients [2]. The assessment of Hb concentrations in dialysis patients is not merely a routine test but a critical element of patient care [3]. It guides the administration of erythropoiesis-stimulating agents (ESAs) and iron therapy, directly impacting patient well-being and quality of life [4]. Anemia in dialysis patients is multifactorial, attributed to the decreased production of erythropoietin by the failing kidneys, the shortened lifespan of red blood cells, and other factors such as nutrient deficiencies and inflammation [5].
Blood gas analyzers (BGAs), commonly utilized for rapid assessment of blood gases, electrolytes, and metabolites in critical care settings, also offer the advantage of immediate Hb determination [6,7]. The use of BGA analyses in hemodialysis units is of particular interest due to the potential for real-time clinical decision-making. This immediacy can be crucial during dialysis sessions, allowing for on-the-spot adjustments to treatment protocols based on the patient’s current hemodynamic status [8].
However, the reliability and accuracy of Hb measurements obtained from BGAs as compared to those from central laboratory testing remain subjects of debate [9,10,11,12,13]. Central laboratory testing is typically considered the gold standard and employs methodologies such as automated hematology analyzers that provide precise and comprehensive Hb evaluations. While accurate, these tests require venous blood samples and longer processing times, and are resource-intensive, which may delay clinical decisions in the dialysis setting [14]. The primary aim of this study was to assess whether point-of-care Hb measurements obtained during routine hemodialysis sessions using a BGA are sufficiently accurate and precise compared with central laboratory Hb measurements to guide real-time clinical decisions. Although many dialysis patients undergo routine laboratory testing, the timeliness of central laboratory results varies substantially (turnaround times up to 180 min for hematology results) and may not always support immediate, intradialytic decisions [15]. In contrast, blood gas analyzers and other point-of-care testing (POCT) devices can provide Hb results within minutes, enabling more rapid clinical action [16]. Rapid, reliable bedside Hb testing could, therefore, have particular value in settings where laboratory turnaround is prolonged, where dialysis facilities operate remotely from a hospital laboratory, or where transport and processing delays are common, including night shifts and low-resource regions.
We hypothesized that BGA Hb monitoring via the ABL Flex 800 is non-inferior to central laboratory measurements (non-inferiority margin 0.5 g/dL) and sought to derive a simple correction formula for clinical application that may be used for anemia management, especially in out-of-hospital and resource-limited settings.
2. Methods
This study was conducted as a retrospective analysis of a large cohort of dialysis patients treated at the Medical University of Vienna, Division of Nephrology and Dialysis, from 1 April 2017 to 1 February 2024. Data collection focused on Hb measurements obtained through two distinct methods: BGA analysis using the ABL800 Flex system (Radiometer Medical ApS, Copenhagen, Denmark) and standard laboratory testing. In total, we analyzed 164,085 Hb tests conducted through both BGA and standard laboratory methods. To ensure the accuracy and reliability of our data, we restricted paired comparisons to laboratory results obtained within a 90 min window between measurements, thereby including only the most stable and representative samples. The study was approved by the Institutional Review Board of the Medical University of Vienna (EC-Nr: 2312/2024).
2.1. Blood Sampling
BGA point-of-care testing (POCT) was conducted using an ABL800 Flex standard blood sampler with a capacity of 1.7 mL, manufactured by Radiometer Medical ApS, Copenhagen, Denmark. The blood sample was collected during the dialysis session and transferred from the syringe to the analyzer. One microliter of the blood then underwent hemolysis by exposure to ultrasound at 30 kHz. The Hb level was measured using a spectrophotometric method, which involves passing light through the sample at 128 distinct wavelengths ranging from 478 to 672 nm. This light was channeled through glass-fiber optics and separated into the individual wavelengths by a diffraction grating. The measurement was captured by an array of 128 photodiodes. The Hb concentration in the blood sample was then calculated based on the Lambert–Beer law.
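As a schematic illustration of the principle described above (the symbols below are generic placeholders, not taken from the device documentation), the multi-wavelength spectrophotometric measurement rests on the Lambert–Beer law:

```latex
% Absorbance at each of the 128 wavelengths \lambda_i (478--672 nm),
% summed over hemoglobin derivatives j with molar absorptivity
% \varepsilon_j, concentration c_j, and optical path length l:
A(\lambda_i) = l \sum_j \varepsilon_j(\lambda_i)\, c_j ,
\qquad
\mathrm{tHb} = \sum_j c_j .
% The c_j are recovered by least-squares fitting of the measured
% spectrum against the known absorptivity curves.
```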
In the laboratory, Hb quantification was performed using the sodium lauryl sulfate (SLS) Hb technique on Sysmex XN-Series analyzers (Sysmex Corporation, Kobe, Japan). The analysis commenced with the collection of 3 mL of blood treated with EDTA as an anticoagulant. The SLS method involves a series of processes beginning with a hemolytic reaction in which SLS attaches to the erythrocyte membrane through both ionic and hydrophobic bonds, leading to the release of Hb. Subsequent steps include the modification of the globin molecule structure by SLS and the oxidation of heme iron from its divalent to trivalent form, a process aided by oxygen. Finally, the Hb levels are determined by the absorption of light at a wavelength of 555 nm. Analytical performance for the point-of-care blood gas analyzer and the central laboratory analyzer is summarized from manufacturer performance data and instrument reproducibility studies. Radiometer (ABL Flex series; Radiometer Medical ApS, Copenhagen, Denmark) reports 95% confidence intervals for total hemoglobin (tHb) at multiple control levels; for example, manufacturer-reported 95% CIs (and acceptance criteria) include 7 g/dL (−0.01 to +0.11; acceptance ±0.3), 15 g/dL (+0.29 to +0.44; acceptance ±0.5), and 25 g/dL (+0.90 to +1.23; acceptance ±1.3). Sysmex XN manufacturer reproducibility data (Sysmex Corporation, Kobe, Japan) report Hb performance by control level: at ~5.90 g/dL (level 1), total SD 0.072 g/dL (total CV 1.22%); at ~12.02 g/dL (level 2), total SD 0.149 g/dL (total CV 1.24%); and at ~16.34 g/dL (level 3), total SD 0.220 g/dL (total CV 1.35%). Instruments were maintained under routine laboratory QC and calibration procedures per institutional policy.
2.2. Statistical Analyses
The initial analysis included descriptive statistics to summarize the data, including age, means of Hb, standard deviations (SD), and ranges for Hb values obtained from both BGA and laboratory methods. To assess the variability of Hb levels across the defined categories and between the two testing methods, an ANOVA was conducted. We stratified the Hb data into several separate categories to conduct a more granular analysis of how varying Hb levels might influence the results. Categories were defined according to the central laboratory Hb values, which were considered the reference standard for categorization, and were designed to encompass a range of Hb concentrations, specifically: 6–7 g/dL, 7.1–8 g/dL, 8.1–9 g/dL, 9.1–10 g/dL, 10.1–11 g/dL, and greater than 11 g/dL (categories 1–6). For each category, paired t-tests were used to compare the mean Hb levels measured by the BGA and laboratory methods.
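As a minimal sketch of the category assignment described above (the function name and the handling of values below 6 g/dL are our assumptions; Hb is assumed to be reported to one decimal place, so gaps such as 7.0–7.1 do not arise):

```python
def hb_category(lab_hb):
    """Map a central-laboratory Hb value (g/dL) to study categories 1-6.

    Bounds follow the text: 6-7, 7.1-8, 8.1-9, 9.1-10, 10.1-11, >11 g/dL.
    Values below 6 g/dL fall outside the defined categories (None).
    """
    if lab_hb < 6:
        return None
    for category, upper in enumerate((7, 8, 9, 10, 11), start=1):
        if lab_hb <= upper:
            return category
    return 6  # greater than 11 g/dL
```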
Additionally, to assess age-related variations, we divided the study population into three age groups: 20–40 years, 41–60 years, and over 60 years. Delays between BGA and laboratory measurements were also stratified. For each paired observation, we defined measurement delay as the difference between the start times of the two measurement processes: the time at which the bedside BGA measurement was started and the time at which the central laboratory measurement process for the paired sample began (recorded as the laboratory arrival/analysis timestamp). The non-inferiority margin for comparing measurement methods was set at 0.5 g/dL, slightly lower than previously proposed [17]. The non-inferiority margin of ±0.5 g/dL (±5 g/L) was chosen a priori on clinical grounds and is consistent with prior evaluations of point-of-care Hb devices, which have used ±0.5 g/dL and ≈±5% relative differences as clinically relevant thresholds for transfusion and anemia-management decisions [18,19]. At a cohort mean Hb of approximately 10 g/dL, ±0.5 g/dL corresponds to ±5% and is comparable to commonly accepted device performance thresholds. Because regulatory total allowable error criteria combine both bias and imprecision, we separately report and compare both the mean bias (and its confidence interval) and the Bland–Altman limits of agreement to reflect systematic and random components of disagreement. A Kruskal–Wallis test was used to analyze the effects of measurement delays between BGA and laboratory measurements on the Hb values. Linear regression was used to derive a formula for correcting the BGA values to standard laboratory results. Internal validation used patient-level grouped 10-fold cross-validation to avoid optimistic bias from repeated measures within patients; all pairs from the same patient were assigned to the same fold. As patients contributed different numbers of paired measurements, fold test-set row counts varied; therefore, we reported both the unweighted mean (±SD) across folds and the fold-weighted mean (weighted by the number of test-set pairs) as the dataset-level estimates. We additionally performed a patient-level 70/30 split and a clustered (patient) bootstrap to estimate uncertainty. Agreement between paired BGA and central laboratory Hb results was assessed using Bland–Altman analysis. For each pair, the difference was plotted against the mean of the two methods. The overall bias was calculated as the mean of these differences, and the SD of the differences was determined. Ninety-five percent limits of agreement (LoAs) were defined as bias ± 1.96 × SD and displayed in a Bland–Altman plot. Statistical analyses were performed using IBM SPSS Statistics (version 29.0.2.0 for Mac) and GraphPad Prism (version 10.0.3 (217) for Macintosh; GraphPad Software, LLC).
4. Discussion
Between April 2017 and February 2024, we retrospectively examined 21,604 paired Hb determinations obtained from 291 hemodialysis patients to compare point-of-care BGA measurements with central laboratory results. Central to the discussion is the observed mean Hb level discrepancy between BGA (10.14 g/dL) and laboratory (9.90 g/dL) methods. The derived overall mean difference of 0.24 g/dL, supported by a statistically robust non-inferiority analysis, underscores the BGA’s reliability compared to traditional laboratory methods, within a clinically acceptable variance margin.
Furthermore, the age-specific analysis showed minimal variance across different demographic segments, supporting the BGA’s consistent performance. The absence of a significant discrepancy across age groups, as indicated by the non-significant p-value (0.34), strengthens the argument for the BGA’s broader clinical utility.
Category-specific evaluations further illuminate the relative stability in Hb readings across varying Hb ranges. The slight differences noted between BGA and laboratory readings across Hb categories, and the maintained non-inferiority across these spectra, provide a granular understanding of the measurement dynamics. Leveraging this large and longitudinal dataset, we were able to characterize both the systematic bias and the LoA between the two methods across clinically relevant time delays. Our analysis demonstrated that despite a modest positive bias of approximately 0.24 g/dL in raw BGA values, the vast majority of individual differences remained within our pre-specified ±0.5 g/dL equivalence margin, even when sample analysis was delayed by up to 90 min.
The incremental increase in discrepancy over time, especially noted beyond the 30 min mark, signifies a potential temporal impact on BGA accuracy. This time-dependent variability calls for timely analysis after the blood draw to minimize potential discrepancies in clinical interpretation. Nonetheless, whether the modest discrepancies we observed might translate into clinically meaningful differences in anemia management remains to be determined. Several non-dialysis studies have demonstrated strong agreement between BGA and central laboratory Hb measurements in emergency, perioperative, and laboratory settings [20,21,22]. Recent work in COPD patients found arterial blood gas Hb measurements to differ by <0.3 g/dL from venous laboratory values, with >95% correlation [23].
However, not all investigations have reached the same conclusion. In a prospective comparison of the Roche AVL OMNI S blood gas analyzer and the hospital central laboratory results, BGA-derived Hb, hematocrit, sodium (Na+), and potassium (K+) values were systematically lower than their laboratory counterparts (p < 0.0001), and nearly 30% of all measurements exceeded the US-CLIA allowable total error limits, even though the mean biases for Hb, Na+, and K+ individually fell within the specified cut-offs [24]. Similarly, a retrospective cross-sectional study of 1927 paired samples demonstrated unacceptably wide Bland–Altman LoAs (e.g., −5.0 to +4.0 g/dL for Hb) and Cohen’s κ values ≤0.60, leading the authors to conclude that BGA and laboratory hematology results could not be used interchangeably [25].
Several factors may account for these discordant findings. First, differences in analyzer technology, including the number and selection of wavelengths used for spectrophotometry and the specific lysing agents (e.g., SLS vs. ultrasonic hemolysis), can introduce systematic measurement shifts. Second, pre-analytical variables such as the sample source (arterial versus venous), anticoagulant type, and site of blood draw (e.g., directly from the dialysis circuit versus a central vascular access) can affect the Hb concentration through localized hemolysis. Third, variations in instrument maintenance, calibration schedules, and quality control procedures between point-of-care and central laboratory settings may further widen the gap. In addition, dialysis-specific factors, such as intradialytic fluid shifts, access recirculation, and sampling delays, could affect the BGA’s accuracy. Only limited data have compared BGA versus laboratory Hb results in hemodialysis populations [26]. Our study bridges this gap by evaluating 10,802 paired Hb measurements over seven years in a large dialysis cohort. Our analysis aimed to establish a reliable correction formula that enables clinicians to predict standard laboratory Hb values from BGA measurements.
Our correction formula (Hb = BGA − 0.3), juxtaposed against the more complex regression-based equation, stands out for its clinical expediency without a significant compromise on accuracy. The mean Hb estimation via this simplified method (9.94 g/dL) closely mirrors the laboratory standard (9.90 g/dL), reinforcing its practical utility in clinical environments.
The strong correlation coefficient (0.95), alongside the minimal MAE and RMSE values for both the complex and simplified formulas, attests to the robustness of our predictive models. In addition, the internal validation demonstrates that the regression correction reduces the mean absolute error to ~0.3 g/dL and yields an RMSE of ~0.5 g/dL across a large, clustered dataset of 10,802 paired measurements. These errors are small relative to clinical decision thresholds (non-inferiority margin ±0.5 g/dL), but the Bland–Altman limits show that individual paired differences can be larger; therefore, the correction improves the average agreement but cannot eliminate occasional clinically meaningful mismatches. These statistical indicators not only validate the accuracy of the formulas but also underscore their potential to streamline clinical workflows by enabling rapid, near-accurate Hb estimations. Additionally, across all three sampling delay categories (<30 min, 30–60 min, and 60–90 min), the 90% confidence intervals for the mean differences of Hb values lay entirely within our pre-specified ±0.5 g/dL equivalence bounds. These results suggest that point-of-care BGA Hb measurements using the ABL800 Flex are clinically interchangeable with central laboratory values within this margin, even when the analysis is delayed by up to 90 min. Although current CLIA total-allowable-error criteria are stricter than our predefined equivalence margin, we chose ±0.5 g/dL (≈±5% at Hb = 10 g/dL) because it reflects previously published POCT thresholds and clinical decision limits [27]. Using the Bland–Altman parameters after regression correction (bias ≈ 0, SD ≈ 0.47 g/dL), the estimated total error would be ≈0.78 g/dL, which exceeds the strict CLIA total allowable error margin (~0.4 g/dL at Hb ≈ 10 g/dL). Thus, although the mean bias is small and within our ±0.5 g/dL equivalence margin, the combined effect of residual imprecision means the corrected BGA does not meet the most stringent regulatory total error threshold. KDIGO guidelines define Hb thresholds to guide ESA initiation and dose adjustment [28]. Given that our regression-corrected BGA measurements exhibit minimal systematic bias but a non-negligible total error (~0.8–0.9 g/dL), clinical practice should balance speed with caution: corrected BGA values may be used for routine monitoring and timely intradialytic decisions, provided individual centers accept this level of total error and implement robust POCT quality control; however, any result falling within a predefined buffer zone around KDIGO treatment thresholds, or any value that would trigger a major intervention, should be confirmed by the central laboratory before changing ESA dosing. This pragmatic approach preserves the advantage of rapid point-of-care information while minimizing the risk of inappropriate ESA adjustments due to residual analytical variability.
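The ≈0.78 g/dL estimate quoted above is consistent with the common one-sided total-error convention TE = |bias| + 1.65 × SD; the sketch below assumes this convention (the z = 1.65 factor is our assumption about how the figure was derived, not stated in the text):

```python
def total_error(bias, sd, z=1.65):
    """Westgard-style one-sided total analytical error: |bias| + z * SD."""
    return abs(bias) + z * sd

# With the post-correction Bland-Altman parameters from the text
# (bias ~ 0 g/dL, SD ~ 0.47 g/dL), this yields roughly 0.78 g/dL,
# above a strict CLIA-style allowance of ~0.4 g/dL at Hb ~ 10 g/dL.
te = total_error(0.0, 0.47)
```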
The limitations of our study include the retrospective design and single-center setting, which may limit generalizability to other BGA platforms. Additionally, intradialytic fluid shifts and sampling sites (AV fistula vs. permanent dialysis catheter) were not separately analyzed, warranting prospective evaluation. Although our dataset contains 10,802 paired Hb measurements, these originate from only 291 individual patients. Repeated measures within patients introduce within-subject correlation and could reduce the effective sample size for inferences about independent observations. We addressed this by reporting patient counts alongside test counts and by using paired analyses for method comparison. Nevertheless, we acknowledge that the large number of repeated samples from the same subjects may inflate the apparent precision, and the results should be interpreted with this limitation in mind. Additionally, we did not have a uniform laboratory hemolysis index for all samples in this retrospective dataset, so we could not systematically exclude hemolyzed specimens; hemolysis may therefore have contributed to the observed imprecision. Finally, we did not exclude implausible or erroneous Hb values in this retrospective cohort, which may have inflated the observed imprecision and contributed to the higher estimated total error. Future studies using mixed-effects models or analyses stratified by patient would further account for intra-patient correlation. Although our regression correction markedly reduced the mean error and improved precision within the cohort, external validation in independent populations is still required before routine application can be recommended.
Ultimately, the decision to rely on BGA testing for routine anemia management should be guided by local logistics and economics. In centers without on-site laboratory services, or where transport and processing delays routinely exceed the dialysis session or clinical need, POCT can materially shorten the time-to-result and support timely decisions; evidence suggests POCT may be cost-effective in these contexts [29].
In conclusion, this study validates the BGA’s reliability compared to laboratory measurements and offers a clinically viable tool for Hb estimation and anemia management in hemodialysis patients, provided the total error of ~0.8–0.9 g/dL is clinically acceptable. The alignment of BGA measurements with laboratory readings across diverse patient demographics and Hb categories reinforces its clinical validity. Moreover, the introduction of a simplified correction formula and its internal validation offer a promising avenue for enhancing clinical efficiency, allowing for real-time, accurate Hb assessments that can significantly impact patient management and outcomes. Future studies should assess the impact of BGA-guided ESA dosing on clinical outcomes and cost-effectiveness.