1. Introduction
Ultrasound has been used as a diagnostic tool in medicine for over sixty years, but its decentralisation from radiology departments and into the hands of clinicians at the patient’s bedside is a recent development, facilitated by changing technology as machines become more portable without compromising their image quality. Increased familiarity with and access to ultrasound, alongside a recognition of the limitations of conventional physical examination and chest X-ray, have encouraged clinicians to explore its use as a diagnostic tool [1], to the point that ultrasound is now as essential a tool as the stethoscope in managing patients with respiratory disease [2].
Lung ultrasound (LUS) has delivered a paradigm shift in the assessment of patients with pulmonary congestion and excess extravascular lung water (EVLW), using the presence, number, and pattern of B-lines on LUS as a surrogate marker for pulmonary oedema [3,4]. Despite published consensus guidelines on the use of point-of-care LUS [5], gaps remain in the evidence base and physiological data underpinning this clinical practice. Previous work [6,7] has used patients with end-stage renal failure, requiring haemodialysis, as a population with predictable fluid overload to explore the relationship between changing fluid status and the presence of EVLW, and how this can be seen on LUS. However, these studies have lacked a blinded LUS assessment, an evaluation of inter-rater reliability, and/or a radiologic gold-standard reference test, issues often encountered when studying diagnostic LUS [8].
This pilot study evaluated the ability of LUS to diagnose changes in EVLW over time, utilizing a stable outpatient population under close physiological observation. This study built on the existing evidence base, firstly by facilitating a direct comparison with the existing gold-standard radiologic test for lung parenchymal assessment (i.e., computed tomography), and secondly by using multiple assessors blinded to study participants’ clinical status to evaluate the consistency of LUS interpretations between different operators.
2. Methods
2.1. Study Design and Participants
This was a prospective observational cohort study, recruiting patients from a single site over a 15-month period. Adult patients (≥18 years old) were screened from normal clinical care at the outpatient haemodialysis unit by the clinicians responsible for this service. Patients were excluded if they were <18 years old, unable to provide informed consent, or either pregnant or breastfeeding. Study participants provided informed written consent before recruitment and study-related procedures.
The study was sponsored by the University of Oxford, managed through the University of Oxford Respiratory Trials Unit, and registered with ClinicalTrials.gov (NCT01949402). The protocol and its subsequent amendments were approved by the UK National Research Ethics Service (13/SC/0319). This study was supported by the Teresa Rosenbaum Golden Charitable Trust (Rosetrees Trust), who provided a grant to cover administrative and consumable costs, and by Esaote UK, who provided ultrasound equipment. The funders were not involved in the study’s design or procedures, data analysis or interpretation, or manuscript production.
2.2. Study Procedures
All study procedures were carried out during a single hospital visit that was coordinated with one of the participant’s regular haemodialysis sessions to minimise any disruption to usual clinical care. Haemodialysis was performed using a B. Braun Dialog Plus machine (B. Braun Medical, Melsungen, Germany) with a dry weight target set in advance by the participant’s responsible renal physician. Baseline demographic data and physical observations (including the volume of fluid removed by the haemodialysis machine at that timepoint) before, during, and after haemodialysis were recorded; a full blood count (including haematocrit) was also performed pre- and post dialysis.
Study participants underwent standardised LUS examination with an Esaote MyLab25 Gold machine (Esaote UK, Cambridge, UK) and an abdominal curvilinear transducer (3–6 MHz) immediately before, midway through, and immediately after haemodialysis. A modified abdominal preset was used, with all smoothing and artefact minimisation algorithms switched off prior to image acquisition, with the probe placed along the intercostal space. LUS clips (6–10 s in duration, including at least two full respiratory cycles) were acquired, with the patient sitting up at between 45 and 90 degrees according to comfort, at 10 points over each hemithorax (Supplementary Materials Figure S1) and reported in real time by the primary (bedside) operator. The same LUS video clips were remotely scored at a later stage by two independent assessors blinded to the original scoring and the participants’ clinical status. All LUS scorers were expert practitioners who use LUS regularly as part of their usual clinical practice and have previously published in this area of research. Video clips from different patients and different timepoints were presented to the LUS scorers in a random order, and they were asked to document the maximum number of B-lines seen at each of the 10 points over each hemithorax at any given time during the video clip; these were then added together in order to generate a total LUS B-line score for the patient at that timepoint. B-lines were pre-defined as echogenic, dynamic, wedge-shaped artefacts with a narrow base originating at the pleural line and extending to the distal edge of the ultrasound image, consistent with the previous literature [3,5]. All LUS scorers were asked to decide whether the overall examination findings were consistent with an interstitial syndrome as defined in the consensus guidelines [5].
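The scoring rule described above (the maximum B-line count at each of 10 points per hemithorax, summed across both sides) can be illustrated with a minimal Python sketch. This is not the study's own software; the function name, the "left"/"right" labels, and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

POINTS_PER_HEMITHORAX = 10  # 10 scanning points per side, as in the protocol


def total_b_line_score(max_b_lines: Dict[str, List[int]]) -> int:
    """Sum the maximum B-line count recorded at each scanning point.

    `max_b_lines` maps a hemithorax label ("left"/"right", hypothetical
    names) to the per-point maxima documented by one scorer for one clip set.
    """
    for side in ("left", "right"):
        counts = max_b_lines[side]
        if len(counts) != POINTS_PER_HEMITHORAX:
            raise ValueError(
                f"{side}: expected {POINTS_PER_HEMITHORAX} points, got {len(counts)}"
            )
        if any(c < 0 for c in counts):
            raise ValueError(f"{side}: B-line counts cannot be negative")
    return sum(max_b_lines["left"]) + sum(max_b_lines["right"])


# Made-up counts for one participant at one timepoint (not study data).
example = {
    "left": [0, 1, 2, 0, 3, 1, 0, 0, 2, 1],
    "right": [1, 0, 0, 2, 4, 1, 0, 1, 0, 0],
}
print(total_b_line_score(example))  # 10 (left) + 9 (right) = 19
```

Under this rule, each scorer produces one total per participant per timepoint, which is the quantity compared between raters and across dialysis.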
Study participants underwent non-contrast CT of their chest before and after haemodialysis, performed with the participant lying supine, using a 16-slice scanner (GE Lightspeed; GE Healthcare, Buckinghamshire, UK) with a slice thickness of 0.625 mm. Images were acquired during inspiratory breath hold to minimize motion artefacts while using a low-dose protocol (maximum 1.7 mSv). CT scans were reported by a consultant radiologist with a sub-specialty interest in thoracic imaging. The reporting radiologist was not blinded to the participant’s clinical condition or timing of the scan and was specifically instructed to look for changes that would be suggestive of a change in fluid status.
Study participants completed a questionnaire (Supplementary Materials File S1) relating to their symptoms pre- and post dialysis, and their experience of and satisfaction with the LUS assessment. This included a visual analogue scale (VAS) score of the pain caused by LUS examination (0–100 mm: no pain at 0 mm and worst possible pain at 100 mm); VAS (0–100 mm: no breathlessness at 0 mm and worst possible breathlessness at 100 mm) and Likert-type scores of breathlessness pre- and post dialysis; and an assessment of their willingness to undergo LUS examination again in the future.
2.3. Outcomes and Analyses
The primary outcome measure was the correlation between the change in the total LUS B-line score, taken to represent the presence and extent of EVLW, and the contemporaneous associated change in fluid status during haemodialysis. Fluid status was assessed using physiological markers, including body weight and haematocrit pre- and post dialysis; the volume of fluid removed during dialysis was measured by the haemodialysis machine.
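As a rough illustration of the primary outcome, the sketch below computes the per-participant change in total B-line score and correlates it with the volume of fluid removed, using Pearson's coefficient. It assumes a simple post-minus-pre definition of change; the variable names and all values are invented for illustration and are not study data, and the study itself performed its analyses in SPSS.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for five hypothetical participants (not study data).
pre_scores = [34, 21, 15, 40, 9]            # total B-line score before dialysis
post_scores = [12, 10, 11, 18, 7]           # total B-line score after dialysis
fluid_removed_l = [2.8, 1.9, 1.0, 3.1, 0.6]  # litres removed, per machine

# Change is defined here as post minus pre, so a fall in score is negative.
score_change = [post - pre for pre, post in zip(pre_scores, post_scores)]
r = pearson_r(score_change, fluid_removed_l)
print(f"Pearson r = {r:.2f}")
```

Pearson's coefficient only captures linear association, so a weak value does not by itself rule out a non-linear relationship between fluid removal and B-line change.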
The secondary outcome measures were (a) change in the total LUS B-line score and change in patient-reported breathlessness, measured using visual analogue and Likert-type scales; (b) measurement of patient comfort and satisfaction with the LUS examination using visual analogue and Likert-type scales; and (c) comparison of the total LUS B-line score and the categorical diagnosis of an interstitial syndrome, made using pre-defined criteria [5], ascribed to individual ultrasound scans by different operators.
SPSS version 24 was used for statistical analyses; a p-value < 0.05 was considered significant. Descriptive statistics were used to summarize patient characteristics. T-tests were used to analyse parametric data. The Pearson correlation coefficient was used to assess the linear correlation between different variables. Inter-rater reliability was calculated using kappa statistics for categorical data; for continuous variables, the intraclass correlation coefficient was calculated to estimate the degree of their association, alongside Bland–Altman analysis [9] to demonstrate individual differences between measures.
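Two of the agreement measures named above can be sketched in a few lines of Python. The example assumes Cohen's kappa for a single pair of raters (the text does not specify which kappa variant was used) and the conventional Bland–Altman limits of agreement at the mean difference ± 1.96 standard deviations. The intraclass correlation coefficient is not shown. All function names and data are hypothetical, and this is not the SPSS procedure used in the study.

```python
import statistics
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters on categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def bland_altman(
    scores_a: Sequence[float], scores_b: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement for paired scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("scores must be paired and contain at least 2 values")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up ratings (not study data): 1 = interstitial syndrome, 0 = none.
bedside = [1, 1, 0, 0, 1, 0]
blinded = [1, 0, 0, 0, 1, 0]
print(f"kappa = {cohen_kappa(bedside, blinded):.2f}")

# Made-up total B-line scores from two scorers on the same clips.
bias, lower, upper = bland_altman([30, 18, 12, 25], [22, 15, 12, 17])
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

The two measures answer different questions: kappa addresses agreement on the categorical diagnosis, whereas the Bland–Altman limits describe how far two scorers' absolute totals may differ, even when their scores are well correlated.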
4. Discussion
This study demonstrates that LUS can identify the presence of EVLW and monitor its resolution in real time, as evidenced by a reduction in the total LUS B-line scores observed during haemodialysis. We can be confident this is a genuine finding, since the change in fluid status incurred by haemodialysis was the only intervention that took place during the period of observation. This reduction in total B-line scores was observed by the primary LUS operator and also by independent blinded clinicians only given access to the scans, with no knowledge of either the patient’s physiological state or the timing of the imaging in relation to haemodialysis. The relationship between the change in the total B-line score and volume of fluid removed was not linear, suggesting that other factors have an influence on how the changes in intravascular volume status affect the extravascular fluid status of different individuals. The results are consistent with prior published data [6,7] and add to the evidence base by showing that LUS appears to outperform CT in both identifying lung changes that would be consistent with the presence of EVLW and showing the resolution of these same changes over time.
Despite the LUS protocol requiring a 20-point examination, patients still found it acceptable. Minimal pain was reported with the LUS assessment, and all participants would have LUS scans in future if it were clinically necessary. An assessment of the change in patient-reported breathlessness following haemodialysis did not demonstrate an association between the reduction in total LUS B-line scores and improved symptoms; however, half the study population were not breathless before dialysis, with 15/30 (50%) participants reporting a pre-dialysis VAS breathlessness score of 10 mm or less. Despite this, there appeared to be a signal for the reduction in total B-line score to predict an improvement in breathing that merits evaluation in a larger study of more symptomatic patients. LUS changes may precede symptoms, and its use as a screening tool in this and other populations prone to the development of EVLW should be explored in future studies.
A major strength of this study was the use of blinded scorers, since LUS findings can be vulnerable to an individual operator’s ability to acquire and interpret images. Some tests of inter-rater reliability were reassuring, with 80% agreement on the diagnosis of interstitial syndromes and at least a moderate correlation between the primary operator and blinded scorers for total LUS B-line scores across different timepoints. However, an evaluation of the absolute agreement between raters on individual LUS scores using a Bland–Altman analysis demonstrated a lack of consistency. This is an important observation, since it suggests that, whilst the broader trends observed on LUS during dialysis are common between different raters, the precise approach to scoring is unique to each individual clinician (Figure 4 and Figure S2), which points to an element of internal calibration as to what is normal (or abnormal) in an ultrasound assessment. This means that if LUS is used to monitor the change in EVLW over time, the findings may only be valid if the same assessor carries out each examination. It is worth noting that other studies [7,12] have demonstrated better inter-rater reliability for LUS examinations than in our work. One explanation for this may be our use of three scorers for each examination as opposed to two [7,12], which inevitably increases the likelihood of variations in scoring being identified. Another reason may be that we chose to look not only at the extremes of LUS examination (that is, pre- and post dialysis in this scenario) but also at the area in between, represented by the LUS examination mid-dialysis in this study. We do not, however, feel that this difference from previous work makes our findings any less valid; rather, it illustrates the subjectivity and nuance involved in what is a human process. Standardizing how LUS is used in different clinical situations [13] may offer some benefits in this regard, whilst further studies on artificial intelligence and computer-aided scoring may also help resolve this issue of human subjectivity over time [14,15]. Until then, clinicians will need to remain aware of the limitations of LUS in this specific context.
This study’s finding that LUS outperforms CT in the evaluation of EVLW is a novel one that raises several questions. The physiological and anatomical changes that underpin the development of B-lines on LUS are incompletely understood, although it was a study comparing LUS with CT that first associated their formation with increased subpleural interlobular septal thickening, secondary to either pulmonary oedema or fibrotic lung disease [3]. It has been theorized that, as lung water increases, the difference in acoustic impedance between the aerated lobular parenchyma and fluid-filled interstitium creates a highly reflective interface, producing a reverberation of ultrasound waves that appears as B-lines on LUS.
During this study, in the majority of patients who were observed to have an excess of B-lines on LUS that resolved during haemodialysis, no correlating abnormality (either initial, or evolving between pre- and post-dialysis imaging) could be identified on CT scans, despite these scans being reported by unblinded radiologists with sub-specialist expertise in thoracic imaging. This suggests a change in lung anatomy and physiology either beyond the level of the interlobular septa or the resolution of CT, but one which remains detectable on LUS. This finding will need to be replicated and evaluated in greater detail in future studies, but it appears to suggest that, in this clinical context at least, LUS may now be considered the radiologic gold standard.
The findings of this and previous studies on LUS in this patient population have potential implications for clinical practice. Chronic volume expansion in patients with end-stage renal failure may develop insidiously and is associated with increased long-term morbidity and mortality [16,17]. LUS surveillance may allow for the identification of patients with features of pulmonary congestion before they develop symptoms or physiological decompensation, and thus allow for earlier therapeutic intervention. Similar LUS techniques could be applied in the management of other conditions where fluid overload and pulmonary congestion are common; for example, patients with congestive cardiac failure [18]. In all these scenarios, well-designed and robust clinical trials are needed to demonstrate that LUS can modify hard clinical outcomes of relevance to patients and clinicians before its wider uptake can be recommended [19].
There were limitations to the design and delivery of this study. The population was recruited from a single centre with experience in the performance of clinical studies in both renal and respiratory medicine, and several patients did not have any radiologic evidence of excess EVLW prior to commencing haemodialysis. This meant that the number of cases included in the final analysis was relatively small and may have impacted our findings. Whilst the patients acted as their own controls and the changes across LUS examinations could therefore be reliably attributed to changing fluid status alone, it is uncertain what impact other co-morbidities might have; for example, how does significant left and/or right ventricular dysfunction impact on B-line resolution during haemodialysis? These questions would need to be addressed as part of a larger multi-centre study, which might also allow for an assessment of whether a more limited LUS protocol provides sufficient information to inform clinical decision making. All the scans were acquired by a single experienced LUS operator; future work should build on the robust methodology of using multiple blinded LUS scorers by having different bedside operators examine patients to ensure the reproducibility of findings at the point of acquisition.
In conclusion, these results demonstrate that LUS reliably identifies the presence of and changes in EVLW in patients with end-stage renal failure undergoing haemodialysis. The absolute scoring of B-lines on LUS scans appears to be dependent on the individual operator, whereas the assessment of change is consistent across different clinicians. LUS appears more sensitive than CT at detecting changes in EVLW and may be considered the radiologic gold standard. These observations have implications for clinical care, and further research is needed to better characterize the role that LUS might play in the management of patients on haemodialysis and patients with other conditions that result in pulmonary congestion, with the long-term aim of reducing morbidity and mortality in these different populations.