SafeBladder: Development and Validation of a Non-Invasive Wearable Device for Neurogenic Bladder Volume Monitoring
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In section 2.4, it is mentioned that the prototype validation was performed under controlled conditions. What conditions were used? It should be considered that, inside the human body, conditions depend on the patient's health status.
In section 2.1 of the prototype development, the components of an optical system are mentioned; a specification table of the sensor characteristics could be added, considering advantages and limitations.
In section 2.5 it is mentioned that the data were processed with machine learning models; could you explain the model or add a dedicated section? What the authors present is not sufficient, and the implemented model is not described.
Figures 4 and 5 present the implemented in vitro experiment. What environmental conditions were used? How do these experiments relate to real-world conditions? How did the authors determine that the experiment should be implemented in this way?
The authors mention that one of the experiments was performed with water. Why was a liquid similar to urine not chosen? Could the color affect the measurement?
In the discussion section, the authors mention that the system's sensitivity to ambient light confirms that its optimal performance is achieved when used under clothing, creating a dark environment that significantly improves measurement accuracy. How is this improvement achieved? The figures do not show the implementation under clothing.
Comments on the Quality of English Language
Review the English throughout the document; this would help produce a better final work.
Author Response
Comments 1: In section 2.4, it is mentioned that the prototype validation was performed under controlled conditions. What conditions were used? It should be considered that, inside the human body, conditions depend on the patient's health status.
Response 1: We agree and now explicitly state the ambient light, temperature/humidity, and mechanical-coupling conditions. We also clarify how these differ from in vivo conditions and why they are appropriate for a first-pass optical validation.
“To validate the prototype, tests were performed under controlled conditions: all in vitro trials took place in a laboratory at 22 ± 1 °C and typical relative humidity of 45–60%. Two lighting regimes were used: "dark" (< 2 lux; lights off, blackout curtains) and "light" (300–600 lux ambient). The sensor face was kept in full contact with the phantom/container (no gaps), and belt tension was kept constant by using the same notch across steps. These controls isolate optical effects (absorption/scattering) from confounding factors such as stray NIR, placement drift, and temperature-dependent electronics. In vivo variability (e.g., tissue thickness, perfusion, hydration) is greater than in these phantoms; the aim of this study is to demonstrate feasibility and to characterize domain shifts that can inform per-user calibration in future clinical applications.”
Comments 2: In section 2.1 of the prototype development, the components of an optical system are mentioned; a specification table of the sensor characteristics could be added, considering advantages and limitations.
Response 2: The authors thank the reviewer for this insightful comment, which helped improve the manuscript. A concise specification table has been added summarizing the design-relevant properties and trade-offs of the optical system, including wavelengths, detectors, front-end electronics, ADC characteristics, and sampling rates. The table highlights the advantages and limitations of each component to provide a clear overview of the prototype design.
Comments 3: In section 2.5 it is mentioned that the data were processed with machine learning models; could you explain the model or add a dedicated section? What the authors present is not sufficient, and the implemented model is not described.
Response 3: The authors thank the reviewer for highlighting this point and agree that the original description was insufficient. A new subsection has been added detailing data preprocessing, feature selection, model families, hyperparameter grids, cross-validation protocol, and measures to prevent data leakage. This addition provides a complete and transparent description of the implemented machine learning models and their evaluation.
“2.6. Modeling Pipeline and Hyperparameters
Channel means (A0–A5) were standardized using only the training-set mean (μ) and standard deviation (σ). The primary feature set consisted of the six standardized channel means. In a sensitivity analysis, engineered features such as pairwise ratios between primary/secondary channels and first differences across volume steps were evaluated, but they did not materially improve performance, so the simpler feature set was retained.
The models tested included Linear Regression, Support Vector Regression (SVR) with a radial basis function (RBF) kernel, and a Random Forest Regressor (RF). SVR was intended to capture non-linear relationships while tolerating noise and outliers, whereas RF, as an ensemble of decision trees, offered robustness to non-linear patterns and reduced overfitting. Hyperparameters were optimized via grid search combined with 5-fold cross-validation, stratified by phantom type, container, and volume step, using a random seed of 42. The SVR grid explored C ∈ {1, 10, 100}, ϵ ∈ {0.1, 1.0}, and γ ∈ {scale, 0.1, 0.01}, while the RF grid tested n_estimators ∈ {100, 300, 600}, max_depth ∈ {None, 5, 10}, and min_samples_leaf ∈ {1, 3, 5}.
Data were split at the window level into 70/15/15 train/validation/test sets, stratified as above, with no human-subject data included. To assess domain shift, models were trained on non-tissue scenarios and tested on held-out tissue scenarios only. Final models were refit on the combined training and validation sets and evaluated on the held-out test set. Model performance was quantified using the Mean Absolute Error (MAE) and Coefficient of Determination (R²), reported as the mean ± standard deviation across cross-validation folds and for the independent test set.”
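For reproducibility, a minimal scikit-learn sketch of the pipeline described above follows (illustrative only: placeholder data are used, the separate 15% validation split is folded into cross-validation here, and the plain KFold omits the scenario stratification described in the text).

```python
# Minimal sketch of the described pipeline; not the authors' actual code.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# X: (n_windows, 6) channel means A0-A5; y: volumes in mL (placeholders here).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = rng.uniform(0, 500, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

# StandardScaler inside the Pipeline ensures mu/sigma come from training folds only.
svr = Pipeline([("scale", StandardScaler()), ("reg", SVR(kernel="rbf"))])
svr_grid = {"reg__C": [1, 10, 100],
            "reg__epsilon": [0.1, 1.0],
            "reg__gamma": ["scale", 0.1, 0.01]}

rf = RandomForestRegressor(random_state=42)
rf_grid = {"n_estimators": [100, 300, 600],
           "max_depth": [None, 5, 10],
           "min_samples_leaf": [1, 3, 5]}

cv = KFold(n_splits=5, shuffle=True, random_state=42)
for est, grid in [(svr, svr_grid), (rf, rf_grid)]:
    search = GridSearchCV(est, grid, cv=cv, scoring="neg_mean_absolute_error")
    search.fit(X_train, y_train)
    y_pred = search.best_estimator_.predict(X_test)
    print(f"MAE = {mean_absolute_error(y_test, y_pred):.1f} mL, "
          f"R2 = {r2_score(y_test, y_pred):.2f}")
```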
Comments 4: Figures 4 and 5 present the implemented in vitro experiment. What environmental conditions were used? How do these experiments relate to real-world conditions? How did the authors determine that the experiment should be implemented in this way?
Response 4: The authors thank the reviewer for this insightful comment. The manuscript now specifies the exact environmental conditions, including light and temperature controls (see Response 1), and provides a justification for the experimental design. A canonical NIRS phantom approach was followed, beginning with controlled homogeneous media (plastic/glass containers, silicone), followed by heterogeneous tissue layers (pork belly), prior to eventual in vivo studies. An explicit paragraph explaining the design rationale has been added to clarify this progression.
“Design rationale
A staged NIRS validation was implemented: (i) containers (plastic/glass) to isolate interface and refractive-index effects; (ii) homogeneous silicone (1 cm) to approximate a simple attenuating slab; and (iii) layered pork belly (∼2 cm) to introduce realistic heterogeneity. This progression is consistent with biomedical optics phantom practice and reduces risk for subsequent in vivo studies by quantifying domain shifts at each step [9].”
Comments 5: The authors mention that one of the experiments was performed with water. Why was a liquid similar to urine not chosen? Could the color affect the measurement?
Response 5: The authors thank the reviewer for this important observation. In the selected wavelength range (850–940 nm), water absorption dominates, while urine chromophores responsible for color primarily absorb in the visible band; therefore, color is expected to have limited impact on NIR measurements. Nonetheless, turbidity or hematuria could modestly increase scattering. This explanation has been added to the manuscript, with the effect flagged as a limitation, and a planned follow-up control using a urine simulant has been noted.
“In the 850–940 nm range, water is the dominant absorber and was therefore selected as the fill liquid. Urine chromophores that impart color absorb mainly in the visible band, while turbidity (e.g., cells) may nonetheless increase scattering. A follow-up bench test with a urine simulant (saline + urea) and low-concentration dye is planned to quantify any residual effects.”
Comments 6: In the discussion section, the authors mention that the system's sensitivity to ambient light confirms that its optimal performance is achieved when used under clothing, creating a dark environment that significantly improves measurement accuracy. How is this improvement achieved? The figures do not show the implementation under clothing.
Response 6: The authors thank the reviewer for this comment. The manuscript now clarifies the mechanism by which performance improves under clothing, namely through rejection of ambient NIR light, and links this explanation to the existing light versus dark experimental results.
“The results demonstrate the feasibility and potential of the SafeBladder concept. The "under clothing" recommendation arises directly from the light versus dark comparison (Figure 6), as garments form a low-lux cavity over the sensor, suppressing stray NIR and stabilizing the DC baseline.”
Reviewer 2 Report
Comments and Suggestions for Authors
This paper introduces SafeBladder, a wearable device designed to non-invasively track bladder volume in people with neurogenic bladder. It uses near-infrared spectroscopy paired with machine learning to make its estimates. The prototype combines LEDs and photodetectors to measure how light passes through abdominal tissue, with initial testing done on water-filled containers and biological phantoms. Among the machine learning approaches tried, Random Forest came out on top. The results are promising, but there are a few areas that could use more detail:
- The device uses 850 nm and 940 nm LEDs because they’re readily available, but the water absorption peak is at 975 nm. How was the balance between wavelength sensitivity and hardware availability decided? Were any other wavelengths tried?
- The authors mention that adjusting the LED/photodiode angles improved signal quality. Was this based on trial-and-error testing or on simulations? Are there any numbers on how much it helped?
- Silicone and pork belly were used as stand-ins for human tissue, but their optical properties aren’t the same. How did you measure their properties, and how close are they to real abdominal tissue?
- How did you deal with outliers or sensor drift in the data? Was wavelet denoising or Kalman filtering considered?
- Wearable devices often pick up motion noise from walking or bending. Did you run any tests with movement? Is there an accelerometer for motion correction?
- The mean absolute error is 25 mL; how does that stack up against portable ultrasound devices?
- Could other sensing methods be added to improve accuracy, especially in patients with more varied tissue types?
- How will the device handle unusual cases and trigger alerts for patients or clinicians?
Author Response
Comments 1: The device uses 850 nm and 940 nm LEDs because they’re readily available, but the water absorption peak is at 975 nm. How was the balance between wavelength sensitivity and hardware availability decided? Were any other wavelengths tried?
Response 1: Water shows a local absorption band near ~970–980 nm in the NIR window. We chose 850/940 nm as a trade-off between sensitivity and cost/availability: 850 nm penetrates deeper (lower absorption), while 940 nm trends toward higher water absorption, yielding useful spectral contrast. Suitable LEDs and drivers at ~970–980 nm within our size/power budget were less available; future iterations may evaluate that band.
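For illustration, a rough absorption-only Beer–Lambert estimate (approximate pure-water coefficients after Hale & Querry, 1973; scattering neglected; a 1 cm path assumed) suggests that 940 nm already captures much of the water contrast available at the ~975 nm peak:

```python
# Absorption-only Beer-Lambert sketch; mu_a values are approximate pure-water
# coefficients (Hale & Querry, 1973), rounded; scattering is ignored.
import math

mu_a = {"850 nm": 0.043, "940 nm": 0.27, "975 nm": 0.45}  # cm^-1 (approximate)
d = 1.0  # illustrative path length in cm of water

for wavelength, mu in mu_a.items():
    T = math.exp(-mu * d)  # transmitted fraction through d cm of water
    print(f"{wavelength}: T = {T:.2f}")
# -> 850 nm: T = 0.96, 940 nm: T = 0.76, 975 nm: T = 0.64
# The 850/940 nm pair therefore yields strong differential water sensitivity
# with commodity LEDs, at modest cost versus the 975 nm peak.
```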
Comments 2: The authors mention that adjusting the LED/photodiode angles improved signal quality. Was this based on trial-and-error testing or on simulations? Are there any numbers on how much it helped?
Response 2: This was an empirical optimization using 3D-printed housings of varying inclination (no ray-tracing simulation). On dark/plastic bench tests, angled seats increased the mean captured signal by ~15–20% versus flat seats and reduced inter-channel variance. We frame this as a prototype-stage observation and plan optical modeling in future work.
Comments 3: Silicone and pork belly were used as stand-ins for human tissue, but their optical properties aren’t the same. How did you measure their properties, and how close are they to real abdominal tissue?
Response 3: We did not measure μa/μs′ directly in this study. Silicone serves as a homogeneous attenuator; pork belly approximates layered skin–fat–muscle with NIR coefficients reported within roughly an order of magnitude of abdominal tissues. We flag this as a limitation and cite phantom-design literature; future work will include calibrated phantoms or integrating-sphere measurements.
Comments 4: How did you deal with outliers or sensor drift in the data? Was wavelet denoising or Kalman filtering considered?
Response 4: We mitigated noise by averaging 500 samples per channel per step (5 s) and standardizing features using training statistics only. Drift was minimal under bench controls; we did not apply wavelet denoising or Kalman filtering at this stage. Future in vivo versions will evaluate adaptive filtering and IMU-assisted correction.
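For concreteness, a minimal sketch of the averaging and train-only standardization described above (array shapes and function names are hypothetical illustrations):

```python
# Hypothetical preprocessing sketch: per-step averaging plus standardization
# with statistics taken from the training split only (no data leakage).
import numpy as np

def window_means(raw: np.ndarray) -> np.ndarray:
    """raw: (n_steps, 500, 6) ADC samples -> (n_steps, 6) channel means.
    Averaging 500 samples attenuates zero-mean noise by ~1/sqrt(500)."""
    return raw.mean(axis=1)

def standardize(train: np.ndarray, test: np.ndarray):
    """Compute mu/sigma on the training set only, then apply to both splits."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sigma, (test - mu) / sigma
```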
Comments 5: Wearable devices often pick up motion noise from walking or bending. Did you run any tests with movement? Is there an accelerometer for motion correction?
Response 5: This prototype includes no IMU, and no motion trials were run; all acquisitions were static to isolate the optics. We now explicitly note the plan to add a low-power accelerometer for motion-artifact detection and compensation.
Comments 6: The mean absolute error is 25 mL; how does that stack up against portable ultrasound devices?
Response 6: The bench-top phantom MAE of 25 mL is on the same order as clinical reports for portable bladder ultrasound (often ~15–30 mL error, context-dependent). We caution that ultrasound results are in vivo, whereas our results are phantom-based; clinical validation will follow.
Comments 7: Could other sensing methods be added to improve accuracy, especially in patients with more varied tissue types?
Response 7: We added outlook notes on hybrid sensing (e.g., bioimpedance, capacitive coupling, or low-power ultrasound) to improve robustness in higher-BMI or atypical tissue profiles.
Comments 8: How will the device handle unusual cases and trigger alerts for patients or clinicians?
Response 8: We describe planned per-user calibration, adaptive thresholds with hysteresis, optional rate-of-change checks, and BLE app alerts. Safety fallbacks are noted (e.g., an inability to produce a reliable estimate prompts verification by ultrasound).
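As an illustration only (thresholds and names are hypothetical placeholders, not a specified design), the hysteresis and rate-of-change logic could take this shape:

```python
# Hypothetical alert logic: hysteresis plus a rate-of-change plausibility check.
# Thresholds (400/350 mL, 50 mL/min) are illustrative placeholders only.
from typing import Optional, Tuple

def update_alert(volume_ml: float, alerting: bool,
                 rate_ml_per_min: Optional[float] = None,
                 upper: float = 400.0, lower: float = 350.0,
                 max_rate: float = 50.0) -> Tuple[bool, Optional[str]]:
    """Return (alerting, message). The alert latches above `upper` and clears
    only below `lower`; an implausible rate flags the estimate instead."""
    if rate_ml_per_min is not None and abs(rate_ml_per_min) > max_rate:
        return alerting, "Estimate unreliable - please verify with ultrasound"
    if not alerting and volume_ml >= upper:
        return True, "Bladder approaching capacity - consider voiding"
    if alerting and volume_ml <= lower:
        return False, None  # hysteresis prevents alert chatter near threshold
    return alerting, None
```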

