1. Introduction
Neonatal nutrition is one of the most essential pillars for the immediate survival of newborns and for their long-term neurocognitive and metabolic development. Within the ecosystem of Neonatal Intensive Care Units (NICU) and milk banks (MB), human milk (HM) is recognized as the nutritional gold standard [1,2]. However, its nature is not static; it constitutes a highly dynamic biological matrix whose biological signature reflects maternal factors, gestational age, and circadian rhythms [3,4]. Within this complex matrix, fatty acids (FA), together with neutral lipids, are critical, representing the most energy-dense component, which is necessary for neurological and retinal maturation [1,3,5].
The central problem is that lipids are the most volatile macronutrient in HM, with fluctuations that can compromise postnatal growth trajectories if not accurately quantified prior to administration [6,7]. Recent studies emphasize that the specific FA profile undergoes critical changes during milk maturation and in response to maternal metabolic status [5]. Factors such as pre-pregnancy body mass index (pBMI) and dietary intake of long-chain polyunsaturated fatty acids (LCPUFA) significantly alter the quality of the milk [8], leading to essential nutrient deficits in very-low-birth-weight infants [9]. Given this heterogeneity, using breast milk without precise quantification is insufficient to meet the requirements of preterm neonates, who depend on personalized supplementation to avoid extrauterine growth restriction (EUGR) [10,11].
The management of this variability is exacerbated by technical limitations at the point of care. Effective monitoring of caloric density is mandatory [12], but current tools are often costly and inaccessible to many health centers. In particular, there are no devices developed specifically for measuring human breast milk [13], suggesting a significant lack of solutions focused on women’s health needs. Reference methods like infrared spectroscopy [14] need specialized maintenance and large capital expenditures, which are not always accessible in rural areas. Additionally, because components like lipids and insulin vary significantly between feeding phases, the absence of standardized sampling protocols introduces bias [15]. Thus, it is critical to create quick, portable, and easily accessible solutions that optimize the alignment between clinical viability and laboratory standards [16].
In response to this need, computer vision systems (CVS) function as non-invasive optical sensors that translate spatial and intensity data into compositional metrics. This approach provides a high-resolution alternative to manual inspection by utilizing digital image processing as a primary sensing modality [17,18,19,20]. The physical principle of this sensing modality relies on the optical contrast at the interface of milk phases. The lipid-rich cream layer exhibits higher light-scattering and reflection coefficients compared to the aqueous serum, resulting in a distinct pixel intensity distribution. These distribution profiles encode the volumetric information of the sample, making quantitative regression feasible by mapping digital features to physical lipid concentration.
The rise of telemedicine and the use of AI to predict complications, such as EUGR [21], demonstrate the viability of digital solutions to optimize neonatal care [22]. In this context, the combination of a minimalist hardware design with robust processing algorithms allows for superior traceability and quality control [23]. Nutrition personalization, supported by digital tools, facilitates the adaptation of nutritional strategies to the individual needs of the most fragile users [24]. Even with aggressive care bundles that include parenteral lipid emulsions to mitigate weight loss, the transition to optimal enteral nutrition remains a critical challenge [25,26]. The development of a system capable of isolating and quantifying the cream interface in microcapillaries in an automated manner not only eliminates the subjectivity of the analog creamatocrit method but also provides a scalable and economical tool.
In parallel, recent advances in intelligent detection systems enable the integration of low-cost hardware with data-driven algorithms to perform reliable physical measurements in non-laboratory environments [27]. AI-enhanced integrated control platforms have demonstrated stable real-world implementation of sensor-based systems. Likewise, machine learning (ML) [28,29] applied to physical sensor data has enabled the automation of diagnostic tasks, replacing subjective human assessment [20]. The combination of image processing with IoT-based acquisition architectures further confirms computer vision as a scalable detection modality in real-world settings [30]. These advances reinforce the feasibility of objective, automated nutritional assessment through intelligent visual detection.
The objective of this study is twofold: first, to automate the quantification of the cream fraction using a low-cost computer vision system, eliminating the subjectivity inherent in the analog creamatocrit method; second, to validate a measurement model capable of operating with high fidelity under controlled lighting conditions. This is achieved by integrating a standardized acquisition prototype with microcapillaries, which allows the isolation and quantification of the cream interface. A central aspect of this approach is the implementation of a Rational Quadratic Gaussian Process Regression (GPR) model. This ML architecture was selected after an evaluation of 28 regression algorithms because of its superior ability to model the complex non-linear relationship between pixel-based volumetric measurements and actual lipid concentration while managing the heteroscedasticity of biological samples. The resulting system is intended as a robust, economical, and scalable tool.
The rest of the study is divided into five additional sections. Section 2 reviews the most relevant related studies. Section 3 details the Materials and Methods. The results are presented in Section 4, followed by the Discussion in Section 5 and the Conclusions with future perspectives in Section 6.
2. Related Works
The evaluation of the nutritional quality of HM has historically relied on complex laboratory analyses [14]. These have required high-cost equipment, highly specialized personnel, and prolonged processing times that prevent immediate clinical decision-making [23]. Recent studies have underlined that the price of Mid-Infrared (MIR) instruments continues to increase due to their optical complexity and that alternative methods, such as ultrasound, although more economical, demand excessive sample volumes and are susceptible to calibration errors [6]. Beyond traditional analytical chemistry, the emergence of vision-as-sensor frameworks has redefined imaging pipelines as quantitative optical transducers. In these systems, the camera serves as a non-contact probe that captures the interaction between light and the biological sample. Unlike qualitative imaging, this sensing modality relies on the metrological extraction of physical boundaries and optical density variations, where the digital sensor replaces physical gauges to provide higher spatial resolution and repeatability. Given this scenario, computer vision has emerged as a disruptive solution, enabling non-destructive analysis that simulates human inspection capacity while offering superior mathematical objectivity.
At the most advanced level of this technological transition, Deep Learning (DL) and Convolutional Neural Networks (CNN) have demonstrated the ability to transform visual features into predictive metrics with high accuracy. Dahiya et al. [31] validated the use of models such as InceptionV3 and Vision Transformers to predict yield and milk nutritional quality from images, reaching accuracies of up to 85.6%. Specifically in the human field, Jin et al. [32] developed models capable of predicting macronutrient composition and detecting the risk of low milk production using metabolic fingerprints and ML, achieving an accuracy of 87.9%. These models capture texture patterns and density variations in the cream layer that are invisible to the human eye, consolidating artificial intelligence as an emerging standard in nutritional diagnosis.
The necessity for modernizing analytical procedures in the food industry has driven the adoption of efficient, non-invasive inspection technologies. As reviewed by Baiano [33], conventional techniques for liquid and semi-liquid products are often time-consuming and unsuitable for real-time monitoring. In this context, imaging-based systems have emerged as critical sensing modalities that provide spatial and multi-constituent information regarding the physicochemical properties of products like milk and oils. This transition towards vision-as-sensor frameworks is further supported by recent advancements in the dairy industry, where non-invasive computer vision methods have been successfully deployed to predict milk quality traits, such as fat and protein content, using affordable RGB camera systems [34]. These sensing architectures demonstrate that digitally extracted features from visible spectra can achieve high correlation coefficients with laboratory ground truths, effectively acting as non-contact optical transducers. Furthermore, recent reviews on food safety emphasize that the integration of AI with smart sensors and computer vision is shifting food quality assessment from slow laboratory testing to real-time, low-cost monitoring of liquid products [35]. While these imaging-based metrology systems have matured in agricultural and industrial sectors [36], their application in the clinical environment of neonatal units for human milk analysis remains an underexplored frontier. This study adapts these proven optical sensing principles to the high-precision requirements of neonatal nutritional assessment, where the imaging pipeline is calibrated to function as a high-precision physical gauge for biological samples.
Currently, there is no methodology that integrates low-cost hardware with a hierarchical segmentation process specifically designed to automate the creamatocrit method in human milk banks. The novelty of this work lies in the proposal of an objective metric derived from the CVS that minimizes technical bias through geometric rectification and standardized optical conditioning, offering a robust and scalable alternative for resource-constrained environments.
3. Materials and Methods
This section describes the integrated framework for automated lipid quantification, encompassing the physical experimental design, the hardware deployment of the Computer Vision System (CVS), and the mathematical modeling of measurement uncertainty. The study is structured to bridge the gap between traditional analog creamatocrit methods and automated digital estimation. As shown in Figure 1, the experimental scheme follows a sequential workflow: from the standardized preparation of biological samples to the high-resolution digital acquisition and subsequent predictive analysis.
3.1. Human Milk Samples and Experimental Design
A large-scale dataset of human milk samples (n = 6400) was obtained from the Human Milk Bank of the San Antonio Abad del Cusco National Hospital (Cusco, Peru). The samples came from a diverse donor pool and were collected during a standardized relabeling process at the regional hospital. To ensure data independence and maintain clinical anonymity, each milk unit was treated as a distinct biological unit, and each unit corresponds to a single, unique digital image in the dataset. The experimental phase was structured to ensure metrological reliability through a standardized sample preparation and acquisition protocol. Individual samples were transferred to uniform glass microcapillaries (75 mm length, 1.1 mm internal diameter) and sealed with non-toxic plasticine. To achieve a clear physical separation between the lipid fraction and the serum, all microcapillaries were centrifuged at 10,000 RPM for 15 min in a dedicated clinical microcentrifuge. The study followed a randomized complete block design under controlled environmental conditions (22 ± 2 °C; 45–55% relative humidity). For the baseline analog measurement, a trained operator measured the cream height () and total height () using a precision manual caliper (1 mm resolution). For the digital workflow, the experimental setup was placed within a controlled optical enclosure using low-intensity LEDs to minimize background noise. A high-resolution 1080p Logitech camera (Logitech, Lausanne, Switzerland) was positioned in a nadir view at a distance of 12 cm from the sample holder. To generate a robust training dataset, the vertical distances between the detected interfaces (seal–serum, serum–cream, and cream–seal) were computed for the digital images and individually corrected by a trained operator. These corrected measurements act as the high-precision reference.
The subsequent ML regression models were then developed to estimate the cream fraction (c) by analyzing global pixel intensity distributions, rather than relying solely on the boundary coordinates; this fraction is used to derive fat and energy quantification through established analytical equations.
3.2. Hardware Deployment and Optical Environment
The CVS was integrated into a specialized optical enclosure designed to maintain consistent acquisition conditions and isolate the sample from ambient light. The structure was manufactured from 5.5 mm high-density polyethylene, with external dimensions of 13 cm (height) × 21 cm (length) × 22.5 cm (width), as shown in Figure 2a. To ensure optical axis stability, a mounting aperture at the top holds a 1080p high-definition camera, the Logitech StreamCam Plus (Logitech, Lausanne, Switzerland). The internal compartment is engineered for precise, repeatable placement of the microcapillary tube. Illumination is provided by constant-intensity LED strips (Philips, Amsterdam, The Netherlands), with gold light established as the optimal spectrum. A regulator built around the IRFZ44N MOSFET (Infineon Technologies, Neubiberg, Germany) was implemented; the circuit operates at 12 V, and a potentiometer allows manual brightness adjustment (see Figure 2b). A passive filtering stage suppresses noise.
A conditioning chamber with a black background and LED lighting was designed to ensure consistent optical conditions. Images are captured with a 1080p Logitech camera connected via USB, selected for the precision it affords when segmenting the capillary layers. Edge computing is then performed, and finally the results are labelled and stored (see Figure 3).
3.2.1. Light Sensitivity
A series of experimental tests was conducted to optimize the edge definition of the microcapillary and maximize the visualization of its internal content. The camera parameters were locked at an exposure time of 10 ms, a gain of ISO 100, and a white balance of 4500 K, with the focus manually fixed at the center of the microcapillary axis. Initial evaluations confirmed that ambient light generates erratic reflections that compromise measurement repeatability. A comparison of natural capture with the conditioned environment (Figure 4) validated the necessity of the black-background chamber for stable edge detection.
Subsequently, the interaction between different light spectra and the microcapillary was evaluated using blue, yellow, and gold LED sources. As depicted in Figure 5, each spectrum was tested at high and low intensity levels to identify potential sensor saturation or low-contrast regions. To quantify these responses, a statistical analysis of pixel intensity distribution was performed using 30,000 random subsamples per configuration.
3.2.2. Spectral Optimization
As summarized in Table 1 and visualized through the violin plots in Figure 6, the Gold Low spectrum demonstrated superior performance for artificial vision tasks. While blue and yellow sources exhibited higher noise floors or information loss at low levels, the Gold spectrum maintained high homogeneity and a wide Dynamic Range (DR), calculated as the intensity spread between the noise floor and the saturation limit across the RGB channels, which peaked at 226 a.u. for the Gold Low configuration. Specifically, the coincidence of
and
at 21.0 a.u. indicates a stable sensor response, while the minimum value of 3.0 a.u. facilitates a steeper intensity gradient. This enhanced contrast between the shadows and the microcapillary walls simplifies calibration of hierarchical segmentation algorithms, enabling robust distinction of physical boundaries regardless of minor ambient fluctuations.
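The statistical comparison of illumination configurations can be sketched as follows. This is a minimal illustration rather than the study's implementation: it assumes the DR is the difference between the maximum and minimum sampled intensity, the choice of median and mode as summary statistics is an assumption, and the pixel values in `demo_pixels` are hypothetical.

```python
import random
import statistics


def subsample(pixels, k=30_000, seed=0):
    """Draw k intensity values at random (with replacement) from a flat pixel list."""
    rng = random.Random(seed)
    return rng.choices(pixels, k=k)


def intensity_summary(values):
    """Summarise an intensity sample on the 0-255 scale (arbitrary units)."""
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "dynamic_range": hi - lo,  # assumed definition: max minus min
    }


# Hypothetical intensities standing in for one illumination configuration.
demo_pixels = [3, 21, 21, 21, 40, 120, 229]
summary = intensity_summary(demo_pixels)
sample = subsample(demo_pixels, k=1000, seed=42)
```

In practice the summary would be computed on the 30,000 subsampled values of each LED configuration and the configurations compared side by side.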
3.3. Mathematical Framework and Uncertainty Analysis
The human milk fat percentage (
F) and energy density (
) are not measured directly but are derived from the cream layer height (
) and the total column height (
), which includes both the cream and serum layers. According to the standardized methodology established by [
37],
F and
are calculated using the following Equations (1) and (
2). These equations serve as the universal reference for international clinical protocols and technical health standards, ensuring consistency in the nutritional assessment of donor milk across global healthcare systems [
38,
39].
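As a hedged illustration of how the cream fraction maps to energy density, the sketch below assumes the widely used linear creamatocrit relation (energy in kcal/L = 290 + 66.8 × creamatocrit in %). This assumption is not taken from the equations of this section, although it reproduces the upper energy value reported in the Conclusions (732.9 kcal/L at a cream fraction of 0.0663). The function and parameter names are hypothetical, and the fat equation is deliberately not reproduced.

```python
def cream_fraction(cream_height, total_height):
    """Dimensionless cream fraction c = cream height / total column height."""
    if total_height <= 0:
        raise ValueError("total_height must be positive")
    if not 0 <= cream_height <= total_height:
        raise ValueError("cream_height must lie between 0 and total_height")
    return cream_height / total_height


def energy_kcal_per_l(c, slope=66.8, offset=290.0):
    """Energy density from the cream fraction.

    Assumed relation: energy = offset + slope * creamatocrit(%),
    with creamatocrit(%) = 100 * c. The coefficients are an assumption.
    """
    return offset + slope * (100.0 * c)


c_demo = cream_fraction(5.0, 100.0)    # same units for both heights (mm or px)
e_upper = energy_kcal_per_l(0.0663)    # upper end of the reported clinical range
```

Because `cream_fraction` is a ratio, the two heights can be supplied in millimetres or in pixels, provided both use the same unit.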
In order to transform the Computer Vision System (CVS) from a simple image processing tool into a reliable metrological instrument, an uncertainty analysis was performed. This stage establishes the correspondence between the digital coordinate space (pixels) and the physical metric system, ensuring that the extracted heights possess physical meaning. By evaluating how resolution limits propagate through the analytical models, we can define the precision boundaries of the system.
3.3.1. Error Propagation Model
In order to propagate the measurement error, a logarithmic transformation and subsequent differentiation were applied to Equation (1) to linearize the relationship between variables. Differentiating both sides yields the relationship between the relative uncertainties. Since uncertainties are additive in the worst-case scenario, the signs are taken as positive to determine the maximum possible absolute error (
), resulting in Equation (3):
Likewise, for the energy density defined in Equation (2), the constant offset is subtracted before applying the logarithmic transformation and differentiation. The resulting absolute uncertainty for the
is expressed in Equation (4):
3.3.2. Mathematical Model for Uncertainty Analysis
By defining the cream fraction as
and substituting the original definitions into Equations (3) and (4), we obtain the final numerical models in Equations (5) and (6).
The resulting expressions in Equations (5) and (6) reveal that the measurement uncertainty is governed by the cream fraction (
c). Since the instrument resolution (
) constitutes a fixed constraint, defined either by the physical caliper with a nominal resolution of 1 mm (corresponding to an analog uncertainty of
±0.5 mm by definition of half-scale division) or by the digital pixel resolution (±1 px), and the total column height (
) remains constant for a standardized capillary volume, the propagation of error is effectively scaled by a constant coefficient. Consequently, the absolute uncertainty is minimal for samples with lower lipid content and increases linearly as the cream layer occupies a larger fraction of the tube. This mathematical behavior allows for a predictable assessment of sensor precision across the entire range of human milk compositions [
40]. In order to fully characterize the digital measurement uncertainty beyond the discrete pixel resolution, three additional factors were addressed. First, lens distortion was minimized by using a fixed-focus assembly and maintaining a perpendicular optical axis relative to the microcapillary, ensuring a linear geometric projection within the Region of Interest (ROI) [
41]. Regarding spatial calibration, the system operates directly in the raw pixel domain to maintain data integrity and avoid rounding errors associated with external metric conversions. Since the core variables are derived from the cream fraction (
c), which is a dimensionless ratio, any physical scaling factor is mathematically cancelled, rendering the pixel-based measurement intrinsically self-calibrating as long as the sensor-to-sample distance remains constant. Finally, segmentation variability was mitigated through the standardized Gold Low illumination environment, which provides high-contrast interfaces and stable intensity gradients. This controlled setup ensures that the automated detection of
and
is repeatable and that the
±1 px resolution remains the dominant and predictable limit of the system uncertainty.
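The linear growth of the uncertainty with the cream fraction can be illustrated with a short worst-case calculation. The sketch assumes that both heights are read with the same resolution; log-differentiating the ratio c = (cream height)/(total height) and taking all signs as positive then gives an absolute uncertainty of (resolution / total height) × (1 + c). The column heights used below are hypothetical.

```python
def cream_fraction_uncertainty(cream_height, total_height, resolution):
    """Worst-case absolute uncertainty of c = cream_height / total_height.

    Assumes both heights share the same reading resolution, so that
    delta_c = c * (res / cream_height + res / total_height)
            = (res / total_height) * (1 + c).
    """
    if cream_height <= 0 or total_height <= 0:
        raise ValueError("heights must be positive")
    c = cream_height / total_height
    return (resolution / total_height) * (1.0 + c)


# Hypothetical analog reading: 3 mm of cream in a 60 mm column, +/-0.5 mm.
analog = cream_fraction_uncertainty(3.0, 60.0, 0.5)

# Hypothetical digital reading: 45 px of cream in a 900 px column, +/-1 px.
digital = cream_fraction_uncertainty(45.0, 900.0, 1.0)
```

For the same cream fraction, the prefactor resolution / total height determines how much a finer reading resolution lowers the uncertainty, while the (1 + c) term accounts for the linear increase with cream content.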
3.4. Computer Vision Pipeline
The algorithm, developed in Python 3.10, processes images at their native resolution to avoid information loss from interpolation. The operational flow (see Figure 7) is based on segmenting raw units (pixels) to calculate the dimensions of the phases in the microcapillary.
A semiautomated labeling protocol was implemented to ensure the highest metrological standards during the training phase. While an initial segmentation algorithm was developed to estimate and , each of the 6400 images captured underwent rigorous manual supervision and refinement. An expert reviewed and adjusted the detected boundaries in each sample to eliminate any residual errors caused by optical artifacts or meniscus distortion. This carefully selected dataset served as the high-precision ground truth (target). Consequently, the ML models were trained not only to replicate the initial segmentation but also to understand the complex relationship between global pixel distributions and achieve a level of robustness that surpasses simple automated edge detection.
3.4.1. Hierarchical Segmentation and ROI Definition
From the original image
, the conversion to greyscale
is performed. To isolate the microcapillary, global binarization is applied as described in Equation (7).
where
because it is sufficient to distinguish the microcapillary from the background. This value is determined empirically and standardized by Gold Low illumination environment. The contours (
) are extracted, and the geometry of the area (
A) is validated using the shoelace formula derived from Green’s Theorem:
On the other hand, the general Region of Interest (
) is defined by a bounding rectangle:
The system divides the
using the midpoint
. An intensity threshold
is applied to isolate the seal (
) and the cream (
). The lower boundary of the seal (
) is defined as:
where
is the origin of the capillary,
is the relative displacement, and
is the height of the seal. The upper limit of the cream (
) is located in the second subregion as described in Equation (11) where
is the relative displacement of the cream:
where
represents the displacement of the cream layer relative to the start of the second subregion. This spatial discrimination allows the analysis of the serum phase to be confined to a third region of interest (
) defined by Equation (12).
A grey range
is applied to extract the dominant contour of the serum. The final heights of the serum (
) and cream (
) in pixels are obtained by adjusting a rectangle of minimum area:
where
w is the width and
h is the height. Finally, based on these values, the percentage of cream, the percentage of
F, and the total
are calculated using analytical expressions aligned with the technical guidelines established in [
39] and based on the formulations proposed by [
37].
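The geometric steps of this subsection (global binarization, contour area by the shoelace formula, and a bounding rectangle for the region of interest) can be sketched in plain Python. This is an illustration under assumptions, not the pipeline used in the study: the threshold value is a hypothetical parameter, and the bounding rectangle is computed geometrically (maximum minus minimum), which differs by one pixel from inclusive pixel-count conventions.

```python
def binarize(gray, threshold):
    """Global binarization: 1 where intensity exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


def shoelace_area(points):
    """Area of a simple polygon given as ordered (x, y) vertices."""
    n = len(points)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def bounding_rect(points):
    """Axis-aligned bounding rectangle as (x, y, width, height)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x, max(ys) - y


# Hypothetical contour: a 4 x 10 rectangle standing in for the capillary outline.
contour = [(0, 0), (4, 0), (4, 10), (0, 10)]
area = shoelace_area(contour)
roi = bounding_rect(contour)
mask = binarize([[10, 200], [90, 30]], threshold=50)
```

The midpoint of the bounding rectangle's height would then provide the split into the two subregions described above.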
3.4.2. Adaptive Cropping
To ensure that color analysis is restricted to the biological content of the sample, an adaptive cropping algorithm is implemented. The adaptive cropping is performed only along the vertical axis (
y). Once the general Region of Interest (
) has been defined, the final vertical limits of the sample (see Equations (14) and (15)) are computed by detecting intensity transitions corresponding to the lower boundary of the seal and the lower boundary of the cream layer, respectively.
where
denotes the vertical intensity gradient. No additional offset is applied; therefore, the cropping limits coincide with the detected physical interfaces. The resulting cropped image
is defined as the subset of pixels containing exclusively the serum and cream phases, effectively removing external artefacts and non-biological regions.
The horizontal extent of the crop is defined by the full width of the microcapillary ( to ), ensuring that all columns of the sample are included in the analysis.
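A minimal sketch of gradient-based vertical cropping is given below. It assumes that the interfaces appear as the largest jumps in the row-averaged intensity profile, which is a simplification of the criteria in Equations (14) and (15); the function names and the synthetic image are hypothetical.

```python
def row_means(gray):
    """Mean intensity of each image row (vertical intensity profile)."""
    return [sum(row) / len(row) for row in gray]


def vertical_gradient(profile):
    """Forward difference of the profile along the vertical axis."""
    return [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]


def strongest_transitions(gray, count=2):
    """Row indices just after the `count` largest absolute intensity jumps."""
    grad = vertical_gradient(row_means(gray))
    ranked = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return sorted(i + 1 for i in ranked[:count])


def crop_rows(gray, y_top, y_bottom):
    """Keep rows y_top .. y_bottom - 1; columns are left untouched."""
    return gray[y_top:y_bottom]


# Synthetic column: seal (200), serum (80), cream (150), end seal (10).
demo = [[200] * 4] * 2 + [[80] * 4] * 3 + [[150] * 4] * 2 + [[10] * 4]
y_top, y_bottom = strongest_transitions(demo, count=2)
cropped = crop_rows(demo, y_top, y_bottom)
```

In this synthetic example the crop retains only the serum and cream rows, consistent with the stated aim of excluding the seals and other non-biological regions.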
3.4.3. Multichannel Statistical Feature Extraction
After isolating the sample, intensity-domain feature extraction is performed to feed the regression models. For each cropped image
, the RGB color channels are separated, and the luminance (Gray Scale) channel is computed. The feature vector
X is constructed by computing frequency histograms for each channel
, as shown in Equation (16).
where
is the Kronecker delta function, and
i represents the intensity level. The final feature space is defined by concatenating these histograms. Using this procedure, a structured dataset of feature vectors was generated, where each row corresponds to a single sample image and each column represents a specific feature. This organized dataset was then ready for the application of ML algorithms for regression and predictive analysis.
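The histogram-based feature vector can be sketched as follows, assuming 8-bit channels with 256 intensity levels each, so that the three RGB channels yield 3 × 256 = 768 features (consistent with the 768 RGB feature set mentioned in the Discussion). The luminance channel is omitted here because its weighting is not specified; the input format (a list of RGB tuples) is an assumption.

```python
def channel_histogram(values, levels=256):
    """Frequency histogram: count of pixels at each intensity level."""
    hist = [0] * levels
    for value in values:
        hist[value] += 1
    return hist


def rgb_histogram_features(pixels, levels=256):
    """Concatenate the R, G, and B histograms into one feature vector.

    `pixels` is an iterable of (r, g, b) integer tuples in 0..levels-1.
    """
    pixels = list(pixels)
    features = []
    for channel in range(3):
        features.extend(channel_histogram([p[channel] for p in pixels], levels))
    return features


# Hypothetical three-pixel crop.
demo_crop = [(0, 0, 0), (255, 0, 10), (255, 0, 10)]
x = rgb_histogram_features(demo_crop)
```

Each cropped image contributes one such vector, so stacking the vectors row by row yields the structured dataset described above.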
3.5. Predictive Modeling via ML Regression Algorithms
To identify the algorithm with the highest predictive capacity for estimating the cream fraction (c), 28 ML regression models were evaluated. These models were grouped into eight main families: linear models (EL), ensemble trees (ESB), Gaussian process regressors (GPR), kernel-based methods (KN/SVM), linear regression models (LR), neural networks (NN), and decision trees (Tree). Table 2 provides the specific configurations and hyperparameters used for each architecture. The total set of images was divided into 65% for training, 25% for validation, and the remaining 10% for testing. Each model was evaluated under identical computational conditions. The selection criterion for the optimal model was the maximization of the coefficient of determination ($R^2$).
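The data partition and the selection metric can be illustrated with a short sketch. It assumes a simple shuffled split by index (the study does not state how the partition was drawn) and uses the standard definition of the coefficient of determination; the seed and function names are hypothetical.

```python
import random


def split_indices(n, fractions=(0.65, 0.25, 0.10), seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train, val, test = split_indices(6400)
```

With 6400 images, this partition gives 4160 training, 1600 validation, and 640 test samples.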
5. Discussion
The transition from manual inspection to the proposed CVS represents a fundamental shift in measurement principles, as synthesized in the cross-method evaluation in Table 4. Electrochemical sensors [13,16] rely on ion-selective redox reactions, which offer high chemical sensitivity but are susceptible to thermal noise and electrode fouling, whereas the CVS employs a non-invasive optical reflectance-based methodology. Unlike electrochemical methods that require periodic reagent calibration, the CVS maintains long-term stability through fixed hardware optimization. The 57.5% relative error improvement should be interpreted as a metrological milestone in interface detection, directly resulting from the peak performance of the Rational Quadratic GPR model using the 768 RGB feature set (). This model has a training time of 573.4 s, a prediction speed of 2807.9 obs/s, and a model size of 33.9 MB. The computational feasibility was validated on a portable workstation with an Intel Core i7-1255U processor and 12 GB of RAM, where the total processing time per sample remained consistently under 1.5 s.
In manual clinical practice, factors such as the parallax effect and subjective boundary identification lead to a mean relative error (MRE) of 9.52%. The CVS eliminates this variance by applying a probabilistic sub-pixel analysis to the chromatic signatures of the milk phases. This improvement is not merely a statistical gain but a reduction in the noise floor of the creamatocrit method, effectively shifting the measurement from a human-centered estimation to a standardized digital quantification. The superior performance of the Rational Quadratic GPR model is rooted in its kernel structure, which is well suited to the physical nature of milk samples. Unlike linear kernels or the rigid architectures of CNNs [31] that may overfit small clinical datasets, the Rational Quadratic kernel acts as a scale mixture of RBF kernels. This allows the model to handle both small-scale chromatic fluctuations and larger-scale trends in the cream fraction c simultaneously.
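The scale-mixture property can be made concrete with the standard form of the two kernels. The sketch below is illustrative only: the hyperparameter values are hypothetical and are not those fitted in this study.

```python
import math


def rbf_kernel(r, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel as a function of the distance r."""
    return variance * math.exp(-(r * r) / (2.0 * length_scale ** 2))


def rational_quadratic_kernel(r, length_scale=1.0, alpha=1.0, variance=1.0):
    """Rational Quadratic kernel.

    Equivalent to an infinite mixture of RBF kernels with different
    length scales; alpha controls the relative weight of those scales.
    """
    base = 1.0 + (r * r) / (2.0 * alpha * length_scale ** 2)
    return variance * base ** (-alpha)


k_rq_small_alpha = rational_quadratic_kernel(3.0, alpha=1.0)
k_rq_large_alpha = rational_quadratic_kernel(1.5, alpha=1e6)
k_rbf = rbf_kernel(1.5)
```

As alpha grows, the Rational Quadratic kernel converges to the RBF kernel, while small values of alpha give heavier tails, so that distant samples retain some correlation. This is the mechanism that lets a single kernel represent both short-range and long-range structure.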
The proposed CVS offers a low-cost, automated alternative for Human Milk Banks in resource-limited settings and low-income countries. By replacing manual inspection with a digital scan, the system eliminates inter-operator variability and parallax errors. This allows non-specialized clinical staff to perform precise, real-time nutritional fortification for infants, providing a scalable, standardized tool where expensive infrared analyzers are not feasible.
6. Conclusions
This research developed an optimized CVS for the automated estimation of the cream fraction, which is subsequently used to quantify fat and energy in human milk through established analytical equations. The integration of hardware-level optimization and ML demonstrates a measurable advancement in metrological reliability for neonatal care, rather than a mere incremental improvement. This progress is characterized by two distinct technical contributions: first, the hardware optimization and illumination control effectively lowered the measurement noise floor, enabling a 57.5% reduction in final uncertainty; second, the Rational Quadratic GPR model provided the necessary sub-pixel precision and stability () to maintain a negligible bias of 0.06% across the clinical range.
Despite these advancements, certain limitations must be acknowledged. While the CVS eliminates analog errors such as parallax, its performance remains dependent on the physical quality of the capillary filling and a controlled acquisition environment. Furthermore, the current validation is focused on a specific clinical range, with observed cream fraction values between 0.0429 and 0.0663, which correspond to energy density levels between 576.8 and 732.9 kcal/L. These values correspond to the typical normative range for mature human milk; consequently, further studies are required to ensure the system’s generalization across extreme concentrations, such as those found in colostrum (potentially higher) or in cases of severe maternal malnutrition (potentially lower).
Future work will focus on two key areas. First, we will prioritize scalability and mobile integration by porting the CVS to smartphone platforms [28], enabling home-based nutritional monitoring for lactating individuals. Second, the target biomarker database will be expanded to include the quantification of total proteins and lactose levels [13]. In order to ensure model robustness under real-world conditions, future validation will include non-normative milk samples, considering diverse maternal profiles, seasonal variations, and extreme fat concentrations. Additionally, multimodal sensor fusion and advanced preprocessing techniques will be explored [42] to isolate biological responses from environmental effects. These advancements will not only enhance individual neonatal care but also provide a scalable tool for large-scale epidemiological surveillance of maternal-infant nutritional status, ensuring global clinical applicability even in the most remote regions.