1. Introduction
Breast milk is a complex and highly variable fluid that provides nutrients and bioactive components for the correct growth and development of infants. The composition of breast milk changes throughout the lactation period, adapting to the nutritional needs of the rapidly growing newborn. Depending on the time of lactation, three types of milk can be distinguished: colostrum, transitional milk, and mature milk.
Breastfeeding is necessary for the development of newborns because it protects them from infections and diseases such as diabetes, obesity, and hypercholesterolemia. Due to the multiple benefits of breast milk for infants, milk banks were established in the middle of the 20th century. Currently, the World Health Organization and national and international pediatric associations consider breast milk banks necessary to guarantee adequate nutrition for premature infants (born before 32 weeks of gestation or weighing less than 1500 g) or sick infants who, for whatever reason, cannot be fed with their mother’s milk.
Milk banks are a specialized service integrated into the neonatology units of hospitals. Their objectives are the promotion and support of breastfeeding; they are responsible for the selection of donors, and the storage, processing, analysis, and distribution of milk. To guarantee the safety of donated milk, it is subjected to a pasteurization process using the Holder method (62.5 °C for 30 min and then rapidly cooled in less than 15 min to 4 °C). In neonatal units, human breast milk contains 70 kcal/100 mL because it was obtained from women of term babies later in the lactation period. However, the amount of macronutrients in breast milk depends on different factors such as gestational age, feed, or sampling procedure [
1] and no predictions can be established [
2]. As such, when comparing preterm and term milk, the former one has less energy and protein than the latter. Considering these issues, one major concern when feeding neonates with breast milk from hospital banks is the nutritional adequacy of the milk in meeting the nutritional requirements of the neonate, because some nutrients such as protein are associated with neurodevelopmental outcomes [
3].
To determine the quality of breast milk, nutritional analysis should be carried out. Nowadays, milk quality is controlled in laboratories using properly validated chemical reference methodologies such as the Mojonnier, Folch, Gerber, or Roese-Gottlieb methods for fat quantification, and the Bradford method for protein elemental analysis. Another alternative used in some milk banks involves quantifying these compounds with milk analyzers based on precalibrated mid- or near-infrared (IR) instruments. To assess the quality of the results obtained by the IR technique, Fusch et al. compared eight different laboratory IR analyzers for quantifying macronutrients in breast milk. They concluded that research groups using these devices must be cautious about their measurements, and that adequate sample preparation, instrumental calibration, and validation following Good Laboratory and Clinical Practices are necessary [
4].
Given these considerations, the best method of guaranteeing the adequate nutrition of newborns is the use of onsite and real-time sensors that are able to quantify the macronutrients in breast milk. Moreover, due to the value and short supply of breast milk, a noninvasive analysis would be the best way to analyze the final sample, and it can be carried out just before consumption. Among the portable, noninvasive analytical techniques for food analysis, near-infrared spectroscopy (NIRS) is a real-time, noncontaminating, and versatile technology capable of providing information on food quality attributes in situ. Moreover, no sample pretreatment or chemical reagents are required for the analysis, making it a waste-free technique, unlike traditional laboratory nutritional compositional analyses [5,6,7]. Nevertheless, NIRS analysis has some important limitations due to the strong absorption of water in the NIR region, which increases the background and can prevent satisfactory results in quantitative analysis. Nowadays, the applications of NIRS have increased due to the development of aquaphotomics [8], a novel science and methodology that uses the NIR spectra of water in aqueous and biological systems for the indirect analysis of components.
The drawbacks of using NIRS with portable instruments are the sensitivity of the NIR signal, the narrow wavelength range of low-cost devices, and spectrum complexity. NIR spectra are difficult to interpret because the vibrations of different molecular bonds are involved in the same absorption band, resulting in overlapping information. To overcome these shortcomings and extract information, it is necessary to use multivariate analysis [5,9,10]. By combining NIR spectra and appropriate multivariate analysis, it is possible to extract the relevant information and to develop a suitable calibration model that is able to quantify the macronutrients in breast milk both onsite and in real time. This methodology will enable newborns to be fed with breast milk that is adequate for their development stage [11,12,13].
Qualitative models have been developed with portable instruments, which allow differentiation between colostrum, transition milk, and mature milk [
14]; however, no information about breast milk quantification parameters or the effect of spectra chemometric pretreatments on the final calibration models with portable devices is available. NIRS methodologies were developed based on the use of handheld, portable NIRS, and were tested for the analysis of the three major components in cow raw milk (fat, protein, and nonfat solids) [
15]. Because of this, the application of onsite NIRS technology to the study of breast milk can be an effective alternative for the characterization and quality control of donated breast milk received by milk banks. An important aspect to take into account, studied by Kwan et al., is that quantitative NIRS studies on breast milk were carried out with high-performance laboratory instruments that were acquired precalibrated, which can lead to errors due to a systematic displacement of the data or inaccurate calibrations [
16].
To the best of our knowledge, no studies have proved the effectiveness of low-cost, onsite, and easy-to-use handheld instruments in quantifying macronutrients in breast milk. The complexity of the analysis, the need for real-time macronutrient quantification, and the small amount of available breast milk demand new methodologies able to meet all of these requirements.
As such, in this study, we developed a methodology based on the use of low-cost NIRS sensors and appropriate chemometric procedures for onsite, real-time quality control monitoring of breast milk. The aim was to provide neonatology units and milk banks with a cheap and easy-to-use tool that is able to establish the quality of milk just before newborn consumption, enabling real-time decision making and ensuring adequate nutrient combinations for feeding newborns.
3. Results and Discussion
In order to understand the information in the collected NIRS spectra,
Figure 2 plots the raw and after-derivation values of the averaged spectra for the calibration and validation sets. We observed no differences between the populations, with the water band (O-H interactions) at 1950 nm being the most intense. At this wavelength, the band was a result of multiple overlapping bands, and it was directly influenced by chemical interactions with other molecular species in the sample [
5]. Other important bands that could be associated with macronutrients in milk were located at 2300 and 2380 nm. These are described as protein and fat (oil) bands, respectively [
22]. Based on the aquaphotomic principle (the strategy of monitoring a spectral band associated with a specific parameter, such as water, fat, or protein), the location of these specific bands (protein and fat) could help improve the calibration models for the aforementioned macronutrients.
After explaining the effect of vibrations associated with the macronutrients at different wavelength bands and their importance in the development of the NIRS procedure for the proposed parameters, we applied PCA with Hotelling’s T² ellipse to the raw NIR spectra of the calibration set to explore the spectra and detect outliers. As shown in
Figure 3, the data outside the ellipse need to be checked because they are potential outliers. Two samples were located outside the ellipse: numbers 36 and 44. These samples were mixtures of original samples 16 + 17 and 7 + 12, respectively. These results could be explained by operational error rather than by a compositional or spectral difference between these samples and all the others involved in this study. All original samples other than the mixed and diluted ones were satisfactory according to PCA. Both outliers were removed before the development of the final calibration models.
The next step was to run the calibration models with the global spectrum using PLS as a regression strategy and cross-validation with random groups. As detailed in Section 2, different pretreatment procedures were evaluated to obtain the best calibration statistics, ranging from no pretreatment to scattering correction (SNV) plus first derivative. As shown in
Table 2, a total of 18 calibration models applying different pretreatments, both with and without scattering correction and derivatives, were used to quantify six parameters (fat, crude protein, raw protein, carbohydrates, energy, and total solids) for quality control of breast milk.
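The two pretreatments named above can be illustrated with a minimal sketch. It assumes each spectrum is a plain list of absorbance values on an evenly spaced wavelength grid, and it uses a simple finite difference for the first derivative; this is a stand-in chosen for brevity and is not necessarily the derivative algorithm used in this study. The function names and the absorbance values are hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]

def first_derivative(spectrum, step=1.0):
    """First derivative as the finite difference between adjacent points (assumed simplification)."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]

# Hypothetical absorbance values for a single spectrum.
raw = [0.52, 0.55, 0.61, 0.70, 0.66, 0.58]

# Scatter correction followed by derivation, in the order described in the text.
treated = first_derivative(snv(raw))
```

Because SNV removes the offset and scale of each individual spectrum, two spectra that differ only by an additive baseline and a multiplicative factor give the same corrected spectrum, which is why it is used as a scatter correction.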
As shown in
Table 2, for all parameters, with the exception of raw protein (R² = 0.58), the R² calibration coefficient obtained was higher than 0.7, and the best performance was exhibited by fat, energy, and total solids with R² values all higher than 0.9. These results for fat could be explained by the spectral bands: as shown in Figure 2, a specific band for fat is noted around 2380 nm, as was observed for the breast milk samples. The presence of this band allowed us to obtain excellent calibration results with raw spectra data, without any pretreatment (R² = 0.91). Notably, fat is one of the most important parameters when characterizing donated breast milk as it is the main source of energy for newborns. Furthermore, it provides essential nutrients such as fat-soluble vitamins and polyunsaturated fatty acids [
23]. These results showed that the NIR spectra successfully captured quantitative variations in fat, showing the richness of the NIRS spectra.
Similarly, quantitative results could be obtained using the proposed procedure for energy and total solids, with R² values higher than 0.9 [5]. Precision values calculated as the coefficient of variation (CV% = 100 × SEC/mean of the calibration set reference values) ranged between 10% and 15% for all the parameters involved in this study. Similar results were obtained in previous research [16] when comparing the results produced by different devices and reference methods. For these parameters, the best results were obtained after applying SNV and the first derivative to the raw spectra. For RP, CP, and CH, few differences were observed when developing PLS regression with or without pretreated spectra. The R² value for protein data was higher than 0.75 and for CH, it was higher than 0.8. These values indicated that all the developed models can be used for quantitative analysis.
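As a worked illustration of the precision formula given above, the following sketch computes CV% from a standard error of calibration (SEC) and a set of reference values. The SEC, the reference values, and the function name are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def cv_percent(sec, reference_values):
    """CV% = 100 x SEC / mean of the calibration-set reference values."""
    return 100.0 * sec / mean(reference_values)

# Hypothetical reference values and a hypothetical SEC (not data from this study).
fat_reference = [3.0, 3.5, 4.0, 4.5, 5.0]
fat_cv = cv_percent(0.48, fat_reference)  # SEC of 0.48 relative to a mean of 4.0

# Check against the 10% to 15% range reported in the text.
within_reported_range = 10.0 <= fat_cv <= 15.0
```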
After studying the calibration statistics of the NIRS multivariate models in depth, the next step was to select the best chemometric models to quantify each macronutrient in breast milk. This choice was made based on the criterion detailed in Section 2 (the highest r² value and the lowest SECV) and on the comparison between SEC and SECV, because a gap between SEC and SECV is related to large differences between calibration and prediction results, which indicates that the calibration model is not robust.
Although few differences were found when comparing the NIRS calibration and cross-validation statistics, the best results for fat (r² = 0.841, SECV = 0.51), proteins (RP, r² = 0.512, SECV = 0.21), and carbohydrates (r² = 0.741, SECV = 1.35) were obtained when no pretreatment was applied to the NIR spectra. For energy (r² = 0.830, SECV = 9.60), SNV plus the first derivative produced satisfactory results. As shown in Table 2, the best R² value for TS was obtained with a first-derivative pretreatment; however, the difference observed between cross-validation and calibration data was too large, indicating that the results were not robust and the validity of that model was limited. For these reasons, we determined that the best model for TS quantification was that without applying any pretreatment to the spectra (r² = 0.685, SECV = 2.42).
The energy and TS parameters showed the largest differences between SECV and SEC values. The reason for these results may be that both parameters are indirect, meaning that they are not directly related to a molecular bond. However, the robustness of the calibrations could be improved by enlarging the sample size and the variability in the breast milk. An update of the calibration models is required when new samples are considered. Nevertheless, no conclusions could be reached with these calibration and cross-validation data alone, as an adequate statistical test is needed to quantify the SEP (external validation).
After selecting the best treatments, we externally validated the models to confirm their validity, predicting all the macronutrients with the selected models. As detailed in Section 2, 15 breast milk samples with reference data were selected for external validation and all the parameters were quantified. After quantifying all the breast milk parameters with the developed models, Student’s t-test was applied to compare the results obtained when analyzing samples using the reference and onsite NIRS methods.
As detailed above, the final acceptance or evaluation of the NIRS calibration models necessitated an external validation including samples not involved in the calibration procedure. In this work, 15 breast milk samples were included in the external validation set and quantified with the selected calibration models developed in this study. The selected methods and external validation statistics are detailed in
Table 3.
As shown in
Table 3, we compared the SEP and SECV values for the selected models. The ratio between SECV and SEP ranged between 0.817 and 1.035 for all parameters except for TS (0.533). The similarity between SECV and SEP confirmed that no difference was found between the external and cross-validation prediction errors, indicating a sufficiently robust calibration for these parameters [
24]. For TS quantification, an improvement in the models is required by including new samples and enhancing the multivariate model.
Another statistic that can explain model reliability in NIR spectroscopy is RPD (RPD = standard deviation/SEP). For this parameter, three categories can be defined: (1) excellent models, with RPD > 2; (2) fair models, with 1.4 < RPD < 2; and, (3) unreliable models, with RPD < 1.4 [
25]. These values have been applied in NIRS studies; however, no statistical basis has been used to establish these thresholds. Moreover, researchers developed useful calibration models with RPD values lower than the proposed standard values [
25]. Considering this statistic in
Table 3, the fat and energy models are categorized as excellent and the RP and carbohydrate models as fair. For TS and CP, the range of and variability in the samples should be increased to improve the NIR statistics and obtain a valuable model.
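The RPD statistic and its three categories can be sketched as follows. The sketch assumes that SEP is computed as the bias-corrected standard deviation of the prediction errors, which is one common definition and may differ from the one used by the instrument software in this study. Values falling exactly on a threshold (1.4 or 2) are not covered by the categories quoted above, so assigning them to the lower category is also an assumption. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def sep(reference, predicted):
    """Standard error of prediction, bias-corrected (assumed definition)."""
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = mean(errors)
    return sqrt(sum((e - bias) ** 2 for e in errors) / (len(errors) - 1))

def rpd(reference, predicted):
    """RPD = standard deviation of the reference values / SEP."""
    return stdev(reference) / sep(reference, predicted)

def rpd_category(value):
    """Map an RPD value onto the three categories quoted in the text."""
    if value > 2:
        return "excellent"
    if value > 1.4:
        return "fair"
    return "unreliable"  # also catches exactly 1.4 (assumption, see lead-in)

# Hypothetical reference and predicted values for a small validation set.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
category = rpd_category(rpd(reference, predicted))
```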
After determining the fit and quality of the calibration models, and taking into account the coefficients of determination and calibration or cross-validation errors, a statistical test including SEPs and reference data was conducted to evaluate the prediction errors of the multivariate NIRS models. In this work, we applied a paired-difference Student’s t-test to compare the results obtained when samples were analyzed by the reference and NIRS methods; this is a useful statistical strategy for comparing two sets of quantitative results obtained with different analytical methods. We used the set of differences to build the t-statistic from the mean and standard deviation of the differences. We compared the reference and predicted data of all samples included in the validation set (N = 15). The results showed no significant differences at a confidence level of 95% because, for all parameters, the calculated t value was lower than 2.145 (the critical value of t for 14 degrees of freedom at the 95% confidence level). Another parameter that we used to characterize the proposed methodology was accuracy. We calculated the accuracy for all the macronutrients involved in this study, and the best results were obtained for fat and energy, with a value of 94%.
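The paired-difference test described above can be sketched in a few lines. The t-statistic is built from the mean and standard deviation of the differences and compared with the critical value of 2.145 quoted in the text for 14 degrees of freedom. The 15 reference and predicted values below are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev

T_CRITICAL_DF14 = 2.145  # critical t quoted in the text (14 degrees of freedom, 95%)

def paired_t(reference, predicted):
    """t = mean(d) / (sd(d) / sqrt(n)), where d are the paired differences."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical validation set with N = 15 paired results.
reference = [3.1, 3.4, 2.9, 4.0, 3.6, 3.8, 2.7, 3.3, 4.2, 3.0, 3.5, 3.9, 2.8, 3.7, 4.1]
predicted = [3.2, 3.3, 3.0, 3.9, 3.7, 3.7, 2.8, 3.2, 4.3, 2.9, 3.6, 3.8, 2.9, 3.6, 4.2]

t_value = paired_t(reference, predicted)

# The two methods are considered equivalent when |t| stays below the critical value.
no_significant_difference = abs(t_value) < T_CRITICAL_DF14
```

For a validation set of a different size, the critical value would have to be taken from a t table for N − 1 degrees of freedom rather than reused from this example.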
We could find no information about analytical methodologies for onsite and real-time quantification of the macronutrients in breast milk.
Table 4 summarizes the most relevant studies conducted using NIRS technology, including the instrumentation employed to quantify macronutrients (high-performance laboratory instruments). Only dos Santos et al. [
14] used a portable NIRS instrument to classify breast milk into colostrum, transition milk, and mature milk, which correspond to the three stages of the lactation period.
Focusing on spectroscopic laboratory methodologies, previous researchers evaluated and compared the use of near- and mid-infrared instruments vs. reference methodologies using high-performance laboratory devices [
28]. More specifically, the macronutrients quantified were fat, protein, and carbohydrates. The NIRS instruments employed were precalibrated laboratory devices, scanning in a range of 1200–2400 nm. No information about the statistics of the calibration models was included in that work; however, its external validation statistics can be compared with our results. The coefficients of determination reported by Fusch et al. for validation were 0.76 for protein, 0.01 for lactose, and 0.79 for fat. In our study, employing a portable device and developing our own calibration models, the coefficient of determination of cross-validation was 0.52 for protein, 0.74 for carbohydrates, and 0.84 for fat [
28].
Table 5 compares the external validation statistics obtained with laboratory instrument vs. the developed portable-device methodology by computing the linear regression of the reference and predicted NIRS data [
28]. To evaluate the random error in the prediction based on regression results, the S_y/x statistic (random error in the y-direction, where y denotes the predicted values and x the reference values) was calculated. Comparing these data with the SEP values (Table 3) for fat, RP, and carbohydrates, we observed that the regression error (S_y/x) was equal to or lower than the error obtained in external validation (SEP), confirming the validity of the developed models.
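The random error in the y-direction can be illustrated with a minimal sketch. It assumes the usual definition of this statistic, the residual standard deviation of the least-squares line of predicted (y) against reference (x) values with n − 2 degrees of freedom. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean

def s_yx(reference, predicted):
    """Residual standard deviation of the line predicted = a + b * reference."""
    n = len(reference)
    x_bar, y_bar = mean(reference), mean(predicted)
    sxx = sum((x - x_bar) ** 2 for x in reference)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(reference, predicted))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Vertical distances between each predicted value and the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(reference, predicted)]
    return sqrt(sum(r ** 2 for r in residuals) / (n - 2))
```

Because the fitted line absorbs any constant or proportional bias between the two methods, this statistic reflects only the random scatter, which is consistent with it being equal to or lower than the SEP in the comparison above.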
Another NIRS strategy for macronutrient analysis in milk was carried out by de la Roza et al. [2], who developed in-house NIR calibrations with a high-performance laboratory instrument (working range 400–2500 nm) for fat, protein, and nonfat total solids in cow milk. The best result, with an R² value of 0.971, was obtained for the quantification of fat, applying SNV, detrend, and second-derivative mathematical pretreatments. This R² value is similar to that obtained in this work (0.910). However, for total solids, the coefficient of determination for cross-validation was lower than 0.700 (r² = 0.612) [2], and better results, above 0.750, were obtained using our handheld portable instrument (r² = 0.787).