1. Introduction
The carob tree (Ceratonia siliqua L.), a species native to the Mediterranean basin, has long been appreciated for both its ecological resilience and economic importance. Naturally adapted to semi-arid and low-input environments, it grows well in poor soils and under drought conditions, making it a strategic crop for sustainable agriculture, particularly in marginal regions such as the Moroccan hinterland or the Algarve barrocal [1]. In recent decades, carob has received renewed interest due to the nutritional, functional, and industrial potential of its fruit, positioning it as a valuable asset in both traditional and emerging value chains [2,3,4].
The carob pod is composed primarily of a naturally sweet pulp, widely processed into powders and syrups that serve as natural sweeteners and cocoa alternatives. Its high sugar content has also triggered growing interest in its potential for bioethanol production, placing carob pulp as a promising renewable energy source [5,6]. The seeds, which represent 10–15% of the pod’s weight, comprise three distinct parts: the husk (also called shell or peel), the embryo, and the endosperm. The husk, a tough brown outer layer, constitutes 30–35% of the seed’s dry weight. The embryo, centrally located, accounts for 15–30% and is rich in proteins, enzymes, dietary fiber, minerals, and polyphenols. Most notably, the endosperm, which makes up 40–50% of the seed, is the source of locust bean gum (LBG), a highly valued galactomannan-based polysaccharide also designated as E410 [7]. Chemically, LBG consists of a linear β-(1→4)-linked mannose backbone with α-(1→6)-linked galactose side groups, where the mannose-to-galactose ratio is typically around 4:1 [8]. This structural composition strongly influences its solubility and functional properties, such as viscosity and synergistic gel formation with other hydrocolloids (e.g., xanthan gum, carrageenan) [9,10]. Beyond its well-known thickening and stabilizing effects, LBG has been studied for its prebiotic potential [11], for use in controlled drug delivery systems [12], and as a sustainable alternative to synthetic polymers [8,13]. LBG is highly appreciated for its thickening, stabilizing, emulsifying, and gelling capabilities [14]. Previous studies have highlighted the rheological behavior of LBG in aqueous solutions and its role in improving texture and water retention in food products [7]. Its biodegradable and non-toxic nature makes it suitable for diverse pharmaceutical and biomedical formulations [15] as well as for cosmetics and textiles [7,16]. With growing global demand for clean-label, plant-based ingredients, carob seed-based products are increasingly recognized as valuable agro-industrial resources.
Extracting LBG with high purity and functionality strongly depends on the effective removal of the seed coat, a process known as dehusking. Conventional dehusking methods generally rely on mechanical milling, roasting, or thermal treatments, but these approaches often result in incomplete husk removal, contamination of the endosperm, and even thermal degradation of galactomannans [7]. Thermal roasting at high temperatures facilitates husk brittleness but can negatively affect gum viscosity and solubility by partial depolymerization of the polysaccharide chains [7]. As a consequence, conventional approaches often yield products of heterogeneous quality, requiring additional purification steps to achieve food- or pharmaceutical-grade gums [17].
To overcome these drawbacks, several alternative techniques have been proposed to improve efficiency and gum quality. Among them, water-dehulling (boiling the seeds followed by manual peeling) softens the husk, thereby reducing contamination, whereas acid-peeling (typically using sulfuric acid at elevated temperatures) carbonizes the husk and produces whiter gum with higher polysaccharide purity, better solubility, and greater intrinsic viscosity [18]. However, acid-based methods are environmentally hazardous, generating toxic SOx gases and requiring specialized waste management, which limits their industrial sustainability.
To address these issues, alternative sustainable dehusking solutions have been explored. One promising direction involves novel aqueous acid systems, such as methanesulfonic acid, which allow efficient peel removal under milder and safer conditions [19]. Despite this progress, evaluating the effectiveness of these dehusking methods remains a technical challenge.
Current assessment techniques are often subjective, destructive, or lack the sensitivity needed to detect subtle variations in surface properties and seed integrity. In this context, Diffuse Reflectance Spectroscopy (DRS) emerges as a powerful and non-destructive analytical tool. Based on the measurement of reflected light across a broad spectral range, DRS provides spectral fingerprints that capture both chemical composition and physical surface characteristics [20]. It has been widely applied in agriculture and food sciences for purposes such as monitoring fruit firmness [21] and ripening [22], evaluating leaf pigmentation [23] and plant stress [20], assessing post-harvest quality [24], and detecting moisture-related changes in biological materials [25].
Building upon our earlier findings, where DRS was introduced as a rapid, non-invasive, and highly sensitive method, the present study explores its potential as a robust and reliable diagnostic tool for monitoring dehusking efficiency. This technique offers an opportunity to objectively compare processing methods, ensure consistency and quality in LBG production, and minimize material loss during seed treatment. By benchmarking spectral data against commercial standards and analyzing changes in seed surface properties, we demonstrate the feasibility of integrating DRS into industrial-scale carob processing workflows. To refine the analysis, we employed the Kubelka–Munk (KM) formalism. Originally developed by Paul Kubelka and Franz Munk to describe the behavior of light in turbid media such as paint films, this theory has become a cornerstone for interpreting reflectance data across numerous fields, including textiles, color science, and paper manufacturing [26]. In biological and agronomical sciences, the KM model is widely applied to deconvolve the complex interplay of light absorption and scattering, enabling the non-destructive analysis of plant leaves and fruit quality [27,28]. In this study, we use the KM two-layer model to isolate the optical properties of the carob seed’s skin from those of the underlying endosperm. This approach enables the derivation of the spectrum attributable to the skin alone, providing a robust and quantitative metric to evaluate the efficacy of different dehusking treatments.
The ability of DRS to monitor both surface reflectivity and internal compositional features offers significant advantages for standardization, quality control, and sustainable process optimization in carob value chains. Beyond carob, the versatility of DRS suggests broad applicability across other agricultural and food systems.
3. Results
Colorimetric analysis offers a quick and straightforward approach to evaluate visual differences in seed appearance, providing an initial indication of the effectiveness of the extraction process. However, if the treatment changes the internal composition of the endosperm, for example by affecting polysaccharide integrity, these variations may not be visually perceptible. Thus, a more robust and sensitive analytical method is required to fully capture the solvent’s impact on seed chemistry. To this end, DRS coupled with KM analysis was employed, for the first time, as a non-destructive tool to infer potential compositional changes induced by different extraction treatments [19].
As described in the Materials and Methods Section, four spectrometers with distinct optical configurations (e.g., diffraction gratings, detectors) were used to cover the full spectral range. These differences led to slight mismatches at the spectral boundaries, which were corrected using a custom matching and smoothing algorithm to generate continuous, artifact-free spectra. The initial results are presented in Figure 2. For each seed type, five replicates were recorded. The raw seed spectra are shown in black, the spectra from industrially treated seeds in red, and those from seeds treated with our optimized method in blue. This visualization highlights the spectral distinctions across treatments, offering insights into both surface and compositional changes. The main spectroscopic absorption bands in this spectral range are overlaid on the plot. They are represented in the form nν(X) or nδ(X), with n − 1 the order of the overtone, ν for stretching and δ for bending oscillations, and X = O-H, C-H, CH2 or CH3 the chemical bond. The main spectroscopic bands highlighted in Figure 2 are the chlorophyll absorption peaks (Chl), the second and first overtones of O-H stretching (3ν(O-H) and 2ν(O-H), respectively), the second overtone of C-H stretching (3ν(C-H)), the combination band of the first overtone of stretching and the fundamental of bending of C-H (2ν(C-H) + δ(C-H)), and the first overtone of CH2 and CH3 stretching (2ν(CH2,CH3)).
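The matching step between spectrometer ranges can be illustrated with a minimal sketch. The actual custom algorithm is not specified in this section, so the approach below is an assumption: each successive segment is rescaled so that its mean reflectance in the overlap region agrees with the already-stitched spectrum, and only the non-overlapping part is appended. The function name `stitch_segments` is hypothetical, and the smoothing step is omitted.

```python
import numpy as np

def stitch_segments(segments):
    """Join (wavelength, reflectance) segments ordered by increasing wavelength.

    Hypothetical matching rule (not the authors' algorithm): each later segment
    is multiplied by a factor that equalizes the mean reflectance in the overlap
    with the spectrum stitched so far.
    """
    wl_out, r_out = (np.asarray(a, dtype=float) for a in segments[0])
    for wl, r in segments[1:]:
        wl = np.asarray(wl, dtype=float)
        r = np.asarray(r, dtype=float)
        in_prev = wl_out >= wl[0]     # stitched points lying inside the overlap
        in_next = wl <= wl_out[-1]    # new-segment points lying inside the overlap
        if in_prev.any() and in_next.any():
            r = r * (r_out[in_prev].mean() / r[in_next].mean())
        keep = wl > wl_out[-1]        # append only the part beyond the current end
        wl_out = np.concatenate([wl_out, wl[keep]])
        r_out = np.concatenate([r_out, r[keep]])
    return wl_out, r_out
```

A multiplicative correction is only one possible choice; an additive offset or a weighted blend across the overlap would be equally plausible, depending on the origin of the mismatch.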
The first striking observation is that the carob seed husk, like fruit peels and most plant leaves, functions as a broadband optical reflector. This reflective effect is interrupted at specific wavelengths where absorption is significant, most notably in the visible range, due to high pigment concentrations, and at the water absorption bands in the NIR region. Thus, the Vis-NIR reflectance spectra of seeds and fruits can be broadly described as a reflectance plateau with distinct absorption dips at characteristic bands, which correspond essentially to those marked by the bars in the plot. This framework explains the main features observed in the raw carob seed Vis-NIR spectra shown in Figure 2.
A pronounced reflectance plateau appears between 800 and 1400 nm at reflectance values of approximately 0.4 to 0.5, where the optical influence of the husk dominates due to minimal absorption in this region. We refer to this as the NIR plateau. The notable dip in the visible region is attributed to pigments, such as phenolic compounds, including tannins, flavonoids, anthocyanins, and lignin. In particular, the local dip near 680 nm is consistent with traces of chlorophyll. Above 1400 nm, the reflectance decreases to around 0.3, reflecting absorption by water (specifically the first overtone of O-H vibrations, 2ν(O-H)) and C-H bonds (combination bands of the first overtone of stretching with fundamental bending, 2ν(C-H) + δ(C-H)).
Additional minor spectral features further complement this interpretation: for example, a subtle dip near 1200 nm is linked to the second overtone of C-H stretching vibrations, 3ν(C-H), while a depression around 980 nm corresponds to the second overtone of O-H vibrations, 3ν(O-H). An unusually strong dip is observed at approximately 1700 nm. Its origin remains unclear, but it coincides with the first overtone of C-H stretching in CH2 and CH3 groups, 2ν(CH2,CH3), a feature that is typically less pronounced in other samples.
This spectral characterization of raw seeds serves as a baseline for comparing the effects of dehusking. The husk’s principal role as a reflector means that its removal leads to a global decrease in seed reflectance, as evidenced by the data: reflectance curves for both industrially and laboratory-dehusked seeds are significantly lower than those of raw seeds. On the NIR plateau, reflectance drops from 0.4–0.5 in raw seeds to 0.1–0.2 in industrially treated seeds and further to 0.07–0.1 in laboratory-processed seeds. The lower reflectance of the laboratory samples suggests a more efficient husk removal process, whereas industrially dehusked seeds likely retain residual husk fragments, which explains the difference between the two methods.
One might initially expect industrially dehusked seeds to reflect less light, since residual seed coat should retain absorbing pigments. However, our results show the opposite trend. This apparent discrepancy can be explained by considering the two main optical roles of seed skin: absorption by pigments and scattering due to structural refractive index mismatches. Industrial treatment removes most of the pigment-rich outer layers, thereby reducing absorption, but often leaves behind residual inner layers. These remnants contain few pigments yet maintain a strong refractive index contrast with the underlying tissue, which enhances scattering. As a result, industrially dehusked seeds exhibit higher reflectance than lab-treated seeds, where the seed coats are removed more completely.
Figure 3 shows the absorbance, calculated from Equation (3). As expected, higher reflectance corresponds to lower absorbance; thus, raw seeds exhibit the lowest absorbance values. The laboratory-treated seeds display higher absorbance than the industry-treated ones, which, according to our hypothesis, is due to their cleaner surface (i.e., more effectively stripped of husk remnants) allowing greater light penetration and absorption. It is also interesting to note that the convolution of the
and
bands now emerge clearly as the main absorption feature not caused by the pigments.
Figure 4 shows the result of the KM procedure, applied to the average spectra of the three groups. Therefore, for the industry-treated seeds,
average of reflectance from industry-treated seeds and
average of reflectance from raw seeds (see Equation (5)). For the laboratory-treated seeds
is the same and
average of reflectance from laboratory-treated seeds. The value of the skin reflectance,
, is obtained from (7), after finding
from (5) and the self-consistent guess of
a and
b in (6) (which determines
).
The spectroscopic features are enhanced in the husk: the absorption dips caused by O-H and C-H vibrations are much more visible than in the raw or dehusked seeds. As expected, the skin reflectance is larger than that of the flesh in the NIR and lower in the visible, due to the pigments. The raw seed reflectance is a combination of the two, yielding intermediate values in the NIR and approximately the same values as the husk in the visible. The KM-calculated skin reflectance in the NIR is higher for the laboratory-treated samples than for the industry-treated samples. This is readily understood, considering that the laboratory treatment removed the husk more efficiently: the mathematical reconstruction of the skin reflectance from the laboratory data preserved all the reflective power of the husk. By contrast, the husk residuals left on the flesh of industry-treated seeds were missing from the reconstruction, leading to a lower calculated reflectance.
In practice, the results may be interpreted as skin reflectance (industry) < skin reflectance (laboratory), because the thickness of skin removed by the industrial process is effectively smaller than that removed by our laboratory approach.
Figure 5 shows the calculated absorption coefficient of the skin (from the numerical solution of Equation (5)) and its assumed scattering coefficient (from the self-consistent guess of a and b in (6)).
On the left, note that the scale is logarithmic, showing the largely dominant effect of the pigments extending from 400 nm to nearly 800 nm. By contrast, there is a NIR optical window of extremely low absorption in the range 800–1400 nm, bounded on the long-wavelength side by the water absorption peak at 1400 nm. The KM reconstruction of the skin absorption coefficient yields higher absorption in the NIR window for the industry-treated seeds. As before, the most plausible explanation is that the industrial treatment left some reflecting husk residues on the seed. The tissue effectively removed by the industrial treatment therefore reflects less, that is, absorbs more, as calculated by the KM method.
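The layer-on-substrate picture behind this reconstruction can be sketched with the textbook Kubelka–Munk relations. Equations (5)–(7) of the Methods are not reproduced in this section, so the code below is not the authors' procedure: it only shows how a skin layer with absorption coefficient K, scattering coefficient S, and thickness d combines with an underlying flesh of reflectance Rg. The names `km_layer`, `over_background`, and `km_function` are hypothetical, and `a_km`, `b_km` are the standard KM auxiliary quantities, unrelated to the fitting parameters a and b of Equation (6).

```python
import numpy as np

def km_layer(K, S, d):
    """Reflectance R0 and transmittance T of an isolated KM layer (requires K > 0)."""
    a_km = 1.0 + K / S
    b_km = np.sqrt(a_km**2 - 1.0)
    sh = np.sinh(b_km * S * d)
    ch = np.cosh(b_km * S * d)
    denom = a_km * sh + b_km * ch
    return sh / denom, b_km / denom

def over_background(R0, T, Rg):
    """Reflectance of the same layer placed on a substrate of reflectance Rg."""
    return R0 + T**2 * Rg / (1.0 - R0 * Rg)

def km_function(R_inf):
    """K/S ratio of an optically thick layer with reflectance R_inf."""
    return (1.0 - R_inf)**2 / (2.0 * R_inf)
```

In this picture, a thinner removed layer transmits more light and has a lower reflectance of its own, which agrees qualitatively with the interpretation given for the industry-treated seeds.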
It is well established in chemometrics that applying an absorbance transformation followed by spectral differentiation can effectively suppress baseline and scattering effects [35]: the logarithm converts multiplicative effects into additive ones, which differentiation then removes. Specifically, the first derivative removes constant baseline offsets, while the second derivative also removes linearly sloping baselines and further reduces scattering effects [26]. A key advantage of the second derivative is its ability to sharpen and highlight absorption features, producing well-defined dips at the location of absorption peaks, particularly in the case of isolated bands. This enhances the spectral resolution and facilitates more accurate interpretation.
Figure 6 illustrates the second derivative of the absorbance spectra of carob seeds after applying a Savitzky–Golay filter with a polynomial order of 4 and a window width of 25 points. In the visible region, the raw seed spectra exhibit prominent variations associated with pigment absorption. In contrast, the dehusked seeds display relatively flat curves, consistent with the removal of pigment-rich outer layers.
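As an illustration of this preprocessing, the sketch below computes a Savitzky–Golay second derivative of the absorbance with the stated polynomial order (4) and window width (25 points). Two assumptions are made because the section does not confirm them: absorbance is taken as −log10(R), and the wavelength axis is uniformly sampled. The function name `second_derivative_absorbance` is hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def second_derivative_absorbance(wavelength_nm, reflectance, window=25, polyorder=4):
    """Savitzky-Golay second derivative of absorbance with respect to wavelength."""
    # Assumed absorbance transform; Equation (3) is not reproduced in this section.
    absorbance = -np.log10(np.asarray(reflectance, dtype=float))
    # savgol_filter assumes equally spaced samples; use the mean step as spacing.
    step = float(np.mean(np.diff(wavelength_nm)))
    return savgol_filter(absorbance, window_length=window, polyorder=polyorder,
                         deriv=2, delta=step)
```

With this convention an isolated absorption band appears as a negative dip centered on the band position, while constant and linearly sloping baselines vanish.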
In the Vis/NIR and NIR regions, dehusked seeds, particularly those treated using the laboratory process, exhibit enhanced absorption features around 960 nm (second overtone of O-H vibration), 1200 nm (second overtone of C-H stretching), and 1400 nm (first overtone of O-H stretching). This enhancement is attributed to increased light penetration through the seed surface, allowing more light to interact with the underlying kernel. The greater absorbance observed in the lab-treated seeds supports the conclusion that their husk removal was more effective, providing less obstruction to light and improving access to the kernel.
Notably, the shape of these absorption features remains consistent across all treatments, suggesting that the chemical composition of the seed kernel remains unaltered by the dehusking process. However, a deviation from this trend is observed at 1700 nm. Here, no enhancement is detected, and all curves are nearly superimposed. A plausible explanation is that absorption at this wavelength, associated with the first overtone of C-H stretching in CH2 and CH3 groups, arises from compounds equally present in both the husk and the kernel, resulting in similar spectral responses regardless of the dehusking process. An optical artifact (second-order diffraction effects, for example) could also explain the peak, but no such artifact has been detected in other samples, such as orange peel. Overall, the second derivative analysis reinforces the interpretation that the laboratory dehusking method is more efficient at removing the husk, while preserving the chemical integrity of the kernel’s polysaccharides.
The spectra obtained are clearly distinguishable, both in the raw and derivative plots. In this respect, a PCA plot to discriminate the samples would seem redundant. However, the PCA loadings may reveal finer details about the spectral differences between the treatments, and this is the main motivation for performing a PCA.
Figure 7 shows the scores plot of the first two principal components obtained from the raw reflectance data (on the left) and from the first derivative of the absorbance data (on the right). It is worth noting that the second derivative spectra were presented in Figure 6 because they provide a more intuitive visualization of absorbance features, with peaks appearing as valleys at the same positions. However, for PCA, the first derivative was used since its loadings are less complex and therefore easier to interpret than those from second derivative spectra, which contain more peaks and valleys. Importantly, both approaches yield similar results in the PCA scores plot (Figure 7), so the choice of first derivative data for PCA was made primarily for clarity of interpretation. In this regard, the Supplementary Information contains the figures derived from the derivative processing that are not presented here, specifically the first derivative spectra, the PCA scores plot on the second derivative data, and the PCA loadings plot from the second derivative data (Figures S1–S3).
As expected, both the raw reflectance and the first-derivative spectra exhibit a distinct clustering of the samples. In both plots, raw seeds are clearly separated from dehusked seeds along the first principal component (PC1), which accounts for most of the data variance. The separation is clearer in the raw reflectance data. The dehusked seeds also form separate clusters, and their separation is clearer in the derivative data. Additionally, seeds dehusked by the industrial process show larger confidence ellipses, indicating less homogeneous results than with the laboratory process.
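For readers wishing to reproduce this kind of scores and loadings analysis, a minimal PCA sketch based on the singular value decomposition is given below. The software and preprocessing actually used are not stated in this section, so mean-centering without scaling is an assumption, and the function name `pca_scores_loadings` is hypothetical.

```python
import numpy as np

def pca_scores_loadings(spectra, n_components=2):
    """PCA of a (samples x wavelengths) matrix via SVD of the mean-centered data.

    Returns the scores (samples x components), the loadings
    (components x wavelengths), and the explained variance ratios.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each wavelength
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]
```

Plotting the first two columns of the scores gives the scores plot, and each row of the loadings plotted against wavelength gives the corresponding loadings plot. The sign of each component is arbitrary, so only relative positions along a component are meaningful.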
Figure 8 shows the loadings plots of the first (PC1, top) and second (PC2, bottom) principal components for the PCA decomposition of the raw reflectance (left) and first derivative of absorbance (right) datasets.
The loadings for PC1 of the raw data (top left) reveal a fundamental contrast between the NIR plateau and the pigment absorption in the visible. The dehusked seeds have lower NIR plateaus and higher reflectance in the visible, which is the picture captured in the PC1 loadings plot.
The loadings for PC2 of the raw data (bottom left) again place most of the weight on the visible range and a much smaller weight on the features around 1400 nm. Interestingly, PC1 attributes the highest loadings to the region near the chlorophyll blue peak, while PC2 attributes the highest loadings to the red peak.
The loadings associated with the derivative of the absorbance are more difficult to interpret, but essentially, they align with the main spectroscopic bands. The interpretation must be made in terms of first derivative peaks = wavelengths with more pronounced change. For example, for PC1, the raw seeds have the more negative loadings and the laboratory-treated the more positive. Therefore, the negative peaks (“dips”, or “valleys”) in PC1 mean more intense changes in the raw seeds and positive peaks mean more intense changes in laboratory-treated seeds. There are two narrow dips in the Chlorophyll bands meaning, as expected, large variations in the raw seeds’ spectra in that region. On the other hand, positive peaks at
,
,
and
mean stronger variations in these bands on the laboratory-treated spectra, which is also evident from
Figure 3.
PC1 describes a continuous transformation raw seed → industry → laboratory, whereas PC2 describes the difference between raw and laboratory-treated seeds, on one side, and industry-treated seeds, on the other. The main differences appear as negative loadings in the 950–1300 nm band. This is also the core range for the spectral differences in the skin absorption coefficient (Figure 5). Overall, this suggests that the main difference between the industrial and laboratory chemical treatments is to be found in this band.
As stressed in the Introduction, the main objective of this work was to demonstrate the proof of principle of using DRS to assess dehusking efficiency. However, we recognize that such a method is only relevant if accompanied by a plausible route to industrial application. One possible implementation would involve a spectrometer positioned above a conveyor belt carrying the seeds, acquiring their reflectance spectra in real time. From the data, the average dehusking efficiency could be estimated using a pre-calibrated model. As with all spectroscopic models, calibration requires a set of reference (“gold standard”) samples with known dehusking efficiencies, determined individually by microscopic and/or photographic methods that quantify residual seed coat material. The calibration model would then establish a relationship between spectral features and dehusking efficiency. This could rely on a simple linear approach, such as the ratio between the reflectance of raw and dehusked seeds at selected wavelengths, or on more advanced multivariate methods using all or part of the spectra, for example, partial least squares (PLS).
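The simple linear approach mentioned above could look like the following sketch. No calibration model has been built or validated in this work, so everything here is a hypothetical illustration: the spectral feature is assumed to be the ratio of mean reflectance on the NIR plateau (800–1400 nm) between a dehusked sample and the raw seeds, and it is related to reference efficiencies by a straight-line fit. The function names are invented for this example.

```python
import numpy as np

def plateau_ratio(wavelength_nm, r_sample, r_raw, band=(800.0, 1400.0)):
    """Ratio of mean reflectance (sample / raw) inside the chosen band."""
    wl = np.asarray(wavelength_nm, dtype=float)
    mask = (wl >= band[0]) & (wl <= band[1])
    return np.asarray(r_sample)[mask].mean() / np.asarray(r_raw)[mask].mean()

def fit_calibration(ratios, efficiencies):
    """Least-squares line relating the spectral ratio to reference efficiency."""
    slope, intercept = np.polyfit(ratios, efficiencies, deg=1)
    return slope, intercept

def predict_efficiency(ratio, slope, intercept):
    """Apply the fitted line to the ratio measured on a new sample."""
    return slope * ratio + intercept
```

In use, `fit_calibration` would be run once on the reference samples, and `predict_efficiency` would then be applied to ratios measured on the conveyor. Whether a single ratio is sufficiently linear in dehusking efficiency would have to be checked against the reference measurements before preferring it to a multivariate model such as PLS.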