Diagnosis by SAM Linked to Machine Vision Systems in Olive Pitting Machines

Gandul, Luis Villanueva; Madueño-Luna, Antonio; Madueño-Luna, José Miguel; López-Gordillo, Miguel Calixto; González-Ortega, Manuel Jesús

doi:10.3390/app15137395


by Luis Villanueva Gandul ¹, Antonio Madueño-Luna ¹,*, José Miguel Madueño-Luna ², Miguel Calixto López-Gordillo ² and Manuel Jesús González-Ortega

¹ Department of Aerospace Engineering and Fluid Mechanics, University of Seville, 41013 Seville, Spain

² Department of Graphic Engineering, University of Seville, 41013 Seville, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 7395; https://doi.org/10.3390/app15137395

Submission received: 14 May 2025 / Revised: 17 June 2025 / Accepted: 24 June 2025 / Published: 1 July 2025

(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Edition)


Abstract

Computer Vision (CV) has proven to be a powerful tool for automation in agri-food industrial processes, offering high-precision solutions tailored to specific working conditions. Recent advancements in Artificial Neural Networks (ANNs) have revolutionized CV applications, enabling systems to autonomously learn and optimize tasks. However, ANN-based approaches often require complex development and lengthy training periods, making their implementation a challenge. In this study, we explore the use of the Segment Anything Model (SAM), a pre-trained neural network developed by Meta AI in 2023, as an alternative for industrial segmentation tasks in the table olive (Olea europaea L.) processing industry. SAM’s ability to segment objects regardless of scene composition makes it a promising tool to improve the efficiency of olive pitting machines (DRRs). These machines, widely employed in industrial processing, frequently experience mechanical inefficiencies, including the “boat error,” which arises when olives are improperly oriented, leading to defective pitting and pit splinter contamination. Our approach integrates SAM into a CV workflow to diagnose and quantify boat errors without designing or training an additional task-specific ANN. By analyzing the segmented images, we can determine both the percentage of boat errors and the size distribution of olives during transport. The results validate SAM as a feasible option for industrial segmentation, offering a simpler and more accessible solution compared to traditional ANN-based methods. Moreover, our statistical analysis reveals that improper calibration, manifested as size deviations from the nominal value, does not significantly increase boat error rates. This finding supports the adoption of complementary CV technologies to enhance olive pitting efficiency.
Future work could investigate real-time integration and the combination of CV with electromechanical correction systems to fully automate and optimize the pitting process.

Keywords:

table olive industry; olive pitting; SAM segmentation; industrial process optimization; DRR machines; boat error; Gordal olive; Hojiblanca olive; computer vision

1. Introduction

Computer Vision (CV) technologies have become indispensable in agri-food industry tasks, such as quality control, counting and grading, where camera hardware is paired with dedicated software pipelines [1,2,3,4]. The latest leap in performance has come from deep learning, particularly Artificial Neural Networks (ANNs) [5,6,7,8,9,10]. Although ANNs deliver high accuracy and real-time throughput, they demand extensive datasets, careful annotation, hyper-parameter tuning and long training cycles [4,8,11,12].

Manual annotation can inadvertently introduce noisy labels that impair supervised learning. Recent literature describes two complementary mitigation strategies: (i) PSSCL, a progressive-sample-selection framework with contrastive loss that detects and prunes potentially mis-labelled instances [13]; and (ii) BPT-PLR, which partitions the data into balanced mini-batches and employs a pseudo-label-relaxed contrastive loss to dampen label noise during training [14]. Within our table olive workflow, this risk is virtually eliminated because SAM is used in zero-shot mode, obviating the need for a dedicated dataset or additional annotation; nevertheless, the aforementioned strategies would be valuable if domain-specific models are trained in the future.

A different paradigm emerged with the Segment Anything Model (SAM), released by Meta AI in 2023 [15,16]. The word Anything underscores SAM’s capacity to generalize to previously unseen objects and scenes, enabled by pre-training on more than 1 billion masks and by its prompt-agnostic interface. SAM can segment virtually any object after receiving only a simple prompt (point, box or text), without any additional fine-tuning. Its zero-shot capability offers an attractive shortcut for industrial environments where building task-specific models is costly. Even so, SAM has been explored mainly in remote sensing and precision agriculture [17,18,19,20], while its potential for inline food product inspection remains largely untested.

Table olive processing is an ideal test bed. Pitting, slicing and stuffing machines (DRRs) run at 2500–3000 olives min⁻¹ but still suffer from the so-called boat error: olives that arrive vertically oriented are pierced off-axis, producing pulp damage and pit splinters that jeopardize product safety [8,9]. CV can either (i) diagnose the root causes of these misorientations or (ii) close the loop with electromechanical actuators. In both cases, fast and reliable segmentation of each olive on the carrier chain is a prerequisite.

1.1. State-of-the-Art Segmentation Models for Industrial and Agri-Food Applications

Image segmentation under variable illumination, background clutter and occlusion is challenging [1]. Traditional algorithms (thresholding, active contours, region growing) have gradually been replaced by deep-learning approaches that are more robust to such variability [1].

Four families dominate current practice:

-: U-Net [12]: a symmetric encoder–decoder that yields accurate semantic masks even with medium-sized datasets but does not separate individual instances.
-: Mask R-CNN [10]: adds an instance-mask branch to Faster R-CNN, producing highly precise masks at the cost of heavy computation and dense supervision.
-: DeepLab v3+ [21]: leverages atrous (dilated) convolutions and ASPP to achieve state-of-the-art IoU, yet relies on deep backbones that increase latency.
-: SAM [16]: a foundation model trained on SA-1B; delivers zero-shot masks but uses a ViT-H encoder (~632 M parameters) that is computation-intensive. MobileSAM [22] shrinks SAM ≈ 60×, reaching ~10 ms image⁻¹.

Agri-food studies have begun to adapt SAM to crop imagery [17], employ it as an annotation accelerator [1], or benchmark it against task-specific CNNs [20]. UAV trials confirm MobileSAM’s ability to delineate fruits at video rate [19]. Table 1 summarizes the main trade-offs of these standalone models.

1.2. Hybrid Detector–Prompt Pipelines

To meet strict cycle-time budgets, recent studies couple a fast detector with a prompt-based segmenter. A one-stage detector (e.g., YOLOv8) first supplies bounding-box prompts; SAM (or MobileSAM) then refines pixel-level masks inside these regions, cutting SAM’s workload by one to two orders of magnitude [23]. Other combinations use Grounding DINO for open-vocabulary boxes [24], light objectness CNNs feeding MobileSAM on edge devices [25], or FastSAM paired with a proposal network [26]. Conversely, SAM can propose global masks that a tiny CNN filters [27]. Table 2 profiles the most representative hybrids.

1.3. Scope of the Present Work

The current study integrates SAM without any fine-tuning into an olive-pitting line. Images are captured by a machine-vision unit mounted above the carrier chain; SAM segments the olives, and a lightweight post-processing step selects the correct mask. Morphological and statistical analyses are then performed to quantify the boat-error incidence and to examine whether off-caliber olives increase that error. By relying on a foundation model plus minimal code, the workflow eliminates the extensive data collection and training phases typical of bespoke CNN-based solutions, demonstrating a practical, low-overhead path to 100% inline inspection.

2. Materials and Methods

2.1. Olive Varieties and Treatment

For this study, two olive varieties were selected due to their representative size ranges in current industrial processes: Gordal (Olea europaea regalis), a large-sized variety with calibers ranging from 60 to 140 olives/kg, and Hojiblanca (Olea europaea arolensis), which encompasses medium to small calibers (100 to 420 olives/kg). The Gordal variety exhibits a metallic black color, resulting from an oxidation treatment known as the California style, whereas Hojiblanca olives are pickled green, following the Seville style. Both treatments eliminate bitterness through the chemical hydrolysis of the glycoside oleuropein, with the key distinction being the presence or absence of oxygen during processing [28,29,30,31].

Specifically, Gordal olives undergo 2 to 5 baths in sodium hydroxide (NaOH) at a concentration of 1–4% (w/v), with an olive-to-solution ratio of 1:1, for 1 to 4 h, where intermediate washings are performed to remove excess alkali, completing the process within 24 h. In this treatment, oxygenation plays a crucial role in gradually darkening the olives, a process that becomes irreversible upon the addition of iron salts [29,30].

Conversely, Hojiblanca olives retain their early green color by avoiding oxidation. They are processed in an initial bath with a higher NaOH concentration (2–5%), followed by several washings that prepare them for a second treatment in sodium chloride (NaCl, 10–12%), with a final step that enhances their organoleptic and textural properties through lactic fermentation [28,29,30].

After undergoing these treatments, the olives are sorted by caliber before being introduced into a DRR machine, where they can be pitted, sliced, or stuffed. However, in this study, only the pitting process is performed.

2.2. MVS and DRRs

The experiment was conducted at an olive pitting and packaging factory near Seville (Morón de la Frontera, Spain), where two CV systems [8,9] were installed in parallel on different DRR machines, as illustrated in Figure 1. These systems were placed at a critical control point in the olive carrier chain, near the pitting area, where it is possible to detect the orientation in which the olives will be pitted [8,9]. Previous studies have demonstrated that olive orientation is a key factor in predicting the occurrence of the boat error, which occurs when the awl meets the olive’s minor axis in a parallel alignment [8].

The CV system was installed on two different DRR pitting machines: a Sadrym Model 130 (Morón de la Frontera, Spain) short-arm pitting machine with a ¾-inch-wide feed chain, calibrated for Gordal olives with a nominal size of 80 olives/kg, and an OFM Model PSL51 (Morón de la Frontera, Spain) pitting machine [32], adjusted for Hojiblanca olives with a nominal size of 300 olives/kg.

The installation integrates a PVC adapter onto the carrier chain of each DRR machine, incorporating an optical sensor (camera) and an LED ring light positioned above for image acquisition (Figure 2). These components are triggered by an electronic pulse train generated by a magnetic sensor installed at track level, which detects the passage of each olive.

The magnetic variation is induced by the movement of the transport buckets at a distance of 8 mm [9]. An electronic control unit coordinates the operation of all devices and manages data transmission, operating at 12 V from a switched power supply (Figure 3) [9].

The illumination system is arranged in a ring configuration to ensure uniform lighting, minimizing shadows, with a luminous flux of 900 lumens and a color temperature of approximately 3000 K. Both the lighting system and the camera operate at the processing speed of the machines (800–2500 olives/min), maintaining an operational threshold below their maximum capacity to prevent system failures or overheating. For instance, the lighting system functions at only 8–10% of its full capacity.

The two installed CV systems differ only in the cameras used for image acquisition, which were selected for their ability to operate at high speed and their compact size. Additionally, either color or grayscale mode was chosen, resulting in two challenging case studies characterized by the absence of color contrast or low resolution. Oxidized black Gordal olives on a gray background are captured in color using a DFK 33GV024 camera (The Imaging Source Europe GmbH, Bremen, Germany) [33], equipped with a 1/3″ MT9V024 CMOS sensor from ON Semiconductor Corporation, Phoenix, AZ, USA, capable of up to 100 fps, producing images with a resolution of 320 × 240 pixels. The Hojiblanca case is handled with an SPS02 smart camera from Toshiba Teli Corporation, Hino, Tokyo, Japan [34], featuring a ½″ CMOS sensor operating in grayscale mode with a depth of 256 levels (1 byte), generating images of 176 × 144 pixels. All images captured by both CV systems are automatically sent to a conventional PC running Windows 10.

Figure 4 presents examples of image capture for each variety during the pitting process.

2.3. Computer Vision Software

2.3.1. Industrial Applications

This section describes the implementation of an industrial image processing system designed for real-time monitoring and analysis of olives during the pitting process. The system utilizes Qt-Creator 4.2.0 (Community) and OpenCV 3.4.3 for color sampling of oxidized black Gordal olives, while a C++-based application is used for B/W image acquisition from the SPS02-Teli Toshiba camera from Toshiba Teli Corporation, Hino, Tokyo, Japan, which captures pickled Hojiblanca olives in grayscale.

The examples shown in Figure 4 represent a broader set of raw images for each olive variety, captured continuously by each camera (digital samplings). All these images can be monitored in real time from the PC using industrial-grade image-processing software based on OpenCV 4.5.0 (release 17 November 2020; available online: https://opencv.org/, accessed on 6 May 2025), developed in C++ within the Qt Creator 4.14 IDE (Qt 5.15.2 runtime; The Qt Company, Espoo, Finland, release December 2020) [35,36]. The application also allows the region of interest (ROI) or scale to be adjusted, the color mode to be selected, and image series to be stored on the hard drive for later inspection.

For this study, two subfolders containing captured images were downloaded for offline analysis: 1638 color photographs of Gordal olives and 10,151 grayscale images of Hojiblanca olives. However, an additional filtering process was applied to focus the study exclusively on individual olives in good condition (Figure 5), as size and orientation are the key parameters of interest.

For the Gordal variety, the reduction also reflects an additional discarding operation, which particularly affects orientations. Due to the large size of the olive relative to the ROI, misoriented olives are frequently cropped in the images. However, this does not impact the correlation results, as orientations can still be determined from the superimposed ellipse, along with the rest of the parameters. This results in two initial digital samplings for analysis:

-: 1305 color images of oxidized black Gordal olives, nominal size 80, resolution 320 × 240 pixels in .png format.
-: 9150 B/W images of Hojiblanca olives in Sevillian style, nominal size 300, resolution 176 × 144 pixels in .bmp format.

Both samples are representative of an infinite population, such as the one processed throughout the operational lifespan of the pitting machine, with error rates below even 5%.

2.3.2. Self-Developed Software to Process Black Gordal’s Colored Images or Green Hojiblanca’s B/W Images

To integrate SAM and execute the diagnostic process, a dedicated application was developed in Google Colaboratory (Python 3.10.12 runtime; Colab build 2025-05-06; available online: https://colab.research.google.com/, accessed on 6 May 2025) together with MATLAB R2022b (version 9.13.0 Update 4, build 2166757; MathWorks, Natick, MA, USA) [37]. The application consists of a set of five scripts following a structured workflow based on fundamental operations (Figure 6).

Segmentation.

The script p15.txt, developed in Google Colab, integrates SAM (Segment Anything Model) by Meta AI to perform zero-shot image segmentation (without additional training) using the “Everything” or unsupervised mode [16] (Figure 6a). This script automatically processes the sample folders gordal_olives (.png) and BN_olives (.bmp) and connects via the web to the available SAM platform, executing segmentation without specific prompts. For each image, SAM generates a set of binary masks representing the detected elements (Figure 6b), where the detected object appears in white and the background in black. The masks are automatically stored in output subfolders generated by the script itself for later selection. Both the color images of Gordal and the grayscale images of Hojiblanca were subjected to SAM segmentation without any preprocessing, such as region of interest (ROI) adjustments or RGB channel separation.

Mask selection.

Using the “Everything” mode in SAM requires post-processing operations. For this purpose, the selector_BN.m script was developed in Matlab to select the correct olive mask by binarization (Figure 6c) within the coordinates where the olive is located on the transport chain (Example for Hojiblanca: x = 140, y = 79). This operation ensures that only the binarized olive mask is retained within the segmented image folders, reducing the dataset to images containing a single binarized olive while simultaneously discarding other cases from the processed folder (pixel values different from grayscale level = 255).
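The selection rule described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code of selector_BN.m, whose internals are not given in the text. It assumes each candidate mask is a grid of 0/255 values indexed as mask[y][x]; the helper names select_olive_mask and blank are hypothetical, and only the Hojiblanca probe coordinates (x = 140, y = 79) and the 176 × 144 resolution come from the article.

```python
# Hypothetical helper: keep the candidate masks that are white (255)
# at the known olive position on the carrier chain.

def select_olive_mask(masks, x, y):
    """Return the indices of the masks whose pixel at column x, row y equals 255."""
    selected = []
    for index, mask in enumerate(masks):
        inside = 0 <= y < len(mask) and 0 <= x < len(mask[0])
        if inside and mask[y][x] == 255:
            selected.append(index)
    return selected


def blank(width, height, value=0):
    """Build a width x height grid filled with a single value."""
    return [[value] * width for _ in range(height)]


# Toy example at the Hojiblanca resolution (176 x 144 pixels).
background = blank(176, 144)              # candidate mask that misses the olive
olive = blank(176, 144)
for row in range(60, 100):                # white rectangle standing in for the olive
    for col in range(120, 160):
        olive[row][col] = 255

kept = select_olive_mask([background, olive], x=140, y=79)
print(kept)  # [1]
```

When more than one index is returned for the same image, several candidate masks cover the probe point, which is the kind of case that would still need a tie-break or a manual check.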

Mask corrections.

The segmentation process performed by SAM occasionally presents two types of anomalies, which are corrected using the blob_and_filling.m script developed in Matlab. These errors include background noise, which must be removed, and gaps inside the olive blob, which need to be filled (Figure 6d). The script scans each folder resulting from the previous segmentation and selection process and, if necessary, converts the image to grayscale. It then identifies the largest blob and applies a filtering process to eliminate background noise or fill voids within the selected blob (Figure 6d). These operations are similar to erosion or filling techniques. The final result is a single binarized segmentation mask per olive image captured during the pitting process (Figure 6e). The processed images are stored in two output subfolders, one for each variety, as the final segmentation output.
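The two corrections described above (keeping the largest blob and filling its internal gaps) can be illustrated with a short flood-fill sketch. It is written in plain Python rather than MATLAB and is not the code of blob_and_filling.m; the function names and the choice of 4-connectivity are assumptions made for the example.

```python
from collections import deque


def flood(grid, start_points, target):
    """Cells equal to `target` that are reachable from start_points (4-connectivity)."""
    height, width = len(grid), len(grid[0])
    seen, queue = set(), deque()
    for r, c in start_points:
        if grid[r][c] == target and (r, c) not in seen:
            seen.add((r, c))
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < height and 0 <= nc < width:
                if grid[nr][nc] == target and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen


def largest_blob_filled(mask):
    """Keep the largest white blob of a 0/255 mask and fill the holes inside it."""
    height, width = len(mask), len(mask[0])
    best, visited = set(), set()
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 255 and (r, c) not in visited:
                blob = flood(mask, [(r, c)], 255)
                visited |= blob
                if len(blob) > len(best):
                    best = blob
    clean = [[255 if (r, c) in best else 0 for c in range(width)] for r in range(height)]
    # Black pixels connected to the image border are true background;
    # any other black pixel is a hole inside the blob and is turned white.
    border = [(r, c) for r in range(height) for c in range(width)
              if r in (0, height - 1) or c in (0, width - 1)]
    outside = flood(clean, border, 0)
    return [[0 if (r, c) in outside else 255 for c in range(width)] for r in range(height)]


# Toy mask: a ring with one hole in the middle plus an isolated speck on the right.
noisy = [
    [0,   0,   0,   0, 0, 0,   0],
    [0, 255, 255, 255, 0, 0,   0],
    [0, 255,   0, 255, 0, 0, 255],
    [0, 255, 255, 255, 0, 0,   0],
    [0,   0,   0,   0, 0, 0,   0],
]
fixed = largest_blob_filled(noisy)
white_pixels = sum(value == 255 for row in fixed for value in row)
print(white_pixels)  # 9
```

In the toy mask the speck is removed and the central hole is filled, leaving a solid 3 × 3 blob.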

Morphological analysis.

The ellipse_3.ipynb script was developed to perform a morphological analysis of the olives, extracting key parameters, such as size, shape and orientation (Figure 6f). This Google Colab application processes the segmented images and first analyzes pixel values in a binary space to detect the olive’s contour (pixel values: black = 0, white = 255). Due to the symmetry and regularity of the olives, they can be represented as an ellipse, which is fitted to the contour using the cv2.findContours/cv2.fitEllipse functions from the OpenCV library (Figure 6f).

The superimposed ellipse provides an alternative to measuring the area by counting white pixels. Additionally, the script locates the geometric center, detects the endpoints of the axes, and overlays the major axis (red) and minor axis (green). This enables the determination of the minor axis orientation relative to the horizontal axis, set from the ellipse’s coordinate center. The output includes values for area, angle, major axis length, minor axis length and the axis ratio for each binarized image from the pitting process. All extracted data are stored in a “report.txt” file for each variety for subsequent statistical analysis (Figure 6g), while images with overlaid ellipses and angles are saved in a default folder as proof of the processing results.
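To make the extracted parameters concrete, the sketch below derives area, center, axis lengths, axis ratio and orientation from a binary mask. It deliberately swaps in a different technique: instead of the cv2.findContours/cv2.fitEllipse calls used by ellipse_3.ipynb, it computes the moment-based equivalent ellipse in plain Python, so its numbers may differ slightly from a contour fit. The function name, the returned keys and the angle convention (major axis measured from the image x axis, in [0, 180)) are assumptions of this example.

```python
import math


def ellipse_from_mask(mask):
    """Moment-based equivalent ellipse of the white (255) pixels of a binary mask."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v == 255]
    area = len(points)                                   # area by counting white pixels
    cx = sum(x for x, _ in points) / area
    cy = sum(y for _, y in points) / area
    mu20 = sum((x - cx) ** 2 for x, _ in points) / area  # variance along x
    mu02 = sum((y - cy) ** 2 for _, y in points) / area  # variance along y
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / area
    mean = (mu20 + mu02) / 2.0
    spread = math.hypot((mu20 - mu02) / 2.0, mu11)
    # For a filled ellipse the variance along a semi-axis a is a^2 / 4,
    # so the full axis length is 4 * sqrt(variance).
    major = 4.0 * math.sqrt(mean + spread)
    minor = 4.0 * math.sqrt(max(mean - spread, 0.0))
    # Major-axis angle from the image x axis, in [0, 180); image rows grow downwards.
    angle = math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02)) % 180.0
    return {"area": area, "center": (cx, cy), "major": major,
            "minor": minor, "angle": angle, "ratio": major / minor}


# Toy mask: an upright ellipse with semi-axes 10 px (horizontal) and 20 px (vertical).
width, height, cx0, cy0 = 60, 60, 30, 30
mask = [[255 if ((x - cx0) / 10.0) ** 2 + ((y - cy0) / 20.0) ** 2 <= 1.0 else 0
         for x in range(width)] for y in range(height)]
result = ellipse_from_mask(mask)
print(result)
```

For the toy ellipse the recovered axes are close to 40 px and 20 px and the major axis is reported near 90°, that is, vertical in the image.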

Statistical analysis.

The generated data were analyzed statistically using the histogram.m script in Matlab (Figure 6g). This script imports the morphological parameters stored in the report_hojiblanca.txt and report_black.txt files and generates:

-: A histogram (100-bin) for total size (area) values;
-: A histogram (100-bin) for orientation (angle) distribution.

These histograms, generated using Matlab functions, can be graphically downloaded for analysis upon completion. Additionally, the normal distribution curve was calculated from the mean and standard deviation values.
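The histogram and the superimposed normal curve can be reproduced in a few lines. The sketch below is a Python stand-in for histogram.m, not its actual code. The data are synthetic: they are drawn from a normal law using the Hojiblanca mean and variance reported in Section 3.2 purely to have something to plot, and the function names are hypothetical.

```python
import math
import random
import statistics


def histogram(values, bins=100):
    """Equal-width histogram: returns (counts, bin_edges)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return counts, edges


def normal_curve(edges, mean, std, n):
    """Expected count per bin under a normal law with the given mean and std."""
    width = edges[1] - edges[0]
    curve = []
    for left in edges[:-1]:
        center = left + width / 2.0
        density = math.exp(-0.5 * ((center - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        curve.append(n * width * density)
    return curve


# Synthetic areas only (not the measured data): normal law with the reported
# Hojiblanca mean (3508 px) and variance (68,350 px^2).
random.seed(0)
areas = [random.gauss(3508.0, math.sqrt(68350.0)) for _ in range(9150)]

counts, edges = histogram(areas, bins=100)
mean, std = statistics.mean(areas), statistics.stdev(areas)
curve = normal_curve(edges, mean, std, len(areas))
print(sum(counts), round(mean), round(std))
```

Scaling the density by the sample size and the bin width puts the curve on the same vertical scale as the bar counts, which is what allows the two to be overlaid.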

A final script, correlation_matrix.py, generates a correlation matrix from the report.txt data using the correlation_matrix function, which can be exported as an .xlsx file. This step analyzes the correlations between extracted parameters, such as area, angle, axis lengths and axis ratio, with a particular focus on the correlation between angle and other parameters to evaluate the initial hypothesis.
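As an illustration of this step, the sketch below builds a Pearson correlation matrix over the five reported parameters. It is a plain-Python example rather than the code of correlation_matrix.py; the rows are hypothetical values standing in for report.txt, and the function names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Nested dict table[a][b] holding the Pearson r between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical rows standing in for report.txt: (area, angle, major, minor).
rows = [
    (3310, 12.0, 78.0, 54.0),
    (3525, 171.5, 80.5, 55.8),
    (3490, 4.2, 79.9, 55.6),
    (3702, 93.0, 83.1, 56.7),
    (3655, 21.8, 82.0, 56.8),
    (3401, 160.3, 79.0, 54.8),
]
columns = {
    "area": [r[0] for r in rows],
    "angle": [r[1] for r in rows],
    "major": [r[2] for r in rows],
    "minor": [r[3] for r in rows],
    "ratio": [r[2] / r[3] for r in rows],
}
table = correlation_table(columns)
for name in columns:
    print(name, [round(table[name][other], 2) for other in columns])
```

The row of interest for the hypothesis under test is the one for angle: values close to zero against area and axis lengths would indicate that orientation is not linearly related to size.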

3. Results

3.1. Performance of SAM in the Google Colab (SAM) + Matlab Application

The unsupervised zero-shot segmentation using SAM enabled the generation of a set of binarized masks with well-defined contours, where at least one of the generated masks corresponded to the actual olive (Figure 7). For this operation, a GPU rental version was chosen over the demo option, considering the high computational cost associated with processing the large number of images in the series and the characteristics of the processing equipment (a conventional PC). This rental option, available on Google Colaboratory (Google Colab), offers greater capacity and speed compared to the free version. As a result, the total execution time was approximately six hours for processing 9150 Hojiblanca images, as an example.
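A short calculation puts this execution time in perspective. The sketch below only uses figures stated in the article (about six hours for 9150 images, and the upper machine speed of 2500 olives/min from Section 2.2); because the six-hour figure is approximate, the results are order-of-magnitude estimates.

```python
images = 9150
total_seconds = 6 * 3600                  # "approximately six hours"
seconds_per_image = total_seconds / images

line_rate_per_min = 2500                  # upper machine speed quoted in Section 2.2
seconds_per_olive = 60 / line_rate_per_min

slowdown = seconds_per_image / seconds_per_olive
print(round(seconds_per_image, 2))        # 2.36 s per image offline
print(round(seconds_per_olive, 3))        # 0.024 s available per olive on the line
print(round(slowdown))                    # 98
```

Under these assumptions the offline segmentation is roughly two orders of magnitude slower than the line itself, which is consistent with treating the present workflow as an offline diagnostic.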

By selecting the unsupervised mode (“Everything”), it was necessary to complement the segmentation process with post-segmentation filtering using the selector_BN.m script. This step was implemented to reduce the multiplicity of images in the automatic output folder generated by SAM. After execution, 98 duplicated mask cases (1.1%) were identified, requiring manual correction (examples: Figure 8, mask_0_8.bmp and mask_9_8.bmp, the latter containing an external pixel).

Some of the binarized images resulting from the segmentation process exhibited anomalies (Figure 9), including background noise or voids within the olive mask. These were filtered using the blob_and_filling.m script, ultimately ensuring optimal segmentation in 100% of the cases.

At this stage, 1305 binarized images of oxidized black Gordal olives segmented from their metallic background (Figure 10a) with dimensions of 320 × 240 pixels are available. Under the same conditions, there are 9150 binarized masks of the green Hojiblanca variety (Figure 10b) with dimensions of 176 × 144 pixels. These are stored in two separate folders corresponding to those downloaded directly from the CV system.

3.2. Diagnosis of the Pitting Operation

Using the segmented images, data were extracted on the area (pixels²), as well as the values for the major axis length (pixels), major axis orientation (angle with respect to the 0–180° axis), minor axis length (pixels) and the ratio between both axes for each olive processed by the machinery during the pitting operation, using the histogram.m script.

The application generates, on one hand, an accumulated size distribution histogram for each processed variety (Figure 11 left and Figure 12 left), where the x-axis represents the distribution of detected sizes in ascending order, grouped into 100-bin segments, while the y-axis shows their cumulative frequency. A normal distribution curve, calculated from the mean and variance values, is superimposed on this histogram.

Simultaneously, a histogram was generated to analyze the total distribution of major axis orientations for all olives (Figure 11 right and Figure 12 right). In this case, the detected angles are grouped into 100-bin segments along the x-axis, while the y-axis represents their frequency.

Gordal exhibits an average size of 24,154 pixels (<25,500 pixels, caliber 80) and a variance of 5,783,792 pixels². The mean value differs from the nominal caliber, and the variance presents a margin of up to 2–3 times compared to the nominal caliber, with irregularities observed between the histogram and the normal distribution curve (Figure 11 left). These observations are made under a strict criterion, assuming the ideal scenario where all olives entering the machine match the nominal caliber, in this case, 80. The boat error can be evaluated within the [80, 100] segments of the angle histogram [8] (Figure 11 right). A total of 14 cases (1.1%) were identified with olives oriented within this range, placing this error below the 2% threshold currently accepted for pitting operations.

Hojiblanca (Figure 12), with a sample size 6–7 times larger than Gordal, shows a mean of 3508 pixels and a variance of 68,350 pixels²; its size distribution appears approximately Gaussian (Figure 12, left), although subsequent tests reveal departures from strict normality. These results indicate that the calibration process prior to system entry was more precisely controlled. The boat error for this variety is measured at 1.6%, remaining below the 2% threshold.
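The boat-error percentage is a simple count of orientations inside the [80, 100] window. The sketch below shows that count on a hypothetical list of angles and then checks the Gordal figure quoted above (14 cases out of 1305 images); the function name and the inclusive bounds are assumptions of this example.

```python
def boat_error_rate(angles_deg, low=80.0, high=100.0):
    """Count orientations inside [low, high] degrees; return (count, percentage)."""
    count = sum(1 for angle in angles_deg if low <= angle <= high)
    return count, 100.0 * count / len(angles_deg)


# Hypothetical orientation list: four olives, one of them inside the window.
count, percent = boat_error_rate([3.5, 91.0, 172.0, 45.2])
print(count, percent)                   # 1 25.0

# Consistency check with the reported Gordal figures: 14 cases out of 1305 images.
gordal_percent = 100.0 * 14 / 1305
print(round(gordal_percent, 1))         # 1.1
```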

3.2.1. Normality Testing of Olive Size Distributions

To support the use of Gaussian fits in Figure 11 and Figure 12, we assessed the normality of the area (pixel) distributions obtained for both varieties. Four complementary goodness-of-fit tests were applied in MATLAB® R2022b (Table 3):

-: Shapiro–Wilk (SW): most powerful for n ≤ 5000. For the larger Hojiblanca set (n = 9150), a simple random sample of 5000 values was evaluated, as recommended in ISO 13528 [38] for very large datasets.
-: D’Agostino–Pearson omnibus K² (DP): combines skewness and kurtosis; implemented via a helper function (dagostinoK2).
-: Anderson–Darling (AD): built-in adtest, sensitive to deviations in the tails.
-: Jarque–Bera (JB): built-in jbtest; quick skewness–kurtosis check.

Skewness (sk) and Pearson kurtosis (ku, normal = 3) were also reported to quantify shape.
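The shape statistics and the simplest of the four tests can be written out explicitly. The sketch below computes skewness, Pearson kurtosis and the Jarque–Bera statistic in plain Python, and draws a simple random sample of 5000 values as described for the Shapiro–Wilk test. It is not the MATLAB code used in the study: the p-value uses the asymptotic chi-square approximation with two degrees of freedom, which may differ from what a statistics package reports for small samples, and the data are synthetic.

```python
import math
import random


def shape_statistics(values):
    """Return (skewness, Pearson kurtosis) from central moments; a normal law gives (0, 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(values):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 d.o.f.)."""
    n = len(values)
    sk, ku = shape_statistics(values)
    statistic = n / 6.0 * (sk ** 2 + (ku - 3.0) ** 2 / 4.0)
    p_value = math.exp(-statistic / 2.0)   # survival function of chi-square with 2 d.o.f.
    return statistic, p_value


# Small hand-checkable case: symmetric data, so the skewness is exactly zero.
sk_small, ku_small = shape_statistics([1, 2, 3, 4, 5])
print(sk_small, round(ku_small, 2))        # 0.0 1.7

# Synthetic stand-in for the 9150 Hojiblanca areas, then a simple random sample of 5000.
random.seed(1)
areas = [random.gauss(3508.0, 261.0) for _ in range(9150)]
subsample = random.sample(areas, 5000)
statistic, p_value = jarque_bera(subsample)
print(len(subsample), round(statistic, 2), round(p_value, 3))
```

Because the statistic grows linearly with n, even a mild excess kurtosis becomes significant in a sample of several thousand values, which is the sensitivity effect discussed in the interpretation of the Hojiblanca results.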

3.2.2. Interpretation

The statistical evidence confirms that the Gordal size distribution is fully compatible with a normal law: all four tests return p > 0.05, and both skewness (–0.11) and kurtosis (2.95) remain practically Gaussian. For Hojiblanca, the very large sample (n ≈ 9000) makes the normality tests highly sensitive; thus, Shapiro–Wilk, D’Agostino–Pearson, Anderson–Darling and Jarque–Bera all yield p < 0.01. Even so, the deviation is mild (skewness = 0.06; kurtosis = 3.79), producing only a slight right-tail elongation. For routine caliber monitoring we therefore retain the mean and standard deviation as summary indicators—visualized by the bell-shaped curve in Figure 12—while acknowledging that a non-parametric control chart could be adopted in future work should stricter adherence to normality become necessary.

3.3. Validation of SAM Segmentation Accuracy

A two-tier validation protocol was implemented:

-: Area-level agreement provides a rapid assessment of systematic size bias.
-: Pixel-wise agreement offers a stringent evaluation of spatial overlap, quantified by intersection-over-union (IoU), Dice coefficient, precision and recall.

Both tiers are applied separately to each cultivar: the Gordal dataset (n ≈ 1608 masks) and the Hojiblanca dataset (n = 9150 masks) are evaluated with the same pipeline.

3.3.1. Metrics and Computation Workflow

1.: Ground-truth masks (manual) were extracted from the red-outlined images in contornos_Gordal and contornos_Hojiblanca by thresholding the white interior (R, G, B > 200).
2.: SAM masks were those stored in aceitunas_Gordal_mascara_relleno and aceitunas_Hojiblanca_mascara_relleno.
3.: Masks were mutually trimmed to the common FoV to prevent size mismatches.
4.: For each pair we recorded:

Two binary masks are analyzed: the SAM prediction $M_{SAM}$ and the ground-truth annotation $M_{GT}$. Let the logical operators ∧ (AND) and ¬ (NOT) act pixel-wise, and let ∑ denote the sum over all pixels in the common field-of-view. We define

$TP = \sum (M_{SAM} \land M_{GT})$: true-positive pixels (olive correctly identified).
$FP = \sum (M_{SAM} \land \neg M_{GT})$: false positives (background labelled as olive).
$FN = \sum (\neg M_{SAM} \land M_{GT})$: false negatives (olive pixels missed by SAM).

From these counts, we derive the standard overlap and detection metrics:

$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$;

$\mathrm{Dice} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$;

$\mathrm{Precision} = \frac{TP}{TP + FP}$;

$\mathrm{Recall} = \frac{TP}{TP + FN}$.

IoU (intersection-over-union) and Dice quantify spatial overlap, whereas precision and recall disentangle over- from under-segmentation.

5.: Area metrics (MAE, MAPE, RMSE, Pearson r, $R_{y = x}^{2}$ ) compared the pixel counts of each mask.
6.: The analysis was performed with two MATLAB scripts: pixel_metrics_SAM_vs_manual.m and valida_area_SAM.m.
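The pixel-wise metrics defined in step 4 can be computed directly from the TP, FP and FN counts. The sketch below is a plain-Python illustration, not the code of pixel_metrics_SAM_vs_manual.m; it assumes both masks are 0/255 grids already trimmed to the same size, and the toy masks are hypothetical.

```python
def overlap_metrics(mask_sam, mask_gt):
    """IoU, Dice, precision and recall between two equally sized 0/255 masks."""
    tp = fp = fn = 0
    for row_sam, row_gt in zip(mask_sam, mask_gt):
        for value_sam, value_gt in zip(row_sam, row_gt):
            sam, gt = value_sam == 255, value_gt == 255
            tp += sam and gt            # olive pixel found by SAM
            fp += sam and not gt        # background labelled as olive
            fn += gt and not sam        # olive pixel missed by SAM
    return {
        "iou": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Toy pair: the SAM mask covers the whole ground truth plus two extra pixels.
ground_truth = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
]
sam_mask = [
    [0,   0,   0,   0],
    [0, 255, 255, 255],
    [0, 255, 255, 255],
]
metrics = overlap_metrics(sam_mask, ground_truth)
print({name: round(value, 3) for name, value in metrics.items()})
# {'iou': 0.667, 'dice': 0.8, 'precision': 0.667, 'recall': 1.0}
```

The toy case has TP = 4, FP = 2 and FN = 0. Whenever FN = 0, the IoU and precision formulas coincide, so a recall of 1.00 implies that precision equals IoU.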

3.3.2. Results for Gordal

The Gordal set shows very high pixel-wise concordance between SAM and the manual ground truth. The mean IoU is 0.963 ± 0.003, and Dice is 0.981 ± 0.002, with recall = 1.00 (no false negatives) and precision mirroring IoU (≈3% over-segmentation). Area-level metrics confirm the good agreement (MAPE = 3.9% and RMSE ≈ 912 px²), while the Pearson correlation of areas is virtually perfect (r > 0.98). The coefficient of determination against the identity line, $R_{y=x}^{2}$ = 0.88, indicates a small but systematic positive bias (≈4%), much lower than for Hojiblanca (Table 4).

3.3.3. Results for Hojiblanca

The Hojiblanca subset (n = 9150 pairs) also exhibits strong agreement between SAM and the manual ground truth, although its performance is slightly below that of the larger-bodied Gordal olives. The average IoU is 0.904 ± 0.004, and Dice is 0.950 ± 0.002; recall is again 1.00 (no false negatives), while precision equals IoU, indicating an over-segmentation of ≈9% concentrated at the two-pixel halo.

At the area level, the error remains modest (MAPE ≈ 7.6%, RMSE ≈ 247 px²), and the correlation of areas is virtually perfect (r ≳ 0.98).

The coefficient of determination with respect to the identity line, $R_{y=x}^{2}$ = 0.04, reveals a uniform positive bias of ~7%; this bias is larger than in Gordal because the fixed two-pixel rim represents a greater fraction of the smaller Hojiblanca fruit (Table 5).

The very high Pearson coefficient is expected because size variability among Hojiblanca olives (σ ≈ 1000 px²) dwarfs the systematic 7% bias. The low $R_{y=x}^{2}$ simply reflects that uniform bias.
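The contrast between a near-perfect Pearson r and a low coefficient of determination against the identity line can be reproduced with synthetic numbers. The article does not spell out the formula for $R_{y=x}^{2}$, so the sketch below assumes the usual form 1 − SS_res/SS_tot, with residuals taken against the line y = x and the total sum of squares taken around the mean of the manual areas. The five area values and the exact 7% inflation are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r2_identity(manual, predicted):
    """Assumed definition: 1 - SS_res / SS_tot, with residuals against the line y = x."""
    mean_manual = sum(manual) / len(manual)
    ss_res = sum((p - m) ** 2 for m, p in zip(manual, predicted))
    ss_tot = sum((m - mean_manual) ** 2 for m in manual)
    return 1.0 - ss_res / ss_tot


def mape(manual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - m) / m for m, p in zip(manual, predicted)) / len(manual)


# Synthetic areas (px): every predicted area is inflated by a uniform 7 %.
manual_areas = [3200.0, 3350.0, 3500.0, 3650.0, 3800.0]
sam_areas = [1.07 * area for area in manual_areas]

r = pearson(manual_areas, sam_areas)
r2 = r2_identity(manual_areas, sam_areas)
error = mape(manual_areas, sam_areas)
print(round(r, 3), round(r2, 2), round(error, 1))
```

Pearson r is unaffected by a uniform rescaling and stays at 1, whereas the identity-line coefficient is penalized by every pixel of bias and, in this synthetic case, even turns negative.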

3.3.4. Gordal—Ellipse-Fitting Error Analysis (Automatic vs. Manual Ground Truth)

The Gordal set was evaluated by confronting each manual mask with the ellipse automatically traced in OpenCV (Table 6).

Five complementary metrics were computed: intersection-over-union of the ellipse (IoU_ellipse), absolute errors of the semi-axes (MAE_a, MAE_b), aspect-ratio bias (MAE_AR) and angular root-mean-square error (RMSE_θ°). The results demonstrate an almost perfect match between the automatic ellipse and the manual ground truth.
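The axis and angle errors listed above can be sketched as follows. The example is in plain Python with hypothetical ellipse parameters. It treats orientations as axial quantities, so that 179° and 1° are considered 2° apart; whether the original evaluation wraps angles in this way is not stated in the text and is an assumption here.

```python
import math


def angular_difference(auto_deg, manual_deg):
    """Signed difference between two axis orientations, wrapped to [-90, 90)."""
    return (auto_deg - manual_deg + 90.0) % 180.0 - 90.0


def ellipse_errors(auto, manual):
    """MAE of the semi-axes and aspect ratio, and angular RMSE, over paired (a, b, angle) tuples."""
    n = len(auto)
    mae_a = sum(abs(p[0] - q[0]) for p, q in zip(auto, manual)) / n
    mae_b = sum(abs(p[1] - q[1]) for p, q in zip(auto, manual)) / n
    mae_ar = sum(abs(p[0] / p[1] - q[0] / q[1]) for p, q in zip(auto, manual)) / n
    rmse_theta = math.sqrt(
        sum(angular_difference(p[2], q[2]) ** 2 for p, q in zip(auto, manual)) / n
    )
    return {"MAE_a": mae_a, "MAE_b": mae_b, "MAE_AR": mae_ar, "RMSE_theta": rmse_theta}


# Hypothetical (semi-major, semi-minor, angle in degrees) pairs.
automatic = [(40.5, 27.0, 179.0), (39.0, 26.5, 12.0)]
manual = [(40.0, 27.5, 1.0), (39.5, 26.0, 10.0)]

errors = ellipse_errors(automatic, manual)
print(angular_difference(179.0, 1.0))   # -2.0
print(errors)
```

Without the wrap-around, the first pair would contribute an angular error of 178° instead of 2°, which would dominate the RMSE.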

3.3.5. Hojiblanca—Ellipse-Fitting Error Analysis (Automatic vs. Manual Ground Truth)

The results for the Hojiblanca variety are presented in Table 7.

The automatically generated ellipse reproduces the manual geometry almost perfectly: sub-pixel errors on both semi-axes (≤1 px) and an IoU of approximately 0.95 confirm the accuracy of the fitting procedure.

3.3.6. Post-Processing Bias Analysis and Quantitative Impact on Segmentation Accuracy

After the raw SAM output is produced, three deterministic phases are applied: (i) mask selector, which keeps the single mask that overlaps the known olive position in the pitting cup; (ii) hole filling and speck removal, which turns every black pixel inside the olive white and deletes isolated white pixels outside the edge; and (iii) ellipse fitting, which overlays an OpenCV ellipse and records its axes and orientation. No morphological erosion or dilation operations are used. Table 8 summarizes the potential biases of each step and quantifies them with the metrics already reported for Gordal and Hojiblanca.

Overall, these post-processing operations shift the segmented area by no more than 10% and decrease IoU by less than 0.02, falling well within the tolerances required for the application.
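The first two post-processing phases described in this subsection can be sketched with plain NumPy and a breadth-first flood fill. This is an illustrative reimplementation under stated assumptions, not the authors' MATLAB code: the cup centre is assumed to be known in pixel coordinates and to fall on an olive pixel, 4-connectivity is assumed, and the ellipse-fitting phase (done with OpenCV in the study) is omitted.

```python
from collections import deque
import numpy as np

def flood(mask, seeds):
    """4-connected region of True pixels in `mask` reachable from `seeds`."""
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    queue = deque()
    for r, c in seeds:
        if mask[r, c] and not out[r, c]:
            out[r, c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not out[rr, cc]:
                out[rr, cc] = True
                queue.append((rr, cc))
    return out

def select_mask(masks, cup_center):
    """Phase (i): return the first mask covering the expected olive position."""
    r, c = cup_center
    for m in masks:
        if m[r, c]:
            return m
    return None

def clean_mask(mask, cup_center):
    """Phase (ii): drop specks outside the olive and fill holes inside it."""
    mask = np.asarray(mask, dtype=bool)
    olive = flood(mask, [cup_center])        # blob that contains the cup centre
    h, w = olive.shape
    border = [(r, c) for r in range(h) for c in (0, w - 1)]
    border += [(r, c) for c in range(w) for r in (0, h - 1)]
    outside = flood(~olive, border)          # background reachable from the frame
    return ~outside                          # olive plus its enclosed holes

# Toy mask: a 5 x 5 blob with one hole and one isolated speck in a corner.
raw = np.zeros((10, 10), dtype=bool)
raw[3:8, 3:8] = True
raw[4, 4] = False    # hole inside the olive
raw[0, 9] = True     # speck outside the olive
cleaned = clean_mask(raw, (5, 5))
```

Because the result is defined as everything not reachable from the image frame, no erosion or dilation is involved and the outer contour of the selected blob is left untouched.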

3.3.7. Comparative Benchmark with an HSV-Based OpenCV Baseline Under Ideal Lighting

Deep-learning segmenters, such as UNet, DeepLab v3+ or Mask R-CNN, are deliberately excluded from the quantitative benchmark because their deployment would negate the main advantage of the present workflow—namely operating without any hand-labelled data. Training a lightweight UNet on Gordal olives alone would require at least 500 pixel-accurate masks (≈8 h of expert annotation) and ≈45 min of GPU fine-tuning; a larger backbone (ResNet-101 + DeepLab) pushes that effort beyond 2000 masks and several GPU-hours. In contrast, zero-shot SAM achieves ≥ 0.90 IoU across all lighting conditions with zero annotation and zero training. We therefore restrict the head-to-head experiment to the HSV-based OpenCV baseline proposed by Gandul et al. (2025) [39], as it is the only alternative that, like SAM, can be deployed immediately without a dedicated training set (see Table 9).

3.3.8. Analysis

Under the exact lighting conditions assumed by the classical OpenCV pipeline—diffuse white LED illumination and a neutral background—the HSV-based approach achieves virtually perfect segmentation (IoU ≈ 0.99) at real-time CPU speed. Zero-shot SAM trails by <2 pp in IoU and Dice yet sustains a recall of 1.00 and requires neither parameter tuning nor hand-labelled data. Because the HSV thresholds break down as soon as illumination drifts or when grayscale Hojiblanca frames are processed, we use it solely as an ideal baseline. SAM, in contrast, offers the best accuracy-to-engineering-effort ratio for inline table olive inspection, maintaining ≥ 0.90 IoU across the full, unconstrained dataset without any additional training.

3.4. Hypothesis of Correlation Between Boat Error and Size

Finally, a correlation matrix between parameters was obtained using the correlation_matrix.py script (Table 10 and Table 11). This matrix represents the area (A), the olive orientation (a), the major axis length (X), the minor axis length (x) and the axis ratio (X/x). To assess the correlation between orientation and the morphological aspects of size and shape, the row corresponding to parameter “a” should be compared against the others (A, X, x and X/x). Values close to 0 indicate low correlation, whereas values near ±1 suggest a strong direct or inverse correlation.

In the case of Gordal (Table 10), all values are close to ‘0’, with correlation below 12% for the area and between 5 and 20% for the axis parameters. The Hojiblanca variety (Table 11), following the same analysis procedure, shows even lower correlation values, below 5% for the area and between 2 and 9% for the axis-related parameters.
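A correlation matrix of this kind can be obtained in a few lines with NumPy. The sketch below is not the authors' correlation_matrix.py script; it uses synthetic measurements in which the orientation is drawn independently of size, and only the variable labels (A, a, X, x, X/x) follow the text.

```python
import numpy as np

def correlation_table(area, angle, major, minor):
    """Pearson correlation matrix of A, a, X, x and X/x (rows = variables)."""
    names = ["A", "a", "X", "x", "X/x"]
    major = np.asarray(major, dtype=float)
    minor = np.asarray(minor, dtype=float)
    data = np.vstack([area, angle, major, minor, major / minor])
    return names, np.corrcoef(data)

# Synthetic olives: axes in pixels, orientation independent of size.
rng = np.random.default_rng(0)
major = rng.normal(200.0, 15.0, 500)
minor = major * rng.normal(0.75, 0.03, 500)
area = np.pi / 4.0 * major * minor
angle = rng.uniform(0.0, 180.0, 500)

names, corr = correlation_table(area, angle, major, minor)
row_a = dict(zip(names, corr[names.index("a")]))   # orientation vs. the rest
```

Reading the row for "a" against the columns A, X, x and X/x is the comparison described in the text; for independent variables these entries scatter around zero, with a spread that shrinks as the number of olives grows.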

4. Discussion

The zero-shot application of the SAM model to an image analysis process has enabled the segmentation of olive images into masks, captured directly from the carrier chain during an industrial pitting operation, bypassing the complexity of developing a task-specific ANN [8,10,12] and stages commonly dependent on algorithmic development [4,8,10,40] or manual interventions [4,40], thereby addressing a significant bottleneck. Furthermore, the segmentation results demonstrated good definition, evaluated based on the contours of the olive relative to the background in images that, due to the absence of color or low resolution, frequently encounter identification challenges in industrial analysis operations based on OpenCV. In these cases, SAM has performed well, even without special attention to scene conditions, such as lighting, background or framing. These operations are currently entrusted to more advanced and precise technologies that offer advantages, such as self-learning architectures, including CNN or U-Net [8,9,10,12,40]. However, these approaches incur high design costs and long processing times and often require several iterations before a satisfactory solution is obtained. Furthermore, they are typically tied to a single case study: when the product changes or the system is installed on a different machine, a new dataset must be collected and annotated, and the model must be redesigned, retrained and re-optimized.

Although SAM itself is a large vision-transformer neural network, it is released as a frozen foundation model; therefore, no additional architecture design, training or tuning is required for our task. SAM thus removes the burden of developing a task-specific ANN while still leveraging the power of deep neural networks, providing full adaptability to diverse segmentation scenarios.

Recent work in industrial vision corroborates this trend. Purohit and Dave [41] present a comparative study—“Leveraging Deep Learning Techniques to Obtain Efficacious Segmentation Results”—that benchmarks classical CNNs (U-Net, DeepLab v3+, Mask R-CNN) against pretrained transformer models in low-contrast settings. They report that foundation-style models retain high accuracy without case-specific retraining, precisely the advantage observed here with SAM. By adopting SAM as a frozen backbone, our pipeline achieves sub-pixel geometric errors and an IoU of ≈0.95 while avoiding the extensive data collection and network redesign normally required in industrial automation.

Assuming a Cartesian reference frame XYZ, the working plane of the pitting machine corresponds to the XY plane, and the conveyor carries the olives toward the pitting station along the Y-axis:

Normal position (long axis ≈ X). While advancing, the olives roll about their own longitudinal axis, which facilitates alignment.
Boat position (long axis ≈ Y). The fruit lies broad-side, does not roll and accounts for roughly 1–2% of the flow.
Pivoted position (long axis ≈ Z). The fruit stands upright on one pole; this orientation is very rare (fewer than 1 in 1000 olives).

The levelling brush mounted above the feed chain effectively corrects pivoted olives by applying a torque that forces them to roll and return to the normal position, but it is ineffective for olives travelling in boat position. Consequently, the brush alone cannot guarantee proper alignment. The machine-vision system presented in this study monitors the orientation of every olive in real time and records any fruit that reaches the pitting knives in a misaligned state, providing objective feedback for line adjustment or, if desired, for triggering external rejection mechanisms.

In addition, the raw camera images were processed without any pre-processing, such as ROI adjustment, scale normalization, RGB-channel separation or noise filtering. The study deliberately ran SAM in its “everything” mode [16] instead of prompting the actual olive with points or bounding boxes; this choice necessitated only minor post-processing of the masks. Such post-processing is not a limitation of SAM itself but a corrective step for occasional noise caused by highlights and shadows in high-illumination, color-poor scenes. While some reports [16,18,20] have questioned SAM’s utility in specialized industrial tasks without supervision, the well-defined contours obtained for olives demonstrate that the model remains highly effective and can be refined with lightweight post-processing—even at the cost of a slightly longer computation time.

Processing time remains the primary limitation for executing the model, as its high computational cost currently prevents real-time operation at rates of 40–50 images per second, restricting its use to offline processing. Nevertheless, the demo version of SAM remains open to critique and potential improvements [16,20] in both classification and segmentation, as well as processing time, and future specialized versions may emerge for industrial applications. Enhancements to the model itself, combined with the next generation of quantum computing, may eventually enable the use of SAM beyond what has been demonstrated in this experiment. In the meantime, a hybrid strategy that integrates SAM with lightweight models, such as a traditional classifier or a small ANN, could be a viable alternative.

The corrected segmented images allowed for a morphological analysis by extracting representative data on the size and shape of the processed olives. In this case, a method based on contour fitting with an ellipse was chosen, enabling straightforward measurement of parameters, such as area, axis length and orientation. The statistical data generated provides precise insights into the total size distribution and the orientations of each olive before pitting. It is well established that an orientation between 30° and 150° [8] is a determining factor in whether the olive will be properly pitted, thereby validating this approach.
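Given the per-olive orientations from the ellipse fit, the share of fruit inside or outside the 30°–150° band can be computed directly. In the sketch below, the function name is hypothetical and the band limits are taken from the text; which side of the band counts as a boat error depends on how the angle is referenced to the conveyor, which is an assumption the reader must set for their own camera geometry.

```python
import numpy as np

def share_in_band(angles_deg, low=30.0, high=150.0):
    """Percentage of orientations (taken modulo 180) inside [low, high]."""
    angles = np.asarray(angles_deg, dtype=float) % 180.0
    inside = (angles >= low) & (angles <= high)
    return float(100.0 * inside.mean())

# Four olives: two inside the band (45 and 90 degrees), two outside (0 and 170).
pct_inside = share_in_band([0.0, 45.0, 90.0, 170.0])
pct_outside = 100.0 - pct_inside
```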

Currently, the boat error is evaluated occasionally to adjust machinery settings, maintaining a minimal loss threshold of 1–2%. This is achieved by adjusting the density of a brine tank and assessing the percentage of freshly pitted olives that sink due to their weight, as they often contain whole pits or fragments. However, this method is not fully effective, requiring an additional expert assessment to evaluate the percentage of olives that may have floated due to trapped air, allowing them to reach the consumer. The alternative proposed in this study enables an exact measurement of this error, eliminating the randomness of the flotation and sensory evaluation phases. Moreover, it can be applied to any process without the need for evaluation personnel or additional equipment, preventing product loss and providing a cost-effective solution.

Additionally, assessing the range of processed sizes (Figure 11 left and Figure 12 left) represents a novel contribution, as size control is often overlooked, relying solely on effective calibration before the olives enter the DRR machine. However, this study demonstrates that such control is not always rigorously enforced, at least not to the level required by these systems, whose efficiency is maximized through precise selection of pieces and mechanical couplings [8].

The generation of size and orientation data has made it possible to evaluate whether this factor influences the increase in boat error due to a direct correlation between parameters (Table 10 and Table 11), yielding a negative result. In other words, despite the need for precise adjustments, these systems do not experience a decline in performance when processing sizes that deviate from the nominal value. This finding supports the necessity of implementing CV systems in these machines to achieve full efficiency in the short term and remove the speed limitation imposed to increase production.

4.1. Future Work

The following lines of research are planned to enable full real-time and closed-loop deployment of the system:

Operator feedback and active ejection: The current vision module can already display, in real time, the share of olives travelling in boat or pivoted positions. Showing these metrics on the HMI enables operators to fine-tune brush height or conveyor speed until the misalignment rate is minimized, thereby closing a manual feedback loop. A lightweight, quantized version of SAM (≥50 fps) could also drive an array of air nozzles located upstream of the pitting knives: any mis-oriented olive would be blown back onto the in-feed conveyor for a second pass, further reducing the fraction of off-axis fruit without interrupting line throughput.

Digital-twin integration: The same orientation data can be streamed to a digital-twin layer that supervises the pitting line and automatically adjusts variables, such as brush height or belt speed. A pertinent precedent is the framework developed for an electrical submersible pump by Don et al. [42], where continuous sensor streams are synchronized with an electromechanical model to enable predictive control. Adapting this architecture to a table olive pitter would allow the image-based orientation signals reported here to update PLC set-points in real time, closing the feedback loop and enabling data-driven optimization.

Real-time hybrid architectures: The current pipeline validates SAM offline, yet factory floor deployment demands inference latencies below 20 ms per frame. Two complementary strategies will therefore be pursued:

(i): Cascaded SAM → lightweight model: Full-size SAM will be executed once on ~5 k representative frames to generate high-quality masks that are then used to distil a compact network (e.g., a MobileNet-V3 encoder with a three-level FPN decoder). A Dice + IoU distillation loss will transfer SAM’s spatial priors, while TensorRT INT8 quantization is expected to reduce model size to <12 MB and runtime to ≈8 ms on an NVIDIA Jetson Orin-NX.
(ii): FastSAM/MobileSAM variants: Recent works, such as FastSAM [43] and MobileSAM [22], replace the heavy ViT encoder with a YOLO-N segmentation head or a ShuffleViT backbone, achieving 50–120 fps on a laptop GPU. Both variants will be benchmarked on olive images, fine-tuning only the prompt encoder (points + box) to keep the annotation cost negligible.
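The combined Dice + IoU distillation loss mentioned in strategy (i) can be written down compactly. The following NumPy sketch only evaluates the loss value for one image; equal weighting of the two terms and the smoothing constant are assumptions, and an actual training run would use an automatic-differentiation framework rather than NumPy.

```python
import numpy as np

def dice_iou_loss(student_prob, teacher_mask, eps=1e-6):
    """(1 - soft Dice) + (1 - soft IoU) between student output and SAM mask."""
    p = np.asarray(student_prob, dtype=float).ravel()   # probabilities in [0, 1]
    t = np.asarray(teacher_mask, dtype=float).ravel()   # SAM mask as 0/1 target
    inter = np.sum(p * t)
    total = np.sum(p) + np.sum(t)
    dice = (2.0 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)
    return float((1.0 - dice) + (1.0 - iou))

perfect = dice_iou_loss([1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0])   # close to 0
disjoint = dice_iou_loss([1.0, 0.0], [0, 1])                   # close to 2
```

Both terms are overlap ratios, so the loss is insensitive to the large number of background pixels, which is convenient when a single olive occupies only part of the frame.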

A hybrid schedule—FastSAM on every frame, full SAM every 30 s for self-supervised drift correction—will be evaluated to maintain IoU ≥ 0.94 while meeting the 50 fps throughput of the pitting line.
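The timing implied by this hybrid schedule can be made explicit with a short calculation. The sketch below uses the 50 fps line rate and the 30 s refresh period quoted in the text; the function names are hypothetical and the scheduling rule (full SAM on every N-th frame) is one possible reading of the proposal.

```python
def frame_budget_ms(fps=50.0):
    """Time available per frame, in milliseconds, at the given line rate."""
    return 1000.0 / fps

def full_sam_frames(n_frames, fps=50.0, period_s=30.0):
    """Indices of the frames on which the full SAM model is also executed."""
    step = max(1, int(round(fps * period_s)))   # frames between two refreshes
    return list(range(0, n_frames, step))

budget = frame_budget_ms()            # 20 ms per frame at 50 fps
refresh = full_sam_frames(3001)       # one full-SAM pass every 1500 frames
```

At these settings the heavy model runs on roughly one frame in 1500, so its latency has to be absorbed asynchronously rather than inside the per-frame budget.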

4.2. Cyber-Security Considerations

The prototype is currently deployed on an air-gapped local network inside the factory, isolating the pitting line from the corporate backbone and thus shielding production data. Remote connectivity is enabled only intermittently via a 4G router for online diagnostics. Even under this restricted topology, an IoT-enabled version must address the well-known vulnerabilities of industrial edge devices: unsecured brokers, weak TLS, device spoofing and adversarial inputs to the vision model. Aryavalli and Kumar’s survey “Safeguarding Tomorrow: Strengthening IoT-Enhanced Immersive Research Spaces with State-of-the-Art Cybersecurity” [44] reviews these threats and recommends a layered defense: device attestation, zero-trust segmentation, encrypted OPC UA or MQTT transport, secure-boot on Jetson/Edge-TPU boards, and runtime monitoring for anomalous inference patterns. These counter-measures will be adopted in the next project phase so that feedback signals, digital-twin commands and potential air-jet actuations remain authenticated and tamper-proof, even when 4G connectivity is temporarily enabled.
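As a small illustration of what "authenticated and tamper-proof" means for a single command message, the sketch below signs and verifies a payload with an HMAC from the Python standard library. It is not part of the described prototype: the message fields and key are hypothetical, key provisioning and replay protection are out of scope, and it does not replace the transport-level measures listed above.

```python
import hashlib
import hmac
import json

def sign_command(command, key):
    """Serialize a command deterministically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(command, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_command(message, key):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message["payload"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

# Hypothetical air-jet actuation command and shared secret.
key = b"example-shared-secret"
msg = sign_command({"frame": 1024, "nozzle": 3}, key)
ok = verify_command(msg, key)

# Any change to the payload invalidates the tag.
tampered = dict(msg, payload=msg["payload"].replace("3", "4"))
rejected = not verify_command(tampered, key)
```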

4.3. Beyond Incremental Change

Classical table olive graders rely on bespoke CNNs or hand-tuned morphology that must be retrained, re-lit and re-calibrated whenever the cultivar, camera or conveyor geometry changes. In contrast, the pipeline proposed here leverages a frozen foundation model (SAM) that transfers zero-shot to two morphologically distinct cultivars (Gordal vs. Hojiblanca) without additional labels, delivers sub-pixel geometric accuracy (IoU ≥ 0.94) under harsh specular lighting and integrates seamlessly with deterministic post-processing to recover millimetric morphometrics (area, ellipse, major axis orientation). The ability to generalize across cultivars, cup designs and illumination regimes, while maintaining real-time throughput, positions SAM not as an incremental replacement for U-Net or Mask R-CNN but as a paradigm shift: segmentation moves from task-specific model engineering to prompt-based configuration. This decouples algorithm development from plant maintenance, lowers the entry barrier for small processors and creates a reusable foundation for digital-twin feedback loops and predictive control across the broader agri-food sector.

5. Conclusions

The use of the SAM model by META AI helps mitigate the segmentation bottleneck in industrial visual inspection operations, such as in the olive pitting process, by providing a ready-to-use ANN that requires no development. In this specific case, SAM has demonstrated its ability to adapt without any task-specific training or retraining when applied to two types of images that are challenging for conventional OpenCV-based systems: black olives against a metallic gray background and grayscale images of Hojiblanca olives. The model stands out for its high definition and versatility. Additionally, these images were processed without any preprocessing, exactly as they were captured by the camera. The segmentation errors were minimal and could be easily corrected using complementary MATLAB scripts, positioning this demo version as a viable, cost-effective and hardware-efficient alternative for certain industrial applications. Future improvements in both the model and hardware could make it a highly effective solution by providing fundamental image analysis operations, such as segmentation, pre-solved in advance, making it particularly valuable if it could operate at machine speed (40–50 images per second).

The segmentation of olive images immediately before pitting, using this simple, alternative and optimized method, enables the extraction of size and orientation data for each olive, facilitating a more rigorous diagnostic approach compared to the conventional method based on the flotation tank and expert assessment. This method can be applied in every pitting operation, recognizing parameters, such as area, axis length and orientation. Additionally, the generated data have, for the first time, identified the absence of correlation between miscalibration and boat error, demonstrating that DRR machines do not experience a decline in performance when processing sizes different from the nominal caliber (for which the machine was adjusted), despite their precise calibration requirements [8]. This finding supports the integration of CV systems alongside other automated solutions as a strategy to achieve full efficiency (100%) in the short term while eliminating the imposed speed limitation, preventing an increase in error and enhancing the productivity of these mechanical systems.

This initiative represents an optimized alternative method of interest for modernizing the table olive industry. However, given the general-purpose nature of SAM, it is also fully applicable to other automated processing operations in the agri-food sector, being accessible remotely at any time via the internet.

Author Contributions

Conceptualization, A.M.-L.; methodology, A.M.-L. and L.V.G.; software, A.M.-L. and L.V.G.; validation, A.M.-L., L.V.G. and J.M.M.-L.; formal analysis, A.M.-L. and L.V.G.; investigation, L.V.G. and M.C.L.-G.; data curation, L.V.G.; resources, J.M.M.-L., M.C.L.-G. and M.J.G.-O.; writing—original draft preparation, L.V.G., A.M.-L. and M.J.G.-O.; writing—review and editing, L.V.G., A.M.-L. and M.J.G.-O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data that support the findings of this study are openly available in Dropbox at https://www.dropbox.com/scl/fo/yzemoxh60yvolipkqxz4c/AFplU8x36kgFnsAN4YpCOKA?rlkey=qxj3g89tf2176j44deqw9yqcc&e=1&dl=0 (Version 1, accessed on 14 May 2025). The repository contains the following. Raw image datasets: 1608 color PNG images of Gordal olives (320 × 240 px) and 9150 greyscale BMP images of Hojiblanca olives (176 × 144 px); Intermediate products: SAM segmentation masks, post-processed masks, ellipse overlays and validation contours; Source code: MATLAB® scripts and Python/Colab notebooks for SAM inference and statistical analysis; Result files: TXT reports.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CV	Computer Vision
ANN	Artificial Neural Network
SAM	Segment Anything Model
DRR	Destoning, Stuffing and Slicing (machines for olive processing)
ROI	Region of Interest
PC	Personal Computer
LS	LED Ring for Illumination
MS	Magnetic Sensor
TE	Trigger Electronics
Cam	Optical Sensor (Camera)
Qt	Qt-Creator (Integrated Development Environment for C++)
NaOH	Sodium Hydroxide
NaCl	Sodium Chloride
CSV	Comma-Separated Values
RGB	Red-Green-Blue (Color Model)
B/W	Black and White
CMOS	Complementary Metal-Oxide-Semiconductor
GPU	Graphics Processing Unit
CVS	Computer Vision System

References

Abdullah, M.Z. Image Acquisition Systems. In Computer Vision Technology for Food Quality Evaluation; Sun, D.W., Ed.; Elsevier: Dublin, Ireland, 2016; pp. 3–39.
Dhanush, G.; Khatri, N.; Kumar, S.; Shukla, P. A Comprehensive Review of Machine Vision Systems and Artificial Intelligence Algorithms for the Detection and Harvesting of Agricultural Produce. Sci. Afr. 2023, 21, e01798.
Blasco, J.; Cubero, S.; Moltó, E. Quality Evaluation of Citrus Fruits. In Computer Vision Technology for Food Quality Evaluation; Sun, D.W., Ed.; Elsevier: Dublin, Ireland, 2016; pp. 305–325.
González-Merino, R.; Hidalgo-Fernández, R.E.; Rodero, J.; Sola-Guirado, R.R.; Sánchez-López, E. Postharvest Geometric Characterization of Table Olive Bruising from 3D. Agronomy 2022, 12, 2732.
Cano-Marchal, P.; Satorres-Martinez, S.; Gómez-Ortega, J.; Gámez-García, J. Automatic System for the Detection of Defects on Olive Fruit in an Oil Mill. Appl. Sci. 2021, 11, 8167.
Menendez, A.; Paillet, G. Fish Inspection System Using a Parallel Neural Network Chip and the Image Knowledge Builder Application. AI Mag. 2008, 29, 21.
Bhargava, A.; Bansal, A.; Goyal, V. Machine Learning-Based Detection and Sorting of Multiple Vegetables and Fruits. Food Anal. Methods 2022, 15, 228–242.
Çetin, N.; Karaman, K.; Kavuncuoglu, E.; Yildirim, B.; Jahanbakhshi, A. Using Hyperspectral Imaging Technology and Machine Learning Algorithms for Assessing Internal Quality Parameters of Apple Fruits. Chemom. Intell. Lab. Syst. 2022, 230, 104650.
Lucas-Pascual, A.; Madueño-Luna, A.; Jódar-Lázaro, M.; Molina-Martínez, J.M.; Ruiz-Canales, A.; Madueño-Luna, J.M.; Justicia Segovia, M. Analysis of the Functionality of the Feed Chain in Olive Pitting, Slicing and Stuffing Machines by IoT, Computer Vision and Neural Network Diagnosis. Sensors 2020, 20, 1541.
de Jódar Lázaro, M.; Luna, A.M.; Pascual, A.L.; Martínez, J.M.M.; Canales, A.R.; Luna, J.M.M.; Segovia, M.J.; Sánchez, M.B. Deep Learning in Olive Pitting Machines by Computer Vision. Comput. Electron. Agric. 2020, 171, 105304.
He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
Xiao, F.; Wang, H.; Xu, Y.; Zhang, R. Fruit Recognition Based on Deep Learning for Automatic Harvesting: An Overview and Review. Agronomy 2023, 13, 1625.
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015.
Zhang, Q.; Zhu, Y.; Cordeiro, F.R.; Chen, Q. PSSCL: A Progressive Sample Selection Framework with Contrastive Loss for Noisy Labels. Pattern Recognit. 2024, 161, 111284.
Zhang, Q.; Jin, G.; Zhu, Y.; Wei, H.; Chen, Q. BPT-PLR: A Balanced Partitioning and Training Framework with Pseudo-Label Relaxed Contrastive Loss for Noisy Label Learning. Entropy 2024, 26, 589.
Segment Anything. Available online: https://segment-anything.com/ (accessed on 16 December 2023).
Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023.
Li, Y.; Wang, D.; Yuan, C.; Li, H.; Hu, J. Enhancing Agricultural Image Segmentation with an Agricultural Segment Anything Model Adapter. Sensors 2023, 23, 7884.
Gui, B.L.; Bhardwaj, A.; Sam, L. Evaluating the Efficacy of Segment Anything Model for Delineating Agriculture and Urban Green Spaces in Multiresolution Aerial and Spaceborne Remote Sensing Images. Remote Sens. 2024, 16, 414.
Carraro, A.; Sozzi, M.; Marinello, F. The Segment Anything Model (SAM) for Accelerating the Smart Farming Revolution. Smart Agric. Technol. 2023, 6, 100367.
Lucas Pascual, A. Improvements in the Control of Pitting, Slicing, and Stuffing Machines for Table Olives. PhD Thesis, Universidad Politécnica de Cartagena, Cartagena, Spain, 2020.
Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 833–851.
Li, Z.; Yan, W.; Jiang, Y.; Zhang, Y.; Wu, Y.; Wang, K.; Kuang, Z. MobileSAM: High-Performance Efficient Segment Anything Model with a Mobile-Friendly Vision Transformer. arXiv 2023, arXiv:2306.14289.
Huang, L.; Bozza, A.; Yi, J. YOLO-Prompt-SAM: Real-Time Detector-Guided Segmentation with the Segment Anything Model. arXiv 2024, arXiv:2401.12345.
Zang, Y.; Zhou, T.; Qin, J.; Li, S. Open-Vocabulary Industrial Inspection via Grounding DINO and Segment-Anything. Pattern Recognit. Lett. 2024, 179, 65–73.
Silva, F.; Torres, P. Edge-Friendly Fruit Counting with Objectness-Aware CNN and MobileSAM Refinement. Comput. Electron. Agric. 2024, 214, 108353.
Li, X.; Dong, H.; Chen, S. FastSAM: Towards Real-Time Segment Anything with 2% of SA-1B. arXiv 2023, arXiv:2306.12113.
Zhao, Q.; Wang, Y.; Sun, K. Efficient Industrial Defect Segmentation via SAM Proposals and Lightweight CNN Refinement. IEEE Access 2024, 12, 45123–45134.
Santos-Siles, F.J. New Technologies Applied to the Fine Manzanilla Olive Sector. Grasas y Aceites 1999, 50, 131–140.
Estrada-Cabezas, J.M. The Table Olive: Insights into Its Characteristics, Processing, and Quality; Fundación para el Fomento y la Promoción de la Aceituna de Mesa; Diputación de Sevilla: Sevilla, Spain, 2011.
Gómez, A.H.; García, P.; Navarro, L. Elaboration of Table Olives. Grasas y Aceites 2006, 57, 86–94.
Navarro, L.R.; Sánchez-Gómez, A.H.; Macías, V.V. New Trends on the Alkaline Treatment ‘Cocido’ of Spanish or Sevillian Style Green Table Olives. Grasas y Aceites 2008, 59, 197–204.
OFM Food Machinery. PSL-51 Pitting–Slicing Machine; Technical Catalogue; OFM Food Machinery: Seville, Spain, 2023; Available online: https://www.calameo.com/books/00680291585f67faef3b2 (accessed on 6 May 2025).
The Imaging Source. DFK 33GV024 GigE Color Industrial Camera—Product Page; The Imaging Source Europe GmbH: Bremen, Germany, 2024; Available online: https://www.theimagingsource.com/en-us/product/industrial/33g/dfk33gv024/ (accessed on 6 May 2025).
Toshiba Teli Corporation. SPS02 Smart Photo Sensor. Technical Catalogue; Toshiba Teli Corporation: Tokyo, Japan, 2024; Available online: https://pdf.directindustry.com/pdf/toshiba-teli-corporation/sps02/194093-734362.html (accessed on 6 May 2025).
Cheng, J.H.; Sun, D.W.; Nagata, M.; Tallada, J.G. Quality Evaluation of Strawberry. In Computer Vision Technology for Food Quality Evaluation; Sun, D.W., Ed.; Elsevier: Dublin, Ireland, 2016; pp. 327–349.
Díaz, R. Classification and Quality Evaluation of Table Olives. In Computer Vision Technology for Food Quality Evaluation; Sun, D.W., Ed.; Elsevier: Dublin, Ireland, 2016; pp. 351–365.
Villanueva, L.; Madueño-Luna, A.; Madueño-Luna, J.M. Google Colab_SAM_MATLAB Software; Dropbox: San Francisco, CA, USA, 2025; Version 1 (released 14 May 2025); Available online: https://www.dropbox.com/scl/fo/yzemoxh60yvolipkqxz4c/AFplU8x36kgFnsAN4YpCOKA?dl=0 (accessed on 14 May 2025).
ISO 13528:2015; Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparison. International Organization for Standardization: Geneva, Switzerland, 2015.
Gandul, L.V.; Madueño-Luna, A.; Madueño-Luna, J.M.; López-Gordillo, M.C.; González-Ortega, M.J. Development of a Computer Vision-Based Method for Sizing and Boat Error Assessment in Olive Pitting Machines. Appl. Sci. 2025, 15, 6648.
Ma, L.; Sun, K.; Tu, K.; Pan, L.; Zhang, W. Identification of Double-Yolked Duck Egg Using Computer Vision. PLoS ONE 2017, 12, e0190054.
Purohit, J.; Dave, R. Leveraging Deep Learning Techniques to Obtain Efficacious Segmentation Results. Arch. Adv. Eng. Sci. 2023, 1, 11–26.
Don, M.G.; Liyanarachchi, S.; Wanasinghe, T.R. A Digital Twin Development Framework for an Electrical Submersible Pump (ESP). Arch. Adv. Eng. Sci. 2024, 3, 35–43.
Zhou, S.; Ma, S.; Zhang, Y.; Zhao, S.; Song, S. FastSAM: Real-time Segment Anything. arXiv 2023, arXiv:2306.12156.

Figure 1. Schematic of the machine vision system (MVS) setup with its basic components. (Cam) optical sensor, (LS) LED system ring for illumination, (MS) magnetic sensor (synchronizes camera-LED triggering and data transmission), (TE) trigger electronics and (PC) Computer + Software (OpenCV/Qt-Creator). Sadrym 130 (Gordal) and OFM PSL51 (Hojiblanca) correspond to the DRR machines and the olive varieties being processed.

Figure 2. Basic schematic of the CV system installed during the process.

Figure 3. Electronic control unit for CV system coordination and power distribution.

Figure 4. Oxidized black Gordal in color on a gray metallic background (a), pickled green Hojiblanca in B/W (b).

Figure 5. Examples of images discarded from the process due to excessive deformation (mask_489 normal original), cropped by the ROI (mask_281 normal original and mask_463 normal original), or classified as “perdigón” type (mask_749 normal original). The latter, due to its sphericity, does not present a boat error. The blue line traces the olive’s contour, the red line marks the minor axis and the green line marks the major axis.

Figure 6. Workflow of the developed application for segmentation and morphological–statistical analysis of the image series extracted from the CV system. (1) First step of image segmentation by SAM, (2) segmented image morphological analysis. (a) Segmentation into masks via SAM, (b) selection of the correct mask, (c) anomaly correction, (d) binarized olive (the red color highlights a common error after segmentation), (e) morphological analysis and (f) statistical analysis (the blue ellipse summarizes the olive shape; the red and green lines represent the major and minor axes, respectively), (g) files with data analyses.

Figure 7. Set of masks generated by SAM for a single raw image.

Figure 8. Two similar masks generated by the Segment Anything Model for the same image.
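
The mask-selection step illustrated in Figures 7 and 8 can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the selection rule (plausible area range, rejection of near-duplicate masks by IoU, preference for the most central blob) and the names `select_olive_mask`, `min_area`, `max_area` and `dup_iou` are assumptions. The input is assumed to be a list of dictionaries with a boolean `segmentation` array and an `area` field, which is the layout produced by SAM's automatic mask generator.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0


def select_olive_mask(masks, min_area, max_area, dup_iou=0.9):
    """Return the most central, plausibly sized mask, or None (assumed rule)."""
    kept = []
    for m in sorted(masks, key=lambda d: d["area"], reverse=True):
        seg = m["segmentation"]
        if not (min_area <= m["area"] <= max_area):
            continue                      # background or small fragment
        if any(mask_iou(seg, k["segmentation"]) > dup_iou for k in kept):
            continue                      # near-duplicate of a kept mask
        kept.append(m)
    if not kept:
        return None
    h, w = kept[0]["segmentation"].shape
    centre = np.array([(h - 1) / 2, (w - 1) / 2])

    def offset(m):
        rows, cols = np.nonzero(m["segmentation"])
        return np.linalg.norm(np.array([rows.mean(), cols.mean()]) - centre)

    return min(kept, key=offset)


def as_record(seg):
    """Wrap a boolean array in the assumed SAM-style record layout."""
    return {"segmentation": seg, "area": int(seg.sum())}


# Synthetic candidates: background, olive, near-duplicate olive, small speck.
olive = np.zeros((50, 50), dtype=bool)
olive[15:35, 15:35] = True
near_copy = np.zeros_like(olive)
near_copy[15:35, 15:34] = True
speck = np.zeros_like(olive)
speck[0:2, 0:2] = True
candidates = [as_record(~olive), as_record(near_copy),
              as_record(speck), as_record(olive)]

chosen = select_olive_mask(candidates, min_area=100, max_area=1000)
print(chosen["area"])
```

Sorting by area before the duplicate check means that, of two nearly identical masks such as those in Figure 8, the larger one is retained. Whether that is the preferable choice depends on the application.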

Figure 9. Two anomaly cases after segmentation with the Segment Anything Model. Red diamonds highlight two possible errors post-segmentation: (left) white spot outside olive shape, (right) gap inside olive blob.

Figure 10. Two examples of SAM segmentation corrected by the developed application: (a) Gordal, (b) Hojiblanca.
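
The two anomalies shown in Figure 9 (a white spot outside the olive and a gap inside the blob) can be corrected with standard binary morphology. The sketch below is a hedged illustration using `scipy.ndimage`, not the correction routine of the developed application. The function name `clean_mask` and the rule "keep the largest connected blob, then fill its holes" are assumptions.

```python
import numpy as np
from scipy import ndimage


def clean_mask(mask):
    """Keep the largest connected blob and fill any holes inside it."""
    mask = np.asarray(mask, dtype=bool)
    labels, n_blobs = ndimage.label(mask)        # connected components
    if n_blobs == 0:
        return mask                               # nothing to clean
    sizes = np.bincount(labels.ravel())[1:]       # pixel count per blob
    largest = labels == (1 + int(np.argmax(sizes)))  # drops stray specks
    return ndimage.binary_fill_holes(largest)     # closes interior gaps


# Synthetic example: a 20 x 20 "olive" with a gap inside and a speck outside.
demo = np.zeros((40, 40), dtype=bool)
demo[10:30, 10:30] = True
demo[18:21, 18:21] = False      # gap inside the blob
demo[2, 2] = True               # white spot outside the blob

cleaned = clean_mask(demo)
print(int(demo.sum()), int(cleaned.sum()))
```

Because hole filling can only add pixels, a correction of this kind pushes the result towards over-segmentation rather than under-segmentation.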

Figure 11. Distribution by sizes (left) and angles (right) for Gordal. Mean: 24,154 pixels, variance: 5,783,793 pixels².

Figure 12. Distribution by sizes (left) and angles (right) for Hojiblanca. Mean: 3509 pixels, variance: 68,350 pixels².

Table 1. Comparative overview of standalone deep learning segmentation models for industrial and agri-food applications.

Model	Expected Accuracy	Computational Cost	Training Effort	Real-Time Suitability
U-Net [12]	High (semantic)	Moderate	Low–moderate	Achievable with lightweight variants
Mask R-CNN [10]	Very high (instances)	High	High (dense masks)	Limited (GPU required)
DeepLab v3+ [21]	Very high (pixel-wise)	High	Moderate (large datasets)	Moderate (mobile backbones)
SAM [16]	High (zero-shot)	Very high	None for new tasks	Low (unless specialised HW)
MobileSAM [22]	High (~SAM)	Low–moderate	None for new tasks	High (~10 ms img⁻¹)

Table 2. Performance profile of hybrid detector–prompt-based segmentation pipelines for high-speed industrial/agri-food inspection.

Hybrid Pipeline	Typical Accuracy	Computational Cost	Training Effort	Real-Time Suitability	Study
YOLOv8 → SAM	High F1; pixel-accurate	YOLO ≈ 5 ms + SAM ROI ≈ 15 ms	Train YOLO only	≥30 FPS (GPU)	[23]
Grounding DINO → SAM	Very high AP; open-vocabulary	Higher (transformer)	Class names only	12–18 FPS	[24]
Objectness CNN → MobileSAM	Moderate–high IoU; edge-friendly	CNN ≈ 3 ms + MobileSAM ≈ 10 ms	CNN on binary labels	Real-time on SoC	[25]
Global SAM → Tiny CNN filter	High precision after filter	SAM heavy; CNN negligible	Few dozen masks	~5 FPS overall	[27]
Proposal net → FastSAM	IoU ≈ SAM; 40–50× faster	FastSAM < 10 ms; proposal < 5 ms	Fine-tune proposal net	>60 FPS	[26]

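Table 3. Normality test results for olive area distributions (α = 0.05). SW: Shapiro–Wilk; DP: D’Agostino–Pearson; AD: Anderson–Darling; JB: Jarque–Bera; sk: skewness; ku: kurtosis. SW was applied to the full sample when n ≤ 5000 and otherwise to a random 5000-item subset.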

Variety/File	n	SW W	SW p	DP p	AD p	JB p	sk	ku	Decision
Gordal (black)—informe_negras_09052024a.txt	1305	0.9982	0.185	0.271	0.159	0.260	−0.107	2.95	Normality not rejected
Hojiblanca (green)—informe_hojiblancas.txt	9150 *	0.9930	6.3 × 10⁻¹⁵	<0.001	0.0005	0.001	0.058	3.79	Normality rejected (slight deviation)

* SW on a random subsample of 5000 values.
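
A normality screening of the kind summarised in Table 3 can be reproduced along the following lines with `scipy.stats`. This is a sketch under stated assumptions, not the authors' script: the function name `normality_report`, the random seed, the synthetic input data and the decision rule (reject if any test falls below α) are illustrative, and the Anderson–Darling test is omitted to keep the example short. Kurtosis is reported in Pearson's convention (3 for a normal distribution), which matches the ku values in the table.

```python
import numpy as np
from scipy import stats


def normality_report(areas, max_sw=5000, alpha=0.05, seed=0):
    """Run SW, DP and JB tests; SW uses a random subsample if n > max_sw."""
    x = np.asarray(areas, dtype=float)
    rng = np.random.default_rng(seed)
    if x.size <= max_sw:
        sw_sample = x
    else:
        sw_sample = rng.choice(x, size=max_sw, replace=False)
    sw_w, sw_p = stats.shapiro(sw_sample)     # Shapiro-Wilk
    _, dp_p = stats.normaltest(x)             # D'Agostino-Pearson
    _, jb_p = stats.jarque_bera(x)            # Jarque-Bera
    return {
        "n": int(x.size),
        "SW W": float(sw_w),
        "SW p": float(sw_p),
        "DP p": float(dp_p),
        "JB p": float(jb_p),
        "sk": float(stats.skew(x)),
        "ku": float(stats.kurtosis(x, fisher=False)),  # normal -> 3
        # Assumed rule: reject normality if any test rejects at level alpha.
        "normality_rejected": bool(min(sw_p, dp_p, jb_p) < alpha),
    }


# Synthetic data only: one roughly Gaussian sample, one clearly skewed sample.
rng = np.random.default_rng(1)
gaussian_report = normality_report(rng.normal(24154, 2405, size=1305))
skewed_report = normality_report(rng.exponential(3500, size=6000))
print(gaussian_report["normality_rejected"], skewed_report["normality_rejected"])
```

With several thousand observations, these tests detect very small departures from normality, so a rejection is best read together with the skewness and kurtosis values.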

Table 4. Validation metrics for the Gordal set.

Metric	Mean ± SD	Median (IQR)	Interpretation
IoU	0.963 ± 0.003	0.963 (0.0034)	Excellent spatial overlap
Dice	0.981 ± 0.002	0.981 (0.0017)	Confirms IoU result
Precision	0.963 ± 0.003	0.963 (0.0034)	≈3–4% over-segmentation
Recall	1.000 ± 0.000	1.000 (0.0000)	No false negatives
MAE (px²)	910	–	Small absolute size error
MAPE (%)	3.87	–	Size bias < 5%
RMSE (px²)	911	–	Consistent with MAE
Pearson r	>0.98	–	Rank/order preserved
$R_{y = x}^{2}$	0.881	–	Uniform ≈4% bias
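
The overlap metrics in Table 4 follow the usual pixel-wise definitions, sketched below with NumPy. The helper name `overlap_metrics` and the synthetic masks are illustrative assumptions, and the function assumes that neither mask is empty. The example also shows two relations visible in the table: when recall equals 1 (the predicted mask fully contains the ground truth), precision coincides with IoU, and Dice always equals 2·IoU/(1 + IoU).

```python
import numpy as np


def overlap_metrics(pred, truth):
    """Pixel-wise IoU, Dice, precision and recall for two boolean masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()     # olive pixels found
    fp = np.logical_and(pred, ~truth).sum()    # extra pixels (over-segmentation)
    fn = np.logical_and(~pred, truth).sum()    # missed olive pixels
    return {
        "iou": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Synthetic case: the prediction contains the ground truth plus one extra column.
truth = np.zeros((40, 40), dtype=bool)
truth[10:30, 10:30] = True
pred = np.zeros_like(truth)
pred[10:30, 10:31] = True

m = overlap_metrics(pred, truth)
print({k: round(float(v), 3) for k, v in m.items()})
```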

Table 5. Validation metrics for the Hojiblanca subset (n = 9150).

Metric	Mean ± SD	Median (IQR)	Interpretation
IoU	0.904 ± 0.004	0.904 (0.0047)	Excellent spatial overlap
Dice	0.950 ± 0.002	0.950 (0.0026)	Confirms IoU result
Precision	0.904 ± 0.004	0.904 (0.0047)	~9% over-segmentation
Recall	1.000 ± 0.000	1.000 (0.0000)	No false negatives
MAE (px²)	246.5	–	Small absolute size error
MAPE (%)	7.58	–	Size bias < 10%
RMSE (px²)	246.7	–	Consistent with MAE
Pearson r (area)	>0.98	–	Preserved rank/order
$R_{y = x}^{2}$	0.039	–	Shows uniform +7% bias
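
The combination of a Pearson r above 0.98 with a low $R_{y = x}^{2}$ is what a uniform multiplicative bias produces. The sketch below illustrates this on synthetic areas. It assumes that $R_{y = x}^{2}$ denotes the coefficient of determination measured about the identity line, which is an interpretation rather than a definition given in the table. The function name `area_agreement`, the synthetic area distribution and the 7% bias factor are likewise illustrative.

```python
import numpy as np


def area_agreement(pred_area, true_area):
    """Error and agreement statistics between predicted and reference areas."""
    p = np.asarray(pred_area, dtype=float)
    t = np.asarray(true_area, dtype=float)
    err = p - t
    return {
        "mae": float(np.mean(np.abs(err))),
        "mape_pct": float(100 * np.mean(np.abs(err) / t)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "pearson_r": float(np.corrcoef(p, t)[0, 1]),
        # Assumed definition: R^2 with the identity line y = x as the model.
        "r2_identity": float(1 - np.sum(err ** 2) / np.sum((t - t.mean()) ** 2)),
    }


# Synthetic reference areas and a prediction that is uniformly 7% too large.
rng = np.random.default_rng(0)
true_area = rng.normal(3500.0, 250.0, size=1000)
pred_area = 1.07 * true_area

report = area_agreement(pred_area, true_area)
print({k: round(v, 3) for k, v in report.items()})
```

Pearson r is insensitive to a constant scale factor, whereas the identity-line R² penalises it. A narrow size distribution therefore makes the same relative bias look much worse in R² terms.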

Table 6. Ellipse-fitting validation metrics for the Gordal dataset (automatic vs. manual ground truth).

Metric	Mean ± SD	Interpretation
IoU_ellipse	0.938 ± 0.011	Excellent spatial overlap
MAE_a (px)	2.580	~0.3% error in major semi-axis
MAE_b (px)	2.970	~0.4% error in minor semi-axis
MAE_AR	0.034	Aspect-ratio bias < 4%
RMSE_θ°	5.150	Small angular deviation

Table 7. Ellipse-fitting validation metrics for the Hojiblanca dataset (automatic vs. manual ground truth).

Metric	Mean ± SD	Interpretation
IoU_ellipse	0.945 ± 0.011	Excellent spatial overlap
MAE_a (px)	0.49	~0.1% error in major semi-axis
MAE_b (px)	1.22	~0.2% error in minor semi-axis
MAE_AR	0.067	Aspect-ratio bias < 7%
RMSE_θ°	6.09	Small angular deviation
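
Semi-axes and orientation of the kind evaluated in Tables 6 and 7 can be obtained from a binary mask in several ways. The sketch below uses second-order image moments, which is a substitute technique chosen for brevity and not necessarily the fitting method used in the study. The names `ellipse_from_mask` and `angle_error` are illustrative. For a filled ellipse the variance along each principal axis is one quarter of the squared semi-axis, and orientation errors have to be wrapped because an axis at θ and at θ + 180° is the same axis.

```python
import numpy as np


def ellipse_from_mask(mask):
    """Estimate semi-axes (px) and major-axis angle (deg) from pixel moments."""
    rows, cols = np.nonzero(mask)
    x = cols.astype(float)
    y = -rows.astype(float)                  # flip so that y points up
    cov = np.cov(np.vstack([x, y]))
    evals, evecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    minor, major = 2.0 * np.sqrt(evals)      # filled ellipse: var = axis^2 / 4
    vx, vy = evecs[:, 1]                     # direction of the major axis
    angle = np.degrees(np.arctan2(vy, vx))
    angle = (angle + 90.0) % 180.0 - 90.0    # wrap to [-90, 90)
    return float(major), float(minor), float(angle)


def angle_error(a_deg, b_deg):
    """Absolute difference between two axis orientations, modulo 180 degrees."""
    return abs((a_deg - b_deg + 90.0) % 180.0 - 90.0)


# Synthetic olive: filled ellipse with semi-axes 40 and 25 px, tilted 30 degrees.
rr, cc = np.mgrid[0:200, 0:200]
xs, ys = cc - 100.0, -(rr - 100.0)
theta = np.radians(30.0)
u = xs * np.cos(theta) + ys * np.sin(theta)
v = -xs * np.sin(theta) + ys * np.cos(theta)
demo_mask = (u / 40.0) ** 2 + (v / 25.0) ** 2 <= 1.0

major, minor, angle = ellipse_from_mask(demo_mask)
print(round(major, 1), round(minor, 1), round(angle, 1))
```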

Table 8. Bias introduced by each post-processing step and its quantitative impact on final segmentation accuracy (dataset means for Gordal/Hojiblanca).

Step	Potential Bias Introduced	Quantitative Impact
Mask selector	Olive might be missed ⇒ false negatives	Recall = 1.000/1.000 (none observed)
Hole filling	Adds ≤ 2-px rim inside cavities ⇒ slight over-segmentation	Precision—IoU = +0.03/+0.09
Speck removal	Deletes stray white pixels; if threshold too strict, may cut true edge ⇒ under-segmentation	Bias already accounted for in values above
Ellipse fitting	Smooths jagged contour; may shrink or expand axes	MAE_a = 2.6/0.5 px; MAE_b = 3.0/1.2 px; RMSE_θ = 5.1°/6.1°; IoU_ellipse = 0.938/0.945

Table 9. Segmentation accuracy on a well-lit subset of 240 olives (120 Gordal + 120 Hojiblanca) acquired with uniform, high-contrast RGB illumination. These conditions are fully compatible with the HSV thresholds recommended by Gandul et al. (2025) [39].

Method	IoU	Dice	Precision	Recall	Inference Time (ms·img⁻¹)	Extra Annotation/Training
HSV + morphological filtering [39]	0.990	0.995	0.998	0.993	6	≈10 min threshold tuning
SAM (zero-shot)	0.971	0.985	0.971	1.000	45	none

Table 10. Correlation matrix of the results for oxidized black Gordal olives.

Var ¹	A	a	X	x	X/x
A	-	−0.113	0.893	0.746	0.398
a	−0.113	-	−0.162	0.054	−0.201
X	0.893	−0.162	-	0.377	0.764
x	0.746	0.054	0.377	-	−0.307
X/x	0.398	−0.201	0.764	−0.307	-

¹ (A) area of the olive, (a) angle with respect to the horizontal, (X) length of the major axis, (x) length of the minor axis and (X/x) ratio of the length of the major axis to that of the minor axis.

Table 11. Correlation matrix of the results for Hojiblanca olives.

Var ¹	A	a	X	x	X/x
A	-	−0.042	0.924	0.688	0.597
a	−0.042	-	−0.021	−0.089	0.029
X	0.924	−0.021	-	0.380	0.852
x	0.688	−0.089	0.380	-	−0.159
X/x	0.597	0.029	0.852	−0.159	-

¹ (A) area of the olive, (a) angle with respect to the horizontal, (X) length of the major axis, (x) length of the minor axis and (X/x) ratio of the length of the major axis to that of the minor axis.
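
Correlation matrices such as those in Tables 10 and 11 can be computed directly from the per-olive measurements. The sketch below uses synthetic values, so its numbers are not those of the study, and the generating model (a weak positive link between axes, an orientation independent of size, and an elliptical area) is an assumption made only to obtain plausible inputs.

```python
import numpy as np

# Synthetic per-olive measurements (illustrative values only).
rng = np.random.default_rng(0)
n = 2000
major = rng.normal(200.0, 15.0, n)                              # X, major axis
minor = 150.0 + 0.3 * (major - 200.0) + rng.normal(0.0, 10.0, n)  # x, minor axis
area = np.pi / 4.0 * major * minor                              # A, ellipse area
angle = rng.uniform(-90.0, 90.0, n)                             # a, orientation

names = ["A", "a", "X", "x", "X/x"]
data = np.vstack([area, angle, major, minor, major / minor])
corr = np.corrcoef(data)        # each row of `data` is one variable

# Print the matrix in the same layout as the tables.
print("Var\t" + "\t".join(names))
for name, row in zip(names, corr):
    print(name + "\t" + "\t".join(f"{value:.3f}" for value in row))
```

Under these assumptions the ratio X/x correlates positively with X and negatively with x, and the angle is essentially uncorrelated with the size variables. This is the same qualitative pattern that appears in both tables.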

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Gandul, L.V.; Madueño-Luna, A.; Madueño-Luna, J.M.; López-Gordillo, M.C.; González-Ortega, M.J. Diagnosis by SAM Linked to Machine Vision Systems in Olive Pitting Machines. Appl. Sci. 2025, 15, 7395. https://doi.org/10.3390/app15137395

AMA Style

Gandul LV, Madueño-Luna A, Madueño-Luna JM, López-Gordillo MC, González-Ortega MJ. Diagnosis by SAM Linked to Machine Vision Systems in Olive Pitting Machines. Applied Sciences. 2025; 15(13):7395. https://doi.org/10.3390/app15137395

Chicago/Turabian Style

Gandul, Luis Villanueva, Antonio Madueño-Luna, José Miguel Madueño-Luna, Miguel Calixto López-Gordillo, and Manuel Jesús González-Ortega. 2025. "Diagnosis by SAM Linked to Machine Vision Systems in Olive Pitting Machines" Applied Sciences 15, no. 13: 7395. https://doi.org/10.3390/app15137395

APA Style

Gandul, L. V., Madueño-Luna, A., Madueño-Luna, J. M., López-Gordillo, M. C., & González-Ortega, M. J. (2025). Diagnosis by SAM Linked to Machine Vision Systems in Olive Pitting Machines. Applied Sciences, 15(13), 7395. https://doi.org/10.3390/app15137395

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.
