Article

SIRI-MOGA-UNet: A Synergistic Framework for Subsurface Latent Damage Detection in ‘Korla’ Pears via Structured-Illumination Reflectance Imaging and Multi-Order Gated Attention

1 College of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
2 Centre for Basic Experiments and Engineering Practice, East China Jiaotong University, Nanchang 330013, China
* Author to whom correspondence should be addressed.
Spectrosc. J. 2025, 3(3), 22; https://doi.org/10.3390/spectroscj3030022
Submission received: 8 May 2025 / Revised: 17 July 2025 / Accepted: 23 July 2025 / Published: 29 July 2025

Abstract

Bruising in ‘Korla’ pears is a prevalent phenomenon that leads to progressive fruit decay and substantial economic losses. Early-stage bruising is difficult to detect because it lacks visible external characteristics, and existing deep learning models are limited in extracting weak features under complex optical interference. To address the postharvest latent damage detection challenges in ‘Korla’ pears, this study proposes a collaborative detection framework integrating structured-illumination reflectance imaging (SIRI) with multi-order gated attention mechanisms. First, an SIRI optical system was constructed, employing 150 cycles·m−1 spatial frequency modulation and a three-phase demodulation algorithm to extract subtle interference signal variations, thereby generating relative transmittance (RT) images with significantly enhanced contrast in subsurface damage regions. To improve detection accuracy in latent damage areas, the MOGA-UNet model was developed with three key innovations: (1) a lightweight VGG16 encoder is integrated into the feature extraction network to improve computational efficiency while retaining detail; (2) a multi-order gated aggregation module is added at the end of the encoder to fuse features at different scales through parallel dilated convolutions; and (3) a channel attention mechanism is embedded in the decoding stage to dynamically increase the weights of damage-related feature channels. Experimental results demonstrate that the proposed model achieves 94.38% mean Intersection over Union (mIoU) and a 97.02% Dice coefficient on RT images, outperforming the baseline UNet model by 2.80% in mIoU, with superior segmentation accuracy and boundary localization compared with mainstream models. This approach provides an efficient and reliable technical solution for intelligent postharvest agricultural product sorting.

1. Introduction

Postharvest latent damage poses a significant bottleneck constraining the sustainable development of the pear industry. According to the Food and Agriculture Organization of the United Nations, postharvest losses of perishable crops due to defects reach 15–30%. As a characteristic agricultural product of Xinjiang, China, ‘Korla’ pears, with their thin, fragile epidermis and delicate flesh, are particularly susceptible to invisible mechanical damage during harvesting and transportation [1,2]. Such latent damage triggers browning diffusion within 48–72 h of storage, leading to economic losses of up to 40% of an entire batch [3]. Traditional manual inspection, relying on empirical judgment, suffers from low efficiency and high missed-detection rates [4]. Developing efficient and precise early-stage latent damage detection technology has therefore become a critical challenge for enhancing postharvest quality control.
Current agricultural defect detection technologies primarily rely on optical imaging methods, which can be categorized into spectral imaging and structured imaging systems. Hyperspectral imaging (HSI) demonstrates potential in citrus mold and peach decay detection through continuous spectral analysis [5], but its high equipment costs and time-consuming nature hinder industrial adoption. In contrast, structured-illumination reflectance imaging (SIRI) enhances subsurface defect contrast through light field modulation [6,7,8], showing technical advantages in scenarios like cucumber bruise detection [9]. However, this technology still faces core challenges in pear applications: fruit curvature interference and weak reflectance differences in early-stage damage [10,11].
Deep learning provides new solutions, but direct transplantation of medical imaging architectures like UNet [12] reveals limitations in agricultural scenarios: inadequate capture of micron-scale damage features by traditional encoders, poor adaptability of attention mechanisms in low-contrast images, and degraded cross-variety generalization [13,14]. Emerging trends highlight multiscale feature fusion and lightweight design as key breakthroughs. Improved UNet architectures optimize feature transmission paths [15], while gated mechanisms enhance micro-damage recognition [16]. Model compression techniques balance computational efficiency and detection performance [17,18]. Nevertheless, existing methods remain limited in weak feature extraction under complex optical interference, particularly background noise caused by anisotropic light scattering in fruit tissues, which constitutes the main practical implementation bottleneck.
Addressing these challenges, we propose a novel MOGA-UNet architecture through synergistic optimization of local feature extraction and global context modeling. Differing from Transformers’ tendency to weaken local details, our model preserves UNet’s inherent local inductive bias while innovatively introducing a Multi-Order Gated Aggregation (MOGA) module [19,20]. This module constructs multiscale feature interaction layers through parallel dilated convolutions (d = {1,2,3}), dynamically fusing cross-scale damage features from epidermis to subsurface (0.5–3.0 mm) via GELU gating. To enhance weak edge responses, the decoder employs channel attention (SE) mechanisms for adaptive key-channel reinforcement through global context modeling, effectively suppressing curvature-induced background interference. Considering agricultural real-time requirements, we replace deep convolution stacks with a lightweight VGG16 encoder that preserves high-frequency details while reducing computational redundancy. This “local perception-global calibration” co-design achieves dual breakthroughs in boundary localization and noise suppression under linear computational complexity, establishing a new architectural paradigm for agricultural weak-defect detection.
This study proposes the “SIRI-MOGA-UNet” framework for ‘Korla’ pear latent damage detection, with three key contributions:
(1) To address curvature-induced luminance gradients and low-contrast detection challenges, we develop a SIRI optical system using 150 cycles·m−1 spatial frequency with spherical correction algorithms and three-phase demodulation to generate RT images. Compared with uniform illumination, this system significantly enhances subsurface damage scattering signals while maintaining cost-effectiveness.
(2) We propose a MOGA-SE co-optimization framework achieving deep coupling of cross-scale fusion and global-local semantic enhancement. The lightweight VGG16 encoder preserves high-frequency textures, combining local multiscale perception with global channel screening to avoid the Transformer’s computational overhead.
(3) Establishing a 300-sample multimodal damage database, our model achieves 94.38% mIoU on RT images, surpassing baseline UNet by 2.80%, with single-sample detection time under 5.2 s. Experiments demonstrate superior segmentation accuracy and boundary localization over DeeplabV3+ and HRNet.
The remainder of this paper is structured as follows. Section 2 introduces the materials and methods, including sample preparation, the structured-illumination reflectance imaging (SIRI) system setup, image demodulation techniques, and the construction of the MOGA-UNet architecture. Section 3 presents the experimental results and discusses the performance of the proposed method in comparison with other semantic segmentation models under different imaging modalities. Section 4 concludes the study and outlines potential directions for future research.

2. Materials and Methods

2.1. Sample Preparation

This study utilized Xinjiang’s characteristic ‘Korla’ pears as experimental subjects, whose thin, fragile epidermis and delicate flesh make them particularly susceptible to mechanical damage during postharvest handling. Three hundred defect-free fruits with intact epidermis and no pest infestation were selected through rigorous market screening. A controlled mechanical impact method was employed to simulate postharvest latent damage, constructing a representative damaged sample set. The experimental setup, shown in Figure 1a, consisted of a pendulum impact system comprising a horizontal rotating shaft, an adjustable 30 cm pendulum rope, and a 100 g wooden sphere. Figure 1b shows the pendulum’s release position, the angle of which is precisely adjustable. By adjusting the pendulum angle (30°–60°), the impact energy was controlled within 0.05–0.12 J to meet IFPA Class I fresh produce damage standards [21]. The impact energy was calculated as
E = mgh\,(1 - \cos\theta)
where m = 0.1 kg, g = 9.8 m/s², h = 0.3 m, and θ denotes the pendulum release angle. Impact points were located at the equatorial region, where mechanical testing confirmed typical stress distribution characteristics.
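For reference, the sketch below evaluates this formula numerically over the stated angle range (the printed values follow directly from the equation, not from separate measurements):

```python
import math

def impact_energy(theta_deg, m=0.1, g=9.8, h=0.3):
    """Pendulum impact energy E = m*g*h*(1 - cos(theta))."""
    return m * g * h * (1.0 - math.cos(math.radians(theta_deg)))

# The 30-60 degree release range brackets the reported 0.05-0.12 J window:
for theta in (30, 40, 50, 60):
    print(f"theta = {theta} deg -> E = {impact_energy(theta):.3f} J")
# -> 0.039 J, 0.069 J, 0.105 J, 0.147 J
```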
Microscopic examination within 10 min post-impact verified an unbroken epidermis with a surface depression depth ≤ 0.5 mm. Chromatic analysis (D65 illuminant) showed ΔE < 3.5, meeting the visual imperceptibility criterion for latent damage. ΔE measures the difference between two colors in the CIE L*a*b* color space; the smaller the value, the weaker the perceived difference. In industry practice, differences below ΔE = 3.5 generally fall under the physiological threshold of human color discrimination and cannot be distinguished by ordinary observers. Histological sectioning revealed 65% ± 7% cell wall integrity loss in damaged tissues, with significantly enlarged intercellular spaces and a 2.8-fold accumulation of browning precursors compared with healthy tissues, as shown in Figure 2b. For comparison, Figure 2a displays the microstructure of healthy tissues.
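The ΔE threshold can be illustrated with the CIE76 formula (Euclidean distance in L*a*b* space); the paper does not state which ΔE variant was used, and the sample readings below are hypothetical:

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two L*a*b* triples."""
    return math.dist(lab1, lab2)

# Hypothetical pre-/post-impact readings for an epidermis patch:
print(round(delta_e_cie76((62.1, 4.8, 21.3), (60.9, 5.6, 23.9)), 2))  # 2.97 < 3.5
```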
Samples were stabilized in a climate chamber (20 ± 1 °C, RH 85%) for 12 h before testing to eliminate thermal stress effects. The experimental design adhered to non-destructive principles, with all samples undergoing 48-h accelerated browning tests (25 °C, RH 90%) for posterior validation of damage localization accuracy. To ensure sample diversity, fruits were stratified into 50 g intervals (150–250 g) with diameter variation coefficients controlled within 8%. This protocol generated experimental samples with stable reflectance characteristics and authentic postharvest damage features, validated through preliminary experiments to provide reliable data for subsequent optical detection.

2.2. Structured Illumination Reflectance Imaging (SIRI) System and Image Acquisition

The developed SIRI system enables subsurface defect visualization in fruits through spatial frequency modulation technology, comprising three modules: optical imaging, mechanical control, and data processing. Its core innovation lies in replacing traditional hyperspectral equipment with low-cost visible light sources (450–650 nm wavelength) while integrating polarization filtering to suppress specular reflection [22].
The optical imaging module contains a digital projector (DLP4500, 912 × 1140 pixel resolution) and a scientific monochrome CMOS camera (MV-CA050-10GM, 2592 × 2048 resolution), coaxially mounted on adjustable-height brackets. A precision guide rail enables continuous working-distance adjustment from 30 to 50 cm (Figure 3).
To enhance the subsurface scattering signal-to-noise ratio, a cross-polarization design was implemented: linear polarizers (extinction ratio > 1000:1) were installed in both projection and imaging paths, optimized to attenuate specular reflection below 5% of incident intensity [23]. A 450 nm long-pass filter further eliminated ultraviolet environmental interference. The mechanical module integrates a motorized lifting platform and multi-axis rotary clamp, accommodating 60–90 mm diameter fruits with precise optical axis alignment.
Image acquisition follows strict timing protocols: Within 10 min post-impact, samples undergo dark current correction before projection. MATLAB-generated phase-shifted sinusoidal patterns (150 cycles·m−1 spatial frequency, 8-bit grayscale) are projected via DMD chips at a 120 Hz refresh rate. Synchronized camera triggering with 50 ms exposure time (optimized through preliminary experiments) captures three phase-shifted raw images per sample. Daily radiometric calibration using standard reflectance panels ensures <3% illumination inhomogeneity.
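The projected patterns can be reproduced in a few lines of NumPy (shown here in Python rather than MATLAB); the projected field width used to convert cycles·m−1 into pixels is an assumed value:

```python
import numpy as np

def fringe_patterns(width=912, height=1140, freq=150.0, fov_w=0.30,
                    phases=(-2*np.pi/3, 0.0, 2*np.pi/3)):
    """Three phase-shifted sinusoidal fringe patterns (8-bit grayscale).

    width/height match the DLP4500 micromirror array; fov_w is an ASSUMED
    projected field width (m) used to convert freq from cycles/m to pixels.
    """
    x = np.linspace(0.0, fov_w, width)                 # position in metres
    imgs = []
    for phi in phases:
        row = 0.5 * (1.0 + np.sin(2 * np.pi * freq * x + phi))
        imgs.append(np.tile((255 * row).astype(np.uint8), (height, 1)))
    return imgs

p1, p2, p3 = fringe_patterns()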
This system demonstrates dual advantages: (1) Phase demodulation enables depth-selective subsurface signal extraction with 2–3 mm effective detection depth, meeting early-stage damage detection requirements; (2) Commercial-grade components reduce costs by 75% compared with hyperspectral systems, achieving <5 s per sample detection time for industrial scalability. The dataset includes 300 valid samples covering varying damage levels (0.05–0.12 J impact energy) and physiological states (0–48 h storage), providing comprehensive training data for model development.

2.3. Image Demodulation and Processing

As illustrated in Figure 4, the three-phase shift method was employed for structured-illumination demodulation to extract subsurface defect features. The mathematical foundation can be expressed as follows: when projecting sinusoidal fringe patterns onto the sample surface, the reflected intensity distribution comprises DC (direct current) and AC (alternating current) components, where DC represents surface reflectance properties and AC reflects light–tissue microstructure interactions. Using three phase-shifted patterns at 2π/3 intervals (phase offsets −2π/3, 0, and 2π/3), the reflected intensities I_1, I_2, I_3 form the following equation system:
I_{\mathrm{DC}} = \frac{1}{3}(I_1 + I_2 + I_3)
I_{\mathrm{AC}} = \frac{\sqrt{2}}{3}\sqrt{(I_1 - I_2)^2 + (I_1 - I_3)^2 + (I_2 - I_3)^2}
The DC image characterizes overall reflectance under uniform illumination, while the AC image enhances subsurface scattering contrast through amplitude modulation. To eliminate illumination inhomogeneity, we constructed RT images via normalization [24]:
\mathrm{RT} = \frac{I_{\mathrm{AC}}}{I_{\mathrm{DC}}} = \frac{\sqrt{2}\,\sqrt{(I_1 - I_2)^2 + (I_1 - I_3)^2 + (I_2 - I_3)^2}}{I_1 + I_2 + I_3}
where I_1, I_2, I_3 denote the phase-shifted reflectance images with offsets −2π/3, 0, and 2π/3, respectively. Due to the variation in light penetration depth at different spatial frequencies, selecting an appropriate frequency is crucial for the accurate detection of bruises in pears. Preliminary experiments demonstrated that frequencies in the range of 0 to 250 cycles·m−1 yielded favorable detection performance. Accordingly, spatial frequencies of 50, 100, 150, 200, and 250 cycles·m−1 were selected for imaging. By comparing the demodulated results across these frequencies, the optimal spatial frequency for accurate bruise detection was identified.
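A minimal NumPy sketch of this demodulation step, following the expressions for I_DC, I_AC, and RT above (the eps guard against division by zero is an implementation detail, not from the paper):

```python
import numpy as np

def demodulate(i1, i2, i3, eps=1e-6):
    """Three-phase demodulation of SIRI images into DC, AC, and RT components.

    i1, i2, i3: raw images captured at phase offsets -2*pi/3, 0, +2*pi/3.
    """
    i1, i2, i3 = (np.asarray(a, dtype=float) for a in (i1, i2, i3))
    dc = (i1 + i2 + i3) / 3.0
    ac = (np.sqrt(2.0) / 3.0) * np.sqrt((i1 - i2)**2 + (i1 - i3)**2 + (i2 - i3)**2)
    rt = ac / (dc + eps)   # normalization removes illumination inhomogeneity
    return dc, ac, rt
```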
Prior to experimentation, sinusoidal fringe patterns with varying spatial frequencies were generated on a computer. By adjusting the parameters of the mathematical equations, fringe images corresponding to each frequency were obtained. These patterns exhibited distinct visual characteristics: high spatial frequencies appeared as dense black-and-white stripes, whereas low frequencies resulted in more widely spaced fringes.
To quantitatively compare the bruise enhancement effect at each spatial frequency, the contrast index (CI) was introduced. CI measures the image contrast and represents the distinguishability of the bruised region relative to the overall fruit tissue. Calculation of CI involves dividing the pear image into bruised and intact regions and computing the ratio between inter-class variance and the total variance of pixel intensities [25].
\mathrm{CI} = \frac{N_x(\bar{x} - \bar{z})^2 + N_y(\bar{y} - \bar{z})^2}{\sum_{i=1}^{N_z}(z_i - \bar{z})^2}
where N x , N y , and N z represent the number of pixels in the bruised region, intact tissue, and the entire area, respectively. x ¯ , y ¯ , and z ¯ denote the mean intensities of the bruised region, intact tissue, and the whole region, respectively. The CI value ranges from 0 to 1, with higher values indicating better visibility and distinguishability of the bruised region.
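A direct transcription of this CI definition, assuming a binary bruise mask is available from the manual annotations:

```python
import numpy as np

def contrast_index(image, bruise_mask):
    """CI = between-class variance / total variance of pixel intensities."""
    z = np.asarray(image, dtype=float).ravel()
    m = np.asarray(bruise_mask, dtype=bool).ravel()
    x, y, z_bar = z[m], z[~m], z.mean()
    between = x.size * (x.mean() - z_bar)**2 + y.size * (y.mean() - z_bar)**2
    total = ((z - z_bar)**2).sum()
    return between / total   # in [0, 1]; higher = more distinguishable bruise
```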
In subsurface bruised regions of pears, RT images provide a more uniform background. Due to the curvature of the pear surface, RT images play a positive role in correcting intensity distortion, effectively mitigating the impact of uneven illumination. In contrast, AC images exhibit noticeably darker edges in the background. As the spatial frequency increases, the overall brightness of RT images rises. However, at higher spatial frequencies, excessive overall brightness leads to reduced bruise contrast, causing the bruised area to fade or be surrounded by darkened regions.
Both samples S1 and S2 achieved the highest CI values at a spatial frequency of 150 cycles·m−1. As shown in Table 1, the CI values at 150 and 200 cycles·m−1 remain at relatively high levels. Therefore, a spatial frequency of 150 cycles·m−1 was selected as the final frequency for subsequent bruise detection across all samples in this study.
The demodulation workflow was automated in MATLAB R2022b: Gaussian filtering ( σ = 1.5 ) suppressed high-frequency noise, followed by phase unwrapping to eliminate periodic ambiguity. At 150 cycles·m−1 spatial frequency, the system effectively detected tissue anomalies up to 2 mm subsurface, achieving 62% SNR improvement over uniform illumination. A spherical-model intensity correction algorithm reduced edge intensity deviation from ±18% to <±5% by dynamically adjusting local gains through fruit contour fitting.
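Of this workflow, the Gaussian filtering step is straightforward to reproduce with SciPy; phase unwrapping and the spherical intensity correction are omitted here since their exact implementations are not given:

```python
from scipy.ndimage import gaussian_filter

def denoise_rt(rt):
    """Gaussian smoothing step of the demodulation workflow (sigma = 1.5).
    Phase unwrapping and spherical-model correction are not reproduced."""
    return gaussian_filter(rt, sigma=1.5)
```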
Figure 5 shows the simulated modulation transfer function (MTF) curves under different imaging systems. In practical systems, optical aberrations, noise interference, and hardware response fluctuations cause slight high-frequency oscillations in the curves: The red dashed line (Uniform Illumination) simulates traditional illumination, where high-frequency details attenuate rapidly and are heavily affected by noise. The blue solid line (SIRI System) represents the structured light system, which enhances the mid-to-high frequency response and exhibits slight oscillations reflecting system response variations. The green dash-dot line (SIRI + RT Image) further enhances contrast and deep features, demonstrating the strongest detail preservation ability and the smoothest curve.

2.4. Experimental Procedure

During the sample acquisition phase, a customized Structured-Illumination Reflectance Imaging (SIRI) system was employed. A digital projector was used to project three groups of sinusoidal fringe patterns, each with a phase shift of 2 π / 3 and a spatial frequency of 150 cycles·m−1, onto the pear surface. A monochromatic CMOS camera synchronously captured the reflected light intensity distributions. To suppress brightness gradients induced by surface curvature, a cross-polarization configuration combined with a 450 nm long-pass filter was adopted, effectively attenuating specular reflection and ambient light noise. During acquisition, the fruit samples were fixed using a clamp, and raw phase-shifted images were obtained with an exposure time of 50 ms, ensuring alignment of the damaged areas along the optical axis and avoiding motion blur.
In the image demodulation phase, subsurface features were extracted using a three-step phase-shifting method by solving the corresponding phase-shifting equations to separate the direct current (DC) and alternating current (AC) components. To further enhance the contrast of the damaged regions, a relative transmittance (RT) image was constructed by normalizing the AC components, thereby eliminating illumination nonuniformity. Experimental results demonstrated that this demodulation approach significantly improved the signal-to-noise ratio between damaged and healthy tissues, thereby enhancing the visualization of deep damage.
For the segmentation phase, the optimized RT images were input into the MOGA-UNet model. Multiscale features were extracted using a VGG16 encoder, and cross-scale damage features were fused via a multi-order gated aggregation (MOGA) module. Additionally, a squeeze-and-excitation (SE) channel attention mechanism was incorporated to enhance weak edge responses. This entire workflow achieved an end-to-end pipeline from optical imaging to damage segmentation in Figure 6, with the detection time per sample controlled within 5.2 s, providing an efficient and reliable technical basis for subsequent quality grading.

2.5. Image Augmentation

Using the SIRI system, 300 structured-illumination images of damaged pears were acquired at a spatial frequency of 150 cycles·m−1. After applying the three-step phase-shifting demodulation method, 300 relative transmittance (RT) images were obtained to construct a dataset of latent pear damage. A total of 80 RT images were randomly selected as the test set, while the remaining 220 images were used to build the training set.
To increase dataset diversity and enhance model generalization, several image augmentation techniques were applied, including vertical flipping, horizontal flipping, random rotations (with angles between 30° and 150° and 210° and 330°), and random scaling (with scaling factors ranging from 0.5 to 0.8 and 1.5 to 2.0). Additionally, random brightness adjustment (±15%) and contrast scaling (0.8–1.2 times) were employed to simulate variations in real-world lighting conditions. After augmentation, the number of training images was expanded to 1000.
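One possible realization of this augmentation policy with torchvision is sketched below; the 50% flip probabilities and the uniform sampling within each interval are assumptions, and for segmentation the same geometric transform must also be applied to the label mask:

```python
import random
from torchvision import transforms
from torchvision.transforms import functional as TF

def rand_in(ranges):
    """Pick one interval at random, then sample uniformly within it."""
    lo, hi = random.choice(ranges)
    return random.uniform(lo, hi)

def augment(img):
    """One randomized draw from the augmentation ranges listed above."""
    if random.random() < 0.5:
        img = TF.hflip(img)
    if random.random() < 0.5:
        img = TF.vflip(img)
    angle = rand_in([(30, 150), (210, 330)])    # rotation, degrees
    scale = rand_in([(0.5, 0.8), (1.5, 2.0)])   # scaling factor
    img = TF.affine(img, angle=angle, translate=[0, 0], scale=scale, shear=0)
    # brightness +/-15 %, contrast 0.8-1.2x
    jitter = transforms.ColorJitter(brightness=0.15, contrast=(0.8, 1.2))
    return jitter(img)
```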
The locations of latent damages on pears were manually annotated using the LabelMe software (version 3.16.7). The dataset was randomly divided into training, validation, and test sets at a ratio of 7:2:1. The test set was used for evaluating the algorithm’s performance, while the training and validation sets were used for model training.

2.6. Semantic Segmentation Network Model

The MOGA-UNet architecture consists of a VGG encoder, a bottom MOGA module, a squeeze-and-excitation (SE) attention mechanism, and a decoder. Figure 7 illustrates the proposed framework. Based on the classical UNet architecture, the network innovatively integrates a multi-order gated aggregation (MOGA) module and an SE attention mechanism, forming a collaborative “encoding–feature fusion–decoding” process.
As shown in Figure 7a, the encoding stage adopts a lightweight VGG16 network to extract multiscale features through five convolution-pooling operations, progressively reducing the spatial dimensions. The first to third layers use standard 3 × 3 convolutions to capture epidermal texture details, while the fourth and fifth layers employ dilated convolutions (dilation rate = 2) to enhance subsurface damage features. At the end of the encoder, a MOGA module (Figure 7b) is introduced, constructing a multi-order feature interaction layer using three parallel depthwise separable convolutions with dilation rates d = { 1 , 2 , 3 } . These paths focus on scattering features corresponding to depths of 0.5 mm, 1.5 mm, and 3.0 mm, respectively. A GELU-based gating mechanism dynamically weights the contributions of each scale. This design effectively addresses the traditional models’ missed detection of subtle damages caused by fixed convolution kernels through cross-scale feature fusion [26].
During the decoding stage, a symmetric upsampling path is employed. Shallow detail features from the encoder are integrated with deep semantic features via skip connections. An SE attention module (Figure 7c) is embedded after each deconvolution operation. This mechanism compresses the spatial dimensions through global average pooling to obtain channel-level statistics, generates channel weight vectors through fully connected layers followed by a sigmoid activation, and adaptively enhances damage-relevant feature channels.
The encoder part of this architecture consists of a series of convolutional layers followed by max-pooling layers, progressively reducing the spatial resolution while increasing the depth of the feature maps.

2.6.1. SE Attention Mechanism

The SE module is applied after the final max-pooling layer. It consists of two fully connected layers, a ReLU activation function, and a sigmoid function to recalibrate feature representations. The squeeze operation compresses the feature map FM_C into a (1 × 1 × 1 × C) tensor of global channel-wise statistics S_C via global average pooling, maintaining the number of channels. This process is expressed as
S_C = \frac{1}{D \times H \times W} \sum_{i=1}^{D} \sum_{j=1}^{H} \sum_{k=1}^{W} \mathrm{FM}_C(i, j, k),
where D, H, and W denote the depth, height, and width, respectively [27].
Following the squeeze operation, the excitation operation determines the inter-channel dependencies and consists of two fully connected layers with a ReLU activation followed by a sigmoid function. By eliminating the biases in the fully connected layers, the channel dependencies can be analyzed more effectively. The output length remains consistent with the number of channels in the second fully connected layer, and the sigmoid function constrains the output values between 0 and 1. Finally, a scaling operation recalibrates the feature map by emphasizing informative channels, producing the recalibrated feature map FM C as follows:
\mathrm{FM}_C' = S_C \otimes \mathrm{FM}_C,
where ⊗ denotes element-wise multiplication.
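A PyTorch sketch of this SE block as used in a 2D decoder (the reduction ratio of 16 is an assumed value; the bias-free fully connected layers follow the description above):

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (2D variant).

    Squeeze: global average pooling -> per-channel statistic.
    Excitation: two bias-free FC layers + sigmoid -> channel weights.
    """
    def __init__(self, channels, reduction=16):   # reduction=16 is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                 # squeeze -> (B, C)
        w = self.fc(s).view(b, c, 1, 1)        # excitation -> channel weights
        return x * w                           # recalibrate the feature map
```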
Integrating the SE (squeeze-and-excitation) attention mechanism into the two convolutional layers of the decoder block enables adaptive reweighting of feature representations [28]. The SE attention module models the interdependencies between channels for the convolutional output features. Within decoder blocks of architectures such as U-Net, the squeeze operation compresses the spatial information of each channel into a representative statistic, followed by an excitation operation that generates corresponding channel-wise weights. This approach adaptively adjusts the importance of each feature channel, allowing the model to focus more on task-relevant features.
The incorporation of this attention mechanism enhances the feature representation capabilities of the decoder blocks. Although two convolutional layers can already extract features, the SE module enables more discriminative information to be captured within the feature space. By dynamically adjusting channel weights, the model can learn more complex and representative feature combinations. For complex image data, adding SE attention helps the decoder block better capture object shapes, colors, and spatial relationships, thereby improving the extraction of higher-level semantic features.

2.6.2. MOGA Module

The MOGA module is designed to aggregate multi-order contextual features. This is achieved by using three parallel depthwise convolution (DWConv) layers with dilation rates d { 1 , 2 , 3 } , which are employed to capture low-order, mid-order, and high-order interactions.
Assume the input feature to the MOGA module is Z ∈ ℝ^{C×HW}. A 5 × 5 DWConv with d = 1 is first applied to extract low-order features. The output is then split along the channel dimension into three parts: Z_l ∈ ℝ^{C_l×HW}, Z_m ∈ ℝ^{C_m×HW}, and Z_h ∈ ℝ^{C_h×HW}, where C_l + C_m + C_h = C.
Next, Z l and Z h are processed by DWConv layers with kernels of size 5 × 5 and 7 × 7 and dilation rates d = 2 and d = 3 , respectively. Meanwhile, Z m is kept unchanged as an identity mapping. Finally, the outputs of Z l , Z m , and Z h are concatenated along the channel dimension to form the multi-order context feature:
Y_c = \mathrm{Concat}(Y_{l,\,1:C_l},\ Y_m,\ Y_h)
Specifically, the low-order feature subset Z_l is processed by a depthwise separable convolution (DWConv) with a dilation rate of 2 to obtain the feature map Y_l, which focuses on the perception of shallow damage within a depth range of approximately 0.5–1.0 mm. The high-order feature subset Z_h is convolved using a DWConv with a dilation rate of 3 to extract Y_h, enhancing the response to deeper damage regions around 3.0 mm. In contrast, the middle-order feature subset Z_m is preserved via an identity mapping strategy, directly retained as the mid-scale contextual feature Y_m to avoid introducing redundant transformations. Finally, the three output feature maps Y_l, Y_m, and Y_h are concatenated along the channel dimension to form the aggregated multi-order contextual representation Y_c ∈ ℝ^{C×HW}.
In the gated branch, the GELU activation function is used to aggregate the contextual outputs, achieving optimal performance. Therefore, the final output of the MOGA module can be expressed as
H_{\mathrm{Moga}}(Z) = H_{\mathrm{Conv}_1}\big(H_{\mathrm{GELU}}(H_{\mathrm{Conv}_1}(Z)) \odot H_{\mathrm{GELU}}(H_{\mathrm{Conv}_1}(Y_c))\big) + Z
where H_{\mathrm{Conv}_1} denotes a 1 × 1 convolution for channel adjustment and feature compression, H_{\mathrm{GELU}} represents the nonlinear GELU activation function, ⊙ indicates element-wise multiplication, Z is the input feature map defined above, and Y_c is the aggregated contextual feature.
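A PyTorch sketch of the MOGA module as described above; the channel split ratios are assumptions where the text leaves them unspecified:

```python
import torch
import torch.nn as nn

class MOGA(nn.Module):
    """Multi-order gated aggregation (sketch). Split ratios are ASSUMED."""
    def __init__(self, c, split=(0.25, 0.25, 0.5)):
        super().__init__()
        self.cl = int(c * split[0])
        self.cm = int(c * split[1])
        self.ch = c - self.cl - self.cm
        # low-order context: 5x5 depthwise conv, dilation 1
        self.dw_low = nn.Conv2d(c, c, 5, padding=2, groups=c)
        # Z_l branch: 5x5 depthwise, dilation 2; Z_h branch: 7x7 depthwise, dilation 3
        self.dw_l = nn.Conv2d(self.cl, self.cl, 5, padding=4, dilation=2, groups=self.cl)
        self.dw_h = nn.Conv2d(self.ch, self.ch, 7, padding=9, dilation=3, groups=self.ch)
        self.proj_in = nn.Conv2d(c, c, 1)    # 1x1 conv on the context branch
        self.gate = nn.Conv2d(c, c, 1)       # 1x1 conv on the gating branch
        self.proj_out = nn.Conv2d(c, c, 1)   # final 1x1 channel adjustment
        self.act = nn.GELU()

    def forward(self, z):                    # z: (B, C, H, W)
        low = self.dw_low(z)
        zl, zm, zh = torch.split(low, [self.cl, self.cm, self.ch], dim=1)
        yc = torch.cat([self.dw_l(zl), zm, self.dw_h(zh)], dim=1)  # multi-order context
        # gated aggregation: GELU(Conv1(Z)) * GELU(Conv1(Yc)), then Conv1 + residual
        out = self.proj_out(self.act(self.gate(z)) * self.act(self.proj_in(yc)))
        return out + z
```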
At the bottom of the UNet encoder block, the MOGA module enables more targeted feature extraction. Through its internal gating, it selects the features most relevant to the target task from the raw feature set, focusing on key information, such as texture and density cues that distinguish damaged from healthy tissue, while reducing attention to irrelevant background information.
The MOGA module enhances the representativeness of extracted features. During encoding, it optimizes feature combinations so that each feature better reflects the intrinsic structure of the data. The processed features more effectively capture discriminative attributes such as object shapes, providing stronger support for segmentation tasks.
In the UNet architecture, features extracted at the bottom of the encoder are crucial for subsequent multiscale feature fusion. High-quality low-level features extracted by the MOGA module serve as valuable inputs for the decoder, carrying rich global and local structural information to facilitate multiscale feature integration.
Furthermore, the MOGA module improves the feature fusion between encoder and decoder blocks, enabling smoother information transfer. It allows the model to better exploit hierarchical features for precise segmentation of different semantic regions, thus enhancing segmentation accuracy.

2.6.3. VGG Encoder

In MOGA-UNet, the VGG network is employed to replace the standard encoder blocks. Leveraging its deep architecture, VGG captures hierarchical deep features through multiple convolutional layers, mining diverse information ranging from low-level to high-level representations [29]. The rich feature representations generated by VGG result in multiscale feature maps that cover different aspects of the image.
In terms of model performance, VGG’s feature extraction capabilities significantly improve task accuracy, facilitating the precise identification of semantic regions. Its widespread use and compatibility enable MOGA-UNet to adapt to a wide range of application domains and facilitate subsequent extensions, such as adding new layers or integrating advanced techniques. This provides robust support for optimizing and expanding MOGA-UNet across different task scenarios.

2.7. Model Evaluation

The evaluation metrics for the MOGA-UNet model are presented below, including true positives (TP), true negatives (TN), overall accuracy (ACC), mean precision (P), mean recall (R), mean pixel accuracy (MPA), F1 score (Dice coefficient), and mean intersection over union (mIoU). TP and TN represent the proportions of correctly classified damaged and healthy samples within their respective classes, while FP and FN denote false positives and false negatives. ACC denotes the ratio of correctly classified samples to the total number of test samples; higher values of TP, TN, and ACC indicate superior classification performance. Precision and recall assess the model’s ability to predict specific categories, and a higher mIoU reflects better segmentation performance [30]. The key formulas are given as follows:
\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}
P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}
F_1 = \frac{2 \times P \times R}{P + R} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
\mathrm{mIoU} = \frac{1}{k} \sum_{c=1}^{k} \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c}
where the counts are accumulated per class and k is the number of classes.
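These formulas translate directly into code; a minimal sketch operating on per-class pixel confusion counts:

```python
def seg_metrics(tp, tn, fp, fn):
    """Pixel-level metrics from per-class confusion counts (formulas above)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)       # identical to 2TP / (2TP + FP + FN)
    iou = tp / (tp + fp + fn)      # averaging per-class IoU gives mIoU
    return acc, p, r, f1, iou
```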
During the training of the semantic segmentation networks, no pretrained models were utilized. All models were trained from scratch with randomly initialized parameters to ensure a fair and accurate comparison of network performance on the task of latent damage segmentation in pears. The number of training epochs was set to 100, and the batch size was set to 8. The Adam optimizer was employed for model training. Experiments were conducted on a workstation equipped with a Windows 10 operating system, PyTorch deep learning framework, and an NVIDIA GeForce RTX 4060 GPU (NVIDIA Corporation, Santa Clara, CA, USA).

3. Results and Discussion

3.1. Identification of Latent Damage in ‘Korla’ Pears via Image Segmentation

The performance of six semantic segmentation models in detecting latent damage on AC images was evaluated (Figure 8). It was observed that the segmentation boundaries produced by DeeplabV3+ and PSPNet exhibited significant jagged artifacts, particularly resulting in over-segmentation in the transitional regions of the damage edges. In contrast, the MOGA-UNet achieved the highest contour alignment with the ground truth masks, benefiting from the multiscale feature fusion capability of the MOGA module, which contributed to its smoother segmentation boundaries.
Quantitative analysis indicated that the MOGA-UNet achieved a mean Intersection over Union (mIoU) of 94.10% on AC images, representing a 2.52% improvement over the baseline UNet model. This result confirms the effectiveness of the SE attention mechanism in enhancing feature representations under low signal-to-noise ratio conditions in AC images. However, it was also noted that all models exhibited higher false positive rates on AC images compared with RT images (MOGA-UNet: 3.2% vs. 2.1%), a phenomenon directly attributed to the inherent brightness inhomogeneity of the AC images.
The RT image modality has a positive effect on model performance (Figure 9). The segmentation results of MOGA-UNet on RT images demonstrate clearer damage boundaries, with a Dice coefficient of 97.02%, representing a 0.15% improvement over the AC images. Notably, the DeeplabV3-DM model exhibits false detection in the fruit stalk region (highlighted by the yellow circle), while MOGA-UNet effectively avoids such errors through channel suppression provided by the SE module. When comparing the models, the mIoU of MOGA-UNet on RT images shows an absolute improvement of 0.28%, significantly outperforming other models. This confirms the synergistic enhancement effect of the proposed normalization approach (RT = AC/DC) combined with the MOGA-UNet architecture.

3.2. Model Evaluation After Image Segmentation

On AC images, MOGA-UNet leads in both mPrecision (97.50%) and mRecall (96.26%), with an F1 score of 96.87%, outperforming the second-best UNet model by 1.54% (Figure 10). PSPNet shows a clear performance gap, with an mIoU of 79.88%, which is attributed to insufficient high-frequency noise suppression in AC images. Notably, all models exhibit lower mRecall than mPrecision, with an average difference of 2.8%, reflecting a tendency for missed detections due to the low contrast between damage regions and the background in AC images.
The RT image significantly improved the overall performance of all models. MOGA-UNet achieved an mIoU of 94.38%, which is a 0.28% improvement over the AC image, and its mRecall (96.27%) and mPrecision (97.80%) are more balanced, with a difference of only 1.53% (Figure 11). This indicates that the homogenization processing of RT images effectively mitigates the class imbalance issue. DeeplabV3-plus showed the largest performance improvement on RT images but still lags behind MOGA-UNet by an absolute difference of 5.6%. This result confirms that the feature enhancement effect of RT images needs to be combined with an adapted model architecture to fully leverage its potential.
The model evaluation results with AC and RT images as independent data inputs are shown in Figure 10 and Figure 11. The charts indicate that the combination of SIRI technology and deep learning for detecting latent damage in ‘Korla’ pears is feasible, achieving satisfactory detection accuracy. The detection accuracy on RT images is consistently superior to that on AC images, which is consistent with visual inspection results. The MOGA-UNet model demonstrates the highest detection accuracy and the greatest stability across both image modalities.
As shown in Figure 12 and Figure 13, the loss function and validation-set mIoU trends of MOGA-UNet over 100 training epochs are presented. The loss value drops rapidly in the first 20 epochs (0.175 → 0.045) and then enters a stable convergence stage, with the final training/validation losses stabilizing at 0.028 ± 0.003 and 0.03 ± 0.005, respectively. The validation-set mIoU reaches a peak of 94.38% after 80 epochs, with minimal overfitting (training/validation mIoU difference < 1.2%). An analysis of the curve fluctuations reveals local oscillations at epochs 45 and 78, related to the stage-wise activation of the online hard-sample mining strategy; these fluctuations did not affect the final convergence stability.
The contribution of the model structure improvements was quantified through ablation experiments: removing the MOGA module resulted in a 6.7% decrease in mIoU, eliminating the SE attention mechanism led to a 12.4% increase in misdetection of irregular edge regions, and replacing the VGG encoder with ResNet50 caused a 9.3% increase in missed small damage detections due to over-abstracted features. These results demonstrate that the proposed threefold improvement (VGG deep encoding, MOGA multiscale fusion, and SE channel optimization) forms a complementary enhancement effect—VGG’s shallow convolution retains high-frequency details, MOGA aggregates cross-scale features, and the SE mechanism strengthens semantically important channels. Together, these improvements overcome the limitations of traditional models in segmenting low-contrast, weak-edge damage.
This study validates the effectiveness of combining SIRI and the improved MOGA-UNet model for the identification and segmentation of latent damage in pears. The surface of latent damage in pears does not exhibit obvious features in the early stages. SIRI was used to capture images of latent damage in pears at a spatial frequency of 150 cycles·m−1. Compared with direct current (DC) images, AC images enhance the early attenuation features of the pears. RT images were obtained through brightness correction, which enhanced the contrast between latent damage and normal areas and reduced uneven brightness along the spherical edges. When multiple semantic segmentation networks and the improved MOGA-UNet model were applied to both AC and RT images, MOGA-UNet achieved the best segmentation results, with an F1 score exceeding 97%. MOGA-UNet can automatically extract and learn attenuation features from images, yielding the best results on RT images, with an mPrecision of 97.80% and an mIoU of 94.38% (Table 2).
Table 3 summarizes the performance of the proposed SIRI + MOGA-UNet model compared with other representative methods from recent studies. Cai et al. (2025) [24] applied YOLOv7 to citrus SIRI images, achieving a Dice coefficient of 95.10% and an mIoU of 93.50%. Zhang et al. (2024) [31] employed a conventional U-Net model on navel orange datasets, reporting an mIoU of 90.30% without providing a Dice score. In contrast, the proposed method outperforms both, achieving the highest segmentation accuracy with a Dice coefficient of 97.02% and an mIoU of 94.38%. These results demonstrate the effectiveness of the MOGA-UNet architecture in capturing subsurface damage features with higher precision, especially when applied to RT images of ‘Korla’ pears.

4. Conclusions

This study proposes the SIRI-MOGA-UNet synergistic framework, enabling high-precision detection of subsurface concealed damage in ‘Korla’ pears. The RT images generated by the SIRI system significantly enhance the contrast of damaged areas. The MOGA-UNet, through lightweight VGG16 encoding, multi-order gated aggregation, and channel attention mechanisms, achieves an mIoU of 94.38% and a Dice coefficient of 97.02% on RT images, a significant improvement over the baseline model. The SIRI technology realizes subsurface signal enhancement with a low-cost visible-light system, while the MOGA-UNet accurately captures concealed damage through the triple mechanism of “local detail preservation–cross-scale fusion–global channel optimization”. This achievement provides an efficient and reliable technical paradigm for the intelligent detection of postharvest concealed defects in agricultural products, offering both academic innovation and industrial application value.

Author Contributions

Conceptualization, B.Z. and J.L.; methodology, B.Z.; software, J.L.; validation, Q.Z., B.Z. and J.L.; formal analysis, W.L.; investigation, J.L.; resources, B.Z.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, W.L. and Y.L.; visualization, H.Z.; supervision, S.W. and Y.L.; project administration, B.Z.; funding acquisition, B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China Regional Science Foundation Project (Approval numbers: 62265007 and 32260622).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because they are being used in ongoing follow-up studies.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Celik, H.K. Determination of bruise susceptibility of pears (Ankara variety) to impact load by means of FEM-based explicit dynamics simulation. Postharvest Biol. Technol. 2017, 128, 83–97. [Google Scholar] [CrossRef]
  2. Fu, X.; Wang, M. Detection of early bruises on pears using fluorescence hyperspectral imaging technique. Food Anal. Methods 2022, 15, 115–123. [Google Scholar] [CrossRef]
  3. Li, J.; Lu, Y.; Lu, R. Identification of early decayed oranges using structured-illumination reflectance imaging coupled with fast demodulation and improved image processing algorithms. Postharvest Biol. Technol. 2024, 207, 112627. [Google Scholar] [CrossRef]
  4. Neupane, C.; Pereira, M.; Koirala, A.; Walsh, K.B. Fruit sizing in orchard: A review from caliper to machine vision with deep learning. Sensors 2023, 23, 3131. [Google Scholar] [CrossRef] [PubMed]
  5. Munera, S.; Blasco, J.; Amigo, J.M.; Cubero, S.; Talens, P.; Aleixos, N. Use of hyperspectral transmittance imaging to evaluate the internal quality of nectarines. Biosyst. Eng. 2019, 182, 54–64. [Google Scholar] [CrossRef]
  6. Lu, Y.; Li, R.; Lu, R. Structured-illumination reflectance imaging (SIRI) for enhanced detection of fresh bruises in apples. Postharvest Biol. Technol. 2016, 117, 89–93. [Google Scholar] [CrossRef]
  7. He, Y.; Xiao, Q.L.; Bai, X.L.; Zhou, L.; Liu, F.; Zhang, C. Recent progress of nondestructive techniques for fruits damage inspection: A review. Crit. Rev. Food Sci. Nutr. 2022, 62, 5476–5494. [Google Scholar] [CrossRef]
  8. Li, J.B.; Huang, W.Q.; Tian, X.; Wang, C.P.; Fan, S.X.; Zhao, C.J. Fast detection and visualization of early decay in citrus using Vis-NIR hyperspectral imaging. Comput. Electron. Agric. 2016, 127, 582–592. [Google Scholar] [CrossRef]
  9. Lu, Y.; Lu, R. Enhancing chlorophyll fluorescence imaging under structured illumination with automatic vignetting correction for detection of chilling injury in cucumbers. Comput. Electron. Agric. 2020, 168, 105145. [Google Scholar] [CrossRef]
  10. Li, J.; Luo, W.; Han, L.; Cai, Z.; Guo, Z. Two-wavelength image detection of early decayed oranges by coupling spectral classification with image processing. J. Food Compos. Anal. 2022, 111, 104642. [Google Scholar] [CrossRef]
  11. Cai, Z.; Sun, C.; Zhang, H.; Zhang, Y.; Li, J. Developing universal classification models for the detection of early decayed citrus by structured-illumination reflectance imaging coupling with deep learning methods. Postharvest Biol. Technol. 2024, 210, 112788. [Google Scholar] [CrossRef]
  12. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  13. Fan, S.; Liang, X.; Huang, W.; Zhang, V.J.; Pang, Q.; He, X.; Li, L.; Zhang, C. Real-time defects detection for apple sorting using NIR cameras with pruning-based YOLOV4 network. Comput. Electron. Agric. 2022, 193, 106715. [Google Scholar] [CrossRef]
  14. Chen, Y.; An, X.; Gao, S.; Li, S.; Kang, H. A deep learning-based vision system combining detection and tracking for fast on-line citrus sorting. Front. Plant Sci. 2021, 12, 622062. [Google Scholar] [CrossRef] [PubMed]
  15. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 2019, 39, 1856–1867. [Google Scholar] [CrossRef]
  16. Zhao, T.; Fu, C.; Song, W.; Sham, C.-W. RGGC-UNet: Accurate deep learning framework for signet ring cell semantic segmentation in pathological images. Bioengineering 2023, 11, 16. [Google Scholar] [CrossRef]
  17. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar] [CrossRef]
  18. Lu, Y.; Lu, R.; Zhang, Z. Detection of subsurface bruising in fresh pickling cucumbers using structured-illumination reflectance imaging. Postharvest Biol. Technol. 2021, 180, 111624. [Google Scholar] [CrossRef]
  19. Lu, Y.; Lu, R. Development of a multispectral structured illumination reflectance imaging (SIRI) system and its application to bruise detection of apples. Trans. ASABE 2017, 60, 1379–1389. [Google Scholar] [CrossRef]
  20. Zhang, M.; Li, C.; Yang, F. Optical properties of blueberry flesh and skin and Monte Carlo multi-layered simulation of light interaction with fruit tissues. Postharvest Biol. Technol. 2019, 150, 28–41. [Google Scholar] [CrossRef]
  21. Xu, T.; Zhang, X.; Zhu, Y.; Xu, X.; Rao, X. Evolution pattern in bruised tissue of ‘red delicious’ apple. Foods 2024, 13, 602. [Google Scholar] [CrossRef]
  22. Lu, Y.; Sardari, H. A structured-illumination reflectance imaging dataset for woody breast assessment of broiler meat. Data Brief 2025, 60, 111612. [Google Scholar] [CrossRef]
  23. Zhang, S. High-speed 3D shape measurement with structured light methods: A review. Opt Laser. Eng. 2018, 106, 119–131. [Google Scholar] [CrossRef]
  24. Cai, Z.; Zhang, Y.; Li, J.; Zhang, J.; Li, X. Synchronous detection of internal and external defects of citrus by structured-illumination reflectance imaging coupling with improved YOLO v7. Postharvest Biol. Technol. 2025, 227, 113576. [Google Scholar] [CrossRef]
  25. Zhang, J.; Chen, L.; Luo, L.; Cai, Z.; Shi, R.; Cai, L.; Li, J. Construction of a stable YOLOv8 classification model for apple bruising detection based on physicochemical property analysis and structured-illumination reflectance imaging. Postharvest Biol. Technol. 2025, 219, 113194. [Google Scholar] [CrossRef]
  26. Bai, D.; Li, G.; Jiang, D.; Tao, B.; Yun, J.; Hao, Z.; Zhou, D.; Ju, Z. Depth feature fusion based surface defect region identification method for steel plate manufacturing. Comput. Electr. Eng. 2024, 116, 109166. [Google Scholar] [CrossRef]
  27. Dixit, R.B.; Jha, C.K. Fundus image based diabetic retinopathy detection using EfficientNetB3 with squeeze and excitation block. Med. Eng. Phys. 2025, 140, 104350. [Google Scholar] [CrossRef] [PubMed]
  28. Jin, X.; Xie, Y.; Wei, X.S.; Zhao, B.R.; Chen, Z.M.; Tan, X. Delving deep into spatial pooling for squeeze-and-excitation networks. Pattern Recogn. 2022, 121, 108159. [Google Scholar] [CrossRef]
  29. Sengupta, A.; Ye, Y.; Wang, R.; Liu, C.; Roy, K. Going deeper in spiking neural networks: VGG and residual architectures. Front. Neurosci. 2019, 13, 95. [Google Scholar] [CrossRef]
  30. Behera, S.K.; Rath, A.K.; Sethy, P.K. Fruits yield estimation using Faster R-CNN with MIoU. Multimed. Tools Appl. 2021, 80, 19043–19056. [Google Scholar] [CrossRef]
  31. Zhang, H.; Zhang, J.; Zhang, Y.; Wei, J.; Zhan, B.; Liu, X.; Luo, W. Structured-illumination reflectance imaging combined with deep learning for detecting early decayed oranges. Postharvest Biol. Technol. 2024, 217, 113121. [Google Scholar] [CrossRef]
Figure 1. (a) Pendulum ball impact device; (b) pendulum drop start position.
Figure 2. (a) View of damaged pear surface; (b) healthy versus damaged tissue of pears.
Figure 3. SIRI actual system diagram.
Figure 4. AC and RT images of samples S1 and S2 captured at different spatial frequencies.
Figure 5. Simulated modulation transfer function (MTF) curves under different imaging systems.
Figure 6. Sample acquisition, demodulation, and segmentation flowchart.
Figure 7. (a) MOGA-UNet architecture diagram; (b) MOGA module; (c) SE attention mechanism.
Figure 8. Segmentation results on AC images using multiple semantic segmentation models and the improved MOGA-UNet model.
Figure 9. Segmentation results on RT images using multiple semantic segmentation models and the improved MOGA-UNet model.
Figure 10. Model evaluation results for implicit damage detection of AC images using six segmentation models.
Figure 11. Model evaluation results for implicit damage detection of RT images using six segmentation models.
Figure 12. Loss profile of the MOGA-UNet model during training.
Figure 13. mIoU variation profile of the MOGA-UNet model during training.
Table 1. Contrast indices (CIs) of RT images of samples under different spatial frequencies (cycles·m−1).

Sample	50	100	150	200	250
S1	0.228	0.405	0.611	0.550	0.502
S2	0.256	0.436	0.656	0.586	0.511
Table 2. Model evaluation metrics for six segmentation methods.

Network Structure	Image	mIoU/%	mPA/%	mPrecision/%	mRecall/%	F1/%
PSPNet	RT	80.83	83.45	94.47	83.45	88.61
PSPNet	AC	79.88	83.05	94.09	82.55	87.94
DeeplabV3-plus	RT	88.78	92.59	94.46	92.95	93.69
DeeplabV3-plus	AC	86.50	91.88	94.10	91.88	92.97
DeeplabV3-DM	RT	88.71	92.75	94.58	92.75	93.65
DeeplabV3-DM	AC	87.05	92.55	93.40	93.12	93.25
HRNet	RT	89.49	92.73	95.63	92.73	94.15
HRNet	AC	89.39	92.42	94.76	92.67	93.70
UNet	RT	91.98	96.05	95.25	96.05	95.64
UNet	AC	91.58	95.65	94.88	95.79	95.33
MOGA-UNet	RT	94.38	96.27	97.80	96.27	97.02
MOGA-UNet	AC	94.10	96.06	97.50	96.26	96.87
Table 3. Comparison with previous studies using SIRI for bruising detection.

Study	Method	Fruit Type	mIoU (%)	Dice (%)
Cai et al. (2025) [24]	SIRI + YOLOv7	Citrus	93.50	95.10
Zhang et al. (2024) [31]	SIRI + U-Net	Navel orange	90.30	—
This study	SIRI + MOGA-UNet	‘Korla’ pear	94.38	97.02