Intelligent Classification of Soybean Threshing Mixtures Based on Edge Perception and CNN–Transformer Hybrid Architecture

Wang, Shiguo; Zhang, Caiyuan; Guo, Xiaohu; Fan, Chenlong; Chen, Xuegeng

doi:10.3390/agronomy16111086


Shiguo Wang ¹, Caiyuan Zhang ², Xiaohu Guo ³, Chenlong Fan ² and Xuegeng Chen ¹,*

¹ College of Mechanical and Electronic Engineering, Shihezi University, Shihezi 832000, China
² College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
³ Xinjiang Academy of Agricultural and Reclamation Science, Shihezi 832000, China
* Author to whom correspondence should be addressed.

Agronomy 2026, 16(11), 1086; https://doi.org/10.3390/agronomy16111086

Submission received: 13 April 2026 / Revised: 29 May 2026 / Accepted: 30 May 2026 / Published: 31 May 2026

(This article belongs to the Special Issue Artificial Intelligence and Human–Computer Interactions in Agricultural Production)


Abstract

To address the low monitoring accuracy of traditional methods, caused by the complex composition and severe signal overlap of threshed materials during soybean threshing, this study proposes a high-precision impact signal classification system based on edge perception and a hybrid deep learning architecture, as a foundational step toward threshing loss monitoring. At the hardware level, a high-speed parallel sensing system was developed to achieve continuous acquisition and high-fidelity mapping of transient impact signals. At the algorithmic level, a CNN–Transformer hybrid network was constructed to extract local signal features and capture long-range temporal dependencies, thereby decoupling complex collision dynamics. Bench tests demonstrate that the hybrid model achieves an overall classification accuracy of 97.36% and F1-scores above 0.96 for soybean grains, stems, and pods, significantly outperforming single-architecture networks. Furthermore, feature visualization confirms that the model extracted features strongly correlated with the intrinsic impact dynamics of different materials rather than simply fitting environmental noise. This study provides a robust algorithmic foundation and a bench-level engineering reference for the intelligent classification of soybean harvesting mixtures, laying the groundwork for loss estimation under real field conditions.

Keywords:

soybean threshing loss; edge perception; CNN–Transformer; piezoelectric sensor; transient impact signal

1. Introduction

Soybeans have become one of the core cash crops in the global agricultural supply chain as global demand for plant protein and edible oil continues to surge [1,2,3]. Mechanized combine harvesting is key to ensuring production efficiency in modern intensive soybean production [4,5,6]. Recent advances in breeding have introduced new soybean varieties adapted to regional conditions, with high resistance to disease and pod shattering. While these shatter-resistant varieties effectively minimize natural pre-harvest losses, their tough pods require aggressive mechanical forces during threshing and cleaning in the combine to separate the grains. Consequently, the complex dynamics of mechanically breaking these tough pods keep grain loss during harvesting a core bottleneck restricting soybean yield [7,8,9,10]. Statistics show that losses during the threshing process often have a significant impact on final yield [11,12,13]. With the rapid development of precision agriculture, dynamic closed-loop regulation of combine harvester operating parameters has become an inevitable trend for reducing harvest losses [14]. The prerequisite for such closed-loop control is an online threshing loss monitoring system with high real-time performance and high precision [15,16].

In recent years, contact sensors based on acoustic and piezoelectric effects have become the mainstream technical route for grain loss monitoring because of their fast response and strong adaptability to harsh environments [17,18,19,20]. However, the by-products of soybean threshing are extremely complex in composition, including not only hard soybean grains but also large numbers of pods, stems, and leaves with elastoplastic characteristics [21,22]. When these physically distinct materials strike the sensing element in a dense, continuous flow, they generate highly non-stationary and heavily overlapping impact vibration signals [23,24]. Conventional fixed-threshold methods based on time-domain peaks or frequency-domain energy struggle to separate grain characteristics from complex background mechanical noise and overlapping multi-material signals, so monitoring accuracy declines sharply under high-impurity field conditions [25,26].

The development of deep learning for signal recognition offers a new approach to this problem, and it has been widely applied to loss detection for crops such as corn and rice [27,28]. Li et al. extracted time- and frequency-domain features of collision signals from different corn materials and built a grain loss identification model with machine learning algorithms, achieving a classification accuracy of 91.09% for corn grains and their mixtures [29]; Jin et al. combined grain time-domain signals with a machine learning model to develop a novel grain threshing loss detection system, which achieved a prediction accuracy of 96.15% on the validation set [30]; and Dong et al. presented a grain loss sensor based on the piezoelectric effect and an adaptive neuro-fuzzy inference system (ANFIS). While these studies demonstrate the potential of data-driven methods, they rely predominantly on shallow networks or traditional machine learning frameworks that require manual feature extraction. More importantly, existing approaches struggle to capture simultaneously the local transient impact features (e.g., the steep rising edges of rigid grain collisions) and the long-range temporal dependencies (e.g., the continuous rolling and elastoplastic damping of stems and pods). Consequently, when faced with highly non-stationary, overlapping multi-material signals under complex field conditions, existing single-scale models cannot precisely decouple complex collision dynamics, leading to severe feature confusion and reduced monitoring accuracy.

To address these challenges, this study proposes and builds a high-precision soybean threshing loss monitoring system based on high-speed edge perception and a CNN–Transformer hybrid deep learning architecture. The core contributions of this study are as follows:

(1): At the hardware level, the traditional microcontroller serial architecture was abandoned, and an FPGA-based heterogeneous parallel sensing system for edge perception was developed, achieving a high-fidelity acquisition of high-frequency transient impulse signals and time–frequency dual-track non-blocking analysis;
(2): At the algorithmic level, a hybrid network architecture that deeply integrates the local receptive field (CNN) with the global self-attention mechanism (Transformer) was designed, and for the first time, cross-time-scale joint modeling of physical features was achieved in the field of soybean loss monitoring;
(3): In terms of mechanism verification, explainable AI analysis based on 1D Grad-CAM and attention maps was introduced to reveal visually what the deep learning model learns about the impact dynamics of the different materials.

This study aims to provide a robust algorithmic and hardware foundation for the intelligent classification of soybean harvesting mixtures. It should be noted that this research deliberately focuses on a bench-scale environment as a necessary scientific prerequisite. Because actual field harvesting involves extreme background noise and uncontrolled multi-material flow, testing a novel deep learning architecture directly in the field makes it difficult to isolate algorithmic capabilities from mechanical interference. This controlled bench-scale study is therefore designed to validate the underlying signal decoupling mechanisms, laying reliable groundwork for future field-level loss monitoring systems.

2. Materials and Methods

2.1. Development of the Signal Acquisition System

In real soybean combine harvesting, the mechanical collision between the threshed material (soybean grains, pods, stems, etc.) and the sensor is extremely brief, and the resulting piezoelectric signal is typically high-frequency, transient, and non-stationary. When faced with such dense, continuous impacts, traditional serial acquisition terminals based on microcontrollers (MCUs) are prone to timing jitter and to “frame drops” of critical features because of bus throughput bottlenecks. This study therefore replaces the traditional serial data acquisition scheme with a high-speed edge perception architecture based on a field-programmable gate array (FPGA), which guarantees high-fidelity acquisition and real-time analysis of transient impulse signals at the hardware level.

The proposed system adopts a distributed edge-computing paradigm, partitioned into an edge perception node and an edge inference node. The perception node, built on an FPGA hardware platform, performs high-speed raw signal conditioning, real-time decimation, and adaptive threshold triggering to extract impact events at the source. The inference node, represented by the host computer, executes the CNN–Transformer hybrid model for complex material decoupling. This division of labor ensures that high-frequency sensor data are processed without loss at the hardware edge, while the computationally intensive deep learning inference is offloaded to a local edge server on the harvester.

2.1.1. Overall Hardware Architecture

The overall hardware of the system comprises the physical sensing layer, the high-speed digital mapping layer, the edge perception core layer, and the human–computer interaction layer, as shown in Figure 1. In the physical sensing layer, the customized piezoelectric sensing plate array serves as the electromechanical conversion hub, converting the mechanical dynamics of material collisions into weak charge signals without loss. These signals are then fed into the high-precision A/D conversion matrix, which maps the analog signals to discrete digital sequences in real time.

The core sensing element of this system is a piezoelectric ceramic sheet 50 mm in diameter that operates on the direct piezoelectric effect. It consists of a brass vibration substrate bonded to a piezoelectric ceramic layer and offers high sensitivity and stable resonance. Through the direct piezoelectric effect, the weak vibrations generated when grains collide with the sensing element during threshing are converted into charge signals. These signals are then digitized by an analog-to-digital converter, and data acquisition and real-time analysis are performed by the FPGA main control unit. The processed characteristic parameters are transmitted to the host computer for real-time monitoring of grain loss. The detailed parameters of the piezoelectric ceramic sheet used in this device are listed in Table 1.

As shown in Figure 2, the device consists mainly of a protective housing, an FPGA processing unit, an analog-to-digital (A/D) conversion module, a communication module, and the piezoelectric sensing plate assembly. During operation, soybean seeds strike the sensing plate. The piezoelectric element converts the mechanical impact into an electrical signal, which is digitized by the A/D converter and sent to the FPGA processing unit. The monitoring model then calculates the grain loss rate, and the result is transmitted through the communication module and displayed in real time on the terminal.

2.1.2. Software Flow

To handle the frequent material impacts during threshing, the system implements heterogeneous parallel processing links based on finite state machines (FSMs) in its underlying logic (Figure 3). These links exploit the FPGA's ability to run independent logic blocks concurrently and to exchange data across clock domains through FIFO (First In, First Out) buffers, dividing the data processing lifecycle into three coordinated stages.

Before each task starts, the system flushes the internal FIFO buffers at the hardware level to clear any residual noise and to prevent historical data from introducing artifacts into the current weak impact signal. At the same time, the FPGA distributes a stable synchronous clock reference to the peripheral AD chip and the communication module, ensuring nanosecond-level synchronization across the entire physical link.

After high-speed polling of the AD converter, the acquired 16 bit digital sequence (sampled at 200 kHz) is replicated without delay onto two separate processing buses, achieving hardware-level parallelism between “time-domain fidelity” and “frequency-domain resolution” (detailed circuit schematics of the acquisition and peripheral modules are provided in Figure 4 and Figure 5 for technical reference).

Link 1 performs high-fidelity encapsulation of the time-domain signal: the raw signal stream is pushed into a dedicated FIFO buffer, where hardware logic completes serial-to-parallel conversion and bit-width reconfiguration. This link applies no lossy compression to the signal amplitude; its purpose is to supply the deep learning model on the host computer with high-fidelity raw waveforms that preserve their temporal dynamics.

Link 2 computes the frequency-domain energy in real time: because the frequency distribution of an impact signal is an important prior feature for distinguishing materials, the FPGA performs a fixed-length (1024-point) hardware fast Fourier transform (FFT) on the signal stream using its internal high-speed DSP slices. This converts a complex-valued transform that would otherwise be CPU-intensive into a pipelined hardware operation that outputs the amplitude spectrum of the vibration signal over a wide frequency band in real time.
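Link 2 can be mirrored on the host side, for example to cross-check the FPGA's spectral output. The following minimal NumPy sketch assumes a rectangular (unwindowed) 1024-point frame, since the text does not specify a window, and computes the single-sided amplitude spectrum and its frequency grid: at 200 kHz each bin spans 200 000 / 1024 ≈ 195.3 Hz, the spectrum extends to the 100 kHz Nyquist limit, and one frame covers 5.12 ms. The function name `fft_amplitude` is hypothetical and is not the authors' implementation.

```python
import numpy as np

FS = 200_000   # sampling rate stated in the text (Hz)
N_FFT = 1024   # FFT length stated in the text

BIN_HZ = FS / N_FFT                        # frequency resolution: 195.3125 Hz
FRAME_MS = 1000 * N_FFT / FS               # time span of one frame: 5.12 ms
FREQS = np.fft.rfftfreq(N_FFT, d=1 / FS)   # 513 bin centres from 0 Hz to 100 kHz


def fft_amplitude(frame):
    """Single-sided amplitude spectrum of one 1024-sample frame (no window applied)."""
    frame = np.asarray(frame, dtype=float)
    if frame.shape != (N_FFT,):
        raise ValueError(f"expected {N_FFT} samples, got {frame.shape}")
    amp = np.abs(np.fft.rfft(frame)) / N_FFT
    amp[1:-1] *= 2.0  # fold negative frequencies in; DC and Nyquist bins are not doubled
    return amp


# Link 1 keeps the raw frame untouched; Link 2 derives its spectrum from a copy.
t = np.arange(N_FFT) / FS
raw_frame = 0.5 * np.sin(2 * np.pi * 12_500 * t)   # test tone centred exactly on bin 64
spectrum = fft_amplitude(raw_frame.copy())
peak_bin = int(np.argmax(spectrum))
print(peak_bin, FREQS[peak_bin], round(float(spectrum[peak_bin]), 6))
```

A tone placed exactly on a bin centre avoids spectral leakage, so its amplitude is recovered directly; real impact signals are not bin-aligned, which is why a window function is usually considered in practice.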

While maintaining high data throughput, the FPGA responds asynchronously to host commands through independent communication logic. Tasks such as dynamically setting the target frequency threshold or issuing audible warnings of sudden failures are computed and handled entirely within an independent clock domain, which prevents human–computer interaction from seizing resources and interrupting the core sampling task.

2.1.3. Hardware-Level High-Fidelity Signal Closed-Loop Verification

To verify the data fidelity and link reliability of this edge perception architecture from a measurement science perspective, a closed-loop test platform based on a standard signal source was constructed. During the test, a high-frequency coaxial cable directly connected (looped back) the FPGA-controlled DA output channel to the core AD sampling input channel. A numerically controlled oscillator (NCO) inside the FPGA generated a standard sinusoidal excitation with precisely adjustable frequency and amplitude; the DA converter turned it into an analog signal, which the acquisition module received, parsed, and packaged before transmitting it via Ethernet to the host computer for storage. The acquisition unit of the grain loss rate monitoring device and the visualized loopback test data are shown in Figure 6.
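The NCO used as the loopback source can be illustrated with a phase-accumulator model. The sketch below is a generic direct digital synthesis scheme, not the authors' HDL: the 32-bit accumulator width and the 200 kHz update clock are assumptions, and a real FPGA NCO would typically index a sine lookup table with the top accumulator bits rather than call `math.sin`. The output frequency is f_clk × FTW / 2^32, so the frequency resolution is f_clk / 2^32.

```python
import math

ACC_BITS = 32     # assumed phase-accumulator width
F_CLK = 200_000   # assumed NCO update rate (Hz), matched to the ADC rate here


def tuning_word(f_out, f_clk=F_CLK, bits=ACC_BITS):
    """Frequency tuning word (FTW) that makes the accumulator wrap f_out times per second."""
    return round(f_out * (1 << bits) / f_clk)


def actual_frequency(ftw, f_clk=F_CLK, bits=ACC_BITS):
    """Frequency actually produced by an integer FTW (quantized to f_clk / 2**bits)."""
    return ftw * f_clk / (1 << bits)


def nco_samples(f_out, n, amplitude=1.0, f_clk=F_CLK, bits=ACC_BITS):
    """Generate n sine samples from a wrapping phase accumulator."""
    ftw = tuning_word(f_out, f_clk, bits)
    mask = (1 << bits) - 1
    phase, out = 0, []
    for _ in range(n):
        # Map the accumulator value to [0, 2*pi) and evaluate the sine directly.
        out.append(amplitude * math.sin(2 * math.pi * phase / (1 << bits)))
        phase = (phase + ftw) & mask  # modulo-2**bits wrap, as in hardware
    return out


ftw_1k = tuning_word(1_000)
print(ftw_1k, actual_frequency(ftw_1k))
```

Because the FTW is an integer, the synthesized frequency is slightly quantized; at these assumed settings the error is below 0.1 mHz, negligible for a fidelity test.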

Comparison of the data shows that the recovered high-frequency sine waveforms closely match the transmitted signal in phase continuity and amplitude accuracy, with no phase truncation or sampling distortion observed. This closed-loop experiment ruled out systematic errors in the measurement link and verified that the heterogeneous parallel architecture offers good temporal stability and lossless transmission when processing high-frequency continuous analog signals, laying the measurement foundation for the subsequent acquisition of real material impact datasets.
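One way to turn this waveform comparison into numbers is a three-parameter least-squares sine fit at the known excitation frequency, in the spirit of the sine-fitting procedures in IEEE Std 1057/1241. The NumPy sketch below is an illustration rather than the authors' analysis code: it reports amplitude error, wrapped phase error, and an SNR estimate from the fit residual. The function names and the synthetic test capture are hypothetical.

```python
import numpy as np


def sine_fit(y, f, fs):
    """Least-squares fit y ~ a*sin(wt) + b*cos(wt) + c at a known frequency f (Hz)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size) / fs
    w = 2 * np.pi * f
    design = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    a, b, c = coef
    amplitude = np.hypot(a, b)   # a*sin + b*cos = A*sin(wt + phi)
    phase = np.arctan2(b, a)
    residual = y - design @ coef
    return amplitude, phase, c, residual


def fidelity_report(y, f, fs, ref_amplitude, ref_phase=0.0):
    """Amplitude error (%), phase error (rad, wrapped) and residual-based SNR (dB)."""
    amplitude, phase, _, residual = sine_fit(y, f, fs)
    amp_err_pct = 100 * abs(amplitude - ref_amplitude) / ref_amplitude
    phase_err = float(np.angle(np.exp(1j * (phase - ref_phase))))
    snr_db = 10 * np.log10((amplitude ** 2 / 2) / np.mean(residual ** 2))
    return amp_err_pct, phase_err, snr_db


# Synthetic loopback capture: 1 kHz tone, 0.8 amplitude, 0.3 rad phase, small offset and noise.
rng = np.random.default_rng(42)
fs, f = 200_000, 1_000
t = np.arange(2_000) / fs
captured = 0.8 * np.sin(2 * np.pi * f * t + 0.3) + 0.1 + rng.normal(0, 1e-4, t.size)
amp_err, ph_err, snr = fidelity_report(captured, f, fs, ref_amplitude=0.8, ref_phase=0.3)
print(amp_err, ph_err, snr)
```

Fitting the DC offset as a third parameter keeps a constant bias in the loopback path from being misread as amplitude error.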

2.2. Bench Design and Testing

Before system verification, the physical properties of the materials and the experimental boundary conditions were quantified to ensure the robustness and generalizability of the collected impact signals. The experimental samples were of the “Xinzhendou No. 1” soybean variety. Moisture content, which fundamentally governs the biomechanical impact characteristics, was measured by the standard oven-drying method (105 °C, 24 h) across 5 independent groups. The average moisture contents were 12.82% for seeds, 26.45% for main stems, and 9.14% for pods. This moisture gradient, with comparatively elastic stems and brittle seeds and pods, provides the physical basis for the subsequent vibration signal decoupling.
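The moisture figures above follow from oven-drying mass loss. The text does not state whether they are reported on a wet or dry basis; the sketch below assumes the wet basis commonly used for grain, MC_wb = (m_wet − m_dry) / m_wet × 100, and shows the dry-basis form for contrast. The sample masses in the usage example are hypothetical.

```python
from statistics import mean, stdev


def moisture_wet_basis(m_wet, m_dry):
    """Wet-basis moisture content (%): water mass over the original (wet) sample mass."""
    if not 0 < m_dry <= m_wet:
        raise ValueError("require 0 < dry mass <= wet mass")
    return 100.0 * (m_wet - m_dry) / m_wet


def moisture_dry_basis(m_wet, m_dry):
    """Dry-basis moisture content (%): water mass over the oven-dry mass."""
    if not 0 < m_dry <= m_wet:
        raise ValueError("require 0 < dry mass <= wet mass")
    return 100.0 * (m_wet - m_dry) / m_dry


def summarize_groups(groups):
    """Mean and sample standard deviation of wet-basis MC over (wet, dry) mass pairs."""
    values = [moisture_wet_basis(w, d) for w, d in groups]
    return mean(values), stdev(values)


# Hypothetical masses (g) for five seed groups, before and after drying at 105 °C for 24 h.
seed_groups = [(20.00, 17.44), (20.10, 17.52), (19.95, 17.39), (20.05, 17.48), (19.90, 17.35)]
print(summarize_groups(seed_groups))
```

The two bases diverge as moisture rises, so for the wetter stems the choice of basis matters more than for seeds or pods.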

In order to accurately reproduce the complex mechanical collision process involving soybean seeds, pods, stems, and sensors in the threshing section of the soybean combine harvester under controlled conditions and to obtain a high-fidelity dataset for deep learning model training, this study independently built a threshing impact dynamics simulation test bench and conducted systematic signal acquisition experiments.

The core design objective of the test bench was to reproduce, as closely as possible, the impact kinetic energy, incident angle, and continuous flow characteristics of the material in real field operations. As shown in Figure 7, the mechanical structure of the test bench is tightly coupled with the lower-level signal acquisition link. During the test, a simulated transmission mechanism and a feeding diversion device directed soybean grains and impurities (pods, stems) to strike the terminal piezoelectric plate randomly and in succession. To ensure valid signal capture without severe sensor saturation, the material flow density was carefully controlled: the conveyor belt speed was set to 1.0 m/s, delivering a continuous material feed rate of approximately 100 g/s. Under this feeding setup, the terminal impact speeds of the materials were constrained to the range of 1.5 to 2.8 m/s. Because the materials naturally tumble and disperse during free fall, the impact angles form a stochastic, continuous range (effectively 0° to 90° relative to the sensor plane). To ensure repeatability, signal acquisition was conducted across multiple independent test batches, eventually yielding a dataset of 2270 valid impact samples that incorporates the random, multi-angle collision dynamics required for robust classification.

When the materials collide with the sensing plate, the piezoelectric element responds at high frequency to the transient stress waves generated by the collision and converts them into weak charge signals. To capture the energy envelope of these extremely short collisions, especially the steep high-frequency rising edge produced by grain impacts, the digital acquisition channels operate continuously at a high sampling rate (200 kHz) to perform high-precision analog-to-digital conversion.

The digital signal sequences excited by material collisions are packaged with minimal latency inside the FPGA and streamed at high speed to the host computer's measurement and control terminal via the Gigabit Ethernet interface. The host software parses, denoises, and reassembles the received high-concurrency data stream in real time, converting it into one-dimensional time-domain waveform sequences that the algorithm can read directly, and stores them online.
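The host-side parsing and reassembly step can be sketched as follows. The packet layout is entirely hypothetical, since the text does not describe it: each payload is assumed to carry a 32-bit big-endian sequence counter followed by signed 16-bit big-endian samples. A sequence counter of this kind is one common way for a host to detect dropped packets.

```python
import struct

HEADER = struct.Struct(">I")   # hypothetical: 32-bit big-endian packet sequence number
SEQ_MOD = 1 << 32


def parse_packet(payload):
    """Split one payload into (sequence number, list of signed 16-bit samples)."""
    if len(payload) < HEADER.size:
        raise ValueError("payload shorter than header")
    (seq,) = HEADER.unpack_from(payload, 0)
    body = payload[HEADER.size:]
    if len(body) % 2:
        raise ValueError("sample section has an odd number of bytes")
    samples = struct.unpack(f">{len(body) // 2}h", body)
    return seq, list(samples)


def reassemble(payloads):
    """Concatenate samples in arrival order and count packets missing from the sequence."""
    waveform, lost, expected = [], 0, None
    for payload in payloads:
        seq, samples = parse_packet(payload)
        if expected is not None and seq != expected:
            lost += (seq - expected) % SEQ_MOD  # gap size, robust to counter wrap-around
        waveform.extend(samples)
        expected = (seq + 1) % SEQ_MOD
    return waveform, lost


p0 = HEADER.pack(0) + struct.pack(">3h", 120, -350, 910)
p1 = HEADER.pack(1) + struct.pack(">2h", -40, 15)
print(reassemble([p0, p1]))
```

Counting gaps in the sequence number, rather than relying on the transport layer, gives a direct, per-session measure of packet loss at the application level.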

To ensure the accuracy of the impact signals, the sensor was systematically calibrated before the experiments. A dynamic zero-point calibration was implemented in the FPGA logic to suppress environmental noise. In addition, the response sensitivity of the piezoelectric plate was verified through standardized impact tests with known masses, which confirmed a linear relationship between mechanical kinetic energy and digitized signal amplitude before data collection.
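The linearity check described above amounts to regressing digitized peak amplitude on impact kinetic energy E = 0.5 m v². A minimal pure-Python sketch follows. The drop masses, speeds, and amplitudes in the usage example are synthetic placeholders, not the authors' calibration data, and ordinary least squares with R² is an assumed choice of linearity metric.

```python
def kinetic_energy(mass_kg, speed_m_s):
    """Translational kinetic energy (J) of an impactor: 0.5 * m * v**2."""
    return 0.5 * mass_kg * speed_m_s ** 2


def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept, plus coefficient of determination R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2


# Synthetic example: known masses (kg) at known speeds (m/s); amplitudes in ADC counts.
energies = [kinetic_energy(m, v) for m, v in [(0.002, 1.5), (0.002, 2.8), (0.005, 2.0), (0.010, 2.5)]]
amplitudes = [1.0e6 * e + 150 for e in energies]   # placeholder: perfectly linear response
slope, intercept, r2 = linear_fit(energies, amplitudes)
print(f"slope={slope:.1f} counts/J, intercept={intercept:.1f} counts, R^2={r2:.4f}")
```

With real calibration drops, R² below 1 and the residual pattern would indicate how far the sensor departs from the ideal linear response.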

As the typical impact waveform in Figure 8 shows, the measurement and control link captures both the sudden energy release at the moment of collision and the subsequent damped ring-down. This high-fidelity mapping from “microscopic physical shock” to “macroscopic digital sample” validates the reliability of the bench system and the measurement and control architecture, and provides a sound data-quality foundation for building large-scale multimodal datasets and training the CNN–Transformer hybrid model.

Beyond the qualitative closed-loop verification, quantitative system-level metrics are needed to support the real-time edge-computing claims. During continuous operation, the acquisition node maintains a fixed sampling rate of 200 kHz at 16 bit depth, which generates a stable raw data throughput of 3.2 Mbps per channel. To quantify analog front-end stability, standard calibration signals were injected with a precision arbitrary waveform generator (AWG) and monitored with a high-bandwidth digital oscilloscope. Over 1000 independent test cycles, the system remained highly consistent, with a baseline signal-to-noise ratio (SNR) exceeding 85 dB and an amplitude error below 0.5% (0.41 ± 0.05%). Data transmission reliability and computational responsiveness were further verified in a continuous 2 h stress test at the peak sampling rate, in which hardware-level ping-pong FIFOs combined with direct memory access (DMA) maintained a 0% packet loss rate under peak continuous data flow. To evaluate end-to-end latency, FPGA-generated hardware timestamps were synchronized with host-side reception timestamps. The FPGA edge-preprocessing and decimation latency is bounded below 2 milliseconds. Transferring the aligned sequence to the host GPU (NVIDIA RTX 3090, ASUS Computer (Shanghai) Co., Ltd., Shanghai, China) and executing the CNN–Transformer inference takes approximately 5 to 15 milliseconds, depending on operating system scheduling. Overall, the end-to-end response time remains below 50 milliseconds (tested average 42.3 ms, σ = 2.1 ms).

Considering the mechanical inertia and operating speed typical of agricultural machinery, this sub-50 ms latency meets the requirements for non-blocking, online parameter regulation in high-speed harvesting.
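The throughput and noise figures above can be sanity-checked with standard converter arithmetic. The sketch below computes the raw bit rate (200 kHz × 16 bit = 3.2 Mbps per channel), the quantization-limited SNR ceiling of an ideal 16-bit ADC (20·log10(2^N·√1.5) ≈ 6.02N + 1.76 dB, about 98.1 dB), and the effective number of bits (ENOB) implied by a measured SNR. Treating the reported 85 dB SNR as if it were SINAD when computing ENOB is an approximation made here, not a claim from the text, and the channel-headroom helper ignores Ethernet and protocol overhead.

```python
import math


def raw_throughput_bps(fs_hz, bits, channels=1):
    """Raw payload bit rate before any packet or protocol overhead."""
    return fs_hz * bits * channels


def ideal_adc_snr_db(bits):
    """Quantization-limited SNR of an ideal N-bit ADC for a full-scale sine."""
    return 20 * math.log10(2 ** bits * math.sqrt(1.5))


def enob_from_snr(snr_db):
    """Effective number of bits implied by a measured SNR (strictly defined on SINAD)."""
    return (snr_db - 10 * math.log10(1.5)) / (20 * math.log10(2))


def max_channels(link_bps, fs_hz, bits):
    """Upper bound on channels a link could carry, ignoring all overhead (optimistic)."""
    return math.floor(link_bps / raw_throughput_bps(fs_hz, bits))


print(raw_throughput_bps(200_000, 16))          # bits per second for one channel
print(round(ideal_adc_snr_db(16), 2))           # ideal 16-bit ceiling in dB
print(round(enob_from_snr(85.0), 2))            # ENOB implied by an 85 dB measurement
print(max_channels(1_000_000_000, 200_000, 16))
```

The gap between the ideal ceiling and the measured 85 dB is expected: analog front-end noise, reference noise, and clock jitter all sit on top of quantization noise in a real acquisition chain.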

2.3. Development of Deep Learning Models

2.3.1. Model Architecture Design

The by-products of threshing in a soybean combine harvester are complex in composition, consisting mainly of soybean grains, stems, and pods. When materials that differ in mass, hardness, and geometry collide with the piezoelectric sensing plate, the differences in their collision dynamics map directly onto clear differences in the electrical signal. The system collected and reconstructed the impact vibration time-domain waveforms of these three typical materials.

High-frequency transient impact characteristics of soybean grains: Soybean grains have high density and high surface hardness and are nearly ellipsoidal. When they strike the sensing plate, the impact is approximately elastic and kinetic energy is transferred extremely rapidly. As shown in Figure 9, the grain waveforms exhibit typical transient high-frequency responses, characterized by “steep rising edges” and “extremely high instantaneous amplitudes”. The energy is highly concentrated at the moment of contact and is followed by rapid exponential decay under the restoring force of the plate, so the waveform duration is extremely short.

Elastoplastic wideband oscillation characteristics of soybean stems: The soybean stem is cylindrical; although it has some stiffness, its porous fibrous interior causes the impact to be accompanied by elastoplastic deformation. This deformation absorbs and dissipates part of the kinetic energy, prolonging the contact time between the stem and the sensing plate. In the waveform, the stem's impact amplitude is generally lower than that of the grain and its peaks are wider; because off-centroid impacts of the elongated stem often cause secondary vibrations, the waveform shows a relatively slow damped ring-down with a wider envelope.

Hollow inelastic impact characteristics of soybean pods: Pods are light, flat, and hollow. They experience strong air damping, and their impacts on the sensing plate are mostly highly inelastic. The initial energy of a single impact is small, and because of their irregular shape, pods tend to roll, slide, and make multiple successive contacts with the plate surface. In the time-domain waveform, a single pod impact has the lowest amplitude, but complex collisions produce a broad, dense, and relatively chaotic low-frequency vibration envelope whose regularity is weaker than that of grains and stems.

In summary, the three materials differ markedly in impact amplitude, peak slope, wave width (contact time), and decay envelope, which together constitute natural “fingerprint” features of each physical category. In the actual dense working environment, however, these waveforms are very likely to overlap. Such complex, non-stationary time series, containing both rich local morphology and long-range temporal evolution, are difficult to separate with traditional single-threshold methods. This provides the physical basis and practical motivation for introducing a hybrid deep learning architecture that both extracts local geometric features (CNNs) and captures long-range contextual dependencies (Transformers).
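To make the “fingerprint” features concrete, the NumPy sketch below measures four of them on a single impact waveform: peak magnitude, steepest rising slope before the peak, the span over which the rectified signal stays at or above half the peak, and the decay time from the peak until the signal last exceeds 10% of the peak. These definitions and thresholds are illustrative assumptions; the paper's models learn from raw waveforms rather than from hand-crafted features, and the two test signals are synthetic damped sinusoids, not measured data.

```python
import numpy as np

FS = 200_000  # sampling rate stated in the text (Hz)


def impact_features(x, fs=FS):
    """Hand-crafted descriptors of one impact waveform (illustrative definitions)."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    i_pk = int(np.argmax(mag))
    peak = mag[i_pk]
    # Steepest sample-to-sample change up to the peak, expressed per second.
    rise_slope = float(np.max(np.abs(np.diff(x[: i_pk + 1])))) * fs if i_pk > 0 else 0.0
    half = np.flatnonzero(mag >= 0.5 * peak)
    tenth = np.flatnonzero(mag >= 0.1 * peak)
    return {
        "peak": float(peak),
        "rise_slope": rise_slope,
        "half_peak_span_s": float((half[-1] - half[0] + 1) / fs),
        "decay_s": float((tenth[-1] - i_pk) / fs),
    }


def damped_sine(amplitude, tau_s, f_hz=5_000, n=2_000, fs=FS):
    """Synthetic ring-down: amplitude * exp(-t/tau) * sin(2*pi*f*t)."""
    t = np.arange(n) / fs
    return amplitude * np.exp(-t / tau_s) * np.sin(2 * np.pi * f_hz * t)


grain_like = impact_features(damped_sine(1.0, tau_s=2e-4))  # sharp, fast-decaying
stem_like = impact_features(damped_sine(0.4, tau_s=1e-3))   # weaker, slower ring-down
for name, feats in [("grain-like", grain_like), ("stem-like", stem_like)]:
    print(name, {k: round(v, 6) for k, v in feats.items()})
```

Features like these separate clean, isolated impacts well but degrade once waveforms overlap, which is the limitation that motivates learning directly from the raw sequence.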

All experimental data were obtained from bench tests simulating soybean harvesting and threshing operations. The transient impact vibration signals produced when three typical threshed materials (grains, pods, and stems) collided with the sensing plate were captured in real time by the piezoelectric sensor. At the data input stage, a strict and consistent data cleaning and standardization strategy was applied to remove interference from environmental noise and sensor drift and to ensure that all models learned from the same feature distribution.

It is important to clarify that the deep learning models use only the 1D time-domain waveforms. During sample generation, to filter out slow environmental baseline drift, the system uses a dynamic gradient-based trigger rather than a static amplitude threshold. Specifically, a collision event is registered when the difference between consecutive samples exceeds 500 raw ADC counts (LSBs), which reliably captures the steep rising edge of a material impact. This threshold was chosen through a preliminary sensitivity analysis. During empty-load calibration, the baseline background noise (belt vibration and electrical interference) consistently fluctuated within ±200 ADC counts. Lowering the threshold (e.g., to 300) caused frequent false triggers from background mechanical noise, whereas raising it (e.g., to 700) caused false negatives, so that the extremely weak, low-energy impacts of empty pods were missed. The 500-count threshold therefore provides a 2.5× margin over the noise bound, balancing sensitivity against noise immunity.
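The gradient trigger can be sketched as a small state machine over consecutive ADC samples. The start rule (absolute first difference above 500 counts) comes from the text. For the end rule, a naive single-sample check would close the event prematurely at the crests of a ring-down, where the first difference is momentarily small, so this sketch assumes a hypothetical hold-off window `HOLD` during which the change must stay within the threshold before the event is closed. `HOLD = 50` samples (0.25 ms at 200 kHz) is an illustrative value, not a parameter reported by the authors.

```python
THRESHOLD = 500    # trigger threshold on |x[i] - x[i-1]| in ADC counts (from the text)
NOISE_BOUND = 200  # empty-load background noise bound in ADC counts (from the text)
HOLD = 50          # hypothetical: quiet samples required before an event is closed


def segment_events(samples, threshold=THRESHOLD, hold=HOLD):
    """Return (start, end) index pairs (end exclusive) of impact events in a sample stream."""
    events, start, quiet = [], None, 0
    for i in range(1, len(samples)):
        change = abs(samples[i] - samples[i - 1])
        if start is None:
            if change > threshold:          # steep rising edge: open an event
                start, quiet = i - 1, 0
        else:
            quiet = quiet + 1 if change <= threshold else 0
            if quiet >= hold:               # change stayed within threshold long enough
                events.append((start, i - hold + 1))
                start = None
    if start is not None:                   # stream ended mid-event
        events.append((start, len(samples)))
    return events


print(THRESHOLD / NOISE_BOUND)  # safety margin over the noise bound
stream = [0] * 100 + [1000, -800, 600, -400, 200] + [0] * 100
print(segment_events(stream))
```

Note that a differential trigger responds to slope rather than level, so noise within ±200 counts can produce sample-to-sample changes of up to 400 counts and still stay below the 500-count threshold.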

The system then continues extracting the signal until the variation falls back within 500 counts, marking the natural end of the elastoplastic damping envelope. Because the materials differ in their biomechanical properties, the contact durations, and hence the raw lengths (N points) of the extracted events, vary. To meet the fixed tensor dimensions required by the deep learning framework, 1D linear interpolation (resampling) was applied, mapping each variable-length raw sequence onto exactly 100 discrete time steps (a 1 × 100 input tensor). This preprocessing preserves the overall morphological envelope (the structural “shape” of the impact) while removing inconsistencies caused by differences in absolute duration. All extracted sequences were then z-score normalized. To prevent data leakage, the normalization scaler was fitted exclusively on the training set and then applied to the validation and test sets.
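The resampling and leakage-safe normalization steps can be sketched with NumPy. `np.interp` performs the 1D linear interpolation onto 100 evenly spaced points; without an anti-aliasing filter, linear downsampling of long events can fold high-frequency content, so the envelope is preserved only approximately. The text does not say whether the z-score statistics are computed per time step or over all values; this sketch assumes per-time-step statistics (as a scikit-learn `StandardScaler` would produce on an (n_samples, 100) array), and the class name `TrainOnlyZScore` is hypothetical.

```python
import numpy as np

TARGET_LEN = 100  # fixed input length stated in the text (1 x 100 tensor)


def resample_linear(event, n=TARGET_LEN):
    """Map a variable-length event onto n evenly spaced points by linear interpolation."""
    event = np.asarray(event, dtype=float)
    if event.size < 2:
        raise ValueError("an event needs at least two samples")
    src = np.linspace(0.0, 1.0, event.size)  # normalized time of each raw sample
    dst = np.linspace(0.0, 1.0, n)           # normalized time of each output step
    return np.interp(dst, src, event)


class TrainOnlyZScore:
    """Z-score scaler whose statistics come from the training set only."""

    def fit(self, X_train):
        X = np.asarray(X_train, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero on constant columns
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_


rng = np.random.default_rng(0)
train_events = [rng.normal(0, 300, rng.integers(60, 400)) for _ in range(8)]  # synthetic
test_events = [rng.normal(0, 300, rng.integers(60, 400)) for _ in range(3)]
X_train = np.stack([resample_linear(e) for e in train_events])   # shape (8, 100)
X_test = np.stack([resample_linear(e) for e in test_events])     # shape (3, 100)
scaler = TrainOnlyZScore().fit(X_train)                          # fitted on training data only
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)
print(Z_train.shape, Z_test.shape)
```

Only the training block contributes to `mean_` and `std_`; the test block is transformed with those frozen statistics, which is what keeps its distribution from leaking into preprocessing.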

After statistical screening, the final dataset contained 2270 valid impact samples: 510 soybean grain samples, 885 stem samples, and 875 pod samples. This deliberate class imbalance reflects the differing signal complexity of the materials. As noted earlier, the transient impact characteristics of soybean grains are distinct and regular, so the model can capture their core features from a relatively small number of samples. By contrast, collisions of stems and pods are often accompanied by complex elastoplastic deformation and multi-point rolling contact; their waveforms show variable broadband envelopes and non-stationary features, which makes feature extraction considerably harder. More stem and pod samples were therefore retained so that the model could learn the complex intra-class feature distributions and generalize better under interference. To prevent data leakage from the continuous acquisition streams, the dataset was divided chronologically following a strict hold-out principle. Specifically, the continuous time-series data were segmented sequentially before sample extraction: the first 80% of the timeline formed the training set, and the final 20% was isolated as the test set. Signal windowing and extraction were performed independently within these blocks, ensuring zero temporal overlap between training and test data. The test set (454 independent impact samples: 102 seeds, 177 stems, and 175 pods) therefore acts as an independent operational session. To provide a fair benchmark for comparing the four deep learning architectures, all models were evaluated on this identical, unseen subset. This fixed-set approach eliminates the performance variance introduced by data shuffling, and the sample size of 454 provides a reasonable basis for calculating statistical confidence intervals.
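A minimal sketch of the "split first, then extract" ordering described above, in plain Python. The segmentation rule in `extract_events` (a run of samples whose deviation from a baseline exceeds 500) is a hypothetical stand-in for the trigger logic described earlier in the paper, and both function names are illustrative only.

```python
def extract_events(stream, threshold=500, baseline=0):
    """Hypothetical threshold segmentation: an event is a run of samples whose
    deviation from the baseline exceeds the threshold."""
    events, current = [], []
    for x in stream:
        if abs(x - baseline) > threshold:
            current.append(x)
        elif current:
            events.append(current)
            current = []
    if current:  # event still open at the end of the block
        events.append(current)
    return events


def chronological_split(stream, train_frac=0.8, **kwargs):
    """Cut the continuous stream by time first, then extract events per block."""
    cut = int(len(stream) * train_frac)
    train_block, test_block = stream[:cut], stream[cut:]
    # Extraction runs independently on each block, so no event window spans the cut.
    return extract_events(train_block, **kwargs), extract_events(test_block, **kwargs)


if __name__ == "__main__":
    stream = [0, 0, 900, 800, 0, 0, 0, 0, 700, 0]
    train_events, test_events = chronological_split(stream)
    print(train_events, test_events)  # [[900, 800]] [[700]]
```

Because extraction runs inside each block, an event that straddles the 80% cut is split into two truncated pieces, one in each block; discarding events that touch the cut would be a reasonable refinement.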

In this study, four deep learning architectures with different feature extraction mechanisms were designed to explore how well each mechanism analyzes impact signals, while keeping the input and output dimensions consistent. Conventional machine learning algorithms (e.g., SVM, Random Forest) were not included because they typically require manual feature engineering, which acts as a lossy information bottleneck. The collision dynamics between rigid grains and elastoplastic impurities are highly non-stationary, and crucial classification cues are often hidden in fine-grained transient spikes and non-linear damping envelopes that hand-crafted features tend to average out or discard. This study therefore focused on deep learning paradigms that ingest the (resampled) raw waveforms directly, and the comparative claims for the proposed hybrid model are confined to end-to-end deep learning baselines, evaluating its structural efficacy in automated raw-signal feature extraction. This allows a like-for-like assessment of automated feature extraction without the bias introduced by hand-crafted features. First, a long short-term memory (LSTM) network based on a recurrent mechanism was constructed; its bidirectional structure captures both the forward evolution and the backward dependencies of impact signals in the time dimension, focusing on the dynamic temporal features of the sequence.
Second, a one-dimensional convolutional neural network (CNN) was designed, in which convolutional kernels slide along the time axis and, together with pooling layers, extract local morphological features of the signal, such as the slope, width, and inflection points of the peaks; it focuses on local geometric features. The third architecture is a Transformer based on self-attention, which abandons the recurrent structure and uses multi-head attention to compute, in parallel, the correlation strength between any two time points in the sequence, aiming to overcome the long-distance dependency problem and build a global contextual representation. Finally, this study proposes a CNN–Transformer hybrid model with a serial (cascaded) design: convolutional layers at the front end act as feature extractors that downsample and denoise the raw signal, and Transformer encoders at the back end perform deep temporal modeling of the abstracted feature sequence. The aim is to combine the local inductive bias of convolutional networks with the global modeling capability of Transformers.
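The all-pairs comparison performed by self-attention can be illustrated with a single-head, pure-Python sketch of scaled dot-product attention. For simplicity it assumes identity query, key, and value projections; the Transformer described here instead learns these projections and runs several heads in parallel.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """Single-head scaled dot-product attention over a sequence of feature vectors.

    q, k, v: lists of T vectors (length d for q and k). Returns the attended
    outputs and the T x T attention-weight matrix.
    """
    d = len(q[0])
    weights = []
    for qi in q:
        # Score of time step i against every time step j, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * vj[c] for w, vj in zip(wi, v)) for c in range(len(v[0]))]
        for wi in weights
    ]
    return outputs, weights


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy time steps
    out, w = self_attention(x, x, x)
    for row in w:
        print([round(p, 3) for p in row])
```

Each row of the weight matrix sums to 1 and covers every time step, which is the global view that the convolutional branch lacks.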

2.3.2. Training Methods and Processes

The network directly ingests the 1D fixed-length raw signal window of size 1 × 100. The front-end CNN module consists of two lightweight sequential 1D convolutional blocks. The first block applies a Conv1D layer (kernel size = 3, padding = 1, out_channels = 3) followed by a ReLU activation and an AvgPool1D layer (kernel size = 2). The second block applies another Conv1D layer (kernel size = 3, padding = 1, out_channels = 6), followed by ReLU and AvgPool1D. This progressive downsampling compresses the temporal length from 100 to 25 while keeping the parameter count minimal for edge devices. The resulting 6 × 25 feature map is transposed into a sequence of 25 time steps with 6 features each; every time step is linearly projected to a 64-dimensional space and combined with learnable positional encodings before entering the Transformer module. The back-end Transformer consists of 2 stacked encoder layers, each equipped with multi-head self-attention (4 attention heads) to capture global dependencies such as secondary impacts. Finally, a global average pooling (GAP) layer compresses the sequential output, which is fed into a multilayer perceptron (MLP) classifier head (a linear layer of 32 units, ReLU, dropout with p = 0.2, and a final output linear layer) to predict the probabilities of the three target categories. The detailed dimensional transformations are summarized in Table 2.
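The tensor shapes in Table 2 can be checked with a small shape-tracing sketch. It assumes PyTorch-style length formulas for `Conv1d` and `AvgPool1d` (dilation 1, floor rounding) and bias terms on both convolutions; the helper names are hypothetical. It also computes the receptive field of the convolutional front end.

```python
def out_len(n, kernel, stride=1, padding=0):
    """Output length of a Conv1d or AvgPool1d layer (dilation 1, floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv1d_params(c_in, c_out, kernel, bias=True):
    return c_in * c_out * kernel + (c_out if bias else 0)


def trace_front_end(n=100):
    """Follow the tensor shape (excluding batch) through the layers of Table 2."""
    shapes = [("input", (1, n))]
    for name, channels in (("cnn_block_1", 3), ("cnn_block_2", 6)):
        n = out_len(n, kernel=3, stride=1, padding=1)  # Conv1D keeps the length
        n = out_len(n, kernel=2, stride=2)             # AvgPool1D halves it
        shapes.append((name, (channels, n)))
    shapes.append(("transpose", (n, 6)))    # 25 tokens with 6 features each
    shapes.append(("projection", (n, 64)))  # per-token linear map 6 -> 64
    return shapes


def receptive_field(layers):
    """Receptive field (in input samples) and jump of stacked (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


if __name__ == "__main__":
    for name, shape in trace_front_end():
        print(name, shape)
    print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # (10, 4)
    print(conv1d_params(1, 3, 3) + conv1d_params(3, 6, 3))    # 72
```

Under these assumptions each of the 25 tokens summarizes a window of about 10 input samples (with a stride of 4 between tokens), and the two convolutions hold only 72 parameters. This is consistent with leaving global relationships to the Transformer and with the lightweight design goal.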

A unified experimental environment and hyperparameter configuration were used for all models. All models were built on the PyTorch (2024.1.3) deep learning framework with CUDA acceleration. Parameters were updated with the AdamW optimizer (weight decay = 0.01), and the cross-entropy loss was used as the sole criterion for measuring the difference between the predicted distribution and the true labels. To mitigate overfitting, a dropout rate of 0.2 was applied. The batch size was set to 64 and the number of training epochs to 30. The initial learning rate was 0.001, adjusted dynamically by a ReduceLROnPlateau scheduler (factor = 0.5, patience = 5) so that each model could follow a suitable convergence trajectory. Parameters were learned only on the training set and performance was evaluated only on the test set, focusing on each model's generalization to unseen data; performance was quantified with the confusion matrix, accuracy, precision, recall, and F1 score.
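The learning-rate schedule can be illustrated with a simplified re-implementation of the plateau logic in plain Python. The study used PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau`; this sketch omits that class's relative improvement threshold (1e-4 by default) and its cooldown option, and it assumes that the monitored quantity is the validation loss. The function name is hypothetical.

```python
def plateau_lrs(val_losses, lr=1e-3, factor=0.5, patience=5):
    """Simplified ReduceLROnPlateau (mode="min", no threshold, no cooldown).

    Returns the learning rate in effect after each epoch's scheduler step.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:  # patience exceeded: decay and restart the count
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


if __name__ == "__main__":
    # Toy validation losses: two improving epochs, then a plateau.
    print(plateau_lrs([1.0, 0.9] + [0.95] * 6))
```

With `patience=5`, the rate is halved on the sixth consecutive epoch without improvement, matching how PyTorch counts bad epochs (`num_bad_epochs > patience`).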

3. Results

Evaluated on the same test set, the four models showed clear, progressive differences in classification accuracy, reflecting fundamental differences in how the different network structures interpret soybean material impact signals.

First, as shown in Table 3, the LSTM baseline performed poorly, with an overall test accuracy of only 59.69%. Its confusion matrix and per-class metrics reveal a serious weakness in identifying “pod” signals. As shown in Figure 10, the model correctly identified only 60 pod samples and misclassified 106 of them as “stems”, resulting in a very low pod recall of 0.34 (60 of 175). Although LSTMs handle temporal dependencies well, the sequential processing mechanism struggled to capture the short-term morphological features that distinguish these transient, non-stationary impact signals. This likely produced ambiguous decision boundaries in the feature space, possibly compounded by convergence to a poor local optimum, so that the model ultimately failed to separate pod and stem signals with similar physical properties.
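The per-class figures quoted in these comparisons follow from a confusion matrix in the usual way: precision divides the diagonal entry by its column sum, recall by its row sum. A minimal sketch, assuming rows are true classes and columns are predictions (the orientation of Figures 10 to 13 is not stated in the text); the function name is hypothetical.

```python
def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[i][c] for i in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        results.append((precision, recall, f1))
    accuracy = sum(cm[i][i] for i in range(n)) / sum(map(sum, cm))
    return results, accuracy


if __name__ == "__main__":
    # Pod row of the LSTM confusion matrix as reported in the text: 60 correct,
    # 106 predicted as stem; the remaining 9 of the 175 test pods are inferred as seed.
    pod_row = [60, 106, 9]
    print(round(pod_row[0] / sum(pod_row), 2))  # 0.34
```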

In sharp contrast, as shown in Table 4, the overall accuracy of the CNN model rose to 78.41%. This result strongly suggests that local waveform features, such as the shape, slope, and width of the impact peak, are crucial for distinguishing soybean materials. As shown in Figure 11, the CNN correctly identified 171 pods, achieving a recall of 0.98 and largely correcting the failure of the LSTM. However, this high sensitivity to local features had a side effect: 61 “stem” samples were misclassified as “pods”, giving a precision of only 0.73 for the pod category. This indicates that while CNNs are good at capturing impact “fingerprints”, their limited access to the global temporal context leads to confusion between signals with similar local shapes.

As shown in Table 5 and Figure 12, the Transformer further raised accuracy to 85.24%, highlighting the importance of global context modeling. The model showed good balance, with F1 scores of 0.85, 0.86, and 0.84 for pods, stems, and seeds, respectively, indicating a clear reduction in performance differences among the categories. The multi-head self-attention mechanism not only responds to local variations but also computes, in parallel, the relationship between any two points in the sequence. This global perspective reduces the confusion observed in the CNN model, improving precision while maintaining high recall, and suggests that capturing long-range dependencies is key to eliminating inter-class misclassifications.

Most notably, the proposed CNN–Transformer hybrid model achieved the best overall classification performance, with an accuracy of 97.36%. As shown in Table 6 and Figure 13, the model performed well on all metrics: the F1 scores for pods, stems, and grains reached 0.97, 0.96, and 0.97, respectively, and the precision and recall for each category remained at or above 0.96. This result illustrates the complementary advantages of the hybrid architecture. The front-end CNN acts as a learnable feature filter, smoothing high-frequency noise in the raw signal and transforming it into compact, high-level feature representations, which considerably eases subsequent processing. The back-end Transformer then models temporal relationships in this cleaner, information-rich feature space. The results indicate that for sensor signals of this type, which combine transient morphological features with temporal evolution patterns, a hybrid architecture combining local inductive bias and global context modeling is a highly effective approach, capable of accurate and robust identification of the material components encountered during soybean harvesting.

To explain the substantial performance gain of the hybrid architecture (97.36%) over the baselines, their feature extraction mechanisms are compared systematically. The LSTM baseline (59.69%) processes data sequentially, making it susceptible to information decay over the long damping envelopes of stems and pods, which is consistent with its severe misclassifications. The pure CNN baseline (78.41%) excels at anchoring local transient peaks thanks to weight sharing, but its restricted receptive field captures little of the global temporal context, causing confusion between materials with similar local impact edges. The pure Transformer (85.24%) resolves the long-range dependency issue via self-attention; however, it lacks the local inductive bias needed to efficiently filter the dense high-frequency noise inherent in raw mechanical signals. The proposed CNN–Transformer addresses these gaps. Its CNN front end denoises the signal and extracts robust local morphological features, feeding a shorter, denoised feature sequence into the back-end Transformer, whose multi-head attention can then concentrate on modeling the global collision pattern (e.g., secondary rebounds and long-term rolling friction). This complementary division of labor is the most plausible explanation for the hybrid model's performance advantage in decoupling complex collision dynamics.

To quantify the statistical uncertainty of these results, a 95% confidence interval (CI) was calculated for the hybrid model's accuracy on the 454 independent test samples. The accuracy of 97.36% yields a 95% CI of [95.89%, 98.83%], consistent with the normal-approximation (Wald) interval for a binomial proportion. This narrow interval indicates that the accuracy estimate is precise and that the high accuracy is unlikely to be an artifact of the limited test-set size.
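The reported bounds can be reproduced with the normal-approximation (Wald) interval for a proportion, which appears to be the method used since it matches both reported bounds to two decimals. The Wilson score interval is included for comparison because it is often preferred when accuracy is close to 100%. This is a sketch under that assumption; the function names are illustrative.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_ci(p_hat, n, z=1.96):
    """Wilson score interval; usually better behaved when p_hat is close to 0 or 1."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    lo, hi = wald_ci(0.9736, 454)
    print(f"Wald:   [{lo:.2%}, {hi:.2%}]")  # Wald:   [95.89%, 98.83%]
    lo, hi = wilson_ci(0.9736, 454)
    print(f"Wilson: [{lo:.2%}, {hi:.2%}]")
```

The Wilson interval is shifted slightly downward and is asymmetric, which reflects the fact that accuracy cannot exceed 100%.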

Furthermore, to assess training stability and rule out sensitivity to weight initialization, the proposed CNN–Transformer model was trained repeatedly with five different random seeds. It yielded an overall accuracy of 97.2 ± 0.2% (mean ± standard deviation), which indicates that its classification performance stems from the architecture rather than from a favorable random initialization.

However, this performance must be interpreted objectively. The very high accuracy reflects the model's performance under controlled bench conditions and is likely an upper bound for field operation. In this setup the signal-to-noise ratio is close to ideal, because the bench isolates the sensor from the complex background mechanical vibrations encountered in actual field harvesting (such as engine operation and terrain-induced chassis shaking). While these results demonstrate the strong feature extraction capability of the model, field applications will inevitably introduce additional background noise. Collecting multi-condition field datasets for further model testing and calibration therefore remains a critical direction for future research.

To further explore the intrinsic optimization mechanism of the best-performing CNN–Transformer hybrid model, this study provides an in-depth analysis of its training dynamics.

As shown in Figure 14, the loss curves trace the convergence of the model over 30 training epochs. Both the training loss (blue line) and the validation loss (red line) declined sharply in the first 10 epochs, indicating rapid early learning. Afterwards, the losses continued to decline more slowly and stabilized in the later stages of training. Although the validation loss fluctuated somewhat and remained slightly higher than the training loss, the overall downward trend confirms that the model learned effectively and converged without divergence.

The parameter space trajectory uses principal component analysis (PCA) to project the model's high-dimensional parameter vector onto a two-dimensional plane, tracing its path from the start to the end of training. As shown in Figure 15, the parameters undergo large gradient updates early in training (e.g., from Start to the E7 node), which appear as large jumps in the projected parameter space. As training progresses, the update steps shrink toward fine-tuning (e.g., from “E19” to the red “End” marker), indicating that the optimizer has reached a stable region of the loss landscape and is refining the parameter values. This behavior is consistent with the trend in the loss curve, and together they indicate that training was successful.
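A parameter-space trajectory of this kind can be produced by saving a flattened copy of all model parameters after each epoch and projecting the snapshots onto their first two principal components. The pure-Python sketch below assumes that snapshot format (the text does not say how Figure 15 was computed) and works with the small T × T Gram matrix via power iteration, which avoids forming a D × D covariance matrix when the number of parameters D far exceeds the number of epochs T. In practice a library routine such as NumPy's `numpy.linalg.svd` would do the same job; all names here are hypothetical.

```python
import math
import random


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigenpair(m, iters=300, seed=0):
    """Leading eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    # Random start: the all-ones vector lies in the null space of a centred Gram matrix.
    v = [rng.uniform(-1.0, 1.0) for _ in m]
    for _ in range(iters):
        w = _matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to explain in this direction
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, _matvec(m, v)))  # Rayleigh quotient
    return lam, v


def pca_trajectory(snapshots, k=2):
    """Project T flattened parameter snapshots onto their first k principal axes."""
    t_count, dim = len(snapshots), len(snapshots[0])
    mean = [sum(s[d] for s in snapshots) / t_count for d in range(dim)]
    centred = [[s[d] - mean[d] for d in range(dim)] for s in snapshots]
    gram = [[sum(a * b for a, b in zip(xi, xj)) for xj in centred] for xi in centred]
    coords = [[0.0] * k for _ in range(t_count)]
    for c in range(k):
        lam, u = _top_eigenpair(gram, seed=c)
        scale = math.sqrt(max(lam, 0.0))
        for t in range(t_count):
            coords[t][c] = u[t] * scale  # score of snapshot t on component c
        # Deflate so the next pass finds the next component.
        gram = [[gram[i][j] - lam * u[i] * u[j] for j in range(t_count)]
                for i in range(t_count)]
    return coords
```

Plotting consecutive rows of the returned coordinates as a connected path gives a trajectory like the one described, where long early segments and short late segments correspond to large and small parameter updates.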

To examine whether the model's decisions align with the physical boundaries of the impact events rather than merely fitting environmental noise, interpretability analyses were introduced. For the front-end convolutional layers, one-dimensional gradient-weighted class activation mapping (1D Grad-CAM) was used to visualize local feature responses. For the back-end Transformer encoder, the spatiotemporal distribution of self-attention weights was extracted and visualized, as shown in Figure 16. The 1D Grad-CAM activations in the left column show that the CNN front end exhibits strong translation equivariance (the activation map shifts together with the impact): regardless of where the impact occurs on the timeline, the convolutional kernels act as “local anchors”, placing the highest activation weights on the abrupt peaks where the signal slope changes most and energy release is most intense. This mechanism suppresses redundant interference in the pre-trigger period and the collision decay period, supporting the model's noise robustness in non-stationary environments.
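The core Grad-CAM computation for a 1D feature map is short: channel weights are the time-averaged gradients of the class score, the map is the ReLU of the weighted channel sum, and it is then normalized and upsampled to the input length. The sketch below takes the activations and gradients as plain lists, on the assumption that they are obtained elsewhere (for example with autograd hooks); which convolutional layer was visualized in Figure 16 is not stated, and the function name is hypothetical.

```python
def grad_cam_1d(activations, gradients, out_len=None):
    """1D Grad-CAM from one conv layer's activations and the class-score gradients.

    activations, gradients: K channels x L time steps (same shape).
    Returns a heat map normalised to [0, 1], optionally resampled to out_len.
    """
    # Channel weights: gradient averaged over time (global average pooling).
    alphas = [sum(g) / len(g) for g in gradients]
    length = len(activations[0])
    cam = [max(0.0, sum(a * act[t] for a, act in zip(alphas, activations)))
           for t in range(length)]
    peak = max(cam)
    if peak > 0:
        cam = [c / peak for c in cam]
    if out_len and out_len > 1 and out_len != length:
        # Linear upsampling back to the input resolution (endpoints aligned).
        res = []
        for i in range(out_len):
            x = i * (length - 1) / (out_len - 1)
            lo = int(x)
            hi = min(lo + 1, length - 1)
            frac = x - lo
            res.append(cam[lo] * (1 - frac) + cam[hi] * frac)
        cam = res
    return cam


if __name__ == "__main__":
    acts = [[0, 1, 3, 1], [1, 0, 0, 0]]          # toy 2-channel feature map
    grads = [[1, 1, 1, 1], [-1, -1, -1, -1]]     # toy gradients of the class score
    print(grad_cam_1d(acts, grads, out_len=7))
```

For the 1 × 100 input described above, `out_len` would be 100 and the activations would come from a layer with 25 or 50 time steps.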

After the CNN has anchored the local extrema, the Transformer self-attention distributions in the right column show its ability to correlate features across the entire temporal window. In the complex grain impact event shown, two independent high-frequency impacts separated by a very short interval were captured within the window. The Transformer focused not only on the impact peaks themselves but also assigned high attention weights to the interval between the two impacts. While this does not prove that the model has learned the physical dynamics in a strict mechanistic sense, it indicates a strong temporal correspondence: the attention distribution closely follows the expected “fast collision–rebound–secondary contact” envelope characteristic of hard grains. This suggests that the model exploits these region-specific morphological variations as discriminative features rather than simply fitting peaks. For the long-term rolling friction of the stem, the Transformer's attention is spread widely and uniformly across the continuous high-frequency oscillation range, which corresponds to the extended contact time of the stem's elastoplastic deformation. For the hollow, irregular collisions of the pod, the attention weights span multiple discrete, low-amplitude collision groups. By integrating the wideband envelope patterns across the event window, the model uses the overall temporal sequence of the pod's multi-point irregular contacts as a basis for classification.

In summary, the CNN front end copes with the randomness of the collision phase by using translation equivariance to localize transient energy precisely, while the back-end Transformer uses its global perception to capture the temporal correlations associated with complex phenomena such as multiple impacts and secondary rebounds. Although these data-driven features closely match theoretical collision mechanics, the process remains statistical pattern recognition rather than true physical understanding. Nevertheless, the integration of the two modules shows that the hybrid architecture focuses on physically meaningful signal regions, providing a robust algorithmic foundation for high-precision online loss monitoring of soybean harvesters, which remains to be validated under real field conditions of high impurity and high material density.

In practical field applications, the CNN–Transformer hybrid model is intended to act as a precise digital filter. Each time a detected transient collision is classified as a “Seed”, a real-time loss event counter is incremented, while signals classified as stems or pods are discarded as impurities. By accumulating these seed counts over a predefined rolling time window, the system can calculate a Real-time Loss Intensity index (expressed as seeds per unit time) to drive closed-loop regulation of harvesting parameters. It must be emphasized, however, that the current stage of this research is dedicated to solving the fundamental bottleneck of high-precision material classification. Accurately separating rigid seeds from elastoplastic impurities under complex noise is a critical prerequisite for any reliable monitoring system, and the proposed evaluation achieves this classification task under bench conditions, laying an algorithmic and data foundation. Translating and calibrating this classification output into an actual threshing-loss rate under dynamic field conditions will be the primary focus of the subsequent research phase.
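The rolling-window counter described above can be sketched as follows. The window length, the label string, and the class name are illustrative assumptions not specified in the text, and the sketch assumes that classified impacts arrive in time order.

```python
from collections import deque


class LossIntensityMonitor:
    """Rolling-window counter of classified seed impacts (hypothetical design)."""

    def __init__(self, window_s=1.0, seed_label="seed"):
        self.window_s = window_s
        self.seed_label = seed_label
        self._seed_times = deque()

    def update(self, t, label):
        """Register one classified impact at time t (s); return seeds per second."""
        if label == self.seed_label:
            self._seed_times.append(t)  # stems and pods are discarded as impurities
        # Drop seed events that have fallen out of the window (t - window_s, t].
        while self._seed_times and self._seed_times[0] <= t - self.window_s:
            self._seed_times.popleft()
        return len(self._seed_times) / self.window_s


if __name__ == "__main__":
    monitor = LossIntensityMonitor(window_s=1.0)
    for t, label in [(0.1, "seed"), (0.2, "stem"), (0.5, "seed"), (1.2, "seed")]:
        print(t, label, monitor.update(t, label))
```

Keeping only timestamps in a deque makes each update amortized O(1), which suits a continuous stream of classified impacts.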

Beyond algorithmic performance, these findings have concrete agronomic implications. The system establishes a collaborative pipeline: the FPGA edge-sensing node ensures low-latency, non-blocking acquisition of complex transient impacts, avoiding the loss of high-frequency collision signals, while the CNN–Transformer hybrid model deployed on the host terminal classifies the threshed materials. Agronomically, by accurately distinguishing rigid grain impacts from elastoplastic stems and pods, the model largely filters out impurity interference, providing a robust data foundation for estimating the real-time grain loss rate. In practical harvesting, such a loss metric would give crucial feedback to the operator or an intelligent control system. If an abnormal spike in grain loss is detected, core parameters could be adjusted on the fly, for example by reducing the threshing drum speed to minimize kernel damage or by optimizing the cleaning fan volume to balance impurity separation and seed retention. Ultimately, such closed-loop regulation is expected to reduce field losses, increasing the harvestable soybean yield and delivering economic benefits to agricultural production.

4. Conclusions

(1) To address the challenges of complex threshed-material compositions and severe overlapping of high-frequency impact signals during soybean combine harvesting, this study constructed an impact signal classification system based on a high-speed edge-perception sensing architecture and a CNN–Transformer hybrid deep learning network. At the hardware level, FPGA-based heterogeneous parallel processing addressed the throughput bottleneck of traditional microcontrollers when facing high-frequency transient impacts, ensuring synchronized acquisition and high-fidelity mapping of raw physical signals.

(2) Algorithm evaluations demonstrate that the proposed CNN–Transformer hybrid model achieves superior overall performance in distinguishing soybean grains, stems, and pods. It achieved an overall test accuracy of 97.36%, with F1-scores of at least 0.96 for all categories. Compared with standalone LSTM, CNN, and Transformer models, the hybrid architecture integrates local extremum extraction with long-range temporal modeling, providing a reliable solution for resolving feature confusion between elastoplastic impurities and rigid grains.

(3) Explainable artificial intelligence (XAI) analysis based on 1D Grad-CAM and self-attention weights revealed the decision-making logic of the hybrid model. The front-end CNN demonstrated robust translation equivariance and extremum-anchoring capability; the back-end Transformer captured complex temporal evolution patterns, such as the “collision–rebound flight–secondary contact” of grains and the “long-term damping oscillation” of stems. This provides evidence that the model extracted features strongly correlated with the intrinsic collision dynamics of the different materials rather than merely fitting environmental noise.

(4) This study offers a robust algorithmic and hardware foundation for classifying threshed materials of soybean harvesters. While the current validation is limited to bench-scale classification, the proposed system paradigm of “edge-computing perception + explainable hybrid deep learning” bridges the gap between raw signal acquisition and complex material decoupling. This provides valuable theoretical insights and paves the way for future field testing to realize actual threshing-loss estimation under high-noise, real-world harvesting conditions.

Author Contributions

S.W.: Writing—original draft, Writing—review and editing, Project administration, and Data curation. C.Z.: Data curation, Writing—original draft, and Writing—review and editing. X.G.: Resources and Supervision. C.F.: Formal analysis, Investigation, and Conceptualization. X.C.: Software, Formal analysis, and Visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Project Nos. 52565027 and 52405277) and the Natural Science Foundation of Jiangsu Province (Project No. BK20230402).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

dos Reis, J.G.M.; Aktas, E.; Machado, S.T. Soybean Supply Chains, Markets, and Global Trade. In Soybean Production Technology: Physiology, Production and Processing; Singh, K.P., Singh, N.K., T, A., Eds.; Springer Nature: Singapore, 2025; pp. 429–446.
Voora, V.; Bermudez, J.; Le, H.; Larrea, C.; Luna, E. Global Market Report: Soybean Prices and Sustainability; IISD Market Report 2(5); IISD: Winnipeg, MB, Canada, 2024.
Nair, R.M.; Boddepalli, V.N.; Yan, M.-R.; Kumar, V.; Gill, B.; Pan, R.S.; Wang, C.; Hartman, G.L.; Silva e Souza, R.; Somta, P. Global Status of Vegetable Soybean. Plants 2023, 12, 609.
Wang, J.; Shan, C.; Gou, F.; Qian, Z.; Ni, Y.; Liu, Z.; Jin, C. A Review of Key Technologies and Intelligent Applications in Soybean Mechanized Harvesting: Chinese and International Perspectives. J. Biosyst. Eng. 2025, 50, 79–104.
Son, J.; Lee, J.-H.; Kang, S.-H.; Choi, W.-Y.; Park, B.; Park, H.-G.; Ha, Y. A Comprehensive Review of Legume Crop Harvesting Machinery: Development, Intelligent Mechanization, and Future Prospects. Precis. Agric. Sci. Technol. 2025, 7, 369–387.
Lima, F.B.F.D.; Silva, M.A.F.D.; Silva, R.P.D. Quality of Mechanical Soybean Harvesting at Two Travel Speeds. Eng. Agríc. 2017, 37, 1171–1182.
Chen, Y.; Tang, Z.; Li, B.; Wang, S.; Liu, Y.; Zhou, W.; Jing, J.; He, X. Analysis of Damage Characteristics and Fragmentation Simulation of Soybean Seeds Based on the Finite-Element Method. Agriculture 2025, 15, 780.
Ni, Y.; Jin, C.; Chen, M.; Qian, Z.; Yang, T.; Xu, J.; Liu, G. Soybean Crushing Forms by Mechanical Harvesting and Factors Affecting the Proportions of Different Forms. Food Sci. Technol. 2023, 43, e111822.
Jin, C.; Yan, K.; Guo, H.; Wang, T.; Yin, X. Experimental Research on the Influence of Threshing Roller Structures on the Quality of Mechanically-Harvested Soybeans. Nongye Gongcheng Xuebao 2021, 37, 49–58.
Liu, J.; Zhang, Y.; Jiang, Y.; Sun, H.; Duan, R.; Qu, J.; Yao, D.; Liu, S.; Guan, S. Formation Mechanism and Occurrence Law of Pod Shattering in Soybean: A Review. Phyton-Int. J. Exp. Bot. 2022, 91, 1327.
Wang, J.; Ni, Y.; Liu, Z.; Gou, F.; Qian, Z.; Bai, H.; Jin, C. Design and Performance Testing of the 4GS-457 Low-Loss Soybean Combine Harvester Header. J. Biosyst. Eng. 2025, 50, 364–378.
Liu, P.; Wang, X.; Jin, C. Research on the Adaptive Cleaning System of a Soybean Combine Harvester. Agriculture 2023, 13, 2085.
Li, Y.; Xu, L.; Lv, L.; Shi, Y.; Yu, X. Study on Modeling Method of a Multi-Parameter Control System for Threshing and Cleaning Devices in the Grain Combine Harvester. Agriculture 2022, 12, 1483.
Gu, X.; Tang, Z.; Wang, B. Sensor-Centric Intelligent Systems for Soybean Harvest Mechanization in Challenging Agro-Environments of China: A Review. Sensors 2025, 25, 6695.
Jiang, L.; Wang, G.; Xu, B.; Husnain, N.; Wang, Q. Intelligent Systems for Combine Harvesters: A Comprehensive Review of Technologies and Trends. IEEE Access 2025, 13, 189074–189095.
Zhang, W.; Guo, H.; Zhao, B.; Zhou, L.; Wang, F.; Wang, D.; Liu, Y. Full-Condition Monitoring and Intelligent Yield Prediction and Decision-Making Technology for Wheat Combine Harvesters. Int. J. Agric. Biol. Eng. 2025, 18, 202–211.
Dong, J.; Zhao, S.; Zhang, A.; Meng, Z.; Wang, F.; Qin, W.; Li, M. Research Status and Trend of Grain Loss Monitoring Sensor Technology. INMATEH-Agric. Eng. 2025, 77, 238–250.
Rossi, S.; Rubio Scola, I.; Bourges, G.; Šarauskis, E.; Karayel, D. Improving the Seed Detection Accuracy of Piezoelectric Impact Sensors for Precision Seeders. Part I: A Comparative Study of Signal Processing Algorithms. Comput. Electron. Agric. 2023, 215, 108449.
Griffin, C.; Giurgiutiu, V. Piezoelectric Wafer Active Sensor Transducers for Acoustic Emission Applications. Sensors 2023, 23, 7103.
Xu, L.; Wei, C.; Liang, Z.; Chai, X.; Li, Y.; Liu, Q. Development of Rapeseed Cleaning Loss Monitoring System and Experiments in a Combine Harvester. Biosyst. Eng. 2019, 178, 118–130.
Tu, B.; Liu, C.; Zhang, Q.; Liu, X. Quantitative Differences in Pod Valve Composition Affect Shattering in Vegetable and Grain Soybean. Ital. J. Agron. 2025, 20, 100050.
Ndeke, V.; Tembo, L.; Chigeza, G.; Akoroda, M. A Review of Factors Affecting Pod Shattering in Soybean (Glycine Max). Int. J. Plant Soil. Sci. 2024, 36, 659–668.
Li, Y.; Zhihong, C.; Li, Y.; Lv, L.; Shi, M.; Shi, Y.; Yu, X. Sensor for Multi-Crop Cleaning Loss Monitoring in Grain Combine Harvesters. SSRN 2023.
Zhou, L.; Yuan, Y.; Zhang, J.; Niu, K. Improving Design of a PVDF Grain Loss Sensor for Combine Harvester. In Proceedings of the Computer and Computing Technologies in Agriculture XI; Li, D., Zhao, C., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 208–217.
Dong, J.; Cui, T.; Zhang, D.; Yang, L.; He, X.; Xiao, T.; Li, C.; Xing, S.; Jiang, Y.; Wang, H. Design and Test of Real-Time Monitoring System for Maize Entrainment Loss Based on Piezoelectric Signal Classification. Measurement 2025, 242, 116050.
Zhang, M.; Jiang, L.; Wu, C.; Wang, G. Design and Test of Cleaning Loss Kernel Recognition System for Corn Combine Harvester. Agronomy 2022, 12, 1145.
Qu, H.-R.; Su, W.-H. Deep Learning-Based Weed–Crop Recognition for Smart Agricultural Equipment: A Review. Agronomy 2024, 14, 363.
Wang, D.; Cao, W.; Zhang, F.; Li, Z.; Xu, S.; Wu, X. A Review of Deep Learning in Multiscale Agricultural Sensing. Remote Sens. 2022, 14, 559.
Li, Y.; Tan, D.S.; Cui, T.; Fan, H.; Xu, Y.; Zhang, D.; Qiao, M.; Hou, Y.; Xiong, L. Design and Validation of Novel Maize Grain Cleaning Loss Detection System Based on Classification Models of Particle Time-Domain Signals. Comput. Electron. Agric. 2024, 220, 108908.
Jin, M.; Zhao, Z.; Chen, S.; Chen, J. Improved Piezoelectric Grain Cleaning Loss Sensor Based on Adaptive Neuro-Fuzzy Inference System. Precis. Agric. 2022, 23, 1174–1188.

Figure 1. Working principle diagram of the grain loss rate monitoring device.

Figure 2. Structural schematic diagram of the grain loss monitoring device. 1. Top cover; 2. FPGA processing unit; 3. AD7606; 4. AD9767; 5. Communication module; 6. Bottom shell; 7. Piezoelectric ceramic sheet; 8. Copper base; 9. Stainless-steel sensitive plate.

Figure 3. Flowchart of the grain loss rate monitoring procedure.

Figure 4. Schematic diagram of the AD7606 circuit.

Figure 5. Schematic diagram of the FPGA Artix-7 processing chip and peripheral circuit.

Figure 6. Physical data of the grain loss rate monitoring device collector and visualized data of the loop test.

Figure 7. Schematic diagram and physical image of the collision signal acquisition test bench.

Figure 8. Signal waveforms of the soybean grain impact-sensitive plate.

Figure 9. Time-domain waveforms of shock and vibration of three materials.

Figure 10. Confusion matrix of the LSTM model.

Figure 11. Confusion matrix of the CNN model.

Figure 12. Confusion matrix of the Transformer model.

Figure 13. Confusion matrix of CNN + Transformer model.

Figure 14. Loss curves.

Figure 15. Parameter-space trajectory.

Figure 16. Visualization of attention weights.

Table 1. Parameters of the piezoelectric ceramic sheet.

No.	Parameter	Value
1	Resonance frequency	3.2 ± 0.3 kHz
2	Resonance impedance	≤200 Ω
3	Capacitance	42,000 pF ± 30%
4	Metal plate material	Brass
5	Metal plate diameter	50 ± 0.1 mm
6	Ceramic sheet diameter	25 ± 0.2 mm
7	Metal plate thickness	0.18 ± 0.02 mm
8	Total thickness	0.45 ± 0.03 mm
9	Operating temperature range	−20 to 70 °C
10	Lead wires	Welded 0.15 × 7 multi-strand wires; after bearing a vertical force of 2.5 N and a horizontal force of 20 N, the silver layer shows no mechanical damage
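Table 1 gives the ceramic sheet a nominal capacitance of 42,000 pF (±30%). One practical consequence can be sketched with the standard simplified piezoelectric model: a charge source in parallel with the sheet capacitance C, read through a resistive input R, behaves as a first-order high-pass filter with corner frequency f_c = 1/(2πRC). In the sketch below, the 1 MΩ load resistance is a hypothetical value chosen for illustration; the input impedance of the actual acquisition circuit is not stated in this table.

```python
import math

# Nominal capacitance from Table 1: 42,000 pF with a +/-30% tolerance.
C_NOMINAL_F = 42_000e-12
TOLERANCE = 0.30

# Hypothetical load resistance seen by the sensor (illustrative assumption, not from the article).
R_LOAD_OHM = 1e6


def highpass_corner_hz(r_ohm: float, c_farad: float) -> float:
    """Corner frequency of a first-order RC high-pass filter: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def corner_range_hz(r_ohm: float, c_nom: float, tol: float) -> tuple[float, float, float]:
    """Return (highest, nominal, lowest) corner frequency over the capacitance tolerance band.

    A larger capacitance gives a lower corner, so the minimum capacitance yields the highest f_c.
    """
    c_min = c_nom * (1.0 - tol)
    c_max = c_nom * (1.0 + tol)
    return (
        highpass_corner_hz(r_ohm, c_min),
        highpass_corner_hz(r_ohm, c_nom),
        highpass_corner_hz(r_ohm, c_max),
    )


if __name__ == "__main__":
    f_hi, f_nom, f_lo = corner_range_hz(R_LOAD_OHM, C_NOMINAL_F, TOLERANCE)
    print(f"f_c nominal: {f_nom:.2f} Hz (range {f_lo:.2f} to {f_hi:.2f} Hz)")
```

Under these assumptions the script prints `f_c nominal: 3.79 Hz (range 2.91 to 5.41 Hz)`. That is far below the 3.2 kHz resonance, so in this simplified model the tolerance on C mainly shifts the low-frequency cutoff rather than affecting the impact ringing.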

Table 2. Layer-by-layer architectural details of the proposed 1D CNN–Transformer hybrid model.

Module	Configuration	Input Tensor Shape	Output Tensor Shape
Input	1D Raw Waveform	$(B, 1, 100)$	$(B, 1, 100)$
CNN Block 1	Conv1D ( $k = 3, s = 1, p = 1$ ), ReLU, AvgPool1D ( $k = 2, s = 2$ )	$(B, 1, 100)$	$(B, 3, 50)$
CNN Block 2	Conv1D ( $k = 3, s = 1, p = 1$ ), ReLU, AvgPool1D ( $k = 2, s = 2$ )	$(B, 3, 50)$	$(B, 6, 25)$
Projection	Linear Projection ( $6 \to 64$ ) + Positional Encoding	$(B, 25, 6)$	$(B, 25, 64)$
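The tensor shapes in Table 2 follow from the usual output-length formula for 1D convolution and pooling, L_out = ⌊(L_in + 2p − k)/s⌋ + 1, together with a transpose from channel-first (B, C, L) to sequence-first (B, L, C) before the linear projection, so that each of the 25 time steps becomes a token. The dependency-free sketch below traces the listed shapes and counts the trainable parameters of these layers. It assumes that every layer has a bias term and that the positional encoding is fixed (sinusoidal, no parameters); the table states neither, and the sketch covers only the rows listed here.

```python
from dataclasses import dataclass


def out_len(l_in: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output length of a 1D convolution or pooling layer (no dilation)."""
    return (l_in + 2 * p - k) // s + 1


@dataclass
class Stage:
    name: str
    shape: tuple
    params: int


def trace_table2(batch: int = 1, length: int = 100, d_model: int = 64) -> list[Stage]:
    stages = [Stage("Input", (batch, 1, length), 0)]
    c_in, l = 1, length
    # Two CNN blocks: Conv1D(k=3, s=1, p=1) -> ReLU -> AvgPool1D(k=2, s=2).
    for i, c_out in enumerate((3, 6), start=1):
        l = out_len(l, k=3, s=1, p=1)  # 'same' convolution keeps the length
        l = out_len(l, k=2, s=2)       # pooling halves the length
        conv_params = c_in * c_out * 3 + c_out  # weights + bias (bias assumed)
        stages.append(Stage(f"CNN Block {i}", (batch, c_out, l), conv_params))
        c_in = c_out
    # Transpose (B, C, L) -> (B, L, C), then project C features to d_model.
    # ReLU, pooling, and a fixed positional encoding add no trainable parameters.
    proj_params = c_in * d_model + d_model
    stages.append(Stage("Projection", (batch, l, d_model), proj_params))
    return stages


if __name__ == "__main__":
    for st in trace_table2(batch=32):
        print(f"{st.name:<12} {st.shape}  params={st.params}")
```

With a batch size of 32 the trace reproduces the Table 2 shapes (32, 1, 100), (32, 3, 50), (32, 6, 25), and (32, 25, 64), with 12, 60, and 448 parameters in the two convolutions and the projection, respectively.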

Table 3. Performance of the LSTM model by category.

Category	Precision	Recall	F1-Score
Pod	0.63	0.34	0.44
Stem	0.51	0.71	0.59
Seed	0.76	0.83	0.79
Overall accuracy	59.69%

Table 4. Performance of the CNN model by category.

Category	Precision	Recall	F1-Score
Pod	0.73	0.98	0.84
Stem	0.77	0.63	0.70
Seed	0.95	0.72	0.82
Overall accuracy	78.41%

Table 5. Performance of the Transformer model by category.

Category	Precision	Recall	F1-Score
Pod	0.78	0.95	0.85
Stem	0.93	0.79	0.86
Seed	0.90	0.79	0.84
Overall accuracy	85.24%

Table 6. Performance of the CNN + Transformer model by category.

Category	Precision	Recall	F1-Score
Pod	0.98	0.97	0.97
Stem	0.97	0.96	0.96
Seed	0.95	0.99	0.97
Overall accuracy	97.36%
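Tables 3–6 report per-class precision P, recall R, and F1 = 2PR/(P + R). For each class, P and R can be read from the corresponding confusion matrix (Figures 10–13): precision divides the diagonal entry by its column sum (all samples predicted as that class), and recall divides it by its row sum (all samples truly of that class), assuming rows hold the true classes. The plain-Python sketch below implements both steps. The 3 × 3 matrix in the usage example is hypothetical and is not the article's data; the final check uses the precision and recall values from Table 6.

```python
from typing import Sequence

CLASSES = ("Pod", "Stem", "Seed")


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(cm: Sequence[Sequence[int]]) -> list[tuple[float, float, float]]:
    """Per-class (precision, recall, F1) from a confusion matrix.

    Assumes cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(n))  # column sum
        actual_c = sum(cm[c])                          # row sum
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        results.append((p, r, f1(p, r)))
    return results


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Overall accuracy: trace divided by the total number of samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


if __name__ == "__main__":
    # Hypothetical confusion matrix (rows: true Pod/Stem/Seed; columns: predicted).
    cm = [[90, 6, 4],
          [5, 80, 15],
          [2, 3, 95]]
    for name, (p, r, f) in zip(CLASSES, per_class_metrics(cm)):
        print(f"{name:<5} P={p:.2f} R={r:.2f} F1={f:.2f}")
    print(f"accuracy={accuracy(cm):.4f}")

    # Recompute F1 from the rounded P and R reported in Table 6.
    table6 = {"Pod": (0.98, 0.97), "Stem": (0.97, 0.96), "Seed": (0.95, 0.99)}
    for name, (p, r) in table6.items():
        print(f"{name:<5} F1 from Table 6 P/R: {f1(p, r):.2f}")
```

Because the tables round P and R to two decimals, an F1 recomputed from them can differ from the reported value by 0.01. For example, the CNN stem values (0.77, 0.63) give 0.69, whereas Table 4 reports 0.70. Overall accuracy is the sample-weighted mean of per-class recall, so it equals the simple mean of the recalls only when the classes are balanced.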

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, S.; Zhang, C.; Guo, X.; Fan, C.; Chen, X. Intelligent Classification of Soybean Threshing Mixtures Based on Edge Perception and CNN–Transformer Hybrid Architecture. Agronomy 2026, 16, 1086. https://doi.org/10.3390/agronomy16111086

AMA Style

Wang S, Zhang C, Guo X, Fan C, Chen X. Intelligent Classification of Soybean Threshing Mixtures Based on Edge Perception and CNN–Transformer Hybrid Architecture. Agronomy. 2026; 16(11):1086. https://doi.org/10.3390/agronomy16111086

Chicago/Turabian Style

Wang, Shiguo, Caiyuan Zhang, Xiaohu Guo, Chenlong Fan, and Xuegeng Chen. 2026. "Intelligent Classification of Soybean Threshing Mixtures Based on Edge Perception and CNN–Transformer Hybrid Architecture" Agronomy 16, no. 11: 1086. https://doi.org/10.3390/agronomy16111086

APA Style

Wang, S., Zhang, C., Guo, X., Fan, C., & Chen, X. (2026). Intelligent Classification of Soybean Threshing Mixtures Based on Edge Perception and CNN–Transformer Hybrid Architecture. Agronomy, 16(11), 1086. https://doi.org/10.3390/agronomy16111086

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
