Article

Conveyor Belt Deviation Detection for Mineral Mining Applications Based on Attention Mechanism and Boundary Constraints

1 School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan 243002, China
2 Anhui Masteel Mining Resources Group Co., Ltd., Ma’anshan 243071, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(22), 6945; https://doi.org/10.3390/s25226945
Submission received: 2 October 2025 / Revised: 5 November 2025 / Accepted: 11 November 2025 / Published: 13 November 2025
(This article belongs to the Section Industrial Sensors)

Abstract

To address the issue of material spillage and equipment wear caused by conveyor belt deviation in complex industrial scenarios, this study proposes a detection method based on an improved U-Net. The approach adopts U-Net as the backbone network, with a ResNet34 encoder to enhance feature extraction capability. At the skip connections, a Multi-scale Adaptive Guidance Attention (MASAG) module is embedded to strengthen the fusion of semantic and detailed features. In the loss function design, a boundary loss is incorporated to improve edge segmentation accuracy. Furthermore, the segmentation results are refined via edge detection and RANSAC regression, and a reference line is constructed based on the physical stability of rollers in the image to enable quantitative measurement of deviation. Experiments on a self-constructed dataset demonstrate that the proposed method achieves higher accuracy (99.77%) compared with the baseline U-Net (99.65%) and also surpasses other categories of approaches, including detection-based (YOLOv5s), anchor-point-based (UFLD), and segmentation-based approaches represented by SEU-Net and DeepLabV3+, thereby exhibiting strong robustness and real-time performance across diverse complex operating conditions. The results validate the effectiveness of this method in practical applications and provide a reliable technical pathway for the development of intelligent monitoring systems for mining conveyor belts.

1. Introduction

A mining conveyor belt is a type of equipment that employs a flexible belt body as the load-bearing component, while material transportation is achieved through the interaction of drive pulleys and idler rollers. The primary function of the conveyor belt lies in its ability to efficiently and continuously carry and transport bulk materials or unit goods, thereby meeting the demands of large-scale material handling in mining, metallurgy, and port operations [1,2]. Depending on the application scenario, conveyor belts can be extensively deployed in both surface environments and underground mines. A typical application is illustrated in Figure 1.
During long-term operation, conveyor belt systems frequently experience belt deviation, which refers to the lateral displacement of the belt from its designated central trajectory during motion. This phenomenon is highly common in continuous transportation systems such as those in mining, metallurgy, and port industries. If not detected and corrected in time, even minor deviations can lead to material spillage and equipment wear, while severe cases may cause belt tearing, roller-frame damage, or other safety hazards that threaten the continuity and stability of production. Belt deviation usually results from a combination of mechanical and operational factors. The main causes include inaccurate positioning of idlers and pulleys during installation, frame misalignment, improper tension adjustment leading to uneven stress distribution, poor joint quality that introduces thickness or stiffness inconsistencies, uneven material loading that causes lateral bias, and belt aging or surface wear during prolonged service. These factors collectively cause the belt to gradually deviate from its centerline, producing either continuous or intermittent offset behavior. Therefore, real-time monitoring and automated identification of conveyor belt deviation are of great engineering significance for ensuring the safe and stable operation of conveying systems and reducing maintenance costs [3].
The phenomenon of conveyor belt deviation not only disrupts the normal operation of conveying systems but also leads to a series of serious consequences. Deviation causes continuous friction between the belt edge and structural components such as the frame and idlers, resulting in abnormal local wear and surface peeling of the belt. In severe cases, this can lead to belt tearing or breakage, significantly reducing the service life and operational reliability of the conveyor system. Moreover, serious deviation often causes material spillage, which increases cleaning and maintenance workloads and may even block sensors or motor inlets, further affecting the continuity of system operation. When the deviation becomes excessive, the conveyor must be shut down for adjustment, leading to production interruptions, economic losses, and potential safety risks for both equipment and personnel. Consequently, conveyor belt deviation has become one of the major hidden hazards affecting the stable operation of conveying systems in industries such as mining and metallurgy [4].
Traditionally, mechanical anti-deviation devices have been employed to address the problem of conveyor belt deviation. Vertical rollers are installed on both sides of the belt, and when deviation occurs, the belt comes into contact with the rollers and is subjected to a lateral force that pushes it back to the normal position [5]. Such devices provide correction and protection when severe deviations occur. However, they are only activated once the belt has deviated beyond a certain threshold, can counteract only part of the displacement, and are incapable of quantifying the real-time operational status of the conveyor belt. Moreover, prolonged friction between the belt and the vertical rollers leads to edge wear, thereby shortening the service life of the belt. Consequently, mechanical anti-deviation devices fall short of meeting the requirements for real-time monitoring and reliable detection of belt deviation. Figure 2 illustrates the mechanical anti-deviation guide rollers positioned on both sides of an operational conveyor belt, as indicated by the arrows in (a) and (b).
Machine vision, as an important branch of artificial intelligence and computer vision, utilizes computers and image processing algorithms to simulate human visual perception, extracting essential information from images or videos to accomplish tasks such as recognition, detection, and measurement. In recent years, with the rapid advancement of deep learning and computing power, machine vision has gradually replaced traditional manual inspection and, in certain cases, mechanical sensor-based detection, becoming a key enabling technology for industrial automation and intelligent manufacturing.
The development of machine vision is closely tied to the continuous evolution of image recognition algorithms. The rise of deep learning in the field of computer vision has driven the rapid advancement of convolutional neural network (CNN) algorithms over the past decade. Numerous researchers have proposed a wide range of efficient models for tasks such as object detection, object classification, and semantic segmentation, thereby accelerating technological progress. The iterative improvement of these algorithms has not only enhanced the accuracy and efficiency of image recognition but also opened new avenues for conveyor belt deviation detection in industrial settings.
Object detection aims to determine the presence of objects within an image and to provide their locations and bounding boxes. Early algorithms, such as R-CNN and Faster R-CNN, established the two-stage framework based on region proposals [6]. Single-stage methods, including the YOLO series and SSD, introduced an end-to-end approach, significantly improving detection speed [7,8]. In recent years, both the YOLO family and Transformer-based methods such as DETR have further achieved a balance between speed and accuracy [9]. In the context of conveyor belt deviation detection, some studies have attempted to use object detection techniques to extract belt edge regions, estimating the edges through the diagonals of rectangular bounding boxes. However, this approach is highly dependent on boundary precision and exhibits limited robustness in complex operational environments.
Object classification aims to determine the category of an image based on learned feature patterns. Networks such as AlexNet, VGG, ResNet, DenseNet, and EfficientNet have progressively improved classification performance, serving as important backbones for subsequent tasks [10,11,12,13,14]. More recently, the emergence of Vision Transformer (ViT) has further advanced large-scale image classification [15]. In the context of conveyor belt deviation detection, classification-based methods are typically employed to determine whether a belt is in a misaligned state. While these methods are straightforward to implement and computationally efficient, they can only provide the overall status of the conveyor belt and cannot specify the degree of deviation or the precise location of deviation, thereby falling short of the spatial information requirements in real-world operational scenarios.
Semantic segmentation involves assigning a semantic category to each pixel in an image, thereby providing a detailed representation of target regions. Compared with object detection and classification tasks, semantic segmentation offers a higher level of spatial information. It has been widely applied in fields such as medical imaging, autonomous driving, and industrial inspection. The Fully Convolutional Network (FCN) first introduced an end-to-end framework for pixel-level prediction, enabling segmentation tasks to be optimized using deep convolutional neural networks [16]. Architectures such as U-Net and SegNet leverage symmetric encoder–decoder structures and multi-scale feature fusion to further enhance segmentation performance, particularly for small targets and complex backgrounds [17,18]. More recently, the DeepLab series has employed atrous convolutions and conditional random fields to model multi-scale contextual information [19], HRNet maintains high-resolution features to enhance edge detail representation [20], and Transformer-based methods such as SegFormer and Mask2Former combine global modeling with multi-scale feature representation [21,22], achieving a new balance between accuracy and computational efficiency. In conveyor belt deviation detection, semantic segmentation offers distinct advantages: by performing pixel-wise recognition of the entire belt region, it enables precise extraction of the left and right edges, facilitating accurate delineation of the conveyor belt boundaries.
In summary, traditional mechanical anti-deviation devices exhibit notable limitations in terms of real-time performance and reliability. Although existing machine vision methods have achieved significant progress in various aspects, their applicability to conveyor belt deviation detection remains constrained. To address these challenges, this study aims to achieve precise detection and reliable recognition of conveyor belt deviation, building upon existing research. The main contributions of this work are as follows:
(1) Based on the existing Multi-scale Adaptive Guidance Attention (MASAG) mechanism, the structure is adapted to the conveyor belt deviation detection task and embedded into the skip connections of U-Net. This integration enhances the cross-scale fusion of semantic and edge features, thereby improving the model’s capability for accurate edge recognition under complex working conditions. Compared with SEU-Net, the proposed design achieves more effective boundary feature enhancement and yields superior edge segmentation performance.
(2) A belt center localization method based on the physical stability of idler edges is developed, allowing for the quantitative analysis of deviation severity.
(3) A cross-method experimental validation strategy is employed: U-Net is used as the baseline model to verify improvements in segmentation accuracy, and comparative experiments are conducted with the object detection model YOLOv5s and the lane detection model UFLD [23], evaluating performance differences among different methodological approaches.
(4) A comprehensive dataset was constructed, covering multiple environmental conditions such as normal illumination, strong light, low light, and rainy weather. The dataset provides a reliable foundation for model training and performance evaluation under diverse real-world scenarios.

2. Materials and Methods

Semantic segmentation methods provide dense, pixel-level predictions that enable the extraction of continuous and detailed belt contours, offering a more interpretable and geometrically consistent foundation for deviation computation. This pixel-wise modeling capability makes semantic segmentation particularly suitable for handling complex backgrounds and lighting variations commonly present in mining environments.
To achieve precise recognition of potential conveyor belt edges, this study proposes an improved U-Net architecture to address the insufficient edge-feature extraction capability of existing semantic segmentation models under complex environmental conditions [24].
Compared with high-complexity models such as DeepLab, FPN, and Vision Transformer, U-Net features a relatively simple structure with strong customizability and scalability, allowing flexible adaptation to specific industrial scenarios. Therefore, U-Net is selected as the baseline framework of this study, upon which structural optimizations are performed for the conveyor belt deviation detection task.
The proposed improvement integrates a ResNet encoder and a Multi-scale Adaptive Guidance Attention (MASAG) mechanism [25].
In this design, the ResNet encoder preserves low-level spatial features essential for edge recognition through residual connections, while the MASAG module adaptively fuses multi-scale contextual information to enhance the model’s ability to detect potential edges under complex operating conditions, thereby improving the accuracy and robustness of deviation detection.
After obtaining the binarized segmentation of the conveyor belt region, the edges are preliminarily visible but often exhibit pixel-level jaggedness or noise, resulting in insufficient geometric accuracy for potential edge localization. To address this, the Canny edge detection algorithm is introduced to post-process the segmentation results. The Canny algorithm, with its noise robustness and single-pixel edge localization precision, can extract continuous and smooth structural contours from the binarized regions produced by the segmentation method [26]. This step effectively mitigates the problem of coarse edges in semantic segmentation outputs and provides an accurate set of edge points for subsequent line fitting.
After obtaining the edge point set, the RANSAC regression algorithm is employed for line fitting. By iteratively performing random sampling [27], RANSAC can effectively eliminate outliers caused by residual noise or local defects, allowing for the estimation of optimal linear models for the edges on both sides of the conveyor belt and achieving precise edge localization. Ultimately, by calculating the pixel distance between the fitted lines and a predefined central reference line, the degree of conveyor belt deviation can be accurately measured, enabling automated deviation detection. Figure 3 illustrates the architecture of the improved U-Net network.

2.1. U-Net Network

U-Net features a classical symmetric encoder–decoder architecture that has become a foundational paradigm for semantic segmentation. It continues to serve as a core backbone in numerous state-of-the-art studies and has been widely adopted and extended. The model extracts hierarchical semantic features through the encoder path via downsampling, while the decoder path performs upsampling to recover spatial details. Its pioneering skip connection mechanism enables the fusion of multi-scale features from the encoder with corresponding stages in the decoder, effectively mitigating the loss of fine-grained information in deep networks.
Although U-Net was originally developed for biomedical image analysis, its modular design endows it with strong extensibility. The conveyor belt edge recognition task addressed in this study requires capturing subtle and highly variable edge features from a large volume of industrial images. Therefore, U-Net is employed as the baseline model, allowing the network to achieve powerful semantic representation while producing high-precision segmentation contours rich in detail. The U-Net network model is illustrated in Figure 4.
However, the original U-Net model exhibits limitations in conveyor belt region recognition tasks under complex industrial environments. Its encoder has restricted feature extraction capability, and the model shows insufficient adaptability to varying illumination and complex backgrounds, which constrains further improvements in segmentation accuracy. To address these issues, two modifications are introduced based on the U-Net baseline. First, the original encoder is replaced with a ResNet residual network to enhance feature extraction capability and mitigate network degradation. Second, a multi-scale attention gating module is integrated to improve the model’s robustness under complex conditions, thereby enabling more accurate and stable segmentation of conveyor belt regions.

2.2. Improved U-Net Model

To enhance the model’s ability to extract potential conveyor belt edges under complex operational conditions, the original U-Net encoder is replaced with a ResNet-34 network [28]. ResNet, through its core residual learning mechanism, addresses the problems of gradient vanishing and network degradation in deep network training.
The fundamental principle of ResNet is the introduction of residual blocks with shortcut connections, which allow the network to learn the residual mapping between inputs and outputs. The input features are transformed through a series of convolutional layers along the main path, while an identity mapping preserves the original input along a shortcut path. The outputs from both paths are then combined via element-wise addition. This structure effectively mitigates the gradient vanishing problem in U-Net, facilitates the learning of identity mappings, and allows the network to increase in depth without performance degradation, while efficiently preserving and integrating low-level detail features with high-level semantic representations. The structure of the residual block is illustrated in Figure 5.
To integrate the ResNet network as the feature extraction backbone with the U-Net decoder, the final average pooling and fully connected layers of ResNet are removed. The structure of the ResNet module used in this study is illustrated in Figure 6.
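For concreteness, a minimal PyTorch sketch of such a truncated encoder is given below. It assumes torchvision’s ResNet-34 with ImageNet weights, and the stage tap points are chosen for illustration; the authors’ exact configuration may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class ResNet34Encoder(nn.Module):
    """ResNet-34 truncated for use as a U-Net encoder (illustrative sketch).

    The final average-pooling and fully connected layers are discarded, and
    the intermediate stage outputs are exposed as skip features.
    """
    def __init__(self, pretrained: bool = True):
        super().__init__()
        net = resnet34(weights="IMAGENET1K_V1" if pretrained else None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)  # 1/2 res, 64 ch
        self.pool = net.maxpool                                  # to 1/4 res
        self.layer1 = net.layer1   # 1/4 res, 64 ch
        self.layer2 = net.layer2   # 1/8 res, 128 ch
        self.layer3 = net.layer3   # 1/16 res, 256 ch
        self.layer4 = net.layer4   # 1/32 res, 512 ch
        # net.avgpool and net.fc are intentionally not used.

    def forward(self, x):
        s0 = self.stem(x)
        s1 = self.layer1(self.pool(s0))
        s2 = self.layer2(s1)
        s3 = self.layer3(s2)
        s4 = self.layer4(s3)
        return s0, s1, s2, s3, s4
```

The returned stage features s0 to s3 would feed the skip connections (where the MASAG module described below is inserted), while s4 serves as the bottleneck input to the decoder.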
However, conveyor belt region recognition presents an inherent challenge: the idlers supporting the belt exhibit similar color and texture characteristics to the belt itself. Consequently, both the original U-Net encoder and the ResNet residual structure capture not only the true conveyor belt edges but also interfering idler information. In the original U-Net, the skip connection mechanism concatenates all encoder features directly with the decoder, which makes it difficult to effectively distinguish between such similar features. Replacing the encoder with a more expressive ResNet enhances high-level semantic features, but the shallow features transmitted from the encoder also become richer and more complex, thereby exacerbating the limitations of the original skip connections in feature selection. As a result, the model cannot adaptively focus on the conveyor belt itself, causing the decoder to struggle in reconstructing details accurately; this may lead to erroneous segmentation boundaries or inclusion of adjacent idlers within the segmented belt region. Therefore, a fusion strategy capable of actively guiding the model to focus on the target while suppressing similar interferences is required to replace simple concatenation.
To address the issues arising from both the inherent task challenge and the limitations of the original skip connections, a MASAG module is introduced into the skip connections. This module takes as input the multi-scale low-level features from the encoder and the high-level semantic features from the current decoder stage, and computes adaptive weights to generate an attention map that dynamically focuses on key regions. This approach suppresses feature responses associated with background interferences such as idlers while actively enhancing the representation of potential conveyor belt edges. The MASAG module guides task-relevant regions into the decoder, directly mitigating interference from similar objects and ultimately improving segmentation accuracy and boundary clarity. The structure of the MASAG module used in this study is illustrated in Figure 8, and a comparison of results before and after the introduction of this attention mechanism is shown in Figure 7.
As shown in Figure 8, the MASAG module receives high-resolution detail features $X$ from the ResNet encoder and high-level semantic features $G$ from the U-Net decoder.
The information processing in MASAG mainly consists of four stages: multi-scale fusion, spatial selection, spatial interaction with cross-modulation, and recalibration.
In the multi-scale fusion stage, the encoder features $X$ undergo local context extraction. The input features are first processed through a depth-wise convolution to capture high-resolution local spatial information and fine-grained details, followed by a dilated convolution that enlarges the receptive field to aggregate broader contextual cues. Finally, a $1 \times 1$ convolution is applied to obtain the fused local context features.
For the decoder guidance features $G$, global context modeling is performed. A dual-path pooling strategy is employed, consisting of global average pooling and global max pooling on $G$, aggregating semantic information from complementary perspectives. The resulting feature vectors are concatenated along the channel dimension and then fused and compressed through a $1 \times 1$ convolution. This stage enhances the segmentation accuracy and boundary clarity of the conveyor belt region.
The multi-scale feature fusion can be expressed mathematically as follows:

$X' = C(D^{*}(D(X)))$ (1)

$G' = C_g(\mathrm{Concat}[\mathrm{AVGPOOL}(G), \mathrm{MAXPOOL}(G)])$ (2)

$F = X' + G'$ (3)

In the above equations, $D(\cdot)$ denotes a depth-wise convolution, $D^{*}(\cdot)$ denotes a dilated convolution, and $C(\cdot)$ denotes a $1 \times 1$ convolution. $\mathrm{AVGPOOL}$ and $\mathrm{MAXPOOL}$ represent global average pooling and global max pooling, respectively, $\mathrm{Concat}[\cdot]$ denotes channel-wise concatenation, and $C_g(\cdot)$ represents the $1 \times 1$ fusion convolution.
During the spatial selection stage, the feature $F$ from Equation (3) is projected onto a dual-channel feature map to align it with the input features $X$ and $G$. A Softmax function is applied along the channel dimension to compute the spatial selection weights [29], which are used to finely modulate the spatial response distributions of $X$ and $G$. This process yields spatially selected features $X_1$ and $G_1$, helping to reduce interference from irrelevant background regions in conveyor belt scenarios. The procedure is mathematically formulated as follows:
$M = C(F)$ (4)

$[W_1, W_2] = \mathrm{Softmax}(M, \dim = 1)$ (5)

This operation ensures that at each spatial position $(h, w)$, the two channel weights sum to 1. The weight maps $W_1$ and $W_2$ quantitatively represent the contributions from the encoder detail feature $X$ and the decoder feature $G$, respectively, with their values adaptively determined by the local content at each position.
$X_1 = X + (W_1 \otimes X)$ (6)

$G_1 = G + (W_2 \otimes G)$ (7)

Here, $\otimes$ denotes element-wise multiplication. The weights modulate the original input features, thereby generating the spatially selected features $X_1$ and $G_1$.
In the spatial interaction and cross-modulation stage, the feature $X_1$, which contains rich fine-grained local details, is gated via a Sigmoid activation mechanism [30] and modulated by $G_1$, which incorporates weighted global contextual information. This process generates an enhanced feature $X_2$, in which detailed local information and global semantics are effectively fused. The computational procedure can be expressed as follows:

$X_2 = \sigma(G_1) \otimes X_1$ (8)

Here, $\sigma(\cdot)$ denotes the Sigmoid activation function. The purpose of this operation is to modulate each local detail feature in $X_1$ with the corresponding global contextual importance from $G_1$.
Meanwhile, the feature $G_1$, which encapsulates weighted global contextual information, incorporates the fine-grained local context from $X_1$ through a similar mechanism, thereby evolving into an enhanced feature $G_2$:

$G_2 = \sigma(X_1) \otimes G_1$ (9)
Finally, $X_2$ and $G_2$ are fused to generate cross-enhanced features, ensuring the model accurately segments both the conveyor belt regions and their boundaries:

$F_1 = X_2 + G_2$ (10)
In the recalibration stage, the module performs refined processing on the output of Equation (10) to generate the final optimized feature output. Specifically, $F_1$ is first passed through a convolutional layer followed by a Sigmoid activation $\sigma$ to produce an attention map $A$:

$A = \sigma(C(F_1))$ (11)

This attention map is then used to recalibrate the original encoder feature $X$: element-wise multiplication between $X$ and $A$ achieves spatial weighting of the feature representation, effectively enhancing feature responses in conveyor belt regions while suppressing irrelevant background information. The weighted features are then projected and refined through a convolution that integrates the information and adjusts the channel dimensions, ultimately yielding the final output $X'$:

$X' = C(A \otimes X)$ (12)
Through the series of transformations described above, the recalibration stage equips the encoder feature $X$ with adaptive multi-scale receptive fields shaped by the preceding stages of multi-scale fusion, spatial selection, and interactive modulation. The final output feature $X'$ incorporates precise contextual awareness, enabling effective integration with the corresponding decoder features of the U-Net architecture and thereby significantly enhancing both the accuracy and robustness of conveyor belt region segmentation under complex environmental conditions.
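To make the four stages concrete, the following PyTorch sketch assembles Equations (1) through (12) into a single module. The kernel sizes, the dilation rate, and the additive fusion in Equation (10) are illustrative assumptions rather than the exact published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MASAG(nn.Module):
    """Sketch of the Multi-scale Adaptive Guidance Attention module
    following Eqs. (1)-(12); layer hyperparameters are assumptions."""
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        # Multi-scale fusion (Eqs. 1-3)
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.dilconv = nn.Conv2d(channels, channels, 3, padding=dilation,
                                 dilation=dilation, groups=channels)
        self.local_proj = nn.Conv2d(channels, channels, 1)       # C(.)
        self.global_proj = nn.Conv2d(2 * channels, channels, 1)  # Cg(.)
        # Spatial selection (Eqs. 4-7)
        self.select = nn.Conv2d(channels, 2, 1)                  # dual-channel M
        # Recalibration (Eqs. 11-12)
        self.attn_conv = nn.Conv2d(channels, channels, 1)
        self.out_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x, g):
        # x: encoder detail features X; g: decoder semantic features G
        # (assumed to share the same shape at the skip connection).
        x_local = self.local_proj(self.dilconv(self.dwconv(x)))     # Eq. (1)
        g_pool = torch.cat([F.adaptive_avg_pool2d(g, 1),
                            F.adaptive_max_pool2d(g, 1)], dim=1)
        g_global = self.global_proj(g_pool)                         # Eq. (2)
        fused = x_local + g_global                                  # Eq. (3), broadcast
        w = torch.softmax(self.select(fused), dim=1)                # Eqs. (4)-(5)
        w1, w2 = w[:, 0:1], w[:, 1:2]                               # sum to 1 per pixel
        x1 = x + w1 * x                                             # Eq. (6)
        g1 = g + w2 * g                                             # Eq. (7)
        x2 = torch.sigmoid(g1) * x1                                 # Eq. (8)
        g2 = torch.sigmoid(x1) * g1                                 # Eq. (9)
        f1 = x2 + g2                                                # Eq. (10), assumed additive
        a = torch.sigmoid(self.attn_conv(f1))                       # Eq. (11)
        return self.out_proj(a * x)                                 # Eq. (12)
```

In the improved network, a module of this kind replaces the plain concatenation at each skip connection: the encoder feature X and the upsampled decoder feature G enter the module, and its output is forwarded to the corresponding decoder stage.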

2.3. Region–Boundary Joint Loss Function Design

In semantic segmentation tasks, the design of the loss function significantly influences model learning performance. For the specific application scenario of conveyor belt deviation addressed in this paper, we propose a region–boundary joint loss function to ensure segmentation accuracy at the regional level while further improving the precision of boundary contours.
The objective of the region loss function is to measure the discrepancy between the prediction and ground truth at the holistic region level. In this work, we adopt a combined loss consisting of Cross-Entropy loss and Dice loss [31,32].
For the conveyor belt deviation task, we formulate the problem as a binary segmentation task, distinguishing between the conveyor belt region and the background. To ensure accurate pixel-wise classification, Binary Cross-Entropy (BCE) loss is introduced under this binary setting. This loss measures the pixel-level differences between predicted probabilities and the ground truth. In the context of conveyor belt deviation segmentation, BCE effectively guides the model to distinguish the conveyor belt region from the background. However, due to the large proportion of the conveyor belt area in the image, relying solely on BCE tends to bias the model toward predicting the dominant large region, thereby insufficiently focusing on the edges. This can lead to blurred boundaries and adversely affect the subsequent accurate detection of belt deviation. The Binary Cross-Entropy loss is defined as follows:
$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \right]$ (13)

Here, $N$ is the total number of pixels, $y_i \in \{0, 1\}$ denotes the ground-truth label, and $p_i \in [0, 1]$ represents the predicted probability from the model.
To mitigate this limitation, we further introduce the Dice loss, which directly measures the overlap between the predicted region and the ground truth, offering inherent robustness to region size imbalance. The Dice loss optimizes the overall regional overlap ratio, enabling the model to maintain prediction accuracy for large areas while still accounting for precision in smaller regions. However, the Dice loss still provides insufficient constraint on boundaries, potentially resulting in blurred or jittery predicted contours. This is detrimental to the precise detection of conveyor belt deviation, which requires accurate edge information. The Dice loss is formulated as follows:
$L_{Dice} = 1 - \frac{2\sum_{i=1}^{N} p_i y_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} y_i + \epsilon}$ (14)
Here, $\epsilon$ is a small constant introduced for numerical stability. In all experiments conducted in this study, $\epsilon$ is set to $10^{-6}$, preventing division-by-zero errors when a class is entirely absent from the image or mask.
To address the boundary blurring issue, we employ a boundary loss based on the distance transform, which assigns higher weights to pixels closer to the true boundary. This approach directs the model to focus more intensively on the edge regions of the conveyor belt during training. Such a design is chosen to effectively improve edge prediction accuracy, thereby reducing errors in edge extraction for the precise quantification of conveyor belt deviation. Although the introduction of boundary loss increases computational overhead, its substantial advantage in improving edge precision is critical for the conveyor belt deviation detection task. The boundary loss is calculated as follows:
$L_{Boundary} = \frac{1}{N}\sum_{i=1}^{N} D(y_i)\,\left| y_i - p_i \right|$ (15)

Here, $D(y_i)$ denotes the distance from pixel $i$ to the nearest true boundary, guiding the model to focus more on edge information during training.
In this work, we combine the Binary Cross-Entropy loss, Dice loss, and Boundary Loss through a weighted summation to form a region–boundary joint loss function:
$L_{RBCL} = \lambda_1\left(\alpha L_{CE} + (1 - \alpha) L_{Dice}\right) + \lambda_2 L_{Boundary}$ (16)

where $\alpha$ controls the weighting between the Cross-Entropy and Dice losses for regional optimization, while $\lambda_1$ and $\lambda_2$ regulate the contributions of the regional loss and boundary loss, respectively.
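A minimal PyTorch sketch of this joint loss is given below, assuming binary masks of shape (B, H, W) and computing the distance map D with SciPy’s Euclidean distance transform; the default α and λ values are placeholders, not the paper’s tuned settings.

```python
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def boundary_weight_map(target: torch.Tensor) -> torch.Tensor:
    """Distance from each pixel to the nearest ground-truth boundary, D(y_i).

    Computed per image as the sum of inside/outside Euclidean distance
    transforms; the exact weighting scheme here is an assumption.
    """
    maps = []
    for mask in target.cpu().numpy().astype(bool):
        inside = distance_transform_edt(mask)    # distance to background
        outside = distance_transform_edt(~mask)  # distance to foreground
        maps.append(torch.from_numpy(inside + outside).float())
    return torch.stack(maps).to(target.device)

def region_boundary_loss(pred_logits, target, alpha=0.5, lam1=1.0, lam2=0.5,
                         eps=1e-6):
    """Region-boundary joint loss of Eq. (16), sketched with placeholder weights."""
    p = torch.sigmoid(pred_logits)
    y = target.float()
    # Binary cross-entropy, Eq. (13)
    l_ce = F.binary_cross_entropy(p, y)
    # Dice loss, Eq. (14)
    inter = (p * y).sum()
    l_dice = 1 - (2 * inter + eps) / (p.sum() + y.sum() + eps)
    # Boundary loss, Eq. (15): distance-weighted pixel-wise error
    d = boundary_weight_map(y)
    l_boundary = (d * (y - p).abs()).mean()
    return lam1 * (alpha * l_ce + (1 - alpha) * l_dice) + lam2 * l_boundary
```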

2.4. Precise Quantification Method for Conveyor Belt Deviation

In visual inspection systems for conveyor belt deviation, the installation position and angle of the camera significantly influence the establishment of the detection reference. In practical industrial settings, due to constraints such as equipment layout, spatial limitations, and safety regulations, cameras often cannot be installed at theoretically ideal positions, which may result in the optical axis not being strictly perpendicular to the conveyor belt plane. Such non-ideal installation conditions directly alter the geometric relationship between the conveyor belt and background structures in the image, introducing systematic errors into pixel distance-based deviation measurements.
To address the above issue, this paper adopts a reference line construction method based on the edge positions of fixed idler rollers. The core advantage of this approach lies in shifting the measurement reference from the image boundaries, which depend on the camera coordinate system, to the idler rollers, which are physically fixed in position within the scene. As shown in Figure 9, images of the idler rollers were captured from different perspectives on site.
As a key load-bearing and guiding component of the conveyor belt system, the idler roller is manufactured from high-strength metal or composite materials, offering excellent mechanical stability and resistance to deformation. Mechanically, the idler roller is securely mounted onto the conveyor frame via rigid brackets, maintaining a constant spatial position and axial orientation during system operation. This material and structural invariance allow the edges of the idler roller to form a reliable spatial reference in the physical environment, unaffected by operational vibrations or long-term use.
By locating both edges of the idler roller and extending them to the image boundaries, a reference line is constructed by connecting the corresponding intersection points at the upper and lower image borders. Since this reference is derived directly from the inherent physical properties of the idler roller, it effectively mitigates deviation issues between the image coordinate system and the real-world coordinate system caused by non-ideal camera placement. This approach grounds the reliability of the reference construction on the fixed physical structures within the scene, rather than on the image coordinate system which is susceptible to installation variations, thereby enhancing the robustness and accuracy of deviation detection.
After obtaining the binary conveyor belt region output by the semantic segmentation model, the pixel-based area is converted into measurable geometric features to achieve precise quantification of the deviation magnitude. Edge detection is applied to the binary region to extract its contour as a continuous set of pixel points. A line fitting method is then used to derive the two edge lines representing both sides of the conveyor belt. The centerline between these two edge lines is calculated and compared pixel-wise with the reference line established based on the idler roller. This process transforms the belt’s deviation state into an accurate pixel-distance measurement.

2.4.1. Reference Line Construction

The edge lines from both sides are extended to the upper and lower boundaries of the image, yielding two sets of intersection points. By connecting the corresponding upper and lower intersection points, a reference line for deviation measurement is constructed. This construction process is illustrated in Figure 10, and a geometric sketch is given below.
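One plausible reading of this construction, sketched below, intersects the two fitted idler-edge lines x = a·y + b with the top and bottom image borders and passes the reference line through the midpoints of those intersections; the exact rule used on site may differ.

```python
def reference_line(a_l, b_l, a_r, b_r, height):
    """Construct the deviation reference line from fitted idler-edge lines
    x = a*y + b. The midpoint construction here is an assumption made for
    illustration, not necessarily the authors' exact rule.
    """
    y_top, y_bot = 0, height - 1
    # Border intersections of the left and right idler-edge lines.
    x_top = ((a_l * y_top + b_l) + (a_r * y_top + b_r)) / 2.0
    x_bot = ((a_l * y_bot + b_l) + (a_r * y_bot + b_r)) / 2.0
    # Reference line through (x_top, y_top) and (x_bot, y_bot),
    # again parameterized as x = a*y + b.
    a_ref = (x_bot - x_top) / (y_bot - y_top)
    b_ref = x_top - a_ref * y_top
    return a_ref, b_ref
```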

2.4.2. Edge Extraction and Conveyor Belt Boundary Fitting

This section takes the binary conveyor belt region output by the semantic segmentation module as input and performs edge extraction and boundary line fitting on this basis. The binary image produced by semantic segmentation effectively isolates interference from complex backgrounds, providing a relatively clean conveyor belt region for edge detection. As shown in Figure 11, applying edge detection directly to the entire original image yields excessive, cluttered responses, making it difficult to accurately identify the conveyor belt edges.
The extracted binary conveyor belt region may exhibit ill-defined potential edges or locally irregular boundaries. Directly fitting lines to these potential edges could lead to inaccurate fitted lines for the conveyor belt, consequently compromising the accuracy of the deviation calculation. Therefore, an edge extraction and fitting method is further introduced based on the region obtained via semantic segmentation. The procedure is as follows:
$G_x = \frac{\partial I}{\partial x}, \quad G_y = \frac{\partial I}{\partial y}, \quad G = \sqrt{G_x^2 + G_y^2}, \quad \theta = \arctan\left(\frac{G_y}{G_x}\right)$ (17)

where $I(x, y)$ represents the input binary image.
After obtaining the gradient information, non-maximum suppression is applied to refine the edges to a single-pixel width. This is followed by a dual-threshold connection process (with lower threshold $T_L$ and higher threshold $T_H$). Although the input in our application scenario is already a binary conveyor belt region, the dual thresholds help generate well-defined and continuous edges. The final output is an edge set $E(x, y)$, summarized as follows:

$E(x, y) = \begin{cases} 1, & G(x, y) \ge T_H \\ 1, & T_L \le G(x, y) < T_H \text{ and connected to a strong edge} \\ 0, & \text{otherwise} \end{cases}$ (18)
As shown in Figure 12, the proposed method achieves more accurate edge localization.
After obtaining the edge-detected binary image, a row-wise scanning process is performed to record the leftmost and rightmost pixel positions in each row, forming two endpoint sets:
$\mathrm{LeftPoints} = \{ (y, x_{left}(y)) \mid E(y, x_{left}(y)) > 0 \}$ (19)

$\mathrm{RightPoints} = \{ (y, x_{right}(y)) \mid E(y, x_{right}(y)) > 0 \}$ (20)
At this stage, the edge image may still contain local noise and outliers. Direct fitting could result in lines that deviate from the actual conveyor belt boundaries. To address this, the collected sets of left and right edge points are, respectively, fitted using RANSAC regression to obtain robust and accurate boundary lines.
$x_{left}(y) = a_L y + b_L, \quad x_{right}(y) = a_R y + b_R$ (21)
The fitted lines are plotted on the image to visualize the conveyor belt boundary positions. Simultaneously, the centerline between the left and right edges is calculated to quantify the belt’s deviation status.
$x_{center}(y) = \frac{x_{left}(y) + x_{right}(y)}{2}$ (22)
Figure 13 presents a schematic illustration of the conveyor belt deviation quantification.
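The post-processing chain of Equations (17) to (22) can be sketched with OpenCV and scikit-learn as follows. The Canny thresholds and RANSAC settings are illustrative defaults, and the hypothetical `deviation` helper reports the mean signed pixel offset between the fitted centerline and the idler-based reference line.

```python
import cv2
import numpy as np
from sklearn.linear_model import RANSACRegressor

def fit_belt_edges(binary_mask, t_low=50, t_high=150):
    """Extract and fit the belt edge lines from a binary segmentation mask,
    following Eqs. (17)-(22); thresholds and RANSAC defaults are illustrative.
    """
    edges = cv2.Canny(binary_mask.astype(np.uint8) * 255, t_low, t_high)
    # Row-wise scan: leftmost / rightmost edge pixel per row, Eqs. (19)-(20)
    ys, lefts, rights = [], [], []
    for y in range(edges.shape[0]):
        xs = np.flatnonzero(edges[y])
        if xs.size >= 2:
            ys.append(y); lefts.append(xs[0]); rights.append(xs[-1])
    ys = np.asarray(ys).reshape(-1, 1)
    # Robust line fits x = a*y + b, Eq. (21)
    left_fit = RANSACRegressor().fit(ys, np.asarray(lefts))
    right_fit = RANSACRegressor().fit(ys, np.asarray(rights))
    return left_fit, right_fit

def deviation(left_fit, right_fit, ref_a, ref_b, height):
    """Mean signed pixel offset between the belt centerline (Eq. 22) and the
    idler-based reference line x = ref_a*y + ref_b."""
    y = np.arange(height).reshape(-1, 1)
    center = (left_fit.predict(y) + right_fit.predict(y)) / 2.0
    return float(np.mean(center - (ref_a * y.ravel() + ref_b)))
```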

3. Results

3.1. Dataset Construction and Hardware Environment

To validate the robustness of the proposed model in real industrial environments, this study adopts a case-study approach and establishes a long-term, real-scene dataset collected from conveyor belt surfaces. Sample images are shown in Figure 14.
All data were collected from an actual mining site in Ma’anshan City, Anhui Province, with a collection period spanning six months. The acquisition was performed using fixed industrial cameras. The open structure on both sides of the collection site resulted in significant variations in internal lighting conditions, strongly influenced by time of day, weather, and seasonal changes, creating continuous variations from daytime brightness to nighttime darkness. This challenging lighting environment provides valuable test data for evaluating model performance under different illumination intensities, as demonstrated in Figure 15, Figure 16, Figure 17 and Figure 18.
All acquired original images underwent precise semantic segmentation to extract the conveyor belt regions. After screening, a curated selection of 10,000 representative images from the hundreds of thousands captured by the cameras was annotated. To rigorously evaluate the model’s generalization capability, the dataset was partitioned into training, validation, and test sets according to the actual environmental conditions. The value of this dataset lies in its authentic representation of the dynamic and challenging lighting variations present in real industrial settings. As shown in Figure 19, the specific distribution of the dataset is presented.
The on-site deployment environment is consistent with the experimental environment used in this study. The training parameters used in this study are summarized in Table 1 and Table 2.

3.2. Ablation Study

To verify the effectiveness of each proposed module, a series of systematic ablation experiments were conducted on the self-constructed dataset. The experimental settings and results are summarized in Table 3. The classical U-Net architecture was adopted as the baseline model, upon which ResNet34, the MASAG module, and the boundary loss function (Boundary Loss) were progressively integrated to evaluate the contribution of each component to model performance.
As shown in Table 3, the ablation experiments progressively validate the impact of each proposed component in this scenario. The results demonstrate that adopting ResNet34 as the encoder effectively improves performance, while the MASAG module further enhances feature representation through an attention mechanism. Finally, the introduction of the boundary loss function refines the segmentation accuracy along object boundaries. Overall, the proposed model achieves superior segmentation performance.
Figure 20 compares the training loss curves of the baseline U-Net and the proposed ResNet + MASAG + Boundary configuration; all experimental data and conditions were kept identical.

3.3. Comparative Experiments

To comprehensively validate the effectiveness of the proposed model in the task of conveyor belt deviation detection, comparative experiments were conducted from multiple perspectives. At present, commonly used approaches can be broadly categorized into semantic segmentation-based methods and detection-based methods. Representative methods from both perspectives are introduced, with their principle illustrations and training curves provided, followed by comparative analysis against the proposed approach.
In the segmentation-based category, U-Net is adopted as the baseline model, while DeepLabV3+ and SEU-Net are used as comparative methods; in the object detection category, YOLOv5s is selected; and in the anchor-based detection category, UFLD is employed.
Table 4 summarizes the quantitative comparison of different methods in terms of accuracy, frame rate, and computational complexity (GFLOPs).
From the results presented in the table, it can be observed that segmentation-based methods demonstrate a significant advantage in pixel-level boundary localization, whereas object detection and anchor-based detection methods achieve higher frame rates but relatively lower accuracy.
In the context of this study, conveyor belt deviation detection prioritizes highly precise extraction of potential belt edges, as the quantitative measurement of deviation directly depends on the accuracy of edge localization. Even minor boundary errors can propagate through subsequent geometric fitting and quantitative deviation calculations, leading to cumulative errors that affect the reliability of early warnings. Therefore, although segmentation-based methods are slightly slower during inference, they offer irreplaceable advantages in terms of accuracy.
It should be noted that conveyor belt deviation represents a slowly developing anomaly, where displacement changes typically accumulate over seconds or even minutes. Hence, the frame rate requirement for such detection systems is relatively low. To balance real-time performance and accuracy, this study employs a frame-skipping detection approach, which significantly reduces computational load while maintaining sufficient temporal resolution to meet real-time monitoring needs.
To further verify the stability of the proposed method for conveyor belt deviation detection under different environmental conditions, the mean square error (MSE) between the detected centerline and the reference line was calculated.
All data were computed based on the testing set, covering four representative scenarios: normal illumination, strong-light interference, rainy conditions, and low-light environments.
This metric reflects the stability and accuracy of the algorithm in tracking the conveyor belt’s center position over time; a smaller MSE indicates that the detected trajectory is closer to the actual operating path and that the deviation measurement is more stable.
The results are summarized in Table 5.
In some studies, researchers have employed object detection models to achieve belt deviation detection by predicting rectangular bounding boxes of the conveyor belt region and aligning their diagonals with the belt edges. Figure 21 illustrates the basic principle of this approach together with the training curve of YOLOv5s for mAP@0.5:0.95 in this scenario.
Another category of methods is based on anchor-point partitioning, such as UFLD, which locates conveyor belt edges through a set of predefined anchor points. Figure 22 illustrates the principle of this method along with its accuracy curve in the present scenario. As shown, UFLD reaches an accuracy of approximately 0.85 after 100 epochs but converges slowly and oscillates significantly, indicating insufficient stability in the conveyor belt deviation detection task.
As shown in Figure 23, the accuracy curves of the proposed method, SEU-Net, and DeepLabV3+ are presented. Compared with YOLOv5s and UFLD, the proposed approach not only achieves a higher final accuracy of 99.77%—surpassing the baseline U-Net (99.65%) and significantly outperforming detection-based methods—but also demonstrates superior convergence stability and boundary representation accuracy.

3.4. Visualization Analysis

This section presents the recognition performance and robustness of the proposed method under different scenarios, and provides a visual comparison among five methods—the proposed method, YOLOv5s, UFLD, SEU-Net, and DeepLabV3+. The evaluation scenarios include normal illumination, strong light interference, rainy conditions, low-light environments, and severe belt deviation.
As illustrated in Figure 24, under normal lighting conditions, all methods are able to recognize conveyor belt edges to a reasonable extent. However, the results indicate that YOLOv5s, due to its reliance on the diagonal approximation of bounding boxes, exhibits noticeable boundary offsets; UFLD suffers from jitter during anchor-point fitting; SEU-Net exhibits limited capability in edge extraction and tends to produce adhesion in regions with similar color tones, while DeepLabV3+ generates a higher number of false detections in complex backgrounds. In contrast, the proposed method produces more complete, smoother, and more accurate edge delineations that align closely with the ground-truth boundaries.
As shown in Figure 25, under strong light interference, partial regions exhibit overexposure. The bounding boxes generated by YOLOv5s are significantly affected, resulting in obvious boundary offsets. The UFLD method shows unstable anchor-point predictions, leading to edge curves that deviate from the true conveyor belt boundary. SEU-Net still exhibits adhesion between the roller and the belt regions, while DeepLabV3+ remains susceptible to background interference and continues to produce false detections in complex scenes. In contrast, the proposed method maintains stable edge detection performance.
As illustrated in Figure 26, in rainy environments, raindrops and blurring effects make it difficult for conventional methods to extract edges reliably. YOLOv5s suffers from missing detection boxes and boundary shifts. UFLD fails to generate sufficient anchor points, which leads to inaccurate edge fitting. SEU-Net continues to exhibit adhesion between the roller and the conveyor belt regions, whereas DeepLabV3+ shows adhesion in areas where the background and the belt have similar color tones. By leveraging global context modeling in the segmentation framework, the proposed method effectively distinguishes between background and edge regions.
As shown in Figure 27, under low-light conditions, the right side of the conveyor belt exhibits insufficient brightness. YOLOv5s demonstrates imprecise boundary detection, and UFLD yields significantly shifted outputs. SEU-Net tends to produce adhesion in shadow-affected regions, whereas DeepLabV3+ exhibits similar adhesion problems and further misclassifies a considerable amount of background as target regions. Benefiting from the strong feature representation of deep layers, the proposed method is able to sustain stable detection accuracy and maintain edge continuity even under dim illumination.
As illustrated in Figure 28, under severe deviation, the right roller is almost completely occluded by the conveyor belt. YOLOv5s continues to experience missing detection boxes with relatively low confidence levels, and UFLD presents noticeable deviations in anchor-point localization. SEU-Net still exhibits adhesion artifacts and imprecise edge detection of the conveyor belt, while DeepLabV3+ is strongly affected by background noise, leading to numerous false responses. The proposed method, however, successfully extracts edge information and produces boundary delineations that closely match the true belt edges, demonstrating its practicality and robustness.
From the visual comparisons across diverse environments, YOLOv5s demonstrates limited adaptability to complex industrial scenarios because of its dependence on rectangular bounding boxes, whereas UFLD suffers from degraded edge continuity under sparse anchor distributions or strong environmental interference. SEU-Net shows inadequate precision in segmenting edge details, leading to adhesion between the rollers and the conveyor belt, while DeepLabV3+ is prone to numerous false detections in cluttered or complex backgrounds. In contrast, the proposed method consistently delivers accurate and stable recognition across all conditions, and shows enhanced robustness and application potential in extreme cases such as strong light, rainfall, and severe belt deviation.

4. Conclusions

This study proposes a vision-based deep learning solution to address the practical engineering requirements of conveyor belt deviation detection. The effectiveness of the proposed method is validated through extensive experiments under multiple working conditions. The results demonstrate that the method can stably and accurately extract belt edges in complex environments and, by combining the constructed centerline reference, enables real-time identification of deviation states and quantitative measurement of displacement. The findings further verify the superior performance of segmentation-based approaches in this task. However, several limitations and possible improvements remain:
(1) Limited robustness to illumination changes and contamination. Although the model performs well under strong- and low-light conditions, severe dust or lens contamination may still degrade edge features, affecting the stability of deviation estimation.
(2) High model complexity. While the current network architecture ensures high accuracy, it incurs substantial computational costs and relatively low inference speed. Future work will consider lightweight network design or model pruning to improve real-time performance.
(3) Restricted experimental scope. The dataset used in this study was mainly constructed from a single mining site and does not cover extreme conditions such as heavy rain, snow, or severe occlusion. Expanding dataset diversity will be important for improving generalization capability.
(4) Incomplete deployment optimization. Although the system can operate in real time on an industrial computer, further optimization for embedded edge devices is still needed. Future research will explore the integration of edge computing and multi-modal fusion to enhance the system’s stability and adaptability in practical deployments.
Overall, this study verifies the feasibility of applying deep learning techniques to conveyor belt deviation detection from a methodological perspective, while also highlighting innovation in both experimental design and engineering value. Future work will further explore lightweight model design, edge computing deployment, and multimodal information fusion to enhance real-time performance and strengthen applicability and deployment potential in industrial field environments.

Author Contributions

Conceptualization, J.H. and L.M.; methodology, L.M. and C.D.; software, L.M.; validation, L.M., C.D. and T.F.; formal analysis, T.F. and W.L.; investigation, W.L. and X.H.; resources, W.L. and X.H.; data curation, L.M.; writing—original draft preparation, L.M.; writing—review and editing, J.H.; visualization, L.M.; supervision, J.H.; project administration, J.H.; funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Science and Technology Major Project for Deep Earth Probe and Mineral Resources Exploration (grant numbers 2025ZD1008400 and 2025ZD1008406).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Authors Jiaming Han, Wensheng Liu and Xianhua He were employed by the company Anhui Masteel Mining Resources Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zhang, H.; Xiao, D. A High-Precision and Lightweight Ore Particle Segmentation Network for Industrial Conveyor Belt. Expert Syst. Appl. 2025, 273, 126891.
2. Xu, X.; Zhao, H.; Fu, X.; Liu, M.; Qiao, H.; Ma, Y. Real-Time Belt Deviation Detection Method Based on Depth Edge Feature and Gradient Constraint. Sensors 2023, 23, 8208.
3. Han, J.; Fang, T.; Liu, W.; Zhang, C.; Zhu, M.; Xu, J.; Ji, J.; He, X.; Wang, Z.; Tang, M. Applications of Machine Vision Technology for Conveyor Belt Deviation Detection: A Review and Roadmap. Eng. Appl. Artif. Intell. 2025, 161, 112312.
4. Bortnowski, P.; Kawalec, W.; Król, R.; Ozdoba, M. Types and Causes of Damage to the Conveyor Belt–Review, Classification and Mutual Relations. Eng. Fail. Anal. 2022, 140, 106520.
5. Ni, Y.; Cheng, H.; Hou, Y.; Guo, P. Study of Conveyor Belt Deviation Detection Based on Improved YOLOv8 Algorithm. Sci. Rep. 2024, 14, 26876.
6. Saffarini, R.; Khamayseh, F.; Awwad, Y.; Sabha, M.; Eleyan, D. Dynamic Generative R-CNN. Neural Comput. Appl. 2025, 37, 7107–7120.
7. Chen, Y.; Yuan, X.; Wang, J.; Wu, R.; Li, X.; Hou, Q.; Cheng, M.-M. YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 4240–4252.
8. Tan, L.; Wu, H.; Xu, Z.; Xia, J. Multi-Object Garbage Image Detection Algorithm Based on SP-SSD. Expert Syst. Appl. 2025, 263, 125773.
9. Yang, H.; Wang, J.; Bo, Y.; Wang, J. ISTD-DETR: A Deep Learning Algorithm Based on DETR and Super-Resolution for Infrared Small Target Detection. Neurocomputing 2025, 621, 129289.
10. Nawaz, S.A.; Li, J.; Li, D.; Shoukat, M.U.; Bhatti, U.A.; Raza, M.A. Medical Image Zero Watermarking Algorithm Based on Dual-Tree Complex Wavelet Transform, AlexNet and Discrete Cosine Transform. Appl. Soft Comput. 2025, 169, 112556.
11. Chaturvedi, A.; Prabhu, R.; Yadav, M.; Feng, W.-C.; Cao, G. Improved 2-D Chest CT Image Enhancement with Multi-Level VGG Loss. IEEE Trans. Radiat. Plasma Med. Sci. 2025, 9, 304–312.
12. Jabeen, K.; Khan, M.A.; Hamza, A.; Albarakati, H.M.; Alsenan, S.; Tariq, U.; Ofori, I. An EfficientNet Integrated ResNet Deep Network and Explainable AI for Breast Lesion Classification from Ultrasound Images. CAAI Trans. Intell. Technol. 2025, 10, 842–857.
13. Zheng, Q.; Zhang, H.; Liu, H.; Xu, H.; Xu, B.; Zhu, Z. Intelligent Prediction Model for Pitting Corrosion Risk in Pipelines Using Developed ResNet and Feature Reconstruction with Interpretability Analysis. Reliab. Eng. Syst. Saf. 2025, 264, 111347.
14. Hussain, T.; Shouno, H.; Mohammed, M.A.; Marhoon, H.A.; Alam, T. DCSSGA-UNet: Biomedical Image Segmentation with DenseNet Channel Spatial and Semantic Guidance Attention. Knowl. Based Syst. 2025, 314, 113233.
15. Haruna, Y.; Qin, S.; Chukkol, A.H.A.; Yusuf, A.A.; Bello, I.; Lawan, A. Exploring the Synergies of Hybrid Convolutional Neural Network and Vision Transformer Architectures for Computer Vision: A Survey. Eng. Appl. Artif. Intell. 2025, 144, 110057.
16. Tang, Q.; Xin, J.; Jiang, Y.; Zhang, H.; Zhou, J. Dynamic Response Recovery of Damaged Structures Using Residual Learning Enhanced Fully Convolutional Network. Int. J. Struct. Stab. Dyn. 2025, 25, 2550008.
17. Munia, A.A.; Abdar, M.; Hasan, M.; Jalali, M.S.; Banerjee, B.; Khosravi, A.; Hossain, I.; Fu, H.; Frangi, A.F. Attention-Guided Hierarchical Fusion U-Net for Uncertainty-Driven Medical Image Segmentation. Inf. Fusion 2025, 115, 102719.
18. Qi, Y.; Li, J.; Li, Q.; Huang, Z.; Wan, T.; Zhang, Q. SEGNet: Shot-Flexible Exposure-Guided Image Reconstruction Network. Vis. Comput. 2025, 41, 11491–11504.
19. Huang, Z.; Pan, Y.; Huang, W.; Pan, F.; Wang, H.; Yan, C.; Ye, R.; Weng, S.; Cai, J.; Li, Y. Predicting Microvascular Invasion and Early Recurrence in Hepatocellular Carcinoma Using DeepLab V3+ Segmentation of Multiregional MR Habitat Images. Acad. Radiol. 2025, 32, 3342–3357.
20. Shi, J.; Cui, R.; Wang, Z.; Yan, Q.; Ping, L.; Zhou, H.; Gao, J.; Fang, C.; Han, X.; Hua, S. Deep Learning HRNet FCN for Blood Vessel Identification in Laparoscopic Pancreatic Surgery. npj Digit. Med. 2025, 8, 235.
21. Yang, S.; Wang, Y.; Zhao, K.; Liu, X.; Mu, J.; Zhao, X. Partial Convolution-Simple Attention Mechanism-SegFormer: An Accurate and Robust Model for Landslide Identification. Eng. Appl. Artif. Intell. 2025, 151, 110612.
22. Dai, Z.; Sun, H.; Zhu, Y.; Wu, B.; Qin, X.; Liu, L.; Jia, J. Label-Fusion Progressive Segmentation of Ore and Rock Particles in Complex Illumination Conditions Based on SAM-Mask2Former. Expert Syst. Appl. 2025, 297, 129300.
23. Chen, Z.; Luo, C. Towards Real-Time and Robust Lane Detection: A DCN–EMCAM Empowered UFLD Model. Eng. Res. Express 2025, 7, 025295.
24. Chen, Z.; Sun, Q. Weakly-Supervised Semantic Segmentation with Image-Level Labels: From Traditional Models to Foundation Models. ACM Comput. Surv. 2025, 57, 111.
25. Kolahi, S.G.; Chaharsooghi, S.K.; Khatibi, T.; Bozorgpour, A.; Azad, R.; Heidari, M.; Hacihaliloglu, I.; Merhof, D. MSA2Net: Multi-Scale Adaptive Attention-Guided Network for Medical Image Segmentation. arXiv 2024, arXiv:2407.21640.
26. Ji, S.; Granato, D.; Wang, K.; Hao, S.; Xuan, H. Detecting the Authenticity of Two Monofloral Honeys Based on the Canny-GoogLeNet Deep Learning Network Combined with Three-Dimensional Fluorescence Spectroscopy. Food Chem. 2025, 485, 144509.
27. Shi, P.; Yan, S.; Xiao, Y.; Liu, X.; Zhang, Y.; Li, J. RANSAC Back to SOTA: A Two-Stage Consensus Filtering for Real-Time 3D Registration. IEEE Robot. Autom. Lett. 2024, 9, 11881–11888.
28. Zheng, Y.; Chen, B.; Liu, B.; Peng, C. Milling Cutter Wear State Identification Method Based on Improved ResNet-34 Algorithm. Appl. Sci. 2024, 14, 8951.
29. Qian, Q.; Wen, Q.; Tang, R.; Qin, Y. DG-Softmax: A New Domain Generalization Intelligent Fault Diagnosis Method for Planetary Gearboxes. Reliab. Eng. Syst. Saf. 2025, 260, 111057.
30. Moloney, B.M.; Mc Carthy, C.E.; Bhayana, R.; Krishna, S. Sigmoid Volvulus—Can CT Features Predict Outcomes and Recurrence? Eur. Radiol. 2025, 35, 897–905.
31. Huang, M.; Zhu, X.; Shi, W.; Qin, Q.; Yang, J.; Liu, S.; Chen, L.; Ding, R.; Gan, L.; Yin, X. Manipulating the Coordination Dice: Alkali Metals Directed Synthesis of Co-NC Catalysts with CoN4 Sites. Sci. Adv. 2025, 11, eads6658.
32. Zhao, W.; Zhu, X.; Shi, H.; Zhang, X.-Y.; Zhao, G.; Lei, Z. Global Cross-Entropy Loss for Deep Face Recognition. IEEE Trans. Image Process. 2025, 34, 1672–1685.
Figure 1. Representative applications of conveyor belts in different operational scenarios.
Figure 2. The arrows in (a,b) indicate the mechanical anti-deviation guide rollers located on the left and right sides of the conveyor belt, respectively.
Figure 3. Architecture of the improved semantic segmentation network.
Figure 4. U-Net network model.
Figure 5. Structure of the residual block.
Figure 6. ResNet34 residual structure.
Figure 7. Panel (a) shows the segmented conveyor belt region without the MASAG module, where the right side includes the binarized idler area. Panel (b) shows the segmented conveyor belt region with the MASAG module, which effectively mitigates the influence of the idlers on edge detection.
Figure 8. MASAG module.
Figure 9. Images of the idler roller captured from different perspectives.
Figure 10. Illustration of two methods for drawing the conveyor belt centerline: (a) the simulated centerline constructed based on idler roller positions; (b) the actual centerline drawn on an in situ image.
Figure 11. (a) Original image of the conveyor belt scene; (b) result of applying edge detection directly to the original conveyor belt image.
Figure 12. Binary image after edge detection.
Figure 13. Deviation simulation and detection: (a) extraction of the actual conveyor belt edge lines by edge detection and Probabilistic Hough Transform line fitting, with the displacement between the centerline and the reference line computed to simulate the deviation state; (b) the algorithm's detection result for the deviation state on the binary conveyor belt image.
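For readers who wish to reproduce the post-processing stage summarized in Figure 13, a minimal sketch in OpenCV is given below. It is an illustration rather than the authors' exact implementation: the function name detect_deviation, the Canny thresholds, the Hough parameters, and the alarm threshold are all hypothetical choices.

```python
import cv2
import numpy as np

def detect_deviation(mask, ref_x, threshold_px=30):
    """Estimate belt deviation from a binary segmentation mask.

    mask: uint8 binary image (belt = 255) produced by the segmentation network.
    ref_x: x-coordinate (pixels) of the fixed reference line built from the rollers.
    threshold_px: offset (pixels) beyond which a deviation alarm is raised.
    """
    # Edge detection on the segmented belt region (thresholds are illustrative).
    edges = cv2.Canny(mask, 50, 150)

    # Probabilistic Hough Transform to fit straight belt edge lines.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=80, minLineLength=100, maxLineGap=20)
    if lines is None:
        return None  # no reliable edges in this frame

    # Keep near-vertical segments (belt edges) and split them left/right of the image center.
    h, w = mask.shape[:2]
    left_x, right_x = [], []
    for x1, y1, x2, y2 in lines[:, 0]:
        if abs(x2 - x1) < abs(y2 - y1):  # near-vertical segment
            x_mid = (x1 + x2) / 2.0
            (left_x if x_mid < w / 2 else right_x).append(x_mid)
    if not left_x or not right_x:
        return None

    # Belt centerline = midpoint of the two fitted edges; deviation = offset from reference.
    center_x = (np.mean(left_x) + np.mean(right_x)) / 2.0
    offset = center_x - ref_x
    return offset, abs(offset) > threshold_px
```

In practice, ref_x would be derived once from the fitted roller positions, since the rollers remain physically fixed in the image while the belt moves.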
Figure 14. (a) The arrow indicates the installation scenario of the on-site camera; (b) the two arrows point to the supplemental lighting; (c) the arrow indicates the installation position of the on-site camera.
Figure 15. Normal lighting conditions.
Figure 16. Strong lighting conditions.
Figure 17. Low-light conditions.
Figure 18. Rainy conditions.
Figure 19. Schematic diagram of the data distribution: (a) data distribution across environmental conditions; (b) train/validation/test split within each environmental category.
Figure 20. Loss curves of the Baseline U-Net and the improved RMU-Net during training.
Figure 21. Conveyor belt edge detection based on object detection: (a) bounding boxes whose diagonals align with the belt edges; (b) training curve of YOLOv5s.
Figure 22. Principle of Ultra-Fast Lane Detection (UFLD) and its training curve in the present scenario.
Figure 23. Accuracy curves of the proposed method and the baseline.
Figure 24. Recognition comparison under normal lighting conditions.
Figure 25. Recognition comparison under strong light conditions.
Figure 26. Recognition comparison under rainy conditions.
Figure 27. Recognition comparison under low-light conditions.
Figure 28. Recognition comparison under severe deviation conditions (the right roller is almost entirely occluded by the belt).
Table 1. Training Hyperparameters.

Hyperparameter    Value     Hyperparameter    Value
Epochs            100       Betas             0.9, 0.999
Batch Size        4         Optimizer         AdamW
Weight Decay      0.0001    Learning Rate     0.0005
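For context, the settings in Table 1 map one-to-one onto a PyTorch AdamW configuration. The sketch below shows this mapping under the assumption that model denotes the segmentation network (a placeholder layer is used here); it is illustrative only, not the authors' training script.

```python
import torch

# Hypothetical placeholder for the segmentation network described in the paper.
model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)

# Optimizer settings taken directly from Table 1.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,                # Learning Rate
    betas=(0.9, 0.999),     # Betas
    weight_decay=1e-4,      # Weight Decay
)

epochs, batch_size = 100, 4  # remaining Table 1 settings
```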
Table 2. Experimental Software and Hardware Configuration.

Component                 Specification
Operating System          Windows 10 Professional
Experimental Framework    Python 3.9.21 / PyTorch 2.7.1
CPU                       14th Gen Intel(R) Core(TM) i9-14900
GPU                       NVIDIA GeForce RTX 5070
RAM                       32 GB
Table 3. Ablation study results.

Baseline    ResNet    MASAG    Loss (CE + Dice)    Loss (CE + Dice + Boundary)    Accuracy (%)    IoU (%)
√                              √                                                  99.655          99.403
√           √                  √                                                  99.758          99.516
√           √         √        √                                                  99.762          99.521
√           √                                      √                              99.768          99.530
√           √         √                            √                              99.778          99.538

The check marks ("√") in the table indicate that the corresponding component or setting is included in that experiment. For example, a check mark under "ResNet" means that the ResNet encoder was used in that configuration.
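The two loss columns in Table 3 contrast a cross-entropy-plus-Dice objective with the same objective augmented by a boundary term. A minimal sketch of one such weighted combination is given below; the weighting coefficients and the distance-map-based boundary term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, dist_map, w_ce=1.0, w_dice=1.0, w_bd=0.5):
    """Sketch of a CE + Dice + boundary loss for binary belt segmentation.

    logits:   (N, 2, H, W) raw network outputs.
    target:   (N, H, W) integer labels {0, 1}.
    dist_map: (N, H, W) precomputed signed distance map of the ground-truth
              boundary (negative inside the belt), as used in distance-map-based
              boundary losses; its construction is omitted here.
    """
    ce = F.cross_entropy(logits, target)

    # Soft Dice on the foreground (belt) channel.
    prob_fg = F.softmax(logits, dim=1)[:, 1]
    target_f = target.float()
    inter = (prob_fg * target_f).sum(dim=(1, 2))
    union = prob_fg.sum(dim=(1, 2)) + target_f.sum(dim=(1, 2))
    dice = 1.0 - ((2 * inter + 1e-6) / (union + 1e-6)).mean()

    # Boundary term: penalize foreground probability mass far from the true edge.
    boundary = (prob_fg * dist_map).mean()

    return w_ce * ce + w_dice * dice + w_bd * boundary
```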
Table 4. Comparison of different methods in terms of Accuracy, Frame Rate, and Computational Complexity.

Model         Category            Accuracy (%)    FPS    GFLOPs
Baseline      Segmentation        99.655          20     226.80
DeepLabV3+    Segmentation        99.452          12     378.72
SEU-Net       Segmentation        99.715          18     252.62
YOLOv5s       Object Detection    93.105          56     81.36
UFLD          Lane Detection      85.121          50     90.72
Ours          Segmentation        99.778          16     283.87
Table 5. Mean square error (MSE) between the detected centerline and the reference line under different environmental conditions.

Environmental Condition       Number of Images    MSE
Normal lighting conditions    1000                25.82
Strong light interference     500                 26.71
Rainy environments            500                 28.56
Low-light conditions          500                 27.66
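The MSE values in Table 5 can be read as the mean squared pixel offset between the detected centerline and the roller-based reference line. A minimal sketch of this computation, assuming both lines are sampled at the same set of image rows, follows.

```python
import numpy as np

def centerline_mse(detected_x, reference_x):
    """Mean squared error (in squared pixels) between a detected centerline
    and the reference line, both sampled at the same image rows.

    detected_x, reference_x: 1-D sequences of x-coordinates per sampled row.
    """
    detected_x = np.asarray(detected_x, dtype=float)
    reference_x = np.asarray(reference_x, dtype=float)
    return float(np.mean((detected_x - reference_x) ** 2))
```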
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
