1. Introduction
The rail clamp, also known as the track joint clamp, is a fastening device used to connect two sections of rail, typically used in conjunction with fish-tail (fishplate) bolts to support the connection and stability of the rails. The high-frequency vibrations generated during high-speed train operation can loosen these bolts, leading to a loss of fastening effect. If the rail connection becomes unstable, it can result in train derailment during high-speed travel, with dire consequences. It is therefore particularly important to monitor the condition of the bolts regularly. Traditional methods for detecting loose track bolts generally require manual on-site inspections, including visual inspection, hammer-sounding, and torque measurement. However, for railways with a large number of bolts spread over a wide area, manual inspection is time-consuming and labor-intensive, and the repetitive, mechanical nature of the work makes it inevitable that inspectors overlook details.
To address the disadvantages of manual inspection in the field of bolt-loosening detection, some scholars have proposed the use of sensors to collect bolt condition information for automated monitoring. For example, ultrasonic sensors have been employed to detect the tightness of the bolts in connected structures [
1,
2]. Most of these ultrasonic methods are based on pulse echo technology, where ultrasonic pulses are emitted and received to measure the round-trip time of the pulses through the bolt. Different axial forces cause slight variations in travel time, allowing for the identification of bolt tightness. Another approach uses piezoelectric impedance technology to monitor bolt fastening [
3], by installing piezoelectric materials on the head of the bolt and utilizing the electromechanical coupling effect of the piezoelectric material to detect changes in the preload. These methods can be used to obtain accurate results and effectively identify bolt loosening. However, owing to the high installation and maintenance costs of detection devices and difficulty in resolving signal attenuation issues, they are not easily scalable for large-scale deployment. Additionally, some methods use acoustic signals generated by tapping [
4] and vibration signals [
5] to identify bolt loosening. However, these methods are susceptible to noise, and their robustness is difficult to improve.
With the development of computer vision technology and artificial intelligence, image-based bolt loosening identification has become a feasible solution that benefits from ease of deployment and cost-effectiveness. Detection methods based on visual technology can be classified into external marker comparison methods and self-feature comparison methods. External marker comparison methods typically require pre-marking on bolts, and the changes in the markers are monitored to determine the condition of the bolt. Deng Xinjian et al. [
6] proposed a bolt loosening recognition method based on marked ellipses and points, which integrates computer vision and geometric imaging theories to calculate the loosening angle of marked bolts; Ji Wang et al. [
7] proposed a cascade deep convolutional neural network to extract marker line features on bolts to detect loosening. They used an improved SSD network to locate bolts and PFN to extract marker line regions, and finally determined whether the bolts were loose based on the angle of the marker lines. Jun Luo et al. [
8] suggested installing a regular hexagonal cap on the nut as a reference object and detecting the difference between the hexagonal cap and nut to determine the loosening angle. These external marker-based methods rely on marker recognition, are susceptible to environmental influences, and are therefore poorly suited to outdoor environments.
Self-feature comparison methods usually focus more on the recognition of the features of bolts, such as edge characteristics. Tianyi Wu et al. [
9] proposed a high-resolution cross-scale transformer deep learning backbone to construct a bolt 3D model and extract high-precision key-points. Then, a monocular vision measurement model was established to obtain the exposed length of the bolt and evaluate its connection loosening status. Thanh-Canh Huynh et al. [
10] developed a deep learning algorithm based on region-based convolutional neural networks that automatically detects and crops bolts in connection images, applies the Hough line transform to extract nut edges and calculate the bolt rotation angle, and finally determines loosening by comparing the angle with a preset threshold. Lu Qizhe et al. [
11] proposed a bolt-loosening angle detection method based on machine vision and deep learning. The method first constructs a virtual laboratory in Unreal Engine 5 to automatically generate and label synthetic datasets. A YOLOv7 model is then trained on these datasets to accurately recognize bolt keypoints under different angles and lighting conditions. Finally, the loosening angle is calculated from the keypoints and their adjacent positional relationships. Daoyuan Song et al. [
12] proposed a bolt loosening detection method based on Canny edge detection. The marker regions were extracted in the HSV color space and processed with dilation, erosion, and median filtering; the Canny edge detector was then applied for edge segmentation, and loosening was determined by fitting the edge slope. Young-Jin Cha et al. [
13] proposed a visual bolt loosening detection method. This method captures bolt images using a smartphone camera, extracts features such as bolt head dimensions through image processing techniques like the Hough transform, trains a linear support vector machine classifier to distinguish between tightened and loosened bolts, and evaluates the algorithm’s performance using leave-one-out cross-validation.
During the installation of rail clamps, fish-tail bolts usually need to be installed alternately in opposite directions to maintain balance. To ensure the safety of train operation, inspection cameras cannot be installed inside the track; they can only be placed outside the track to capture images at a downward angle. However, images captured from such an oblique perspective often lose some bolt features, making it difficult to restore rigid geometric characteristics and to extract and filter effective corner features.
Despite the growing application of computer vision-based methods in bolt loosening detection, there is currently no dedicated approach specifically designed for detecting fish-tail bolt looseness in railway environments. The unique challenges posed by railway infrastructure, such as the oblique perspectives of inspection cameras and the structural characteristics of fish-tail bolts, make it difficult to directly apply existing methods.
With the rapid advancement of computer vision and deep learning, image-based detection techniques have emerged as a promising solution, offering numerous advantages, including low operational and maintenance costs, strong generalization capability, and non-contact monitoring. By strategically deploying detection devices at key railway locations—such as stations, bridges, and tunnel entrances—real-time monitoring can be achieved, significantly reducing both time and labor costs while effectively preventing potential safety hazards that may arise from undetected bolt loosening. Moreover, continuous monitoring of bolt loosening in critical areas enhances railway safety and provides valuable data for fish-tail bolt quality assessment and preventive maintenance. By analyzing loosening trends, maintenance teams can take proactive measures to reinforce or replace bolts before failures occur, ultimately extending component lifespan and reducing maintenance costs. Given these benefits, developing an accurate and efficient fish-tail bolt loosening detection method is of great significance for railway safety and operational efficiency.
In summary, for fish-tail bolt loosening detection in the context of an oblique perspective scene, current computer vision-based methods for bolt loosening detection face the following two significant issues:
Restoring the rigid geometric features of the bolt from an oblique perspective is challenging, making it difficult to map the relative angles to true angles.
Extracting and filtering the corner features of the bolt from an oblique perspective is also problematic.
To address the difficulty in angle mapping, this study combines the rigid characteristics of rail clamps with perspective projection algorithms. By constructing and querying a standard clamp size database, an adaptive correction method of “one transformation, two corrections” is designed based on the size characteristics of rail clamps. To tackle the issue of extracting and filtering corner features, this paper combines the angular characteristics of bolts under tilted perspectives and proposes an approach that employs the Lightweight OpenPose network to integrate spatial positional features of bolt edges, enabling angle recognition of key bolt edges based on Gaussian circular smooth labels.
The remainder of this paper is organized as follows.
Section 2 provides a detailed introduction to the methods and processes involved in bolt loosening detection.
Section 3 covers the experiments and analysis, including the platform setup, data collection, and analysis of experimental results. Finally,
Section 4 concludes the paper.
2. Methodology
2.1. Unique Angle of the Bolt
When observed from the front, the fish-tail bolt has a regular hexagonal shape, which overlaps perfectly with its previous state after a 60° rotation. This means that, without marking the bolt, its orientation can be defined so that the rotational angle is identified within a 60° range. Within this range, the angle of any edge of the bolt accurately reflects its overall angle. Because fish-tail bolts are typically installed in alternating opposite directions, and because cameras cannot be installed inside the track for the safety of train operation, bolts on the inner side must be photographed and inspected from outside the track at an oblique angle. However, under an oblique perspective, although the bolt’s central symmetry still allows it to align with its previous state after a 60° rotation, the edges of the bolt can no longer directly reflect its overall angle owing to the influence of perspective projection. To solve this problem, we can utilize perspective transformation to restore the geometric relationship of some edges of the bolt in a frontal view, a process illustrated in
Figure 1. By identifying the angles of the key edges of the bolt image after perspective transformation from different oblique perspectives, we can obtain the true angle of the bolt.
2.2. Bolt Loosening Detection Procedure
The core of the fish-tail bolt loosening detection method proposed in this study is to compare the differences in the key edge angles of the bolt between different time frames; the bolt is judged to have loosened when the difference exceeds a certain threshold. The detection process is illustrated in
Figure 2 and consists of three sequential steps: fish-tail bolt localization, bolt image rectification, and bolt loosening detection.
Initially, object detection and keypoint extraction are performed on the captured rail images, identifying the rail clamp, nut, bolt head, and corner points of the rail clamp edges. Next, based on the corner coordinates and the positional relationship of the bolt bounding boxes, combined with the size standards of the rail clamp, a perspective transformation is applied to the image to restore the geometric relationship between the bolt’s key edges and the rail clamp that is lost under tilted viewing angles. Finally, the transformed bolt image is cropped and the key edge angles are identified. In the figure, the two inputs correspond to images of the same bolt position captured at different time frames. By comparing the angle differences of the same bolt position at different time frames, bolt loosening can be detected. To reduce false detections caused by angle recognition errors, a loosening threshold is introduced: a bolt is only determined to be loose if the angle difference reaches the predefined threshold. Through repeated experiments, this threshold is set to three times the minimum average error of the key edge angle recognition model.
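For clarity, the sketch below outlines this three-step procedure in simplified Python. The helper functions (locate_clamp_and_bolts, rectify_clamp_image, estimate_key_edge_angles) are hypothetical placeholders for the components described in Sections 2.3, 2.4, and 2.5, not the authors' actual implementation.

```python
# Hedged sketch of the overall loosening-detection procedure described above.
# All helper functions are illustrative placeholders.

def detect_loosening(image_t1, image_t2, angle_threshold_deg):
    """Compare the same bolt position at two time frames and flag loosening."""
    angles = []
    for image in (image_t1, image_t2):
        detections = locate_clamp_and_bolts(image)           # Section 2.3: YOLOv8 detection + clamp corners
        rectified = rectify_clamp_image(image, detections)    # Section 2.4: perspective correction
        angles.append(estimate_key_edge_angles(rectified))    # Section 2.5: key-edge angle recognition

    # Average angular difference of corresponding key edges between time frames.
    diffs = [abs(a1 - a2) for a1, a2 in zip(angles[0], angles[1])]
    loosening_angle = sum(diffs) / len(diffs)

    # A bolt is flagged only if the difference reaches the predefined threshold
    # (set here, as in the paper, relative to the model's angle recognition error).
    return loosening_angle >= angle_threshold_deg
```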
2.3. Object Detection
Object detection is a computer vision algorithm used to identify the location of specific objects in an image. In this study, we employed an object detection algorithm to locate rail clamps and fish-tail bolts. Object detection algorithms can be broadly classified into two categories: one-stage and two-stage. Two-stage object detection typically involves two separate steps: candidate region extraction and target information regression. Conversely, one-stage object detection omits candidate region extraction and directly predicts the location and category information of a target through regression. While two-stage object detection often achieves higher accuracy in complex scenes, it also requires higher computational overhead. For simpler scenes such as rail clamp and fish-tail bolt detection, one-stage object detection algorithms are undoubtedly a better choice.
Among the one-stage object detection algorithms, the YOLO series [
14] stands out because of its excellent balance of speed and accuracy, making it a preferred choice for many object detection scenarios. In this study, we selected YOLOv8n to detect rail clamps and bolts. Although YOLOv8 [
15] is not the latest version of the YOLO series, its exceptional stability makes it a top choice for current object detection algorithms.
Figure 3 illustrates the network structure of YOLOv8, which is divided into three parts: the backbone, neck, and head networks. The image is first resized to 640 × 640 and input into the backbone network for feature extraction. The backbone network then outputs feature maps at three scales, 80 × 80, 40 × 40, and 20 × 20, to the neck network. The neck network performs feature fusion and enhancement on these three scales of feature maps and likewise outputs three scales of feature maps to the head network. Finally, the head network predicts the target locations and categories of small, medium, and large objects in the image from the three feature-map scales through two separate branches. Because features from different scales are fused and targets of different sizes are predicted separately, YOLOv8 achieves good results even for small targets that are difficult to detect.
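As a concrete illustration, the snippet below shows how a YOLOv8n model could be applied to a rail image with the Ultralytics API; the weights file name (rail_clamp_bolt.pt) and class names are hypothetical stand-ins for a model trained on rail clamp and fish-tail bolt images.

```python
from ultralytics import YOLO

# Hypothetical custom-trained YOLOv8n weights for rail clamps and fish-tail bolts.
model = YOLO("rail_clamp_bolt.pt")

# Run inference on a rail image; inputs are resized to 640 x 640 internally.
results = model.predict("rail_image.jpg", imgsz=640, conf=0.5)

for result in results:
    for box in result.boxes:
        cls_name = result.names[int(box.cls)]      # e.g., "rail_clamp", "bolt_head", "nut" (assumed labels)
        x1, y1, x2, y2 = box.xyxy[0].tolist()      # bounding-box corners in pixels
        print(cls_name, (x1, y1, x2, y2), float(box.conf))
```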
2.4. Angle Correction
In images of rails captured from an inclined angle, the geometric relationship between the bolts and rail clamps is not represented as it would be in images taken perpendicular to the bolts (orthographic projection). Therefore, the angle of the bolt edges did not accurately reflect the true angle of the bolts. To obtain the true angle of the bolts, it is necessary to use a perspective transformation [
16] to restore the geometric relationship between the bolts and rail clamps under orthographic projection. Perspective transformation is a mathematical tool that maps image coordinate points to a new view plane, and the mapping formula can be expressed as:

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = H \begin{bmatrix} x \\ y \\ w \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$

Here, $(x, y, w)$ represents the homogeneous coordinates of the original image, $(x', y', w')$ represents the homogeneous coordinates of the transformed image, and $H$ is a $3 \times 3$ perspective transformation matrix. Although the $H$ matrix has nine unknowns, the final normalization of the transformed homogeneous coordinates ensures that multiplying the right side of the equation by any non-zero factor does not affect the result. When solving for $H$, it is therefore common to multiply the matrix by $1/h_{33}$, constraining the entry in the third row and third column to one, and the homogeneous coordinate $w$ of the original image is generally initialized to one. Therefore, the perspective transformation formula can also be expressed as:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

The above formula can be written as the following system of equations:

$$x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1}$$

A pair of coordinates before and after mapping provides two equations in the entries of $H$; at least four pairs of coordinates are therefore required to solve for the eight unknowns. Here, the four coordinates before mapping are the four corner points of the rail clamp, whereas the four coordinates after mapping are generated based on the dimensions of the clamp with the top-left corner as the origin. This study used keypoint detection to obtain the four corner coordinates of the rail clamp without the need for additional model algorithms. As shown in
Figure 4, by adding a keypoint detection branch to the output branches of the YOLOv8 head network and referencing YOLO-Pose [
17], it is possible to detect the four corner points of the rail clamp while detecting the clamp itself. The newly added keypoint prediction branch has a structure identical to that of the other two branches, and its number of output channels is the number of keypoints multiplied by three.
After obtaining the four corner coordinates of the rail clamp, it is also necessary to generate the corner coordinates in a frontal view. To achieve this, the dimensions of the clamp must be determined. Different rail clamp models have their own standard sizes, so a “one transformation, two corrections” procedure is performed using the bolt detection boxes. The “one transformation” refers to the initial use of the four corner points of the rail clamp bounding box as mapped coordinates to perform a perspective transformation on the bolt positions, restoring their horizontal positional relationships. Then, using the horizontal positional relationship of the bolts, that is, the proportional relationship of the x-coordinates of the centers of the bolt bounding boxes, combined with the number of bolts within the rail clamp, the rail clamp model can be determined. The four mapped corner coordinates are then generated based on the standard width and height of the identified rail clamp to perform a second perspective transformation, correcting the image.
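A minimal sketch of this correction step with OpenCV is given below; the standard clamp width and height and the pixels-per-millimetre scale are illustrative values assumed to come from a clamp size database, not the authors' exact configuration.

```python
import cv2
import numpy as np

def rectify_clamp(image, clamp_corners_px, clamp_w_mm, clamp_h_mm, px_per_mm=4.0):
    """Warp the image so the rail clamp appears as in a frontal (orthographic) view.

    clamp_corners_px: four detected clamp corners ordered top-left, top-right,
                      bottom-right, bottom-left; clamp_w_mm / clamp_h_mm are the
                      standard dimensions of the identified clamp model.
    """
    w = int(clamp_w_mm * px_per_mm)
    h = int(clamp_h_mm * px_per_mm)

    src = np.asarray(clamp_corners_px, dtype=np.float32)
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)  # top-left origin

    # Four point pairs fully determine the eight unknowns of the homography H.
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (w, h)), H
```

In such a sketch, the bolt box centers could also be mapped with cv2.perspectiveTransform after the first warp, so that their horizontal spacing identifies the clamp model before the second, size-corrected warp.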
2.5. Angle Recognition of Key Bolt Edges
In traditional approaches, methods such as edge detection and line detection are typically employed to locate, filter, and recognize the angles of bolt edges. However, experiments revealed that the robustness of these methods is difficult to improve owing to the influence of viewing angles and lighting conditions. Additionally, under tilted perspectives, distinct bolt edges are challenging to differentiate and filter. Consequently, we aimed to develop an algorithm capable of distinguishing, localizing, and recognizing the angles of key bolt edges while exhibiting strong robustness to varying perspectives and illumination environments. Deep learning algorithms based on convolutional neural networks (CNNs) emerged as a compelling choice.
Directly using regression networks to predict angles of bolt key edges, however, risks overfitting and training instability due to the periodic nature of angular data. To address this, our angle prediction design draws inspiration from Gaussian Circular Smooth Labels (GCSL) [
18], where each angle is assigned a smoothed label value and converted into a feature map for prediction. For network architecture, we incorporate the multi-stage refinement philosophy and large receptive field concepts from OpenPose [
19,
20], alongside the lightweight design of Lightweight OpenPose [
21], integrating spatial positional information of bolt key edges to refine angle learning. As shown in
Figure 5, two types of feature maps are designed for each bolt key edge: a Gaussian heatmap and a Gaussian circular smooth label map. The Gaussian heatmap guides the network through multi-stage refinement by providing spatial positional features, while the Gaussian circular smooth label map predicts the angular orientation of the bolt key edge.
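As an illustration of how such training targets could be generated, the sketch below builds a Gaussian circular smooth label over the 60° period and a 2D Gaussian heatmap for one key edge. The label is encoded here as a 1D vector of angle bins for simplicity, whereas the paper renders it as a feature map; the bin count and sigma values are assumptions.

```python
import numpy as np

def gaussian_circular_smooth_label(angle_deg, num_bins=180, sigma=4.0, period=60.0):
    """1D Gaussian circular smooth label for an angle defined modulo `period` degrees."""
    bin_angles = np.arange(num_bins) * (period / num_bins)
    # Circular (wrap-around) distance between each bin and the target angle.
    diff = np.abs(bin_angles - (angle_deg % period))
    diff = np.minimum(diff, period - diff)
    return np.exp(-(diff ** 2) / (2 * sigma ** 2))

def gaussian_heatmap(center_xy, map_size=56, sigma=2.0):
    """2D Gaussian heatmap marking the spatial position of a key edge."""
    xs, ys = np.meshgrid(np.arange(map_size), np.arange(map_size))
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```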
OpenPose was originally designed as a multi-stage CNN architecture for human pose estimation, and Lightweight OpenPose is its lightweight variant. In its original form, the network includes two output branches at each stage, one for predicting human keypoints and another for their interconnections. In this study, these branches are repurposed to predict the spatial positional information and angular information of bolt key edges. However, since only angular comparisons are required in the final output, the last stage retains a single branch dedicated to angle prediction.
As shown in
Figure 6, the network operates in a cascaded manner. The first stage employs the lightweight MobileNetV1 backbone to extract hierarchical features from input images. In the second stage, depthwise separable convolutions are used to rapidly compress and reduce the feature dimensions, optimizing computational efficiency. The third stage generates the initial spatial position and angular information of the bolt key edges, based on the compressed features. In the fourth stage, the outputs from the second and third stages are fused and refined using dilated convolutions, expanding the receptive fields to capture broader contextual relationships. Ultimately, the network produces a Gaussian circular label map that encodes the angular deviations for each key bolt edge.
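The PyTorch sketch below outlines this cascade under several assumptions: the channel widths, the small MobileNet-style backbone used as a stand-in for MobileNetV1, and the head layout are illustrative rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

def dw_sep_conv(c_in, c_out, stride=1, dilation=1):
    """Depthwise separable convolution block (depthwise 3x3 + pointwise 1x1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride, padding=dilation, dilation=dilation, groups=c_in, bias=False),
        nn.BatchNorm2d(c_in), nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class KeyEdgeAngleNet(nn.Module):
    """Illustrative cascade: backbone -> feature compression -> initial prediction
    -> dilated-convolution refinement producing the circular smooth label map."""

    def __init__(self, n_edges=2):
        super().__init__()
        # Stage 1: lightweight backbone (MobileNet-style stack, stand-in for MobileNetV1).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            dw_sep_conv(32, 64), dw_sep_conv(64, 128, stride=2), dw_sep_conv(128, 128),
        )
        # Stage 2: rapid feature compression with depthwise separable convolutions.
        self.compress = nn.Sequential(dw_sep_conv(128, 128), nn.Conv2d(128, 128, 1))
        # Stage 3: initial spatial (heatmap) and angular (label-map) predictions.
        self.init_heatmap = nn.Conv2d(128, n_edges, 1)
        self.init_label = nn.Conv2d(128, n_edges, 1)
        # Stage 4: fuse and refine with dilated convolutions for a larger receptive field;
        # the final stage keeps only the angle (label-map) branch.
        self.refine = nn.Sequential(
            dw_sep_conv(128 + 2 * n_edges, 128, dilation=2),
            dw_sep_conv(128, 128, dilation=2),
            nn.Conv2d(128, n_edges, 1),
        )

    def forward(self, x):
        feats = self.compress(self.backbone(x))
        heat, label = self.init_heatmap(feats), self.init_label(feats)
        refined_label = self.refine(torch.cat([feats, heat, label], dim=1))
        return heat, label, refined_label
```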
2.6. Bolt Loosening Detection
After processing with the bolt angle recognition model, Gaussian circular smooth label maps of the two highest-priority key edges of the bolt can be obtained. Prior to angle calculation, the feature maps are first flattened:

$$v = \mathrm{flatten}(F) \in \mathbb{R}^{s^2}$$

where $F$ represents the predicted Gaussian circular smooth label map, and $s$ denotes the side length of the feature map. The predicted key edge angle is then derived by projecting the maximum value in the smoothed label onto the angular interval:

$$\theta = \frac{\arg\max_i v_i}{s^2} \times 60^{\circ}$$

Finally, the bolt loosening angle is determined by comparing the average angular difference of key edges across distinct temporal frames:

$$\Delta\theta = \frac{1}{n} \sum_{i=1}^{n} \left| \theta_{t_1}^{(i)} - \theta_{t_2}^{(i)} \right|$$

where $n$ is the number of key edges, $t_1$ and $t_2$ represent different temporal frames, and $i$ corresponds to distinct key edges. Due to inherent errors in the angle recognition model, a threshold must be defined to identify bolt loosening. A bolt is flagged as potentially loosened when the loosening angle exceeds this threshold. In this study, the threshold is set to three times the mean error of the angle recognition model results.
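A brief NumPy sketch of this decoding and comparison step is given below. The mapping of the flattened argmax index onto the 0–60° interval follows the equations above and should be read as an assumed decoding; the model error value is a placeholder.

```python
import numpy as np

def decode_angle(label_map, period_deg=60.0):
    """Project the position of the maximum of an s x s circular smooth label map
    onto the 0-60 degree interval (assumed decoding)."""
    flat = label_map.reshape(-1)                  # flatten the s x s feature map
    return (np.argmax(flat) / flat.size) * period_deg

def loosening_angle(maps_t1, maps_t2):
    """Average absolute angle difference of the key edges between two time frames."""
    diffs = [abs(decode_angle(a) - decode_angle(b)) for a, b in zip(maps_t1, maps_t2)]
    return float(np.mean(diffs))

# Decision rule: threshold = 3 x mean angle error of the recognition model (placeholder value).
MEAN_MODEL_ERROR_DEG = 1.0
# Usage (maps_t1 / maps_t2: predicted label maps for the same bolt at two times):
# is_loose = loosening_angle(maps_t1, maps_t2) >= 3 * MEAN_MODEL_ERROR_DEG
```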