1. Introduction
With the rapid development of industrial modernization and significant economic improvement, industrial equipment is trending toward larger scale. Consequently, the transportation of this equipment must meet stricter standards for safety, speed, and quality assurance. Compared with roadway and waterway transport, railway transport is more secure and reliable, less susceptible to weather conditions, and reasonably priced for the transportation speed it maintains. In railway transportation, exceptional heavy-duty railway freight cars are often required for oversize and overweight instruments and equipment [1]. Exceptional heavy-duty railway freight cars generally use multi-stage load-bearing under-frame structures and multi-axle bogies, and their loading profiles often exceed the limits of ordinary trains. Such freight cars include depressed-center flat cars, long-large flat cars, and well-hole cars. An exceptional heavy-duty railway freight car that exceeds the railway gauge in transport may endanger traffic safety on adjacent lines and even cause train derailment accidents [2,3,4].
A freight train is out of gauge when any portion of it exceeds the rolling stock gauge or loading gauge in a specific segment while the longitudinal centerline and the train centerline lie in the same vertical plane. While a heavy-duty railway freight car is in operation, the inspection stations along the route must repeatedly measure the dimensions of over-limit freight cars to ensure that the loading size always stays within the safety limit.
Currently, experienced workers manually measure and inspect the dimensions and structural integrity of freight cars using specialized instruments such as plumb lines, level gauges, and measuring rulers. This detection method is cumbersome and time-consuming, carries certain safety risks, and increases cargo transportation time [5]. Therefore, identifying and detecting the loading size of railway freight cars is essential for ensuring the safety of railway oversize freight operations and loaded goods.
Figure 1 shows a railway freight car loaded with a transformer at a substation in Shenyang, China. The measurement staff at the inspection station do not measure the car length but focus on the geometry of four key positions: the center, pos 1#, pos 2#, and pos 3#. These four positions quickly establish the loading size of the freight car, from which the out-of-gauge judgment is made.
Automatic out-of-gauge detection is urgently required to replace manual detection. The existing detection methods can be divided into laser [6], structured light [7], and image-based detection methods [8]. These detection studies provide technical guarantees for railway safety monitoring.
The laser scanner is a representative three-dimensional (3D) data acquisition device that can quickly and accurately reconstruct the 3D shape of the measured item. Detection is realized by collecting point cloud data, processing the point cloud, and performing 3D reconstruction. Zhang [9] suggested a 3D laser scanning method for measuring the volume of a railway tank car (container). The laser transmitter was installed inside the tank car, and the laser receiver received the laser reflected by the tank wall to achieve all-around laser scanning. This method can accurately reconstruct the railway tank car model, but it is not suitable for measuring the external dimensions of freight trains. Si et al. [10] applied a laser scanner to large-scale aircraft measurements. However, the aircraft surface is relatively smooth and lacks apparent features and dense point clouds, which makes detection difficult. Bienert et al. [11] proposed a method to automatically extract and measure individual trees using laser scanning. Duan et al. [12] proposed a method to reconstruct shield tunnel lining using point clouds. Shatnawi et al. [13] detected road ruts using laser scanning. The laser scanner performs well in simple environments. However, in complex environments, most dimensional measurement methods that extract structural key points from 3D point clouds rely on the geometric features of points and have low accuracy. In addition, the accuracy of the point cloud registration algorithm affects the accuracy of the measurement results [14], and the particular shape and structure of the measured equipment pose a further challenge to point cloud registration.
The structured light measurement system is widely applicable to various fields of industrial measurement. Wang et al. [15] proposed rail profile recognition based on structured light measurement with deep learning and template matching. The advantages of structured light 3D shape measurement are high precision and high resolution, but the measurable size is limited [16]. To overcome this scale limitation, Xiao et al. [17] suggested a large-scale structured light 3D shape measurement method combined with reverse photography. To improve the accuracy of structured light detection, Xing et al. [18] developed a weighted fusion method based on multiple systems. The structured light measurement system achieves high accuracy under indoor, close-range measurement conditions. However, it is vulnerable to intense natural light outdoors, and its long-range measurement accuracy is poor, so it is not suitable for measuring railway out-of-gauge freight cars.
With the continuous development of computer vision in recent years, image-based measurement has also become a research hot spot. Liu [19] presented an algorithm for automatically segmenting railway freight car images. Xie et al. [20] used a stereo-vision measurement technique to achieve freight train gauge-exceeding detection. Chen [21] extracted the train image by processing video with the three-frame difference method and the Canny operator, and then extracted the vehicle contour information with the minimum rectangle method. Similarly, Han et al. [22] developed a Canny-based edge-detection algorithm to extract and measure the end contours of railway freight cars. Yi et al. [23] proposed an augmented reality-based dynamic detection method that determines whether a freight car can pass a particular route by constructing a freight car envelope model and calculating the distance between the envelope and the obstacle. Current image-based research can be roughly divided into edge feature extraction from single images and 3D point cloud reconstruction from multiple image sequences. The measurement range of the first approach depends on the installation location and number of cameras, and its accuracy needs improvement. The second approach is too time-consuming, which hinders its adoption. Therefore, current methods struggle to meet the requirements for on-site measurement. Stereo vision is an excellent method for obtaining 3D geometric information about objects. Kim et al. [24] designed a crop height measurement scheme for agricultural robots based on stereo vision. The measuring range of a binocular measuring system is proportional to the baseline (the distance between the two cameras), so the baseline limits the measuring range. Zhang et al. [25] measured railway freight cars using a stereo system with a large baseline distance. Measuring huge objects requires large-scale measuring equipment. Wang [26] developed a mobile stereo-vision system with variable baseline distance to support 3D coordinate measurement in a large field of view.
The fixed, short baseline of a standard stereo vision inspection system limits the measurement range and renders it ineffective for monitoring large equipment. Existing stereo systems increase the baseline distance by adding a fixed guide rail, which makes the system bulky and unable to meet the convenience requirements of outdoor industrial applications. Therefore, it is still necessary to explore a flexible stereo vision system with a large baseline distance to detect out-of-gauge freight.
Based on the analysis above, we propose a novel measurement system in which a UAV carrying a single camera varies the baseline distance by changing its flight path. The innovations and contributions of this paper are as follows:
- (1) A robust SURF_rBRIEF algorithm for stereo matching is proposed. In tests under different imaging conditions, the new algorithm improves stability, running speed, and accuracy.
- (2) A flying altitude control strategy combined with the time-of-flight (TOF) method is put forward for high-precision, efficient measurement.
The rest of this paper is organized as follows. The framework of the visual measurement system is described in Section 2. In Section 3, the accuracy analysis and the control strategy for UAV fixed-altitude flight are presented. The field verification test is reported in Section 4. Finally, the conclusions and future work are detailed in Section 5.
2. Visual Measurement Modeling
2.1. Model Description
A mobile single-camera stereo system is a visual measurement technique in which a single camera is moved to capture two frames of the same target from different locations. Using only one camera reduces the system's cost. Moving the camera to various positions rapidly forms a stereo vision system with varying baseline distances, providing high adaptability. The UAV carries a laser sensor to realize altitude-holding control with real-time compensation for altitude changes, and the camera-equipped UAV thereby constitutes the mobile single-camera stereo system. Hovering at a specified altitude is an essential aspect of altitude control because it allows the UAV to acquire stable images from various altitudes.
As depicted in Figure 2, two images containing the same feature point of the heavy-duty railway freight car are captured with the UAV at two different positions. During the flight, the UAV only ascends or descends in the $x$ (vertical) direction, without any translation or rotation in the $z$ and $y$ (horizontal) directions. This mobile single-camera stereo system overcomes the baseline limitation of a fixed dual-camera stereo system: stereo vision systems with variable baseline spacing can be constructed simply by moving the camera-equipped UAV to different image acquisition points.
In this paper, the current flight altitude of the UAV is obtained from the time difference between the emission of light by the time-of-flight (TOF) sensor and its return to the sensor after bouncing off the ground. The UAV takes off vertically in the $x$-direction from the ground. When the feature point $Q_1$ is first observed, the height of the UAV above the ground is $H_1$, and the image coordinates of $Q_1$ are $p_1 = (v_1, u_1)$. Next, the UAV moves vertically until $Q_1$ is about to leave the field of view; at that moment, the UAV's altitude above the ground is $H_2$, and the image coordinates of $Q_1$ are $p_2 = (v_2, u_2)$. The mobile stereo system then completes the image acquisition with a baseline $B = |H_2 - H_1|$.
The origin of the world coordinate system is located at the ground projection of the UAV position at height $H_1$, as in Figure 2. The world coordinates of the UAV at height $H_1$ are denoted $(x_1, y_1, z_1)$, the world coordinates at height $H_2$ are $(x_2, y_2, z_2)$, and the world coordinates of feature point $Q_1$ are denoted $p(x_{Q_1}, y_{Q_1}, z_{Q_1})$.
As a result, the world coordinates of $H_1$ can be expressed as $(x_1, 0, 0)$ and the world coordinates of $H_2$ can be expressed as $(x_2, 0, 0)$. From the geometric relationship, we can obtain
where $f$ is the focal length, $(v^{(1)}_{Q_1}, u^{(1)}_{Q_1})$ are the image coordinates of point $Q_1$ with the UAV at height $H_1$, and $(v^{(2)}_{Q_1}, u^{(2)}_{Q_1})$ are the image coordinates of point $Q_1$ with the UAV at height $H_2$, as in Figure 3. The indices (1) and (2) indicate the positions at heights $H_1$ and $H_2$, respectively.
According to Equation (1), the actual coordinates of the monitoring point can be converted from the image coordinates.
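Equation (1) does not appear in this text. One pinhole-projection form that is consistent with the baseline relation $B = x_2 - x_1 = (v^{(1)}_Q - v^{(2)}_Q)\,z_Q/f$ used in this subsection is sketched here; the sign conventions and the omission of a principal-point offset are assumptions, not the original equation:

```latex
\begin{aligned}
v^{(1)}_{Q_1} &= f\,\frac{x_{Q_1} - x_1}{z_{Q_1}}, &
u^{(1)}_{Q_1} &= f\,\frac{y_{Q_1}}{z_{Q_1}}, \\
v^{(2)}_{Q_1} &= f\,\frac{x_{Q_1} - x_2}{z_{Q_1}}, &
u^{(2)}_{Q_1} &= f\,\frac{y_{Q_1}}{z_{Q_1}}.
\end{aligned}
```

Subtracting the two $v$ expressions gives $v^{(1)}_{Q_1} - v^{(2)}_{Q_1} = f\,(x_2 - x_1)/z_{Q_1}$, which matches the baseline relation term by term.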
The disparity $D_i$ is defined as the difference between the image coordinates captured by the camera carried by the UAV at the two positions [27]: $D_i = v^{(2)}_Q - v^{(1)}_Q$. The baseline is $B = x_2 - x_1 = (v^{(1)}_Q - v^{(2)}_Q)\,z_Q/f = -D_i \cdot z_Q/f$, where $(x_Q, y_Q, z_Q)$ are the coordinates of point Q in the world coordinate system. Combining Equation (1), we can obtain
As a result, the three-dimensional coordinates of any point can be determined from the image coordinates obtained by the camera at two different positions. Applying this operation point by point yields a three-dimensional point cloud containing every image point that has a corresponding matching point.
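As an illustration of this conversion, the following minimal sketch recovers a point from two views. Only the depth step follows a relation stated in the text ($B = -D_i \cdot z_Q/f$); the projection model in `project` and the recovery of $x_Q$ and $y_Q$ are assumptions (an ideal pinhole camera with no principal-point offset), and the function names and the test point are hypothetical.

```python
def project(point, camera_x, f):
    """Assumed pinhole projection for a camera at (camera_x, 0, 0) looking along z."""
    x_q, y_q, z_q = point
    v = f * (x_q - camera_x) / z_q   # image coordinate along the vertical (x) axis
    u = f * y_q / z_q                # image coordinate along the horizontal (y) axis
    return v, u


def triangulate(v1, u1, v2, x1, x2, f):
    """Recover (x_Q, y_Q, z_Q) from the image coordinates at the two heights."""
    disparity = v2 - v1              # D_i as defined in the text
    if disparity == 0:
        raise ValueError("zero disparity: depth is undefined")
    baseline = x2 - x1               # B
    z_q = -f * baseline / disparity  # rearranged from B = -D_i * z_Q / f
    x_q = x1 + v1 * z_q / f          # inverts the assumed projection
    y_q = u1 * z_q / f
    return x_q, y_q, z_q


# Hypothetical point Q; heights and focal length follow the field test values
# (H1 = 1.5 m, H2 = 3.5 m, f = 35 mm). Image coordinates are in mm on the sensor.
f_mm, x1, x2 = 35.0, 1.5, 3.5
q = (2.0, 0.5, 4.6)
v1, u1 = project(q, x1, f_mm)
v2, _ = project(q, x2, f_mm)
x_q, y_q, z_q = triangulate(v1, u1, v2, x1, x2, f_mm)
```

Because the focal length and image coordinates share one unit and the camera positions another, the recovered coordinates come out in the unit of the camera positions (metres here).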
2.2. Precision Calculation
As shown in Figure 4, an error analysis model is developed [28] to analyze the effects of the baseline and depth of the stereo vision system on measurement precision. Two cameras are placed horizontally to simplify the analysis. The coordinate origin of the vision system is the projection center $o_1$ of the left camera. The segment $o_1o_2$ is the stereo vision system baseline, with length $B$. The effective focal lengths of the two cameras are $f_1$ and $f_2$. The coordinate systems $O_1u_1v_1$ and $O_2u_2v_2$ are the image plane coordinate systems of the left and right cameras, respectively. The projections of the spatial point Q onto the two image planes are $q_1$ and $q_2$. The angles between the optical axes $O_1o_1$, $O_2o_2$ and the $x$ axis are $\alpha_1$ and $\alpha_2$, respectively. The angles between the optical axes $O_1o_1$, $O_2o_2$ and the lines $Qq_1$, $Qq_2$ are the projection angles within the cameras' fields of view, denoted $\omega_1$ and $\omega_2$, respectively.
The three-dimensional coordinates of Q are obtained from the geometric relationship as
Based on Equation (3), the partial derivatives of the corresponding functional relations with respect to $u_1$ and $u_2$ are found:
The point Q is assumed to be located at the intersection of the two optical axes of the camera. Two cameras are assumed to be placed symmetrically.
Let , , .
Let ,,,.
The parameter $k$ is introduced to quantify the measurement error $e$ as a function of baseline $B$. According to the Taylor expansion, we can obtain
The $y$-direction measuring error at point Q is
The $x$-direction measuring error at point Q is
Therefore, the overall measuring error of point Q is
As shown in Figure 5, the relationship among the baseline, distance, and measurement precision can be derived from the above equations. According to Figure 5a, $e_2$ is closely related to the $y$-direction measurement accuracy, whereas $e_1$ increases as $k$ increases. Both $e_2$ and $e_3$ first decrease and then increase. The minimum value of $e_2$ is 1, at $k = 2$. The minimum value of $e_3$ is 1.299, at $k = 1.41$. Therefore, if $k$ is designed to lie between 1 and 2, the system's accuracy is considered high. If $k$ is less than 0.5 or greater than 3, the measurement system is deemed unreliable. According to Figure 5b, once the system's structure parameter $k$ has been determined, the measurement precision degrades in proportion to the system's measurement distance.
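The closed-form error expressions are not reproduced in this text. The sketch below uses assumed normalized error factors, chosen only because they reproduce the values quoted above (minimum $e_2 = 1$ at $k = 2$, minimum $e_3 \approx 1.299$ at $k \approx 1.41$, and $e_1$ increasing with $k$); they should be read as a plausible reconstruction rather than as the original equations. The helper `argmin_on_grid` is hypothetical.

```python
import math


def e1(k):
    """Assumed x-direction error factor; grows monotonically with k."""
    return 0.5 * (1.0 + k * k / 4.0)


def e2(k):
    """Assumed y-direction error factor; smallest at k = 2."""
    return 1.0 / k + k / 4.0


def e3(k):
    """Assumed overall error factor: root sum of squares of e1 and e2."""
    return math.hypot(e1(k), e2(k))


def argmin_on_grid(func, lo=0.1, hi=5.0, steps=4901):
    """Return the grid value of k in [lo, hi] that minimizes func."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=func)


k_best_e2 = argmin_on_grid(e2)   # close to 2
k_best_e3 = argmin_on_grid(e3)   # close to sqrt(2)
```

Under these assumed forms, $e_3^2 = (4 + k^2)^3 / (64k^2)$, whose minimum lies at $k = \sqrt{2}$, which agrees with the reported $k = 1.41$.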
Taking the DK36 well-hole car loaded with a transformer as an example, we analyzed the camera resolution required to recognize the freight car contour dimensions with the desired precision. The dimensions of the car are 4.925 m × 3.960 m, and a camera with $a \times b$ pixels is used for detection. The required detection accuracy is 2 mm. Accuracy is the product of the spatial resolution (mm/pixel) and the number of effective pixels; in general, the number of effective pixels is 1 under frontal illumination. The spatial resolution along the 4.925 m dimension is $4925/a$ mm/pixel, so $4925/a \le 2$ requires $a \ge 2462.5$ pixels; therefore, a camera with a resolution of 4032 × 3024 pixels or higher is used.
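The resolution requirement can be checked with a short calculation. This is a minimal sketch of the arithmetic above; the function name `min_pixels` is hypothetical, and applying the same rule to the 3.960 m dimension is an extension not stated in the text.

```python
import math


def min_pixels(extent_mm, accuracy_mm, effective_pixels=1.0):
    """Smallest pixel count along one dimension that meets the accuracy target.

    accuracy = (extent / pixels) * effective_pixels, so
    pixels >= extent * effective_pixels / accuracy.
    """
    return math.ceil(extent_mm * effective_pixels / accuracy_mm)


pixels_long = min_pixels(4925.0, 2.0)    # 4.925 m dimension of the car
pixels_short = min_pixels(3960.0, 2.0)   # 3.960 m dimension of the car
meets_target = 4032 >= pixels_long and 3024 >= pixels_short
```

Rounding 2462.5 up gives 2463 pixels along the longer dimension, which a 4032 × 3024 sensor satisfies with margin.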
2.3. Measurement Scheme
The proposed method consists of four stages: image acquisition and camera calibration, stereo matching, freight car segmentation, and out-of-gauge detection. Camera calibration is the first task in vision measurement: the precision of the measurement results is directly influenced by the precision of the calibration results and the stability of the algorithm, so accurate camera calibration is a prerequisite for effective follow-up work. Stereo matching is the most critical stage because the matching ratio for the freight car directly affects the subsequent measurement accuracy. Therefore, an improved image feature extraction and matching method with a dynamic threshold is proposed to improve the matching ratio. The freight car segmentation stage extracts the freight car and reconstructs its dimensions. The standard limit graph is constructed from the railroad freight train out-of-gauge detection criteria. In the final stage, the out-of-gauge judgment is made by comparing the measurement results against the standard limit graph. More details are provided below.
2.3.1. Image Distortion Correction
In the image-acquisition process, the UAV hovers at a fixed flight altitude so that the camera can capture a clear image of the target area. First, when the acquired image contains feature point $Q_1$, the image is marked as $I_L$. Next, the UAV flies smoothly and uniformly in a straight line along the altitude direction from the current observation point to the next one. When the acquired image no longer contains feature point $Q_1$, the previous frame is marked as $I_R$. The baseline distance $B$ is obtained from the difference between the observation point positions corresponding to these two frames. Then, according to the parallax principle, the visual measurement of feature point $Q_1$ is completed. Finally, the steps above are repeated until the stereo-vision measurements of all feature points have been made. The process is shown in Figure 6, where $i$ is the index of the image frame and $j$ is the feature point number.
The LM (Levenberg–Marquardt) algorithm [29] and the gradient descent method [30] were each used to solve camera models containing different internal parameters. To determine the intrinsic parameter matrix and distortion coefficients, we calculated and compared the reprojection errors of the camera models to analyze the effect of the two algorithms on calibration accuracy. According to engineering experience with camera calibration, more images do not necessarily yield more accurate calibration results. The ideal number of calibration images is between 10 and 20 [31,32], so 15 images were chosen for calibration. The image resolutions of the cameras used in the experiments are 4032 × 3024, 5472 × 3648, and 5120 × 3840. The detailed calibration process is given in Figure 7, Figure 8 and Figure 9; it was obtained using a 5 × 7 checkerboard image with a grid size of 20 mm × 30 mm and calibrated with the MATLAB Camera Calibration Toolbox [33].
The results of the camera model calibration are shown in Table 1. For all three cameras, the reprojection error of the camera calibration model is smaller when optimized using the LM algorithm than with the gradient descent method, and the calibrated camera parameters are closer to the ideal values, with an accuracy improvement of 37%. Therefore, the LM algorithm is chosen for the iterative solution of the intrinsic camera parameters.
Using the intrinsic parameter matrix and distortion coefficients obtained from the calibration experiment, distortion correction can be applied to the images captured by the camera. After correction, images captured by cameras with different distortion coefficients show different degrees of change. Figure 10 compares the freight car images captured by the UAV before and after correction: Figure 10a shows the original image of the DK36 well-hole car, and Figure 10b shows the correction result. The lines at the edge of the freight car and the rails at the front of the image, which appear curved in Figure 10a, are restored to straight lines in Figure 10b, so the correction effect is pronounced. Therefore, images from a camera lens with serious distortion must be processed by the imaging distortion correction model to obtain accurate image pixel coordinates.
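The distortion correction model itself is not specified in this text. As a rough illustration of what such a correction involves, the sketch below assumes a two-coefficient radial model acting on normalized image coordinates and inverts it by fixed-point iteration; the model choice, the coefficient values, and the function names are all assumptions.

```python
def distort(x_u, y_u, k1, k2):
    """Apply the assumed radial model to undistorted normalized coordinates."""
    r2 = x_u * x_u + y_u * y_u
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x_u * scale, y_u * scale


def undistort(x_d, y_d, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the radial
    scale evaluated at the current estimate. This converges quickly when
    the distortion is mild.
    """
    x_u, y_u = x_d, y_d
    for _ in range(iterations):
        r2 = x_u * x_u + y_u * y_u
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x_u, y_u = x_d / scale, y_d / scale
    return x_u, y_u


# Hypothetical coefficients and point, used only to show a round trip.
k1, k2 = -0.2, 0.05
x_d, y_d = distort(0.3, -0.2, k1, k2)
x_rec, y_rec = undistort(x_d, y_d, k1, k2)
```

Pixel coordinates would first be converted to normalized coordinates with the intrinsic matrix and converted back after correction; that step is omitted here.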
2.3.2. Image Feature Matching
For image feature matching, the SURF_rBRIEF algorithm is presented in this paper. It combines the SURF detector [34] with the rBRIEF descriptor [35], and an improved coarse matching threshold lets the matching point screening process adapt dynamically to different matching algorithms. In the feature point-detection phase, the SURF detector obtains many stable feature points with good robustness. The rBRIEF descriptor is fast to compute, occupies little memory, is rotation invariant, and describes the information of feature points more accurately. The basic idea of the algorithm is as follows. First, feature points are localized with the SURF algorithm, exactly as in the detection stage of SURF. Next, the main direction of each point is obtained with the grayscale center-of-mass method. Then, the descriptors are calculated by the rBRIEF method. Finally, the key points and descriptors of the feature points are obtained. The flow of the SURF_rBRIEF feature point detection and matching algorithm is shown in Figure 11.
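The orientation step can be illustrated with a small example. This is a minimal sketch of the grayscale center-of-mass (intensity centroid) orientation on a square patch given as nested lists; practical implementations typically use a circular patch around the key point, and the function name is hypothetical.

```python
import math


def intensity_centroid_angle(patch):
    """Main direction of a square grayscale patch with odd side length.

    The first-order moments m10 and m01 are taken about the patch center;
    the angle points from the center toward the intensity centroid.
    """
    size = len(patch)
    center = size // 2
    m10 = 0.0
    m01 = 0.0
    for row in range(size):
        for col in range(size):
            x = col - center
            y = row - center
            m10 += x * patch[row][col]
            m01 += y * patch[row][col]
    return math.atan2(m01, m10)


bright_right = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
bright_bottom = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]
angle_right = intensity_centroid_angle(bright_right)    # centroid along +x
angle_bottom = intensity_centroid_angle(bright_bottom)  # centroid along +y
```

Rotating the descriptor sampling pattern by this angle is what gives the descriptor its rotation invariance.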
For accurate matching, a new coarse matching threshold is proposed. The screening threshold $\varepsilon$ for coarse matching is dynamically linked to the interval $[D_{\min}, D_{\max}]$ of the matching distance $D$. When the detection algorithm is changed, the coarse matching threshold is adjusted dynamically based on the matching distance interval to filter out incorrect matches, which helps guarantee a sufficient number of matching point pairs and improves the matching results. The dynamic threshold can be expressed as
To compare the original SURF algorithm with the improved SURF_rBRIEF, the performance of both algorithms was tested under different imaging conditions. Five types of variation in imaging conditions were considered: viewpoint change, scale change, blurring, JPEG compression, and illumination change. The experimental environment was Ubuntu 18.04 (Linux), an Intel Core i5-9300H CPU @ 2.40 GHz, and OpenCV 3.4.4. The coarse matching strategy was used for both algorithms in the test, and the coarse matching threshold was changed dynamically according to the descriptor distance interval. Matching was carried out uniformly with the Hamming distance matching method [36]. The experimental dataset was taken from Mikolajczyk's [37] publicly accessible image database. The test results for successful runs are summarized in Table 2. In conclusion, compared with the original SURF algorithm, the proposed SURF_rBRIEF algorithm improved stability, running speed, and accuracy under different imaging conditions. The overall accuracy increased by 21%, and the running speed increased by 52%.
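Coarse matching with a distance-dependent threshold can be sketched as follows. The threshold formula of this paper is not reproduced in the text, so the rule used here, $\varepsilon = D_{\min} + \lambda (D_{\max} - D_{\min})$ with a hypothetical factor `ratio`, is a stand-in assumption and not the proposed expression. Binary descriptors are represented as Python integers for brevity.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")


def coarse_match(desc_left, desc_right, ratio=0.5):
    """Nearest-neighbour matching followed by threshold screening.

    Returns (matches, eps), where each match is (left index, right index,
    distance). The threshold eps is tied to the observed distance interval
    [d_min, d_max] by a HYPOTHETICAL linear rule.
    """
    if not desc_left or not desc_right:
        return [], 0.0
    nearest = []
    for i, d in enumerate(desc_left):
        j = min(range(len(desc_right)), key=lambda n: hamming(d, desc_right[n]))
        nearest.append((i, j, hamming(d, desc_right[j])))
    d_min = min(m[2] for m in nearest)
    d_max = max(m[2] for m in nearest)
    eps = d_min + ratio * (d_max - d_min)   # assumed form of the dynamic threshold
    return [m for m in nearest if m[2] <= eps], eps


# Hypothetical 8-bit descriptors.
left = [0b11110000, 0b00001111, 0b10101010]
right = [0b11110001, 0b00001111, 0b01010101]
matches, eps = coarse_match(left, right)
```

In this toy case the third left descriptor has no close counterpart (its best distance is 4), so it is screened out while the two good pairs are kept.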
Measured image data of DK36 well-hole cars were used to verify that the algorithm achieves stable feature extraction and matching on critical parts of railway freight cars. During the UAV flight, images of various railway freight car parts were collected and processed using the SURF and SURF_rBRIEF algorithms for comparison and analysis. Figure 12a,c show the image feature extraction and matching results obtained with the SURF algorithm at different parts of the body of the DK36 well-hole car. Figure 12b,d show the corresponding results obtained with the improved SURF_rBRIEF algorithm. Comparing the matching results in Figure 12 for the images of the top and bottom body parts of the DK36 well-hole car collected by the UAV, the proposed SURF_rBRIEF algorithm produces fewer erroneous matching line segments, indicating fewer mismatched points and higher accuracy than the original SURF algorithm.
2.3.3. Image Target Segmentation
The image segmentation technique was used to rapidly locate and separate critical parts of heavy-duty railway freight cars, followed by rapid reconstruction of freight train contours and out-of-gauge detection combined with stereo matching. Since the HSV color space is closer to human visual perception of color, the RGB color space is first converted to the HSV color space. The freight train is initially segmented based on color [38] to obtain the freight train outline with the background removed. Then, the edge features of the freight train are obtained by the Canny operator [39], and the horizontal linear edge points are extracted and fitted to a straight line using the RANSAC (random sample consensus) method [40]. Let $P$ denote the edge point set. The key steps of the algorithm for determining the straight line of the car's edge profile are as follows. First, the line equation is written as $ax + by = 1$, where $a$ and $b$ are the equation coefficients, which two points suffice to determine. Next, two randomly selected points, $P_i$ and $P_j$, are used to calculate the coefficients $a$ and $b$. Then, the distances from all other points to the line defined by $a$ and $b$ are computed. The points whose distance is smaller than the threshold $d_t$ are counted and form an inlier point set $S$. The above steps are repeated $M$ times, and the fitted line with the largest inlier set $S_{\max}$ is selected. Finally, the coefficients of the corresponding line are recorded as $a'$ and $b'$. Figure 13 shows the segmentation process for the image of a heavy-duty railway freight train.
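The line-fitting steps above can be sketched as follows. This is a minimal illustration of RANSAC with the $ax + by = 1$ parameterization; the function names, the fixed random seed, and the sample points are hypothetical, and no refinement of the final line over its inliers is attempted.

```python
import math
import random


def line_through(p, q):
    """Solve a*x + b*y = 1 for the line through p and q.

    Returns None when the two points cannot determine such a line
    (identical points, or a line passing through the origin).
    """
    (x1, y1), (x2, y2) = p, q
    det = x1 * y2 - x2 * y1
    if abs(det) < 1e-12:
        return None
    return (y2 - y1) / det, (x1 - x2) / det


def point_line_distance(point, a, b):
    """Perpendicular distance from point to the line a*x + b*y = 1."""
    x, y = point
    return abs(a * x + b * y - 1.0) / math.hypot(a, b)


def ransac_line(points, d_t, m, seed=0):
    """Return ((a', b'), inliers) for the candidate line with the most inliers."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(m):
        p_i, p_j = rng.sample(points, 2)
        line = line_through(p_i, p_j)
        if line is None:
            continue
        a, b = line
        inliers = [pt for pt in points if point_line_distance(pt, a, b) < d_t]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# Hypothetical edge points: ten on the horizontal line y = 2 plus two outliers.
edge_points = [(float(x), 2.0) for x in range(10)] + [(3.0, 5.0), (7.0, -1.0)]
(a_fit, b_fit), inlier_set = ransac_line(edge_points, d_t=0.05, m=50)
```

One consequence of the $ax + by = 1$ form is that it cannot represent a line through the coordinate origin, which is why `line_through` rejects that case; with image coordinates whose origin lies at a corner, car edge lines rarely pass through it.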
4. Validation and Application
A field test was conducted at a yard in Shenyang, Liaoning Province, China, with a DK36 well-hole car loaded with a transformer. Based on the model framework and algorithm flow established in the previous sections, a measurement scheme for the loading contour dimensions of out-of-gauge freight cars was developed, and the contour dimensions of the DK36 well-hole car were measured and calculated. The calculated results were then compared with the actual measurement results to validate the model and to analyze the factors affecting the model error. To verify the accuracy and efficiency of our measurement system, we used both manual and UAV measurements to inspect the out-of-gauge freight car, as shown in Figure 16.
For the well-hole car loaded with a transformer, the measurement staff are not concerned with the length of the train but rather with the width and height at four special positions: the center, pos 1#, pos 2#, and pos 3#. The center position refers to the transformer's uppermost position, the second side position refers to the lugs on both sides of the transformer, and the third side position refers to the vehicle's side-bearing beams on both sides. Figure 17 shows the distribution of the points and locations to be measured. Points 1 and 2 are used to determine the height of the center position, points 3 and 4 the height of pos 1#, points 5 and 6 the height of pos 2#, and points 7 and 8 the height of pos 3#. Points 9 and 10 are used to determine the width of the center position, points 11 and 12 the width of pos 1#, points 13 and 14 the width of pos 2#, and points 15 and 16 the width of pos 3#. The focal length of the camera used in the experiment is 35 mm. The baseline $B$ is 2 m. The UAV shooting heights $H_1$ and $H_2$ are 1.5 m and 3.5 m, respectively. The object distance $Z_Q$ is between 4.5 and 4.7 m, depending on the observation point. As shown in Figure 18, the images taken by the UAV were matched for stereo recognition.
The contour dimensions of the DK36 well-hole car were measured by both UAV stereo recognition and manual measurement. Feature points were extracted from the captured images and reconstructed with the SURF_rBRIEF algorithm, and the contour dimensions were then calculated. Table 4 and Table 5 show the coordinates of observation points and contour width dimensions obtained by the two measurement methods.
The manual measurement took two hours, while the UAV detection method proposed in this paper took only five minutes. According to the experimental results in Table 5, compared with manual measurement, the relative error of the width and height of the DK36 well-hole car measured by UAV is within 3.80%, with an average error of 3.29%. Because no complex point cloud reconstruction is performed, rapid two-dimensional detection can be carried out, but an accurate three-dimensional model of the DK36 well-hole car cannot be obtained. The position with the smallest relative error is the center width. The standard boundary graph was constructed according to the railroad freight train out-of-gauge detection standard. The measurement results were compared with the fundamental building limits in the railroad limit contour, with the results shown in Figure 19. The top-loading dimension of this DK36 well-hole car was within the primary building limit, but some positions slightly exceeded the maximum overload cargo loading limit.