Degenerate Near-Planar 3D Reconstruction from Two Overlapped Images for Road Defects Detection

This paper presents a technique to reconstruct a three-dimensional (3D) road surface from two overlapped images for road defects detection using a downward-facing camera. Since some road defects, such as potholes, are characterized by 3D geometry, the proposed technique reconstructs road surfaces from the overlapped images prior to defect detection. The uniqueness of the proposed technique lies in the use of the near-planar characteristics of road surfaces in the 3D reconstruction process, which solves the degenerate road surface reconstruction problem. The road surfaces are thus reconstructed from richer information, and the proposed technique detects road surface defects based on the accuracy-enhanced 3D reconstruction. Parametric studies were first performed in a simulated environment to analyze how different variables affect the 3D reconstruction error; they show that the errors caused by the camera's image noise, orientation, and vertical movement are small enough not to affect road defects detection. A detailed accuracy analysis using real road surface images then shows that the mean and standard deviation of the errors are less than 0.6 mm and 1 mm, respectively. Finally, on-road tests demonstrate the effectiveness of the proposed technique in identifying road defects, with accuracy, precision, and recall all above 94%.


Introduction
A road is one of the most fundamental infrastructures in the transportation system. A healthy and intact road surface increases ride comfort and vehicle safety for through traffic [1,2]. The road surface condition inevitably degrades under stresses from traffic and climate impacts such as humidity or temperature change. Thus, frequent inspections of the road surface are vital for identifying road surface defects and carrying out timely maintenance. The labor intensiveness, inefficiency, and subjectivity of manual inspection have necessitated automatic measurement of road surface defects such as potholes and ruts, which are mostly characterized by geometry [3][4][5][6][7][8].
Past works on automatic road defects detection can be classified into three types: acceleration-based detection, color-based detection, and geometry-based detection. The acceleration-based technique uses accelerometers, since irregular geometrical changes create vibration that accelerometers can measure. Yu et al. [9] analyzed acceleration and automatically detected road defects for the first time, to the best of the authors' knowledge. Vittorio et al. [10] detected road anomalies based on abnormal accelerometer data from a cellphone. Tai et al. [11] and Eriksson et al. [12] proposed machine learning approaches to detect road anomalies, using a Support Vector Machine (SVM) and unsupervised learning, respectively, to enhance detection accuracy. Xue et al. [13] adopted a self-learning one degree-of-freedom vibration signal to predict potholes. Mednis et al. [14] implemented and compared several acceleration data processing algorithms for pothole detection, which resulted in a detection rate between 68% and 90%. Although acceleration-based techniques sense geometrical road defects directly and thus accurately, they miss a defect if no tire passes exactly over it.
For color-based techniques, image sensors are often equipped to obtain the appearance of defects. Tedeschi et al. [15] proposed a technique using Local Binary Pattern (LBP) feature-based cascade classifiers to detect road defects from images. Koch et al. [16,17] used the histogram and four different image filters to extract road distress texture features. Jo et al. [18] constrained the road defect region between two lanes through the lane detection technique to increase the precision of pothole detection. Banharnsakun et al. [19] deployed an Artificial Neural Network (ANN) which can categorize the distress into longitudinal crack, transversal crack, and pothole. Ryu et al. [20] separated the pothole region from the background by Histogram Shape-Based Thresholding (HST) and then used multiple filters to find the pothole features. The color-based technique provides intuitive information about road defects' position and size. However, the RGB image analysis may not capture geometry and contains unnecessary information such as shadows, oil stains and pavement markings which affect the detection.
Among geometry-based techniques, Chang et al. [21] and Yu et al. [22,23] detected potholes by analyzing topological features obtained from 3D laser scanning data. Hou et al. [24], Fan et al. [25], and El et al. [26] applied stereo-vision systems to extract a 3D point cloud from road surface images and detected potholes directly from the 3D model of the road obtained from the point cloud data, although the 3D reconstruction precision was not investigated. Ahmed et al. [27] proposed a pothole detection technique based on Structure from Motion (SfM), taking multiple images of one road surface region to reconstruct 3D points of the road surface. While the accuracy in depth was reported to be on the order of 0.1 mm, this accuracy was attained by manually marking artificial features on the road surface. Antol et al. [28] and Moazzam et al. [29] implemented road distress detection using 3D point cloud data from an RGB-D camera. The former used a movable RGB-D camera box to enable depth measurement at a low speed, while the latter mounted the RGB-D camera on a tripod to measure the 3D road surface statically. However, the accuracy of 3D reconstruction using a laser sensor or stereo-vision system can be degraded if the vibration of the measuring sensors is significant. Further, 3D reconstruction-based techniques suffer from limited reconstruction accuracy because the road surface is near-planar and thus provides poor vertical information.
This paper presents a new geometry-based technique that reconstructs road surfaces from two overlapped images captured by a downward-facing camera, with little influence from vibration, and then detects road defects based on the reconstructed 3D road. The 3D reconstruction uses an improved SfM technique formulated so that road surfaces, which are near-planar and have small vertical variations, can be reconstructed accurately. By solving the degeneracy issue for near-planar road surface reconstruction, the proposed technique detects road defects from the accuracy-enhanced 3D reconstructed road surfaces. This paper is organized as follows. The following section reviews traditional SfM for the road surface and the degeneracy issue for planar object reconstruction. Section 3 first presents the proposed 3D reconstruction technique for near-planar road surfaces and then describes the detection of road defects based on the reconstructed 3D road. Section 4 investigates the ability of the proposed technique parametrically in simulated environments and then applies it to real road surface images. Conclusions are summarized in the last section.
Figure 1 shows the general settings and problem formulation of road surface reconstruction using a downward-facing camera for road defects detection. The road surface, shown as a near-planar object, contains a pothole representing a defective road. A camera, facing downward to the road surface at a height $h$, is mounted on a vehicle. While the vehicle is moving, the camera captures images $I_{0:K}$ at positions $X^c_{0:K}$ from time step 0 to time step $K$. Since images are captured by cameras of various frame rates at various vehicle speeds, the minimal and most fundamental requirement is the reconstruction of the 3D road surface covered by two consecutive overlapped images $\{I_{k-1}, I_k\}$.
This problem is converted into localizing the road surface point cloud $X^r_k \equiv \{X^r_{k,i} \mid \forall i\}$ using the homogeneous two-dimensional (2D) image features $x^r_{k-1} \equiv \{x^r_{k-1,i} \mid \forall i\}$ and the corresponding $x^r_k \equiv \{x^r_{k,i} \mid \forall i\}$, which are extracted from images $I_{k-1}$ and $I_k$, respectively. Note that $X^c_k$ should be derived simultaneously with $X^r_k$ since the camera position is not precisely known due to the vehicle vibration. Once the reconstruction has been completed, road surface points are classified as normal flat road surface $X^{rn}_k$ and defective road surface $X^{rd}_k$. In Figure 1, $\{G\}$ represents the global coordinate system while $\{L\}$ is the local coordinate system for two neighboring camera positions.
Figure 1. Road surface reconstruction settings for defects detection from one downward-facing camera. 3D point clouds are reconstructed from consecutive images to represent the road surface, followed by classifying the road into defective and non-defective surfaces.
Figure 2 illustrates the significance of the two-image problem formulation by showing the vehicle speed against the number of overlapped images for camera frame rates of 60, 30 and 15 FPS. Note that these are common frame rates of industrial cameras, and each image covers a 1 m × 1 m road surface area. For a given number of overlapped images $N_o$, the overlapping area between every two neighboring images is at least $(100 - 100/N_o)\%$. As the curves show, for every frame rate the speed at which only two images overlap lies within the range of common vehicle speeds. Therefore, 3D road surface reconstruction will fail in practice if it is not possible from two images.
Figure 3 shows the notations and the operation of the general road surface 3D reconstruction from image features $x^r_{k-1}$ and $x^r_k$. To present the mathematical derivation of the two-image 3D reconstruction for road surfaces, a line is drawn through the camera centers $X^c_{k-1}$ and $X^c_k$.
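The frame-rate argument illustrated by Figure 2 can be checked with a short calculation. The sketch below is not from the paper: it assumes that the 1 m image footprint lies along the driving direction and that neighboring images may shift by at most 1/N_o of the footprint, which is what an overlap of (100 − 100/N_o)% implies. The function names are hypothetical.

```python
def min_overlap_percent(n_overlap: int) -> float:
    """Minimum overlap (%) between neighboring images for n_overlap overlapped images."""
    return 100.0 - 100.0 / n_overlap


def max_speed_kmh(fps: float, n_overlap: int, footprint_m: float = 1.0) -> float:
    """Highest speed (km/h) that still keeps n_overlap overlapped images.

    Assumes the image footprint along the driving direction is footprint_m
    and that consecutive images may shift by at most footprint_m / n_overlap.
    """
    max_shift_m = footprint_m / n_overlap   # allowed travel between two frames
    return max_shift_m * fps * 3.6          # m/s converted to km/h


for fps in (60, 30, 15):
    speeds = [round(max_speed_kmh(fps, n), 1) for n in (2, 3, 4)]
    print(f"{fps} FPS -> max speed for N_o = 2, 3, 4: {speeds} km/h")
```

Under these assumptions, two overlapped images correspond to about 108, 54, and 27 km/h at 60, 30, and 15 FPS, so requiring more than two overlapped images would restrict the usable speed range considerably.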
This line intersects image $I_{k-1}$ at point $e_{k-1}$ and image $I_k$ at point $e_k$. $l_{k-1,i}$ is the line passing through $e_{k-1}$ and $x^r_{k-1,i}$, the projection of road surface point $X^r_{k-1,i}$ onto $I_{k-1}$. Similarly, $l_{k,i}$ is the line passing through $e_k$ and $x^r_{k,i}$, and it is given by:
$$l_{k,i} = e_k \times x^r_{k,i} = [e_k]_\times x^r_{k,i} \tag{1}$$

Two-image 3D Road Surface Reconstruction
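Combining Equation (1) with $x^{rT}_{k,i} l_{k,i} = 0$ yields:
$$x^{rT}_{k,i} [e_k]_\times x^r_{k,i} = 0 \tag{2}$$
If $X^r_{k,i}$ is located on a road surface plane, then $x^r_{k-1,i}$ and $x^r_{k,i}$ are related by a homography matrix $H_{ab}$:
$$x^r_{k,i} = H_{ab} x^r_{k-1,i} \tag{3}$$
Substituting Equation (3) into Equation (2) results in:
$$x^{rT}_{k,i} [e_k]_\times H_{ab} x^r_{k-1,i} = x^{rT}_{k,i} F_k x^r_{k-1,i} = 0 \tag{4}$$
where $F_k = [e_k]_\times H_{ab}$ is the fundamental matrix of the two images. Equation (4) holds for all the $n$ correspondences $\{\{x^r_{k,i}, x^r_{k-1,i}\} \mid i = 1, 2, \ldots, n\}$ [30], which means:
$$x^{rT}_{k,i} F_k x^r_{k-1,i} = 0, \quad i = 1, 2, \ldots, n \tag{5}$$
The solution for the fundamental matrix $F_k$, as well as for the rotation matrix $R_k$ and the translation $t_k$, is given in Appendix A. The final 3D reconstructed road surface $X^r_k$ is given by the triangulation $f_t(\cdot)$:
$$X^r_k = f_t(K, R_k, t_k, x^r_k, x^r_{k-1}) \tag{6}$$
where $K$ is the camera's intrinsic matrix.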

Planar Surface Degeneracy Problem
Since road surfaces are near-planar, their reconstruction suffers from the degeneracy issue shown in the rest of this section. As $X^r_k$ are located on the near-planar road surface, $x^r_{k-1}$ and $x^r_k$ can be related by a $3 \times 3$ homography matrix $H_k$:
$$x^r_k = H_k x^r_{k-1} \tag{7}$$
in which $x^r_k$ is proportional to $H_k x^r_{k-1}$. This means that the cross product of the two vanishes: $x^r_k \times H_k x^r_{k-1} = 0$. Thus solving for $H_k$ is equivalent to solving the equation $A h_k = 0$, where $h_k$ is the vector formed from the entries of $H_k$ and $A$ is the coefficient matrix built from the correspondences (Equation (8)). Because of image noise, solving for $h_k$ is equivalent to minimizing $\|A h_k\|$ subject to $\|h_k\| = 1$. Therefore, solving for $h_k$ is similar to solving for $f_k$ in the previous section.
Degeneracy is defined as the situation in which the fundamental matrix $F_k$ obtained from the previous procedure is not unique. A planar object, which the road can be approximated as, is one of the degenerate geometries. If $X^r_k$ are located on a plane surface, the correspondences in the two views, $x^r_{k-1}$ and $x^r_k$, satisfy Equation (7). They also satisfy Equation (5). The substitution of Equation (7) into Equation (5) yields
$$x^{rT}_k S_k x^r_k = 0 \tag{9}$$
where $S_k = F_k H_k^{-1}$. To satisfy Equation (9), $S_k$ must be a skew-symmetric matrix given by
$$S_k = \begin{bmatrix} 0 & -s_3 & s_2 \\ s_3 & 0 & -s_1 \\ -s_2 & s_1 & 0 \end{bmatrix} \tag{10}$$
As a result, the fundamental matrix $F_k$ is:
$$F_k = S_k H_k \tag{11}$$
Thus $F_k$ has a family of solutions with three degrees of freedom (determined by $s_1$, $s_2$, and $s_3$). Since $F_k$ is defined only up to scale, the solution of $F_k$ has two degrees of freedom. Therefore, the existing 3D reconstruction technique from $I_{k-1}$ and $I_k$ cannot produce correct 3D reconstructed points for a planar road surface because of the ambiguity of $F_k$ introduced into the reconstruction process of Equations (A4) to (A7) and (6). While 3D reconstruction techniques exist, their direct application to road surface profiling is hindered by the ill-posedness of the problem due to the lack of depth information and by incorrect feature matching due to noisy images.
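The ambiguity described above can be illustrated numerically. The following sketch is an illustration only, with a hypothetical homography and hypothetical feature positions: it builds a skew-symmetric matrix times $H_k$ for two unrelated 3-vectors $s$ and shows that both candidates satisfy the epipolar constraint for features related by the homography, so the constraint alone cannot single out one fundamental matrix.

```python
def skew(s):
    """3x3 skew-symmetric matrix [s]_x of a 3-vector s."""
    return [[0.0, -s[2], s[1]],
            [s[2], 0.0, -s[0]],
            [-s[1], s[0], 0.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]


def epipolar_residual(F, x_prev, x_curr):
    """Value of x_curr^T F x_prev, which is zero for a consistent F."""
    return sum(a * b for a, b in zip(x_curr, matvec(F, x_prev)))


# Hypothetical homography between the two images of a planar road (assumption).
H = [[1.0, 0.02, 15.0],
     [-0.01, 1.0, 40.0],
     [1e-5, 2e-5, 1.0]]

# Hypothetical homogeneous features in the previous image and their matches.
features_prev = [[100.0, 200.0, 1.0], [640.0, 512.0, 1.0], [900.0, 50.0, 1.0]]
features_curr = [matvec(H, x) for x in features_prev]

# Two unrelated choices of s give two different candidate matrices F = [s]_x H.
candidates = {name: matmul(skew(s), H)
              for name, s in (("s_a", [1.0, 0.0, 0.0]), ("s_b", [0.3, -2.0, 5.0]))}

worst = {name: max(abs(epipolar_residual(F, xp, xc))
                   for xp, xc in zip(features_prev, features_curr))
         for name, F in candidates.items()}
print(worst)  # both residuals vanish up to floating-point rounding
```

Because every choice of $s$ passes this check on planar data, an estimator that relies only on the epipolar constraint has no basis for preferring the true solution.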
The next section presents the proposed technique, which solves the ambiguity issue of $F_k$ for road surface reconstruction and leads to correct defects detection based on the 3D information.
Figure 4 shows the proposed degenerate near-planar 3D reconstruction technique for road defects detection. The proposed technique consists of three parts: preprocessing, 3D reconstruction for the near-planar road, and post-processing. The preprocessing rejects mismatched feature correspondences to improve the feature matching between $I_{k-1}$ and $I_k$, which contributes to resolving the degeneracy issue for near-planar road surface reconstruction. Then, a newly derived fundamental matrix $F_k$ with no ambiguity improves SfM and resolves the degeneracy issue. In the post-processing, since the reconstructed points $X^r_k$ are unitless, the proposed technique converts $X^r_k$ to metric points ${}^{m}X^r_k$. As a result, road defects can be detected reliably due to the enhanced accuracy in 3D surface reconstruction.

Preprocessing
The preprocessing step that rejects mismatched correspondences is formulated as follows. Let the difference of the $i$th corresponding feature between time steps $k-1$ and $k$ be:
$$d^f_{k,i} = x^r_{k,i} - x^r_{k-1,i} \tag{12}$$
This gives the set $d^f_k \equiv \{d^f_{k,i} \mid i = 1, 2, \ldots, n\}$, which includes all the $n$ correspondences of the images $I_{k-1}$ and $I_k$. As the vehicle is moving along the road following a smooth path, it is valid to assume that the rotation of the camera is small and that the camera's motion is linear in the short period between two neighboring time steps $k-1$ and $k$ (Equation (13)), which means $d^f_k$ are also linear and proportional to the camera's motion. Since $I_{k-1}$ and $I_k$ carry Gaussian noise on $x^r_{k-1}$ and $x^r_k$, and $n$ is large with the differences distributed smoothly, the measured corresponding image features $\hat{x}^r_{k-1}$ and $\hat{x}^r_k$ are modeled by Equations (14) and (15). Combining Equations (14) and (15) with Equation (12), the proposed technique models $d^f_k$ by Equation (16), where $\bar{d}^f_k$ is the mean value and $\Sigma_f$ is the covariance matrix of $d^f_k$. Equation (17) then keeps only the correspondences whose differences lie within a range around $\bar{d}^f_k$, where $\lambda$ is a threshold and $\mathbf{1}$ is an all-ones vector. As the exact distance that the camera moves between time steps $k-1$ and $k$ is unknown, it is difficult to determine the threshold and the number of iterations that RANSAC needs to filter correct feature matchings. However, the proposed technique uses the camera's linear motion as prior knowledge, which means correct matchings have similar values of $d^f_{k,i}$. Unlike RANSAC, Equation (17) only needs a reasonable $\lambda$ and operates once to keep the correct matchings. Therefore, the proposed technique obtains correct feature matchings for the following near-planar 3D reconstruction.
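A minimal sketch of this kind of one-pass rejection is given below. The exact form of Equation (17) is not reproduced here, so the rule used in the code is an assumption: a correspondence is kept when its displacement lies within $\lambda$ standard deviations of the mean displacement along each image axis. The function name and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def reject_mismatches(feats_prev, feats_curr, lam=1.5):
    """Return indices of correspondences kept by a one-pass displacement test.

    Assumed rule (not the paper's exact Equation (17)): keep match i when its
    displacement is within lam standard deviations of the mean on both axes.
    """
    dx = [c[0] - p[0] for p, c in zip(feats_prev, feats_curr)]
    dy = [c[1] - p[1] for p, c in zip(feats_prev, feats_curr)]
    mx, my = mean(dx), mean(dy)        # mean displacement
    sx, sy = pstdev(dx), pstdev(dy)    # spread of the displacement
    return [i for i in range(len(dx))
            if abs(dx[i] - mx) <= lam * sx and abs(dy[i] - my) <= lam * sy]


# Nine matches that follow a common motion of about (50, 0) pixels ...
prev = [(10.0 * i, 5.0 * i) for i in range(10)]
jitter = [0.2, -0.1, 0.0, 0.1, -0.2, 0.1, 0.0, -0.1, 0.2]
curr = [(p[0] + 50.0 + j, p[1] + j) for p, j in zip(prev, jitter)]
# ... and one mismatch that moves in a very different direction.
curr.append((prev[9][0] - 120.0, prev[9][1] + 80.0))

kept = reject_mismatches(prev, curr)
print(kept)  # the mismatch at index 9 is dropped
```

The test runs once over the data and needs no iteration count, which is the practical difference from RANSAC noted above; its reliability depends on the mismatches being a minority so that the mean and spread are dominated by correct matches.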

3D Reconstruction for Near-Planar Road Surface
The proposed technique solves the ambiguity issue of $F_k$ by mathematically deriving a unique fundamental matrix for the near-planar road surface. In the local coordinate system $\{L\}$, ${}^{\{L\}}X^c_{k-1} = (0, 0, 0)^T$ and its projection onto image $I_k$, $e_k$, is expressed as:
$$e_k = K [R_k, t_k] (0, 0, 0, 1)^T = K t_k \tag{18}$$
Note that from Equation (1), all the lines $l_k$ satisfy the following for road surface images:
$$l_k = [e_k]_\times x^r_k \tag{19}$$
Meanwhile, Equation (5) and $x^{rT}_k l_k = 0$ relate $F_k$ and $l_k$ as:
$$l_k = F_k x^r_{k-1} \tag{20}$$
Substituting Equations (19) and (7) into Equation (20) results in:
$$F_k x^r_{k-1} = [e_k]_\times H_k x^r_{k-1} \tag{21}$$
Combining Equation (21) with Equation (18) gives $F_k$ for the near-planar road surface as:
$$F_k = [K t_k]_\times H_k \tag{22}$$
where $H_k$ is calculated iteratively by RANSAC using $x^r_{k-1}$ and $x^r_k$ after mismatched points rejection. Comparing Equation (11) with Equation (22), instead of representing $F_k$ with an arbitrary 3-vector $s$, $F_k$ is determined in Equation (22) by $t_k$, which is the up-to-scale translation between the camera positions in the two views (Equations (23) and (24)). Since the camera on a vehicle moving along the road undergoes only a small rotation $R_k$ in the short period from time step $k-1$ to $k$, $R_k$ is approximated as $R_k \approx I$, and Equation (25) is obtained from Equation (24). The substitution of Equation (25) into Equation (22) determines $F_k$ (Equation (26)). As a result, a unique fundamental matrix $F_k$ is obtained from Equation (26) when the road surface is near-planar. Then, by using the traditional SfM technique, this $F_k$ leads to correctly reconstructed road surface points $X^r_k$, followed by the identification of defects.
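The construction of a fundamental matrix from the translation and the road-plane homography can be sketched on synthetic data. Everything numeric below is a hypothetical assumption for illustration: the intrinsic matrix, the translation, the camera height, the choice $R_k = I$, and the closed-form plane-induced homography $K (I + t n^T / h) K^{-1}$ for a plane $Z = h$ with normal $n = (0, 0, 1)^T$, which stands in for the RANSAC estimate of $H_k$. The check shows that the resulting matrix satisfies the epipolar constraint for points on the plane and also for a point below it, such as a pothole bottom.

```python
def skew(v):
    """3x3 skew-symmetric matrix [v]_x of a 3-vector v."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical pinhole camera (focal length and principal point in pixels).
f, cx, cy = 1000.0, 640.0, 512.0
K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

t = [0.30, 0.02, 0.0]   # assumed translation between the two views, R = I
h = 1.4                 # assumed camera height above the road plane Z = h

# Plane-induced homography for R = I and plane normal (0, 0, 1).
T = [[1.0, 0.0, t[0] / h], [0.0, 1.0, t[1] / h], [0.0, 0.0, 1.0 + t[2] / h]]
H = matmul(matmul(K, T), K_inv)

e = matvec(K, t)          # epipole as the projection of the previous camera center
F = matmul(skew(e), H)    # fundamental matrix built from epipole and homography

# Two points on the road plane and one point 5 cm below it (a pothole bottom).
road_points = [[0.2, -0.1, 1.4], [-0.3, 0.25, 1.4], [0.05, 0.05, 1.45]]
residuals = []
for X in road_points:
    x_prev = matvec(K, X)                                        # view k-1
    x_curr = matvec(K, [X[0] + t[0], X[1] + t[1], X[2] + t[2]])  # view k
    residuals.append(dot(x_curr, matvec(F, x_prev)))
print(residuals)  # all close to zero, including the off-plane point
```

In this idealized setting the matrix is fixed by the translation rather than by an arbitrary vector, which is why off-plane points remain consistent with it and can be triangulated afterwards.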
Because of various uncertainties in the 3D reconstruction process, errors propagate and affect the 3D points $X^r_k$. Let $\hat{x}^r_k$ be the measured value of $x^r_k$, where $\hat{x}^r_k = x^r_k + \omega$ and $\omega \sim N(0, \Sigma_{x^r_k})$ follows a normal distribution. Equation (24) can be rewritten as:
$$X^r_k = P^+_k x^r_k$$
where $P^+_k = (P^T_k P_k)^{-1} P^T_k$ is the pseudo-inverse matrix of $P_k$. Let Equation (23) be written as $X^r_k = f(x^r_k)$. By using the first-order Taylor series expansion, it becomes:
$$X^r_k = f(x^r_k) \approx f(\hat{x}^r_k) + J_k (x^r_k - \hat{x}^r_k)$$
where $J_k$ represents the Jacobian matrix of $f(\cdot)$. The covariance matrix of $X^r_k$ is thus approximated by
$$\Sigma_{X^r_k} \approx J_k \Sigma_{x^r_k} J^T_k$$
Since $J_k$ in this scenario equals $P^+_k$, the covariance is deduced to be
$$\Sigma_{X^r_k} \approx P^+_k \Sigma_{x^r_k} (P^+_k)^T$$
Therefore, even with a unique $F_k$ for the near-planar road surface, the noise in the images inevitably causes errors in the 3D reconstructed surface points $X^r_k$ due to the ill-posedness of the problem.

Post-Processing
After obtaining the unambiguous $F_k$ for the near-planar road surface from Equation (26), $X^r_k$ are reconstructed from Equations (A4) to (A7) and (6). However, the obtained 3D road surface points $X^r_k$ are unitless, i.e., known only up to a scale factor. In order to get ${}^{m}X^r_k$, the proposed technique fits a plane to $X^r_k$ to represent the road surface (Equation (32)). Then the surface normal vector $n_k$ and the up-to-scale distance from the camera to the road surface $h_u$ are obtained from the plane parameters $p_0$, $p_1$, and $p_2$ (Equation (33)). The reconstructed surface and the distance $h_u$ obtained by Equations (32) and (33), however, may not be the final reconstruction. Because the road surface may have anomalies such as potholes, the first road surface fit will be distorted if such an anomaly exists. Thus, a recursive surface fitting process is proposed to reconstruct the road surface through Equations (34) to (36). In Equation (34), $d_{k,i}$ is a signed value calculated as the distance of $X^r_{k,i}$ to the currently fitted road surface. A positive $d_{k,i}$ means that the point $X^r_{k,i}$ lies between the camera and the currently fitted road surface. A negative $d_{k,i}$ means that the point $X^r_{k,i}$ is on the other side of the current road surface. Equation (35) classifies $X^r_{k,i}$ into possible defect points $X^{rd}_k$ and non-defect points $X^{rn}_k$ by a depth threshold $T_d$. $T_n$ in Equation (36) is a threshold that refers to the percentage of non-defect points among all the points $X^r_k$. If it is assumed that at least $m$ percent of the points $X^r_k$ actually represent the non-defect road surface, then $T_n > m$ continues the recursive process, fitting a new road surface based on all the $X^{rn}_k$ from the last iteration. The recursive process continues until $T_n < m$ is reached.
After the recursive process, an updated up-to-scale camera-to-road distance $h_u$ is obtained from Equation (33). Then a metric scale factor $\alpha_k$ is calculated based on the real camera-to-road-surface distance $h$:
$$\alpha_k = \frac{h}{h_u}, \qquad {}^{m}X^r_k = \alpha_k X^r_k$$
where ${}^{m}X^r_k$ are the metric points with units. In this way, the proposed technique converts the up-to-scale points $X^r_k$ into metric-scale road surface points ${}^{m}X^r_k$. The road defects are then detected from the depth (${}^{\{G\}}Z$ direction) values of ${}^{m}X^r_k$ based on the correct geometry. To simplify the notation, ${}^{m}X^r_k$ are still written as $X^r_k$ in the rest of this paper.
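The post-processing steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact Equations (32) to (36): the plane is parameterized as $Z = p_0 + p_1 X + p_2 Y$ with the camera at the origin looking along $+Z$, the refit loop stops when the set of non-defect points no longer changes (instead of the $T_n$ versus $m$ test), and the point cloud, the depth threshold, and the real height of 1400 mm are hypothetical values.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of Z = p0 + p1*X + p2*Y (assumed parameterization)."""
    n = float(len(points))
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    M = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]  # normal equations M p = b
    b = [sz, sxz, syz]
    d = det3(M)  # zero when the points are collinear; not handled in this sketch
    params = []
    for j in range(3):  # Cramer's rule, one column at a time
        Mj = [row[:] for row in M]
        for i in range(3):
            Mj[i][j] = b[i]
        params.append(det3(Mj) / d)
    return tuple(params)


def signed_distance(plane, X):
    """Signed distance to the plane; positive between the camera and the plane."""
    p0, p1, p2 = plane
    return (p0 + p1 * X[0] + p2 * X[1] - X[2]) / (1.0 + p1 * p1 + p2 * p2) ** 0.5


def fit_road_plane(points, t_d, max_iter=10):
    """Refit the plane on points within t_d of the current fit until stable."""
    inliers = list(points)
    plane = fit_plane(inliers)
    for _ in range(max_iter):
        kept = [X for X in points if abs(signed_distance(plane, X)) <= t_d]
        if kept == inliers:
            break
        inliers = kept
        plane = fit_plane(inliers)
    return plane, inliers


# Hypothetical up-to-scale cloud: a flat road at Z = 2.0 with a 3-point pothole.
grid = [-0.4, -0.2, 0.0, 0.2, 0.4]
pothole = {(0.0, 0.0), (0.2, 0.0), (0.0, 0.2)}
points = [(x, y, 2.1 if (x, y) in pothole else 2.0) for x in grid for y in grid]

plane, road = fit_road_plane(points, t_d=0.03)
p0, p1, p2 = plane
h_u = abs(p0) / (1.0 + p1 * p1 + p2 * p2) ** 0.5   # up-to-scale camera height
alpha = 1400.0 / h_u                                # assumed real height in mm
depths_mm = [-signed_distance(plane, X) * alpha for X in points if X not in road]
print(len(road), round(h_u, 6), [round(d, 3) for d in depths_mm])
```

In this example the first fit is tilted by the pothole points, and the second fit on the remaining points recovers the flat plane, which is the distortion the recursive process is meant to remove before the scale factor is computed.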

Experimental Results
This section presents two types of experiment to analyze the proposed technique. The first type of experiment was performed in a MATLAB simulated environment, which contained the simulated road surface, simulated camera model, and simulated camera motion. The simulation experiments analyzed the influence of different variables on the proposed road surface reconstruction. The second type of experiment was performed on real road surfaces captured by a road surface imaging system. The real-world experiments demonstrated the accuracy of the proposed technique and its effectiveness for road defects detection.
Figure 5 illustrates the simulated camera and the road surface in the simulation environment. On the right, the simulated camera faces the simulated road surface and has simulated properties such as an intrinsic matrix and a field of view. On the left, the environment creates 3D points $X^r_k \equiv \{(X^r_{k,i}, Y^r_{k,i}, Z^r_{k,i})^T \mid \forall i\}$ to represent the road surface. $Z^r_k = Z_m + \omega_r$, where $\omega_r \sim N(0, \delta)$, is used to change the evenness of the road in the ${}^{\{L\}}Z$ direction. $Z_m$ is the mean distance between the camera and the road surface. The default unit in the simulation environment is the millimeter. The simulated images are obtained by reprojecting $X^r_k$ to the simulated camera. $\hat{x}^r_k$ are the measured values of $x^r_k$, defined as $\hat{x}^r_k = x^r_k + \omega$, where $\omega \sim N(0, \Sigma_{x^r_k})$ models the uncertainty of matched features in the image. The covariance matrix $\Sigma_{x^r_k}$ is determined by the image noise $\sigma$.

Experiments in Simulation Environment
As for the orientation, $\theta_x$, $\theta_y$, and $\theta_z$ are the changes of the camera's angles about the ${}^{\{L\}}X$, ${}^{\{L\}}Y$, and ${}^{\{L\}}Z$ axes between two time steps. Disturbances such as the vibration of the camera cause the orientation of the camera to change. The error for 3D reconstruction is defined from the measured distance $\hat{d}_{k,i}$ and the ground truth distance $d_{k,i}$ given by Equation (34). Table 1 lists the experimental parameters analyzed in the experiment.
Figure 6 shows the comparison of 3D reconstruction error between the proposed technique and traditional SfM. The left figure shows the 3D reconstruction error when the road surface changes from planar ($\delta = 0$) to non-planar ($\delta \gg 0$). When $\delta$ is small, the reconstruction error is large for traditional SfM because the degeneracy issue still exists, while the proposed technique has small reconstruction errors. The error of the proposed technique in this case comes mainly from the image noise $\sigma$. When the road surface is non-planar, both SfM and the proposed technique have a reconstruction error of ≈ 2 mm. The right figure shows the reconstruction error as influenced by the image noise $\sigma$ at $\delta = 0.1$ and $\delta = 10$. For a non-planar road surface with $\delta = 10$ mm, the proposed technique and traditional SfM both have small and similar reconstruction errors. When $\delta = 0.1$ mm, i.e., the road surface is near-planar, SfM has an error usually between 10 and 1000 mm, while the proposed technique has an error usually less than 1 mm; even in the much worse case of $\sigma = 0.2$, the error is less than 2 mm.
The image uncertainty $\sigma$ is changed from 0.001 to 0.1, while the experiment also alters the distance from the camera to the road surface $Z_m$ to examine its influence on the results. When $\delta$ becomes larger, meaning the road is not a planar surface, SfM gives results close to those of the proposed technique. When $\delta$ becomes smaller, the error of SfM increases, but the error of the proposed technique remains small.
Figures 8-10 show the influence of errors in the rotation matrix $R$ on the 3D reconstruction error. In this simulation experiment, the rotation matrix $R$ is decomposed into rotations by the angles $\theta_x$, $\theta_y$, and $\theta_z$.
Figure 8 shows the influence of different $\theta_x$ on the 3D reconstruction error. For the SfM results, when $\delta = 0.1$ the error is usually more than 5% of the camera-to-road distance because in this case the error is dominated by the degeneracy issue. Meanwhile, the 3D reconstruction error is much smaller with the proposed technique for the planar road surface. For $\delta = 5$, SfM has an error under 2 mm, while for the proposed technique at $\theta_x = 5°$ the error is only around 1 mm larger than the 3D reconstruction error of SfM. Figure 9 shows the influence of different $\theta_y$ on the 3D reconstruction error. The error is large and dominated by the degeneracy issue for SfM when $\delta = 0.1$, while the proposed technique reconstructs the road with less than 2 mm error. When $\delta = 5$, SfM has an error comparable to that of the proposed technique. For the proposed technique, the change of $\theta_y$ has little influence on the 3D reconstruction errors, which are under 2 mm even in the worst case. Figure 10 shows the influence of different $\theta_z$ on the 3D reconstruction error. For $\delta = 0.1$, the error is also large for traditional SfM because of the degeneracy issue, while the error is small for the proposed technique. When $\delta = 5$, traditional SfM has an error comparable to that of the proposed technique. For the proposed technique, the change of $\theta_z$ has almost no influence on the 3D reconstruction error. The error in this case is mainly influenced by the variable $Z_m$: the larger $Z_m$, the larger the error.
Figure 11 shows the 3D reconstruction error when there exists a change of height $\delta_h$, caused by vibration, in the camera-to-road-surface distance $h$. The measured distance $\hat{h}$ differs from $h$ by $\Delta h$, where $\Delta h$ is simulated with a uniform distribution from 0 to $\delta_h$.
Figure 12 illustrates the comparison between Fan's [25] stereo vision road 3D reconstruction technique, traditional SfM, and the proposed technique. Figure 12a shows the simulation environment for the stereo camera, where the baseline between the two cameras is set to $B = 200$ mm. Figure 12b compares the 3D reconstruction error under a changing $\theta_y$, caused by the vibration of the vehicle, for the stereo technique, traditional SfM, and the proposed technique on the same simulated road, which has $\delta = 0.1$ mm and $\sigma = 0.2$ mm. The camera(s) are at a height of $h = 1400$ mm. To simplify the comparison, let $\theta_y$ be the angle of camera 2 with respect to camera 1 caused by the vibration. The error of the stereo technique increases exponentially as $\theta_y$ becomes larger. Even for a relatively small vibration of $\theta_y = 0.1°$, the error is ≈ 10 mm, which is still large for the road surface reconstruction task. Although SfM has a smaller error than the stereo technique most of the time beyond $\theta_y = 0.2°$, it still has a mean error of over 10 mm. This is again mainly caused by the degeneracy issue of road surface 3D reconstruction. The proposed technique, however, has less than 2 mm reconstruction error, which is mainly caused by the image noise $\sigma$.
Figure 13 shows the experimental setup of the error analysis for the proposed technique using real images. The camera faces downward to the road surface with its principal axis perpendicular to the ground surface, as shown in Figure 13a. The ground surface is a flat plate that mimics a planar road surface, as illustrated in Figure 13b. An image of a road surface is printed and attached to the flat plate to provide road surface patterns for image feature searching and matching. A circular part of the plate can be removed to mimic a road pothole. Figure 14 illustrates an example of the 3D reconstruction of the same flat plate image using SfM and the proposed technique.
Traditional SfM fails in this example since the road surface in the image is near-planar. However, the proposed technique gives the correct planar-like 3D surface reconstruction, as shown in Figure 14c.
Figure 15 shows the error analysis for 3D reconstruction using real images. Table 2 lists the parameters analyzed in the experiments using real images. The mismatched feature rejection constant was found to keep correct matchings robustly at $\lambda = 1.5$. Figure 15a shows the 3D reconstruction error of traditional SfM. Figure 15b shows the 3D reconstruction error of the proposed technique. The errors of the two techniques are compared by changing the height of the camera $h$ from 900 to 1600 mm. The mean errors are plotted, and the error bars represent the standard deviation over 10 runs of image capturing for each height. Figure 15 shows that traditional SfM gives a large mean error and standard deviation for this planar plate, while the proposed technique has a mean error of less than 0.6 mm and a standard deviation close to 1 mm.
Figure 16 shows a system that captures road surface images. The authors' previous work [31] built this system, which captures 1024 × 1280 resolution road surface images at a driving speed of up to 100 km/h. There are two cameras on this system. Although the proposed 3D reconstruction technique is based on a monocular camera, the two cameras can work separately to increase the road surface area covered by images. The system is controlled by a field-programmable gate array (FPGA) so that the frame rate of the camera adapts to the vehicle speed. The on-board diagnostics (OBD) port of the vehicle passes the vehicle's velocity to the FPGA, which sets a higher frame rate for the camera when the vehicle is moving fast and a lower frame rate when the vehicle is slow. The system sets the frame rate so that there is at least a 50% overlapping area between two consecutive images.
Figure 19 demonstrates the repeatability experiment for the proposed technique. Figure 19a shows a section of the road that contains a pothole. This section of road surface is obtained by stitching 50 images captured using the system shown in Figure 16. In Figure 19b, the ${}^{\{G\}}Z^r - h$ values of the reconstructed 3D road surface points are plotted as a colormap. In Figure 19c, the proposed technique measures the same road section that is reconstructed in Figure 19b. The two measurements are then compared to validate the repeatability of the proposed technique.
Table 3 compares SfM with the proposed technique on defects detection using road surface images. The comparison is based on 6300 road surface images collected on rural, urban, and highway roads under sunny, cloudy, and partly cloudy weather conditions around the Blacksburg, Virginia area. The real road surface images were captured at both highway driving speed (100 km/h) and local road driving speed (40 km/h). Some images capture potholes while other images capture flat road surface. From true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), the accuracy is expressed as (TP + TN)/(TP + TN + FP + FN), the precision as TP/(TP + FP), and the recall as TP/(TP + FN). As Table 3 shows, although traditional SfM gives the higher recall rate of the two techniques, it has a precision rate of only 34.34%. This means that although traditional SfM rarely misses potholes (fewer FN), it generates more wrong detections of potholes (more FP). The proposed technique, on the other hand, results in 98.95% accuracy, 94.33% precision, and 95.76% recall. All three criteria are above 94%.
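The three criteria can be computed directly from the confusion counts. The sketch below uses the formulas stated above; the counts in the example are hypothetical values chosen for illustration and are not the data of Table 3.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (accuracy, precision, recall) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # share of reported defects that are real
    recall = tp / (tp + fn)      # share of real defects that are reported
    return accuracy, precision, recall


# Hypothetical counts for illustration only (not taken from Table 3).
acc, prec, rec = detection_metrics(tp=90, fp=10, tn=880, fn=20)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```

A detector that reports many false positives can keep a high recall while its precision drops, which is the pattern described for traditional SfM.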

Conclusions
A geometry-based technique of reconstructing degenerate near-planar road surfaces from two images for road defects detection is presented in this paper. The proposed technique mathematically formulates the near-planar road surface reconstruction problem, and improves traditional SfM for the 3D road reconstruction process. Since the degenerate issue of the near-planar road surface reconstruction is solved by the proposed technique, road surface defects are thus detected from the accuracy-enhanced 3D road surfaces.
Two types of experiment were conducted to evaluate the proposed road surface 3D reconstruction technique for defects detection. In the simulation environment, the first experiment compared SfM and the proposed technique under different road unevenness $\delta$ and image noise $\sigma$. Results showed that changing $\delta$ does not affect the reconstruction error of the proposed technique, whereas the error of traditional SfM increases dramatically when $\delta$ is close to 0. The second experiment compared traditional SfM and the proposed technique under different rotation angles $\theta_x$, $\theta_y$, $\theta_z$ of the camera. Results showed that when $\theta_x$, $\theta_y$, and $\theta_z$ are changed, the error is less than 3 mm even in the worst case. The third experiment showed that a change of the camera-to-road distance $\delta_h$ hardly changes the reconstruction error when $0 < \delta_h < 20$ mm. The comparison of the stereo vision technique, traditional SfM, and the proposed technique demonstrated the robustness of the proposed technique for road surface reconstruction under the influence of vibration. For experiments using real images, the first experiment showed the 3D reconstruction error of both traditional SfM and the proposed technique for the reconstruction of a flat surface in a laboratory environment. The results showed that the error of traditional SfM is much higher than that of the proposed technique, and that the proposed technique has a mean error within 1 mm and a standard deviation within 1 mm for $h$ from 900 to 1600 mm. Lastly, 6300 real road surface images were captured by the presented system on both local road and highway road surfaces. The proposed technique increased the accuracy from 80% to 98.95% and the precision from 34.34% to 94.33% for road defects detection. This paper focused on reconstructing a 3D structure for road defects using a downward-facing camera. Future works include:
1. Making the camera face forward to capture images in front of the vehicle, and then detecting defects and objects on the road surface to help vehicles avoid obstacles.
2. Using deep neural networks on both the images and the 3D reconstructed points to improve the accuracy of road surface defects detection.

Acknowledgments: The authors would like to thank Murata Manufacturing Co., Ltd. for their support of this work.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
To solve for $F_k$, Equation (5) is rearranged into the form $A f_k = 0$, where $f_k$ is the vector formed from the entries of $F_k$ and $A$ is the coefficient matrix built from the $n$ correspondences. In SfM [30], $F_k$ is obtained by solving the minimization problem:
$$\min_{f_k} \|A f_k\| \quad \text{subject to} \quad \|f_k\| = 1$$
After the fundamental matrix $F_k$ is calculated, by following the subsequent SfM process, the essential matrix $E_k$ is calculated as:
$$E_k = K^T F_k K$$
where $K$ is the intrinsic matrix of the calibrated camera. The Singular Value Decomposition (SVD) of $E_k$ then yields the rotation matrix $R_k$ and the up-to-scale translation vector $t_k$ between time steps $k-1$ and $k$, where there is one correct combination of $R_k$ and $t_k$ that places all $X^r_k$ in front of the camera. The projection matrix $P_k$ is determined by the rotation matrix $R_k$ and the translation vector $t_k$. The 3D reconstructed points $X^r_k$ are finally obtained by the triangulation $f_t(\cdot)$:
$$x^r_{k-1} = P_{k-1} X^r_k = K[I, 0] X^r_k, \qquad x^r_k = P_k X^r_k = K[R_k, t_k] X^r_k \tag{A6}$$
$$X^r_k = f_t(K, R_k, t_k, x^r_k, x^r_{k-1}) \tag{A7}$$