2.1. Traditional Methods for Components Recognition
The general process of identifying building components from a raster image of a CAD drawing can be divided into two steps: primitive recognition and building element recognition [5].
Primitive recognition identifies basic geometric primitives (e.g., lines, arcs and circles) in images of CAD drawings using image processing and computer vision techniques. The essence of this step is feature extraction, which determines whether each pixel in the image belongs to a feature and extracts core information (e.g., shape, color, texture, spatial relations or text) from these image features. Numerous feature extraction algorithms, such as the Hough Transform [15], scale-invariant feature transform (SIFT) [16], maximally stable extremal regions (MSER) [17], and speeded-up robust features (SURF) [18], have been proposed over the past few decades for different object recognition tasks. Among these algorithms, the Hough Transform is one of the most common and significant methods for recognizing geometric primitives in raster images of CAD drawings. More than 2500 papers focus on its improved algorithms and applications [19]. Several variants have been developed from the classical Hough Transform, such as the Generalized Hough Transform [20], Progressive Probabilistic Hough Transform [21], Randomized Hough Transform [22], Digital Hough Transform [23], and Fuzzy Hough Transform [24].
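To make the voting idea behind the classical Hough Transform concrete, a minimal sketch for straight lines is given below. It is an illustration only and not the implementation used in any of the cited works; the function name `hough_lines`, the one-degree angular resolution, the vote threshold and the toy pixel set are all assumptions introduced here.

```python
import math

def hough_lines(points, width, height, n_theta=180, threshold=5):
    """Classical Hough Transform for straight lines (illustrative sketch).

    Every foreground pixel (x, y) votes for all (rho, theta) pairs that
    satisfy rho = x*cos(theta) + y*sin(theta); cells of the accumulator
    with at least `threshold` votes are reported as line candidates.
    """
    diag = int(math.ceil(math.hypot(width, height)))
    # Accumulator indexed by [rho + diag][theta index]; rho lies in [-diag, diag].
    acc = [[0] * n_theta for _ in range(2 * diag + 1)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[rho + diag][t] += 1
    lines = []
    for r, row in enumerate(acc):
        for t, votes in enumerate(row):
            if votes >= threshold:
                # Report rho in pixels and theta in degrees.
                lines.append((r - diag, 180.0 * t / n_theta, votes))
    return sorted(lines, key=lambda item: -item[2])

# Hypothetical toy raster: a horizontal stroke at y = 3 and a vertical stroke at x = 7.
pixels = {(x, 3) for x in range(10)} | {(7, y) for y in range(10)}
detected = hough_lines(pixels, width=10, height=10, threshold=10)
found = {(rho, theta) for rho, theta, _ in detected}
```

In this toy case the two strokes appear in `found` as (rho = 3, theta = 90°) and (rho = 7, theta = 0°). Neighbouring angles may also pass the threshold, which is why practical implementations usually add peak suppression in the accumulator.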
After the geometric primitives have been identified, the next step, building element recognition, defines classification rules based on geometric constraints and topological relations and categorizes the primitives according to these pre-established rules. Researchers have long worked on the recognition of building components in architectural floor plans [25]. Macé et al. [26] proposed a method to detect walls based on the distance and texture between parallel lines. Ahmed et al. [27] distinguished walls from other components by dividing the lines into three levels: thick, medium, and thin. Riedinger et al. [28] first binarized the floor plan and then detected and located the main walls and the dividing walls based on the thickness of the dark sketches that represent wall seams. Gimenez et al. [9] identified building components such as walls, doors, windows, and rooms in floor plans using pattern recognition. Meanwhile, others have focused on the recognition of structural components in 2D drawings. Lu et al. [29] put forward a shape-based method to detect parallel pairs (PPs) of structural elements such as shear walls and bearing beams. Moreover, Lu et al. [30] divided the whole image of a structural drawing into several segments by detecting the grid symbols, and then extracted the location information of columns, beams and slabs with the help of Optical Character Recognition (OCR). Instead of dealing with structural components, Cho et al. [8] paid more attention to 2D mechanical drawings and developed an algorithm with relatively high accuracy to extract the geometric information of mechanical entities such as ducts, elbows, branches and pipes, as well as their corresponding semantic information.
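The rule-based character of these methods can be illustrated with a small sketch that pairs nearly parallel line segments whose spacing falls within a plausible wall thickness. This is a simplified example in the spirit of the parallel-line rules described above, not a reproduction of any cited algorithm; the function names, the thresholds (`min_gap`, `max_gap`, `angle_tol`, `min_overlap`) and the sample segments are hypothetical, and segments are assumed to have non-zero length.

```python
import math

def direction(seg):
    """Undirected angle of a segment in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def angle_gap(a, b):
    """Smallest angle between two undirected segments."""
    d = abs(direction(a) - direction(b))
    return min(d, math.pi - d)

def perpendicular_distance(seg, point):
    """Distance from a point to the infinite line through a segment."""
    (x1, y1), (x2, y2) = seg
    px, py = point
    length = math.hypot(x2 - x1, y2 - y1)
    return abs((x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)) / length

def overlap_length(a, b):
    """Length of the projection of b onto a that falls inside a."""
    (x1, y1), (x2, y2) = a
    length = math.hypot(x2 - x1, y2 - y1)
    ux, uy = (x2 - x1) / length, (y2 - y1) / length
    ts = sorted((px - x1) * ux + (py - y1) * uy for px, py in b)
    return max(0.0, min(ts[1], length) - max(ts[0], 0.0))

def wall_candidates(segments, min_gap=2.0, max_gap=10.0,
                    angle_tol=math.radians(2), min_overlap=5.0):
    """Return (i, j, gap) for segment pairs that satisfy the wall rule."""
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            a, b = segments[i], segments[j]
            if angle_gap(a, b) > angle_tol:
                continue  # not parallel enough
            gap = perpendicular_distance(a, b[0])
            if min_gap <= gap <= max_gap and overlap_length(a, b) >= min_overlap:
                pairs.append((i, j, gap))
    return pairs

# Hypothetical primitives, in drawing units.
segments = [
    ((0, 0), (100, 0)),    # lower face of a horizontal wall
    ((0, 6), (100, 6)),    # upper face, 6 units away
    ((0, 0), (0, 80)),     # perpendicular line, rejected by the angle rule
    ((0, 50), (100, 50)),  # parallel but too far away to form a wall
]
pairs = wall_candidates(segments)
```

Only segments 0 and 1 are paired here. The example also shows why such rules are brittle: every threshold must be tuned to a particular drawing scale and drafting convention.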
Although existing component recognition methods can identify specific components in CAD drawings, a major disadvantage is that they require considerable time to manually design and extract features, and the features extracted by these conventional methods are not robust. In addition, since the components obtained by existing methods are composed of geometric primitives or symbols defined by fixed, pre-established rules, these methods generalize poorly when the same components are represented in different standards or drawn with different design conventions. Therefore, it is necessary and valuable to propose a more intelligent method to identify structural components in different types of structural CAD drawings. Owing to its higher accuracy, strong generalization ability, simple operation, and lower cost (no expensive measuring instruments are required), deep learning is used to detect components in this paper. Moreover, deep learning-based object detection has been successfully applied in other fields of civil engineering [31,32,33], but no studies were found that use it to identify components in CAD drawings, especially structural components. The development of deep learning methods for object detection is reviewed in the following section.
2.2. Deep Learning Methods for Object Detection
In recent years, deep learning has become one of the research hotspots in the field of artificial intelligence. The rationale of deep learning is to make a computer learn in a way that simulates how the human brain thinks and how signals are transmitted in the nervous system. Deep learning has made breakthroughs in object detection tasks such as gesture detection [34], iris detection [35], license plate detection [36], face detection [37], and human action detection [38]. According to the detection process, deep learning-based object detection algorithms can be divided into two categories: two-stage and one-stage object detection algorithms.
The two-stage object detection algorithm converts the object detection problem into a classification problem. The overall process is divided into two stages: first, region proposals are generated, and then a classifier is used to classify and refine them. Girshick et al. first proposed the Region-Based Convolutional Neural Network (R-CNN) [39], which generates region proposals by selective search [40]. However, this algorithm is computationally expensive and its detection speed is slow. To improve R-CNN, Girshick developed the Fast Region-Based Convolutional Neural Network (Fast R-CNN) [41] based on the idea of the Spatial Pyramid Pooling (SPP) layer [42]. Fast R-CNN greatly improves detection speed since it performs the convolution calculation only once for the whole image. However, the selective search that generates region proposals in Fast R-CNN still runs on the CPU and remains time-consuming. Based on Fast R-CNN, Ren et al. put forward the Faster Region-Based Convolutional Neural Network (Faster R-CNN) [43], which merges region proposal generation and CNN classification so that all the computation runs on the GPU. As a result, both speed and accuracy increase significantly. He et al. [44] proposed Mask R-CNN, based on Faster R-CNN, for object detection and instance segmentation. A limitation of this algorithm is that labeling instances for segmentation is expensive. Moreover, its detection speed still cannot reach the real-time level.
The essence of the one-stage object detection approach is to transform the object detection problem into a regression problem. Unlike the two-stage algorithm, which generates region proposals first, the one-stage algorithm directly outputs the class probabilities and coordinates of the target object through the CNN, which greatly improves the efficiency of object detection and meets the speed requirements of real-time detection. Redmon et al. [45] proposed an algorithm named YOLO, which divides an image into an N×N grid and predicts two bounding boxes and the corresponding category information for each grid cell. Following YOLO, Liu et al. [46] proposed SSD, which adopts the anchor mechanism of Faster R-CNN and achieves both high accuracy and fast detection. However, its performance on small targets is not particularly desirable. To address these problems, researchers proposed improved algorithms based on YOLO. YOLOv2 [47] was developed to improve detection accuracy and speed by adding batch normalization after each convolutional layer and by introducing anchor boxes and multi-scale training. YOLO9000 [47] combined the ImageNet dataset [48] and the COCO dataset [49] and achieved the detection of 9418 kinds of objects using WordTree hierarchical classification.
2.3. Selection for Structural Component Detection
As shown in Section 2.2, once accuracy meets a certain baseline, deep learning approaches have continued to develop toward faster detection. In addition, in some object detection applications, researchers prefer deep learning algorithms with a simple structure and fast detection speed. In particular, YOLO and its improved variants are widely applied. YOLO is an end-to-end model that directly predicts the locations of bounding boxes and the class probabilities of objects from the original image. Owing to this concise and straightforward detection process, YOLO is extremely fast and can detect objects in video in real time. Meanwhile, compared with two-stage object detection approaches such as R-CNN, Fast R-CNN and Faster R-CNN, YOLO makes fewer background errors since it is trained on the whole image, which helps it acquire contextual information about the target object. In addition, YOLO converges quickly and generalizes well.
Furthermore, structural components such as beams and columns in structural drawings are characterized by small size, high similarity and few features. When these components are detected with a deep learning-based method, a deep convolutional neural network is needed to form more abstract features that represent their location and category information. Moreover, a structural drawing typically contains hundreds of components, and a construction project involves dozens of such drawings. As a result, the detection method must be precise and must also run at close to real-time speed.
Therefore, based on the above analysis, YOLO is selected for detecting structural components in scanned CAD drawings.