BIM Style Restoration Based on Image Retrieval and Object Location Using Convolutional Neural Network

Abstract: BIM is one of the main technical ways to realize building informatization, and the model's texture is essential to its style design during BIM construction. However, the texture maps provided by mainstream BIM software are monotonous and not realistic enough to meet the actual needs of users for the model style. Therefore, an interior furniture BIM style restoration method was proposed based on image retrieval and object location using a convolutional neural network. First, two types of furniture images, namely grayscale contour images from BIM software and real images from the Internet, were collected to train the network model. Second, a multi-feature weighted fusion neural network model based on an attention mechanism (AM-rVGG) was proposed, which focused on the structural information of furniture images to retrieve the most similar real image; then, furniture image patches from the retrieved image were generated with object location and random cropping techniques as the candidate texture maps of the furniture BIM. Finally, the candidate maps were fed back into the BIM software to realize the restoration of the furniture BIM style. The experimental results showed that the average retrieval accuracy of the proposed network model was 83.1%, and the obtained texture maps could effectively restore the real style of the furniture BIM. This work provides a new idea for restoring realism in other BIMs.


Introduction
Building digitalization has become an inevitable trend of transformation and upgrading in construction in recent years. As one of the most effective technologies to realize building informatization, building information modeling (BIM) can digitally express the building facilities' physical and functional characteristics. It also provides reliable shared information resources for all kinds of decision-making over the whole life cycle of buildings [1,2]. Therefore, the authenticity and integrity of the BIM's description of building facilities are an important basis for determining whether the model is usable. As one of the essential attributes reflecting its style, the texture of the BIM plays a vital role in the quality of the model, especially in the BIM of architectural cultural heritage [3][4][5].
Currently, BIM mainly comes from modeling software and various network platforms, but the visualization effect of the models provided struggles to meet authenticity requirements. On the one hand, the texture maps provided by modeling software such as Autodesk Revit are too monotonous and lack diversity compared with actual needs, which makes it difficult to build a model that is visualized with a natural appearance. On the other hand, due to personalized style design requirements, the models provided by network platforms need to be reconstructed, and they still lack natural texture maps.

Related Work
Currently, deep neural network models such as AlexNet [10], VGGNet [11], and ResNet [12] have achieved a series of remarkable results in the field of computer vision. In recent years, with the extension of deep learning technology, scholars have carried out related work on BIM, mainly involving BIM building component classification [13][14][15] and architectural style classification [16][17][18][19]. However, as far as we know, there have been few reports on the use of deep neural networks to obtain the texture maps needed by BIM from real scenes and to restore their styles. In addition, the BIM style includes the local component style and the overall model style. At the same time, in the real world, all kinds of furniture have different styles because of their different appearance characteristics and play an essential role in the interior home decoration style [18]. Therefore, this paper took the furniture image as the research object.
For furniture images with complex backgrounds and high similarity between classes, scholars have put forward relevant research. For example, in 2016, Bermeitinger et al. optimized the classification performance of neoclassical furniture images by image enhancement through the VGGNet16 model [19]. In 2017, Hu et al. classified furniture styles through manual features, depth features, or their combination, and the results showed that the combination of depth features and manual features had a better classification effect [20]. Hu et al. used the VGGNet model to classify the styles of three types of architectural pictures and a variety of non-architectural objects, and verified the advantages of a deep neural network in style cognition [21]. In 2021, Du et al. took furniture style classification as the research goal, processed the deep features of VGGNet16 by Gram transformation, and achieved good results in furniture style classification [22]. Although the above four methods optimized the classification performance of furniture from different perspectives, they did not optimize the backbone network. In 2018, Wang et al. proposed an AlexNet-S network model combined with an image similarity measurement algorithm to remove duplicate and irrelevant samples in the furniture image database [23]. Then, Luo Xia et al. classified furniture images through AlexNet based on feature fusion, but the above two methods were not further verified on other convolutional neural networks [24]. Sui et al. imitated the human attention mechanism and proposed a convolutional neural network (CNN) model that highlighted the color, contour, and other information of furniture images, making up for the tendency of traditional CNNs to ignore image color and other features; however, attention to the image channel domain was still lacking [25]. Ataer et al. simplified the VGGNet model by removing the fully connected layer to classify different styles of interior design renderings.
Although this method could effectively retain the network convolution features, there was no further study on furniture classification in the renderings [26]. In the research of BIM in 2022, Zhou et al. used the YOLO neural network to detect the position and size information of the object from the camera video. They realized the effect of real-time restoration of the BIM according to the actual scene [27].
Based on the above analysis, in order to restore the style of furniture BIM, a multifeature weighted fusion neural network model based on attention mechanism (AM-rVGG) was first proposed. The model focuses on the furniture image features from the spatial and channel domains of the image, and balances the convolutional features of different depths through multi-feature weighted fusion to improve the accuracy of the cross-domain retrieval of the neural network, and then retrieve the real furniture image with high similarity to the furniture BIM. Second, a texture map generation method is proposed to obtain the texture map of the real furniture image. Finally, the obtained texture map is fed back to the BIM to restore the style of the BIM.

Method
For the problem of BIM style restoration, in order to obtain a realistic furniture image that is most similar to the BIM, first, the contour map of the furniture model is obtained. At the same time, the real furniture image dataset is established by the image preprocessing operation. Then, the contour image of the furniture model is input into the AM-rVGG network model, and the real furniture image similar to the contour image of the furniture model is retrieved in the real furniture image dataset. To obtain the candidate texture map of the BIM, the real furniture image is processed by the texture map restoration method. The obtained texture map is finally fed back to the BIM to realize the style restoration of the BIM. The general framework of the method is shown in Figure 1.

AM-rVGG Network Model
VGGNet [11] was first proposed in the ImageNet image classification competition in 2014. It mainly has four structures with different depths, among which VGGNet16 is the most widely used. The VGGNet16 network can be divided into five convolution blocks and one fully connected stage. Each block consists of two or three convolution layers. Each convolution layer uses a 3 × 3 convolution kernel, and the ReLU activation function is applied after each convolution layer. At the same time, each convolution block is followed by a 2 × 2 max pooling layer.
The VGGNet16 model has been widely used in image classification with deep network depth, but with the increase in the number of convolution layers, the image detail information will be easily lost. In this paper, the attention mechanism was introduced into the VGGNet16 network model, and the method of multi-feature weighted fusion was used to compensate for the network details. The structure of the network model is shown in Figure 2.
According to the AM-rVGG model framework in Figure 2, for furniture images with complex backgrounds, high similarity between classes, and large differences within categories, details such as texture and shape, which can be used to distinguish classes, are not obvious enough. Moreover, when the image is processed by the convolutional layers and the fully connected layer to obtain high-level semantic features, further loss of detailed information occurs. In order to effectively retain the detailed information of the bottom convolution layers and balance the features of different convolution depths, a multi-feature fusion method is proposed. First, the convolution layers of VGGNet16 are divided into five blocks, and the convolution features of different depths are selectively fused. Specifically, the output features of the fourth and seventh convolution layers of VGGNet16 are fused, and the output features of the tenth and thirteenth convolution layers are fused. For the network model, the features of the last fully connected layer are used as the final classification features. In order to make better use of the fused features for classification, the features of the last convolutional layer and the fully connected layer are spliced and fused.
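The selective fusion and final splicing steps above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the toy feature shapes, the unweighted additive fusion, and the helper names `fuse` and `splice` are assumptions.

```python
import numpy as np

# Feature maps from two convolution depths are fused element-wise, and
# the fused convolutional feature is finally spliced (concatenated) with
# the fully connected feature for classification.

def fuse(shallow, deep):
    """Unweighted element-wise fusion of two same-shaped feature maps."""
    assert shallow.shape == deep.shape
    return shallow + deep  # retains both depths without extra weighting

def splice(conv_feat, fc_feat):
    """Concatenate a flattened conv feature map with an FC feature vector."""
    return np.concatenate([conv_feat.ravel(), fc_feat])

# toy features: (channels, height, width) maps from two conv depths
f4 = np.ones((8, 4, 4))            # e.g. fourth conv layer output
f7 = np.full((8, 4, 4), 2.0)       # e.g. seventh conv layer output
fused = fuse(f4, f7)               # shape (8, 4, 4)
fc = np.zeros(16)                  # toy fully connected feature
final = splice(fused, fc)          # length 8*4*4 + 16 = 144
```

In a real network, feature maps from different depths have different spatial sizes and would first need pooling or resizing to a common shape before fusion.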
Multi-feature fusion can retain the convolution features of different depths without weighting and then balance the features of different depths. However, in order to give the network model a more discriminative image classification ability, it is necessary to fuse the detailed information that can represent the image category. Therefore, the attention mechanism (convolutional block attention module, CBAM) can be used to highlight the weights of the detail features to improve the classification performance of the network [28]. In the CBAM module, the input feature is used to infer the attention map along two independent dimensions (channel and space) in turn and is then multiplied with the input feature map to generate the recalibrated feature. Assuming an input feature F, with Mc as a one-dimensional channel attention feature descriptor and Ms as a two-dimensional spatial feature descriptor, the overall process of the attention mechanism is:

F′ = Mc(F) ⊗ F,  F″ = Ms(F′) ⊗ F′ (1)

where ⊗ stands for element-wise multiplication. In order to fuse more representative features and avoid the loss of information due to the attention module, a weighted feature fusion method based on the attention mechanism is proposed. In this method, the re-weighted features of the attention module are further fused with the input features, and the fused features are input to the next convolution block. The feature weighted fusion method based on the attention mechanism is shown in Figure 3.
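Equation (1) and the weighted fusion of Figure 3 can be sketched as follows. This is a simplified illustration in which the attention maps use only pooling statistics and a sigmoid; the learned MLP and 7 × 7 convolution of the full CBAM module are omitted, and all shapes are toy assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F):
    # one weight per channel, from global average- and max-pooled statistics
    avg = F.mean(axis=(1, 2))
    mx = F.max(axis=(1, 2))
    return sigmoid(avg + mx)[:, None, None]   # shape (C, 1, 1)

def spatial_attention(F):
    # one weight per spatial position, from cross-channel statistics
    avg = F.mean(axis=0)
    mx = F.max(axis=0)
    return sigmoid(avg + mx)[None, :, :]      # shape (1, H, W)

def cbam_weighted_fusion(F):
    F1 = channel_attention(F) * F    # F' = Mc(F) ⊗ F
    F2 = spatial_attention(F1) * F1  # F'' = Ms(F') ⊗ F'
    return F + F2                    # fuse recalibrated feature back with input

F = np.random.rand(8, 4, 4)          # toy (channels, height, width) feature
out = cbam_weighted_fusion(F)        # same shape as F
```

The final addition `F + F2` reflects the paper's point that fusing the re-weighted feature back with the input avoids losing information in the attention module.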
More comprehensive and typical features can be obtained through the multi-feature weighted fusion network based on the attention mechanism. Before the features are input into the fully connected layer, they need to be flattened along the channel direction. The traditional VGGNet16 model has three fully connected layers, but only the features before the last fully connected layer are used for classification. The features before the fully connected layer may suffer further loss of feature information during the flattening operation and the fully connected layers; therefore, based on the traditional VGGNet16 model, the second fully connected layer is removed, and the features weighted by the attention mechanism are spliced with the last fully connected feature, which is used for the final classification.

Texture Map Restore
The furniture image finally obtained by the above method often has a complex background, and the texture of the furniture itself is not uniform. In order to generate the texture map of the furniture as accurately as possible, the complex background of the furniture image should first be filtered out. One-stage object detection algorithms such as YOLO and SSD can quickly identify and select targets from complex image backgrounds [29,30]. Therefore, in this paper, the object detection algorithm YOLO was used to obtain the minimum bounding box of the real furniture object [29]. The image was cropped according to the generated bounding box coordinates, so that the individual furniture image was separated from the complex original image. However, the individual furniture image contained the textures of the different structural parts of the furniture itself; in order to obtain the texture map of each part and restore the BIM style, a texture map restoration method was proposed. First, the furniture images obtained by cropping were randomly cropped according to the set crop size and quantity. Then, the average pixel values of the cropped images were counted in different regions. Finally, the images with large differences in the regional average pixel values were removed. The remaining images were grouped according to the average pixel values of the different regions, and the images were spliced and synthesized. Finally, the texture maps of the different structures of the furniture were obtained. The texture map restoration method framework is shown in Figure 4.
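The random cropping and region-mean filtering steps above can be sketched as follows. This is an illustrative sketch: the crop size, crop count, uniformity threshold, and helper names are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_patches(img, size, count):
    """Yield `count` random square crops of side `size` from a 2D image."""
    h, w = img.shape[:2]
    for _ in range(count):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        yield img[y:y + size, x:x + size]

def is_uniform(patch, thresh=10.0):
    """Keep a patch only if its four quadrants have similar mean pixels,
    i.e. the patch covers a single uniform texture region."""
    s = patch.shape[0] // 2
    means = [patch[:s, :s].mean(), patch[:s, s:].mean(),
             patch[s:, :s].mean(), patch[s:, s:].mean()]
    return max(means) - min(means) <= thresh

# toy furniture crop: uniform left half, noisy "textured" right half
img = np.full((64, 64), 50.0)
img[:, 32:] = rng.integers(0, 255, size=(64, 32))
patches = [p for p in random_patches(img, 16, 20) if is_uniform(p)]
```

Patches that survive the filter would then be grouped by mean value and spliced into the candidate texture maps, as described above.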

Dataset
As far as we know, there is no public image dataset for furniture BIM retrieval. Therefore, a separate dataset was constructed that included two types of images, namely, real furniture images and furniture model contours. For the real furniture images, a total of 26,806 images were downloaded from Baidu Images using web crawler technology according to the furniture categories. However, the Baidu Images platform returns many duplicate images; to reduce the number of repeated images, the downloaded images were randomly filtered, and a total of 11,753 images were finally retained. These included chairs, beds, tables, windows, and doors. There were one to four sub-categories under each category of real furniture images, and the specific sub-categories are shown in Table 1. For the outline drawings of the furniture models, the furniture models were downloaded from Revit and various platforms according to the above classification, and 1035 pictures were captured from different angles. The specific numbers are shown in Table 2.
The real furniture images themselves had color features, but the contour images of the models lacked color features. In order to reduce the difference between the two different domains and improve the retrieval accuracy, Gaussian filtering and edge extraction were used to process the real furniture images. The images in both datasets were uniformly resized to 224 × 224. Figure 5 shows an example of some furniture images from the constructed dataset.
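The preprocessing described above can be sketched as follows. The paper does not specify the exact kernels, so a 3 × 3 Gaussian kernel and Sobel edge operators are assumed here for illustration.

```python
import numpy as np

# Gaussian smoothing followed by edge extraction, so that real photos
# take on a contour-like style closer to the model contour images.

GAUSS = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d(img, k):
    """Valid 3x3 cross-correlation without padding."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def edges(img):
    smooth = conv2d(img, GAUSS)   # Gaussian filtering: suppress noise
    gx = conv2d(smooth, SOBEL_X)  # horizontal gradient
    gy = conv2d(smooth, SOBEL_Y)  # vertical gradient
    return np.hypot(gx, gy)       # gradient magnitude = edge strength

# toy image with a vertical step edge: left half dark, right half bright
img = np.zeros((16, 16))
img[:, 8:] = 255.0
mag = edges(img)
```

In practice a library routine (e.g. OpenCV's Gaussian blur and an edge detector) would be used, followed by resizing to 224 × 224.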


Experimental Setup
The style restoration of the furniture BIM needs to obtain the texture map of the real furniture image, which is similar to the outline structure of the furniture model. In order to carry out the feature extraction and retrieval experiment of the furniture model outline and real furniture image, methods based on HOG, VGGNet16, AM-rVGG, and AM-rVGG + HOG were designed. The HOG features were obtained based on the strategy in [31]; the AM-rVGG + HOG feature is the cascade fusion of the feature vector of the penultimate fully connected layer of the AM-rVGG model and the HOG feature. When retrieving, the cosine distance is used as the similarity measurement method to obtain the desired actual image similar to the outline structure of the furniture model to be tested and then the texture map is obtained.
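The cosine-distance retrieval step described above can be sketched as follows. The toy feature vectors and the helper names `cosine_sim` and `retrieve` are illustrative assumptions, not the paper's code.

```python
import numpy as np

def cosine_sim(a, B):
    """Cosine similarity between query vector a and each row of matrix B."""
    a = a / np.linalg.norm(a)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return B @ a

def retrieve(query, gallery, top_k=3):
    """Return gallery indices sorted by descending cosine similarity."""
    sims = cosine_sim(query, gallery)
    order = np.argsort(-sims)
    return order[:top_k], sims[order[:top_k]]

query = np.array([1.0, 0.0, 1.0])        # toy contour-image feature
gallery = np.array([[1.0, 0.1, 1.0],     # nearly parallel to the query
                    [0.0, 1.0, 0.0],     # orthogonal to the query
                    [2.0, 0.0, 2.0]])    # parallel (scale does not matter)
idx, scores = retrieve(query, gallery)
```

Because cosine similarity normalizes each vector, it measures direction only, which makes the ranking invariant to feature magnitude.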
Accuracy is used as the evaluation index of network model training, and its mathematical description is shown in Equation (2):

Accuracy = N / M (2)

where N represents the number of images correctly recognized by the network model, and M represents the total number of images.
The mean average precision (mAP) was used as the evaluation index of the retrieval experiment, and its mathematical description is shown in Equation (3):

mAP = (1/n) Σᵢ precisionᵢ,  precision = TP / (TP + FP) (3)
where precision is the retrieval precision; n is the number of retrieval runs; TP is the number of relevant images correctly retrieved; FN is the number of relevant images that were not retrieved; and FP is the number of irrelevant images incorrectly retrieved.
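As a minimal sketch of Equation (3) under these definitions, the per-run precision and its mean can be computed as follows; the toy relevance labels and helper names are assumptions for illustration.

```python
def precision(retrieved_relevant):
    """retrieved_relevant: booleans, True where a retrieved image is relevant."""
    tp = sum(retrieved_relevant)        # correctly retrieved images (TP)
    fp = len(retrieved_relevant) - tp   # incorrectly retrieved images (FP)
    return tp / (tp + fp)

def mean_average_precision(runs):
    """Average the precision over n retrieval runs, as in Equation (3)."""
    return sum(precision(r) for r in runs) / len(runs)

runs = [
    [True, True, False, True],    # precision 3/4 = 0.75
    [True, False, False, False],  # precision 1/4 = 0.25
]
map_value = mean_average_precision(runs)  # (0.75 + 0.25) / 2 = 0.5
```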
The experiments in this paper were based on the PyTorch deep learning framework, and the Adam optimizer was used as the gradient descent algorithm. The initial learning rate was set to 0.0002, the number of training iterations was Epochs = 200, and the batch size was Batch-Size = 64. The furniture dataset was divided into a training set and a test set at a ratio of 9:1. The experimental environment was an Intel i9-11900K CPU, an Nvidia RTX 3090 GPU, and 128 GB of RAM.

Experimental Results and Analysis
First, the accuracy comparison curves of AM-rVGG and VGGNet16 on the dataset are given, as shown in Figure 6. From the accuracy comparison curves, it can be seen that the convergence speed and accuracy of AM-rVGG network training were significantly improved, and the accuracy of the first training epoch of the AM-rVGG network reached about 45%. The comparison results of the highest accuracy and the average accuracy of the networks over 200 iterations are shown in Table 3. It can be seen from Table 3 that compared with the VGGNet16 network model, the best accuracy of the AM-rVGG network model was increased by 1.8%, and the average accuracy of the training was increased by 2.2%. The average training accuracy was significantly improved, which further proved that the convergence of the network model in the training process was considerably improved.
In order to verify the effectiveness of this method more comprehensively, this paper designed retrieval experiments based on the HOG, VGGNet16, AM-rVGG, and AM-rVGG + HOG methods for different types of furniture data, and obtained the mAP values for various kinds of furniture. The comparison results are shown in Table 4. Table 4 shows that among the 12 categories of furniture images, the mAP values of seven categories were the highest when the AM-rVGG feature was fused with the HOG feature, and the mAP values of three types of furniture images retrieved using the HOG feature alone were higher and more prominent. By comparing the retrieval method based on HOG with those based on deep learning, it was found that the deep learning methods have obvious advantages for small-sample furniture images, indicating that deep learning features have a stronger ability to express image features. By comparing the retrieval experiments of VGGNet16 and AM-rVGG, it was found that the AM-rVGG method had an advantage in eight kinds of furniture image retrieval experiments, which further verified the effectiveness of the proposed method.
Finally, based on the AM-rVGG method, the mainstream BIM software Revit 2020 was used as a platform. The interior door was used as an example to verify the restoration of the furniture BIM style. First, the outline drawing of the BIM of the indoor door was obtained and processed by the AM-rVGG network model, and the cosine distance was used as the similarity evaluation index to sort and output the real furniture images similar to the BIM. The steps to retrieving the outline drawing of the indoor door model are shown in Figure 7.
In general, the real furniture image with the maximum similarity is used as the input to generate candidate texture maps. For the retrieved image, the minimum bounding box of the furniture object is first obtained using an object location algorithm [19] and cropped out; in this step, the main door body is stripped from the complex background. The door in the main-body image is then processed by the texture map restoration method to obtain candidate texture maps. The obtained texture maps of the door panel, the glass, and the door handle are finally fed back into the Revit modeling software to realize the style restoration of the indoor door model. The style restoration steps are shown in Figure 8.
An example of style restoration for a broader range of furniture BIM is shown in Figure 9. The AM-rVGG method was used for the retrieval experiment, and the results were sorted by similarity; the style of each furniture BIM was generated from the furniture image with the largest similarity. The example shows that for models with more complex construction, such as the European bed, the retrieved real furniture image had more complex and diverse textures, and the candidate textures processed by the texture map restoration method were also more realistic, so the overall restored style of the BIM was more realistic. For a model with a simple structure, such as an office chair, the retrieved image texture is relatively simple, so the overall style effect of the BIM after texture map feedback is not as obvious as that of the European bed.
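The object-location-and-cropping stage described above can be sketched as follows. This is a simplified illustration under stated assumptions: the paper obtains the bounding box from an object location algorithm [19], whereas here the box is supplied directly, and the toy image stands in for the retrieved furniture photo.

```python
import numpy as np

def crop_candidate_patches(image, bbox, patch_size=32, n_patches=3, seed=0):
    """Crop the located object, then sample random patches as candidate textures.

    `bbox` = (x0, y0, x1, y1) would come from the object-location step;
    here it is supplied directly for illustration.
    """
    x0, y0, x1, y1 = bbox
    obj = image[y0:y1, x0:x1]            # strip the object from the background
    h, w = obj.shape[:2]
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_patches):           # random cropping inside the object box
        py = rng.integers(0, h - patch_size + 1)
        px = rng.integers(0, w - patch_size + 1)
        patches.append(obj[py:py + patch_size, px:px + patch_size])
    return patches

# Toy 100x100 RGB "photo" with the furniture occupying a known region.
img = np.zeros((100, 100, 3), dtype=np.uint8)
patches = crop_candidate_patches(img, bbox=(10, 20, 90, 95))
print(len(patches), patches[0].shape)
```

In the actual pipeline, the resulting patches (e.g., from the door panel, glass, or handle regions) would be handed to the BIM software as candidate texture maps.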

Conclusions
In order to solve the problem that the texture maps provided by mainstream BIM software are not realistic enough and are monotonous, and thus cannot meet users' needs for BIM style design, this study proposed a style restoration method for indoor furniture BIM that takes advantage of convolutional neural networks. First, to train the proposed model, a dataset of grayscale contour images from BIM software and real images from the Internet was constructed. Second, to obtain the most similar real images, a multi-feature weighted fusion neural network model based on the attention mechanism (AM-rVGG) was proposed; the model optimized VGGNet16 with the attention mechanism and feature fusion, improving the average retrieval accuracy by 2.2%. Finally, a technique based on object localization and random cropping was used to generate furniture image patches from the retrieved furniture images as candidate texture maps for the furniture BIM, and these candidates were fed back into the BIM software to realize the style restoration.
Experiments showed that the BIM restored by this scheme reproduces the real scene more realistically, which provides a new idea for the study of various model styles under the BIM concept. Continued research on deep learning algorithms in the field of BIM will further promote the development of building digitalization. It is worth noting that for complex BIM, the method in this study is still insufficient for extracting locally complex texture maps, which will be the next research direction of this study.