Intelligent Recognition Method of Decorative Openwork Windows with Sustainable Application for Suzhou Traditional Private Gardens in China

Decorative openwork windows (DO-Ws) in Suzhou traditional private gardens play a vital role in Chinese traditional garden art. Due to the delicate and elegant patterns, as well as their rich cultural meaning, DO-Ws have quite high protection and utilization value. In this study, we firstly visited 15 extant traditional gardens in Suzhou and took almost 3000 photos to establish the DO-W datasets. Then, we present an effective visual recognition method named CSV-Net to classify different DO-Ws' patterns in Suzhou traditional gardens. On the basis of the backbone module of the cross stage partial network optimized with the Soft-VLAD architecture, the proposed CSV-Net achieves a preferable representation ability for distinguishing different DO-Ws in practical scenes. The comparative experimental results show that the CSV-Net model achieves a good balance between its performance, robustness and complexity for identifying DO-Ws, also having further potential for sustainable application in traditional gardens. Moreover, the Canglang Pavilion and the Humble Administrator's Garden were selected as the cases to analyze the relation between identifying DO-W types and their locations in intelligent approaches, which further reveals the design rules of the sustainable culture contained in Chinese traditional gardens. This work ultimately promotes the sustainable application of artificial intelligence technology in the field of garden design and inheritance of the garden art.


Introduction
The Chinese traditional garden, from the Shang and Zhou dynasties (11th century BC) to the Ming and Qing dynasties (the end of the 19th century AD), has, for over three thousand years, formed a unique garden system in the history of the world. The Chinese traditional garden is a highly concentrated collection of Chinese traditional culture, technique and humanistic spirit. As a significant style of Chinese traditional garden, private gardens in Suzhou contain characteristic local conditions and customs of regions south of the Yangtze River; as a result, the 21st meeting of the UNESCO World Heritage Committee approved recognizing the Humble Administrator's Garden, the Lingering Garden, the Master of the Nets Garden and the Mountain Villa with Embracing Beauty as typical examples of "Suzhou traditional garden" included in the World Heritage List in 1997. They highly praised Suzhou gardens and commented that "art, nature, and ideas are integrated perfectly to create ensembles of great beauty and peaceful harmony" [1].
Decorative openwork windows (DO-Ws), a type of ornamental carved window applied in Chinese gardens, were set on a white wall. The carved pattern in the window hole (2014) analyzed the patterns of DO-Ws from the auspicious culture [12]. There were also studies on the spatial light and shadow changes [13], the spatial atmosphere changes [6] and the spatial level changes [14] brought by the design of DO-Ws from the perspective of the methods and concepts of their spatial treatment. In addition to the above research on the ancient meaning of DO-Ws, more and more research has actively explored the modern use of them in recent years [15]. Some research works studied the application of the modeling elements and the artistic effect of DO-Ws [16] or analyzed and referenced the design concept of DO-Ws [17], in order to use new materials and new forms to integrate traditional culture into modern design [18].
Internet of Things systems built from stationary surveillance cameras, smartphones, mobile robots and other terminals can acquire high-quality image data, and classification and recognition networks based on deep learning or other methods [19][20][21] can use these data to provide accurate, reliable and real-time recognition results for landscape managers and designers with higher efficiency and lower costs. This has gradually become a research focus at home and abroad. Artificial intelligence technology has been widely used in related fields such as urban planning administration, architecture and landscape data monitoring. Yi Zheng (2020) discussed the function of accurate recognition of artificial intelligence in assisting the planning and management of urban landscapes [22]. In the field of architecture, Wei-xuan Wei (2019) argued that artificial intelligence has not only enriched the content of architectural design but also made the learning process of classic architectural cases simpler and more efficient for designers [23]. He also discussed the application prospects of artificial intelligence in the construction of intelligent scenic spots. In addition to these contributions in the design area, artificial intelligence technology has also been leading the way in the field of landscape data monitoring. Whether used for recording urban tree spaces [24], monitoring plants' water, fertilizer, disease and insect pest situations or preventing and controlling forest fires in combination with 3S technology [25], artificial intelligence offers an efficiency, accuracy and foresight in the monitoring and prediction of landscape data that the human brain and traditional computer technology do not have [23].
Beyond the research on modern landscapes, a few scholars have begun exploring the application value of artificial intelligence technology in the research of Chinese classical gardens. Kuo C J (2003) proposed establishing a knowledge base of Chinese classical garden design by using artificial intelligence [26], but unsolved problems remained; for example, there were so many garden elements that it was difficult to weight the database's content according to their importance.
The studies of DO-Ws reviewed above show that, in present research on the decorative art of DO-Ws, few works focus on the classification and identification of their different types, and research on DO-W patterns also lacks a scientific identification system. At the same time, research analyzing the correlation between the themes of DO-Ws and the garden space of a specific garden is insufficient as well. Although artificial intelligence technology has been developed in many fields of modern landscaping, research applying it to landscape architecture is still quite deficient. Most of the works are related to comprehensive research in urban planning, architecture, biology and other fields [24], which shows that the application of artificial intelligence in landscape architecture is still elementary to some extent. Moreover, research has rarely used intelligent recognition to form a knowledge database and ultimately achieve the goal of intelligent design [27]. In the research of Chinese classical gardens, there are occasional works focusing on intelligent identification, but they also lack breakthroughs due to the complexity of garden elements.
Recently, deep learning methods have become the dominating approach for image classification.
Year by year, various new architectures have been proposed. For example, VGG-19 was proposed by the Visual Geometry Group (VGG) of Oxford University [19]. Since deeper neural networks are more difficult to train, the residual learning framework ResNet [28] reformulates the layers as learning residual functions with reference to the layer inputs instead of learning unreferenced functions. Another solution, named Inception-v3 [29], is to transfer information in a partial structure without passing through the main network. This was shown to preserve some original information and effectively mitigate the vanishing gradient problem in backpropagation. Similarly, the basic idea of DenseNet was derived from ResNet, but it established dense connections between all the previous layers and the latter layers. DenseNet was intended to alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse and substantially reduce the parameters [21]. We found that the methods above perform poorly in identifying DO-Ws in the complex landscape environment. Because of the complex patterns and various categories of DO-Ws, together with environmental factors, background interference and changes in the equipment's posture, achieving a good identification of DO-Ws remains difficult. As a result, we propose the optimized CSV-Net, which has better performance and sustainability in the recognition of DO-Ws in Suzhou traditional gardens.
Taking the DO-Ws of Suzhou traditional gardens as the research subject, this paper obtained more than 2000 photos of DO-Ws from the traditional gardens in Suzhou. By using the deep learning architecture identified as CSV-Net, based on convolution neural networks, we constructed datasets of the DO-Ws in Suzhou traditional gardens, which could not only identify the design context and pattern information of different DO-Ws but could also obtain the GPS information of all DO-Ws in extant gardens accurately and efficiently. The most important aspect is that CSV-Net can also restructure basic symbol units of DO-Ws by machine learning, in order to generate alternative DO-W designs automatically according to the design requirements of different landscape environments in the further research and design work in the future, and ultimately realize the sustainable application of artificial intelligence technology in the field of traditional-style landscape design.
The DO-W identification datasets based on CSV-Net can be further studied based on type identification; for example, CSV-Net can determine which gardens certain patterns appear in and ascertain if they are related to the garden space. It can also detect the basic pattern unit in DO-Ws to conduct the extraction or design of fundamental figures in ornamental patterns. From the construction of the DO-W datasets, at the end of this paper, we take the DO-W identification results of two famous gardens as examples, in order to clarify the types as well as distribution of DO-Ws in a certain garden and then discuss the internal relation between their types and the garden space according to the identification results, which could assist in conducting more convenient and efficient traditional-style garden design in the modern society.
The deep learning architecture based on CSV-Net for the DO-Ws of Suzhou traditional gardens will, firstly, provide help for the teaching and researching activities in landscape architecture design; secondly, the datasets about DO-Ws can boost the modern application of DO-W art [11,30]. Thirdly, the method of CSV-Net provides a good way to learn the elements in traditional gardens, which ultimately has sustainable value in inheriting as well as propagating the garden art.

Experiment Scenes
In order to ensure a sufficient image quantity of DO-W patterns, 15 classical gardens in Suzhou, such as the Humble Administrator's Garden, the Lingering Garden and the Master of the Nets Garden, were visited (Figure 1) through field research. We conducted an inch-by-inch search of DO-Ws in these gardens and took photos in order to establish an initial dataset for DO-W identification and classification. The image acquisition work ran from 5 January 2021 to 10 January 2021, and each DO-W was photographed from three angles (front view, left view and right view). In total, 2785 photos with a size of 2880 × 2160 pixels were collected, of which 2520 were valid. There were 501 DO-W photos taken in the Humble Administrator's Garden on 5 January; on 6 January, 213 photos were taken in the Lion Forest Garden, 21 photos were taken in the Half Garden, 102 photos were taken in the Couple's Garden and 63 photos were taken in the Garden of Cultivation; on 7 January, 165 photos were taken in the Lingering Garden, 153 photos were taken in the Xiyuan Temple, 75 photos were taken in the Mountain Villa with Embracing Beauty and 127 photos were taken in the Garden of Harmony; on 8 January, 351 photos were taken in Tiger Hill and 186 photos were taken in the Master of the Nets Garden; on 9 January, 279 photos were taken in Yan's Garden in Mudu Town, Suzhou, 90 photos were taken in the Hongyin Mountain Villa and 351 photos were taken in the Canglang Pavilion; and on 10 January, 108 photos were taken in the Retreat and Reflection Garden in Tongli Town, Suzhou. To facilitate the establishment of the DO-W pattern datasets, each class of DO-W photographed in this study has more than 100 sets of images, which provides the basic amount of data needed for model training.
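As a quick consistency check on the figures above, the per-garden counts can be tallied and compared with the reported totals. This is only an arithmetic sketch; the dictionary keys are the garden names as given in the text, and the variable names are illustrative.

```python
# Photo counts per garden, copied from the field-survey description above.
photos_per_garden = {
    "Humble Administrator's Garden": 501,
    "Lion Forest Garden": 213,
    "Half Garden": 21,
    "Couple's Garden": 102,
    "Garden of Cultivation": 63,
    "Lingering Garden": 165,
    "Xiyuan Temple": 153,
    "Mountain Villa with Embracing Beauty": 75,
    "Garden of Harmony": 127,
    "Tiger Hill": 351,
    "Master of the Nets Garden": 186,
    "Yan's Garden": 279,
    "Hongyin Mountain Villa": 90,
    "Canglang Pavilion": 351,
    "Retreat and Reflection Garden": 108,
}

total_photos = sum(photos_per_garden.values())
valid_photos = 2520                      # number of valid photos stated in the text
discarded_photos = total_photos - valid_photos

print(f"gardens: {len(photos_per_garden)}")
print(f"total photos: {total_photos}")
print(f"discarded photos: {discarded_photos}")
```

The per-garden counts add up to the 2785 photos reported for the 15 gardens.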

Image Sample Labels
There are many types of DO-W patterns, and most of them are made up of various auspicious symbols' permutations and combinations. Auspicious culture is an indispensable part of Chinese traditional culture, which reflects the ancient Chinese people's worship of nature and non-natural forces when productivity and living standards were undeveloped. They used these auspicious symbols drawn from nature, legends, or other ways to pray for good luck and exorcise misery [31].
As DO-Ws in traditional gardens have important protection and research value, there are many ways to classify them. For example, according to whether the lines of the window bars are straight or not, the patterns can be divided into soft and hard patterns [32]. In addition, according to the content of the pattern, DO-Ws can be divided into flower pattern, bird pattern, insect pattern and fish pattern DO-Ws, etc., following the classification in Lin-di Cao's Illustrated Suzhou Garden Flower Window. The field investigation showed that not every Suzhou traditional garden has all five types of DO-W used in this study, and that the proportions of the different types vary from garden to garden. In most of the gardens investigated, plant pattern DO-Ws accounted for the largest proportion, followed by utensil patterns, animal patterns and geometric patterns. Accordingly, this study divided over 1000 windows into the following five types: plant pattern DO-Ws (529 sets), utensil pattern DO-Ws (383 sets), animal pattern DO-Ws (327 sets), geometric pattern DO-Ws (295 sets) and character pattern DO-Ws (256 sets). The application of the different types of DO-W in all the Suzhou gardens investigated is summarized in Table 1.

Table 1. Types of main DO-Ws in each garden (statistics by the author).
Themes of DO-Ws | The Name of Garden and Percentage of Main Types of DO-Ws
Plant patterns accounted for the largest proportion

Data Pre-Processing and Enhancement
In the process of field shooting, not all DO-Ws had the proper conditions to take perfect pictures from three different angles. Some DO-Ws are unavoidably covered by plants and buildings. Moreover, there are also special places where the DO-Ws are too high to shoot. In the process of image pre-processing, we firstly needed to pick these special photos out and then process the temporarily unavailable pictures in the database with Adobe Photoshop (PS) software to make the image samples look complete and available.
When one of the views of the DO-W was complete but the other views were absent, we usually needed to use the function of "perspective deformation" in PS to stretch the perspective. As shown in Figure 2, we can flip the DO-W samples with the front view to obtain the left and right views with the help of perspective deformation.
In order to compensate for the impact of an uneven sample distribution on the model recognition performance, and to avoid overfitting the network, this paper performed enhancement processing on the small number of sample data before training. The enhancement method used in this paper mainly includes 5 steps: (1) Random cropping: randomly crop a rectangular region whose aspect ratio is randomly sampled in [3/4, 4/3] and whose area is randomly sampled in [8%, 100%] of the original image, and then resize the cropped region into a 448 × 448 square image. […] of linear overlay between any two images and their corresponding labels. This is conducted as follows:

$$\tilde{x} = \lambda x_a + (1 - \lambda) x_b, \qquad \tilde{y} = \lambda y_a + (1 - \lambda) y_b \tag{1}$$

where x_a and x_b are the original input images, y_a and y_b are the corresponding labels of the images and λ is a random number drawn from the Beta(α, α) distribution with the value range of [0, 1]. The above enhancement steps improve the generalization of the network architecture and the robustness of the model, and all the images are randomly assigned to the network training after the above pre-processing.
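The random cropping step and the linear overlay of two samples can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the aspect ratio is sampled uniformly, images are represented as flat lists of pixel values, and the value α = 0.2 is chosen only for the demonstration because the text does not report it.

```python
import math
import random


def sample_crop_box(width, height, rng, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), max_tries=10):
    """Sample (left, top, w, h) of a crop whose area fraction and aspect ratio
    fall in the ranges given in the text. The crop would then be resized to 448 x 448."""
    area = width * height
    for _ in range(max_tries):
        target_area = area * rng.uniform(*scale)
        aspect = rng.uniform(*ratio)          # assumption: uniform sampling of the ratio
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    return 0, 0, width, height                # fallback: keep the whole image


def mixup(x_a, y_a, x_b, y_b, alpha, rng):
    """Linear overlay of two images and their one-hot labels with lambda ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed_x = [lam * pa + (1 - lam) * pb for pa, pb in zip(x_a, x_b)]
    mixed_y = [lam * la + (1 - lam) * lb for la, lb in zip(y_a, y_b)]
    return mixed_x, mixed_y, lam


rng = random.Random(0)
crop_box = sample_crop_box(2880, 2160, rng)   # photo size used in this study
mixed_x, mixed_y, lam = mixup(
    [0.0, 1.0], [1, 0, 0, 0, 0],              # toy two-pixel "image" of class 0
    [1.0, 0.0], [0, 1, 0, 0, 0],              # toy two-pixel "image" of class 1
    alpha=0.2, rng=rng,
)
print(crop_box, mixed_y)
```

Because the mixed label is a convex combination of two one-hot labels, it still sums to one and can be used directly with a cross-entropy loss.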

Proposed Method
Because relevant data are scarce, the DO-W classification problem is hard to handle with current CNN methods. Given the multiple dimensions and perspectives involved in DO-W recognition, introducing a new architecture is of great significance for extracting more granular information from raw, unprocessed images and, finally, for achieving higher accuracy. The crucial issue is how to train a multi-layer neural network structure with better time efficiency while compressing its large-scale parameters through deep learning techniques and optimization tricks. To solve these issues, this paper proposes CSV-Net. The novel architecture consists of three modules: the CSPNet backbone network, the Soft-VLAD module and the probability classifier, as shown in Figure 3.

Backbone Network
In this process, a relatively robust network is used because the discriminative regions in DO-W categories are highly localized. By constraining the size and stride of the convolutional filters and pooling kernels, abundant features can be extracted with small-scale parameters. The truncated cross stage partial network (CSPNet) [33] acts as the backbone network module and extracts coarse granulometric features; it improves the learning ability of classical coarse-grained CNNs and maintains sufficient accuracy while reducing the weight of the model. Taking ResNet [28] as the basis of the backbone network, CSPNet achieves a richer combination of gradients by switching the concatenation and transition steps. As a result, the inference speed and accuracy are greatly improved while the amount of calculation is reduced. After pre-training on the ImageNet dataset, CSPNet offers multi-scale coarse granulometric features for the fine-grained classification task. It is worth noting that the input image size is 448 × 448 rather than the default 224 × 224. The detailed architecture of CSPNet contains three parts: an input module, four CSP stage modules and a final pooling layer. The input module is composed of a 7 × 7 convolutional layer with a stride of 2 and 64 output channels, followed by a maximum pooling layer with a 2 × 2 kernel and a stride of 2. Its output passes through the inter CSP stage, which consists of n partial residual blocks and a partial transition layer. In the inter CSP stage, the base layer's feature maps are divided into two parallel paths.
Among these paths, path b first passes through a convolutional layer (kernel size 1 × 1, stride 1, 64 output channels) and then through several residual blocks, as in ResNet or ResNeXt. Each partial residual block in turn has two small paths: path 1 passes through three convolutional layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 and 128 output channels, while path 2 carries the block input. The output of a residual block is obtained directly by summing the input of path 2 with the result of path 1. For the remaining residual blocks, the output of path b results from the same computational process. Similarly, path c goes through a 1 × 1 convolutional layer with a stride of 1 and 64 output channels. Its output is spliced directly with the output of path b and passed through a 1 × 1 convolutional layer, which yields an output of 256 channels. The remaining three CSP stages differ from the inter CSP stage in one respect: for CSP stages 1~3, the input first passes through a 3 × 3 down-sampling layer to obtain the output of path a. Then, the output of path a is passed through path b and path c, and the subsequent parts are the same as in the inter CSP stage. All outputs' dimensions are doubled. The formulae in each CSP stage are shown below:

$$
\begin{aligned}
x_a &= f_{3\times3}(x),\\
x_b &= f^{b}_{1\times1}(x_a), \qquad x_c = f^{c}_{1\times1}(x_a),\\
g(x_b)_0 &= x_b,\\
g(x_b)_i &= g(x_b)_{i-1} + f_{Res(X)}\bigl(g(x_b)_{i-1}\bigr), \quad i = 1, \dots, n,\\
y &= f_{1\times1}\bigl(g(x_b)_n \oplus x_c\bigr)
\end{aligned}
\tag{2}
$$

where x is the input of the CSP stage, f_{3×3}() represents the down-sampling layer, x_a is the output of the down-sampling layer, f^b_{1×1}() and f^c_{1×1}() represent the convolutional layers through which path b and path c pass, x_b and x_c represent the outputs of f^b_{1×1}() and f^c_{1×1}(), g(x_b)_0 represents the input of the 0th residual block, g(x_b)_i and g(x_b)_{i−1} represent the output and input of the i-th residual block, f_{Res(X)}() represents the residual calculation of the input, f_{1×1}() is the final 1 × 1 convolutional layer and y is the output of the stage. (A ⊕ B) indicates that A and B are spliced in the channel dimension.
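The channel and resolution bookkeeping described above can be traced with a few lines of Python. This is a rough sketch under stated assumptions: the 3 × 3 down-sampling layers are assumed to use a stride of 2, padding is assumed to make every stride-2 layer halve the spatial size exactly, and the function name is hypothetical.

```python
def trace_backbone_shapes(input_size=448, first_stage_channels=256, num_stages=4):
    """Return (layer name, spatial size, channels) for the backbone layout in the text."""
    shapes = []
    size = input_size // 2                    # 7 x 7 convolution, stride 2, 64 channels
    shapes.append(("input conv", size, 64))
    size //= 2                                # 2 x 2 max pooling, stride 2
    shapes.append(("max pool", size, 64))
    channels = first_stage_channels           # first (inter) CSP stage: 256 output channels
    shapes.append(("CSP stage 0", size, channels))
    for stage in range(1, num_stages):
        size //= 2                            # assumed stride-2 3 x 3 down-sampling layer
        channels *= 2                         # output dimensions double in each later stage
        shapes.append((f"CSP stage {stage}", size, channels))
    return shapes


backbone_shapes = trace_backbone_shapes()
for name, size, channels in backbone_shapes:
    print(f"{name:12s} {size:4d} x {size:<4d} {channels:5d} channels")
```

Starting from 256 channels and doubling three times gives 2048 channels after the last stage, which matches the 2048-dimensional vector produced by the final pooling layer.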
The number of residual blocks in the four CSP stages is n = {3, 3, 5, 2}. Then, the final pooling layer uses global average pooling (GAP) to form the output of the backbone network, which represents the coarse granulometric feature maps F as a 2048-dimensional vector:

$$F = H_{csp}\bigl((x, y)\bigr) \tag{3}$$

where H_{csp}() denotes all layers of the backbone network, and (x, y) is the input image. Additionally, the Mish function is selected as the activation function, with the expression

$$\mathrm{Mish}(z) = z \cdot \tanh\bigl(\ln(1 + e^{z})\bigr) \tag{4}$$
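For reference, the Mish activation named above has the standard definition Mish(z) = z · tanh(ln(1 + e^z)). A minimal scalar sketch follows; the numerically stable softplus helper is an implementation choice made here and is not something the paper specifies.

```python
import math


def softplus(z):
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def mish(z):
    """Mish activation: z * tanh(softplus(z))."""
    return z * math.tanh(softplus(z))


for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"mish({value:+.1f}) = {mish(value):+.4f}")
```

Mish is smooth, passes through zero at the origin, approaches the identity for large positive inputs and tends to zero for large negative inputs.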

Soft-VLAD Module
Through analysis of the differences in the feature maps for multi-class categories, we can learn that either using each branched feature map alone or roughly integrating them without reasonable discrimination is always unsatisfactory for DO-W images, because some of the supportive information is implied in the coupling association of different features, which is of great help in classifying raw images. Based on this, we constructed an aggregation strategy through which the internal relationship of different granulometric features can be analyzed. By concatenating all of them across different scales and dimensionalities, an aggregation vector with a better representation can be acquired. However, it is hard to explore context characteristics and associations among different feature maps, because the vector of locally aggregated descriptors (VLAD) is a basic vector used to capture statistical information in a low-dimensional space. Thus, a trainable soft connect layer is proposed on the basis of VLAD to store the sum of residuals between all features and their corresponding cluster centers; it then aggregates each local feature at the nearest cluster center, thereby fusing the different feature spaces into a unified meta-space. Using the output feature maps F as the input vectors, the soft connect layer consists of a 1 × 1 convolutional operation and a k-means clustering algorithm, which, given multidimensional feature descriptors, computes a single D × K dimensional output to record the cluster centers c_i. The soft connect output is written as a_i according to the following equation:

$$a_i(x_l) = \frac{e^{\,w_i^{T} x_l + b_i}}{\sum_{i'} e^{\,w_{i'}^{T} x_l + b_{i'}}} \tag{5}$$

where l corresponds to the dimension total of the feature vector outputted by the multibranch subnetworks, i represents the serial number of cluster centers, and i′ represents the serial number of non-cluster centers. w_i, b_i, w_{i′} and b_{i′} represent the corresponding parameters that are updated in the training stage.
Following this soft connect layer, a VLAD pooling layer combines intra-normalization and L2 normalization operations to unify the vector dimensionality. Finally, a fully connected (FC) layer is added to complete the Soft-VLAD module, which outputs a 256-dimensional tensor V as follows:

$$V(j, i) = \sum_{l} a_i(x_l)\,\bigl(x_l(j) - c_i(j)\bigr) \tag{6}$$

where j indexes the feature dimension. With the Soft-VLAD module, the first-order statistics of residuals between local feature x_l and cluster c_i in different parts of the descriptor space are associated by the soft connect weights, which achieves better flexibility and nonlinear expressive ability.
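The soft assignment and residual aggregation described in this section can be illustrated with a small pure-Python sketch. It follows the general NetVLAD-style formulation (softmax assignment over clusters, weighted residual sums, intra-normalization, then L2 normalization) and is an assumption-laden illustration, not the authors' code: the function names are hypothetical, the toy cluster centers and weights are fixed rather than learned, and the final fully connected layer is omitted.

```python
import math


def soft_assign(x, weights, biases):
    """Softmax over clusters of the scores w_i . x + b_i."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(weights, biases)]
    top = max(logits)                                   # subtract the max for stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vlad(features, centers, weights, biases):
    """Aggregate local features into a flattened, normalized K x D descriptor."""
    num_clusters, dim = len(centers), len(centers[0])
    residuals = [[0.0] * dim for _ in range(num_clusters)]
    for x in features:
        a = soft_assign(x, weights, biases)
        for i in range(num_clusters):
            for j in range(dim):
                residuals[i][j] += a[i] * (x[j] - centers[i][j])
    for i in range(num_clusters):                       # intra-normalization per cluster
        norm = math.sqrt(sum(v * v for v in residuals[i]))
        if norm > 0:
            residuals[i] = [v / norm for v in residuals[i]]
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))          # global L2 normalization
    return [v / norm for v in flat] if norm > 0 else flat


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # three toy local features (D = 2)
centers = [[0.0, 0.0], [1.0, 1.0]]                      # two toy cluster centers (K = 2)
weights = [[1.0, 0.0], [0.0, 1.0]]                      # stand-ins for trainable w_i
biases = [0.0, 0.0]                                     # stand-ins for trainable b_i

assignment = soft_assign(features[0], weights, biases)
descriptor = soft_vlad(features, centers, weights, biases)
print(assignment, descriptor)
```

Because the assignment is a softmax rather than a hard nearest-center choice, every operation is differentiable and the parameters can be trained end to end.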

Probability Classifier
In this paper, we propose a probability classifier containing three layers: a 128-dimensional fully connected (FC) layer, a dropout layer with a dropout rate of 0.5 and an average pooling layer. Next, the final predicted values are formed by letting the compact high-dimensional descriptor pass through a SoftMax layer. We take the cross-entropy loss as the loss function and, in order to reduce the risk of overfitting, apply the label smoothing technique, which replaces the original labels with new smoothed labels as follows:

$$\tilde{y} = (1 - \varepsilon)\, y + \varepsilon u \tag{7}$$

where ỹ is the smoothed label, y is the sample label after the data processing step, ε is the smoothing factor, and u is the reciprocal of the number of categories. By suppressing the output difference between positive and negative samples, label smoothing regularizes the classification probabilities produced by the SoftMax activation function and ultimately enables the network to generalize better.
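A small worked example of the label smoothing step combined with the cross-entropy loss is given below. It is a sketch under assumptions: u is taken as the uniform value 1/K over the K = 5 DO-W categories, and the smoothing factor ε = 0.1 and the toy logits are illustrative values that the text does not report.

```python
import math


def smooth_labels(one_hot, epsilon):
    """Label smoothing: (1 - epsilon) * y + epsilon * u with u = 1 / K."""
    num_classes = len(one_hot)
    return [(1 - epsilon) * y + epsilon / num_classes for y in one_hot]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy between a predicted distribution and a (possibly soft) target."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))


one_hot = [1, 0, 0, 0, 0]                       # e.g. a "plant pattern" sample
smoothed = smooth_labels(one_hot, epsilon=0.1)  # epsilon is an assumed value
probs = softmax([4.0, 1.0, 0.5, 0.2, 0.1])      # toy network outputs
loss_hard = cross_entropy(probs, one_hot)
loss_smooth = cross_entropy(probs, smoothed)
print(smoothed, round(loss_hard, 4), round(loss_smooth, 4))
```

With smoothed targets, a confident prediction is penalized slightly more than with one-hot targets, which discourages the network from pushing the SoftMax outputs to extreme values.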

Training Parameter Settings
Before training the deep neural network, we need to ensure the performance of the prediction model by initializing the hyperparameters of the model. In this paper, we use the Ranger optimizer to optimize the parameters of the entire network. The Ranger optimizer can be regarded as RAdam with the addition of Lookahead. RAdam is a modification of the Adam optimization algorithm that, based on the potential divergence of the variance, dynamically turns the adaptive learning rate on or off and thereby provides a dynamic warm-up without the need for adjustable parameters. Lookahead, on the other hand, can be regarded as an external attachment to the optimizer that saves two sets of weights (fast and slow weights). When the fast weights have been updated k times, the slow weights are updated one step in the direction of the current fast weights, which can effectively reduce the variance and achieve a faster convergence.
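The fast/slow weight mechanism of Lookahead can be sketched on a toy problem. The code below is an illustration under stated assumptions rather than the Ranger implementation: plain gradient descent stands in for RAdam as the inner optimizer, the interpolation factor alpha = 0.5 and the inner learning rate are assumed values, and the function names are hypothetical.

```python
def lookahead_minimize(initial, grad_fn, inner_lr=0.1, k=6, alpha=0.5, steps=60):
    """Minimize a function with a Lookahead wrapper around plain gradient descent."""
    slow = list(initial)
    fast = list(initial)
    for step in range(1, steps + 1):
        grads = grad_fn(fast)
        fast = [w - inner_lr * g for w, g in zip(fast, grads)]   # fast-weight update
        if step % k == 0:
            # After k fast updates, move the slow weights one step toward the fast weights
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)                                    # restart fast weights from slow
    return slow


def quadratic_grad(weights):
    """Gradient of f(w) = sum(w_i ** 2)."""
    return [2.0 * w for w in weights]


final_weights = lookahead_minimize([1.0, -2.0], quadratic_grad)
print(final_weights)
```

The slow weights only move at every k-th step, so they average out part of the noise in the fast trajectory; here k = 6 matches the setting reported for training.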
For the parameters of the optimizer, we used the default settings, with the initial learning rate set to 1e−3 and k set to 6, and we loaded CSPNet weights pre-trained on ImageNet (77.9% top-1 accuracy). The rest of the network parameters were initialized with the "He" method. Throughout the training process, the batch size was set to 112 and the training period was set to 100 epochs, with a cosine annealing learning rate schedule with restarts. The learning rate was first held at 1e−3 for 30 epochs; cosine annealing started at the 31st epoch with a minimum learning rate of 1e−6, and at each restart the learning rate was reset to 70% of the initial learning rate of the previous cycle and then annealed to 1e−6 again. The cosine annealing step (the factor by which the cycle length grows) was set to 2, and the length of the base cycle was 10, which means that the learning rate was restarted at the 41st and 61st epochs.
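The learning rate schedule described above can be written out explicitly. The sketch below encodes one reading of the text and should be treated as an assumption: the rate is constant for the first 30 epochs, the first annealing cycle lasts 10 epochs, each later cycle is twice as long, and each restart begins at 70% of the previous cycle's starting rate. The function name and parameter names are hypothetical.

```python
import math


def learning_rate(epoch, base_lr=1e-3, min_lr=1e-6, constant_epochs=30,
                  first_cycle=10, cycle_mult=2, restart_decay=0.7):
    """Learning rate for a 1-indexed epoch under the assumed schedule."""
    if epoch <= constant_epochs:
        return base_lr
    position = epoch - constant_epochs - 1      # 0-indexed position in the annealing phase
    cycle_len, peak = first_cycle, base_lr
    while position >= cycle_len:                # skip over completed cycles
        position -= cycle_len
        cycle_len *= cycle_mult
        peak *= restart_decay
    cosine = 0.5 * (1.0 + math.cos(math.pi * position / cycle_len))
    return min_lr + (peak - min_lr) * cosine


schedule = {epoch: learning_rate(epoch) for epoch in range(1, 101)}
restart_epochs = [e for e in range(32, 101) if schedule[e] > schedule[e - 1]]
print(restart_epochs)
print(schedule[31], schedule[41], schedule[61], schedule[100])
```

Under this reading the cycles cover epochs 31 to 40, 41 to 60 and 61 to 100, so the only restarts fall at the 41st and 61st epochs and the schedule ends exactly at epoch 100.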

Datasets and Setup
In this study, the DO-W datasets were labeled as five categories including plant patterns, utensil patterns, animal patterns, geometric patterns and character patterns.
There are many DO-Ws with plant patterns, including more than ten types of plant themes such as peony, Ganoderma, begonia and lotus. Most of them have auspicious or religious meanings. The utensil patterns involve many elements of ancient Chinese life. In addition to festive or exorcising utensils with auspicious meanings, there are also many DO-W patterns with figurative or abstract meanings related to the graceful life of the ancient literati. The animal patterns include two forms: mythical creatures and auspicious creatures. These patterns use mythical or realistic animals as the theme, such as a dragon, bat, scalewing or goldfish, all of which have distinctive auspicious meanings in China. The geometric patterns of DO-Ws include four types: eight diagrams, ice crack, sun and swastika. This group is characterized by using specific geometric symbols as the center of the composition or arranging these symbols in an orderly manner, which gives an obvious theme and rhythmic beauty. Finally, the character patterns include "Fu", "Shou", "Xi" and "Yue", as well as the Latin Cross, which is unique to the Retreat and Reflection Garden. Examples of DO-W patterns in Suzhou traditional gardens are shown in Figure 4. To ensure the reliability of the DO-W recognition model during training, we randomly selected 15% of the samples as the test set and the remaining 85% as the training set. Moreover, we built a cloud server platform running Ubuntu 20.04 LTS, equipped with dual Intel Xeon E5-2690 v3 @ 2.6 GHz × 48 processors, 128 GB of RAM, 2 × 2 TB SSDs, and 7 NVIDIA Tesla P40 GPUs with 168 GB of computational (GPU) memory in total. All the code in this paper is based on the deep learning framework PyTorch 1.7.1.
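The random 15%/85% split mentioned above can be sketched as follows. This is an illustrative example only: the text does not state whether the split was made over photos or over sets, nor which random seed was used, so the 2520 sample identifiers, the seed and the function name below are all hypothetical.

```python
import random


def split_dataset(samples, test_fraction=0.15, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


sample_ids = [f"photo_{index:04d}" for index in range(2520)]   # hypothetical identifiers
train_ids, test_ids = split_dataset(sample_ids)
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same test set.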

Experiment Results
The evaluation metrics in this paper include accuracy, which measures the overall performance of the model on the dataset; the F1-score (F1), which measures the overall robustness of the model; and the time metric (TSM), which indicates the average single-sample recognition time. To verify that our method is well suited to recognizing DO-W images, we set up comparative experiments with other classical CNN methods, which have already achieved remarkable success in other fields, to illustrate the better performance of CSV-Net. Then, 11 fine-grained methods are used to further explain the effectiveness of multi-stream cross-level fusion in our method for fine-grained classification. Table 2 lists the average values of the accuracy, F1-score and TSM metrics of all the models. As shown in Table 2, compared with other state-of-the-art CNN models, our proposed model achieved comparable or better results for the image classification of DO-W identification on different indicators. Firstly, compared to the other methods, our method raises the accuracy to 87.9%, an improvement of at least 5.5%, which illustrates the advantage of Soft-VLAD aggregation in improving the efficiency and characterization ability of the best multi-layer CNN network. Similarly, our end-to-end method consistently shows a better performance on both the F1-score and TSM, outperforming state-of-the-art methods, and has a relatively compact multi-layer network architecture without extra annotation. This clearly indicates that CSV-Net has better learning, recognition and representation capabilities for DO-Ws in actual application scenes. Figure 5 provides more comparisons of precision-recall curves of different categories, which can further illustrate the feasibility and effectiveness of CSV-Net.
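The accuracy and F1-score metrics named above can be computed as in the following sketch. The macro-averaged F1 used here is an assumption, since the text does not say how the per-class scores are averaged, and the toy labels and predictions are invented purely for illustration and are not experimental results.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1-scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy labels for three of the five DO-W categories (illustrative only)
y_true = ["plant", "plant", "animal", "animal", "utensil", "utensil"]
y_pred = ["plant", "animal", "animal", "animal", "utensil", "plant"]

toy_accuracy = accuracy(y_true, y_pred)
toy_f1 = macro_f1(y_true, y_pred, ["plant", "animal", "utensil"])
print(round(toy_accuracy, 4), round(toy_f1, 4))
```

Macro averaging weights every category equally, so a model that neglects a small class is penalized even when its overall accuracy remains high.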
Based on the average precision and recall values of each model over all DO-W categories depicted in the figure, our proposed method obtains the highest average values (90.83% and 91.78%, respectively). The results show that the proposed method has a great influence on the learning of advanced features during network convergence, which further indicates that our model is more stable than the other methods and more suitable for the DO-W recognition task. To test the convergence of the model, we also conducted an ablation experiment and compared the change in the loss function value for each module by controlling variables. The loss function estimates the degree of inconsistency between the predicted result of a model and the true label: the smaller the loss, the better the robustness of the model. As shown in Figure 6, we evaluate the proposed method by comparing its loss function curve with those of other methods. The loss function values of all models show a decreasing trend on the DO-W dataset, and after 70 epochs they all tend to converge and stabilize. The proposed CSV-Net achieves the best performance in terms of the loss value change. Compared with other models, its overall training loss drops to about 0.7662, the smallest average, and its convergence is fast and stable, with the predicted value coming close to the true value at the 19th epoch. This shows that CSV-Net can not only improve the representativeness of the multi-layer structure for recognizing DO-W patterns but can also enhance the training efficiency and robustness of the entire model. Additionally, we provide a detailed explanation to study the impact of some important parameters and different components of CSV-Net. As shown in Figure 7, we enumerate the accuracy results of each DO-W category on five comparison models in the experiments.
The accuracy curve of CSV-Net (red line) is relatively smooth and stable. By mining complementary information from multi-layer feature maps, our method gains additional discriminative power in the learned features, which improves the overall performance of DO-W recognition. This means that our model is highly capable of distinguishing inter-class differences among the image samples of each DO-W type, which helps to make DO-W recognition in Suzhou traditional gardens more intelligent.
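The three evaluation metrics described above (accuracy, F1-score and average single-sample recognition time) can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper: the function names are hypothetical, and macro averaging of the per-class F1-scores is an assumption, since the averaging scheme is not stated here.

```python
import time

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

def mean_time_per_sample(predict, samples):
    """Average wall-clock seconds that `predict` spends on one sample."""
    start = time.perf_counter()
    for s in samples:
        predict(s)
    return (time.perf_counter() - start) / len(samples)

# Toy labels for illustration only; not results from the paper.
y_true = ["plant", "plant", "animal", "geometric"]
y_pred = ["plant", "animal", "animal", "geometric"]
print(accuracy(y_true, y_pred))   # 0.75
print(macro_f1(y_true, y_pred))
```

In practice, `predict` would wrap the forward pass of the trained model, and the labels would come from the held-out test split.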

The Sustainable Application of DO-W Recognition
"The top four greatest classical gardens of Suzhou" are the Canglang Pavilion, the Lion Forest Garden, the Humble Administrator's Garden and the Lingering Garden [34], which were built in four different dynasties: the Song (960-1279), Yuan (1271-1368), Ming (1368-1644) and Qing (1636-1912) dynasties, respectively. They have been successively enrolled in the World Heritage List as outstanding representatives of Suzhou classical gardens. This discussion takes two of them as its objects: the Canglang Pavilion, the earliest built of the four, and the Humble Administrator's Garden, the largest existing garden. We first analyze the positions of the DO-Ws in these two gardens and then identify and classify their types using CSV-Net. On this basis, we discuss the relation between DO-W patterns and the garden spaces in which they are located, which shows how the experimental results can be applied sustainably to the rules of garden design.

DO-W Recognition in the Canglang Pavilion Relying on the CSV-Net
The Canglang Pavilion is the oldest of the existing gardens in Suzhou. Shun-qin Su (1008-1048), the owner of the garden, built the pavilion near the river in Suzhou City (called Pingjiang City in the Northern Song dynasty), in a setting full of natural beauty. In the Qing dynasty, the pavilion named Canglang was relocated to an earth mound, and bridges, corridors and windowed verandas were successively constructed near the pond. The garden covers an area of about 16 mu (the mu is a Chinese unit of land measurement, commonly 666.7 square meters), and its layout is dominated by mountains, with the main pond outside the garden. The mountains, rocks, trees, pavilions, storied buildings and towers in the garden are all delicate and elegant. On the northeast side of the garden, a double-sided corridor uses DO-Ws to connect the garden with the landscape scenery outside, which has become a model of view borrowing in Suzhou classical gardens.
The intelligent identification database established for the Canglang Pavilion shows that the garden contains five types of DO-W patterns: plant pattern windows are the most numerous, animal pattern windows are the fewest, and the rest are evenly distributed. In terms of spatial layout, the DO-Ws in the Canglang Pavilion are mainly concentrated in the north area along the river and in the west part of the garden's landscape space, as shown in Figure 8. The figure shows that the Canglang Pavilion takes a circular corridor as its main sightseeing route, around which DO-Ws are abundant. The DO-W patterns here also follow a certain regularity: a combination of the five pattern types decorates important landscape interfaces, enriching the diversity and interest of the garden. In the garden spaces created mainly by natural mountains and forests, the DO-W patterns are based on plants, in order to extend the natural and wild atmosphere of the garden as well as the artistic conception of plant culture. In the northeast corner of the garden, the double-sided corridor near the water is one of the best examples of landscape borrowing, and the layout of its DO-Ws also shows ingenuity. A total of 25 DO-Ws are neatly arranged along the twists and turns of the corridor, connecting the landscape inside and outside the garden. This double-sided corridor lies between the earthen hill inside the garden and the clear pond outside it, so visitors here can experience the joy of nature; this is why it is considered the best scenic spot in the garden. The DO-Ws of the double-sided corridor cover the five types of DO-W patterns found in classical gardens: plant, utensil, animal, geometric and character patterns.
They include Ganoderma patterns, begonia patterns, kaki calyx patterns, plum blossom patterns, flower basket patterns, ruyi patterns, lantern patterns, Kui Long patterns, swastika patterns, ice crack patterns and shou character patterns, as shown in Figure 9. Together with the double-sided corridor, the DO-W patterns here express the beautiful scenery and the natural, livable artistic conception of the Canglang Pavilion. This approach of shaping the main interface of the garden through various types of DO-W is worth learning from and passing on. The corridor for viewing the natural scenery in the west of the garden is decorated purely with plant patterns, which strengthens the natural atmosphere of the space and extends the artistic conception of the garden. On the west side of the garden, the connecting corridor centered on the Imperial Stele Pavilion surrounds the only water space in the garden and is decorated with more than 10 DO-Ws of plant symbols with exquisite patterns and different shapes, as shown in Figure 10, including begonia, jasmine, autumn leaf, Ganoderma, lotus and Malva patterns. While walking along the corridor, visitors are surrounded by a natural and delicate landscape on one side and by exquisite and rustic plant-pattern DO-Ws on the other, which both enriches their experience of the veranda and extends the natural atmosphere and botanical culture that the space is intended to convey.

DO-W Recognition in the Humble Administrator's Garden
The Humble Administrator's Garden, the largest existing classical garden in Suzhou, was built in the middle of the Ming dynasty by its owner, Xian-chen Wang. The garden creates an environment centered on water and surrounded by forests, reflecting a gardening artistic conception of "the humble people take watering gardens and growing vegetables as their political affairs". After repeated changes of ownership, the garden now consists of three parts, the central, western and eastern parts, with a total area of about 4.1 hm². The eastern part of the garden is dominated by natural scenery. In addition to flourishing flowers and trees, it features earthen hills and a water system, which are quite natural and wild. The central part is the essence of the whole garden, with many well-proportioned and exquisitely shaped buildings arranged around the central pool, displaying superb landscaping techniques and rich landscape levels. The western part of the garden, dominated by flat hills and grasslands, has a few buildings arranged here and there, presenting a quaint and natural appearance. These three parts of the Humble Administrator's Garden each have a distinctive style and are a precious heritage of Chinese garden art [35].
There are many DO-Ws in the Humble Administrator's Garden, and their types are extremely rich. They cover four types of DO-W patterns: plant, utensil, animal and geometric patterns. Among them, plant patterns are the most common, followed by utensil patterns. The distribution of DO-Ws in the Humble Administrator's Garden is quite distinctive. To create an atmosphere of natural countryside and to express the scenery of mountains and rivers through natural elements, there are few corridors in the central area. Figure 11 shows that the DO-Ws in the Humble Administrator's Garden are mostly placed in small courtyard spaces. This differs from the Canglang Pavilion, where DO-Ws are arranged around large spaces such as the main landscape of hills and ponds. Owing to its large area, the central part of the Humble Administrator's Garden is centered on water to reflect a natural and wild atmosphere. At the same time, courtyard spaces of varying size and shape are arranged around the water to enrich the garden tour experience. While preserving the natural atmosphere of the large space, the design makes the surrounding courtyard spaces more transparent and exquisite by decorating them with DO-Ws. Because of these various courtyard spaces, the DO-Ws in the Humble Administrator's Garden also connect the sight lines between the inside and outside of each courtyard. Furthermore, the ingenious decorative patterns of the DO-Ws enrich and extend the design context of small spaces and thereby reflect the unique style of each space. Take the DO-Ws in the east corridor of the courtyard called "The Listening to the Rain Pavilion", for instance; the DO-W patterns here are mainly plant patterns and utensil patterns.
Plant symbols include begonia, peony, jasmine, plum blossom and Ganoderma patterns, while the utensil patterns are represented by festive symbols such as lantern, flower basket and vase patterns. The DO-Ws here reflect a stronger flavor of life than those in other areas of the garden. Such a design is perhaps related to the elegant living function of the space and to its proximity to the old residential area, as shown in Figure 12. In addition, the DO-W patterns in the courtyard of "The Malus Spring Dock", next to "The Listening to the Rain Pavilion", are also very distinctive. Figure 13 shows that five of the DO-W patterns are similar and that only the DO-Ws with swastika patterns set in the partition walls of the courtyard differ slightly, which not only conveys auspiciousness but also creates a sense of sequence and rhythmic beauty. Another example is the water corridor on the east side of the western part of this garden. It is a model for water corridor design in Suzhou traditional gardens because of its rich variation in both plan and section. The DO-Ws here are also quite distinctive, including Ganoderma, begonia, Patra leaf, flower basket, ruyi, turtle shell, Kui Long, swastika and shou character patterns. Similar to the DO-Ws of the double-sided corridor in the Canglang Pavilion, this place also combines various DO-W patterns at an important interface of the garden to enrich its diversity and interest and to suggest the spatial boundary. Whether walking through the water corridor or viewing it across the water, visitors can feel the interpenetration of the central and western gardens thanks to these DO-Ws.
The charm of a boundary that only seems to separate the spaces, combined with the dynamic changes in the DO-W patterns, creates a cheerful landscape atmosphere in the water corridor space, as shown in Figure 14.
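The relation between recognized DO-W types and their locations, as discussed for the two gardens above, amounts to cross-tabulating the predicted type of each window against the garden zone in which it was photographed. The sketch below shows one way to do this; the function names are hypothetical, and the records are placeholders for illustration only, not survey data from this study.

```python
from collections import Counter, defaultdict

def tally_by_zone(records):
    """Count predicted DO-W types per garden zone.

    records: iterable of (zone, predicted_type) pairs.
    """
    table = defaultdict(Counter)
    for zone, dow_type in records:
        table[zone][dow_type] += 1
    return table

def dominant_type(table, zone):
    """Most frequent predicted type in a zone and its share of that zone's windows."""
    counts = table[zone]
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())

# Placeholder records (zone, predicted type); illustrative only.
records = [
    ("west corridor", "plant"),
    ("west corridor", "plant"),
    ("west corridor", "plant"),
    ("double-sided corridor", "plant"),
    ("double-sided corridor", "utensil"),
    ("double-sided corridor", "geometric"),
    ("double-sided corridor", "character"),
]
table = tally_by_zone(records)
print(dominant_type(table, "west corridor"))
```

A zone dominated by a single type (such as a purely plant-decorated corridor) shows a share close to 1, whereas a zone that mixes types shows a flatter distribution.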

Conclusions
In this investigation, we presented an effective visual recognition model named CSV-Net for DO-W classification in practical scenes. In detail, the proposed method first employed data augmentation to enlarge the dataset. Secondly, the structure and parameters of the CSPNet backbone network were modified to learn massive feature maps and fine-tuning knowledge. Then, the Soft-VLAD module was proposed to fuse features of different granularity into high-dimensional features. Finally, a probability classifier was used to form the final prediction for distinguishing the DO-W categories, guided by a loss function in the training and testing processes. Several experiments showed that our method performs competitively against the state of the art. The recognition accuracy and F1-score reached up to 87.9% and 0.918, respectively, both of which outperform the compared models, indicating better recognition accuracy and model stability. Moreover, the overall parameters of CSV-Net total only 728 MBytes, achieving a good balance among the model's performance, robustness and complexity. This also illustrates that CSV-Net has further potential for intelligent applications in identifying DO-Ws or other objects in traditional garden practice.
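To make the aggregation step more concrete, the following is a minimal sketch of generic soft-assignment VLAD pooling: each local descriptor is softly assigned to a set of cluster centers, the weighted residuals are summed per center, and the result is normalized. It is not the Soft-VLAD module of CSV-Net itself. The function name, the `alpha` sharpness parameter and the distance-based softmax are assumptions, and in a trained network the centers and assignment weights would be learned rather than fixed.

```python
import math

def soft_vlad(descriptors, centers, alpha=1.0):
    """Aggregate N local D-dim descriptors into one K*D vector by soft-assignment VLAD."""
    K, D = len(centers), len(centers[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # Soft assignment: softmax over negative scaled squared distances to each center.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Accumulate the weighted residual (x - c_k) for every center.
        for k, (w, c) in enumerate(zip(weights, centers)):
            for d in range(D):
                vlad[k][d] += w * (x[d] - c[d])
    # Intra-normalization: L2-normalize each center's residual sum.
    for k in range(K):
        norm = math.sqrt(sum(v * v for v in vlad[k]))
        if norm > 0:
            vlad[k] = [v / norm for v in vlad[k]]
    # Flatten and apply a final L2 normalization.
    flat = [v for row in vlad for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy 2-D descriptors and two centers; illustrative values only.
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
centers = [[0.0, 0.0], [1.0, 1.0]]
pooled = soft_vlad(descriptors, centers)
print(len(pooled))  # K * D = 4
```

The output length is fixed at K times D regardless of how many local descriptors are pooled, which is what lets feature maps of different sizes feed a single classifier.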
Additionally, we used CSV-Net to recognize the DO-Ws in the Canglang Pavilion and the Humble Administrator's Garden and to explore the relation between DO-Ws and the composition of the garden spaces. We found that the arrangement of different types of DO-Ws is closely related to the surrounding natural environment, as well as to the architectural function of the space in which they are located. As a result, intelligent recognition of DO-Ws in traditional gardens based on CSV-Net can replace traditional methods such as measuring or hand drawing, and it is better than current CNN methods such as VGG-19, Inception-v3, ResNet-50 and DenseNet-50 in terms of stability, efficiency and capability. The proposed method shows that datasets can be reused sustainably, and it performs well in extracting design rules from excellent traditional gardens, which can make present-day garden design more efficient and less costly.
In the future, CSV-Net will continue to help us restructure the basic symbol units of DO-Ws by machine learning, in order to generate alternative DO-W designs automatically according to the design requirements of different landscape environments and to better protect and pass on the decorative art of DO-Ws. Moreover, it can also be used to recognize other objects in traditional gardens, such as the shape of rockery, the pattern of paving and the ornaments of buildings, and ultimately to promote the sustainable application of artificial intelligence technology in the field of landscape design as well as the recognition and inheritance of Chinese traditional garden heritage.
Author Contributions: R.Z. conceived and designed the whole structure of the paper and wrote the paper; Y.Z. and J.K. accomplished experimental work and wrote the paper; C.C., X.L. and C.Z. helped to translate part of the paper. All authors have read and agreed to the published version of the manuscript.