Central Courtyard Feature Extraction in Remote Sensing Aerial Images Using Deep Learning: A Case-Study of Iran

Abstract: Central courtyards are primary components of vernacular architecture in Iran. The directions, dimensions, ratios, and other characteristics of central courtyards are critical for studying historical passive cooling and heating solutions. Several studies on central courtyards have compared their features in different cities and climatic zones in Iran. In this study, deep learning methods for object detection and image segmentation are applied to aerial images to extract the features of central courtyards. The case study explores aerial images of nine historical cities in the Bsk, Bsh, Bwk, and Bwh Köppen climate zones. These features were gathered in an extensive dataset with 26,437 samples and 76 geometric and climatic features. The data analysis reveals significant correlations between various features, such as the length and width of courtyards: in all cities, the correlation coefficient between these two characteristics is approximately +0.88. Linear regression models fitted to these data yield equations for each city and climate zone. These equations can be used as proposed design models to assist designers and researchers in predicting and locating the best courtyard houses in Iran's historical regions.


Historical Architecture of Iran
Vernacular architecture, which has developed over time, uses a variety of novel techniques and technologies to address contextual requirements and to improve sustainability [1,2] and urban identity [3]. Moreover, as an attraction of cities [4], historic and vernacular architecture can contribute to economic prosperity [5]. Iran's vernacular architecture has perfected the skill of context adaptation by developing distinct structures in the country's many areas. Despite their fundamental differences, the vernacular designs of these regions have a common philosophical underpinning: sustained contextual adaptability [6,7]. Indeed, by definition, a vernacular structure is constructed by locals, employing traditional methods and locally accessible materials to suit domestic ways of living [8], which is consistent with what is currently known as sustainability. That is why it is critical to protect these structures instead of reconstructing them [9]. The house's cooling and heating systems are primarily passive [10] and respond to varying demands [11], which results in varying architectural approaches in vernacular structures. These strategies are very important to the sustainability of this style of architecture, as studies have shown that the energy consumption of buildings may increase in the future due to future uncertainties [12,13] and that the design should be optimized in all aspects [14].
Iranian architects have developed various solutions and methods from natural characteristics to suit the country's diverse climatic conditions [15]. In traditional Iranian architecture, cost efficiency was among the most critical factors to be considered. Therefore, materials used in desert architecture must be readily accessible, cheap, and economical. Additionally, vernacular architects used locally accessible materials to construct a haven from atmospheric pollutants [16].
A great example of sustainability in Iranian architecture can be found in Yazd, which has the most significant surviving historic urban fabric in the country. Due to Yazd's hot and dry climate, providing occupants with high living standards was a real challenge for architects [17]. The requirement to protect people from the summer heat and scorching sun has significantly impacted the design methods used to construct shade zones. Thus, the urban design was dense, to provide maximum protection from inclement weather [18,19], and central courtyards were used to meet daylight requirements and fit the design to the climate (Figure 1). As an inseparable element of houses in the central plateau of Iran, the central courtyard played a structural role at the macro, middle, and micro scales in municipalities, segments, and structures. Similarly, as a repeated element, it presented the vacuum concept as a fundamental component of a coherent urban development model. Central courtyards (vacuum spatial grid) serve as an organizer in the microsystem of structures, particularly small classical houses in Iran (Figure 2). Because spaces are organized around the courtyard, the house's spatial organization is disrupted when the courtyard is excluded [20]. Houses with courtyards have been utilized and improved for an extended period because they are advantageous for building in hot-arid climates. Most ancient civilizations in central Iran and several other arid regions of the Middle East used this technique. In hot-arid regions and deserts, central yards effectively protect people from severe environmental conditions. In desert regions, rooms that face the yard are often sheltered from severe heat, winter cold, windstorms, and sand [19,21,22]. As part of the house's structure, the courtyard has a consistent structure and improves and complements the space arrangement.
Understanding the relationship between components in the organizational structures of Iranian homes is critical to comprehend their interactions. Thus, the connection between the components and the whole in any spatial organization system is determined by various variables, including form, material, size, color, function, and content. Fundamentally, in the relationship between parts and a whole, if one part differs from the others, in one or more characteristics, it represents a distinct component that exists independently of the whole and functions within an organizational system. The component's form is the most significant predictor of a reciprocal connection between it and the whole (Figure 3).

Deep Learning, Architecture, and Planning
Until recently, computational techniques for structural analysis, design, and development have been almost unmatched by methods that analyze the current environment visually and convert it to digital data and in-depth information. In recent years, the field of computer vision has made significant strides in developing programs for identifying and classifying existing components for various purposes. AlexNet, a convolutional neural network (CNN) introduced in 2012, may be seen as a forerunner of this advancement [23]. Airborne and satellite remote sensing has also become a critical element of classical architecture and archaeological research, allowing rapid and high-resolution documentation of historical locations [24-27]. Additionally, it enables the recording and tracking of antique and historical architectural structures that are inaccessible, endangered, or permanently destroyed. Satellite and aerial images are becoming increasingly abundant and diverse, making them ideal for this subject. However, these massive quantities of historical architectural information are still examined via human labeling and visual examination. Numerous studies have been conducted on the application of deep learning to various image classification tasks, both generic and specialized in different areas [28-31]. Additionally, several projects are devoted to the categorization of photographs of architectural heritage using different methods [32-35]. Since 2016, research utilizing CNNs to automate the identification of historical architectural and archaeological elements has made significant progress. In this study, the use of CNNs for prospecting vernacular architecture is demonstrated by extracting characteristics of central courtyards from aerial and satellite images of ancient cities.

Central Courtyards Studies
The primary goal of this research is to examine the traditional central courtyard idea as a passive cooling technique for enhancing interior thermal conditions in Iran's BS climate. An empirical survey was conducted to analyze three significant courtyard architecture variants, including their orientation, size, and ratios, as well as their opaque and transparent surfaces, in fourteen significant historic buildings in five ancient cities, including Mashhad, Shiraz, and Tehran in the BSks mesoclimate, as well as Dezful and Shushtar in the BShs mesoclimate zone. The analysis considered three design variations: (1) the courtyard's orientation, extension, and rotation angle; (2) the courtyard's size and ratios concerning the overall wind flow; and (3) the courtyard's opaque and transparent surfaces concerning entry points to the overall wind flow [36]. The examination of criterion 1 for the Nasir al Molk building in Shiraz, one of the fourteen study cases, is depicted in Figure 4. Both the house and courtyard were oriented northeast-southwest, with the courtyard angled 45 degrees to the north. This research uses the findings from the fourteen courtyard buildings as case studies to determine the design patterns of courtyard homes at the city level. The authors summarize the findings of the criterion analysis to offer an acceptable model for building courtyard homes in these cities. Additionally, an effort was made to suggest a design based on the length, width, and height ratios (Table 1). According to this study, approximately 17% of the courtyard should be dedicated to natural features (8% water and 9% plants). Based on observation and prior research, this ratio is feasible for ensuring adequate thermal comfort in the courtyard and adjacent regions. This quantitative research indicates that Iranian traditional central courtyards were built with an emphasis on orientation and geometry, with regard to physical and natural factors, so that they function well as passive cooling systems. Soflaei et al. used the same technique in a prior study on various cities and climatic zones to evaluate the physical characteristics of six significant traditional courtyard homes in Iran's BWks mesoclimate area. The findings indicated that most Iranian courtyards examined were purposefully built so that direction, size, and ratio function as microclimate modifiers.

Aerial Images Remote Sensing
One article compares several techniques for automatically detecting buildings in aerial photos and laser data with varying spatial resolutions. Five techniques are evaluated in two research areas utilizing features derived at the pixel and object level, with the precondition that all methods use the same training set. The techniques are evaluated using error measurements derived by superimposing the findings on each area's manually produced reference map [37]. Land cover classification using very high-resolution remote sensing imagery appears to be progressing from land-use categorization to pixel-level segmentation. Inspired by the current breakthrough of deep learning and the filtering technique used in computer vision, another study proposes a segmentation method that utilizes deep residual networks to create an image segmentation neural network and a guided filter to extract buildings from remote sensing photos [38]. A further study develops a gated residual refinement network (GRRNet) for building extraction by fusing high-resolution aerial imagery with LiDAR point clouds. A modified residual learning network is utilized as the encoder of GRRNet to learn multi-level features from the fused data. A gated feature labeling (GFL) unit is implemented to reduce unnecessary feature transmission and refine the classification results [39].
The first stage of categorization, the extraction function, was performed in this study through a pre-trained CNN that recovers the deep characteristics of aerial pictures from different network layers, including the mean pooling layer and any prior convolutional layer. After dimensional reduction on large vector features, the concatenation algorithm was expanded to include the derived features from various neural networks. The research team experimented with different CNN designs to obtain the best results [40]. This article summarizes the results of a study on identifying qanat shafts using CORONA satellite imagery from the Cold War period. The application of deep CNNs for automated archaeological feature detection was investigated through remote sensing. Their case study involved the qanat structure in Iraq [41].
The efficient non-local residual U-shape network (ENRU-Net) is a new type of network proposed in another study. It consists of a well-designed U-shape encoder/decoder structure and an optimized non-local block termed the asymmetric pyramid non-local block (APNB). The encoder/decoder framework is utilized to extract and reconstruct feature maps carefully, and the APNB gathers global contextual information through a self-attention mechanism. The proposed ENRU-Net is evaluated and compared to existing state-of-the-art models using two widely utilized public aerial image building datasets [42]. Another study proposed a novel deep transfer learning technique for automatically detecting Hakka Weilong Houses, a renowned ancient dwelling type and a major cultural icon for the Hakka, a Chinese minority found worldwide. The Hakka Weilong House Image Dataset was produced from remote sensing aerial photos of urban and suburban Meizhou [43].
The article described the development of a novel, fully automated three-dimensional building restoration system capable of producing first-level detail building models from multi-view aerial images without relying on additional information. The recovered models in this study come close to the accuracy and reliability of manually defined models. Three components comprise the proposed method: (1) efficient dense alignment and reconstruction of the earth's surface; (2) consistent building footprint recovery and polygon regularization; and (3) extremely accurate building rooftop and base height inference [44].

Research Goals
Numerous research initiatives in Iran concentrate on ancient central courtyards. They compared numerous characteristics of central courtyards and homes, including their orientation and ratio, in a series of case studies [20,36,45-48]. However, conventional data collection techniques prevented researchers from conducting a thorough case study of nearly all central courtyards in a municipality or across several cities in Iran. This study overcomes this restriction by applying machine learning techniques to very high-resolution (VHR) aerial imagery. It gathers and analyzes large amounts of data on central courtyard characteristics to uncover previously undiscovered climatic and geometrical variables that influenced the design of central courtyards. Additionally, this research analyzes the final dataset of Iranian historical central courtyards as a passive cooling technique in BS and BW climates, to offer an architectural model for modern sustainable structures. The finished design model may serve as a reference for designers and researchers working in the historic districts of these Iranian cities.

Case Study
The case study for this research involves nine cities in Iran's various climate zones (Figure 5, based on [49]). These cities have different climate conditions, which makes them suitable for gathering big data on central courtyards; the samples therefore vary in their features according to the climate zone. Table 2 lists the cities in the case study and their climate divisions. The cities fall into two main climate zones: BS (arid steppe) and BW (arid desert). Each of these zones has two subcategories, "h" (hot) and "k" (cold). Thus, there are four climate categories: Bsh, Bsk, Bwh, and Bwk.

Material and Method
Two distinct research methods were employed in this work. The first involved library research that concentrated on sustainable historical architecture and the passive cooling effect of central courtyards. The second was a survey that focused on the physical environmental characteristics of close to 30,000 traditional courtyard homes in nine historic Iranian towns. This study used a variety of machine learning techniques. The typical machine learning framework is composed of the following components: data gathering, preprocessing, model construction, and model validation. Data collection and preparation result in an annotated dataset based on ground truth data or expert opinion. Post-processing of the data is required to identify outliers or inaccuracies. The surveying component of the research is divided into four distinct stages: (1) detection of courtyards; (2) segmentation of courtyards; (3) extraction of features; and (4) data analysis. In summary, the procedure begins with very high-resolution aerial images and concludes with a wide assortment of approximately 30,000 samples, each with 76 geometric and climatic characteristics. Figure 6 shows the flow chart of the surveying part of the study.

Very High-Resolution (VHR) Aerial Images
Detecting objects in VHR aerial pictures is a challenging task. The approaches proposed in the literature for solving the object detection problem in VHR images fall into two broad categories: traditional approaches that rely on handcrafted features, and deep learning-based approaches that employ a CNN as a feature extractor and achieve superior performance. Handcrafted features have limited representational power and lack the required precision [51]. Deep learning performs exceptionally well in various areas, including image processing, due to the automated creation of features [23,52-55]. In numerous benchmarks, including PASCAL [56] and COCO [54,55,57,58], region-based CNNs outperformed conventional object detection techniques [54,55,57-59].
However, object recognition is more straightforward in these benchmarks than it is in VHR aerial images. Natural photographs depict significantly larger objects than aerial photographs. Additionally, the appearance of objects changes significantly in VHR images due to occlusion, shadow, lighting, resolution, and perspective variation. As a result, object identification in VHR aerial pictures is more challenging than in natural photographs (Figure 7). The primary source of input data for this study is a collection of VHR aerial images. These images were taken between 2011 and 2016 by Iran's National Cartographic Center (Supplementary Materials National Cartographic Center). Aerial photographs of these nine cities are available in a variety of scales, resolutions, and directions. The scale and direction of each aerial image of these cities were therefore measured manually. The images were also subdivided in a preprocessing step because their resolutions (23,080 × 15,080 pixels) were too high for the training and object detection model. Except for Esfehan, each aerial image was divided into 60 subimages. Esfehan's aerial images were divided into 30 subimages because their resolution was less than half that of the other images. Table 3 lists the cities, their pixel scales, and their northern angles.
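The subdivision step described above can be sketched as follows. This is only an illustration: the text gives the image size and the number of subimages but not the grid layout, so the 10 × 6 grid and the function name `tile_boxes` are assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(width: int, height: int, cols: int, rows: int) -> List[Box]:
    """Split a width x height image into a cols x rows grid of pixel boxes."""
    # Integer edges; leftover pixels are absorbed by individual tiles.
    x_edges = [i * width // cols for i in range(cols + 1)]
    y_edges = [j * height // rows for j in range(rows + 1)]
    return [
        (x_edges[i], y_edges[j], x_edges[i + 1], y_edges[j + 1])
        for j in range(rows)
        for i in range(cols)
    ]


# Image size from the text; the 10 x 6 layout is an assumed way to get 60 tiles.
boxes = tile_boxes(23080, 15080, cols=10, rows=6)
```

Each box could then be passed to an image library's crop function. Courtyards that straddle a tile border would be cut in two under this scheme, which matches the limitation noted in the conclusions.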

Image Segmentation
The Remo app (Supplementary Materials Remo.ai: Image Datasets Management) was used in this study to perform the annotation task for the segmentation step. This step employs two distinct segmentation models: one for segmenting courtyards and the objects contained within them, and another for segmenting offset images. Each training dataset contains 550 images; thus, the annotation task was completed for 1100 images in this step. For the first dataset, four objects were segmented for training the model: courtyard, water place, green area, and shadow. In the offset images, the courtyard and its associated house were segmented. Figure 10 illustrates an image annotation task, such as segmentation, using the Remo app. The Mask R-CNN model was used in this study to perform instance segmentation. This method efficiently detects objects in a picture while simultaneously creating a high-quality segmentation mask for each instance. Mask R-CNN extends Faster R-CNN by adding a branch that predicts an object mask in parallel with the existing branch for bounding-box recognition. Mask R-CNN is easy to train and adds only a small overhead to Faster R-CNN, running at about five frames per second. Additionally, Mask R-CNN is easily generalizable to other tasks, such as estimating human poses within the same framework [63].
In CNN models, the word "backbone" refers to the feature extractor network. This network collects features from the input picture, which are then upsampled by a basic CNN decoder module to produce segmentation masks. The backbone network's purpose is to supply a segmentation mask for the polygon initialization of every individual object. As in the original Mask R-CNN, the instance segmentation model creates a segmentation mask for every instance in the scene. A bounding-box detection step is introduced to anticipate individual key points and segment the picture into distinct building instances [64].
As shown in Figure 11, the backbone network employs the standard two-stage instance segmentation method provided by Mask R-CNN [65]. Since this method operates by predicting the segmentation mask, it is divided into detection and segmentation tasks. The design generates well-localized ROI features, which are critical to the model's performance.
Figure 11. A diagrammatic representation of the backbone network for a Mask R-CNN model [66].
Mask R-CNN is used to detect and then segment objects. The detection process produces localized ROIs using a feature map generated by a feature extractor, such as ResNet, which is often employed in two-stage object recognition [61,67]. For segmentation, the features of each ROI are fed into basic convolutional layers to produce object masks. This study applies transfer learning to a Mask R-CNN model from Detectron2 [68]. To this end, a ResNet-50 backbone pre-trained on COCO-train2017 is used [61]. After training on the two datasets (courtyards and courtyard offsets), the model was evaluated on a 20% test split. The evaluation demonstrates that courtyards, shadows, and houses achieve greater than 90% accuracy in the classification, bounding-box, and masking tasks. However, the accuracy for water places and green areas was approximately 80%. Figure 12 illustrates several examples from the evaluation datasets following the segmentation task. These images were segmented by the trained model to test the result of the training task. In some low-accuracy samples, for example, the model wrongly segmented the shadows of green areas (yellow lines).
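As a rough illustration of the evaluation setup, the sketch below holds out 20% of one annotated dataset of 550 images. How the authors actually drew the split is not stated, so the random shuffle, the seed, the file names, and the function name `split_dataset` are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for the 550 annotated images of one dataset.
image_ids = [f"courtyard_{i:04d}.png" for i in range(550)]
train_ids, test_ids = split_dataset(image_ids)
```

Under these assumptions, each dataset would contribute 440 training images and 110 test images.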

Feature Extraction and Dataset Gathering
Image segmentation was applied to the courtyard and courtyard offset datasets, and the result was a set of polygons for each courtyard's objects. The next step was to extract features and assemble the final comprehensive dataset of around 30,000 samples with 76 features. In this study, all polygons from the same object class were combined into a single object. Following that, each object's area in square meters was measured. The minimum bounding box for courtyards, houses, and water places was then identified for further measurements (Supplementary Materials MinimumBoundingBox). Figure 13 depicts a minimum bounding box fitted to a polygon. To extract the geometrical features of the selected object, the dimensions and direction of the minimum bounding box were measured.
Figure 13. An example of a polygon's minimum bounding box. The red line represents the polygon created by the segmentation task that depicts the courtyard object. The blue line is the courtyard's minimum bounding box.
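The geometric measurements described above can be sketched in plain Python. This is not the tool referenced in the Supplementary Materials; it is an assumed, minimal implementation that computes the polygon area with the shoelace formula and finds the minimum-area bounding rectangle by aligning a box with each edge of the convex hull. All function names are hypothetical, and the coordinates are assumed to be already converted from pixels to meters using the pixel scale.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Shoelace formula; the result is in squared input units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, list(points[1:]) + [points[0]]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_bounding_box(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Return (length, width, direction_deg) of the minimum-area rectangle.

    length is the longer side; direction_deg is the direction of that side,
    counter-clockwise from the x-axis, folded into [0, 180).
    """
    hull = convex_hull(points)
    best: Optional[Tuple[float, float, float, float]] = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Project the hull onto axes aligned with this edge.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        du, dv = max(us) - min(us), max(vs) - min(vs)
        if best is None or du * dv < best[0]:
            deg = math.degrees(theta)
            if du >= dv:
                best = (du * dv, du, dv, deg % 180.0)
            else:
                best = (du * dv, dv, du, (deg + 90.0) % 180.0)
    if best is None:
        raise ValueError("at least one point is required")
    return best[1], best[2], best[3]


# Hypothetical courtyard outline in meters: a 12 m x 8 m rectangle rotated by 30 degrees.
angle = math.radians(30.0)
courtyard = [
    (x * math.cos(angle) - y * math.sin(angle), x * math.sin(angle) + y * math.cos(angle))
    for x, y in [(0.0, 0.0), (12.0, 0.0), (12.0, 8.0), (0.0, 8.0)]
]
area_m2 = polygon_area(courtyard)
length_m, width_m, direction_deg = min_bounding_box(courtyard)
```

Converting `direction_deg` into a compass orientation would additionally require the northern angle of each aerial image, which this sketch does not model.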
In addition to the geometric features, the dataset includes several climatic features for each city. EnergyPlus Weather Format (EPW) files (Supplementary Materials Weather Data|EnergyPlus) were used to extract climatic data for each city. The EPW files contained the climatic information of each city for 14 years (2004-2018). After exporting the climatic data, the mean, minimum, maximum, and median (50th percentile) were calculated over the 14 years and then added to the main dataset. As a result, the final features for each courtyard were 76 geometric and climatic data points. Table 4 summarizes these features in terms of their geometric and climatic classifications. The final dataset consisted of 29,191 samples collected from nine cities. Following that, the data were ready for additional analysis.
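The summary statistics mentioned above can be sketched as follows. The values are invented for illustration, the function name `summarize` is hypothetical, and reading the EPW files themselves is left out.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean, minimum, maximum, and median (the 50% value) of a series."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "50%": statistics.median(values),
    }


# Hypothetical annual mean dry-bulb temperatures (deg C); not taken from any EPW file.
annual_temperature = [19.0, 20.0, 18.5, 21.0, 19.5]
stats = summarize(annual_temperature)
```

In a full pipeline, one such dictionary per climatic variable and city would be joined to every courtyard sample from that city.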

Data Analysis
The first step of data analysis was data preprocessing. During data preprocessing, outliers and incorrectly detected features were identified. For instance, several courtyards had an area of less than 1 m²; these were identified, and the entire sample was removed from the dataset. Numerous samples in the dataset contained incorrect data because the results of courtyard detection and image segmentation could not be checked after each step. However, identifying and deleting these samples from the final dataset was straightforward. It was expected that approximately 20% of the dataset would contain incorrect data due to a 10% error in object detection and a 10% error during the image segmentation steps. Throughout the data analysis stage, an attempt was made to uncover relations between the dataset features using the Pearson correlation coefficient.

Dataset Analysis of the Cities
The final dataset contained 29,190 samples and 77 columns, one of which was a courtyard ID column. The dataset was reduced to 26,437 samples after preprocessing and removing incorrect and outlier data. In other words, approximately 9.4% of the courtyard detection and segmentation results were incorrect, and these records were deleted from the dataset. After data preprocessing, the final dataset contained 26,437 rows × 77 columns. The first step in reading a dataset is to visualize all of its values in a single plot to understand their distribution. A correlation heatmap across the entire dataset is a viable solution for this purpose (Figure 14). It shows that, for example, the correlation between courtyard length and courtyard width is approximately 0.88, indicating a strong relationship between these two variables. In contrast, there is virtually no correlation between the house ratio and latitude features, as the heatmap indicates a correlation coefficient of 0.033. Another visualization method used in this research was scatter plots, which show how the data are affected by different features. For instance, the scatter plot of courtyard area and water place area demonstrates a logical spread of the data (Figure 15).
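The share of samples removed during preprocessing follows directly from the two counts given in this section, as the short check below shows. The variable names are illustrative.

```python
# Sample counts before and after preprocessing, as stated in this section.
samples_before = 29190
samples_after = 26437

removed = samples_before - samples_after
removed_share = removed / samples_before  # fraction of samples discarded
```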
Within these visualizations, different extracted features can be compared and analyzed in order to reach meaningful results. The visualizations above are examples of how the generated dataset can be used. It should be mentioned that, while finding meaningful relationships among features is highly significant, it is not the goal of this paper. The graphs therefore only illustrate the possibilities of using the dataset. The most important example of this dataset's outcome is a model for the courtyard shape, which is explained in the following section.

Proposed Model for the Dimension of the Central Courtyard
One of the study's outcomes is a proposed design model for central courtyards as a passive energy solution in these particular cities. Additionally, the approach can serve as a model for additional cities, by running the central courtyard feature extraction model on aerial images of other cities. The design model's purpose is to determine the optimal direction and dimensions of a central courtyard in these cities or to predict the shape of a destroyed central courtyard in the historical region of the cities represented in the dataset. It can also serve as a model for developing and reconstructing historical districts in heritage cities. Suppose, for instance, that there is an empty field in the historical region of Yazd, Iran. The proposed design model will predict the courtyard's optimal direction, dimensions, and area based on thousands of samples taken throughout Yazd. It predicts these features through data science prediction models, such as linear regression (Figure 16). The results reveal a strong correlation between the length and width of central courtyards in all cities, indicating a distinct pattern in the design of central courtyard houses throughout Iran's history.
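The kind of linear regression model referred to above can be sketched with ordinary least squares on a single predictor. Choosing width as the predicted variable and length as the predictor is an assumption made here for illustration; the data points and the function name `fit_line` are hypothetical and do not reproduce the paper's fitted equations.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical courtyard lengths and widths in meters for one city.
lengths = [8.0, 10.0, 12.0, 14.0, 16.0]
widths = [6.0, 7.0, 8.0, 9.0, 10.0]

slope, intercept = fit_line(lengths, widths)
# Predicted width for a new courtyard of length 11 m.
predicted_width = slope * 11.0 + intercept
```

Fitting one such line per city or climate zone would give a separate equation for each, in the spirit of the design model described above.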

Conclusions
The features of central courtyards were extracted in this study in order to collect big data for future research. The flow chart in Figure 6 indicates that this study is divided into four distinct steps: (1) preprocessing VHR aerial images and detecting courtyards using a Faster R-CNN model; (2) segmentation of the objects in the courtyards using Mask R-CNN; (3) extraction of the features associated with each object and collection of big data; and (4) data analysis of the dataset to discover mathematical relations between features. In summary, the procedure begins with very high-resolution aerial images and concludes with a vast collection of 26,437 samples, each of which contains 76 geometric and climatic features. Additionally, the final data analysis revealed several relations and correlations between the dataset features. It demonstrated that, while the direction and dimension ratios of central courtyards vary across cities and climatic zones, they all serve a purpose. Furthermore, the proposed design model can guide researchers and designers working in historic cities. The model can predict and identify the optimal form of a central courtyard in a historic region based on thousands of courtyard samples. In summary, this paper posits that machine learning techniques used in computer vision and data science are highly beneficial for studying historic architecture. This research demonstrates that it is possible to study over 26,000 courtyard samples in a dataset using only a single computational method.
In this research, the authors faced several technical and practical challenges, and in order to minimize the challenging areas and consider the available computational resources, some limitations were considered. Since detecting water places and green areas needed further tuning, and this task was not among the main goals of this research, the authors did not focus on this part. Apart from that, courtyards located between two subdivided images were not considered in the dataset as it reduced the model's accuracy. Finally, the VHR aerial images do not cover all historical cities in Iran or the Middle East, which is why a limited number of cities could be selected as the case study.
For future studies, extending the dataset over the borders of Iran would be recommended to study the impact of this style of architecture on other Middle Eastern countries. Moreover, detecting more details in the courtyards, such as their height, could add value to this dataset. In terms of analyzing the dataset, focusing on each dual relationship of features can reveal details about the principles of the central courtyard's design in this style of architecture.