Analyzing Land Shape Typologies in South Korean Apartment Complexes Using Machine Learning and Deep Learning Techniques

Abstract: In South Korea, the configuration of land parcels within apartment complexes plays a pivotal role in optimizing land use and facility placement. Given the significant impact of land shape on architectural and urban planning outcomes, its analysis is essential. However, studies on land shape have been limited due to the lack of definitive survey criteria. To address these challenges, this study utilized a map application programming interface (API) to gather raw data on apartment complex layouts in South Korea and processed these images using a Python-based image library. An initial analysis involved categorizing the data through K-means clustering. Each cluster's average image was classified into four distinct groups for comparison with the existing literature. Shape indices were employed to analyze land configurations and assess consistency across classes. These classes were annotated on a parcel level using the Roboflow API, and YOLOv8s-cls was developed to classify the parcels effectively. The evaluation of this model involved calculating accuracy, precision, recall, and F1-score from a confusion matrix. The results show a strong correlation between the identified and established classes, with the YOLO model achieving an accuracy of 86% and demonstrating robust prediction capabilities across classes. This confirms the effective typification of land shapes in the studied apartment complexes. This study introduces a methodology for analyzing parcel shapes through machine learning and deep learning. It asserts that this approach transcends the confines of South Korean apartment complexes, extending its applicability to architectural and urban design planning on a global scale. Analyzing land shapes earmarked for construction enables the formulation of diverse design strategies for building placement and external space arrangement. This highlights the potential for innovative design approaches in architectural and urban planning worldwide.


Introduction
In South Korea, a parcel is defined as an independent land unit depicted on the cadastral map, an official document detailing the location, shape, and area of land parcels. Each is assigned a unique identifier, providing essential data for understanding ownership, usage conditions, purpose, and boundaries. Parcels are primarily used in real estate transactions, land use planning, and legal disputes. For land use planning, parcel information is crucial for assessing a land's purpose and potential for development during urban and architectural planning [1]. Historically, regulations such as road diagonals dictated building shape and density according to road frontage dimensions. These regulations have since been abolished, increasing the influence of land shape on building form and density [2].
Parcel shapes in South Korea pose challenges due to the difficulty in logically distinguishing them and the ambiguity of survey standards, leading to research uncertainties. It is hypothesized that parcel shapes can be classified into patterns that combine geometric structure with the effective area, allowing for analysis through characteristic patterns by organizing geometric land information into structured sequences [3].
While newly developed or restructured areas typically feature regular-shaped parcels, irregular parcels remain common in older downtown districts. Previously, redevelopment and land readjustment efforts often disrupted and reorganized existing urban spatial structures into regular shapes. However, current trends favor gradual, restorative development, like urban regeneration, which preserves existing parcel shapes. Understanding the effects of parcel shapes on architectural forms and densities, particularly how irregular shapes influence building behaviors, is increasingly vital.
In South Korean apartment complexes, the parcel is integral to the design and development process, determining land use efficiency and optimal facility placement, as well as managing and preserving environmental elements such as green spaces. Parcels, which legally define land boundaries, are essential in obtaining building permits, transferring property, and conducting real estate transactions. The shape of the parcel significantly influences architectural and urban planning, affecting development conditions and necessitating a careful analysis of parcel typology to tailor development plans accordingly.
Despite its importance, research on parcel typology is limited, largely due to the complexity and resource-intensive nature of analyzing distinctly labeled data. To address these challenges, this study proposes using unsupervised learning techniques to efficiently organize large datasets into meaningful groups without prior labeling, followed by a detailed analysis and typification using supervised deep learning methods. This study lays the groundwork for typifying and analyzing unlabeled raw data in architecture and urban design. It presents a methodology employing machine learning for data classification and deep learning for analysis and verification. Additionally, it proposes that this approach extends beyond South Korean apartment complexes, holding promise for global application. The analysis of land shapes can inform diverse design strategies for building and external space layouts tailored to site configurations. This suggests the potential for evolving design strategies worldwide based on landform analysis.
This study aims to classify parcel shapes in South Korean apartment complexes using machine learning, building upon previous studies to develop a deep learning model that accurately identifies and categorizes these shapes. This study departs from traditional methodologies that relied heavily on visual inspection for parcel analysis. By using artificial intelligence, including machine learning and deep learning, this research mitigates researcher subjectivity and enables a more objective analysis. This approach sets it apart from previous studies. The goal is to enable the detailed analysis of apartment complex designs based on parcel type, facilitating strategic planning for building arrangements and external spaces.

Classification of Parcel Shapes
According to domestic guidelines, parcel shapes are classified into six categories: square, horizontal rectangle, vertical rectangle, trapezoid, irregular, and sack, with irregular parcels being difficult to categorize into a specific shape [4]. A square parcel closely resembles a square, a horizontal rectangle has its longer side along the road, a vertical rectangle has its shorter side along the road, a trapezoid mirrors the geometric shape, and a sack-shaped parcel is either narrow or shaped like an inverted triangle or trapezoid, with the point meeting the road.
Irregular parcels are defined as those, including triangular configurations, in which over one-third of the area is deemed lost according to the minimum bounding box standard. Within the minimum bounding box, the area covered by the parcel (the effective area) is distinguished from the remainder (the lost area). Prior studies have utilized machine learning to typify such irregular parcels, thereby enabling automatic rule learning from data without human-coded rules [5,6]. These studies employed the K-means algorithm from the scikit-learn library, a leading machine learning tool in Python, to analyze 500 sample parcels, categorizing them into six types of irregular shapes.
Unlike previous research that manually measured and categorized land into six shapes, this study uses K-means clustering to objectively classify 3000 apartment complex parcels and further verifies the classifications through deep learning, extending the scale and rigor of earlier efforts.

Analysis of Parcel Shapes
Numerous studies have examined the irregularity of parcels by utilizing indices that quantify this irregularity, predominantly employing the shape index (SI). The SI is calculated based on the ratio of a shape's perimeter to its area, with higher values indicating increased irregularities as the perimeter lengthens disproportionately compared to the area [3,7]. Additionally, research has considered previously defined parcel types that incorporate the minimum bounding box, applying the standard index (STI). The STI measures parcel irregularity by dividing the parcel's area by that of its minimum bounding box, producing values between 0 and 1. Values approaching 1 signify a regular shape, whereas values nearing 0 denote greater irregularity. Furthermore, the width-depth ratio (WR) of the minimum bounding box has been employed to assess parcel shapes, with lower values suggesting shapes closer to a square [8]. In research analyzing parcel shapes, a method was proposed to develop a Parcel Shape Index (PSI). This index utilizes parameters such as perimeter, area, internal angles, and the number of external points to assign scores, differentiating parcels based on their shapes. However, this approach entails a complex computation process to apply the classification system, which limits its efficiency [9].

Machine Learning in Architecture and Urban Studies
Machine learning methodologies are broadly categorized into supervised and unsupervised learning. Supervised learning has been applied in architecture and urban planning to predict real estate indices [10], estimate building energy consumption [11], determine occupancy rates [12], and simulate land use changes [13]. Conversely, unsupervised learning involves techniques such as clustering, which groups similar entities together and is particularly useful in handling unlabeled data [14]. Within unsupervised learning, methods extend beyond K-means clustering to include DBSCAN, hierarchical clustering, and spectral clustering. Among these, K-means clustering is favored for its simplicity, rapid execution, and robust performance [15]. It groups similar objects together, aiding in the classification of subjects such as apartment building shapes [16], disaster intensities [17], and land use categories [6].
Furthermore, several studies have aimed to enhance efficiency in architecture and urban management through machine learning. These include investigations into detecting road potholes and their locations [18]; comparing the effectiveness of machine learning algorithms such as support vector machine (SVM), Maximum Likelihood (ML), and Random Trees (RT) for extracting impervious surfaces in residential complexes and illustrating their spatial distribution [19]; and developing a machine learning technique called Building Detection with Shadow Verification (BDSV) based on high-resolution satellite images to automatically detect buildings within urban areas [20]. These studies explore the applicability of machine learning in the field of architecture and urban planning and contribute to solving relevant problems.

Research in Urban and Architectural Fields Using Deep Learning
Studies in urban and architectural engineering increasingly leverage deep learning, reflecting a notable rise in research from 2010 to 2020. This surge in artificial intelligence, machine learning, and deep learning applications, particularly in structural engineering, utilizes algorithms such as artificial neural networks (ANNs), genetic algorithms (GAs), genetic programming (GP), and support vector machines (SVMs). While these studies generally lack a defined optimal dataset size, they typically involve dividing the dataset to train models. Performance metrics used include the correlation coefficient (R), mean squared error (MSE), and root mean squared error (RMSE) [21].
Additionally, deep learning has spurred innovative changes in evaluating the durability of building materials and the structural integrity of infrastructure. For example, research employing deep convolutional neural networks and particle swarm optimization to predict the compressive strength of cement-based materials exposed to sulfate in marine environments [22] and studies utilizing an ensemble of CNNs and data fusion techniques to assess corrosion and coating defects at coal handling and preparation facilities [23] have overcome the limitations of traditional assessment methods. These studies offer higher accuracy and efficiency, making significant contributions to advancements in the field of structural engineering.
In architectural planning and design, deep learning is extensively used to analyze two-dimensional images like floor plans, with convolutional neural networks (CNNs) demonstrating exceptional ability in this area. This capability is particularly advantageous for studies that convert architectural spaces into two- or three-dimensional grid points for further analysis. Research trends indicate that applications like YOLO have been instrumental in constructing spatial relationships within architectural spaces. These studies train YOLO models on house floor plan images to identify components such as rooms, doors, and windows, thereby forming spatial relationships depicted in bubble diagrams [24].
Following the recognition of floor plans through deep learning, extensive drawing information is interpreted into graph diagrams. Algorithms are developed to automatically extract information from these plans, transforming it into graph form. This process aids in creating datasets that facilitate the planning stages of architectural design, suggesting models that recommend spatial relationships [25]. Additionally, GAN-based methodologies are utilized to segment site areas, design layout plans, and propose furniture arrangements [26]. Bubble diagrams also serve as inputs for GANs to generate room masks, or nodes, within floor plans and to artificially create interior designs [27].

Scope of Apartment Complexes
Regarding data sources, this study relies on information from the Public Housing Management Information System (K-apt) [28], which provides extensive details on multiunit housing complexes with 100 or more households, as required by South Korean public housing regulations. These regulations mandate the disclosure of management fees and other critical data, including the address, completion year, number of households, buildings, and floors for over 18,000 complexes. For this study, we focused on apartment complexes with at least 300 households that are legally required to include facilities like senior citizen centers, playgrounds, daycare centers, and community facilities [29]. From this dataset, 13,264 apartment complexes meeting these criteria were selected as the study population for analyzing apartment complex layouts.

Parcel Shape Types
According to prior studies, parcel shapes are distinguished between regular and irregular forms (see Figure 1). The guideline categories are square, horizontal rectangle, vertical rectangle, trapezoid, irregular, and sack, of which all but irregular are regular types. These classifications were initially based on visual inspections conducted by government agencies and were later validated through comparisons using indices such as the SI, STI, and WR. On the other hand, irregular parcel types, including Avocado, Potato, Corner, Bell, Stick, and L-shape, were categorized using K-means clustering on a dataset of 500 sample parcels. The optimal number of clusters (K) was established via the Elbow method, identifying the point where the graph shows a significant bend as the most suitable value. Following the determination of 20 clusters, the average shape image of each cluster was visually analyzed, and types were assigned, with subsequent validations through SI, STI, and WR comparisons [30].

Image Acquisition
In the context of apartment layouts, KakaoMap (see Figure 2) [31], a mapping application programming interface (API) similar to Google Maps used in South Korea [32], facilitated the collection of various details such as complex boundaries, main building layouts, playgrounds, badminton courts, waterfront areas, roadways, sidewalks, and parking lots. The addresses of each apartment complex were input into KakaoMap, enabling the capture and preservation of images to secure map files for each complex. These maps were then refined by cropping out the boundaries of the apartment complexes as depicted on the map, thereby focusing solely on the layouts of each complex. This process utilized a database previously compiled by the same researcher; further details are available in Yoon, S. B. (2024) [33].

Image Processing
The map service indicates boundaries corresponding to each address as pink areas, based on national cadastral maps. It was observed that these boundaries deviate slightly from the actual site boundaries. Where the address-based areas did not accurately define the complex boundaries, the discrepancies were handled as exceptions and the data were refined. As a result, layouts for 3000 apartment complexes were acquired. According to prior studies, these images require normalization before K-means clustering, which involves resampling each image to a standardized resolution of 100 × 100 pixels. For subsequent use in deep learning models, however, the images were resized to a more detailed resolution of 640 × 640. Although a higher resolution increases memory usage in K-means clustering, preserving detail is essential for accurate clustering, and it also improves performance in deep learning models.
The image resizing step is critical for standardizing the data format: it centers the parcel and sets the image boundary to a bounding box, which is needed to calculate the minimum bounding box in later analyses of parcel shapes. Initially, the parcel images, which were in vector format, were converted into raster images made up of pixels. To normalize the image data, which ranged from 0 to 255, the pixel values were divided by 255, converting them into a NumPy matrix of values between 0 and 1. This normalization, a common preprocessing technique in image processing, transformed the data into a format containing 10,000 pixel values per parcel, ultimately producing 3000 black-and-white images [5]. The Python Imaging Library (PIL) was used for this processing step.
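The normalization step described above can be sketched as follows. This is a minimal illustration, assuming each parcel is already available as a 100 × 100 8-bit grayscale raster; the function name `to_feature_vector` and the synthetic image are hypothetical and are not taken from the study.

```python
import numpy as np

def to_feature_vector(raster: np.ndarray) -> np.ndarray:
    """Scale an 8-bit grayscale raster to [0, 1] and flatten it."""
    if raster.dtype != np.uint8:
        raise ValueError("expected 8-bit pixel values in the range 0-255")
    scaled = raster.astype(np.float32) / 255.0  # 0-255 -> 0.0-1.0
    return scaled.reshape(-1)                   # 100 x 100 -> 10,000 values

# Synthetic stand-in for one rasterized parcel: a white square on black.
demo = np.zeros((100, 100), dtype=np.uint8)
demo[25:75, 25:75] = 255
vector = to_feature_vector(demo)
print(vector.shape, vector.min(), vector.max())
```

Stacking one such vector per parcel would give a matrix with one row per image, which is the input shape that clustering libraries such as scikit-learn expect.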
Following the data analysis, the performance of several clustering methods (K-means clustering, hierarchical clustering, DBSCAN, and spectral clustering) was compared. K-means clustering emerged as the most effective method among the options evaluated due to its simplicity and efficiency. Using scikit-learn, a widely used Python machine learning library, K-means clustering was implemented to categorize the parcels into K clusters. Average images for each cluster were then derived, which facilitated the classification of the parcels according to the types identified in previous research.
To develop a deep learning model capable of learning and classifying data by parcel type, each image was labeled according to its identified parcel type. For the classification, the YOLOv8 network model was employed, with original images annotated using the Roboflow API [34]. A total of 3000 apartment complex parcel images were categorized into six classes (Bell, Corner, Potato, Square, Stick, and Trapezoid), reflecting parcel types identified in previous research as well as findings from this study. To accommodate the requirements of the pre-trained models, the images were resized to 640 × 640 and augmented to increase the dataset size, thus helping to prevent overfitting. The dataset was divided into a training set, testing set, and validation set at a ratio of 7:2:1. These image preprocessing steps (see Figure 3) ensured the dataset was adequately prepared for subsequent deep learning tasks.
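The 7:2:1 division can be illustrated with a short sketch. It assumes a simple shuffled, unstratified split over item identifiers; the study does not state how the split was drawn, and the helper `split_7_2_1` and its `seed` parameter are hypothetical.

```python
import random

def split_7_2_1(items, seed=0):
    """Shuffle items and split them into train/test/validation at 7:2:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_test = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # remainder, about 10%
    return train, test, val

train, test, val = split_7_2_1(range(3000))
print(len(train), len(test), len(val))
```

With imbalanced classes, a stratified split that preserves class proportions in each subset may be preferable to the plain shuffle shown here.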

Research Methods
This study aimed to typify and analyze the shapes of apartment complex parcels based on the methodology illustrated in Figure 4. In summary, the study examined the classification system and analytical methodology for parcel shapes, drawing on prior research. Raw data underwent processing to train machine learning and deep learning models. K-means clustering was employed to typify the shapes of apartment parcels, and the resulting types were then used to train a YOLO model to analyze and validate the parcel shapes. The detailed analysis method is as follows.

Image Clustering
Image clustering involves grouping images into clusters by identifying inherent patterns or structures within the data without the use of pre-assigned labels. This process is intended to elucidate the structure of an image dataset or classify images based on specific characteristics. Several clustering methods are typically employed, including K-means clustering, hierarchical clustering, DBSCAN, and spectral clustering, each with its own set of strengths and weaknesses. K-means clustering is noted for its simplicity and efficiency, even with large datasets, although it requires the number of clusters to be predetermined. Hierarchical clustering generates a dendrogram presenting a hierarchical structure of clusters but is computationally intensive and inflexible once clusters are defined. DBSCAN, which does not presuppose the number of clusters or their shapes, is sensitive to parameter settings and may perform inconsistently across datasets with varying densities. Spectral clustering excels in identifying clusters of diverse shapes and is suitable for nonlinear data structures, yet it demands significant computational resources, and the choice of parameters critically influences its performance. In this study, after evaluating the advantages and disadvantages of these methods, K-means clustering was selected as the most fitting due to its straightforwardness and efficiency.
In this study, the Elbow method was employed to determine the optimal K value for K-means clustering, addressing one of the method's inherent challenges: the difficulty in selecting the ideal number of clusters. This method utilizes the point at which the slope of the graph markedly changes, or "bends", as an indicator of the appropriate K value [5]. However, pinpointing a definitive bending point proved challenging, leading to the selection of 20 as the most suitable K value after multiple trials. Through K-means clustering, 3000 images were categorized into 20 clusters. Consistent with prior research, average images for each cluster were generated, facilitating the typification of regular and irregular parcel shapes and enabling the initial clustering of unlabeled image data. In the K-means clustering process, key parameters include the number of clusters (K), initialization method, maximum number of iterations, distance measurement method, and tolerance. The number of clusters (K) was explored using the Elbow method, which assesses the rate of decrease in the cost function for each cluster number to determine the appropriate K value. For initialization, the K-means++ method was employed, which considers the distance between data points to achieve more stable results. Furthermore, the maximum number of iterations was set at 300, the Euclidean distance was utilized as the distance measurement method, and the tolerance was set at 0.0001 for clustering. These parameter settings were implemented using the features of the scikit-learn library and were determined as the optimal experimental conditions through multiple experiments.
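The parameter settings listed above map directly onto scikit-learn's `KMeans` class. The following is a minimal sketch on synthetic data, not the study's code: the random matrix stands in for the flattened parcel images, and `n_init` and `random_state` are assumptions, since the text does not report them.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 flattened 10 x 10 "images" with values in [0, 1].
X = rng.random((200, 100))

kmeans = KMeans(
    n_clusters=20,     # K selected in the study
    init="k-means++",  # initialization method reported in the text
    max_iter=300,      # maximum number of iterations
    tol=1e-4,          # tolerance of 0.0001
    n_init=10,         # assumption: number of restarts is not stated
    random_state=0,    # assumption: fixed seed for reproducibility
)
labels = kmeans.fit_predict(X)  # KMeans uses Euclidean distance

# At convergence, each center is the mean of its members, so reshaping
# the centers yields one "average image" per cluster.
average_images = kmeans.cluster_centers_.reshape(20, 10, 10)
print(average_images.shape, np.bincount(labels, minlength=20))
```

For the real dataset, the rows of `X` would be the 10,000-value parcel vectors and the centers would reshape to 100 × 100.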

Image Classifier
Image classification is commonly executed using pre-trained models, drawing on vast image databases like ImageNet [35], the COCO dataset [36], and CIFAR [37]. ImageNet, for instance, features more than a thousand varied classes and houses over a million images, offering a rich resource that significantly enhances the generalization capabilities of models. Pre-trained models on this dataset include ResNet [38], VGGNet [39], EfficientNet [40], GoogLeNet [41], MobileNet [42], and YOLO [43], each engineered with distinct attributes and applications. While ResNet and VGGNet are characterized by a high number of parameters, rendering them heavier and slower, other models like EfficientNet and GoogLeNet suffer from high computational demands despite their sophistication. MobileNet, though advantageous for mobile and edge device applications, may struggle with more complex tasks. YOLO stands out as a one-stage detector that facilitates real-time object detection and classification, with ease of training and scalability across diverse datasets, which makes it particularly adaptable for modifications and customizations.
Given the vast volume of images and the need for rapid inference, the YOLO model was chosen for its efficiency in processing and its capability to analyze objects of varying sizes swiftly and effectively.

You Only Look Once (YOLO) Model
YOLO typically features a standard structure for object detectors that comprises two main components: the head and the backbone. The backbone is tasked with extracting feature maps from images, which are essential for classifying objects. The head, on the other hand, focuses on localizing objects by interpreting the feature maps produced by the backbone [44]. Originally, YOLO was designed primarily to identify objects and their locations within images. However, YOLOv8 has seen significant enhancements in terms of its structure and architecture. These improvements make YOLOv8 more user-friendly, fast, and accurate, thus broadening its applicability to a range of tasks including object detection, image segmentation, and classification.
During the planning and execution phases of this study, the latest version available was YOLOv8, which incorporated the most advanced technology and had been verified for stability. While newer versions such as YOLOv9 and YOLOv10 have since been released, YOLOv8 was utilized for this research.
One of the standout features of YOLOv8 is its scalability. It is designed as a comprehensive framework that supports all previous versions of YOLO, making it easier for users to transition between different versions and compare performance effectively. Key advancements in YOLOv8 include a new backbone network, an anchor-free detection head, and a novel loss function (see Figure 5). These features enable YOLOv8 to execute efficiently on a variety of hardware platforms, from CPUs to GPUs, providing enhancements in speed and accuracy compared to its predecessors [45,46].
For the purposes of this study, the YOLOv8x-cls model trained on ImageNet, a repository that includes over 1000 classes, was considered. Among the available models, YOLOv8s-cls was selected. It is the second fastest model in the series and operates with relatively few parameters, making it well suited for the high-performance classification needs of this study, particularly in analyzing the diverse characteristics of parcel shapes in apartment complexes [45].

Analysis of Parcel Shapes
Following the classification of parcel shapes, the differences in related indices were analyzed. The indices were calculated using methods from previous research applied to the newly classified types. One such index is the SI, calculated as shown in Equation (1): the perimeter length of the parcel, P, is divided by the square root of its area, A, and then multiplied by 0.25. A higher SI value indicates greater irregularity in shape, as it reflects a longer perimeter relative to the area.

$$SI = \frac{0.25\,P}{\sqrt{A}} \qquad (1)$$

The STI is a measure of parcel irregularity, calculated as shown in Equation (2): the area of the parcel, A, divided by the area of its minimum bounding box, A_MBB. This index ranges from 0 to 1, with values approaching 1 indicating more regular shapes, and those closer to 0 denoting more irregular shapes.

$$STI = \frac{A}{A_{MBB}} \qquad (2)$$

The WR is defined in Equation (3) as the aspect ratio of the minimum bounding box, calculated by dividing the width, W, by the depth, D, of the parcel. This ratio approaches 1 as the parcel becomes more square-like and increases as the shape elongates.

$$WR = \frac{W}{D} \qquad (3)$$

To calculate SI, STI, and WR, the Python OpenCV library was utilized to compute the area and perimeter of the images, and specific code was developed to determine the bounding boxes that encapsulate each parcel (see Figure 6). Based on these computations, average values of these indices for each type of parcel were derived and subsequently analyzed.
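The three indices can be illustrated with a dependency-free sketch that works on polygon vertices rather than raster images. It is not the study's OpenCV code and makes two simplifying assumptions: the bounding box is axis-aligned rather than a true minimum bounding box, and WR is taken as the longer box side over the shorter one so that it is always at least 1. The function name `shape_indices` and the two sample polygons are hypothetical.

```python
import math

def shape_indices(vertices):
    """Return (SI, STI, WR) for a simple polygon given as (x, y) vertices."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1            # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)  # edge length
    area = abs(twice_area) / 2.0

    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    width, depth = max(xs) - min(xs), max(ys) - min(ys)

    si = 0.25 * perimeter / math.sqrt(area)     # Equation (1)
    sti = area / (width * depth)                # Equation (2), axis-aligned box
    wr = max(width, depth) / min(width, depth)  # Equation (3), long over short
    return si, sti, wr

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
print(shape_indices(square))   # regular reference shape
print(shape_indices(l_shape))  # irregular shape with a lost corner
```

The square yields SI = STI = WR = 1, the baseline for a perfectly regular parcel, while the L-shape keeps WR = 1 but shows a higher SI and a lower STI, which is why the indices are read together rather than in isolation.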

Evaluation Metrics
The evaluation of the classification model's performance utilized a confusion matrix, which is a tool that helps represent the number of data points correctly classified by the model in comparison to the actual categories. As illustrated in Figure 7, this matrix includes a TP in the first row and first column, indicating a true positive (TP) with the prediction of '1' and an actual category of '1'; a TN in the second row and second column, representing a true negative (TN) with both the prediction and actual value of '0'; an FP in the first row and second column, indicating a false positive (FP) with the prediction of '1' but an actual value of '0'; and an FN in the second row and first column, indicating a false negative (FN) with the prediction of '0' but an actual value of '1'.
The confusion matrix is instrumental in calculating key performance metrics such as accuracy, sensitivity, precision, and F1-score. Each of these metrics ranges between 0 and 1, with a precision and sensitivity of 0.8 or higher considered indicative of high performance. A small difference between these metrics suggests that the model does not have significant issues with bias or variance. An F1-score of 0.9 or higher is typically regarded as exemplary, denoting an excellent classification model [47]. Accuracy, defined as the ratio of correctly classified data to the total dataset, serves as a fundamental measure of the model's overall performance.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
In this study, precision is defined as the proportion of correct positive predictions made by the model, reflecting its reliability in making accurate positive identifications. Meanwhile, recall (also known as sensitivity) measures the proportion of actual positives the model correctly identifies relative to the total number of actual positives. These metrics are crucial for assessing the model's accuracy and completeness, respectively. The formulas for calculating these scores are detailed in Equations (2) and (3). The F1-score, which is the harmonic mean of precision and recall, is particularly useful for evaluating model performance when dealing with unbalanced datasets. A high F1-score is indicative of robust model performance, balancing both precision and recall effectively.
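As a worked illustration of these definitions, the sketch below computes accuracy and per-class precision, recall, and F1-score from a confusion matrix using the standard formulas (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two). It follows the row = prediction, column = actual layout described for Figure 7. The function name `per_class_metrics` and the counts are hypothetical and are not the study's results.

```python
def per_class_metrics(matrix, labels):
    """Compute accuracy plus per-class (precision, recall, F1).

    matrix[i][j] counts samples predicted as class i whose actual class is j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for i, name in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[i]) - tp                 # predicted i, actually other
        fn = sum(row[i] for row in matrix) - tp  # actually i, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = (precision, recall, f1)
    return correct / total, report

# Hypothetical counts for illustration only.
labels = ["A", "B"]
matrix = [[8, 2],
          [1, 9]]
accuracy, report = per_class_metrics(matrix, labels)
print(accuracy, report)
```

The same function applies unchanged to a matrix with more classes, where each class is treated in turn as the positive class and all others as negative.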

Results and Discussions
For this study, apartment complexes with at least 300 households were selected. Using a map API, the raw data of complex layouts were retrieved and subsequently transformed into rasterized, black-and-white images. These images were then converted into numerical NumPy arrays through a normalization process to prepare a dataset for classifying parcel shapes. The shapes were initially typified using K-means clustering, and then categorized into four classes (Avocado, Potato, Trapezoid, and Stick) based on both previous research and findings from this study. Parcel shapes were labeled using the Roboflow API data annotation tool. Subsequently, a model for analyzing the apartment complex parcel dataset was developed based on the YOLOv8s-cls model, and the research findings are presented in the following subsections.

K-Means Clustering
The optimal K value for K-means clustering was determined using the Elbow method (see Figure 8), a technique used to identify the number of clusters beyond which adding another cluster does not substantially improve the modeling of the data. This method indicated a lack of a clear bending point, prompting the exploration of various K values (20, 30, 50, and 100), with 20 ultimately being selected as the most appropriate for classifying parcel shapes. The analysis of the average images derived from each cluster highlighted that similar parcel shapes often differed based on their orientation, an observation attributed to pixel positioning, which varies with image orientation. Table 1 summarizes the shapes and numbers (N) of parcels in each cluster. This detailed classification and the derived images aid in our understanding of parcel shape variability within the dataset.
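The Elbow curve itself can be produced by fitting K-means for several candidate K values and recording the cost (scikit-learn's `inertia_`, the within-cluster sum of squared distances). The sketch below uses synthetic data and small, hypothetical candidate values to keep it fast; it is not the study's code, and `n_init` and `random_state` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((300, 50))  # synthetic stand-in for flattened parcel images

candidate_ks = [2, 5, 10, 20, 30]
inertias = []
for k in candidate_ks:
    model = KMeans(n_clusters=k, init="k-means++", n_init=5, random_state=0)
    model.fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

# Drop in cost between consecutive candidates; an elbow would appear as a
# point after which these drops become much smaller.
drops = [a - b for a, b in zip(inertias, inertias[1:])]
for k, inertia in zip(candidate_ks, inertias):
    print(f"K={k:>3}  inertia={inertia:.2f}")
```

When the drops shrink gradually rather than abruptly, as can happen with data lacking well-separated groups, the curve offers no single bend and K has to be chosen by inspecting the resulting clusters, which matches the situation described above.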

Parcel Shape Classification and Analysis
The analysis of average images derived from K-means clustering revealed four distinct parcel types (see Table 2), refining the classification from previous research that distinguished between five regular and six irregular types. Observations indicate that, unless an area is newly developed or extensively restructured, parcels generally exhibit irregular shapes, with only a few closely resembling regular forms. Moreover, the categorization of irregular shapes in earlier studies showed some ambiguity. For instance, the irregular types Avocado, Potato, Corner, Bell, Stick, and L-shape were identified, yet only a small number of parcels actually conformed to the Bell (48 parcels) and L-shape (13 parcels) classifications. The Bell type closely resembles the Potato type in shape, and the L-shape is predominantly elongated, akin to the Stick type. The Corner type, which has a WR similar to that of the Avocado but is more asymmetric, can reasonably be grouped with the Avocado due to their shared characteristics.
Consequently, this study streamlined the categorization of irregular parcel shapes from previous studies into three types and introduced a new type, close to regular, named Trapezoid. The characteristics of the four parcel shapes derived from K-means clustering were analyzed by comparing the average values of SI, STI, and WR for each type, as presented in Table 3.
The Stick type exhibited the highest average SI at 1.321, indicating the greatest degree of irregularity due to more convolutions in its shape. This was followed by the Avocado type at 1.194, Potato at 1.129, and Trapezoid at 1.104. Higher SI values suggest more complex shapes with greater irregularities.
The STI compares the parcel area to the area of the minimum bounding box, with values closer to 1 indicating shapes that are more rectangular. The Trapezoid type displayed the highest average STI at 0.845, signifying its closer resemblance to regular shapes. This was followed by Potato at 0.801, Avocado at 0.729, and Stick at 0.687. Notably, the Stick type was identified in previous research as having a loss area greater than 30%, indicating significant irregularity. The Avocado type also showed notable irregularity with a 27.1% loss area.
The WR is a key index that describes the aspect ratio of the minimum bounding box and serves as an indicator of shape elongation. A higher WR value suggests a more elongated shape, while a value closer to 1 indicates a more square-like configuration. In this analysis, the Stick type displayed the highest WR at 2.269, marking it as the most elongated among the types. This was followed by the Avocado type at 1.619, which is less elongated but still significantly non-square. The Trapezoid and Potato types showed WR values of 1.329 and 1.278, respectively, indicating that these types are bulkier and closer to square than the others.
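To make the two bounding-box indices concrete, the sketch below computes STI, WR, and the loss area from a binary parcel mask. It is a simplified illustration rather than the procedure used in this study: it assumes an axis-aligned bounding box, whereas a minimum bounding box may be rotated, and it takes the loss area to be 1 − STI, which matches the reported pairs (STI 0.729 with a 27.1% loss area). SI is omitted because its formula is not restated in this section. The function name and the example mask are hypothetical.

```python
def bounding_box_indices(mask):
    """Compute STI, WR, and loss area for a binary mask (1 = parcel pixel).

    Assumption: the bounding box is axis-aligned, a simplification of the
    minimum bounding box described in the text.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    area = sum(sum(row) for row in mask)
    sti = area / (width * height)                 # parcel area / bounding-box area
    wr = max(width, height) / min(width, height)  # long side / short side
    return {"STI": sti, "WR": wr, "loss_area": 1 - sti}


# Hypothetical L-shaped parcel: 6 pixels inside a 4 x 2 bounding box.
mask = [[1, 1, 1, 1],
        [1, 1, 0, 0]]
indices = bounding_box_indices(mask)
```

For this mask the bounding box covers 8 pixels, so STI is 0.75, the loss area is 0.25, and WR is 2.0. By the same arithmetic, the Stick type's average STI of 0.687 corresponds to a loss area of about 31.3%.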
The application of machine learning-based shape classification has demonstrated that the identified parcel shapes maintain a consistent relationship with established shape indices. The next step in this research is to validate these classified types through deep learning models specifically tailored to these four classes. In a series of experiments utilizing the YOLOv8 model, an optimized model configuration was achieved, as depicted in Figure 9. The model was configured with approximately 0.5 million parameters, a balance that optimizes both efficiency and performance. The computational cost was quantified at 12.6 GFLOPs, and the learning rate was set at 0.01, paired with a momentum of 0.937 to support model convergence and stability. Additional settings, such as a weight decay of 0.005 and a batch size of 16, were implemented to enhance generalization performance and learning stability.
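The reported settings could be passed to the Ultralytics training interface roughly as sketched below. This is an assumption about tooling, not a record of the authors' script. The pretrained checkpoint name `yolov8s-cls.pt`, the function name, and the `data_dir` and `epochs` arguments are illustrative, and the number of epochs is left to the caller because it is not stated in this section.

```python
# Hyperparameters reported in the text, keyed by Ultralytics training
# argument names (lr0 = initial learning rate, batch = batch size).
TRAIN_ARGS = {
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.005,
    "batch": 16,
}


def train_parcel_classifier(data_dir, epochs):
    """Train a YOLOv8s classification model on a folder-structured dataset.

    data_dir is assumed to contain train/ and val/ folders, each with one
    subfolder per class (for example Avocado, Potato, Trapezoid, Stick).
    """
    # Third-party dependency, imported lazily so that the settings above
    # can be inspected without installing it.
    from ultralytics import YOLO

    model = YOLO("yolov8s-cls.pt")  # assumed pretrained starting point
    return model.train(data=data_dir, epochs=epochs, **TRAIN_ARGS)
```

Keeping the hyperparameters in a single dictionary makes it straightforward to log them alongside each experiment or to vary one setting at a time.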
The performance of the developed classification model was systematically evaluated using a confusion matrix, visualized in Figure 9. The resulting metrics (accuracy, precision, recall, and F1-score) were computed from the matrix data. The model achieved an accuracy of 0.86, indicating strong performance. Precision in this context quantifies the accuracy of the positive predictions made by the model, while recall measures the model's ability to identify all relevant instances correctly. The F1-score, being the harmonic mean of precision and recall, reflects the balance between these two metrics. A high F1-score in this study suggests that the model performs well in both precision and recall, confirming its effectiveness in classifying parcel shapes into the identified categories.
The performance metrics for each parcel type as classified by the YOLOv8 model demonstrate robust accuracy and a balance between precision and recall, reflecting high performance across the board (see Table 4).
For the Avocado type, the model achieved a precision of 0.86, meaning that 86% of parcels classified as Avocado by the model were correct. The recall of 0.85 indicates that 85% of the actual Avocado-type parcels in the dataset were successfully identified by the model. The corresponding F1-score of 0.85 highlights a balanced performance between precision and recall, illustrating the model's effectiveness in classifying this particular shape with high reliability.
For the Potato type, the precision was slightly higher at 0.87, meaning that 87% of the model's Potato predictions were accurate. The recall stood at 0.86, indicating that 86% of all Potato-type instances were correctly detected. An F1-score of 0.86 further indicates that precision and recall are evenly matched, showing no significant discrepancy and suggesting robust classification capabilities for this type.
The Trapezoid type showed equal precision and recall values of 0.87. This indicates that 87% of parcels predicted as Trapezoid were correctly identified, and 87% of all actual Trapezoid parcels were detected by the model. The F1-score of 0.87 underscores a highly balanced performance, reflecting the model's capability to classify this near-regular shape with high accuracy.
Lastly, the Stick type exhibited a precision of 0.84 and a slightly higher recall of 0.89. This means that while 84% of parcels identified as Stick by the model were correct, the model detected 89% of all actual Stick-type parcels, indicating slightly greater sensitivity. The F1-score of 0.87 shows that, despite the minor difference between precision and recall, the overall performance remains highly effective, demonstrating the model's strength in identifying this more elongated parcel type.
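As a quick consistency check, the F1-score can be recomputed from each class's reported precision and recall using the harmonic mean. The sketch below does so for the four values quoted above; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the text for each parcel type.
reported = {
    "Avocado": (0.86, 0.85),
    "Potato": (0.87, 0.86),
    "Trapezoid": (0.87, 0.87),
    "Stick": (0.84, 0.89),
}
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
```

Because the published precision and recall are themselves rounded to two decimals, a recomputed F1-score can differ from the reported one in the last digit. This happens for the Stick type, where the recomputed value is 0.86 against a reported 0.87.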
Overall, these metrics not only reflect the individual strengths of the model in recognizing each parcel type but also emphasize the model's overall efficacy in handling a variety of shapes with high precision and recall, confirming its suitability for complex classification tasks in urban planning and architecture.

Test Data Results
This study divided the dataset into training, validation, and test sets in a 7:2:1 ratio. The YOLOv8s-cls model was trained and optimized without the test set, yielding the results detailed in Section 4.2.1. The derived confusion matrix and evaluation metrics confirmed that the model effectively classifies each class. To analyze and discuss the reliability and generalization ability of the developed model, the results of applying the test data, which were not used in model training, were examined.
Of the 3000 items in the dataset, 300 (10%) formed the test set, which was applied to the YOLOv8s-cls model; the outcomes are visualized as a confusion matrix in Figure 10. Based on this, the accuracy, precision, recall, and F1-score for the test set were calculated (see Table 5). The accuracy achieved was 0.86, consistent with the results obtained from the training and validation sets. Compared with the per-class results from the training and validation phases, precision for the Potato type decreased slightly from 0.87 to 0.86, recall for the Avocado and Stick types was 0.84 and 0.87, respectively (down from 0.85 and 0.89), and the F1-score for the Stick type was 0.85 (down from 0.87). These values are marginally lower than the training and validation results but remain at a comparable level.
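A 7:2:1 split of 3000 items can be sketched as follows. This is an illustrative random split, not the authors' procedure: the function name and seed are hypothetical, and whether the original split was stratified by class is not stated here.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=42):
    """Shuffle items and split them into train, validation, and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder forms the held-out test set
    return train, val, test


# Integer ids stand in for the 3000 labeled parcel images.
train, val, test = split_dataset(range(3000))
```

With 3000 items this yields 2100 training, 600 validation, and 300 test samples. A stratified variant would apply the same split within each of the four classes so that the class proportions are preserved in every subset.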
The findings from the test set indicate that the model's performance on new, unseen data is similar to its performance on the training and validation data. Although some metrics for specific types were slightly lower, the differences were minimal, suggesting that the model generalizes well. This demonstrates the model's consistent performance and robustness.

Summary
This study aimed to classify parcel shapes in South Korean apartment complexes using machine learning techniques informed by prior research and to refine these classifications using a deep learning model. The initial data collection was performed using a map API, and the data were categorized into 20 clusters via K-means clustering. These clusters were subsequently refined into four classes based on comparisons with average shapes and previous typologies. The classes were then validated using SI, STI, and WR as analytical methods, assessing the irregularity and convexity of each shape.
For model evaluation, the YOLOv8 classification model was trained with these labeled data and assessed using precision, recall, and F1-score metrics, which collectively demonstrated high accuracy and predictive performance. Despite the high performance of the model, it is important to acknowledge the possibility of errors in classifying land shapes. Some shapes may not clearly belong to any of the predefined classes, leading the model to predict the class with the highest probability. In such cases, the potential for classification across multiple classes should be considered. To address these challenges, enhancing the diversity of training data, optimizing parameters, utilizing high-performance YOLO models, and applying data augmentation techniques are crucial for improving model performance. Furthermore, analyzing the limitations of the dataset and its impact on model performance is essential for exploring new research directions to overcome the model's constraints.
K-means clustering also presents limitations, notably the requirement to predefine the K value. Determining an optimal K value can be challenging without a clear inflection point, as observed in this study, potentially leading to time-consuming trials. Nevertheless, clustering unlabeled data computationally continues to offer significant time and resource efficiencies, particularly when human analysis is impractical. Interpreting these data from an architectural and urban planning perspective still requires professional expertise, so the final shape classification remains the researcher's responsibility.
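For parcels that do not clearly belong to a single class, as noted above, one simple way to surface ambiguity is to compare the two highest class probabilities and report both classes when they are close. The sketch below is a hypothetical post-processing step, not part of the model evaluated in this study. The class order, the function name, and the margin of 0.2 are assumptions chosen for illustration.

```python
# Assumed class order of the probability vector (hypothetical).
CLASSES = ("Avocado", "Potato", "Trapezoid", "Stick")


def classify_with_margin(probs, margin=0.2):
    """Return the top class, or the top two classes if their scores are close."""
    ranked = sorted(zip(probs, CLASSES), reverse=True)
    (p1, c1), (p2, c2) = ranked[0], ranked[1]
    if p1 - p2 < margin:
        return [c1, c2]  # ambiguous shape: flag both candidates for review
    return [c1]


# Hypothetical probability vectors for two parcels.
ambiguous = classify_with_margin([0.45, 0.40, 0.10, 0.05])
confident = classify_with_margin([0.90, 0.05, 0.03, 0.02])
```

Parcels flagged with two candidate classes could then be routed to a researcher for the final decision, in keeping with the role of expert judgment described above.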
In contrast to previous studies that categorized parcel shapes into eleven types, this study simplified the classification to four types. Although this reduction may appear overly simplistic, it is justified by various experiments. For example, training the deep learning model initially with 20 clusters without expert input resulted in low accuracy. Subsequent re-clustering and further analysis failed to improve performance, indicating that a more streamlined classification could be more effective.
This finding underscores the necessity of expert insight and analysis. Moreover, models trained with classifications identical to those from previous studies exhibited low performance, likely due to differences in the study subjects. This highlights the need for new research and analyses tailored specifically to apartment complexes, as demonstrated in this study, where optimal experimental conditions led to a streamlined classification of parcel shapes into four distinct classes.
This study is noteworthy for acquiring extensive data on parcel shapes in South Korean apartment complexes and for developing a novel methodology that utilizes machine learning to classify unlabeled raw data and deep learning to validate these classifications. The use of machine and deep learning in analyzing parcel shapes enhances time efficiency and objectivity during the planning and development stages of apartment complexes, providing a data-driven approach to architectural and urban planning.
Future research should explore different apartment complex types based on these parcel classifications to develop strategic plans for the arrangement of main buildings and external spaces. Such studies could significantly advance strategic planning for apartment complexes.

Limitations
The study's scope was limited to using K-means clustering for analyzing parcel types and did not include a comparative analysis with other machine learning methodologies. Additionally, the deep learning analysis was conducted solely with the YOLOv8s-cls model, without comparison to other models. However, the two methodologies selected in this study have been reported in prior research to outperform other methods. DBSCAN and spectral clustering suffer from performance degradation in datasets with uneven density due to their sensitive parameter settings, and hierarchical clustering faces high computational costs and difficulty in modifying established clusters. This supports the view that K-means clustering, with its simplicity and efficiency, is more suitable for handling large volumes of data like those used in this research. Additionally, while there are various deep learning models such as ResNet, VGGNet, and EfficientNet, most require a significant number of parameters and have relatively slow processing speeds, and their complex structures complicate the analysis of large datasets. In contrast, the YOLO model offers the flexibility to be modified and customized as needed and, with fewer parameters, provides faster inference speeds, making it a suitable choice for efficiently analyzing objects of various sizes. Despite these limitations, this research marks a significant step forward in demonstrating how architectural and urban planning can benefit from advanced analytical methods that utilize unlabeled raw data.
Future studies will utilize the dataset from this research to conduct comparative analyses using newer versions such as YOLOv9 and YOLOv10, as well as other non-YOLO deep learning models. While the YOLOv8s-cls model proposed in this study demonstrated excellent performance under basic noise conditions, its robustness under specific conditions has not yet been fully verified. Therefore, future research will involve artificially introducing various types of noise, such as Gaussian and salt-and-pepper noise, to assess the model's response. Additionally, data augmentation techniques including geometric transformations, color modifications, and random cropping will be employed to enhance the robustness of the data, enabling performance evaluation across different noise levels. These efforts are expected to contribute to the development of models optimized for parcel shape classification.
Moreover, this approach will enable a comprehensive survey and analysis of the entire dataset of apartment complexes in Korea. Cross-sectional and regression analyses based on indicators such as construction periods and regional differences will then facilitate an in-depth analysis of trends and impact relationships in Korean apartment complexes.
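The noise experiments proposed above could start from perturbations like those sketched below. This is a minimal, standard-library illustration on images stored as nested lists of values in [0, 1]. The function names, noise levels, and seeds are hypothetical and do not describe an experiment carried out in this study.

```python
import random


def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def add_salt_and_pepper(image, amount=0.05, seed=0):
    """Set a fraction `amount` of pixels to 0 (pepper) or 1 (salt) at random."""
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for px in row:
            r = rng.random()
            if r < amount / 2:
                new_row.append(0.0)  # pepper
            elif r < amount:
                new_row.append(1.0)  # salt
            else:
                new_row.append(px)   # pixel left unchanged
        noisy.append(new_row)
    return noisy


# Hypothetical 8 x 8 mid-gray image used to exercise both functions.
image = [[0.5] * 8 for _ in range(8)]
gaussian = add_gaussian_noise(image, sigma=0.1)
salt_pepper = add_salt_and_pepper(image, amount=0.05)
```

Evaluating the trained classifier on copies of the test set corrupted at increasing values of `sigma` or `amount` would give a simple robustness curve for each noise type.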

Figure 1. Parcel shape types derived from previous research.

Figure 2. Apartment layout data using maps.

Figure 3. Example of data labeling by parcel shape type in apartment complexes.

Figure 4. Workflow of the proposed methodology: (a) Reviewing parcel shape classification and analysis methodologies based on prior studies. (b) Initial classification of parcel shapes within an apartment complex using K-means clustering, followed by type verification through YOLO classification, after processing the raw data. (c) Analyzing irregular shapes according to the parcel shape analysis methodology and evaluating the performance of the YOLO model.

Figure 6. Data detection example for parcel shape analysis.

Figure 7. Example of a confusion matrix.

Figure 8. Results of the Elbow method for determining the optimal K value.

Figure 10. Test data results of apartment complex parcel type.

Table 1. Parcel shape clustering results for apartment complexes.

Table 2. Parcel shapes classification for apartment complexes.

Table 3. Analyzing parcel shape types for apartment complexes.

Table 4. Evaluation of the parcel shape classification model for apartment complexes.

Table 5. Evaluation of test dataset of the parcel shape classification model.