1. Introduction
Social interaction is a key element in sustaining social structure and regular activities within educational processes. Supervision of social interaction between individuals reinforces both formal and informal processes in educational institutions [1,2]. Considering space as an active role-player in encouraging human behavior, this study builds on Affordance Theory [3] and focuses on the potential of campus open spaces to trigger social interaction.
Within the framework of Affordance Theory, environments are approached with an ecological perspective and understood as relational ecosystems that provide users opportunities for action, beyond merely being a physical space. Building on this perspective, this study conceptualizes socio-spatial affordances as the capacity of spatial infrastructures to enable, arrange, or constrain social interaction through perceptible environmental cues.
In traditional interpretations, affordances are mainly addressed at the individual scale, and the concept is referred to as social or socializing [4,5] and spatial affordances [6]. However, the socio-spatial affordance approach emphasizes the repetitive behavioral and interactional patterns that emerge from spatial conditions. Thus, the term socio-spatial affordances will be used in this study to emphasize the relational structure of the theory. This conceptualization allows social interaction to be examined as a result of both spatial structure and social dynamics. Not only in terms of terminology but also operationally, the study reflects this multidimensional structure by bringing together the individual, behavioral, and spatial characteristics and addressing them within the relational infrastructure provided by Affordance Theory.
To examine this relation in a time- and energy-efficient and systematic way, AI tools are employed. The study aims to create an AI-based unified framework to (1) detect human behavior, (2) determine social interaction scores, and (3) reveal the correlation between social interaction scores and characteristics of the physical environment based on Affordance Theory [3] in campus settings, as a complementary function of educational institutions.
The proposed framework is tested through a case study, in which context-specific empirical findings are targeted. This framework is intended to provide a replicable and transferable methodological approach for environment–behavior research across diverse physical and/or sociocultural contexts and under different climatic conditions. Therefore, the two main hypotheses of the research are as follows:
H1: The detection of human behavior, determination of social interaction scores, and exposure of the correlation between social interaction scores and spatial characteristics of the physical environment can be realized through artificial intelligence approaches.
H1a: The detection of human behavior can be realized through artificial intelligence approaches.
H1b: The determination of social interaction scores can be realized through artificial intelligence approaches.
H1c: The exposure of the correlation between social interaction scores and characteristics of the physical environment can be realized through artificial intelligence approaches.
H2: The social interaction scores differentiate among different spatial features.
H2a: The social interaction scores increase in open spaces located closer to circulation axes and educational spaces as necessary activities of educational settings.
H2b: The social interaction scores have a significant relation with physical and spatial characteristics of open spaces.
AI is a broad field of research that aims to simulate human intelligence by enabling computers to perform cognitive processes such as perception, learning, and problem-solving. By virtue of its main sub-fields, such as machine learning (ML), deep learning (DL), and computer vision (CV), AI provides flexible frameworks that convert raw data into structured information and predictions. These models typically learn the numerical representation of data and map it into desired outputs. In recent years, AI has also contributed to considerable advancements in the field of architecture [7]. Many practices, such as the production of architectural plan diagrams [8,9,10], smart city planning [11,12], human behavior analysis, and human-building interactions [13,14], have been reshaped using ML models. ML has transformed architectural design by exchanging intuition-oriented decision-making with data-driven insights, optimizing space and energy efficiency [15,16,17], and predicting performance [18,19]. AI-driven approaches form a strong basis for the reconfiguration of environment-behavior studies in architecture. Thus, the use of AI tools provides suitable conditions for accelerating, streamlining, and systematizing processes in these research fields.
In this study, a unified quantitative research design is proposed based on three AI-based stages. Within the scope of the case study conducted at the ATU Faculty of Engineering building complex, the social interaction behavior of 746 individuals in campus open spaces was captured through systematic observation over six days and 18 observation cycles, each comprising 17 subspaces. The data were recorded using behavioral maps and detailed checklists. In the first stage, these behavioral maps were analyzed via the DL-based human detection approach. In the second stage, a behavioral coding approach was developed, and the social interaction scores of the participants were calculated based on the determinants of social interaction retrieved from related studies in the field [20,21,22]. Distance-based Interaction Feature Construction (DIFC) and Interaction Score Prediction were designed for the prediction of the social interaction classes via an ML approach. In the last stage, all subspaces were classified by spatial quality, and both Spatial Feature Selection and Spearman’s rho Correlation were used to identify the socio-behavioral aspects associated with spatial quality, thereby triangulating the findings and validating the proposed methodology.
Major contributions of this study are classified into three main themes, including (1) theoretical, (2) methodological, and (3) empirical contributions:
Theoretical Contribution: This study contributes to the theoretical framework by evaluating the socio-behavioral affordances of campus spaces through Affordance Theory [3] and proposing the term socio-spatial affordance accordingly.
Methodological Contribution: While existing methods employ manual and rule-based hierarchical structures built upon AI-derived data for social interaction prediction and spatial analysis, the proposed approach provides a fully automated, artificial intelligence-based unified framework. This methodology allows behavioral data to be analyzed in a time- and energy-efficient manner using artificial intelligence techniques, thereby yielding more comprehensive, confidential, and reproducible results.
Empirical Contribution: The study contributes to the applied practices of environmental design fields through empirical evidence revealing the spatial characteristics that promote social interaction in educational settings.
The rest of the paper is organized as follows: Section 2 summarizes related works. Section 3 provides the dataset and the proposed approach. Section 4 and Section 5 present the experimental results and discussions. Finally, Section 6 outlines the conclusion and future works.
3. Materials and Methods
In this study, an AI-based unified framework is proposed as the primary methodology for modeling social behavior in relation to spatial features, while conventional data-gathering and analysis methods are employed as a baseline for comparison and validation. The AI-based exploratory approach is used to investigate new measurement methodologies for the presented features and the relations between them, in order to avoid the time-consuming nature of conventional behavioral approaches. Multiple AI approaches are employed within the three-phase methodology, and the model diagram of the proposed framework is shown in Figure 2.
Within this methodology, the raw data are evaluated using both conventional statistical approaches and AI-based unified autonomous approaches. To verify the proposed model, a case study was conducted at the ATU Faculty of Engineering building complex. The complex is located in the center of the campus and is distinctive in that it forms the main integrated courtyard system and provides a pedestrianized space for users, as shown in Figure 3.
The campus design is notable for its holistic approach, and the courtyard system is deliberately shaped to prevent the formation of leftover spaces; thus, the study area consists of subspaces with diverse, deliberately specified spatial features. The complex is therefore addressed in the scope of the study to test the proposed methodology.
3.1. Data Gathering and Pre-Processing
During the data collection stage, a systematic observation was conducted at the ATU Faculty of Engineering building complex during the hot season, and the subspaces of the courtyard system were determined and classified according to the spatial features specified in the literature. Covering the whole complex area, seventeen subspaces were determined and used as observational units. These units formed the basis of the observational study, which was carried out systematically in three observational cycles per day over six days, as shown in Figure 4. Observational sampling is specified in detail to include the full social behavior of the individuals within each observational unit. The observed socio-behavioral patterns were recorded using detailed checklists and behavioral maps at each cycle, with the site plan serving as the spatial reference plane.
The data set used in the first stage of the study comprises a total of 331 images and behavioral data for 625 individuals, spanning the first five days and fifteen cycles of the observation process. These data were split into 70% training, 20% validation, and 10% test data to build a reliable object detector model capable of identifying individuals performing various actions and postures under different environmental conditions. The 331 images collected during the first five days of observation contain participants with different postures and actions, and the class-wise distributions are shown in Table 1. To evaluate the generalizability of the proposed AI-based framework, the observation data from the sixth day were included in the test set, increasing the number of participants from 625 to a total of 746. For training and validating the object detector, 565 of the participants are included, while the remaining samples, comprising 83 images and 181 participants, are used as input for predicting the social interaction score in the second phase of the AI-based unified framework.
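The 70/20/10 split described above can be illustrated with a minimal sketch. The function name `split_indices`, the fixed random seed, and the rounding rule are assumptions made for illustration only; the source does not state how the images were shuffled or how fractional counts were rounded.

```python
import random


def split_indices(n_items, train=0.7, val=0.2, seed=0):
    """Shuffle item indices and cut them into train/validation/test parts.

    The test part receives whatever remains after the train and validation
    parts are taken, so the three parts always cover every item exactly once.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for a repeatable split
    n_train = round(n_items * train)
    n_val = round(n_items * val)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


# 331 is the number of images reported for the first five observation days.
train_idx, val_idx, test_idx = split_indices(331)
print(len(train_idx), len(val_idx), len(test_idx))
```

With this rounding rule the three parts are as close to 70%, 20%, and 10% as whole images allow; a stratified split by posture class would be a reasonable alternative if the class distribution is uneven.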
A preprocessing step enhances readability by separating the behavioral maps into image units, which makes them suitable for AI methods and, in particular, for object detection. Figure 5 presents sample visualizations created from the observation data collected on six days, with three time periods each. Different postures and actions (sitting, walking, and standing) of individuals are represented by distinct symbols (circle, triangle, and cross).
Annotation of ground-truth data poses a major challenge, especially for identifying small-scale objects in architectural areas, since detection performance largely depends on labeling accuracy. In large interior or campus plans, a large number of social interaction groups distributed in different locations makes manual labeling quite time-consuming. In particular, for architectural layouts of complex geometries, accurate labeling of these objects is crucial for both spatial consistency and detection performance.
Manual observations to compute human interactions are also time-consuming, repetitive, and exhausting processes. In such analyses, the systematic evaluation of observations obtained at different times and places is often vulnerable to human error and inconsistencies. To overcome this problem, a unified interaction prediction approach is proposed that leverages object detection and ML on the collected observation data.
The proposed approach includes a unified AI-based framework that analyzes the spatial relationships among individuals and predicts their social interaction densities from visual data, using conventional data-gathering and analysis methods to provide a basis for benchmarking and validation. In the first stage of the AI-based framework, the object detector determines the class label and the position of individuals exhibiting various actions and postures in the visualized observational data. In the second stage, a distance-based interaction analysis is performed between all individuals detected in an image. The Euclidean distances between each pair of individuals within the same image are computed using the centers of bounding-box coordinates determined by the object detector. Then, a structured dataset is created from the interaction scores derived from pairwise distances, which is used to train and evaluate an ML classifier. The classifier learns to predict an interaction score for each individual, enabling automatic prediction of human social interaction from visual data. Finally, the predicted social interaction scores are analyzed in relation to spatial data using an AI-based feature selector. The most discriminative spatial features for social interaction score are selected and compared with the spatial analysis results obtained using conventional methods.
This unified framework establishes an effective bridge between AI and socio-behavioral analysis, thereby enabling the extraction of human interaction models from complex scenes and their relation to spatial data. This represents a significant step toward integrating AI and architecture. Conventional approaches and techniques are also employed within the scope of the study to triangulate the findings and validate the proposed methodology.
3.2. Behavioral and Spatial Metric Specifications and Calculations
In this study, the social interaction scores of individuals and the spatial features of the spaces they occupy are determined, and this behavioral and spatial evidence is analyzed comparatively to determine socio-spatial affordances in educational settings, in both conventional and AI-based manners. For the conventional approach, observation-based behavioral data and spatial data are collected and analyzed using SPSS version 29.0 [73] and Jamovi version 2.6 [74].
3.2.1. Compound Social Interaction Index Construction
The study develops a calculation system, the Social Interaction Behavior Index, to determine the social interaction score of each observee through a multi-phase procedure structured as a comprehensive and quantitative measurement. First, the behavioral components of social interaction are determined with reference to the relevant environment-behavior studies, and four components are used for the calculation of social interaction scores: (1) number of persons [21], (2) interaction distance [21,28], (3) physical orientation [20], and (4) body posture [20,22]. Second, each component is converted into ordinal data for the classification of social interaction behavior, as indicated in Table 2. Third, these variables are combined into a Social Interaction Index for each observee using equal weighting, to avoid creating predefined hierarchies or subjective prioritization among the behavior indicators. The internal consistency of the calculation is measured with reliability tests.
Within this study, social interaction behavior is conceptualized as a compound behavioral structure, enabling a standardized, reproducible quantitative measure of the social interaction index and artificial intelligence-based prediction rather than qualitative typological differentiation. These components are listed in Table 2.
The approach is developed in accordance with the proposed research method, and the social interaction scores are determined to be analyzed in relation to spatial features.
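As an illustration of the equal-weighting idea, the following sketch combines four ordinal component codes into a single index. The component names, the 1 to 3 coding range, and the use of the arithmetic mean as the combination rule are assumptions; the actual coding scheme is the one defined in Table 2, and the source does not state whether the index is a sum or a mean.

```python
from statistics import mean

# Hypothetical component names for the four behavioral components.
COMPONENTS = ("number_of_persons", "interaction_distance", "orientation", "posture")


def social_interaction_index(codes):
    """Equal-weight combination: every component contributes the same share."""
    missing = [name for name in COMPONENTS if name not in codes]
    if missing:
        raise ValueError(f"missing components: {missing}")
    return mean(codes[name] for name in COMPONENTS)


# Hypothetical ordinal codes (1 = weakest, 3 = strongest cue of interaction).
observee = {
    "number_of_persons": 2,
    "interaction_distance": 3,
    "orientation": 3,
    "posture": 2,
}
print(social_interaction_index(observee))
```

Because no component is multiplied by a weight other than one, no indicator is prioritized over another, which is the property the equal-weighting choice is meant to guarantee.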
3.2.2. Spatial Metric Specifications
In the scope of the study, the spatial distribution of observed social interaction behaviors was first addressed in order to expose the spatial aspect of social interaction patterns. The following spatial metrics are systematically specified as the prominent spatial features included in the study, based on the related studies in the existing body of research, and Affordance Theory created an analytical basis for the discussion on socio-behavioral possibilities of space. The precisely selected spatial features are listed as follows:
Proximity to educational facilities and to main circulation axes: Considering educational facilities and circulation axes as the main functions in educational settings, these features are evaluated as the primary distance-based spatial features [35].
Proximity to the main entrances: Alexander emphasizes the role of ground floors in connecting indoor and outdoor spaces, highlighting the functional significance of main entrance areas in facilitating outdoor use [75].
Pedestrianism: The strategy of pedestrianization is suggested to encourage first the use of space and then social interaction behavior [2].
Surface characteristics and shading strategies: On account of their effects on thermal comfort perception in open spaces, surface characteristics [76] and shading strategies [77] are evaluated as important influential factors.
Enclosure: The feeling of privacy and the tendency to interact with other individuals are affected by the enclosure level of the space. Although the extent of the effect varies across user groups, users generally do not prefer to engage in social activities in spaces with excessively high levels of enclosure [78].
Existence and rotation of seating units: Both the existence and the rotation of seating units are robust and well-established variables. The presence of seating areas has an amplifying effect on high-density interaction, while seating arrangements based on face-to-face interaction further enhance this effect [79].
Food and beverage supply: It is suggested that, within spaces where food and beverages are provided, the setting serves as a focal point for social interaction behaviors [2].
These spatial features, informed by prior studies, capture the physical, spatial, and functional characteristics of space. Selecting the ones relevant to the fieldwork addressed in this study from this set is important for an accurate assessment of the empirical study’s framework. As all the spaces examined in the empirical study are designed with a pedestrianization strategy, the “Pedestrianism” variable is excluded from the scope. Additionally, the “Food and beverage supply” variable is excluded from the scope of the study, as all spaces examined are within close proximity to areas serving food and beverages. Thus, the spatial features determined by the contextual conditions were operationalized for the spatial classifications, and the spaces included within the case study area are categorized under the subgroups shown in Table 3.
The determined set of spatial features was compiled based on the existing body of research and inclusively structured to represent the socio-spatial affordances across the entire study area. Based on the Affordance Theory, spatial features are conceptualized not only as physical characteristics but also as relational characteristics that mediate possible social actions. For instance, proximity to main activities and the presence of seating areas increase the likelihood of being present in and spending time in the space, thereby enhancing the probability of interaction, whereas seating orientation is interpreted as an affordance that influences the potential for face-to-face interaction. On the other hand, shading elements provide climatic comfort, enabling prolonged use and informal gatherings, while the space’s enclosed feature is evaluated as an affordance that determines the balance between privacy and social behavior.
This structure has constituted a basis for revealing the relationship between the two factors and subsequently been integrated into an AI-based, Affordance Theory-based analytical framework.
3.2.3. Comparative Analysis Approach for Behavioral and Spatial Evidences
Within the methodological approach, the precise determination of social interaction scores is crucial for revealing the overall interactional distribution, interpreting individual interaction scores, and examining spatial distributions to interpret the results within the Affordance Theory framework, based on the behavior-space relationship. To this end, after classifying spatial characteristics, the correlation between the two factors is statistically assessed using Spearman’s rho correlation, and the relational structure is discussed in the context of the Affordance Theory.
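For readers who want to see how Spearman's rho handles ordinal data with ties, the following dependency-free sketch ranks both variables (tied values share the mean of their ranks) and then computes the Pearson correlation of the ranks. The study itself reports using SPSS and Jamovi for this step; the function names and the sample values below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: interaction scores and an ordinal spatial code per observee.
scores = [2.5, 1.0, 3.0, 2.0, 2.75]
seating_code = [2, 1, 3, 1, 3]
print(round(spearman_rho(scores, seating_code), 3))
```

Working on ranks rather than raw values is what makes the coefficient suitable for the ordinal spatial classifications used here.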
3.3. AI-Based Unified Framework
The primary concept of the proposed AI-based unified framework is to feed the observation data generated by behavioral mapping into the object detector as the first step, construct distance-based interaction features using the class and location of the detected individuals, and then predict social interaction scores using ML classifiers regarding the previously determined social interaction scores as target values. Based on the predicted social interaction scores, spatial features are analyzed to identify the most discriminative spatial features with respect to the social interaction score, and the consistency of conventional methods with the AI-based framework is assessed. The individual steps of the proposed AI-based framework are described in detail below.
3.3.1. Deep Learning-Based Human Detection
Object detection is a field of computer vision with various applications ranging from surveillance systems [80,81,82] to autonomous vehicles [83,84,85,86], medical imaging [87,88], and agriculture [89,90,91]. In recent years, DL-based object detection methods have revolutionized this field with the emergence of Convolutional Neural Networks (CNNs). The YOLO (You Only Look Once) detectors developed in this context are among the most effective approaches for classifying and predicting object locations in video frames or images, while also enabling real-time object detection.
In addition to these broad applications, object detection can also be employed in architectural studies, particularly for the analysis of human behavior, spatial interactions, and the evaluation of spatial efficiency within built environments.
YOLOv8 Object Detector
In this study, as the first step of the proposed AI-based unified framework, the YOLOv8 object detector [92] was employed to detect individuals in the visualized observation data.
The YOLOv8 framework comprises two major components: the backbone and the detection head. The backbone and neck parts of YOLOv8 are based on the design principles of YOLOv7 and replace the C3 module of YOLOv5 with the C2f structure. The C2f module enables the fusion of higher-level feature representations with contextual insights to improve detection performance. Compared with YOLOv5, YOLOv8 has two major enhancements in the head. First, the coupled head of YOLOv5 is replaced by the popular decoupled head structure, which separates the classification and detection branches. Second, the model has changed from an anchor-based approach to an anchor-free approach.
YOLOv8 has five variations: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. Compared to other YOLOv8 versions, YOLOv8s provides reasonable detection speed and good detection accuracy [93,94]. Therefore, YOLOv8s is chosen as the baseline model in this study. Table 4 provides an overview of the YOLOv8s architecture, including the number of layers, the input connections for each layer, the total parameters, and the module types.
Suppose the image set of observation data is represented by $\mathcal{I} = \{I_n\}$ for $n = 1, \dots, N$, where each $I_n$ is handled individually by the YOLOv8 object detector. The detector produces a series of raw predictions for a given image $I_n$:
$$P_n = \{(b_k, c_k, s_k)\}_{k=1}^{K_n}$$
where $b_k$ denotes the predicted bounding box of the k-th detected object in image $I_n$ and comprises $(x_k, y_k, w_k, h_k)$, $c_k$ is the class label, and $s_k$ is the confidence score.
Confidence Filtering (CF)
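Predictions with low confidence scores for each image $I_n$ are eliminated by using a threshold $\tau$:
$$\tilde{P}_n = \{(b_k, c_k, s_k) \in P_n : s_k > \tau\}$$
where $P_n$ is the set of raw predictions (bounding box $b_k$, class label $c_k$, confidence score $s_k$) for image $I_n$. This step discards unreliable detections and selects only predictions that exceed the specified threshold.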
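The filtering rule can be sketched in a few lines. The `Detection` record, the class names, and the threshold value of 0.25 are assumptions for illustration; the source does not report the threshold that was used.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    box: tuple    # (x_center, y_center, width, height)
    label: str    # posture/action class, e.g. "sitting"
    score: float  # detector confidence in [0, 1]


def confidence_filter(detections, threshold=0.25):
    """Keep only detections whose confidence strictly exceeds the threshold."""
    return [d for d in detections if d.score > threshold]


raw = [
    Detection((120.0, 80.0, 14.0, 14.0), "sitting", 0.91),
    Detection((122.0, 81.0, 15.0, 13.0), "standing", 0.12),
    Detection((300.0, 210.0, 16.0, 16.0), "walking", 0.58),
]
kept = confidence_filter(raw)
print([d.label for d in kept])
```

A higher threshold trades recall for precision: fewer false symbols are carried into the distance computation, at the cost of possibly dropping faintly drawn individuals.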
Class-Wise Non-Maximum Suppression (NMS)
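To eliminate redundant overlapping boxes, class-wise non-maximum suppression is performed. For two bounding boxes $b_i = (x_i, y_i, w_i, h_i)$ and $b_j = (x_j, y_j, w_j, h_j)$, given by their center coordinates, widths, and heights, the horizontal and vertical intersections are computed as follows:
$$w_{\cap} = \max\left(0,\ \min\left(x_i + \tfrac{w_i}{2},\ x_j + \tfrac{w_j}{2}\right) - \max\left(x_i - \tfrac{w_i}{2},\ x_j - \tfrac{w_j}{2}\right)\right)$$
$$h_{\cap} = \max\left(0,\ \min\left(y_i + \tfrac{h_i}{2},\ y_j + \tfrac{h_j}{2}\right) - \max\left(y_i - \tfrac{h_i}{2},\ y_j - \tfrac{h_j}{2}\right)\right)$$
The Intersection over Union (IoU) between two boxes is then calculated by the following:
$$\mathrm{IoU}(b_i, b_j) = \frac{w_{\cap}\, h_{\cap}}{w_i h_i + w_j h_j - w_{\cap}\, h_{\cap}}$$
For each class, boxes with $\mathrm{IoU}(b_i, b_j) > \tau_{\mathrm{NMS}}$ relative to a higher-scoring box of the same class are removed; thus, only the most representative bounding box for each object remains.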
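A compact sketch of class-wise non-maximum suppression follows. It assumes boxes stored as (x_center, y_center, width, height) and an IoU threshold of 0.5; both the box layout and the threshold are assumptions, not values taken from the source.

```python
def iou(a, b):
    """Intersection over Union of two (x_center, y_center, width, height) boxes."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay1, ay2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by1, by2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # horizontal overlap
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))  # vertical overlap
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def classwise_nms(detections, iou_threshold=0.5):
    """detections: list of (box, label, score) tuples.

    Boxes are visited from highest to lowest score; a box is kept unless it
    overlaps an already kept box of the same class by more than the threshold.
    """
    kept = []
    for box, label, score in sorted(detections, key=lambda d: d[2], reverse=True):
        overlaps_kept = any(
            k_label == label and iou(box, k_box) > iou_threshold
            for k_box, k_label, _ in kept
        )
        if not overlaps_kept:
            kept.append((box, label, score))
    return kept


candidates = [
    ((10.0, 10.0, 4.0, 4.0), "sitting", 0.90),
    ((10.5, 10.0, 4.0, 4.0), "sitting", 0.80),   # duplicate of the first box
    ((10.5, 10.0, 4.0, 4.0), "standing", 0.70),  # other class, so it is kept
    ((30.0, 30.0, 4.0, 4.0), "sitting", 0.60),
]
print(len(classwise_nms(candidates)))
```

Suppressing per class rather than across classes means that two different symbols drawn close together on a behavioral map are both retained.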
3.3.2. Distance-Based Interaction Feature Construction (DIFC)
Based on the social interaction components, a calculation-based approach is developed for constructing distance-based interaction features.
Suppose that each image $I_n$ includes $M_n$ individuals detected by YOLOv8. We denote the set of indices of detected individuals by the following:
$$\mathcal{S}_n = \{1, 2, \dots, M_n\}$$
For every individual $i \in \mathcal{S}_n$, the center coordinates of the predicted bounding box are denoted by $(x_i, y_i)$, as mentioned in GFE. In addition, the corresponding class label of individual i is represented by $c_i$.
For each individual i in image $I_n$, we compute its Euclidean distance to every other individual $j \in \mathcal{S}_n$, $j \neq i$, using
$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
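The interaction vector for the individual i is then defined as follows:
$$\mathbf{v}_i = \left[d_{i j_1},\ d_{i j_2},\ \dots,\ d_{i j_{M_n - 1}}\right]$$
Here, $d_{ij}$ is the Euclidean distance between individuals i and j, and $j_1, \dots, j_{M_n - 1}$ represent all other objects in the same image. Since each image may contain a different number of objects, the maximum number of neighbors in the dataset is defined by the longest neighbor list, and each vector is padded with zeros to ensure that all vectors have equal dimensions.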
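The construction of the padded, fixed-length distance vectors can be sketched as follows. The function name and the input layout (one list of bounding-box centers per image) are assumptions; the sketch also keeps distances in detection order, since the source does not state whether neighbors are sorted.

```python
import math


def interaction_vectors(images):
    """Build one zero-padded distance vector per detected individual.

    images: list of images, each a list of (x_center, y_center) tuples.
    Returns rows of equal length, set by the longest neighbor list.
    """
    vectors = []
    for centers in images:
        for i, (xi, yi) in enumerate(centers):
            # Euclidean distance from individual i to every other individual
            # detected in the same image.
            distances = [
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(centers)
                if j != i
            ]
            vectors.append(distances)
    width = max((len(v) for v in vectors), default=0)
    return [v + [0.0] * (width - len(v)) for v in vectors]


# Two hypothetical images: one with two individuals, one with three.
features = interaction_vectors([
    [(0.0, 0.0), (3.0, 4.0)],
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
])
for row in features:
    print(row)
```

One caveat of zero padding is that a padded zero is numerically indistinguishable from a genuine zero distance; a sentinel value or a separate neighbor-count feature would be one way to separate the two cases.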
3.3.3. Social Interaction Score Prediction Using a Machine Learning Approach
ML in architecture can be applied to complex problems, such as the relationship between user behavior and space, the analysis of interaction patterns, the detection of movement patterns, and the quantification of social interaction scores. In the learning phase, the ML model aims to minimize the difference between the predicted and desired outputs using a supervised learning algorithm. Thus, the multidimensional structure of human interactions in spatial areas can be analyzed using a data-driven approach, providing an analytical framework that provides more objective, measurable, and verifiable inputs for spatial design decisions.
This study aims to predict the score of social interaction between individuals in campus open spaces using a unified AI-based approach. In ML, samples are commonly represented as vectors; each element of a data point corresponds to a quantized value for a specific feature in the vector. Such a representation enables the mathematical modeling of spatial relationships necessary for the calculation of interaction scores, which are then incorporated into the prediction process by being transferred to a formal learning framework. The fixed-length distance-based interaction dataset obtained in Section 3.3.2 forms the input feature matrix:
$$X = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_M]^{\top} \in \mathbb{R}^{M \times D}$$
where each row corresponds to a single individual in the dataset, and each column represents one of the D interaction features. Let the corresponding interaction-score ground-truth labels be denoted by
$$\mathbf{y} = [y_1, y_2, \dots, y_M]^{\top}$$
where each $y_m$ represents the interaction score. A supervised learning model is trained to learn a mapping
$$f: \mathbb{R}^{D} \rightarrow \mathcal{Y}$$
where $\mathcal{Y}$ is the set of interaction scores, such that $\hat{y}_m = f(\mathbf{x}_m)$ approximates the true interaction score of the individual represented by $\mathbf{x}_m$.
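To make the idea of learning a mapping from interaction features to interaction scores concrete, the sketch below uses a k-nearest-neighbor classifier as a stand-in. This section of the source does not name the classifier that was used, so the k-NN rule, the value of k, and the toy feature vectors and labels are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training rows."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda m: math.dist(train_X[m], x),  # Euclidean distance in feature space
    )[:k]
    votes = Counter(train_y[m] for m in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical padded distance vectors (D = 2) with interaction-class labels:
# short distances to neighbors are labeled "high", long distances "low".
train_X = [[1.0, 1.2], [0.9, 1.1], [1.1, 0.8], [6.0, 7.5], [5.5, 8.0], [7.0, 6.5]]
train_y = ["high", "high", "high", "low", "low", "low"]

print(knn_predict(train_X, train_y, [1.0, 1.0]))
print(knn_predict(train_X, train_y, [6.0, 7.0]))
```

Any supervised classifier that accepts fixed-length numeric rows could take the place of `knn_predict` here; the interface (feature rows in, one predicted score per row out) is the part that mirrors the text.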
3.3.4. Ranking of Spatial Features by ReliefF
In addition to Spearman’s rho correlation based on the social interaction scores determined by behavioral coding, spatial features were also ranked, using a feature selection algorithm, based on the social interaction scores predicted by an ML classifier. The aim is to compare the overlap between the statistics-based and the AI-based, data-driven spatial feature rankings. In terms of the AI dimension, the results were calculated via the ReliefF [95] feature selector for the triangulation of the statistical results, and both the eventual performance and the procedural success of the AI tools are demonstrated incrementally.
The Relief algorithm was initially proposed by Kira and Rendell in 1992 to address binary classification tasks [96]. Since it was limited to binary classification problems and was unable to deal with missing data, Robnik improved the Relief algorithm to handle multi-class classification tasks and proposed the ReliefF algorithm [95]. The ReliefF algorithm is a traditional multi-variable filtering approach that computes correlations between features and class labels to assign feature weights. In general, more heavily weighted features have a greater impact on classification performance. The weight of feature A is calculated using Equation (13):
$$W(A) = W(A) - \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, H_j)}{s\,k} + \sum_{C \neq \mathrm{class}(R_i)} \frac{P(C)}{1 - P(\mathrm{class}(R_i))} \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, M_j(C))}{s\,k} \tag{13}$$
where $R_i$ is the ith sample, $H_j$ is the jth nearest hit, $M_j(C)$ is the jth nearest miss in class C, s represents the number of samples, k is the number of nearest neighbors, $\mathrm{diff}(A, R_1, R_2)$ is the Euclidean distance between the samples $R_1$ and $R_2$ for each feature A, and $P(C)$ is the probability of class C [97].
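The weight update can be illustrated with a simplified, dependency-free sketch of ReliefF. It assumes numeric features, uses every sample once as the reference sample, normalizes per-feature differences by the feature range, and averages over the neighbors actually found when a class has fewer than k members; these simplifications, the function name, and the toy data are assumptions and not the implementation used in the study.

```python
def relieff(X, y, k=2):
    """Return one ReliefF-style weight per feature (higher = more discriminative)."""
    n, n_feat = len(X), len(X[0])
    # Feature ranges scale each per-feature difference into [0, 1].
    spans = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(n_feat)]
    priors = {c: y.count(c) / n for c in set(y)}

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / spans[a]

    def nearest(i, cls):
        # k nearest samples of class cls (excluding i), by Euclidean distance.
        candidates = [j for j in range(n) if j != i and y[j] == cls]
        candidates.sort(key=lambda j: sum(diff(a, i, j) ** 2 for a in range(n_feat)))
        return candidates[:k]

    weights = [0.0] * n_feat
    for i in range(n):
        # Nearest hits lower the weight; nearest misses raise it, scaled by
        # the prior of the miss class relative to all other classes.
        groups = [(-1.0, nearest(i, y[i]))]
        for c, prior in priors.items():
            if c != y[i]:
                groups.append((prior / (1.0 - priors[y[i]]), nearest(i, c)))
        for scale, neighbors in groups:
            if not neighbors:
                continue
            for a in range(n_feat):
                total = sum(diff(a, i, j) for j in neighbors)
                weights[a] += scale * total / (n * len(neighbors))
    return weights


# Hypothetical data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w = relieff(X, y)
print(w[0] > w[1])
```

Ranking the features by descending weight gives the ordering that can then be compared with the ordering implied by the Spearman coefficients.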
3.4. Comparative Analysis of Unified Methodology
The unified methodological approach proposed in this study addresses social interaction scores and their distribution across spatial characteristics through two methodological approaches: a conventional approach based on basic statistical calculations, and an AI-based approach intended to be more practical and efficient; the findings of both are produced through parallel processes. These multi-factorial relations are first tested statistically using Spearman’s rank-order correlation, and the findings are subsequently reproduced using the proposed AI-based methodology. The results of both approaches are compared to assess their compatibility. This provides a strong foundation both for contributing to practice through the results presented and for evaluating new methodological approaches through comparative analysis. The results are presented to reveal the socio-spatial affordances of campus open spaces across diverse spatial features.
4. Experimental Tests and Results
4.1. Behavioral and Spatial Implications
Based on the behavioral and spatial metrics assigned in the scope of the study, the social interaction scores of the individuals and the spatial features of the spaces are analyzed using conventional statistical approaches to determine socio-spatial affordances in educational settings.
4.1.1. Social Interaction-Based Results
Observation-based empirical evidence forms the central premise of the study. A total of 746 observees are recorded via behavioral maps and detailed checklists. Excluding the section of this data used in the initial phases of the AI-based methodology for the training and validation processes, 181 of them were used for social interaction calculations in this phase. Social interaction levels are calculated from the identified components using behavioral coding. Components of the social interaction index are analyzed statistically, and internal consistency is assessed. Since the reliability test, Cronbach’s
is calculated based on Pearson correlations, and the assumption of continuous data, producing accurate results with these sub-factors as ordinal (categorical) data were not feasible. Therefore, polychoric correlation and factor-based McDonald’s
were used for reliability estimation using Jamovi [
74]. As a result, a high reliability level (
ω = 0.858) was obtained.
For basic qualitative analysis, social interaction scores were grouped into three categories (low, medium, and high) to facilitate interpretation. Descriptive graphs were created accordingly, and the general and spatial distributions of the social interaction categories are shown in
Figure 6. On the other hand, social interaction scores were retained as continuous index scores in their original format for quantitative analysis, consistent with the main conventional approach.
According to the distribution shown in
Figure 6a, 13.3% of those observed have a low level of interaction, 48.1% a medium level, and 38.7% a high level. Across space groups, high social interaction scores were most prevalent in courtyards at 24.3%, followed by circulation axes at 14.4%, and were absent in front yards and backyards. Moderate social interaction scores were fairly evenly distributed across front yards, circulation axes, and courtyards (17.7%, 15.5%, and 13.8%, respectively) but were not encountered in backyards. Low social interaction scores were most prevalent in circulation axes at 7.2%, followed by front yards at 3.9% and backyards at 1.7%. Only 0.6% of participants with low interaction scores were found in backyards, indicating a very low rate, as shown in
Figure 6b.
4.1.2. Spatial Feature-Based Results
The spatial features addressed in the study are taken as a basis, and all subspaces in the study area have been classified according to them as shown in
Figure 7.
Examining the distribution of spatial characteristics across subspaces reveals that even spaces that appear equivalent differ in terms of their physical and spatial characteristics. Spatial results are classified for comparison with behavioral data.
4.1.3. Comparative Analysis of Behavioral and Spatial Evidences
The behavioral evidence (social interaction scores) and the spatial evidence (spatial feature-based classifications) are brought together, and the correlation between the two is statistically assessed using Spearman’s rho, as shown in
Table 5.
The results were analyzed to determine the direction and strength of the relationship between behavioral and spatial findings. The strongest relationships were found between social interaction scores and the presence and rotation of seating units, followed by shading strategies, whereas proximity to main circulation axes showed a moderate relationship. Proximity to the main entrances and main educational facilities exhibited weak correlations with social interaction scores, and no significant relationship was found for the enclosure and surface characteristics variables.
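As an illustration of the statistic used here, the following minimal sketch computes Spearman’s rho as the Pearson correlation of tie-averaged ranks. It is not the statistical software used in the study; the function names and the toy data (a binary seating indicator and made-up interaction scores) are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy data: 1 = seating unit present, 0 = absent.
seating_present = [0, 0, 1, 1, 1, 0, 1, 0]
interaction_score = [1, 2, 5, 4, 6, 2, 5, 1]
rho = spearman_rho(seating_present, interaction_score)
```

Because the spatial variables are ordinal or binary, ties are frequent, which is why the sketch averages ranks within tied groups rather than using the shortcut formula that assumes distinct ranks.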
4.2. AI-Based Behavioral and Spatial Implications
The first stage of the proposed AI-based approach uses transfer learning to leverage features pre-trained on the COCO dataset, a large collection of images of common objects. Transfer learning allows the YOLOv8s model to improve its object detection capabilities by training on an independent yet relevant dataset rather than starting from scratch.
In this study, 331 observations collected over 5 days at different times and locations on campus were visualized, and ground-truth annotations were generated for training the YOLOv8 object detector. Of the 331 observations, 232 were used for training YOLOv8, 67 for validation, and 32 for predicting the social interaction score in the second stage of the AI-based unified framework. To increase the diversity of the training data and the robustness of the detector, the training set was expanded to 443 samples before training by applying data augmentation techniques such as grayscale conversion and horizontal flipping. To evaluate the generalization capability of the trained object detector, the observation data from the sixth day were added to the test images, which increased the number of images used in the second stage to 83. The trained YOLOv8 model was evaluated on these 83 images comprising 181 individuals, and the position and class label of each individual in each image were detected for the second stage of the AI-based unified framework.
In the second phase of the AI-based unified framework, the input data consists of the class labels of individuals detected by the object detector and the center coordinates of their predicted locations. The Euclidean distances between all pairs of individuals were computed using the center coordinates of the individuals detected in each image, and the resulting distances were converted into a fixed-size feature vector per individual using DIFC. The distance-based feature vectors created were fed into ML classifiers, and the interaction scores for each individual were predicted by the ML classifiers.
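The distance step described above can be sketched as follows. This is only an illustration under stated assumptions: the exact DIFC construction is not reproduced here, so the rule used to obtain a fixed-size vector (the k smallest distances to other individuals, padded with a filler value) and the names `K_NEAREST` and `PAD_VALUE` are hypothetical, and the class labels that also enter the second phase are omitted.

```python
from math import hypot

K_NEAREST = 3     # assumed vector length; not specified in the text
PAD_VALUE = -1.0  # assumed filler when fewer than K_NEAREST neighbours exist


def pairwise_distances(centers):
    """Euclidean distance matrix between all detected box centres in one image."""
    return [[hypot(ax - bx, ay - by) for (bx, by) in centers]
            for (ax, ay) in centers]


def fixed_size_features(centers, k=K_NEAREST, pad=PAD_VALUE):
    """One length-k vector per individual: its k smallest distances to others."""
    dist = pairwise_distances(centers)
    features = []
    for i, row in enumerate(dist):
        # Drop the zero self-distance, then sort nearest first.
        others = sorted(d for j, d in enumerate(row) if j != i)
        vector = others[:k] + [pad] * max(0, k - len(others))
        features.append(vector)
    return features


# Hypothetical centre coordinates (pixels) of three detections in one image.
centers = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
features = fixed_size_features(centers)
```

A fixed length is needed because the number of detected individuals varies from image to image, while the classifiers expect inputs of constant dimension.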
In the final step of the proposed AI-based unified framework, social interaction scores predicted by the ML classifier were employed instead of behavioral coding for spatial feature analysis. The spatial feature importance ranking was performed by ReliefF using the social interaction score predictions of the classification model that showed the highest prediction performance.
4.2.1. Experimental Setup
All experimental evaluations related to the object detection stage were conducted in a single, consistent environment, Google Colab Pro+, using an NVIDIA A100 GPU with ample RAM. This environment provided a high-performance computing platform for efficient model training. The A100 offers up to 80 GB of VRAM and high processing capacity.
Table 6 shows the experimental environment and hardware specifications for the object detection stage.
The following steps of the proposed AI-based approach were performed on a Windows 11 system with an Intel Core i5 CPU (2.5 GHz), 8 GB of RAM, and an NVIDIA GeForce GTX 1650 Ti graphics card.
4.2.2. Hyperparameter Tuning
The hyperparameters for the YOLO object detectors were set based on the default values of the Ultralytics [
92] framework, which is commonly used in object detection tasks, and minor adjustments were made experimentally using validation data. All input images were resized to a fixed resolution for both training and testing. For the machine learning classifiers, the default hyperparameter values provided by scikit-learn [
98] were used. The hyperparameters used in this study are presented separately in
Table 7 and
Table 8 for the object detection and social interaction score prediction stages, respectively.
4.2.3. Evaluation Metrics
We assess the performance of the AI-based model using six metrics: Accuracy (
ACC), Precision (
P), Recall (
R), Mean Average Precision at IoU 0.5 (
mAP50), Mean Average Precision at IoU 0.5 to 0.95 (
mAP50–95), and
F1 score. Accuracy, Precision and Recall are defined on the basis of True Positives (
TP), True Negatives (
TN), False Positives (
FP), and False Negatives (
FN) as follows:
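In their standard textbook form, stated here for reference in terms of the counts above, these metrics and the F1 score can be written as:

```latex
\begin{align}
ACC &= \frac{TP + TN}{TP + TN + FP + FN}, \\
P   &= \frac{TP}{TP + FP}, \\
R   &= \frac{TP}{TP + FN}, \\
F1  &= \frac{2 \cdot P \cdot R}{P + R}.
\end{align}
```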
Average precision (AP) is calculated from precision–recall values and corresponds to the area under the precision–recall curve. mAP is defined as the mean of the APs over all object classes.
When evaluating the training and validation performance of YOLOv8, the
P,
R,
mAP50, and
mAP50–95 metrics are utilized, while
ACC,
P,
R, and
F1 score are used to evaluate social interaction score predictions.
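For a multi-class target such as the low, medium, and high interaction categories, the per-class values must be combined into a single figure. The sketch below assumes macro averaging (an unweighted mean over classes); the averaging scheme, the function name `macro_metrics`, and the example labels are assumptions made for illustration only.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        # One-vs-rest counts for the current class.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "ACC": correct / len(y_true),
        "P": sum(precisions) / n,
        "R": sum(recalls) / n,
        "F1": sum(f1s) / n,
    }


# Hypothetical labels for six individuals.
y_true = ["low", "medium", "high", "high", "medium", "low"]
y_pred = ["low", "medium", "high", "medium", "medium", "low"]
report = macro_metrics(y_true, y_pred)
```

Macro averaging weights every class equally, so a small class such as the low-interaction group influences the result as much as the larger ones; a weighted average would instead follow the class proportions.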
4.2.4. YOLOv8 Detection Results
Before training YOLOv8, the images were resized to the maximum resolution available on the Roboflow platform (2048 × 2048 pixels) to maintain consistency across the datasets. Additionally, data augmentation was used to increase the diversity of the training set: a 15% grayscale conversion and a horizontal flip operation were applied to the images. The training parameters of YOLOv8 are shown in
Table 7. Several standard object detection evaluation metrics were used to measure the training and validation performance of YOLOv8. The primary evaluation metrics are
mAP50 and
mAP50–95, which are used to measure the detection performance of the object detector at various levels. Furthermore, precision and recall scores were also analyzed to assess the ability of the detector to mitigate
FP and
FN.
In addition to YOLOv8, the first phase of the proposed approach was also evaluated using YOLOv12 with the same parameters to examine model-independent performance and generalization across different state-of-the-art detectors. The experimental results show that the proposed approach is compatible with state-of-the-art architectures and that the YOLOv12 results are largely consistent with those obtained with YOLOv8.
Table 9 summarizes the training and validation results of YOLO models including Precision (
P), Recall (
R),
mAP50,
mAP50–95, and the classification, box, and distribution focal loss (DFL) metrics. The evaluation results show that the validation losses of YOLOv8 are closer to its training losses than is the case for YOLOv12, suggesting that the model does not overfit and learns the structure of the dataset well. Furthermore, the DFL results indicate that YOLOv8 is stable in improving bounding box regression accuracy and optimizing coordinates precisely. The DFL values of 0.877 and 0.854 obtained for training and validation, respectively, show that training and validation performance is consistent in terms of bounding box regression. Similar to YOLOv12, YOLOv8 achieved 0.988 and 0.993 in precision and recall, respectively. This indicates that the object detector detects almost all individuals in the correct locations and classes, with very few missed detections.
YOLOv8 achieved a score of 0.993 on the
mAP50 evaluation metric. This highlights that the detector recognizes individuals with near-perfect accuracy at a 50% IoU threshold. On the other hand, the value
mAP50–95 = 0.790 indicates that the model provides stable localization accuracy across different IoU thresholds, establishing a reliable geometric foundation, particularly for human detection studies requiring high precision. Similarly, YOLOv12 achieved a performance very close to YOLOv8, with 0.992 on
mAP50 and 0.799 on
mAP50–95. The experimental results obtained with YOLOv8 and YOLOv12 demonstrate that the proposed unified model is consistent and reliable across detector versions. This consistency validates the use of YOLOv8 as a representative and efficient foundation for the subsequent stages of the proposed unified framework. The overall detection results of YOLOv8 in
Table 9 are also presented in
Figure 8.
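To make the IoU thresholds behind mAP50 and mAP50–95 concrete, the following minimal sketch computes the Intersection over Union of two axis-aligned boxes and checks it against the thresholds 0.50 to 0.95 in steps of 0.05. The box coordinates are hypothetical, and the sketch covers only the matching criterion, not the full AP computation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth and predicted boxes for one individual (pixels).
ground_truth = (10, 10, 50, 90)
prediction = (15, 10, 55, 90)

overlap = iou(ground_truth, prediction)
is_match_at_50 = overlap >= 0.5

# Thresholds averaged over by mAP50-95: 0.50, 0.55, ..., 0.95.
thresholds = [0.50 + 0.05 * i for i in range(10)]
thresholds_passed = sum(overlap >= t for t in thresholds)
```

A detection shifted by a few pixels, as in this example, still counts as correct at the 0.5 threshold but fails the stricter ones, which is why mAP50–95 is typically lower than mAP50.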
Figure 9 presents the confusion matrix of YOLOv8 on validation data. The detection results reveal that the model exhibits both high accuracy and recall across all classes. The sitting and walking classes both achieve excellent or near-excellent classification without
FP or
FN. The only misclassification occurs in the standing class, where one instance is misclassified as walking, indicating a slight overlap in the feature characteristics. Furthermore, no background scenes are incorrectly predicted as human, which confirms the high robustness of the detector in separating human objects in the foreground from background scenes.
4.2.5. Social Interaction Score Prediction Results Using ML Approach
In the first stage, individuals were detected in visualized observation data using YOLOv8, interaction scores were generated using behavioral coding, and the social interaction dataset was prepared using DIFC. Subsequently, interaction scores were predicted using ML classifiers. After implementing the YOLOv8-based detection model on the observation dataset, a test subset consisting of 83 visualized observation images was then used to extract geometric and distance-based interaction features for a total of 181 individuals. These feature representations constitute the input data for the ML models used in this stage. Thus, this section presents the experimental results from the social interaction score prediction task, evaluates the classifiers’ performance, and discusses the predictive capability of the proposed approach for modeling human social interactions. Some examples of human detection results generated by YOLOv8, transformed into interaction vectors for each individual, and fed into the second stage as input are shown in
Figure 10. To evaluate the generalization performance of the classification tasks, the dataset was split into a 70% training set and a 30% test set.
In the second phase of the proposed framework, five different classifiers were used to predict the social interaction scores of individuals: Support Vector Machine (SVM), Random Forest, Logistic Regression, Gradient Boosting, and XGBoost. Each classifier was selected to represent a different family of learning models, enabling a comprehensive comparison among linear models, ensemble-based approaches, and boosting techniques. Evaluating multiple algorithms with varying learning characteristics also helps identify the most effective classification strategy for modeling interaction scores from the geometric and distance-based features extracted from the detected individuals. The parameter settings of the classifiers are shown in
Table 8.
The proposed model was executed in 10 runs using each classifier for social interaction score prediction. The average prediction results in terms of accuracy (
ACC), precision (
P), recall (
R), and
F1-score are presented in
Table 10. The experimental classification results show that the Gradient Boosting classifier has the highest performance for almost all metrics. This is followed by the Random Forest and the XGBoost, respectively.
The experiments indicate that AI-based approaches are well suited to predicting social interaction scores. Across the evaluated classifiers, the most successful approach achieved an F1-score of 91%, showing that such behavioral analysis can be performed with high accuracy from visualized observation data. The findings suggest that, despite the complex and multidimensional nature of social interactions, object detection and ML techniques can model these relationships efficiently. Contemporary AI approaches can therefore provide a reliable and effective tool for the analysis of human behavior in spatial contexts.
4.2.6. Spatial Feature Scoring Results Using ReliefF
The spatial features were scored using ReliefF based on social interaction predictions from the Gradient Boosting classifier, which achieved the highest classification performance.
Table 11 shows ReliefF-based importance scores of spatial features. AI-based experimental analyses reveal that the spatial features that most strongly influence the social interaction score are the presence and rotation of seating units, and the use of shading strategies, respectively.
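The idea behind ReliefF scoring can be sketched as follows. This is a simplified, self-contained version written for illustration, not the implementation used in the study: the number of neighbours `k`, the range-based difference function, and the toy data are assumptions. A feature gains weight when it differs between an instance and its nearest neighbours of other classes (misses) and loses weight when it differs between the instance and its nearest neighbours of the same class (hits).

```python
def relieff(X, y, k=2):
    """Simplified ReliefF: score each feature by how well it separates classes."""
    n, m = len(X), len(X[0])
    # Feature ranges normalise per-feature differences to [0, 1].
    spans = []
    for f in range(m):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(m))

    priors = {c: y.count(c) / n for c in set(y)}
    weights = [0.0] * m
    for i in range(n):
        # Group all other instances by class together with their distance to i.
        by_class = {c: [] for c in priors}
        for j in range(n):
            if j != i:
                by_class[y[j]].append((distance(X[i], X[j]), j))
        for c, neighbours in by_class.items():
            nearest = [j for _, j in sorted(neighbours)[:k]]
            if not nearest:
                continue
            if c == y[i]:
                scale = -1.0  # nearest hits: differing values lower the score
            else:
                # Nearest misses, weighted by the prior of the other class.
                scale = priors[c] / (1.0 - priors[y[i]])
            for f in range(m):
                mean_diff = sum(diff(f, X[i], X[j]) for j in nearest) / len(nearest)
                weights[f] += scale * mean_diff / n
    return weights


# Hypothetical toy data: feature 0 tracks the class, feature 1 is unrelated.
X = [[0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [1, 1], [1, 0], [1, 1]]
y = ["low"] * 4 + ["high"] * 4
weights = relieff(X, y)
```

In this toy case the first feature receives a clearly higher score than the second, mirroring how a ranking such as the one in Table 11 is read: higher scores mark features whose values change together with the predicted interaction class.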
4.3. Methodological Comparison and Empirical Implications
In the scope of the study, the conventional and artificial intelligence-based methodological approaches were conducted concurrently. For the conventional calculation of the social interaction score, a social interaction index consisting of four observable components was employed; the internal consistency of the components was estimated via McDonald’s ω, which indicated fairly high reliability (ω = 0.858). Using Spearman’s rho to assess the relationship between social interaction scores and spatial features, statistically significant associations were detected for six spatial features, three of which exhibited strong correlations.
In the AI-based methodology, the social interaction score predictions generated by Gradient Boosting, the ML classifier that yielded the highest average
F1-score, were utilized as target values for ReliefF in the AI-based ranking of spatial features. The spatial feature ranking based on the social interaction scores predicted by the ML classifier was then compared with the selected spatial features based on Spearman’s rho correlation analysis performed on the actual social interaction scores determined by behavioral coding.
Table 5 and
Table 11 show that the findings of both conventional statistical and AI-based spatial feature analyses are highly consistent.
The first hypothesis of the study regarding the methodological results focuses on the use of artificial intelligence-based approaches in revealing the relationship between spatial features and social interaction scores. The study confirms the hypothesis, as results from conventional and AI-based methods demonstrate significant, consistent findings in both stages: calculating social interaction scores and revealing the relationship between social interaction scores and spatial features.
Empirical results indicate that the presence and orientation of seating units and shading features are the most significant factors affecting the social interaction score in both approaches. As in the Spearman’s rho correlation analysis, the surface characteristic could not be scored with ReliefF because all individuals were observed in hardscape areas, as seen in
Table 11. The rankings of spatial features are nearly identical across both approaches.
The study’s second hypothesis, regarding the empirical results of the study, inquires into the relationship between social interaction scores and different spatial features. The results indicate that the hypothesis is partially confirmed, with support for spatial features in which statistically significant relationships were identified, including proximity to the main entrances and educational facilities, the presence and orientation of seating units, and shading strategies. However, the hypothesis is not confirmed in terms of features that are not statistically significant or related, including surface characteristics and enclosure features.
5. Discussion
In today’s digital world, strengthening the connection between architecture and digital technologies and promoting the use of digital tools in architecture have become essential for understanding and managing the field’s development. Design processes, implementation practices, and even educational processes in the field are dominated by these tools, and taking part in this shift seems vital for engaging with the new era. The post-occupancy process is another aspect of the field, in which the perception, behavior, and experience of users become the main actors. This user-centered approach, however, requires processing large amounts of complex data to remain scientifically rigorous, which most clearly justifies the use of artificial intelligence tools within the process. In studies based on behavioral approaches in particular, conventional methods make it time-consuming and error-prone to identify and classify behavior accurately, to include sufficient behavioral data, and to connect that data with spatial data, which makes artificial intelligence tools especially valuable in these processes.
In this study, the impact of spatial features on social behavior is addressed, and the data were analyzed using both conventional statistical calculations and the proposed AI-based unified methodology.
The development of a social interaction index based on behavioral components of social interaction facilitates the statistical analysis and spatial comparability of social interaction behavior. The generation of this index entails a conscious abstraction from qualitative components. The index reflects the overall intensity of social interaction behavior induced by spatial configurations rather than distinguishing among qualitatively different types of interactions. This abstraction is consistent with the quantitative and AI-supported analytical structure of the study.
Based on the methodological comparison, the findings from conventional statistical methods are largely consistent with those from the three-phased AI-based unified methodology. The study therefore proposes a reproducible approach that detects the behavioral aspect of social interaction through Deep Learning (DL)-based Human Detection via mapping, classifies it according to the relevant behavioral coding through Distance-Based Interaction Feature Construction (DIFC) and Machine Learning (ML)-based Interaction Score Prediction, and reveals the spatial dimension of behavior by associating it with spatial variables using a Spatial Feature Selection tool. The study also confirms that this structure is consistent with the theoretical framework based on Affordance Theory, which is grounded in a multicomponent strategy that includes space and behavior.
In this context, a case study was conducted using both conventional methods and an AI-based unified methodology to examine the relationship between social interaction behavior and spatial features. The empirical evidence indicates the correlations between social interaction scores and spatial features in descending order:
Existence and rotation of seating units: The existence of seating units (Correlation Coeff. = 0.810, p (2-tailed) < 0.001) and the rotation of seating units (Correlation Coeff. = 0.763, p (2-tailed) < 0.001) have the strongest monotonic relationships with social interaction score. Both relationships are positive, indicating that the presence of a seating unit and its face-to-face positioning are associated with higher interaction scores, which is consistent with previous studies [
2,
44].
Shading strategies: Exposure to climatic conditions has a strong negative correlation with social interaction score (Correlation Coeff. = −0.657, p (2-tailed) < 0.001). In the study area, shading depends on the protective function of the buildings and a limited number of shading elements. These findings, beyond their consistency with the literature [
48], also emphasize the importance of using shading elements in outdoor campus environments in regions experiencing extreme temperatures to provide protection from climatic conditions.
The proximity to main circulation axes: Distance from the main circulation axes has a positive monotonic relationship with social interaction score; the relationship is moderate and weaker than those above (Correlation Coeff. = 0.401, p (2-tailed) < 0.001). This result indicates that interaction scores are higher in areas away from the circulation axes than in areas on them. The spatial distributions are consistent with this quantitative finding: compared with courtyards, which act as gathering areas, circulation areas show fewer high interaction levels, more low interaction levels, and a balanced share of medium interaction levels. However, these findings contradict the studies presented in the theoretical discussions. While the earlier study by Negm [
2] emphasizes that circulation axes create a focal point for social interaction, differences in scale between studies and study areas may account for the differing results. Here, the difference is interpreted as related to the fact that all subspaces within the study area are within walking distance of one another.
The proximity to the main entrances and main educational facilities: Proximity to the main entrances (Correlation Coeff. = 0.256, p (2-tailed) < 0.001) and proximity to the main educational facilities (Correlation Coeff. = 0.219, p (2-tailed) = 0.003) have weak positive relationships with social interaction score. Although slight, these statistically significant positive correlations indicate that interaction scores increase near educational buildings and that the main entrances of these buildings also act as a center of attraction, albeit a weak one.
The varying degrees of strength and significance identified among spatial characteristics and social interaction scores indicate that the selected variables not only define the physical conditions of a space but also can encourage or constrain social interaction to varying degrees, thereby creating differences in the socio-spatial opportunities they offer. Although the spatial features addressed in the study vary, Chen et al. [
45] point to walking areas as the most significant feature supporting social interaction, whereas in this study proximity to walking areas was associated more with low interaction levels, and the courtyards, which can be described as gathering areas, accommodated higher interaction than the circulation areas. Furthermore, components such as the existence and face-to-face arrangement of seating units and shading strategies that protect from climatic conditions exhibited stronger relationships in this study. Additionally, some spatial features did not demonstrate a statistically significant relationship:
Enclosure: The relationship between enclosure level and social interaction score was identified as very weak and statistically insignificant (Correlation Coeff. = 0.082,
p (2-tailed) = 0.272). Thus, the assumption presented in the existing literature regarding the tendency of users to avoid engaging in social activities in areas with a high degree of privacy cannot be substantiated [
78]. The absence of a strong and reliable relationship between enclosure level and social interaction score is attributed to secondary spatial characteristics. The sub-spaces with high enclosure rates in the study area are located on circulation axes, which are associated with low interaction levels, and at the same time contain seating units, including ones arranged for face-to-face interaction, which contribute to high social interaction levels. Because of this combined structure, the individual effect of the enclosure rate on social interaction cannot be isolated. The enclosure feature thus appears to be outweighed by other features that have stronger and statistically significant relationships with the social interaction score.
Surface characteristics: The correlation and significance of surface characteristics could not be calculated because all users were observed in hardscape areas. Although approximately 20% of the area consists of natural and soft ground, all observees preferred hardscape for their social interaction activities. While this does not produce a statistical result, it is considered an important finding in terms of monitoring user preferences.
Additionally, previous studies have highlighted the importance of several other spatial features such as pedestrianization strategies [
2,
44], and food and beverage supply [
2], which were left out of the scope of the empirical study. Since the entire study area is designated as vehicle-free under a pedestrianization strategy, all users were observed in pedestrianized areas. Likewise, all subspaces in the study area are located at a similar proximity to several places where food and beverages are supplied; thus, this feature was not included in the study.
The findings of this study are noteworthy in showing that certain spatial features have systematic statistical relationships with social interaction, either encouraging or constraining it. Thus, the results support treating these spatial variables as socio-spatial opportunities. Consistent with Affordance Theory, the presence of seating areas is found to support socio-spatial affordances, as they facilitate spending time together and provide more opportunities for social interaction. The orientation of seating areas creates a relational opportunity that mediates the potential for face-to-face interaction, and shading strategies protect from exposure to climatic conditions and thereby raise socio-spatial affordances. In contrast, features such as enclosure did not exhibit significant relationships, yielding an open-ended outcome that requires repeated examination across spatial, sociocultural, and institutional contexts and under different climatic conditions.
Recent studies have increasingly investigated computer vision-based frameworks for quantifying social interactions of people in public open spaces. However, fully automated AI-based approaches for social interaction analysis are relatively limited in the literature. Chen et al. [
63] proposed an approach that utilizes computer vision techniques to analyze pedestrian social interaction in urban public spaces through two measurable dimensions: interpersonal distance and interaction duration. The study employed open-source video data obtained from a Skyline live webcam installed at a hotel overlooking Jubilee Square in Leicester, United Kingdom. The authors combined YOLOv7 and DeepSORT to track pedestrians’ movement trajectories in each video frame. OpenCV’s findHomography function was employed to convert the pixel coordinates of detected pedestrians in the video frames to their real-world coordinates. Social interaction was determined using predefined rule-based thresholds, such as two or more people being within 3.7 m of each other and maintaining that distance for more than 10 s, rather than a machine learning model. The proposed method achieved 98.95% accuracy, 92.31% precision, and a 78.69%
F1-score. In another study [
64], the authors proposed a novel methodology that uses computer vision and machine learning to analyze co-presence and micro-social interactions in urban spaces. The study utilized 22.5 min (1350 s) of video data captured by five strategically placed cameras on a university campus (CUHK). A precise 3D model of the study area was created using unmanned aerial vehicles (UAVs) and photogrammetry software. In the proposed model, YOLOv8, a single-stage object detector, was used to detect pedestrians. To improve the accuracy of the object detector, 750 images randomly selected from the dataset were manually labeled, and the pre-trained weights of YOLOv8 were refined. This training resulted in an average accuracy rate of 92.6%. The BoT-SORT algorithm, integrated within YOLOv8, was used to assign a unique and persistent identity (ID) to each detected pedestrian. Although the AI algorithms YOLOv8 and BoT-SORT were used as the primary tools to obtain raw motion data and trajectories in the proposed model, the categorization of social interactions was performed on this data using the Rhino/Grasshopper platform.
The results obtained by the AI-based unified methodology for examining the relationship between social interaction behavior and spatial characteristics are consistent with the results of conventional methods. The presence of seating units, with a feature score of 0.0719, was identified by ReliefF as the most decisive spatial feature for the social interaction score. In addition, the orientation of seating units and shading strategies also play important roles in explaining social interaction scores, with scores of 0.0536 and 0.0534, respectively. Proximity to circulation axes and proximity to main entrances were found to have a moderate effect on social interaction scores. Furthermore, enclosure and main educational facilities have a weak effect, whereas surface characteristics could not be calculated because all users were observed in hard-surfaced areas.
When these results are examined within the context of the network-based Affordance Theory [3] of the environment-behavior interaction, (1) the study confirms that the spatial components of this network-based structure are decisive in occupancy patterns; (2) social interaction, as a unit of social behavior, has a significant relationship with several spatial features, supporting the suggested affordance-based structure; and (3) socio-spatial affordances are analyzed and determined through specific spatial qualities, thereby offering a structure that is repeatable and adaptable in different contexts and under different conditions. In this regard, this study proposes a socio-spatial affordance approach that addresses social interaction behavior in its spatial dimension when re-examined within the framework of Affordance Theory. Thus, by proposing the term “socio-spatial affordances” with reference to its contextual interconnectedness, the study evaluates the factors mentioned above as supporting these affordances, thereby offering a conceptual contribution that signifies functional unity, beyond the mainstream perspective that views space and behavior as separate entities.
The contributions of the study can be discussed under three headings: theoretical, methodological, and empirical. From a theoretical perspective, the term “socio-spatial affordances” and the discussion of findings based on Affordance Theory constitute the study’s main theoretical contribution. From a methodological perspective, the study contributes to the field by proposing a unified AI-based methodological framework that can be applied to similar approaches and studies and that yields reliable and valid prediction statistics. The study also makes an empirical contribution to the field, suggesting that socio-spatial affordances of open spaces in educational settings can be directed and enhanced by arranging seating units face to face, employing shading elements to control climatic conditions, and locating the spaces at an optimal distance from the main circulation axes and closer to the main educational facilities, especially to the main entrances of the buildings.
Limitations and Future Research
This study has several methodological and empirical limitations. Methodologically, the stages of the proposed AI-based unified framework depend on one another. The selection of the most discriminative spatial features relies on the correct prediction of social interaction scores, and DIFC was applied using the object detector outputs, resulting in a sequential relationship in which the accuracy of each stage affects the prediction performance of subsequent stages. To evaluate the generalization capability of the proposed AI-based framework, the object detection phase was performed using two different state-of-the-art object detection methods, and the social interaction prediction phase was conducted using five different machine learning classifiers. Despite these dependencies, the proposed framework demonstrated stable and consistent performance across all experiments.
Empirical limitations derive primarily from the single-context design. Because the study relies on a single spatial, socio-cultural, and institutional environment and a specific observation period, the findings reveal consistent patterns in this particular campus setting; however, they represent empirically observed tendencies within this context rather than deterministic behavioral rules. The fact that the study area consists entirely of open spaces makes seasonal and climatic conditions decisive, which may in turn amplify the role of components such as shade structures designed to provide protection from climatic conditions. Moreover, due to local ethical and privacy rules, non-intrusive, observation-based data collection methods were preferred, thus prioritizing behavioral visibility over experiential or subjective dimensions of interaction. As a result, interaction scores are restricted to behavioral and quantitative data. Alternative qualitative research designs that do not focus on quantitative data and/or observable behaviors, and further studies based on self-reported or interpretive data, can therefore complement the proposed framework by providing deeper and multi-dimensional insights. However, these research approaches are outside the scope of this study, as they require a different analytical framework.
It should also be noted that the findings were not evaluated in terms of causal relationships within the scope of the study; rather, the study was shaped by correlations demonstrating the relational trends between spatial and behavioral data.
6. Conclusions
This study proposes an AI-based, unified framework that combines object detection and machine learning techniques to analyze social interaction patterns and their relationships with spatial features in campus open spaces, while relying on conventional data-gathering and analysis methods as a basis for comparison and validation. To this end, the proposed methodological stages are addressed using conventional methods and then reconsidered using AI-based approaches to assess their applicability. The proposed AI-based unified framework initially detects individuals using observation data generated by behavioral mapping, constructs distance-based interaction feature vectors based on the class and position of the detected individuals, and then predicts social interaction scores using ML classifiers, with predefined social interaction scores as target values. Thus, the aim is to provide a comprehensive and systematic approach to evaluating space-interaction relationships.
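The intermediate step, in which detections are turned into distance-based interaction features, can be sketched as follows. This is only an illustrative assumption of what such a feature vector might contain (class, distance to the nearest other individual, and number of neighbors within a radius); the function name `interaction_features`, the `radius` value, and the coordinate units are hypothetical and do not reproduce the study's exact feature construction.

```python
import math


def interaction_features(detections, radius=1.5):
    """Build one feature row per detected individual.

    detections: list of (class_id, x, y) tuples, e.g. detector box centers
    mapped to plan coordinates (an assumption of this sketch).
    Returns rows of [class_id, nearest_distance, neighbors_within_radius].
    """
    rows = []
    for i, (cls, x, y) in enumerate(detections):
        # Euclidean distances to every other detected individual.
        dists = [
            math.hypot(x - ox, y - oy)
            for j, (_, ox, oy) in enumerate(detections)
            if j != i
        ]
        nearest = min(dists) if dists else float("inf")
        neighbors = sum(1 for dist in dists if dist <= radius)
        rows.append([cls, nearest, neighbors])
    return rows


# Hypothetical detections: two individuals close together, one far away.
sample = [(0, 0.0, 0.0), (0, 1.0, 0.0), (1, 10.0, 10.0)]
for row in interaction_features(sample):
    print(row)
```

Rows of this kind, paired with the predefined social interaction scores as targets, are the sort of input a classifier in the second stage would be trained on.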
The relationship between the social interaction scores and spatial features was assessed using both Spearman’s rho correlation and ReliefF, an AI-based feature selection method, for methodological triangulation. The two sets of results were consistent. The consistency between the spatial feature-social interaction relationships obtained by conventional approaches and the results of the proposed AI-based unified framework can primarily be explained by the high performance achieved in both stages of the unified AI-based model. In the first stage, the object detector demonstrated strong reliability, achieving an mAP50–95 of 0.79 in detecting individuals within the visualized observation data. In addition to YOLOv8, the initial stage of the proposed framework was also evaluated using YOLOv12 to demonstrate model-independent performance and the ability to generalize across state-of-the-art object detectors. The distance-based interaction feature construction was then applied to the output of this object detection task. In the second stage, the reliable predictions of machine learning classifiers further reinforced this consistency. The proposed model was evaluated using five machine learning classifiers for predicting social interaction scores, resulting in high levels of accuracy across all classifiers. The best result was obtained with the Gradient Boosting classifier, which achieved an F1 score of 0.91. The high performance of the object detection and classification tasks ensured that the spatial feature scoring determined by ReliefF in the final stage closely matched the results obtained by conventional methods, thereby supporting the reliability and validity of the proposed AI-based framework. In social interaction-space analyses performed using both conventional statistical and AI-based approaches, the spatial features were ranked nearly identically in terms of their relationship with social interaction scores.
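One way to quantify how closely two feature rankings agree is to compute Spearman's rho between the two sets of scores. The sketch below implements the rank correlation with average ranks for ties, using only the standard library. The two score lists are placeholder values chosen for illustration, not results from the study, and the function names are assumptions of this example.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Placeholder scores for four hypothetical spatial features.
ai_based_scores = [0.07, 0.05, 0.03, 0.01]
conventional_scores = [0.60, 0.50, 0.20, 0.10]
print(spearman_rho(ai_based_scores, conventional_scores))
```

A value close to 1 indicates that the two methods order the features in nearly the same way, while a value close to 0 indicates little agreement between the rankings.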
The results of the empirical stage indicate that the presence and orientation of seating units, as well as shading strategies, are associated with higher social interaction scores and contribute most to denser social interaction.
This study has several major methodological and operational limitations. The empirical study conducted within the scope of this research is limited to a single campus setting within a specific spatial, socio-cultural, and institutional context, and under specific climatic conditions. The observations were conducted during a specific seasonal period, during which the formation of behavioral patterns may be influenced by these conditions. Additionally, ethical and confidentiality concerns limited data collection to non-intrusive, observation-based methods. These factors constrain the direct generalization of empirical findings. However, in terms of methodological structure, the proposed AI-based framework is transferable and adaptable, applicable across different spatial typologies, climatic conditions, and institutional contexts.
Despite the empirical limitations, this study contributes to the theoretical framework by applying Affordance Theory, suggesting the term socio-spatial affordance to emphasize the relational structure, and utilizing it in the methodology. The methodological contribution of the study is a pioneering unified methodology based on AI approaches and tools, while the empirical contribution is the determination of spatial features that stimulate social interaction behavior.
In future research, the theoretical framework can be improved by including alternative theoretical approaches beyond Affordance Theory and diverse behavioral concepts within the analytical framework. The methodological approach developed in this study can be further developed by utilizing alternative AI-based methods and integrating complementary qualitative approaches. Moreover, the proposed model can be enriched by incorporating additional social interaction components. The analysis can also be expanded to encompass different spatial characteristics, extended to multiple campuses, and applied to different types of public spaces.