Irregular Openings Identification at Construction Sites Based on Few-Shot Learning

Seo, Minjo; Kim, Hyunsoo

doi:10.3390/buildings15111834


by Minjo Seo and Hyunsoo Kim *

Department of Architectural Engineering, Dankook University, 152 Jukjeon-ro, Suji-gu, Yongin-si 16890, Gyeonggi-do, Republic of Korea

* Author to whom correspondence should be addressed.

Buildings 2025, 15(11), 1834; https://doi.org/10.3390/buildings15111834

Submission received: 2 May 2025 / Revised: 21 May 2025 / Accepted: 23 May 2025 / Published: 27 May 2025

(This article belongs to the Section Construction Management, and Computers & Digitization)


Abstract

The construction industry frequently encounters safety hazards, with falls related to undetected openings being a major cause of fatalities. Identifying unstructured openings using computer vision is challenging due to their unpredictable nature and the difficulty of acquiring large labeled datasets in dynamic construction environments. Conventional deep learning methods require substantial data, limiting their applicability. Few-shot learning (FSL) offers a promising alternative by enabling models to learn from limited examples. This study investigates the effectiveness of an FSL approach, specifically model-agnostic meta-learning (MAML), enhanced with domain-specific attributes, for identifying unstructured openings with minimal labeled data. We developed and evaluated an attribute-enhanced MAML framework under various few-shot conditions (k-way, n-shot) and compared its performance against conventional supervised fine-tuning. The results demonstrate that the proposed FSL model achieved high classification accuracy (over 90.5%) and recall (over 85.5%) using only five support shots per class. Notably, the FSL approach significantly outperformed supervised fine-tuning methods under the same limited data conditions, exhibiting substantially higher recall crucial for safety monitoring. These findings validate that FSL, augmented with relevant attributes, provides a data-efficient and effective solution for monitoring unpredictable hazards like unstructured openings, reducing the reliance on extensive data annotation. This research contributes valuable insights for developing adaptive and robust AI-powered safety monitoring systems in the construction domain.

Keywords: few-shot learning (FSL); meta-learning; transfer learning; construction safety; attribute enhancement; model-agnostic meta-learning (MAML); unstructured hazards; hazard identification

1. Introduction

The construction industry suffers from various types of safety accidents annually, with falls recognized as one of the most critical causes of fatalities [1,2,3,4]. According to statistics from the Ministry of Employment and Labor of Korea (Construction Safety Management Integrated Information, CSI) over the past five years (2020–2024) [5], falls account for approximately 51% of fatal accidents on construction sites [1,2,3], the highest among all accident types. Given the increasing scale and height of construction projects, enhanced safety management strategies to prevent such accidents are urgently required [1,6,7].

One of the major contributing factors to fall-related accidents at construction sites is the presence of openings in floor slabs or structural elements [8,9,10,11,12,13,14]. These openings may be intentionally created for building access or construction convenience [9,15,16]. However, temporary openings not reflected in design documentation are also frequently made [9]. When such openings are not properly identified or lack sufficient protective measures (e.g., guardrails or covers), workers and equipment are exposed to significant fall hazards [9,17,18]. In fact, fall accidents occurring near these openings often lead to serious injuries or fatalities, emphasizing the necessity of continuous monitoring of their conditions for effective site safety management [9,10,11,19,20,21].

To effectively prevent fall-related accidents, it is essential to systematically identify and manage these hazards—specifically openings—at an early stage and respond appropriately [18,21,22,23]. This necessitates a robust system capable of continuously monitoring the status of openings across the site and providing timely alerts for necessary safety interventions [20,24,25,26,27]. Proactive hazard identification before incidents occur is paramount [28,29,30].

However, systematically identifying all openings, both regular and irregular, is challenging [12]. Openings on construction sites can generally be classified into two types: regular openings specified in design documents, and irregular (or unstructured) openings created arbitrarily or modified during construction activities [31]. Regular openings are relatively manageable because they are usually planned, which makes their locations and sizes predictable. Irregular openings present a significant challenge because they are typically not planned at the first stage of a construction project [21]. They often appear unexpectedly without documentation, resulting from temporary demolitions or alterations for construction convenience, making their shape, location, and timing unpredictable [9]. Such unrecognized and inadequately protected irregular openings act as latent hazards, significantly increasing the risk of worker falls [10].

Currently, many construction sites rely on traditional manual monitoring, where site managers or safety personnel conduct visual inspections by patrolling the site to identify hazards like openings [32,33,34]. While this method has low initial costs, it becomes increasingly inefficient and impractical as project scale and complexity grow [33,35]. Limited personnel struggle to cover vast areas thoroughly, and this type of process is prone to subjective judgment errors and omissions [33,36]. Furthermore, manual checks struggle to keep pace with the dynamic site environment and often fail to detect unpredictable hazards like irregular openings in time [36], thus revealing clear limitations for continuous and consistent safety monitoring [9,35].

To overcome these manual limitations, automated hazard identification using various technologies, including wearable sensor systems for detecting environmental barriers [37,38] and computer vision (CV) technologies applied to imagery from drones or CCTV, has gained traction [39,40,41,42]. Deep learning-based object detection models (e.g., convolutional neural networks (CNNs), you only look once (YOLO), and transformers) have shown potential by rapidly and accurately detecting equipment, workers, and structures, contributing to construction automation (e.g., through robotics like exoskeletons) and potentially enhancing safety management [43,44,45,46]. However, a significant challenge remains: these high-performance models typically require vast amounts of labeled training data to achieve reliable accuracy [47,48]. Construction sites are highly dynamic, with environments changing frequently across project phases and temporary structures appearing constantly [49,50]. Building comprehensive datasets that cover all variations, especially for highly irregular and unpredictable objects like unstructured openings, is practically difficult and resource-intensive [51,52]. This gap between available training data and real-world variability often leads to reduced recognition performance when models encounter novel situations [49,53].

To address this critical data scarcity problem and effectively cope with the dynamic nature of construction sites, this study proposes the adoption of few-shot learning (FSL) models [54]. FSL, based on a meta-learning approach, is designed to enable algorithms to learn and identify novel object classes using only a very small number of labeled examples (shots), leveraging knowledge learned from prior tasks [54,55,56]. This capability allows for the rapid recognition of various types of irregular openings that appear unexpectedly during construction, using minimal data, thereby offering a potential solution to adapt models to changing site conditions without the need for extensive labeling efforts [54,57,58,59]. Reflecting its potential, various attempts to apply FSL have recently emerged in the construction domain [54]. For instance, Kim et al. (2021) utilized a ProtoNet-based FSL model to recognize multiple types of safety facilities using a limited set of images [54], while Wang et al. (2024) successfully applied a relation network to classify worker safety equipment compliance with few samples, reporting significant reductions in labeling costs [55]. These studies highlight that FSL offers faster learning capabilities with significantly less data compared to conventional deep learning models, making it particularly effective for handling the highly variable and unpredictable objects commonly found in construction sites [54,55].

The suitability of FSL is particularly pronounced for identifying irregular openings [60]. These openings often lack standardized dimensions, and their location, size, and shape frequently change depending on the construction phase or method [55,61,62], making it practically infeasible to continuously supply large-scale labeled data required by conventional deep learning models [48,52,63]. In contrast, FSL models offer the potential to learn the defining features of such openings—including variations in contour, color contrast, and depth—from just a few sample images, enabling rapid recognition of newly emerging instances [55,61].

Furthermore, incorporating domain-specific attributes (such as outline irregularity, depth discontinuity, or edge material) into the FSL model can potentially enhance its performance, allowing it to adapt more flexibly and accurately to challenging shape variations [55,61,64]. Attribute information provides richer contextual cues for identification, which helps resolve the ambiguities that arise when learning from few samples [65,66,67]. As suggested by construction domain experts, using such attributes in the learning process can also improve classification accuracy by reducing both false positives and missed detections [55,68].

Therefore, this study addresses the critical challenge of identifying unpredictable irregular openings in data-scarce construction environments by validating the effectiveness of an FSL approach, potentially enhanced with domain-specific attributes. We experimentally demonstrate that FSL models can achieve high identification performance (accuracy and recall) using significantly limited data (e.g., five shots per class), thereby overcoming the data dependency limitations inherent in conventional computer vision methods. This research establishes the viability of leveraging FSL for robustly monitoring these high-risk hazards with reduced labeling efforts, offering a crucial step towards more adaptive, data-efficient, and ultimately safer construction site management.

2. Methodology

2.1. Framework for FSL in Construction

Before establishing an FSL framework to identify irregular openings that may be fatal to construction workers, it is necessary to define regular and irregular openings based on their visual characteristics. As noted in the Introduction, the two types differ in their planning origins; their physical forms are also distinct. In this study, a regular opening is defined as an opening, typically rectangular or square in shape with relatively clean edges, formed according to design specifications, regardless of whether it is appropriately covered or uncovered. Conversely, an irregular (or unstructured) opening is characterized by a non-standard, unpredictable shape (e.g., non-rectangular, circular, or having jagged/damaged edges). Standard openings that are improperly covered or obstructed, and therefore pose an immediate hazard, are also treated as irregular. Figure 1 presents examples of regular and irregular openings.

Based on these definitions of the two types of openings, this paper proposes an FSL-based framework designed to identify irregular openings on construction sites using a limited amount of labeled data. As illustrated in Figure 2, the proposed approach transfers the features of a pretrained model and applies an FSL algorithm to rapidly classify construction site images. Initially, a pretrained model is developed by training on publicly available image datasets such as UC Merced (TensorFlow Datasets), using transformer-based architectures to establish base weights that capture visual features. Approximately 3000 images containing construction site openings are collected via web crawling. These collected images, along with the pretrained feature extractor, are then integrated into the few-shot algorithm. The resulting system detects irregular openings through model-agnostic meta-learning (MAML), where (1) support and query images are used to rapidly adapt model parameters, (2) classification results for query images are produced, and (3) ultimately, irregular openings in construction environments can be identified with only a small dataset. Additionally, attribute training is employed to incorporate domain-specific information into the prediction phase, thereby reducing false detections and improving overall accuracy. Each module of the framework is described in detail in the following sections.

2.2. Dataset Development

The image dataset used in this study is divided into two categories: one for pretraining and the other for FSL-based meta-learning. For pretraining, the UC Merced land-use dataset and other publicly available TensorFlow datasets—comprising thousands of images across approximately 21 classes—were utilized to train the base weights of the model. These datasets include features from buildings, roads, parking lots, and other objects, enabling the model to acquire generalized visual characteristics such as edges, patterns, and color distributions, which are also relevant in construction environments.

Subsequently, to obtain images containing irregular openings in real construction sites, approximately 3000 images were collected via web crawling. These collected images were then processed and partitioned into support and query sets for the FSL experiments. For example, in a three-way, five-shot setting, the dataset is structured into three classes—including irregular openings, regular openings, and covered openings—with five support images per class and corresponding query images used for evaluation. In addition, domain-specific attributes such as contour, edge material, and depth information—defined during attribute training—were annotated in the support set and used to enhance the accuracy of query prediction in the FSL algorithm.

Finally, data augmentation techniques such as rotation, flipping, and brightness adjustment were applied to mitigate the risk of overfitting due to limited data, while also simulating a variety of site conditions (e.g., lighting variations, changes in camera angle). Figure 3 provides a visual overview of the entire dataset development process, including data collection, support/query set examples, and data augmentation. The final dataset, comprising both extracted features from the pretrained model and meta-learning-ready inputs, was used for few-shot object detection of irregular openings in construction environments.

2.3. FSL Model

2.3.1. Model Architecture

The base CNN structure used in this study was designed to classify approximately 3000 construction site opening images efficiently by transferring the pretrained weights obtained from publicly available image datasets such as UC Merced. During the pretraining phase, the model was trained for 20 epochs on 21 classes in the UC Merced dataset, achieving a multi-class classification accuracy of over 85.2%. This pretraining is crucial as the subsequent FSL phase benefits significantly from robust, generalized features learned on a larger, diverse dataset, enabling effective transfer learning. The convolutional layer weights obtained from this phase are utilized as initial parameters for this study.

The implementation is built through the build model function in the code, which accepts input images of 128 × 128 pixels with three RGB channels. The network architecture consists of four sequential convolutional blocks followed by a dense layer. This multi-block structure progressively extracts hierarchical features from the input images, from simple edges and textures to more complex patterns, providing a rich feature representation suitable both for leveraging pretrained knowledge via transfer learning and for effective meta-learning in the FSL phase. Each convolutional block applies ReLU activation and batch normalization to improve training stability, with max-pooling layers gradually reducing spatial resolution. The number of filters increases from 16 to 32, 64, and 128 across the blocks. After flattening (i.e., converting the 2-dimensional feature maps into a 1-dimensional vector), a dense layer of size 128 is added, with L2 regularization and a dropout rate of 0.5 to prevent overfitting. The final output layer employs a softmax activation to classify the images into the three categories defined in this study.
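The layer sizes described above can be traced with a short shape calculation. This is only a sketch under stated assumptions: the text does not give the kernel size, padding, or pooling window, so 3 × 3 kernels with "same" padding and 2 × 2 max pooling are assumed here, and batch-normalization parameters are left out of the count.

```python
# Assumptions (not stated in the text): 3x3 kernels, "same" padding, 2x2 max pooling.
KERNEL = 3
POOL = 2


def conv_block_shapes(input_size=128, in_channels=3, filters=(16, 32, 64, 128)):
    """Trace (spatial size, channels, conv weights + biases) through each block."""
    size, channels, rows = input_size, in_channels, []
    for out_channels in filters:
        params = (KERNEL * KERNEL * channels + 1) * out_channels
        size //= POOL  # max pooling halves the spatial resolution
        channels = out_channels
        rows.append((size, channels, params))
    return rows


rows = conv_block_shapes()
final_size, final_channels, _ = rows[-1]
flat_features = final_size * final_size * final_channels  # input to the dense layer
dense_params = (flat_features + 1) * 128                  # dense layer of size 128
output_params = (128 + 1) * 3                             # three-class softmax layer

for i, (size, channels, params) in enumerate(rows, start=1):
    print(f"block {i}: {size}x{size}x{channels}, conv params = {params}")
print("flattened features:", flat_features)
print("dense(128) params:", dense_params, "| softmax(3) params:", output_params)
```

Under these assumptions most of the trainable weights sit in the dense layer, which is consistent with the choice to apply L2 regularization and dropout at that point.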

During the transfer learning stage, early convolutional blocks from the pretrained CNN are frozen, while the upper layers (e.g., the third and fourth convolutional blocks and the dense layers) are fine-tuned. This allows for efficient reuse of learned visual features—such as edges, patterns, and colors—while adapting the model to the specific characteristics of construction site opening images. The learning rate (e.g., 1 × 10⁻⁴) and regularization parameters were tuned based on the results of the UC Merced pretraining phase. Through this feature extraction process, stable training can be achieved even with a relatively small dataset (~3000 images). This CNN model, after transfer learning, serves as the foundation for integration with the FSL algorithm (MAML), enabling the model to quickly adapt to new types of openings using only a few samples in the support set.

2.3.2. Meta Learning: MAML Algorithm

To enable the identification of irregular openings with limited labeled data, this study adopts the MAML algorithm [69]. MAML employs a two-phase optimization process, consisting of an inner loop and an outer loop, to learn initial parameters that allow the model to quickly adapt to new tasks using only a small number of samples. Specifically, the inner loop updates temporary parameters using the support set, while the outer loop refines the original parameters based on the meta-loss computed from the query set. The overall meta-training process, including the inner loop adaptation and the outer loop update incorporating detailed backpropagation steps, is illustrated in Figure 4. The process for a given task $T_i$, using support data ($D_{train}^{(i)}$), yields a temporary parameter $\theta_i'$ through inner loop adaptation:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L\left(f_{\theta}, D_{train}^{(i)}\right) \qquad (1)$$

Here, $\alpha$ denotes the inner loop learning rate, $L$ represents the classification loss (e.g., cross-entropy), and $\theta$ indicates the original parameters. The updated $\theta_i'$ is then used to calculate the loss on the query data $D_{val}^{(i)}$, and the average loss across multiple tasks constitutes the meta-loss for the outer loop update.

$$\theta \leftarrow \theta - \beta \sum_{i} \nabla_{\theta} L\left(f_{\theta_i'}, D_{val}^{(i)}\right) \qquad (2)$$

$\beta$ denotes the meta learning rate, which refines the global parameters $\theta$ at the meta level. As a result, the model learns initial parameters that enable rapid adaptation to new support sets with minimal gradient steps. In this study, we implement both the inner and outer loops to test whether the model can quickly adapt to the irregular opening class using only 5-shot samples.

Because MAML is architecture-agnostic, it can be directly applied to the CNN model developed in this study. However, due to increased computational complexity from operations such as second-order gradients, the learning rates for the inner and outer loops ($10^{-4}$ and $10^{-5}$, respectively) were carefully tuned, and the number of epochs (50) was adjusted for training stability. This MAML-based approach offers strong adaptability to the variability of construction sites and the irregularity of target objects, enabling rapid recognition of new opening types from a small support set.
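The two updates in Equations (1) and (2) can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a single scalar parameter, a linear model f(x) = θx, and a mean squared error loss in place of cross-entropy, so that the second-order term can be written analytically. The tasks, data points, and learning rates (0.05 and 0.1) are illustrative assumptions chosen so the toy converges quickly.

```python
def loss_grad(theta, xs, ys):
    """dL/dtheta for L = mean((theta * x - y)^2)."""
    return sum(2.0 * x * (theta * x - y) for x, y in zip(xs, ys)) / len(xs)


def loss_hessian(xs):
    """d2L/dtheta2 for the same loss (independent of theta)."""
    return sum(2.0 * x * x for x in xs) / len(xs)


def inner_adapt(theta, support, alpha):
    """Equation (1): one gradient step on the support set."""
    xs, ys = support
    return theta - alpha * loss_grad(theta, xs, ys)


def meta_step(theta, tasks, alpha, beta):
    """Equation (2): update theta with the summed query-set gradients."""
    meta_grad = 0.0
    for support, query in tasks:
        theta_i = inner_adapt(theta, support, alpha)
        # Chain rule through the inner step; this factor is the second-order term.
        d_theta_i = 1.0 - alpha * loss_hessian(support[0])
        meta_grad += loss_grad(theta_i, query[0], query[1]) * d_theta_i
    return theta - beta * meta_grad


def make_task(slope):
    """Toy task y = slope * x; support and query share the same inputs."""
    xs = [1.0, 2.0]
    ys = [slope * x for x in xs]
    return (xs, ys), (xs, ys)


tasks = [make_task(1.0), make_task(3.0)]
theta = 0.0
for _ in range(50):
    theta = meta_step(theta, tasks, alpha=0.05, beta=0.1)
print("meta-learned initial parameter:", round(theta, 4))
```

In this toy setting the learned initialization settles midway between the two task slopes, so a single inner step moves it toward either task. Dropping the `d_theta_i` factor would give the cheaper first-order approximation of MAML.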

2.3.3. Attribute-Based Enhancement

Although the proposed FSL model can classify irregular openings with limited data, the performance can be further enhanced by incorporating additional attribute information, given the diverse conditions found in construction sites. In this context, “attributes” refer to key visual features that distinguish openings. This study defines representative attributes such as outline irregularity, depth difference from the slab, edge material, and shape distortion.

These attributes reflect intuitive visual cues typically used by construction experts when inspecting openings. By evaluating the similarity of these attributes between support and query images, the model can reduce false positives and false negatives. Each support image is annotated with labels indicating its attribute characteristics. Each attribute is encoded numerically, for instance, as an integer or a value within a defined range, reflecting its specific state. The query images can either be pre-labeled with estimated attributes or compared to support attributes during meta-learning to influence the final classification.

In the proposed FSL model, the final classification of query images is determined by combining CNN-based similarity (learned via inner and outer loops) with attribute-matching scores. Specifically, let $x_q$ and $x_s$ denote the query and support images, respectively. The similarity computed by the FSL model is $\mathrm{sim}_{FSL}(x_q, x_s)$, and the similarity between their attribute vectors is $\mathrm{sim}_{attr}(A_q, A_s)$. The final score is defined as:

$$\mathrm{Score}(x_q, x_s) = \alpha \times \mathrm{sim}_{FSL}(x_q, x_s) + \beta \times \mathrm{sim}_{attr}(A_q, A_s) \qquad (3)$$

Here, $\alpha$ and $\beta$ are hyperparameters that control the relative importance of the FSL model’s feature similarity ($\mathrm{sim}_{FSL}$) and the attribute similarity ($\mathrm{sim}_{attr}$) in the final score. They are weighting coefficients and are distinct from the learning rates denoted by the same symbols in Equations (1) and (2). To determine optimal values for these hyperparameters, a full grid search was conducted for $\alpha$ ranging from 0.0 to 1.0 in increments of 0.1, with $\beta$ constrained by $\alpha + \beta = 1.0$. The model’s accuracy under these hyperparameter combinations is presented in Table 1 and visualized in Figure 5. This sensitivity analysis reveals that model performance stabilized when $\alpha$ was in the range of 0.7 to 0.9 (consequently, $\beta$ in the range of 0.3 to 0.1), with $\alpha = 0.8$ (and $\beta = 0.2$) demonstrating the most consistent and robust classification performance across validation tasks, with minimal signs of overfitting. This combination was therefore selected. With $\alpha = 0.8$ and $\beta = 0.2$, the model relies primarily on learned feature similarity while still incorporating attribute consistency to refine predictions. For instance, if a query image differs from the support images in color or lighting but matches in key attributes like outline irregularity or depth, a high attribute similarity score (weighted by $\beta$) can raise the overall score, increasing the likelihood of being correctly classified as an irregular opening.
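The grid search over the two weights can be sketched as follows. The function `evaluate_accuracy` is a hypothetical callback that would run the validation episodes for a given weight pair. The `placeholder_accuracy` function used here is synthetic and shaped to peak at 0.8 purely so that the example runs and reproduces the reported selection; it carries no empirical information.

```python
def weight_grid(steps=10):
    """Yield (alpha, beta) pairs with alpha = 0.0, 0.1, ..., 1.0 and alpha + beta = 1."""
    for i in range(steps + 1):
        yield i / steps, (steps - i) / steps


def grid_search(evaluate_accuracy):
    """Return the (alpha, beta) pair with the highest validation accuracy."""
    results = [(evaluate_accuracy(a, b), a, b) for a, b in weight_grid()]
    _, best_alpha, best_beta = max(results)
    return best_alpha, best_beta, results


def placeholder_accuracy(alpha, beta):
    # Synthetic stand-in only; real use would run validation episodes here.
    return 1.0 - (alpha - 0.8) ** 2


best_alpha, best_beta, results = grid_search(placeholder_accuracy)
print(f"selected alpha={best_alpha}, beta={best_beta}")
```

Generating the pairs from integer steps avoids accumulating floating-point error when stepping by 0.1.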

This attribute-based score supplements the FSL model’s final classification stage by refining support–query matching. Each support image $x_s$ is paired with a query image $x_q$, and the score is computed using Equation (3). The class of the support image with the highest score is assigned as the predicted class for the query. During inner loop training, parameter updates still focus on CNN similarity, but attribute matching logic is incorporated during post-processing or with lightweight auxiliary parameters. This enables accurate identification of irregular openings with only a few samples, especially in cases where outline or depth attributes help distinguish otherwise ambiguous cases.
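The scoring and class assignment described above can be sketched in a few lines. The text does not specify how the two similarities are computed, so this example assumes cosine similarity between feature embeddings and one minus the mean absolute difference between attribute vectors scaled to [0, 1]. The embeddings, attribute values, and labels are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (assumed form of sim_FSL)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute_similarity(a_q, a_s):
    """1 minus the mean absolute difference; attributes assumed scaled to [0, 1]."""
    return 1.0 - sum(abs(q - s) for q, s in zip(a_q, a_s)) / len(a_q)


def fused_score(f_q, a_q, f_s, a_s, alpha=0.8, beta=0.2):
    """Equation (3): weighted sum of feature and attribute similarity."""
    return alpha * cosine_similarity(f_q, f_s) + beta * attribute_similarity(a_q, a_s)


def classify(query, support_set):
    """Assign the label of the highest-scoring support example."""
    f_q, a_q = query
    best = max(
        support_set,
        key=lambda s: fused_score(f_q, a_q, s["features"], s["attributes"]),
    )
    return best["label"]


# Hypothetical embeddings and attribute vectors, ordered as
# (outline irregularity, depth difference, edge material, shape distortion).
support_set = [
    {"label": "irregular", "features": [0.9, 0.1, 0.3], "attributes": [0.9, 0.8, 0.5, 0.7]},
    {"label": "regular", "features": [0.8, 0.4, 0.2], "attributes": [0.1, 0.8, 0.5, 0.1]},
]
query = ([0.8, 0.3, 0.2], [0.8, 0.7, 0.5, 0.8])
print(classify(query, support_set))
```

In this made-up example the query's features are slightly closer to the regular support image, but its attributes match the irregular one, so the attribute term decides the outcome. This mirrors the case described in the text where appearance differs but key attributes agree.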

Ultimately, the attributes defined in this study succinctly capture the domain characteristics of construction site openings, contributing to consistent prediction performance even under a 5-shot setting. However, attribute definitions and value ranges may vary across actual construction sites, and expert involvement is required during the annotation process—factors that must be considered for real-world implementation.

2.4. Experimental Setup

2.4.1. Data Preparation and Splitting

As described in Section 2.2, the image data used in this study were collected and curated through a multi-step development process. This section outlines how these datasets were prepared and split for final experimentation.

First, the pretraining images (from the UC Merced dataset, consisting of 21 classes and several thousand images) were used solely to build the pretrained model and were excluded from the actual FSL experiments. The CNN was trained for approximately 20 epochs on multiclass classification tasks, and the resulting weights were transferred as the initial parameters for the FSL model based on MAML. Subsequently, approximately 3000 construction site images containing openings—collected as described in Section 2.2—were used for meta-learning in the FSL stage. These images were categorized into a 3-way classification task with folders representing “irregular openings”, “regular openings”, and “others”. All images were reorganized into support and query sets. For the support set, five images per class (3 × 5 = 15 total) were selected to maintain an N = 5 shots structure. The remaining images were allocated to the query and test (validation) sets. To utilize attribute information, labels corresponding to the predefined attributes (as described in Section 2.3.3) were recorded for each image in the support set. Query images were either left untagged or handled via a separate process to infer attribute similarity during evaluation.
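The support/query arrangement can be sketched as an episode sampler. This is an illustrative sketch rather than the study's code: the file names are placeholders, and the number of query images per class (`n_query=10`) is an assumption, since the text only fixes five support images per class.

```python
import random


def sample_episode(images_by_class, n_shot=5, n_query=10, seed=None):
    """Split each class into n_shot support items and up to n_query query items."""
    rng = random.Random(seed)
    support, query = [], []
    for label, items in images_by_class.items():
        if len(items) <= n_shot:
            raise ValueError(f"class {label!r} needs more than {n_shot} images")
        picked = rng.sample(items, min(len(items), n_shot + n_query))
        support += [(path, label) for path in picked[:n_shot]]
        query += [(path, label) for path in picked[n_shot:]]
    return support, query


# Hypothetical file names standing in for the crawled images.
classes = ["irregular_openings", "regular_openings", "others"]
images_by_class = {c: [f"{c}/img_{i:04d}.jpg" for i in range(40)] for c in classes}

support, query = sample_episode(images_by_class, n_shot=5, n_query=10, seed=0)
print(len(support), "support images,", len(query), "query images")
```

Sampling support and query items from one draw without replacement keeps the two sets disjoint within an episode.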

Additionally, the dataset was split into training, validation, and test sets at an approximate ratio of 80:10:10 to facilitate independent meta-training, model tuning, and final evaluation. To simulate diverse field conditions such as lighting variation and process changes, data augmentation techniques (rotation, flipping, brightness adjustment) mentioned in Section 2.2 were selectively applied. These were constrained within defined bounds (e.g., ±15° rotation, ±20% brightness adjustment) to preserve the integrity of the 5-shot structure. Original images in the support set were kept intact, while augmentations were applied to query or auxiliary training images to mitigate overfitting.
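A minimal sketch of the split and the bounded augmentation parameters is given below. It uses integer placeholders instead of real images, and the 0.5 flip probability and uniform sampling are assumptions; only the 80:10:10 ratio and the ±15° and ±20% bounds come from the text.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and split into train/validation/test at roughly 80:10:10."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def sample_augmentation(rng, max_rotation_deg=15.0, max_brightness=0.20):
    """Draw one set of augmentation parameters within the stated bounds."""
    return {
        "rotation_deg": rng.uniform(-max_rotation_deg, max_rotation_deg),
        "brightness_factor": 1.0 + rng.uniform(-max_brightness, max_brightness),
        "horizontal_flip": rng.random() < 0.5,  # flip probability is an assumption
    }


image_ids = list(range(3000))  # placeholder IDs for the ~3000 collected images
train, val, test = split_dataset(image_ids)
params = sample_augmentation(random.Random(1))
print(len(train), len(val), len(test), params)
```

In line with the text, such parameters would be applied to query or auxiliary training images only, leaving the original support images unchanged.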

Ultimately, the prepared dataset was used such that the inner loop utilized the support images for rapid adaptation, and the outer loop used the query (or validation) images to optimize the MAML-based meta-learning algorithm for detecting new opening types with only five samples per class.

2.4.2. Implementation and Hyperparameters

The initial development and primary experiments were implemented in Python 3.8.19 using the TensorFlow and Keras libraries and executed on a high-performance desktop equipped with an NVIDIA RTX 4080 GPU. To optimize GPU memory usage during execution, TensorFlow’s memory growth option (tf.config.experimental.set_memory_growth) was enabled. On average, each epoch—including data preprocessing and augmentation—required approximately 0.04 s, with the full training process (50 epochs) taking about 3 s.

To further assess the model’s training time characteristics across different hardware environments potentially available for on-site or near-site implementation, supplementary tests were conducted on three additional GPUs: an NVIDIA RTX 4050 Laptop GPU (representing modern mobile high-performance computing), an NVIDIA GeForce GTX 1660 SUPER (representing common mid-range desktop capabilities), and an NVIDIA GeForce MX150 (representing older or lower-specification mobile GPUs). Additionally, training time was evaluated on a desktop CPU, an Intel i5-10400F, to assess performance in non-GPU accelerated environments. The average computation time per epoch and the total training time (for 50 epochs) recorded were 0.114 s and 8.55 s for the RTX 4050 Laptop; 0.174 s and 13.05 s for the GTX 1660 SUPER; 1.38 s and 103.5 s for the MX150; and 2.67 s and 193.58 s for the Intel i5-10400F CPU. These figures indicate how training duration varies with the processing power of the hardware.

The inner and outer learning rates were deliberately set to small values due to the dual gradient update mechanism inherent in MAML. Even slight increases in learning rates could cause divergence or instability in the training curve. Although hyperparameters such as α, β, weight decay, and others could be further fine-tuned, the selected values led to stable convergence in this experiment.

The training procedure alternated periodically between the meta-training and meta-validation phases. During the inner loop, the model rapidly updated its parameters using 5-shot support images. In the outer loop, it adjusted the original parameters based on meta-loss from query images, thus learning initial weights that adapt well to small datasets.

To ensure the stability of the learning process and mitigate potential overfitting, particularly given the FSL setting with limited task-specific data, several regularization techniques were employed. These included L2 regularization and dropout (with a rate of 0.5) within the model architecture (as described in Section 2.3.1), as well as data augmentation (detailed in Section 2.2) applied to the training batches. Figure 6 illustrates the training and validation loss and accuracy curves. Figure 6a depicts a representative learning curve where overfitting might occur without sufficient regularization, characterized by a divergence between high training accuracy and stagnating or increasing validation loss. In contrast, Figure 6b shows the learning curves achieved with the implemented regularization strategies, demonstrating stable convergence where training and validation accuracies track each other closely, and the validation loss remains low and stable. This indicates that potential overfitting was effectively controlled, ensuring the model generalized well from the limited training samples.

For evaluation, the study used accuracy, recall, precision, and F1 score as performance metrics. Particular attention was given to recall, since missing irregular openings could pose serious safety risks on construction sites. A detailed performance analysis based on the dataset structure and hyperparameters is provided in the Results section.
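The four metrics can be computed from predictions as sketched below. The text does not state how precision and recall are aggregated across classes, so this example assumes the irregular class is treated as the positive class. The labels are made up for illustration and are not results from the study.

```python
def binary_metrics(y_true, y_pred, positive="irregular"):
    """Accuracy, precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only.
y_true = ["irregular", "irregular", "irregular", "regular", "regular", "others"]
y_pred = ["irregular", "irregular", "regular", "regular", "irregular", "others"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Recall is the share of true irregular openings that the model flags, which is why a missed opening lowers recall even when overall accuracy stays high.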

3. Results

3.1. Performance of the Proposed FSL Model

This study evaluated whether the proposed FSL model, augmented with attribute information, can achieve high classification performance with limited data when identifying irregular openings. The experiments varied the number of classes (k) between two-way and three-way scenarios, and the number of shots (n) per class was set to 1, 2, 3, 5, and 10. The model’s performance was assessed using accuracy, recall, precision, and F1 score. Through the meta-learning structure (inner/outer loop), the model was tested under conditions ranging from extremely limited data (1-shot) to relatively sufficient samples (10-shot), to examine whether stable classification can be achieved even under restricted labeling conditions, such as those in real construction sites. The detailed results are summarized in Table 2. Overall, the results indicate that as the number of classes (k) increases, the classification task becomes more complex, and the learning process becomes less stable when the number of shots is very small (especially under one-shot). Conversely, as the number of shots increases, key performance metrics such as accuracy and recall generally exceed 90%, indicating that the proposed FSL approach maintains reliable performance even in low-data settings.

Before delving into the detailed analysis of the two-way and three-way results presented in Table 2, it is pertinent to discuss the exclusion of the k = 1 scenario. The k = 1 scenario was initially included in the experimental plan. However, it was later excluded due to limitations in practical applicability and interpretability. Effective identification of previously unseen irregular openings requires comparison with other classes that do not contain such openings. In a k = 1 setting, only one class is considered, which allows the model to simply classify all inputs into that class, yielding trivial metrics such as recall or accuracy close to 1.0 or 0.0. This scenario does not reflect the multiclass conditions encountered in actual construction sites and is not meaningful for evaluating classification performance in low-data environments.

In the two-way experiments, the model achieved a relatively high accuracy of 0.8680 under the one-shot setting. However, the recall was 0.7525 and the precision dropped to 0.6624, indicating noticeable false positives and false negatives. This result reflects a typical limitation of the support set containing only one image per class: the inner and outer loops must update parameters based on extremely limited visual information in each episode. As a result, the model heavily adapts to the current episode’s features and must undergo significant parameter changes in the next episode when image characteristics differ. While the accuracy remains moderately high in the upper 80% range, precision and recall fluctuate considerably. As the number of shots increases, the availability of at least two images per class enables the inner loop to learn more diverse features and allows the outer loop to perform meta-updates that help prevent overfitting. In the two-shot and three-shot settings, the accuracy rose to 0.9086 and 0.9343, respectively. Correspondingly, recall improved to 0.7998 and 0.8747, and precision increased to 0.7327 and 0.8084. These results indicate that the model’s instability is substantially mitigated as more support samples are provided. At the five-shot level, the model achieved an accuracy of 0.9495 and a recall of 0.9222, demonstrating that it can maintain accuracy in the mid-90% range even with limited labeled data. This suggests that the model meets practical performance standards at the five-shot level. While increasing the number of shots to 10 further improved performance slightly (accuracy: 0.9596, recall: 0.9278, F1 score: 0.8888), the results also indicate that five-shot configurations already provide sufficiently high classification performance. Therefore, achieving meaningful identification of irregular openings does not require collecting double-digit numbers of support images.

In the three-way scenario, the increased number of classes resulted in lower overall performance across all metrics (accuracy, recall, precision, and F1 score) compared to the two-way results. Nonetheless, a consistent improvement was observed as the number of shots increased. At the one-shot setting, the accuracy was 0.8255 and recall reached 0.6827, both somewhat lower than the corresponding results in the two-way scenario. Precision was limited to 0.6494, indicating a high frequency of false positives. As observed in the two-way experiments, this instability likely stems from the support set containing only one image per class, which forces the inner and outer loops to rely heavily on limited visual cues and causes the model to fluctuate significantly across episodes. In contrast, as the number of shots increased to two and three, accuracy steadily improved to 0.844 and 0.8761, respectively. Recall also rose to 0.7394 and 0.7999, while both precision and F1 score showed similar upward trends, demonstrating more stable and accurate classification performance. At the five-shot setting, the model achieved an accuracy of 0.9054 and a recall of 0.8558, indicating that even in a relatively complex three-class classification task, a small number of labels (five per class) can yield sufficiently high recognition performance. When the number of shots was increased to 10, accuracy improved to 0.9134 and recall to 0.8832. Although this is a measurable gain, the fact that recall already exceeded 85% at the five-shot level suggests that securing 10 or more support images per class is not strictly necessary for practical on-site applications. In conclusion, the three-way classification scenario also exhibited a clear tendency toward unstable performance when the number of shots was small. However, with just five support images per class, the model achieved recognition accuracy close to practical standards. These results demonstrate that the proposed FSL approach, enhanced with attribute information, provides a meaningful solution even in multi-class classification settings.

To further validate the reliability of the model’s performance, using the three-way five-shot setting as a representative case, we conducted additional cross-validation with five different random seeds (seed 1 (42), 2 (106), 3 (2024), 4 (77), and 5 (9)). Each run involved 120 episodic tasks, totaling 600 episodes across all seeds. As shown in Table 3, the model consistently demonstrated high classification performance, achieving an average accuracy of 0.9049 ± 0.0050, precision of 0.7991 ± 0.0055, recall of 0.8542 ± 0.0086, and F1 score of 0.8226 ± 0.0074. All metrics showed standard deviations below 1%, indicating minimal variation across different sampling conditions. These findings confirm that the proposed FSL model, enhanced with attribute information, demonstrates not only strong performance in low-data settings but also consistent and robust behavior across varying sampling conditions.
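The mean ± standard deviation summary across seeds can be reproduced with a few lines of standard-library code. This is a sketch only: the per-seed accuracies below are placeholders for illustration and are not the Table 3 values, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the section does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_runs(values):
    """Return (mean, sample standard deviation) of one metric across seeds."""
    return mean(values), stdev(values)


# Protocol figures stated in the text: five seeds, 120 episodic tasks per run.
seeds = [42, 106, 2024, 77, 9]
episodes_per_seed = 120
total_episodes = len(seeds) * episodes_per_seed

# Placeholder per-seed accuracies (hypothetical, NOT the reported results).
placeholder_accuracy = [0.90, 0.91, 0.90, 0.91, 0.90]
acc_mean, acc_std = summarize_runs(placeholder_accuracy)
print(f"{total_episodes} episodes, accuracy {acc_mean:.4f} ± {acc_std:.4f}")
```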

3.2. Comparison: FSL vs. Conventional Supervised Approach

This section compares the performance of the proposed FSL model with conventional CNN models when both are constrained to identical, extremely limited-label conditions. The primary aim of this comparison is to directly assess the practical advantage of the FSL paradigm in scenarios where acquiring large datasets is infeasible, by contrasting its performance against established deep learning architectures fine-tuned with minimal data without specific few-shot optimization techniques. Specifically, we evaluated a fine-tuning approach using pretrained CNNs with a small number of labeled images and compared it against the FSL method. The conventional models selected for comparison were ResNet-50 and EfficientNetB0. ResNet-50 utilizes residual learning to alleviate the vanishing gradient problem in deep networks and is known for stable training with relatively low computational cost. EfficientNetB0 is a lightweight architecture that employs compound scaling to balance depth, width, and resolution, offering high performance relative to the number of parameters. The choice of these particular baselines, known for their strong performance with ample data, was deliberate: to highlight the significant performance degradation they face when adapted to an extremely few-shot setting (15 images) via simple fine-tuning, thereby underscoring the distinct benefits of our MAML-based FSL approach, which is inherently designed for such data scarcity. Both models were pretrained on large-scale datasets such as ImageNet, and in this experiment, were fine-tuned using only 15 labeled images to simulate an extremely limited-label scenario. These 15 images matched the 3-way, 5-shot support set used for the FSL model in Section 3.1, with 5 images per class. This allowed us to ensure that both the supervised and FSL models were trained using the same amount of labeled data. Table 4 presents the comparison results, including accuracy, recall, precision, and F1 score. 
Given the focus on safety in construction sites, we placed particular emphasis on recall, which represents the model’s ability to identify potentially dangerous openings.

According to Table 4, the traditional supervised model ResNet-50, when trained with only 15 images, achieved an accuracy of 0.4933, recall of 0.4867, precision of 0.5837, and F1 score of 0.4606, all relatively low values. EfficientNetB0 performed better, with an accuracy of 0.6905, recall of 0.6771, precision of 0.6902, and F1 score of 0.6629. However, due to insufficient labeled data, both models exhibited signs of overfitting and unstable generalization. In contrast, the FSL model, trained with the same number of labels (5 per class, 15 in total) and using a meta-learning structure, achieved significantly better performance: accuracy of 0.9054, recall of 0.8558, precision of 0.8018, and F1 score of 0.8204. These results suggest that rather than simply fine-tuning a pretrained CNN backbone with limited data, the FSL model can adapt its parameters more effectively to the task of identifying irregular openings via inner and outer loops, even under five-shot conditions. Given that the primary concern in construction site safety monitoring is to avoid missing hazardous openings, the FSL recall of 0.8558, roughly 18 percentage points above EfficientNetB0 (0.6771) and 37 percentage points above ResNet-50 (0.4867), further highlights the practical applicability of the FSL approach.
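The size of the recall advantage follows directly from the recall values quoted above for Table 4. The short check below computes the gap in percentage points; the dictionary keys and the `gap_in_points` helper are illustrative names, while the three recall values are the ones reported in the text.

```python
# Recall values as reported in the text for the three-way, five-shot comparison.
reported_recall = {
    "FSL": 0.8558,
    "ResNet-50": 0.4867,
    "EfficientNetB0": 0.6771,
}


def gap_in_points(a: float, b: float) -> float:
    """Difference between two rates, expressed in percentage points."""
    return round((a - b) * 100, 2)


gaps = {
    name: gap_in_points(reported_recall["FSL"], value)
    for name, value in reported_recall.items()
    if name != "FSL"
}
print(gaps)
```

The gap is therefore about 18 percentage points against EfficientNetB0 and about 37 percentage points against ResNet-50.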

In conclusion, while transfer learning-based CNN models tend to suffer from overfitting and reduced performance under limited-label conditions (15 images), the FSL model was able to maintain performance levels that meet on-site accuracy and recall requirements using the same number of labels. This comparison highlights the potential for rapid detection and classification in real construction monitoring scenarios with minimal labeling effort. The substantially improved recall indicates the practical effectiveness of FSL in reducing safety risks by reliably identifying irregular openings.

3.3. Comparison Between the Base MAML and Advanced MAML Algorithms

This study employed a base meta-learning approach, MAML, to enable the recognition of irregular openings with a limited number of labeled samples. As demonstrated in the previous results (Section 3.1 and Section 3.2), the MAML framework maintained stable classification performance even under few-shot conditions (e.g., using one to five shots per class), and the integration of attribute information further improved key metrics such as recall.

As those results show, the MAML-based approach performed consistently across varying support set conditions. Through its inner and outer loop structure, MAML trains iteratively on limited data (e.g., five shots per class) and rapidly adapts to new classes such as irregular openings. This adaptability is particularly advantageous on construction sites, where irregular shapes frequently appear and data labeling is often challenging.
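For reference, the inner and outer loops mentioned above are commonly written as follows in the standard MAML formulation. The notation is an assumption made here for illustration and may differ from the symbols used in Section 2: θ denotes the meta-parameters, α and β the inner and outer step sizes, and the support and query losses are computed on the support and query sets of each sampled task.

```latex
\begin{align}
  % Inner loop: task-specific adaptation on the support set of task T_i
  \theta_i' &= \theta - \alpha \,\nabla_{\theta}\,
      \mathcal{L}^{\mathrm{support}}_{\mathcal{T}_i}\!\left(f_{\theta}\right) \\
  % Outer loop: meta-update using query-set losses of the adapted parameters
  \theta &\leftarrow \theta - \beta \,\nabla_{\theta}
      \sum_{\mathcal{T}_i} \mathcal{L}^{\mathrm{query}}_{\mathcal{T}_i}\!\left(f_{\theta_i'}\right)
\end{align}
```

The outer gradient is taken with respect to θ through the inner update, which is the source of the second-order terms that first-order variants avoid.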

Several variants of MAML have recently been proposed to enhance and extend its core principles. Representative examples include Reptile [69,70], Meta-SGD [71,72], and CAVIA [73]. Reptile is a first-order meta-learning method that eliminates second-order derivatives, reducing computational cost and GPU memory usage while maintaining performance comparable to MAML [69,70]. Meta-SGD includes the learning rate used in the inner loop as a meta-learnable parameter, allowing the model to automatically adjust the update magnitude for each parameter, even under extremely low-shot conditions [71]. CAVIA selectively updates only context parameters while freezing other layers, improving training stability and reducing computational overhead [73]. All these variants retain the inner/outer loop structure of MAML but are designed to simplify specific components (e.g., learning rate, update range, second-order gradients) or to incorporate additional learning objectives. To illustrate the potential impact of such advanced variants, Table 5 presents a performance comparison between the base MAML used in this study and CAVIA, under the primary experimental conditions (three-way, five-shot).

In addition to the main experiments conducted using the base MAML algorithm, supplementary comparisons were performed using Reptile, Meta-SGD, and CAVIA under the identical three-way, five-shot setting. Each of these advanced models demonstrated performance improvements of approximately 2–3 percentage points over the base MAML in key metrics such as accuracy, recall, and F1 score. These gains suggest that such variants enable faster and more precise meta-updates, particularly when adapting to new tasks with severely limited labeled data.
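As a rough illustration of how a first-order variant differs from the base algorithm, the sketch below applies a Reptile-style update to a toy problem. Everything here is an assumption made for illustration: each task is a scalar quadratic loss (w − target)², and the function names, learning rates, and task targets are hypothetical. It is not the implementation used in this study.

```python
def inner_adapt(theta, target, inner_lr=0.1, steps=5):
    """Run a few plain gradient steps on one task, starting from theta."""
    w = theta
    for _ in range(steps):
        grad = 2.0 * (w - target)  # gradient of the toy loss (w - target)**2
        w -= inner_lr * grad
    return w


def reptile_step(theta, targets, meta_lr=0.5):
    """Move theta toward the average adapted parameter (no second-order terms)."""
    adapted = [inner_adapt(theta, t) for t in targets]
    mean_adapted = sum(adapted) / len(adapted)
    return theta + meta_lr * (mean_adapted - theta)


theta = 0.0
task_targets = [1.0, 2.0, 3.0]  # hypothetical toy tasks
for _ in range(50):
    theta = reptile_step(theta, task_targets)
print(theta)
```

On this toy problem the meta-parameter settles near the mean of the task targets, which is the initialization from which a few inner steps reach any one task most quickly. The update uses only the adapted parameters, so no gradient is propagated through the inner loop.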

4. Discussion

4.1. Potential Applications of FSL for Hazard Identification

The findings of this study demonstrate that irregular openings can be reliably identified using only a small number of labeled samples, which significantly enhances the potential for direct application to construction site monitoring. In particular, the k-way, n-shot experiments presented in Section 3.1 revealed that even with only five shots per class, the proposed FSL model achieved over 90% accuracy and over 85% recall in both the two-way and three-way scenarios. This highlights the model’s capability for rapid decision-making in environments where labeling is difficult, such as construction sites. Furthermore, the superior performance of the FSL approach compared to CNN models, as shown in Section 3.2, suggests that FSL can effectively compensate for practical limitations in obtaining large labeled datasets, especially in cases where temporary structures or novel process-specific objects frequently emerge on-site.

From an on-site perspective, the most significant advantage of the FSL approach is its ability to quickly enhance classification performance even when previously unseen object types—such as irregular openings—suddenly appear. Although images can be collected using various sensors such as unmanned ground vehicles (UGVs), fixed CCTV, or drones, it is rarely feasible to acquire sufficient labels for each scenario. The FSL model addresses this by repeatedly updating its inner and outer loops with limited shots, allowing for rapid generalization. This generalization capability can be further strengthened by training on image samples collected under diverse environmental conditions, such as variable lighting, extreme viewing angles, or partial occlusions. This adaptability aligns with the needs of smart construction environments, where process stages frequently change and models must continuously adapt. Unlike conventional supervised learning, which requires large standardized datasets, FSL provides a differentiated advantage. Consequently, the proposed FSL model holds strong potential to solve urgent safety monitoring tasks in construction with limited labeled data and presents a viable solution for the practical challenge of identifying irregular openings under real-world constraints.

Although the attribute concept used in this study was designed primarily for irregular openings, future research could extend this framework by systematically defining attribute information for a broader range of construction elements, such as temporary structures, personal protective equipment (PPE), or falling object prevention systems. Expanding the attribute framework, for example by incorporating material texture, surface type, or object-level environmental cues, could enhance the efficiency of learning under data-scarce conditions and enable faster recognition of new object classes, allowing the system to respond more effectively to the constantly changing environment of construction sites.

Ultimately, integrating the FSL model with continuously generated image data from field infrastructure such as UGVs, CCTV, and drones could establish a generalized framework for monitoring hazards with minimal labeling, even in highly dynamic construction environments.

4.2. Contributions

This study makes several key contributions to the fields of construction informatics and computer vision, particularly concerning data-efficient safety monitoring. Academically, it validates the effectiveness of applying FSL, specifically an attribute-enhanced MAML approach, to the novel and challenging task of unstructured opening detection—a critical gap in automated construction safety. We provide strong empirical evidence that FSL significantly surpasses conventional supervised fine-tuning techniques in extreme low-data scenarios common to construction, particularly excelling in recall performance vital for minimizing hazard non-detection. Furthermore, this research explores the beneficial integration of domain-specific attributes with FSL, offering insights into enhancing model performance and interpretability with limited data.

From an industrial perspective, this research demonstrates the practical potential of FSL as a data-efficient solution for automating the monitoring of critical fall hazards like unstructured openings. By proving high performance with minimal data (e.g., five shots), it offers a viable pathway for construction companies, especially those with limited data resources, to adopt AI-based safety systems. This approach can significantly reduce the time, cost, and effort associated with large-scale data collection and annotation, potentially leading to more widespread implementation of automated monitoring technologies and ultimately contributing to improved worker safety on construction sites.

4.3. Limitations and Future Research

This study focused on irregular openings as the primary target, but real-world construction sites also contain various types of temporary structures, equipment, and installations. Future work should aim to expand data diversity and evaluate the FSL model on a wider range of site objects to validate its performance in more generalized meta-learning environments.

While this study primarily explored low-data scenarios (k = 1–3, n = 1–10), future research should also investigate how to maintain training stability when large-scale field data becomes available. This may involve strategies such as adjusting hyperparameters based on shot size or developing algorithmic enhancements that support efficient adaptation in response to continuously evolving field data streams.

Furthermore, preliminary comparisons with advanced MAML approaches such as Reptile [69,70] and CAVIA [73] showed modest improvements in some metrics. This suggests that further integration of such optimization-based meta-learning models may help reduce computational costs in the meta-learning process while maintaining robust performance in extremely data-scarce or highly dynamic field conditions.

Beyond MAML-centric enhancements, future research could also benefit from exploring a wider array of scalable FSL alternatives to benchmark and potentially improve upon the current approach. For instance, metric-learning-based methods like task-dependent adaptive metric (TADAM) [74], which learns task-specific distance metrics, or approaches that focus on adapting embedding spaces such as few-shot embedding adaptation with transformer (FEAT) [75], could offer complementary perspectives on feature representation and comparison. Similarly, methods like meta-learning with differentiable closed-form solvers (R2D2) [76], which leverage differentiable solvers within meta-learning, present another distinct family of FSL techniques. Systematically evaluating our attribute-enhanced framework against these diverse and scalable FSL alternatives could lead to more accurate object recognition and even safer operations in construction environments where labeling resources are limited.

Additionally, to further contextualize the performance of the proposed attribute-enhanced FSL approach and enhance its robustness assessment, future investigations will aim to include comparative analyses against a broader spectrum of supervised baselines compatible with few-shot learning paradigms. This will involve incorporating methods such as prototypical networks or transfer-learning strategies specifically optimized with few-shot classifiers. The current study (Section 3.2) contrasted our FSL model with standard deep learning architectures (i.e., ResNet-50, EfficientNetB0) under conditions of extreme data scarcity, primarily to highlight FSL’s fundamental data efficiency over conventional paradigms. More extensive comparative experiments involving diverse FSL-oriented methodologies are anticipated to offer deeper insights into the relative strengths and application-specific advantages of each approach. Such evaluations will be pursued through more elaborate and robust experimental designs in subsequent studies.

In addition to exploring different learning paradigms, future work could also investigate the integration of other efficient and robust vision-based models to further enhance the identification of irregular openings. For example, advanced semantic segmentation models like DeepLab [77], which are known for precise pixel-level classification, could potentially offer more detailed boundary delineation and shape understanding of openings. Concurrently, leveraging highly efficient and scalable architectures such as EfficientNet [78] as alternative or supplementary backbones within the FSL framework could be explored to improve computational performance, particularly for on-site or resource-constrained deployment scenarios, without significantly compromising detection accuracy. Investigating these avenues may lead to even more versatile and practical solutions for automated hazard monitoring in construction.

While this study demonstrated varying training times across different GPU and CPU configurations (as detailed in Section 2.4.2), indicating increased computational demand on less powerful hardware, the inherent data efficiency of the FSL approach is noteworthy. Future research should focus on optimizing the current model or developing even more lightweight FSL architectures specifically tailored for edge devices and mobile units. Given the rapid advancements in model compression and specialized hardware for AI on the edge, we believe that FSL-based hazard detection, such as the one proposed, holds significant potential for practical on-site deployment on commonly available, resource-constrained devices, enabling real-time safety interventions.

Multi-modal approaches may also be explored in subsequent studies. By integrating visual image features with geometric or sensor-based data (e.g., LiDAR), it may be possible to improve the detection of depth discontinuities and complex edge structures around openings, thus enabling more robust recognition under challenging site conditions.

5. Conclusions

This study addressed the significant challenge of identifying unstructured openings on dynamic construction sites, a critical task for preventing fall accidents often hindered by the data scarcity inherent in such environments. Conventional monitoring methods, including manual inspection and traditional deep learning approaches, face limitations in effectively detecting these unpredictable hazards due to difficulties in acquiring comprehensive labeled datasets. The primary objective was to validate the feasibility and effectiveness of FSL for reliably identifying these hazardous openings using minimal labeled data.

To overcome the data dependency issues, this research proposed and evaluated an FSL framework based on MAML. The approach utilized transfer learning from a pretrained CNN backbone and integrated domain-specific attribute information (e.g., outline irregularity, depth) to further enhance the classification accuracy and robustness with limited samples.

The experimental results demonstrate the strong performance of the proposed FSL model. Even under challenging few-shot conditions (three-way, five-shot), the model achieved high classification accuracy (over 90.5%) and recall (over 85.5%). Crucially, the FSL approach significantly outperformed conventional supervised learning models (ResNet-50, EfficientNetB0) that were fine-tuned with the same extremely limited dataset (15 images), particularly showing a substantial improvement in recall (approximately 18 to 37 percentage points higher, depending on the baseline), which is vital for safety applications. Preliminary comparisons also indicated that advanced MAML variants like Meta-SGD could offer further marginal performance gains (approximately 2–3 percentage points).

This research validates the viability of employing FSL, augmented with attributes, for the challenging task of irregular opening detection in construction, thereby addressing a critical gap where traditional methods falter due to data limitations. It demonstrates a pathway toward developing more adaptive and data-efficient automated monitoring systems capable of handling the dynamic and unpredictable nature of construction sites. This contributes to reducing the significant burden associated with data collection and annotation for AI model training in this domain.

Future work should focus on expanding the dataset diversity to include other types of construction hazards and evaluating the model’s scalability with larger data inflows. Further investigation into advanced MAML variants and sophisticated attribute integration methods could potentially yield additional performance improvements and computational efficiencies. Refining the system for real-time application using field-deployable sensors (e.g., UGV, CCTV) remains a key direction for practical implementation.

In conclusion, this study successfully demonstrated that an FSL approach leveraging MAML and attribute information can effectively and efficiently identify unstructured openings on construction sites using minimal data. By overcoming the limitations of data scarcity, this research provides a significant contribution towards developing more robust, adaptive, and practical AI-powered safety monitoring solutions for the construction industry, ultimately aiming to enhance worker safety and prevent fall-related accidents.

Author Contributions

Conceptualization, M.S. and H.K.; methodology, M.S. and H.K.; validation, M.S. and H.K.; investigation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, M.S. and H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant (RS-2022-00143493) from a Digital-Based Building Construction and Safety Supervision Technology Research Program, funded by the Ministry of Land, Infrastructure and Transport of the Korean Government.

Data Availability Statement

All data, models, or code generated or used during the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Son, S.; Na, Y.; Han, B. Assessment of Risk Priorities by Cause of Construction Safety Accidents: A Case Study of Falling Accidents in South Korea. Heliyon 2024, 10, e40303. [Google Scholar] [CrossRef] [PubMed]
Yoon, Y.-G.; Ahn, C.R.; Yum, S.-G.; Oh, T.K. Establishment of Safety Management Measures for Major Construction Workers through the Association Rule Mining Analysis of the Data on Construction Accidents in Korea. Buildings 2024, 14, 998. [Google Scholar] [CrossRef]
Hwang, J.-M.; Won, J.-H.; Jeong, H.-J.; Shin, S.-H. Identifying Critical Factors and Trends Leading to Fatal Accidents in Small-Scale Construction Sites in Korea. Buildings 2023, 13, 2472. [Google Scholar] [CrossRef]
Jo, D.; Kim, H. The Influence of Fatigue, Recovery, and Environmental Factors on the Body Stability of Construction Workers. Sensors 2024, 24, 3469. [Google Scholar] [CrossRef] [PubMed]
Construction Site Fall Accidents to Be Gradually Reduced by 10% Each Year—Pressreleases—Report|Korea Occupational Safety and Health Agency (KOSHA). Available online: https://kosha.or.kr/kosha/report/pressreleases.do?mode=view&articleNo=454431&article.offset=0&articleLimit=10&srSearchVal=%EA%B1%B4%EC%84%A4&srSearchKey=article_title (accessed on 1 May 2025).
Dewlaney, K.S.; Hallowell, M. Prevention through Design and Construction Safety Management Strategies for High Performance Sustainable Building Construction. Constr. Manag. Econ. 2012, 30, 165–177. [Google Scholar] [CrossRef]
Lee, B.; Hwang, S.; Kim, H. The Feasibility of Information-Entropy-Based Behavioral Analysis for Detecting Environmental Barriers. Int. J. Environ. Res. Public Health 2021, 18, 11727. [Google Scholar] [CrossRef]
Jeong, G.; Kim, H.; Lee, H.S.; Park, M.; Hyun, H. Analysis of Safety Risk Factors of Modular Construction to Identify Accident Trends. J. Asian Archit. Build. Eng. 2022, 21, 1040–1052. [Google Scholar] [CrossRef]
Park, M.; Kulinan, A.S.; Tran, D.Q.; Bak, J.; Park, S. Preventing Falls from Floor Openings Using Quadrilateral Detection and Construction Worker Pose-Estimation. Autom. Constr. 2024, 165, 105536. [Google Scholar] [CrossRef]
Khan, M.; Nnaji, C.; Khan, M.S.; Ibrahim, A.; Lee, D.; Park, C. Risk Factors and Emerging Technologies for Preventing Falls from Heights at Construction Sites. Autom. Constr. 2023, 153, 104955. [Google Scholar] [CrossRef]
Chi, C.-F.; Chang, T.-C.; Ting, H.-I. Accident Patterns and Prevention Measures for Fatal Occupational Falls in the Construction Industry. Appl. Ergon. 2005, 36, 391–400. [Google Scholar] [CrossRef]
Nadhim, E.A.; Hon, C.; Xia, B.; Stewart, I.; Fang, D. Falls from Height in the Construction Industry: A Critical Review of the Scientific Literature. Int. J. Environ. Res. Public Health 2016, 13, 638. [Google Scholar] [CrossRef] [PubMed]
Huang, X.; Hinze, J. Analysis of Construction Worker Fall Accidents. J. Constr. Eng. Manag. 2003, 129, 262–271. [Google Scholar] [CrossRef]
Fall Protection in Construction: Protecting Floor Openings. Available online: https://www.onlinesafetytrainer.com/fall-protection-in-construction-protecting-floor-openings/ (accessed on 1 May 2025).
Chi, C.-F. Accident Causes and Prevention Measures for Fatal Occupational Falls in the Construction Industry. In Fall Prevention and Protection; CRC Press: Boca Raton, FL, USA, 2016; ISBN 978-1-315-37374-4.
Winge, S.; Albrechtsen, E. Accident Types and Barrier Failures in the Construction Industry. Saf. Sci. 2018, 105, 158–166.
Bobick, T.G.; McKenzie, E.A.; Kau, T.-Y. Evaluation of Guardrail Systems for Preventing Falls through Roof and Floor Holes. J. Saf. Res. 2010, 41, 203–211.
Liy, C.H.; Ibrahim, S.H.; Affandi, R.; Rosli, N.A.; Nawi, M.N.M. Causes of Fall Hazards in Construction Site Management. Int. Rev. Manag. Mark. 2016, 6, 257–263.
Navon, R.; Kolton, O. Algorithms for Automated Monitoring and Control of Fall Hazards. J. Comput. Civ. Eng. 2007, 21, 21–28.
Tanvi Newaz, M.; Ershadi, M.; Carothers, L.; Jefferies, M.; Davis, P. A Review and Assessment of Technologies for Addressing the Risk of Falling from Height on Construction Sites. Saf. Sci. 2022, 147, 105618.
Kaskutas, V.; Dale, A.M.; Nolan, J.; Patterson, D.; Lipscomb, H.J.; Evanoff, B. Fall Hazard Control Observed on Residential Construction Sites. Am. J. Ind. Med. 2009, 52, 491–499.
Almaskati, D.; Kermanshachi, S.; Pamidimukkala, A.; Loganathan, K.; Yin, Z. A Review on Construction Safety: Hazards, Mitigation Strategies, and Impacted Sectors. Buildings 2024, 14, 526.
Chellappa, V.; Salve, U.R. Fall Risk Assessment for Vertical Formwork Activities in Construction. ASCE-ASME J. Risk Uncertain. Eng. Syst. Part A Civ. Eng. 2023, 9, 04023027. Available online: https://ascelibrary.org/doi/abs/10.1061/AJRUA6.RUENG-958 (accessed on 1 May 2025).
Zhang, M.; Shi, R.; Yang, Z. A Critical Review of Vision-Based Occupational Health and Safety Monitoring of Construction Site Workers. Saf. Sci. 2020, 126, 104658.
Jin, Z.; Gambatese, J. Development of a Cost-Effective Proximity Warning System for Fall Protection. Comput. Civ. Eng. 2024, 2023, 375–382.
Zhu, H.; Hwang, B.-G. Development of a Sensor-Based Safety Performance Analytic Mobile System to Detect, Alert, and Analyze Workers’ Unsafe Behaviors. Comput. Civ. Eng. 2024, 2023, 476–482.
Khan, A.M.; Alrasheed, K.A.; Waqar, A.; Almujibah, H.; Benjeddou, O. Internet of Things (IoT) for Safety and Efficiency in Construction Building Site Operations. Sci. Rep. 2024, 14, 28914.
Chen, H.; Luo, X.; Zheng, Z.; Ke, J. A Proactive Workers’ Safety Risk Evaluation Framework Based on Position and Posture Data Fusion. Autom. Constr. 2019, 98, 275–288.
Li, H.; Lu, M.; Hsu, S.-C.; Gray, M.; Huang, T. Proactive Behavior-Based Safety Management for Construction Safety Improvement. Saf. Sci. 2015, 75, 107–117.
Maali, O.; Ko, C.-H.; Nguyen, P.H.D. Applications of Existing and Emerging Construction Safety Technologies. Autom. Constr. 2024, 158, 105231.
OSHA’s Fall Prevention Campaign—Educational Materials and Resources for Workers and Employers|Occupational Safety and Health Administration. Available online: https://www.osha.gov/stop-falls/educational-resources (accessed on 1 May 2025).
Choo, H.; Lee, B.; Kim, H.; Choi, B. Automated Detection of Construction Work at Heights and Deployment of Safety Hooks Using IMU with a Barometer. Autom. Constr. 2023, 147, 104714.
Nakanishi, Y.; Kaneta, T.; Nishino, S. A Review of Monitoring Construction Equipment in Support of Construction Project Management. Front. Built Environ. 2022, 7, 632593.
Kulinan, A.S.; Park, M.; Aung, P.P.W.; Cha, G.; Park, S. Advancing Construction Site Workforce Safety Monitoring through BIM and Computer Vision Integration. Autom. Constr. 2024, 158, 105227.
Liu, L.; Guo, Z.; Liu, Z.; Zhang, Y.; Cai, R.; Hu, X.; Yang, R.; Wang, G. Multi-Task Intelligent Monitoring of Construction Safety Based on Computer Vision. Buildings 2024, 14, 2429.
Oh, J.; Hong, S.; Choi, B.; Ham, Y.; Kim, H. Integrating Text Parsing and Object Detection for Automated Monitoring of Finishing Works in Construction Projects. Autom. Constr. 2025, 174, 106139.
Kim, H. Feasibility of DRNN for Identifying Built Environment Barriers to Walkability Using Wearable Sensor Data from Pedestrians’ Gait. Appl. Sci. 2022, 12, 4384.
Lee, B.; Kim, H. Two-Step k-Means Clustering Based Information Entropy for Detecting Environmental Barriers Using Wearable Sensor. Int. J. Environ. Res. Public Health 2022, 19, 704.
Li, Y.; Wei, H.; Han, Z.; Jiang, N.; Wang, W.; Huang, J. Computer Vision-Based Hazard Identification of Construction Site Using Visual Relationship Detection and Ontology. Buildings 2022, 12, 857.
Choi, W.; Na, S.; Heo, S. Integrating Drone Imagery and AI for Improved Construction Site Management through Building Information Modeling. Buildings 2024, 14, 1106.
Lee, J.; Lee, S. Construction Site Safety Management: A Computer Vision and Deep Learning Approach. Sensors 2023, 23, 944.
Paneru, S.; Jeelani, I. Computer Vision Applications in Construction: Current State, Opportunities & Challenges. Autom. Constr. 2021, 132, 103940.
Islam, M.S.; Shaqib, S.M.; Ramit, S.S.; Khushbu, S.A.; Sattar, A.; Noori, S.R.H. A Deep Learning Approach to Detect Complete Safety Equipment For Construction Workers Based On YOLOv7. arXiv 2024, arXiv:2406.07707.
Yoon, S.; Kim, H. Occlusion-Aware Worker Detection in Masonry Work: Performance Evaluation of YOLOv8 and SAMURAI. Appl. Sci. 2025, 15, 3991.
Yang, K.; Ahn, C.R.; Kim, H. Deep Learning-Based Classification of Work-Related Physical Load Levels in Construction. Adv. Eng. Inform. 2020, 45, 101104.
Oh, J.; Cho, G.Y.; Kim, H. Performance Analysis of Wearable Robotic Exoskeleton in Construction Tasks: Productivity and Motion Stability Assessment. Appl. Sci. 2025, 15, 3808.
Jiang, W.; Banna, V.; Vivek, N.; Goel, A.; Synovic, N.; Thiruvathukal, G.K.; Davis, J.C. Challenges and Practices of Deep Learning Model Reengineering: A Case Study on Computer Vision. Empir. Softw. Eng. 2024, 29, 142.
Hong, S.; Choi, B.; Ham, Y.; Jeon, J.; Kim, H. Massive-Scale Construction Dataset Synthesis through Stable Diffusion for Machine Learning Training. Adv. Eng. Inform. 2024, 62, 102866.
Xu, J.; Pan, W. Deep Learning-Based Object Detection for Dynamic Construction Site Management. Autom. Constr. 2024, 165, 105494.
Jiang, H.; Lin, P.; Fan, Q.; Qiang, M. Real-Time Safety Risk Assessment Based on a Real-Time Location System for Hydropower Construction Sites. Sci. World J. 2014, 2014, 235970.
Lee, M.; Kim, S.; Kim, H.; Hwang, S. Pedestrian Visual Satisfaction and Dissatisfaction toward Physical Components of the Walking Environment Based on Types, Characteristics, and Combinations. Build. Environ. 2023, 244, 110776.
Rabbi, A.B.K.; Jeelani, I. AI Integration in Construction Safety: Current State, Challenges, and Future Opportunities in Text, Vision, and Audio Based Applications. Autom. Constr. 2024, 164, 105443.
Nagabandi, A.; Clavera, I.; Liu, S.; Fearing, R.S.; Abbeel, P.; Levine, S.; Finn, C. Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. arXiv 2019, arXiv:1803.11347.
Kim, J.; Chi, S. A Few-Shot Learning Approach for Database-Free Vision-Based Monitoring on Construction Sites. Autom. Constr. 2021, 124, 103566.
Wang, X.; El-Gohary, N. Few-Shot Object Detection and Attribute Recognition from Construction Site Images for Improved Field Compliance. Autom. Constr. 2024, 167, 105539.
Losada del Olmo, J.J.; Perales Gómez, Á.L.; López-de-Teruel, P.E.; Ruiz, A. A Few-Shot Learning Methodology for Improving Safety in Industrial Scenarios through Universal Self-Supervised Visual Features and Dense Optical Flow. Appl. Soft Comput. 2024, 167, 112375.
Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv. 2020, 53, 63:1–63:34.
Kaul, P.; Xie, W.; Zisserman, A. Label, Verify, Correct: A Simple Few Shot Object Detection Method. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14237–14247.
Fu, Y.; Qiu, X.; Ren, B.; Fu, Y.; Timofte, R.; Sebe, N.; Yang, M.-H.; Gool, L.V.; Zhang, K.; Nong, Q.; et al. NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results. arXiv 2025, arXiv:2504.10685.
Chen, J.; Wang, C.; Hong, Y.; Mi, R.; Zhang, L.-J.; Wu, Y.; Wang, H.; Zhou, Y. A Survey on Anomaly Detection with Few-Shot Learning. In Proceedings of the Cognitive Computing – ICCC 2024, Bangkok, Thailand, 16–19 November 2024; Xu, R., Chen, H., Wu, Y., Zhang, L.-J., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2025; pp. 34–50.
Zhu, Y.; Min, W.; Jiang, S. Attribute-Guided Feature Learning for Few-Shot Image Recognition. IEEE Trans. Multimed. 2021, 23, 1200–1209.
Zhang, L.; Wang, S.; Chang, X.; Liu, J.; Ge, Z.; Zheng, Q. Auto-FSL: Searching the Attribute Consistent Network for Few-Shot Learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 1213–1223.
Porikli, F. Challenges of Computer Vision Research from an Industry Perspective. In Computer Vision; Chapman and Hall/CRC: Boca Raton, FL, USA, 2024; ISBN 978-1-00-332895-7.
Madan, S.; Chaudhury, S.; Gandhi, T.K. Explainable Few-Shot Learning with Visual Explanations on a Low Resource Pneumonia Dataset. Pattern Recognit. Lett. 2023, 176, 109–116.
Ding, K.; Wang, J.; Li, J.; Shu, K.; Liu, C.; Liu, H. Graph Prototypical Networks for Few-Shot Learning on Attributed Networks. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 295–304.
Xu, W.; Xian, Y.; Wang, J.; Schiele, B.; Akata, Z. Attribute Prototype Network for Any-Shot Learning. Int. J. Comput. Vis. 2022, 130, 1735–1753.
Hu, M.; Chang, H.; Guo, Z.; Ma, B.; Shan, S.; Chen, X. Understanding Few-Shot Learning: Measuring Task Relatedness and Adaptation Difficulty via Attributes. Adv. Neural Inf. Process. Syst. 2023, 36, 19397–19409.
Xu, Y.; Fan, Y.; Bao, Y.; Li, H. Few-Shot Learning for Structural Health Diagnosis of Civil Infrastructure. Adv. Eng. Inform. 2024, 62, 102650.
Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In International Conference on Machine Learning; PMLR: London, UK, 2017.
Nichol, A.; Achiam, J.; Schulman, J. On First-Order Meta-Learning Algorithms. arXiv 2018, arXiv:1803.02999.
Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv 2017, arXiv:1707.09835.
Jamal, M.A.; Qi, G.-J. Task Agnostic Meta-Learning for Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11719–11727.
Zintgraf, L.; Shiarli, K.; Kurin, V.; Hofmann, K.; Whiteson, S. Fast Context Adaptation via Meta-Learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR: London, UK, 2019; pp. 7693–7702.
Oreshkin, B.; Rodríguez López, P.; Lacoste, A. TADAM: Task Dependent Adaptive Metric for Improved Few-Shot Learning. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
Ye, H.-J.; Hu, H.; Zhan, D.-C.; Sha, F. Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8808–8817.
Bertinetto, L.; Henriques, J.F.; Torr, P.H.S.; Vedaldi, A. Meta-Learning with Differentiable Closed-Form Solvers. arXiv 2019, arXiv:1805.08136.
Song, Z.; Zou, S.; Zhou, W.; Huang, Y.; Shao, L.; Yuan, J.; Gou, X.; Jin, W.; Wang, Z.; Chen, X.; et al. Clinically Applicable Histopathological Diagnosis System for Gastric Cancer Detection Using Deep Learning. Nat. Commun. 2020, 11, 4294.
Kabir, H.; Wu, J.; Dahal, S.; Joo, T.; Garg, N. Automated Estimation of Cementitious Sorptivity via Computer Vision. Nat. Commun. 2024, 15, 9935.

Figure 1. Visual classification examples of construction site openings: Regular (Open/Covered—with attached warning signs indicating “Danger” or “Watch for Opening”) and Irregular.

Figure 2. Research framework.

Figure 3. Dataset development (data collection, support set, query set, labelling, data augmentation).

Figure 4. Meta-training step of the MAML model.

Figure 5. Hyperparameter sensitivity analysis of attribute-weighted fusion model.

Figure 6. Training and validation curves (a) without and (b) with regularization.

Table 1. Fusion weight optimization via grid search: accuracy trends across α and β.

	1	2	3	4	5	6	7	8	9	10	11
$α$	0.0	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9	1.0
$β$	1.0	0.9	0.8	0.7	0.6	0.5	0.4	0.3	0.2	0.1	0.0
Accuracy	0.8600	0.8615	0.8624	0.8637	0.8658	0.8734	0.8830	0.8972	0.9032	0.8991	0.8828
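
The grid in Table 1 can be read as a one-dimensional search, since every column satisfies α + β = 1. The following is a minimal sketch that transcribes the table values and picks the best column; the function name `best_fusion_weights` is hypothetical and is not taken from the authors' implementation, and the roles of α and β are those defined in the main text.

```python
# Accuracy values transcribed from Table 1; alpha runs from 0.0 to 1.0 in steps of 0.1.
alphas = [i / 10 for i in range(11)]
accuracies = [0.8600, 0.8615, 0.8624, 0.8637, 0.8658, 0.8734,
              0.8830, 0.8972, 0.9032, 0.8991, 0.8828]


def best_fusion_weights(alphas, accuracies):
    """Return (alpha, beta, accuracy) for the grid point with the highest accuracy.

    Assumes beta = 1 - alpha, as in every column of Table 1.
    """
    best_idx = max(range(len(accuracies)), key=accuracies.__getitem__)
    alpha = alphas[best_idx]
    beta = round(1.0 - alpha, 1)  # round away floating-point noise
    return alpha, beta, accuracies[best_idx]


best_alpha, best_beta, best_acc = best_fusion_weights(alphas, accuracies)
print(f"alpha={best_alpha}, beta={best_beta}, accuracy={best_acc}")
```

On the tabulated values, accuracy rises steadily with α, peaks at column 9, and falls again toward α = 1.0.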

Table 2. Experimental results of the proposed approach.

K-Way	N-Shot	Validation Accuracy	Recall	Precision	F1 Score
Two-way	1	0.8680	0.7525	0.6624	0.6916
	2	0.9086	0.7998	0.7327	0.7601
	3	0.9343	0.8747	0.8084	0.8370
	5	0.9495	0.9222	0.8276	0.8668
	10	0.9596	0.9278	0.8580	0.8888
Three-way	1	0.8255	0.6827	0.6494	0.6441
	2	0.8440	0.7394	0.6956	0.7000
	3	0.8761	0.7999	0.7520	0.7570
	5	0.9054	0.8558	0.8018	0.8204
	10	0.9134	0.8832	0.8443	0.8598

Table 3. Cross-validation results on three-way five-shot classification using five random seeds.

Seed No.	Accuracy	Recall	Precision	F1 Score
Seed 1 (42)	0.9054	0.8558	0.8018	0.8204
Seed 2 (106)	0.8967	0.8413	0.7922	0.8131
Seed 3 (2024)	0.9112	0.8625	0.8076	0.8322
Seed 4 (77)	0.9031	0.8504	0.7940	0.8187
Seed 5 (9)	0.9079	0.8612	0.7998	0.8284
Mean ± SD	0.9049 ± 0.0050	0.8542 ± 0.0086	0.7991 ± 0.0055	0.8226 ± 0.0074
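
The summary row of Table 3 can be recomputed from the per-seed rows. The sketch below uses Python's standard `statistics` module and reports both the sample and the population standard deviation, because this section does not state which convention was used; the variable and function names are illustrative assumptions. Since the per-seed values are themselves rounded to four decimals, the recomputed standard deviations may differ slightly from the tabulated ones.

```python
from statistics import mean, pstdev, stdev

# Per-seed results transcribed from Table 3 (three-way, five-shot).
seed_results = {
    "accuracy":  [0.9054, 0.8967, 0.9112, 0.9031, 0.9079],
    "recall":    [0.8558, 0.8413, 0.8625, 0.8504, 0.8612],
    "precision": [0.8018, 0.7922, 0.8076, 0.7940, 0.7998],
    "f1":        [0.8204, 0.8131, 0.8322, 0.8187, 0.8284],
}


def summarize(values):
    """Mean plus both standard-deviation conventions for one metric."""
    return {
        "mean": mean(values),
        "sample_sd": stdev(values),       # divides by n - 1
        "population_sd": pstdev(values),  # divides by n
    }


summary = {metric: summarize(values) for metric, values in seed_results.items()}

for metric, stats in summary.items():
    print(f"{metric:9s} mean={stats['mean']:.4f} "
          f"sample_sd={stats['sample_sd']:.4f} "
          f"population_sd={stats['population_sd']:.4f}")
```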

Table 4. Experimental results of various learning models.

Model	Dataset	Accuracy	Recall	Precision	F1 Score
ResNet-50	15 images	0.4933	0.4867	0.5837	0.4606
EfficientNetB0	15 images	0.6905	0.6771	0.6902	0.6629
FSL model	3-way, 5-shot	0.9054	0.8558	0.8018	0.8204

Table 5. Performance comparison of the proposed base MAML and advanced MAML variants (Reptile, Meta-SGD, and CAVIA) under the three-way, five-shot setting.

Model	Dataset	Accuracy	Recall	Precision	F1 Score
Base MAML	3-way, 5-shot	0.9054	0.8558	0.8018	0.8204
CAVIA	3-way, 5-shot	0.9336	0.8774	0.8302	0.8516
Meta-SGD	3-way, 5-shot	0.9268	0.8699	0.8196	0.8403
Reptile	3-way, 5-shot	0.9221	0.8645	0.8133	0.8312
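
To make the differences in Table 5 easier to compare, the absolute gain of each variant over the base MAML can be computed directly from the tabulated values. This is a small arithmetic sketch only; the helper `gains_over_baseline` is a hypothetical name and does not reflect the authors' evaluation code.

```python
# Values transcribed from Table 5 (three-way, five-shot setting).
results = {
    "Base MAML": {"accuracy": 0.9054, "recall": 0.8558, "precision": 0.8018, "f1": 0.8204},
    "CAVIA":     {"accuracy": 0.9336, "recall": 0.8774, "precision": 0.8302, "f1": 0.8516},
    "Meta-SGD":  {"accuracy": 0.9268, "recall": 0.8699, "precision": 0.8196, "f1": 0.8403},
    "Reptile":   {"accuracy": 0.9221, "recall": 0.8645, "precision": 0.8133, "f1": 0.8312},
}


def gains_over_baseline(results, baseline="Base MAML"):
    """Absolute metric differences (variant minus baseline), rounded to 4 decimals."""
    base = results[baseline]
    return {
        model: {name: round(value - base[name], 4) for name, value in metrics.items()}
        for model, metrics in results.items()
        if model != baseline
    }


gains = gains_over_baseline(results)
for model, deltas in gains.items():
    print(model, deltas)
```

On these numbers, every variant improves on the base MAML in all four metrics, with CAVIA showing the largest gain in each.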

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

