Article

Relationship-Based Ambient Detection for Concrete Pouring Verification: Improving Detection Accuracy in Complex Construction Environments

Department of Architectural Engineering, Dankook University, 152 Jukjeon-ro, Suji-gu, Yongin-si 16890, Gyeonggi-do, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(12), 6499; https://doi.org/10.3390/app15126499
Submission received: 2 May 2025 / Revised: 5 June 2025 / Accepted: 5 June 2025 / Published: 9 June 2025

Abstract

Efficient monitoring of concrete pouring operations is critical for ensuring compliance with construction regulations and maintaining structural quality. However, traditional monitoring methods face limitations such as overlapping objects, environmental similarities, and detection errors caused by ambiguous boundaries. This study proposes an Ambient Detection-based Monitoring Framework that enhances object detection by incorporating contextual relationships between objects in complex construction environments. The framework employs the You Only Look Once version 11 (YOLOv11) algorithm, addressing issues of boundary ambiguity and misrecognition through relational analysis. Key components, including the Distance Relationship ($D_R$), Attribute Relationship ($A_R$), and Spatial Relationship ($S_R$), allow the system to quantitatively evaluate contextual associations and improve detection accuracy. Experimental validation using 232 test images demonstrated a 12.07 percentage point improvement in detection accuracy and a 71% reduction in false positives compared with baseline YOLOv11. By automating the monitoring process, the proposed framework not only improves efficiency but also enhances construction quality, demonstrating its adaptability to diverse construction scenarios.

1. Introduction

In recent years, sensor-based technologies for analyzing workers’ movements and automated monitoring systems for real-time evaluation of construction status have gained significant attention in construction sites [1,2,3,4]. Among these technologies, automated monitoring systems have emerged as a key solution for overcoming the subjectivity and inefficiency of traditional human-centered supervision methods [5,6]. Such systems contribute to improved overall construction quality and safety by objectively and accurately evaluating compliance with various work regulations [7,8]. In particular, compliance with regulations such as concrete pouring height and compaction intervals directly impacts the final quality of concrete structures [9]. Noncompliance with these standards may lead to concrete material segregation, potentially reducing the structural strength and durability of the completed construction [10].
In current construction practices, verification of work regulation compliance is primarily conducted through visual inspection by supervisors [11,12]. This approach presents several limitations that reduce its effectiveness and reliability [13]. Concrete pouring operations often involve multiple simultaneous tasks, making it practically difficult for a small number of supervisors to effectively monitor all points at once [14]. Additionally, the current approach relies heavily on subjective judgment by supervisors, presenting challenges in accurately measuring quantitative criteria such as pouring height [15,16]. The complexity of construction environments, combined with the rapid pace of operations, complicates consistent and continuous monitoring through traditional methods [17,18].
To address these issues, deep learning-based object recognition technologies have been utilized in construction sites to automatically identify various work progress situations [19,20,21]. These include rebar arrangement verification, formwork installation progress tracking, masonry pattern analysis, jack-support installation spacing measurement, and heavy equipment operation monitoring [13,22,23,24]. The advancement of real-time object detection algorithms like You Only Look Once version 11 (YOLOv11) has significantly increased the feasibility of field applications [6,25,26,27,28,29]. YOLOv11 and similar algorithms provide real-time object detection capabilities that are well-suited for construction environments [18,30,31,32]. These technologies offer improved accuracy and efficiency compared to traditional visual monitoring methods by supervisors [33,34]. They help track work status in real time and support construction managers in making informed decisions [35].
Object recognition technology has proven particularly useful in work progress tracking and quality management [24,36]. For instance, object recognition in masonry work can be utilized to check the overall situation and progress of the bricklaying activities through image analysis [24]. In jack-support installation, it allows for verifying the installation locations continuously without manual checks [23]. Safety monitoring applications have also been developed using these technologies [26,28]. Systems can now automatically monitor the safety distance between workers and equipment, helping prevent accidents and ensure compliance with safety regulations [37,38]. The ability to detect and track multiple objects simultaneously makes these systems effective even in dynamic construction environments [39,40,41].
Despite these advancements, YOLO-based automated monitoring systems for construction sites present several critical limitations when deployed in real-world environments [42]. In the context of concrete pouring operations, these systems encounter two major issues. First, they rely on single-object recognition methods that fail to consider the inherent relationships between objects [43,44]. During pouring tasks, the pouring boom head and the concrete are logically and physically connected, yet current systems detect them independently [45]. This results in frequent misrecognition, where visually similar objects are falsely identified as concrete, or actual pouring concrete is not detected at all [46,47].
Second, the systems struggle with visual ambiguity [48,49]. Construction materials often share similar visual characteristics, which leads to boundary ambiguity and significantly reduces recognition accuracy [50]. For instance, the concrete discharged from the pouring boom and the already poured surface concrete have nearly identical color and texture. Conventional object detection algorithms are not well-equipped to distinguish between such visually similar but functionally distinct elements [51,52]. This challenge is intensified in congested construction environments where multiple objects operate in close proximity [53,54]. Such conditions result in occlusions and complex visual scenes that significantly reduce the recognition accuracy of existing detection methods [55,56,57,58].
Architectures such as YOLOv11 are inherently focused on learning individual object features, without incorporating contextual or relational information [59,60]. As a result, they cannot reliably differentiate between concrete actively being poured and concrete already poured onto the floor surface, despite their distinct functional roles in the construction process [61]. The misdetections and complete failures caused by this limitation significantly impair the accuracy of measuring critical parameters such as pouring height [62].
To address these critical issues, our study introduces an ambient detection approach that analyzes spatial and visual relationships between objects in construction environments. The fundamental difference between conventional YOLO-based detection and the proposed Ambient Detection lies in their approach to object recognition. Traditional YOLO algorithms rely solely on individual object features, analyzing each detected object independently without considering its contextual relationships with surrounding elements. This single-object recognition approach often fails in construction environments where objects share similar visual characteristics or exist in complex spatial arrangements. In contrast, Ambient Detection enhances the baseline YOLO detection by incorporating three critical relationship factors that analyze contextual associations between objects. Specifically, these factors include: (1) Distance Relationship—quantifying the physical proximity between the pouring boom and concrete objects, (2) Attribute Relationship—measuring visual distinctiveness through feature vector analysis, and (3) Spatial Relationship—evaluating positional constraints that reflect the physical laws of concrete pouring operations. These relationship factors are combined through a weighted mathematical formulation rather than additional neural networks, ensuring computational efficiency while maintaining interpretability. The superior performance of Ambient Detection stems from its ability to leverage contextual information that conventional YOLO systems ignore. By analyzing inter-object relationships, the system can distinguish between actively poured concrete and already placed surface concrete, even when they appear visually identical. This relationship-based approach directly addresses the two primary limitations of YOLO-based systems: boundary ambiguity caused by visual similarity and misdetection due to lack of contextual understanding.
By integrating contextual information and modeling object relationships, this approach aims to overcome the inherent limitations of single-object recognition. It also significantly enhances the monitoring accuracy for concrete pouring operations. This study introduces Ambient Detection technology, an approach designed to enhance the accuracy of monitoring concrete pouring operations by analyzing the relationships between objects. Ambient Detection addresses boundary ambiguity inherent in single-object detection methods through the examination of distance, spatial, and attribute relationships among objects. Specifically, this study aims to accurately monitor compliance with concrete pouring height regulations on construction sites. Pouring height is a critical standard intended to prevent concrete quality deterioration, yet it remains challenging to measure precisely using current visual inspection practices.
The proposed Ambient Detection-based approach explicitly considers the relationship between the pouring boom head and the concrete object during pouring operations, significantly improving detection accuracy. During pouring, the pouring boom head and discharged concrete maintain a continuous physical relationship. Therefore, our approach simultaneously analyzes the relative spatial positioning and visual attributes of the pouring boom head and the concrete to enhance object detection performance. Such relational analysis is particularly effective in resolving boundary ambiguity, a frequent issue on construction sites.
Consequently, this research overcomes the shortcomings of existing YOLO-based object recognition algorithms, enabling objective and precise assessments of compliance with pouring height regulations. Furthermore, the proposed approach aims to establish a technological foundation capable of expanding beyond concrete pouring tasks. By leveraging object relationships, the developed detection methodology can be applied more broadly across various construction monitoring scenarios.
Guided by these observations, we hypothesize that relationship-based ambient detection can markedly outperform single-object YOLOv11 detection in concrete-pouring scenes plagued by boundary ambiguity and visual similarity. This approach explicitly embeds distance, attribute, and spatial relationships into the detection pipeline. To verify this hypothesis, we set five research tasks: (T1) design a framework that fuses the three contextual cues without adding an extra trainable network; (T2) implement a rigorous region-of-interest (ROI) definition and closed-form formulas for all three relationship scores—distance $D_R$, attribute similarity $A_R$, and the binary spatial constraint $S_R$; (T3) compare the accuracy, misdetection, and complete-failure rates of YOLOv11 and the proposed Ambient Detection on a hold-out test set of 232 unseen field images; (T4) analyze the computational overhead on an RTX 3090 GPU, noting that additional calculations inevitably slow Ambient Detection yet may be justified by the accuracy gain; and (T5) identify current limitations, such as the fixed 1.5 m neck-height assumption, and outline future work for lightweight optimization and cross-trade generalization.

2. Methodology

2.1. Research Framework

The proposed framework aims to monitor the compliance of concrete pouring height regulations by analyzing object relationships in real construction environments. As illustrated in Figure 1, the framework consists of five main modules: data collection, data labelling, Region of Interest (ROI) definition based on boom detection, relationship-based weight analysis, and compliance verification.
A diverse set of concrete pouring images is collected through web crawling. From the initial dataset, images that clearly capture the pouring scene are selected for analysis. These selected images are annotated to define the concrete boom and concrete objects, which serve as input for the detection module. The YOLOv11 architecture is used to detect the boom object, and the ROI is defined based on its bounding box. Within the ROI, the relationship between the boom and concrete objects is analyzed based on three factors: distance, attribute, and spatial relationships. These relationships are converted into association values, normalized, and weighted to calculate a final detection score. The pouring height is estimated from the bounding box dimensions of the detected concrete object. The height is then compared against a predefined threshold to determine whether the pouring complies with construction regulations. This framework enables a structured and objective approach to real-time pouring monitoring in construction sites.

2.2. Data Collection and Annotation

To construct a training dataset suitable for concrete pouring detection, a total of 1200 images were collected through random web crawling. These images were selected to reflect diverse environmental conditions observed at real construction sites. In particular, environmental diversity was ensured by including images captured under various lighting and weather conditions.
Next, a filtering process was conducted to select images that clearly depicted actual concrete pouring activities. The selection focused on task visibility, ensuring that only scenes in which pouring was actively occurring were retained. Images showing unclear operations or irrelevant content were excluded. From the 1200 images initially collected, 873 met the selection criteria; of these, 641 were annotated for model development and the remaining 232 were reserved as the test set. Two object classes were defined: the boom and the concrete being poured. Each object was annotated using bounding boxes formatted for YOLOv11. The boom class included visible portions of the pouring boom head, and the concrete class included only the concrete that was actively being poured from the boom at the time of capture. Surfaces of already poured concrete were excluded to reduce recognition errors caused by boundary ambiguity.

2.3. ROI Definition Using Boom Detection

The YOLOv11 architecture was applied to detect the boom object, which plays a central role in identifying the pouring zone. Note that in our coordinate system, the y-axis is oriented downward from the top of the image, which is the standard convention in image processing. This means that higher y-values indicate positions lower in the image. Once the boom is detected, its bounding box is used to define the ROI, where associated concrete objects are likely to be located during the pouring operation. As illustrated in Figure 2, the ROI is designed to spatially restrict detection to a region directly beneath the boom. This minimizes false positives from unrelated objects and improves detection accuracy by focusing only on the active pouring area. The ROI is mathematically defined as:
$$ROI_{Concrete} = \left\{ (x, y) \mid x \in [x_{Boom,min}, x_{Boom,max}],\; y > y_{Boom,max} \right\} \quad (1)$$
Here, $x_{Boom,min}$ and $x_{Boom,max}$ denote the left and right boundaries of the detected boom's bounding box, respectively, and $y_{Boom,max}$ represents the vertical bottom edge of the bounding box. The ROI includes all pixel locations that are horizontally within the boom's bounding box range and vertically below the bottom boundary of the boom.
This constraint reflects the physical assumption that concrete objects actively involved in the pouring task must appear directly beneath the boom in the image. By filtering out objects located outside this spatial range, the framework ensures that only relevant areas are analyzed. This allows the subsequent relationship-based analysis to focus solely on valid candidate regions that are contextually associated with actual pouring activity.
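As a concrete illustration, the following Python sketch (our own helper functions, not code from the paper) encodes Equation (1) together with one plausible membership test for candidate concrete boxes; all function and variable names are ours.

```python
# Minimal sketch of the ROI in Equation (1). Coordinates follow the image
# convention used in the paper (y grows downward); the membership test below
# is one plausible interpretation, not the authors' exact filtering rule.

def concrete_roi(boom_box):
    """boom_box = (x_min, y_min, x_max, y_max) of the detected boom."""
    x_min, _, x_max, y_max = boom_box
    return {"x_min": x_min, "x_max": x_max, "y_min": y_max}  # region below the boom

def in_roi(candidate_box, roi):
    """Keep a candidate concrete box if its top-center point falls inside the ROI."""
    cx = (candidate_box[0] + candidate_box[2]) / 2.0
    cy_top = candidate_box[1]
    return roi["x_min"] <= cx <= roi["x_max"] and cy_top > roi["y_min"]

# Example: boom detected at (200, 50, 520, 310); a candidate at (260, 340, 480, 700)
# lies directly beneath it and is therefore kept for relationship analysis.
roi = concrete_roi((200, 50, 520, 310))
print(in_roi((260, 340, 480, 700), roi))  # True
```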

2.4. Relationship-Based Weight Analysis

This study proposes a relationship-based weighting module that quantitatively analyzes the association between the pouring boom and concrete objects. The goal is to improve detection accuracy in construction environments where visually similar elements frequently coexist. The analysis is conducted within the ROI defined in the previous step. The association is computed based on three relationship factors—distance, attribute, and spatial positioning—to determine whether each detected concrete object is actively involved in the pouring operation.
First, the distance-based association value derived from the distance relationship is calculated and normalized. As shown in Equation (2), $D_R$ is the distance-based association value calculated from the distance relationship between the two objects. It represents the physical proximity between the pouring boom and the detected concrete object, where closer distances indicate higher association. The distance association value is calculated as the difference between the bottom boundary ($y_{Boom,max}$) of the detected pouring boom bounding box and the top boundary ($y_{Concrete,min}$) of the detected concrete bounding box. Here, $D_R$ is the distance association value before normalization, representing the vertical distance between the two objects.
$$D_R = \left| y_{Boom,max} - y_{Concrete,min} \right| \quad (2)$$
The smaller this value, the closer the pouring boom and concrete object are positioned, indicating higher association, while a larger distance difference indicates lower association. However, since detected distance values can vary over a wide range depending on the data, they must be normalized to a consistent range. As defined in Equation (3), this study calculates $D_{norm}$ by normalizing the distance association value and uses it in the weight calculation process. The normalization process is defined as follows:
$$D_{norm} = 1 - \frac{D_R}{D_{max}} \quad (3)$$
Through Equation (3), a value of $D_{norm}$ close to 1 corresponds to very close objects ($D_R \approx 0$), whereas a value approaching 0 denotes little association as $D_R$ approaches $D_{max}$. Here, $D_{max}$ is set as the maximum detectable distance within the ROI, and relative distance values are normalized based on this. The normalized distance association value is used in the weight calculation process for object detection, enhancing final detection confidence by reflecting the physical relationship with the pouring boom. Through this, the proposed design overcomes the limitations of individual object-based detection methods and enables detection that considers relationships between objects.
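For clarity, the sketch below (our own, with hypothetical pixel values) computes $D_R$ and $D_{norm}$ exactly as in Equations (2) and (3); the clamp to [0, 1] is an added safeguard that the paper does not specify.

```python
# Minimal sketch of the distance association in Equations (2)-(3).

def distance_association(y_boom_max, y_concrete_min, d_max):
    d_r = abs(y_boom_max - y_concrete_min)   # Equation (2): vertical gap in pixels
    d_norm = 1.0 - d_r / d_max               # Equation (3): 1 = touching, 0 = at d_max
    return max(0.0, min(1.0, d_norm))        # safety clamp to [0, 1]

# Example: boom bottom at y = 310, concrete top at y = 340, ROI depth d_max = 410 px.
print(distance_association(310, 340, 410))  # ~0.93: close objects, high association
```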
Building upon the distance relationship, the attribute association value $A_R$ is defined by Equation (4), analyzing the visual distinctiveness between the pouring boom and the detected concrete object; the normalized value $A_{norm}$ is ultimately used.
The attribute association value is calculated based on cosine similarity, serving as an indicator of how distinctly the visual characteristics (feature vectors) of the two objects are differentiated. It is designed to have higher values when the two objects are visually clearly distinguishable. Defining the visual attributes of the pouring boom and concrete as $F_{Boom}$ and $F_{Concrete}$, respectively, the similarity between the two vectors is calculated as follows. Here, $A_R$ is the attribute association value before normalization, and $Sim(F_{Boom}, F_{Concrete})$ represents the cosine similarity between the feature vectors of the two objects.
$$A_R = 1 - Sim(F_{Boom}, F_{Concrete}) \quad (4)$$
As shown in Equation (5), cosine similarity is the dot product of two vectors divided by the product of their magnitudes, ranging from −1 to 1. A similarity closer to 1 indicates that the two vectors are similar, while values closer to −1 indicate opposite directions. Therefore, the $A_R$ value, which is 1 minus the similarity, represents how visually distinguishable the two objects are, with higher values indicating better visual distinction. Here, $F_1 = F_{Boom}$ is the visual attribute (feature vector) of the pouring boom, including color and texture, and $F_2 = F_{Concrete}$ is the visual attribute (feature vector) of the detected concrete, including color and texture.
The feature vectors $F_1$ and $F_2$ of each object are generated through a Convolutional Neural Network (CNN)-based feature extractor using the image patch of the object from the YOLOv11 detection result. Specifically, we extract feature maps with dimensions N × N × 512 from the penultimate layer (“neck”) of the YOLOv11 backbone. To convert these feature maps into 512-dimensional feature vectors, we apply a Global Average Pooling (GAP) operation across the spatial dimensions. This operation computes the average value for each of the 512 channels, effectively reducing the spatial dimensions while preserving the essential feature information for each object. We then apply L2 normalization ($\|F\|_2 = 1$) to each vector before similarity computation. These feature vectors numerically represent the visual characteristics of objects in multi-dimensional space and are used to evaluate attribute relationships between objects by analyzing the similarity between the two vectors.
$$Sim(F_1, F_2) = \frac{F_1 \cdot F_2}{\|F_1\|\,\|F_2\|} \quad (5)$$
Since $A_R$ can have varying ranges depending on the dataset, it is normalized to $A_{norm}$ as defined in Equation (6) and used for the weight calculation. The normalization applies min–max normalization and is expressed as follows. Here, $A_{norm}$ is the normalized attribute association value, and $A_{min}$ and $A_{max}$ are the minimum and maximum attribute association values within the dataset. This transforms the attribute association value to a consistent scale within the 0–1 range. To address potential out-of-range values in real-world applications, we implement a clamping procedure that constrains $A_{norm}$ to the [0, 1] interval: if the computed $A_{norm}$ exceeds 1, it is set to 1; if it falls below 0, it is set to 0. Additionally, for optimal performance in different deployment scenarios, $A_{min}$ and $A_{max}$ can be recalibrated based on site-specific data, or robust normalization techniques can be employed to handle outliers. Values closer to 1 indicate that the visual characteristics of the pouring boom and the detected concrete object are clearly distinguishable. The normalized attribute association value is used as a factor reflecting the relationship with the pouring boom in the weight calculation process. This enables more accurate object detection and pouring height analysis.
$$A_{norm} = \frac{A_R - A_{min}}{A_{max} - A_{min}} \quad (6)$$
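The attribute pathway of Equations (4) through (6) can be sketched in PyTorch as follows; the tensor shapes and the example $A_{min}$/$A_{max}$ values are our assumptions, not values reported by the authors.

```python
# Minimal PyTorch sketch (our assumption of the described pipeline, not the
# authors' code) for Equations (4)-(6): GAP over an N x N x 512 feature map,
# L2 normalization, cosine similarity, and min-max normalization with clamping.
import torch
import torch.nn.functional as F

def to_feature_vector(feature_map):
    """feature_map: tensor of shape (512, N, N) taken from the detector's neck."""
    vec = feature_map.mean(dim=(1, 2))        # Global Average Pooling -> (512,)
    return F.normalize(vec, p=2, dim=0)       # L2 normalization, ||F||_2 = 1

def attribute_association(f_boom_map, f_concrete_map, a_min, a_max):
    f1 = to_feature_vector(f_boom_map)
    f2 = to_feature_vector(f_concrete_map)
    sim = torch.dot(f1, f2).item()            # cosine similarity, Equation (5)
    a_r = 1.0 - sim                           # Equation (4): higher = more distinguishable
    a_norm = (a_r - a_min) / (a_max - a_min)  # Equation (6), min-max normalization
    return min(1.0, max(0.0, a_norm))         # clamp to the [0, 1] interval

# Example with random feature maps; a_min and a_max would come from the training set.
boom_map, concrete_map = torch.rand(512, 20, 20), torch.rand(512, 20, 20)
print(attribute_association(boom_map, concrete_map, a_min=0.05, a_max=0.95))
```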
In addition to distance and attribute relationships, the spatial relationship association value $S_R$ is a factor that quantitatively evaluates the relative positional information between the pouring boom and the detected concrete object. It plays a role in improving detection accuracy by analyzing whether the relationship between objects follows a specific pattern. Generally, concrete discharged from a pouring boom exists directly below the boom, and this relationship can serve as a structural constraint in the object detection process. Therefore, it is important to check whether the detected object satisfies a specific spatial relationship with the pouring boom.
The spatial relationship association value $S_R$ is calculated according to Equation (7), by evaluating whether the detected object is positioned directly below the boom bounding box. For this, the bottom of the boom bounding box ($y_{Boom,max}$) and the top of the concrete bounding box ($y_{Concrete,min}$) are compared to analyze whether the object is positioned below the boom. Here, $S_R$ is the spatial relationship association value, which is a binary value (0 or 1).
$$S_R = \begin{cases} 1, & \text{if } y_{Concrete,min} > y_{Boom,max} \ (\text{concrete is below the boom}) \\ 0, & \text{otherwise} \end{cases} \quad (7)$$
According to this condition, if the detected concrete object is positioned directly below the boom object, $S_R = 1$; otherwise, $S_R = 0$. This allows us to determine whether the detected object has the correct spatial relationship with the boom.
Having established the three relationship factors, the final detection is determined by evaluating the normalized association values through mathematical formula-based analysis. The final weight between the two detected objects is defined by Equation (8).
$$W_{Boom,Concrete} = \alpha \cdot D_{norm} + \beta \cdot A_{norm} + \gamma \cdot S_R \quad (8)$$
where α, β, and γ are coefficients adjusting the importance of the distance, attribute, and spatial relationships, respectively, with initial values set equally to 0.33. This initially weights the contribution of each factor equally to evaluate the reliability of relationship-based detection; the coefficients are then adjusted based on training data. During the learning process, the optimal importance coefficients are determined experimentally, considering the balance between detection precision and recall.
Following Equation (9), the computed value is normalized through a sigmoid function and compared with a pre-set threshold to determine whether the detected object is finally classified as concrete being poured.
$$\hat{W} = \sigma(W) = \frac{1}{1 + e^{-W}} \quad (9)$$
Upon comparison with the threshold, if the calculated weight value exceeds the pre-set threshold, the object is confirmed as concrete being poured; otherwise, the detection is invalidated. In this study, the threshold was set based on the top 80% of average weight values based on training data, and optimized considering the balance between precision and recall. The detailed process by which the threshold was calculated will be elaborated on in Section 2.6.
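Putting the pieces together, the decision rule of Equations (7) through (9) reduces to a few lines; the sketch below (our own, with hypothetical input values) uses the optimized coefficients and the 0.58 threshold reported in Section 2.6.

```python
# Minimal sketch of the final decision rule in Equations (7)-(9); coefficient
# values (0.50, 0.20, 0.30) and the 0.58 threshold follow Section 2.6.
import math

def spatial_association(y_boom_max, y_concrete_min):
    return 1 if y_concrete_min > y_boom_max else 0           # Equation (7)

def ambient_score(d_norm, a_norm, s_r, alpha=0.50, beta=0.20, gamma=0.30):
    w = alpha * d_norm + beta * a_norm + gamma * s_r          # Equation (8)
    return 1.0 / (1.0 + math.exp(-w))                         # Equation (9), sigmoid

def is_pouring_concrete(d_norm, a_norm, s_r, threshold=0.58):
    return ambient_score(d_norm, a_norm, s_r) >= threshold

# Example: a close (D_norm = 0.93), visually distinct (A_norm = 0.70) candidate
# located directly below the boom is confirmed as actively poured concrete.
s_r = spatial_association(y_boom_max=310, y_concrete_min=340)
print(is_pouring_concrete(0.93, 0.70, s_r))  # True (score ~0.71 > 0.58)
```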
Through this module, the limitations of YOLOv11 are complemented, enabling more reliable concrete pouring monitoring through the application of a detection technique that reflects inter-object relationships. The final decision, based on normalized weight, improves detection reliability compared to the existing individual object detection method. It maintains high accuracy even in environments with high boundary ambiguity by reflecting the structural relationship between the pouring boom and concrete objects. Consequently, the Ambient Detection module proposed in this study is designed to go beyond simple object detection. It enables more precise pouring monitoring and regulatory compliance review by introducing an approach that analyzes relationships between objects.
It should be noted that the final integration of the three relationship factors (distance, attribute, and spatial relationships) is performed through mathematical formulation rather than trainable neural networks. While deep learning models are utilized in the preprocessing stages—specifically YOLOv11 for object detection and pose estimation algorithms for human skeleton extraction—the relationship-based weight calculation follows a deterministic approach using the weighted combination defined in Equations (8) and (9). This design choice ensures interpretability and computational efficiency in the final decision-making process while maintaining the benefits of deep learning-based feature extraction.

2.5. Pouring Height Measurement System

To enable accurate evaluation of concrete pouring height in real-world units, this study proposes a measurement system that utilizes human skeletal information extracted from site imagery. The system estimates the actual height of the poured concrete by converting pixel distances into metric units, using the worker’s body as a dynamic reference.
The measurement process begins by identifying the neck midpoint of on-site workers in the image using a pose estimation algorithm. Specifically, we employed the OpenPose algorithm for real-time 2D human-pose estimation [63]. OpenPose outputs 14 key body joints in our configuration; we additionally compute the midpoint between the neck and the nose keypoints and use this synthetic landmark as the reference for height calibration [64,65]. This anatomical landmark is assigned a height of 1.5 m, based on the average height of adult Korean males including footwear, which is approximately 175 cm. The vertical pixel position of the neck point is then used to derive a pixel-to-meter conversion factor. To ensure consistency, if the worker’s posture is not upright due to movement or bending, the skeleton is algorithmically corrected to a straightened form before calculating the reference point. Once the conversion factor is established, the height of the detected concrete object—measured in pixels—is converted into meters.
For example, if the neck midpoint appears at y = 375 pixels and is defined as 1.5 m, the conversion factor becomes approximately 0.004 m/pixel. If the concrete bounding box height is 120 pixels, the actual pouring height is calculated as 0.48 m. This result is then compared against a predefined threshold of 1.5 m to determine whether the pouring operation complies with safety and quality regulations.
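The conversion logic can be expressed compactly; the following sketch (our own helper, reproducing the worked example above) assumes the 1.5 m neck-midpoint reference.

```python
# Minimal sketch of the pixel-to-meter calibration described above: the
# worker's neck midpoint is treated as a 1.5 m reference landmark.

NECK_REFERENCE_M = 1.5  # assumed real-world height of the neck-midpoint landmark

def meters_per_pixel(reference_span_px):
    """reference_span_px: pixels spanned by the 1.5 m reference in the image."""
    return NECK_REFERENCE_M / reference_span_px

def pouring_height_m(concrete_box_height_px, factor):
    return concrete_box_height_px * factor

# Worked example from the text: a 375 px reference span gives ~0.004 m/px,
# so a 120 px concrete bounding box corresponds to ~0.48 m of pouring height.
factor = meters_per_pixel(375)
height = pouring_height_m(120, factor)
print(round(factor, 4), round(height, 2), height <= 1.5)  # 0.004 0.48 True (compliant)
```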
As shown in Figure 3, this skeleton-based system adapts flexibly to varying environmental conditions, camera angles, and worker positions, ensuring high reliability and field applicability. Unlike traditional manual methods, the proposed approach provides consistent and objective evaluation of pouring height compliance in construction environments.

2.6. Model Training and Optimization

Prior to the relationship-based weight optimization, the YOLOv11 model was trained using our annotated dataset of 641 images with an 80:20 train-validation split. Training was conducted for 100 epochs with an initial learning rate of 0.001, batch size of 16, and input image resolution of 640 × 640 pixels. The model was implemented using PyTorch v2.2.0 framework and trained using an NVIDIA RTX 3090 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 24 GB memory. Data augmentation techniques, including random rotation (±15°), scaling (0.8–1.2), and color jittering, were applied to enhance model robustness.
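For context, a comparable training run could be configured as in the sketch below, which assumes the Ultralytics YOLO interface; the dataset YAML name and the exact augmentation gains are our placeholders chosen to approximate the reported settings (100 epochs, learning rate 0.001, batch 16, 640 × 640, ±15° rotation, 0.8–1.2 scaling, color jittering).

```python
# Hypothetical training configuration (assumed Ultralytics interface; not the
# authors' released script). "concrete_pouring.yaml" is a placeholder dataset
# file describing the 80:20 train-validation split of the 641 annotated images.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # pretrained YOLOv11 weights as the starting point
model.train(
    data="concrete_pouring.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    lr0=0.001,       # initial learning rate
    degrees=15.0,    # random rotation of +/- 15 degrees
    scale=0.2,       # random scaling gain, roughly 0.8-1.2
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,  # color jittering
    device=0,        # single NVIDIA RTX 3090
)
```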
In this study, we conducted a systematic optimization process to determine the optimal weight coefficients (α, β, γ) for the three relationship factors (distance, attribute, spatial). This process was designed to quantitatively evaluate and adjust the importance of each relationship factor in the final detection decision.
Initially, equal weights (α = β = γ = 0.33) were assigned to each relationship factor to evaluate the reliability of relationship-based detection. This established a baseline under the assumption that all factors contribute equally to the detection process. Subsequently, we explored optimal weight combinations through 10-fold cross-validation. For weight coefficient optimization, we performed a grid search across a total of 120 combinations (α, β, γ ∈ {0.10, 0.15, …, 1.0} with α + β + γ = 1.0 and α, β, γ ≥ 0.1). For each combination, model performance was measured using accuracy and F1 score as evaluation metrics. Since all test images contained the target object, the image-level precision was fixed at 1.0, and in this case, accuracy is identical to recall. Therefore, this study used accuracy and F1 score as the primary performance metrics. Through this process, the optimal coefficients were determined to be α = 0.50 (distance relationship), β = 0.20 (attribute relationship), and γ = 0.30 (spatial relationship). The highest weight being allocated to the distance relationship ($D_R$) reflects the physical characteristics of concrete pouring operations: the closer the physical distance between the pouring boom head and the concrete, the higher the likelihood that pouring is actually in progress. The spatial relationship ($S_R$) receiving the second-highest weight demonstrates the importance of the structural constraint that concrete must be positioned below the pouring boom head. The attribute relationship ($A_R$) received a relatively lower weight but plays a complementary role through visual characteristic distinction.
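The search space itself is small enough to enumerate directly; the sketch below (our own, with a toy scoring function standing in for the cross-validated F1 evaluation) reproduces the 120 candidate combinations.

```python
# Minimal sketch of the grid search over (alpha, beta, gamma): a 0.05 step,
# each coefficient >= 0.10, and alpha + beta + gamma = 1.0. Integer
# percentages avoid floating-point drift in the sum constraint.

def candidate_weights(step=5, minimum=10, total=100):
    combos = []
    for a in range(minimum, total + 1, step):
        for b in range(minimum, total + 1, step):
            c = total - a - b
            if c >= minimum:
                combos.append((a / 100, b / 100, c / 100))
    return combos

def grid_search(score_fn, combos):
    """score_fn(alpha, beta, gamma) -> cross-validated F1 (user-supplied)."""
    return max(combos, key=lambda w: score_fn(*w))

combos = candidate_weights()
print(len(combos))  # 120 combinations, matching the search space described above

# Toy score peaking at the reported optimum, used only to make the sketch
# runnable; in practice the score is the 10-fold cross-validated F1.
toy_score = lambda a, b, g: -((a - 0.5) ** 2 + (b - 0.2) ** 2 + (g - 0.3) ** 2)
print(grid_search(toy_score, combos))  # (0.5, 0.2, 0.3)
```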
Additionally, to set the threshold for final detection decisions, we analyzed the precision-recall curve. We established a candidate threshold range from 0.5 to 0.9 at 5% intervals and measured performance at each point. The optimal threshold was selected at the point where the F1 score reached its maximum along the precision–recall curve. Finally, at a threshold of 0.58, the most balanced performance between precision and recall was demonstrated; thus, this was adopted as the final threshold. This optimization process played a crucial role in ensuring model robustness across various field conditions (lighting changes, viewpoint changes, partial occlusions, etc.). It provided the foundation for stable object detection performance even in situations with high boundary ambiguity. Experimental results showed that the optimized model demonstrated significant performance improvement compared to conventional single-object recognition methods. This validates the effectiveness of the optimal weight coefficients and threshold setting.

3. Results

This section presents the experimental results used to evaluate the effectiveness of the proposed Ambient Detection model in monitoring concrete pouring operations. The analysis focuses on a comparative evaluation of the proposed method and a baseline object detection model (YOLOv11). It uses key performance indicators, such as object recognition success rate, accuracy, F1 score, misdetection rate, and complete failure rate. These metrics were chosen to reflect both the detection accuracy and reliability of the systems in complex construction scenarios.
The results are organized into three subsections. Section 3.1 compares the performance results between the conventional object detection model and the proposed Ambient Detection model. Section 3.2 presents a statistical analysis of the performance difference between the two models, including McNemar’s test. Section 3.3 provides an ablation study on the proposed Ambient Detection model, analyzing the contribution of individual relationship factors to the overall detection performance.

3.1. Comparative Performance Analysis

The detection performance and computational efficiency results for actively poured concrete objects during pouring operations are shown in Table 1.
The evaluation metrics are defined as follows: (1) Accuracy represents the percentage of correctly detected concrete pouring instances out of all test images, calculated as the ratio of true positives to the total number of test images; (2) F1-Score is the harmonic mean of precision and recall, providing a balanced measure of detection performance that accounts for both the model’s ability to correctly identify positive cases (recall) and to avoid false positives (precision); (3) Misdetection Rate indicates the percentage of cases where non-target objects were incorrectly identified as pouring concrete; (4) Complete Failure Rate represents the percentage of cases where the model failed to detect any concrete pouring activity despite its presence in the image; (5) Processing Time represents the average computational time required to process a single image on an NVIDIA RTX 3090 GPU (NVIDIA Corporation, Santa Clara, CA, USA).
In this study, 232 images were selected from a total of 873 field images to serve as the test set for evaluating the object detection performance of the two models. All test images contained at least one instance of a concrete pouring boom, meaning they consisted solely of positive samples. As a result, only true positives (TP) and false negatives (FN) were counted when constructing the confusion matrix, while false positives (FP) and true negatives (TN) were considered to be zero. Accordingly, accuracy and recall yielded identical values, and precision was calculated as 100% due to the absence of false positives. The F1-score, which represents the harmonic mean of precision and recall, was used as the primary metric for comparing the two models.
The YOLOv11-based detection model correctly identified 188 out of 232 test images, resulting in an accuracy and recall of 81.03% and an F1-score of 89.52%. However, it exhibited a relatively high error rate, with 6.03% misdetections and 12.93% complete failures. In contrast, the proposed Ambient Detection model correctly detected 216 images, achieving an accuracy and recall of 93.10% and an F1-score of 96.43%. The misdetection and complete failure rates were significantly reduced to 1.72% and 5.17%, respectively. Compared to YOLOv11, Ambient Detection showed a 12.07 percentage point improvement in accuracy and recall, and a 6.91 percentage point improvement in F1-score. These results suggest that the relationship-based feature enhancement approach in Ambient Detection effectively improved detection reliability. Regarding computational efficiency, the processing time analysis in Table 1 reveals that while Ambient Detection requires 65 ms per image compared to 45 ms for YOLOv11, this 44% increase in processing time is justified by the substantial accuracy improvements (a 12.07 percentage point increase in detection accuracy and a 71% reduction in false positives). The computational overhead demonstrates that the significant gains in detection reliability outweigh the additional processing cost. Moreover, in the context of concrete pouring operations, where accurate detection of pouring objects is critical for verifying regulatory compliance, this performance improvement enhances the model’s practical applicability on construction sites.
To enable a more detailed assessment of model performance, this study classified detection failures into two distinct types: misdetections and complete failures. In the context of concrete pouring detection, a misdetection occurs when a bounding box is incorrectly generated around a non-target object, such as a nearby structure or material. A complete failure refers to the absence of a bounding box due to ambiguous boundaries between the actively poured concrete (target object) and surrounding elements with similar visual characteristics. Ambient Detection was specifically designed to overcome these issues—namely, misdetections caused by object similarity and complete failures caused by visual ambiguity between freshly poured and already placed concrete. To verify how effectively these challenges were addressed, the study analyzed misdetections and complete failures separately. YOLOv11 was vulnerable to both types of error, showing a misdetection rate of 6.03% and a complete failure rate of 12.93%. In contrast, Ambient Detection effectively addressed these limitations through its relationship-based feature enhancement module, reducing the misdetection and complete failure rates to 1.72% and 5.17%, respectively.
Rather than relying solely on the count of undetected images, this study distinguished between the total number of detection failures, misdetections, and complete failures. This approach enabled a more quantitative and detailed verification of the proposed model’s error-reduction capabilities. This approach contributes to improving the generalizability and practical applicability of Ambient Detection across diverse construction environments and operational conditions.

3.2. Qualitative Analysis and Statistical Significance Analysis

To provide a more comprehensive evaluation of the proposed approach, this section presents qualitative examples alongside statistical validation of the performance differences. Figure 4 illustrates representative examples that demonstrate the superior performance of Ambient Detection in challenging scenarios. These qualitative results support the quantitative findings presented in Table 1, highlighting how the relationship-based approach effectively addresses the key limitations of conventional object detection methods in construction environments.
Figure 4 demonstrates two critical challenges overcome by the proposed approach: complete failure due to boundary ambiguity and misdetection caused by visual similarity between objects. In the complete failure case, Figure 4a shows YOLOv11 failing to detect actively poured concrete due to visual similarity with already placed surface concrete. Figure 4b demonstrates how Ambient Detection successfully identifies the target concrete through spatial and distance relationship analysis, overcoming the boundary ambiguity challenge. In the misdetection scenario, Figure 4c illustrates YOLOv11 incorrectly detecting multiple non-target objects as pouring concrete due to visual similarity. Figure 4d shows how Ambient Detection accurately detects only the actual pouring concrete by leveraging relationship factors to eliminate misdetections.
To further validate these qualitative observations and determine whether the observed performance difference between the two models is statistically significant, McNemar’s test was applied. This non-parametric test is particularly suitable for paired nominal data and is commonly used to compare the performance of two models on the same test dataset.
Table 2 presents the contingency table used for McNemar’s test, categorizing all 232 test images based on the success or failure of detection by each model.
McNemar’s test focuses on the discordant pairs—cases where one model succeeded while the other failed. From Table 2, these are categories B (YOLOv11 succeeded, Ambient Detection failed) and C (Ambient Detection succeeded, YOLOv11 failed).
The calculated χ2 value of 16.57 exceeds the critical value of 3.84 for a significance level of 0.05 with 1 degree of freedom. This result indicates that the performance difference between the YOLOv11 and Ambient Detection models is statistically significant (p < 0.05).
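For reference, the test statistic follows from the discordant counts alone; the sketch below uses hypothetical counts b = 8 and c = 36, which are consistent with the reported χ² of 16.57 under the continuity-corrected form, although the authoritative counts are those in Table 2.

```python
# Minimal sketch of McNemar's test on the discordant pairs. The counts below
# are hypothetical values consistent with the reported chi-square of 16.57.

def mcnemar_chi2(b, c, continuity_correction=True):
    """b: YOLOv11 correct / Ambient Detection wrong; c: the reverse."""
    if continuity_correction:
        return (abs(b - c) - 1) ** 2 / (b + c)
    return (b - c) ** 2 / (b + c)

chi2 = mcnemar_chi2(8, 36)
print(round(chi2, 2), chi2 > 3.84)  # 16.57 True: significant at p < 0.05, df = 1
```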

3.3. Ablation Study by Relationship Factors

To further understand the contribution of each relationship factor (distance, attribute, and spatial relationships) within the proposed Ambient Detection model, we conducted an ablation study. Specifically, we evaluated seven different configurations, systematically activating or deactivating each factor to measure their individual and combined impacts on object detection performance. These seven configurations included the complete model ($D_{norm} + A_{norm} + S_R$), three two-factor combinations ($D_{norm} + A_{norm}$, $D_{norm} + S_R$, $A_{norm} + S_R$), and three single-factor models ($D_{norm}$, $A_{norm}$, $S_R$).
Table 3 summarizes the detection performance of each configuration based on 232 test images, measured using the conservative criterion outlined in Section 3.1. The complete model, utilizing all three relationship factors ($D_{norm} + A_{norm} + S_R$), achieved the highest success rate of 93.10%. When one relationship factor was removed, performance consistently declined, but to varying degrees. Removing the spatial relationship ($D_{norm} + A_{norm}$) or the attribute relationship ($D_{norm} + S_R$) resulted in modest decreases, producing success rates of 88.79% and 87.50%, respectively. However, excluding the distance relationship ($A_{norm} + S_R$) led to a significant drop to 80.60%, highlighting the critical importance of proximity cues in concrete-pouring detection tasks.
Further, single-factor analyses revealed that the distance relationship alone maintained a relatively high success rate of 81.90%, reinforcing its primary role. In contrast, single-factor models relying solely on attribute (75.43%) or spatial (76.72%) relationships performed notably worse, indicating these relationships mainly provide complementary information rather than independently strong cues.
Overall, the ablation results demonstrate that the distance relationship factor ($D_{norm}$) plays the dominant role in object detection reliability, while attribute ($A_{norm}$) and spatial ($S_R$) relationships significantly enhance overall detection accuracy when combined effectively.

4. Discussion

4.1. Comprehensive Analysis of Object Recognition Performance

The Ambient Detection model demonstrated superior performance compared to YOLOv11 in terms of accuracy, F1 score, misdetection rate, and complete failure rate. Experimental results showed that Ambient Detection achieved a 93.10% detection success rate versus YOLOv11’s 81.03%, a 12.07 percentage point improvement. While YOLOv11 maintained reasonable accuracy under ideal conditions, its performance deteriorated significantly in scenes with partial occlusions or visual confusion. In contrast, Ambient Detection maintained stable performance in complex environments by leveraging inter-object relationships.
The Ablation Study results clearly illustrate the impact of each component on the model’s performance. The complete model utilizing all three relationship factors (distance, attribute, and spatial) recorded the highest success rate at 93.10%, while performance decreased substantially when using single factors alone. Notably, the distance relationship ($D_{norm}$) proved to be the most influential factor, achieving an 81.90% success rate independently. In comparison, attribute ($A_{norm}$) and spatial ($S_R$) relationships achieved only 75.43% and 76.72% success rates, respectively, when used independently.
These findings indicate that the physical proximity between pouring equipment and concrete serves as the most critical signal in detecting pouring operations. The configuration excluding the distance relationship ($A_{norm} + S_R$) showed significant performance degradation to 80.60%. In contrast, excluding the spatial ($D_{norm} + A_{norm}$) or attribute ($D_{norm} + S_R$) relationship resulted in relatively smaller decreases to 88.79% and 87.50%, respectively.
While attribute and spatial relationships showed limited performance individually, they complementarily enhanced detection performance when combined with the distance factor. Attribute relationships contributed to distinguishing between visually similar but functionally different objects, while spatial relationships provided contextual information based on object positioning. Failure type analysis revealed that YOLOv11’s complete failure rate of 12.93% and misdetection rate of 6.03% were significantly reduced to 5.17% and 1.72%, respectively, with Ambient Detection. This demonstrates the relationship-based approach’s robustness in complex construction environments.

4.2. Contributions and Limitations

The proposed Ambient Detection method made significant contributions to improving object recognition reliability in construction operation monitoring through context-based analysis. The research results overcome the limitations of existing single-object recognition by quantitatively modeling relationships between objects. Conventional appearance-based detection approaches often produce both misdetection and complete failure errors in complex scenes [66,67]. In contrast, the relationship-based approach yields more precise judgments by capturing the semantic connections between operations. This approach is particularly useful in construction sites where object boundaries are ambiguous and visual similarities are high.
The systematic integration of three relationship factors—distance, attribute, and spatial relationships—and the experimental validation of each factor’s contribution represent a notable advancement. By clearly identifying the individual and combined effects of each relationship factor through Ablation Studies, this research established a robust theoretical framework for relationship-based object recognition. This framework can serve as a foundation for future research in relationship-based object recognition across various construction monitoring applications. Additionally, the proposed methodology has been implemented as a practical system capable of automatically monitoring compliance with concrete pouring height regulations. This directly contributes to improved quality management efficiency at construction sites. By replacing the subjectivity and inefficiency of existing manual inspection methods with an objective and accurate automated system, it provides scalability to monitor multiple pouring points simultaneously.
However, several limitations exist in this study. The current model structure was validated using a diverse set of test images to determine appropriate relationship factor thresholds. However, for optimal performance in real-world applications, the model should be adjusted and fine-tuned using data that reflect the specific conditions of each construction site. Additionally, the complex model calculating distance, attribute, and spatial relationships has high computational complexity, potentially making real-time operation difficult on resource-constrained field equipment.
Another significant limitation concerns the accuracy of height estimation, which is a core component of the pouring compliance monitoring system. This study did not perform quantitative validation of the estimated pouring heights, owing to the lack of reference height data in the collected images. Specifically, this study employed the OpenPose algorithm for real-time 2D human pose estimation. OpenPose returns 14 key joint coordinates in our configuration, and we additionally calculated the midpoint between the neck and nose keypoints to use as a reference landmark for height calibration. This landmark is assigned a height of 1.5 m, based on a worker height of approximately 175 cm, the average height of adult Korean males including footwear. While using this landmark as a 1.5 m reference to calibrate height according to worker posture is appropriate, it does not account for individual workers’ physical differences; universally applying the same pixel-to-meter conversion factor can therefore introduce systematic measurement errors.
When different workers are used as references, each individual’s unique physical characteristics introduce different biases into the AI system’s pixel size calculations. For example, when a worker with an actual neck height of 1.4 m appears at 300 pixels, the system calculates a conversion factor of 0.005 m/pixel (1.5 m/300 px), but the correct factor should be 0.0047 m/pixel (1.4 m/300 px), resulting in systematic overestimation of approximately 7%. Based on typical height variations among construction workers (±10 cm), we estimate that this could result in approximately ±7% relative error in pouring height calculations. While this level of accuracy may be acceptable for general compliance monitoring, future work needs to verify and improve the accuracy of height estimation through site-specific measurements or calibration procedures.
In this study, lighting and visibility issues did not pose major challenges during slab concrete pouring tasks, as the environment was relatively clear and well-lit. However, such favorable conditions may not be guaranteed in other types of construction work—particularly in indoor operations—where low lighting, shadows, or limited visibility are more likely to occur [68,69,70]. Therefore, future applications of the proposed method should carefully consider these environmental variations, as they may significantly affect detection accuracy in real-world scenarios [71,72,73]. This also gives context to the relatively low performance contribution of attribute relationships ($A_{norm}$) observed in the Ablation Study: the consistency of visual characteristics may deteriorate under the varied lighting conditions commonly encountered on real construction sites.
While Ambient Detection maintained high performance in partial occlusion situations, it tended to fail in detection when most of the object was completely occluded [74,75]. This demonstrates the structural limitation of the current relationship-based detection, which is restricted in extreme occlusion situations [76]. In particular, if the physical connection between the pouring boom and concrete is obscured, distance relationship calculation becomes difficult, potentially degrading the overall system performance. Finally, the current model is optimized for the specific task of concrete pouring, requiring additional adjustments and validation for application to other types of construction work. Since the relationship factor weights (α = 0.50, β = 0.20, γ = 0.30) have been optimized to reflect the characteristics of concrete pouring operations, different weight combinations may be necessary for other task types.

4.3. Future Research Directions

To overcome the limitations identified in this study, we propose several directions for future research. First, to address real-time processing constraints, optimization of the relationship analysis algorithms and development of lightweight network structures are necessary [71,76]. For systems to operate effectively in resource-constrained field environments, computational efficiency must be improved without sacrificing detection accuracy [76,77,78,79].
To enhance robustness across various environmental conditions, future work should integrate data augmentation techniques [80,81,82]. Additionally, domain adaptation methods that can respond to lighting changes, weather variations, and other environmental factors should also be incorporated [83,84]. This would help maintain consistent detection performance across the diverse conditions encountered at construction sites [85]. To improve object recognition performance in extreme occlusion situations, techniques that utilize temporal context for tracking and methods that combine multi-view information should be considered [86]. These approaches could help maintain detection continuity even when physical connections between objects are temporarily obscured.
Future research should also extend the Ambient Detection approach to encompass diverse objects and workflows across the entire concrete pouring process. While this study focused on pouring scene recognition, subsequent research should consider recognition models for equipment essential to ensuring concrete quality, such as vibrators. By detecting these objects comprehensively, the understanding of the entire process and the precision of quality assessment can be further enhanced.
In addition, future research should extend the Ambient Detection approach to encompass diverse structural elements beyond concrete slabs, such as beams and foundation structures. These different structural elements present unique geometric characteristics and pouring constraints that require specialized adaptations of our framework. For beams, the narrower pouring area and vertical formwork create different monitoring challenges compared to horizontal slabs. Foundation structures, especially deep foundations, involve pouring in confined spaces with limited visibility.
By adapting our relationship-based detection methodology to these various elements, we can develop a more comprehensive monitoring system. Such a system would be capable of identifying process anomalies specific to different structural elements that may compromise concrete quality and structural integrity. This would involve developing specialized relationship factors and ROI definitions tailored to the unique characteristics of each structural element and its corresponding pouring equipment configurations.
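One way to organize such element-specific adaptations is a simple configuration structure, as in the sketch below. The ROI parameters and weight values for beams and foundations are hypothetical placeholders, since only the slab configuration was validated in this study.

```python
# Sketch of element-specific configuration; beam/foundation values are hypothetical
# placeholders, and only the slab weights correspond to the validated setup.
from dataclasses import dataclass

@dataclass
class ElementConfig:
    element: str            # structural element type
    roi_width_ratio: float  # ROI width relative to the detected boom width
    roi_depth_ratio: float  # ROI extent beneath the boom relative to boom length
    weights: tuple          # (alpha, beta, gamma) for D_R, A_R, S_R

CONFIGS = {
    "slab":       ElementConfig("slab",       1.0, 0.5, (0.50, 0.20, 0.30)),
    "beam":       ElementConfig("beam",       0.4, 0.8, (0.45, 0.20, 0.35)),  # hypothetical
    "foundation": ElementConfig("foundation", 0.6, 1.2, (0.55, 0.15, 0.30)),  # hypothetical
}

def get_config(element: str) -> ElementConfig:
    """Select the relationship/ROI configuration for the structural element at hand."""
    return CONFIGS[element]
```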
The goal would be to create an integrated monitoring platform that tracks concrete quality across all structural elements throughout the construction project lifecycle. This expanded application would significantly enhance the practical utility of the Ambient Detection approach. It would provide construction managers with a unified system for maintaining concrete quality standards across diverse structural components. Such extensions would further contribute to addressing the industry-wide challenge of ensuring uniform concrete quality in increasingly complex construction projects.
The integration of Large Language Models (LLMs) with the detection system represents another promising direction [87,88]. Such integration could enable simultaneous recognition of multiple operations and their communication to managers in natural language [89,90,91]. This would allow an integrated smart monitoring system that provides site managers with high-level, decision-supporting information in real time, rather than merely quantifying object recognition results.
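A minimal sketch of the hand-off from detection to an LLM is shown below: structured detection events are summarized into a natural-language prompt that any chat-style LLM endpoint could consume. The event fields and wording are illustrative assumptions; no particular LLM provider or API is implied.

```python
# Sketch: turn detection events into a natural-language prompt for an LLM-based
# site report. Event fields and wording are illustrative; the actual LLM call is
# left to whichever chat-completion endpoint is available on site.
from datetime import datetime

def build_report_prompt(events: list[dict]) -> str:
    lines = [
        f"- {e['time']}: {e['operation']} at {e['location']}, "
        f"ambient confidence {e['confidence']:.2f}, status {e['status']}"
        for e in events
    ]
    return (
        "You are a construction site assistant. Summarize the following "
        "concrete pouring detections for the site manager, flagging any "
        "regulation violations such as excessive pouring height:\n" + "\n".join(lines)
    )

events = [
    {"time": datetime.now().strftime("%H:%M"), "operation": "concrete pouring",
     "location": "slab zone B2", "confidence": 0.94, "status": "height within limit"},
]
prompt = build_report_prompt(events)
# The prompt would then be sent to the chosen LLM endpoint for a natural-language report.
print(prompt)
```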
Additionally, transfer learning and domain adaptation techniques that consider the specificity of construction sites could help develop models that maintain stable performance across various construction environments. Research on developing models robust to regional and seasonal characteristics of construction sites will be particularly important for practical industrial application.
In conclusion, Ambient Detection has overcome key limitations of conventional object recognition through a relationship-based approach and has shown clear advantages in construction monitoring tasks. Extending this paradigm into comprehensive systems that support automation and quality management across the broader construction industry, however, remains a challenge for future work.

5. Conclusions

This study proposed a relationship-based Ambient Detection model to achieve reliable object recognition even under visual complexity and occlusion during concrete-pouring operations. Comparative experiments with the YOLOv11 model confirmed that the proposed method significantly improves recognition performance by embedding distance, attribute, and spatial relationships in the detection pipeline.
(T1) The complete framework outperformed the baseline, raising accuracy from 81.03% to 93.10%, and lowering the misdetection and complete-failure rates to 1.72% and 5.17%, respectively.
(T2) A rigorously defined ROI and closed-form scores for all three relationships (D_R, S_R, A_R) enabled deterministic weight computation.
(T3) The model’s robustness was further evidenced by an ablation study and a McNemar test showing a statistically significant performance gap in 232 field images.
(T4) Processing time inevitably increased from 45 ms to 65 ms per image because the three relationship scores must be calculated; however, the roughly 44% overhead is a reasonable trade-off for the 12% accuracy gain (a brief computational check of these figures is sketched at the end of this section). Future work will focus on lightweight optimization and on-device inference to satisfy stricter real-time constraints.
(T5) Limitations such as the uniform 1.5 m neck-height assumption (≈±7% height-error) and the single-trade dataset will be addressed through adaptive scaling, depth sensing, and cross-trade validation.
In summary, relationship-based Ambient Detection substantially mitigates boundary ambiguity and visual-similarity errors, fulfilling all five research tasks and thereby confirming the initial hypothesis that contextual cues markedly enhance detection reliability in concrete-pouring scenarios.
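The following sketch recomputes the headline figures as a sanity check: the continuity-corrected McNemar statistic from the discordant pairs in Table 2 (B = 8, C = 36), the accuracy gain, and the relative processing-time overhead from Table 1. It assumes SciPy is available and uses the standard continuity-corrected form of the test, which may differ in detail from the exact variant applied in the study.

```python
# Sanity check of the reported figures; assumes SciPy and the standard
# continuity-corrected McNemar statistic (the study's exact variant may differ).
from scipy.stats import chi2

b, c = 8, 36                                    # discordant pairs from Table 2
mcnemar_stat = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected statistic
p_value = chi2.sf(mcnemar_stat, df=1)

accuracy_gain = 93.10 - 81.03                   # percentage points (Table 1)
overhead = (65 - 45) / 45 * 100                 # relative processing-time increase (%)

print(f"McNemar statistic = {mcnemar_stat:.2f}, p = {p_value:.1e}")   # ~16.57, p << 0.001
print(f"accuracy gain = {accuracy_gain:.2f} pp, overhead = {overhead:.1f}%")  # 12.07 pp, 44.4%
```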

Author Contributions

Conceptualization, S.Y. and H.K.; methodology, S.Y. and H.K.; validation, S.Y. and H.K.; investigation, S.Y.; writing—original draft preparation, S.Y.; writing—review and editing, S.Y. and H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant (RS-2022-00143493) from the Digital-Based Building Construction and Safety Supervision Technology Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean Government.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of DANKOOK UNIVERSITY (DKU 2020-09-027, date of approval 14 September 2020).

Informed Consent Statement

Written informed consent was obtained from the participant(s) to publish this paper.

Data Availability Statement

All data, models, or code generated or used during the study are available from the corresponding author by request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lee, B.; Hwang, S.; Kim, H. The Feasibility of Information-Entropy-Based Behavioral Analysis for Detecting Environmental Barriers. Int. J. Environ. Res. Public Health 2021, 18, 11727. [Google Scholar] [CrossRef] [PubMed]
  2. Lee, B.; Kim, H. Two-Step k-Means Clustering Based Information Entropy for Detecting Environmental Barriers Using Wearable Sensor. Int. J. Environ. Res. Public Health 2022, 19, 704. [Google Scholar] [CrossRef] [PubMed]
  3. Kim, H. Feasibility of DRNN for Identifying Built Environment Barriers to Walkability Using Wearable Sensor Data from Pedestrians’ Gait. Appl. Sci. 2022, 12, 4384. [Google Scholar] [CrossRef]
  4. Oh, J.; Cho, G.; Kim, H. Performance Analysis of Wearable Robotic Exoskeleton in Construction Tasks: Productivity and Motion Stability Assessment. Appl. Sci. 2025, 15, 3808. [Google Scholar] [CrossRef]
  5. Kim, S.; Hong, S.H.; Kim, H.; Lee, M.; Hwang, S. Small Object Detection (SOD) System for Comprehensive Construction Site Safety Monitoring. Autom. Constr. 2023, 156, 105103. [Google Scholar] [CrossRef]
  6. Bai, R.; Wang, M.; Zhang, Z.; Lu, J.; Shen, F. Automated Construction Site Monitoring Based on Improved YOLOv8-Seg Instance Segmentation Algorithm. IEEE Access 2023, 11, 139082–139096. [Google Scholar] [CrossRef]
  7. Ekanayake, B.; Wong, J.K.-W.; Fini, A.A.F.; Smith, P. Computer Vision-Based Interior Construction Progress Monitoring: A Literature Review and Future Research Directions. Autom. Constr. 2021, 127, 103705. [Google Scholar] [CrossRef]
  8. Liu, K.; Meng, Q.; Kong, Q.; Zhang, X. Review on the Developments of Structure, Construction Automation, and Monitoring of Intelligent Construction. Buildings 2022, 12, 1890. [Google Scholar] [CrossRef]
  9. De Schutter, G.; Bartos, P.J.M.; Domone, P.J.G.; Gibbs, J. Self Compacting Concrete; Whittles Publishing: Dunbeath, UK, 2008; ISBN 978-1904445302. Available online: http://ndl.ethernet.edu.et/bitstream/123456789/75013/1/89.pdf (accessed on 4 June 2025).
  10. Howes, R.; Hadi, M.N.; South, W. Concrete Strength Reduction Due to over Compaction. Constr. Build. Mater. 2019, 197, 725–733. [Google Scholar] [CrossRef]
  11. Luo, H.; Lin, L.; Chen, K.; Antwi-Afari, M.F.; Chen, L. Digital Technology for Quality Management in Construction: A Review and Future Research Directions. Dev. Built Environ. 2022, 12, 100087. [Google Scholar] [CrossRef]
  12. Guo, D.; Onstein, E.; La Rosa, A.D. A Semantic Approach for Automated Rule Compliance Checking in Construction Industry. IEEE Access 2021, 9, 129648–129660. [Google Scholar] [CrossRef]
  13. Rafieyan, A.; Sarvari, H.; Chan, D.W. Identifying and Evaluating the Essential Factors Affecting the Incidence of Site Accidents Caused by Human Errors in Industrial Parks Construction Projects. Int. J. Environ. Res. Public Health 2022, 19, 10209. [Google Scholar] [CrossRef] [PubMed]
  14. Yang, J.; Wilde, A.; Menzel, K.; Sheikh, M.Z.; Kuznetsov, B. Computer Vision for Construction Progress Monitoring: A Real-Time Object Detection Approach. In Proceedings of the Working Conference on Virtual Enterprises; Springer: Berlin/Heidelberg, Germany, 2023; pp. 660–672. [Google Scholar]
  15. Yu, L.; Huang, M.M.; Jiang, S.; Wang, C.; Wu, M. Unmanned Aircraft Path Planning for Construction Safety Inspections. Autom. Constr. 2023, 154, 105005. [Google Scholar] [CrossRef]
  16. Hussamadin, R.; Jansson, G.; Mukkavaara, J. Digital Quality Control System—A Tool for Reliable on-Site Inspection and Documentation. Buildings 2023, 13, 358. [Google Scholar] [CrossRef]
  17. Rao, A.S.; Radanovic, M.; Liu, Y.; Hu, S.; Fang, Y.; Khoshelham, K.; Palaniswami, M.; Ngo, T. Real-Time Monitoring of Construction Sites: Sensors, Methods, and Applications. Autom. Constr. 2022, 136, 104099. [Google Scholar] [CrossRef]
  18. Yang, K.; Wang, H.; Wang, K.; Chen, F. An Effective Monitoring Method of Dynamic Compaction Construction Quality Based on Time Series Modeling. Measurement 2024, 224, 113930. [Google Scholar] [CrossRef]
  19. Oliveira, B.A.S.; Neto, A.P.D.F.; Fernandino, R.M.A.; Carvalho, R.F.; Fernandes, A.L.; Guimaraes, F.G. Automated Monitoring of Construction Sites of Electric Power Substations Using Deep Learning. IEEE Access 2021, 9, 19195–19207. [Google Scholar] [CrossRef]
  20. Shanti, M.Z.; Cho, C.-S.; de Soto, B.G.; Byon, Y.-J.; Yeun, C.Y.; Kim, T.Y. Real-Time Monitoring of Work-at-Height Safety Hazards in Construction Sites Using Drones and Deep Learning. J. Saf. Res. 2022, 83, 364–370. [Google Scholar] [CrossRef]
  21. Wong, J.K.W.; Bameri, F.; Ahmadian Fard Fini, A.; Maghrebi, M. Tracking Indoor Construction Progress by Deep-Learning-Based Analysis of Site Surveillance Video. Constr. Innov. 2025, 25, 461–489. [Google Scholar] [CrossRef]
  22. Sun, T.; Fan, Q.; Shao, Y. Deep Learning-Based Rebar Detection and Instance Segmentation in Images. Adv. Eng. Inform. 2025, 65, 103224. [Google Scholar] [CrossRef]
  23. Yoon, S.; Kim, H. Time-Series Image-Based Automated Monitoring Framework for Visible Facilities: Focusing on Installation and Retention Period. Sensors 2025, 25, 574. [Google Scholar] [CrossRef] [PubMed]
  24. Oh, J.; Hong, S.; Choi, B.; Ham, Y.; Kim, H. Integrating Text Parsing and Object Detection for Automated Monitoring of Finishing Works in Construction Projects. Autom. Constr. 2025, 174, 106139. [Google Scholar] [CrossRef]
  25. Feng, R.; Miao, Y.; Zheng, J. A YOLO-Based Intelligent Detection Algorithm for Risk Assessment of Construction Sites. J. Intell. Constr. 2024, 2, 1–18. [Google Scholar] [CrossRef]
  26. Liu, L.; Guo, Z.; Liu, Z.; Zhang, Y.; Cai, R.; Hu, X.; Yang, R.; Wang, G. Multi-Task Intelligent Monitoring of Construction Safety Based on Computer Vision. Buildings 2024, 14, 2429. [Google Scholar] [CrossRef]
  27. Yang, Y.; Li, Y.; Tao, M. FE-YOLO: A Lightweight Model for Construction Waste Detection Based on Improved YOLOv8 Model. Buildings 2024, 14, 2672. [Google Scholar] [CrossRef]
  28. Jiao, X.; Li, C.; Zhang, X.; Fan, J.; Cai, Z.; Zhou, Z.; Wang, Y. Detection Method for Safety Helmet Wearing on Construction Sites Based on UAV Images and YOLOv8. Buildings 2025, 15, 354. [Google Scholar] [CrossRef]
  29. Huang, K.; Abisado, M.B. Lightweight Construction Safety Behavior Detection Model Based on Improved YOLOv8. Discov. Appl. Sci. 2025, 7, 326. [Google Scholar] [CrossRef]
  30. Biswas, M.; Hoque, R. Construction Site Risk Reduction via YOLOv8: Detection of PPE, Masks, and Heavy Vehicles. In Proceedings of the 2024 IEEE International Conference on Computing, Applications and Systems (COMPAS), Cox’s Bazar, Bangladesh, 25–26 September 2024; pp. 1–6. [Google Scholar]
  31. Seth, Y.; Sivagami, M. Enhanced YOLOv8 Object Detection Model for Construction Worker Safety Using Image Transformations. IEEE Access 2025, 13, 10582–10594. [Google Scholar] [CrossRef]
  32. Singh, S.; Jain, A.; Sharma, V.; Girdhar, P. PPE Detection for Construction Site Safety Leveraging YOLOv8. SSRN Preprint 4923318, 2024. Available online: https://papers.ssrn.com/abstract=4923318 (accessed on 4 June 2025).
  33. Alzubi, K.M.; Alaloul, W.S.; Malkawi, A.B.; Al Salaheen, M.; Qureshi, A.H.; Musarat, M.A. Automated Monitoring Technologies and Construction Productivity Enhancement: Building Projects Case. Ain Shams Eng. J. 2023, 14, 102042. [Google Scholar] [CrossRef]
  34. Musarat, M.A.; Khan, A.M.; Alaloul, W.S.; Blas, N.; Ayub, S. Automated Monitoring Innovations for Efficient and Safe Construction Practices. Results Eng. 2024, 22, 102057. [Google Scholar] [CrossRef]
  35. Jeelani, I.; Asadi, K.; Ramshankar, H.; Han, K.; Albert, A. Real-Time Vision-Based Worker Localization & Hazard Detection for Construction. Autom. Constr. 2021, 121, 103448. [Google Scholar]
  36. Wu, W. Construction of Interactive Construction Progress and Quality Monitoring System Based on Image Processing. In Proceedings of the 2024 International Conference on Telecommunications and Power Electronics (TELEPE), Frankfurt, Germany, 29–31 May 2024; pp. 601–607. [Google Scholar]
  37. Cho, Y.-W.; Kang, K.-S.; Son, B.-S.; Ryu, H.-G. Extraction of Workers and Heavy Equipment and Muliti-Object Tracking Using Surveillance System in Construction Sites. J. Korea Inst. Build. Constr. 2021, 21, 397–408. [Google Scholar]
  38. Son, H.; Kim, C. Integrated Worker Detection and Tracking for the Safe Operation of Construction Machinery. Autom. Constr. 2021, 126, 103670. [Google Scholar] [CrossRef]
  39. Zhang, Y.; Guan, D.; Zhang, S.; Su, J.; Han, Y.; Liu, J. GSO-YOLO: Global Stability Optimization YOLO for Construction Site Detection. arXiv 2024, arXiv:2407.00906. Available online: https://arxiv.org/abs/2407.00906 (accessed on 4 June 2025).
  40. Lv, Z.; Wang, R.; Wang, Y.; Zhou, F.; Guo, N. Road Scene Multi-Object Detection Algorithm Based on CMS-YOLO. IEEE Access 2023, 11, 121190–121201. [Google Scholar] [CrossRef]
  41. Hou, L.; Chen, C.; Wang, S.; Wu, Y.; Chen, X. Multi-Object Detection Method in Construction Machinery Swarm Operations Based on the Improved YOLOv4 Model. Sensors 2022, 22, 7294. [Google Scholar] [CrossRef]
  42. Jia, X.; Zhou, X.; Shi, Z.; Xu, Q.; Zhang, G. GeoIoU-SEA-YOLO: An Advanced Model for Detecting Unsafe Behaviors on Construction Sites. Sensors 2025, 25, 1238. [Google Scholar] [CrossRef]
  43. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  44. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
  45. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of Yolo Architectures in Computer Vision: From Yolov1 to Yolov8 and Yolo-Nas. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  46. Zhang, W.; Fu, C.; Chang, X.; Zhao, T.; Li, X.; Sham, C.-W. A More Compact Object Detector Head Network with Feature Enhancement and Relational Reasoning. Neurocomputing 2022, 499, 23–34. [Google Scholar] [CrossRef]
  47. Yang, X.; Li, Z.; Kuang, W.; Zhang, C.; Ma, H. Object Detection with a Dynamic Interactive Network Based on Relational Graph Routing. Appl. Soft Comput. 2024, 165, 112119. [Google Scholar] [CrossRef]
  48. Bharti, D.; Puneeth, B.; Venkatesh, K.S. Ambiguous Boundary Uncertainty Reduction in Single Stage Detector Models. In Proceedings of the 2024 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI), Prayagraj, India, 19–20 October 2024; pp. 1–8. [Google Scholar]
  49. Mekled, A.S.; Abdennadher, S.; Shehata, O.M. Performance Evaluation of YOLO Models in Varying Conditions: A Study on Object Detection and Tracking. In Proceedings of the 2024 International Conference on Computer and Applications (ICCA), Cairo, Egypt, 17–19 December 2024; pp. 1–6. [Google Scholar]
  50. Yao, G.; Zhou, W.; Liu, M.; Xu, Q.; Wang, H.; Li, J.; Ju, Y. An Empirical Study of the Convolution Neural Networks Based Detection on Object with Ambiguous Boundary in Remote Sensing Imagery—A Case of Potential Loess Landslide. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 15, 323–338. [Google Scholar] [CrossRef]
  51. Chen, T.; Ren, J. MFL-YOLO: An Object Detection Model for Damaged Traffic Signs. arXiv 2023, arXiv:2309.06750. Available online: https://arxiv.org/abs/2309.06750 (accessed on 4 June 2025).
  52. Lin, J.; Zhao, Y.; Wang, S.; Tang, Y. YOLO-DA: An Efficient YOLO-Based Detector for Remote Sensing Object Detection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6008705. [Google Scholar] [CrossRef]
  53. Huang, T.-Y.; Lee, M.-C.; Yang, C.-H.; Lee, T.-S. Yolo-Ore: A Deep Learning-Aided Object Recognition Approach for Radar Systems. IEEE Trans. Veh. Technol. 2022, 72, 5715–5731. [Google Scholar] [CrossRef]
  54. Zhang, Z.; Lu, X.; Cao, G.; Yang, Y.; Jiao, L.; Liu, F. ViT-YOLO: Transformer-Based YOLO for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2799–2808. [Google Scholar]
  55. Park, M.; Bak, J.; Park, S. Small and Overlapping Worker Detection at Construction Sites. Autom. Constr. 2023, 151, 104856. [Google Scholar] [CrossRef]
  56. Ouardirhi, Z.; Mahmoudi, S.A.; Zbakh, M. Enhancing Object Detection in Smart Video Surveillance: A Survey of Occlusion-Handling Approaches. Electronics 2024, 13, 541. [Google Scholar] [CrossRef]
  57. Kim, D.; Yu, X.; Xiong, S. Robust Skeleton-Based AI for Automatic Multi-Person Fall Detection on Construction Sites with Occlusions. Autom. Constr. 2025, 175, 106216. [Google Scholar] [CrossRef]
  58. Liu, Y.; Zhou, Z.; Wang, Y.; Sun, C. Head-Integrated Detecting Method for Workers under Complex Construction Scenarios. Buildings 2024, 14, 859. [Google Scholar] [CrossRef]
  59. Wang, T.; He, X.; Cai, Y.; Xiao, G. Learning a Layout Transfer Network for Context Aware Object Detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 4209–4224. [Google Scholar] [CrossRef]
  60. Zhang, W.; Fu, C.; Xie, H.; Zhu, M.; Tie, M.; Chen, J. Global Context Aware RCNN for Object Detection. Neural Comput. Appl. 2021, 33, 11627–11639. [Google Scholar] [CrossRef]
  61. Jiang, C.; Ren, H.; Ye, X.; Zhu, J.; Zeng, H.; Nan, Y.; Sun, M.; Ren, X.; Huo, H. Object Detection from UAV Thermal Infrared Images and Videos Using YOLO Models. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102912. [Google Scholar] [CrossRef]
  62. Dewi, C.; Chen, A.P.S.; Christanto, H.J. Recognizing Similar Musical Instruments with YOLO Models. Big Data Cogn. Comput. 2023, 7, 94. [Google Scholar] [CrossRef]
  63. Chen, I.-C.; Wang, C.-J.; Wen, C.-K.; Tzou, S.-J. Multi-Person Pose Estimation Using Thermal Images. IEEE Access 2020, 8, 174964–174971. [Google Scholar] [CrossRef]
  64. Xiang, H. Lightweight Open Pose Based Body Posture Estimation for Badminton Players. For. Chem. Rev. 2022, 339–350. Available online: http://forestchemicalsreview.com/index.php/JFCR/article/view/923 (accessed on 4 June 2025).
  65. OpenPose’s Evaluation in The Video Traditional Martial Arts Presentation. Available online: https://ieeexplore.ieee.org/abstract/document/8905243 (accessed on 4 June 2025).
  66. Wang, X.; Han, C.; Huang, L.; Nie, T.; Liu, X.; Liu, H.; Li, M. AG-Yolo: Attention-Guided Yolo for Efficient Remote Sensing Oriented Object Detection. Remote Sens. 2025, 17, 1027. [Google Scholar] [CrossRef]
  67. Chen, Y.; Zhu, X.; Li, Y.; Wei, Y.; Ye, L. Enhanced Semantic Feature Pyramid Network for Small Object Detection. Signal Process. Image Commun. 2023, 113, 116919. [Google Scholar] [CrossRef]
  68. Zi, X.; Chaturvedi, K.; Braytee, A.; Li, J.; Prasad, M. Detecting Human Falls in Poor Lighting: Object Detection and Tracking Approach for Indoor Safety. Electronics 2023, 12, 1259. [Google Scholar] [CrossRef]
  69. Ekanayake, B.; Ahmadian Fard Fini, A.; Wong, J.K.W.; Smith, P. A Deep Learning-Based Approach to Facilitate the as-Built State Recognition of Indoor Construction Works. Constr. Innov. 2024, 24, 933–949. [Google Scholar] [CrossRef]
  70. Ahmed, M.; Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments. Sensors 2021, 21, 5116. [Google Scholar] [CrossRef]
  71. Afif, M.; Said, Y.; Ayachi, R.; Hleili, M. An End-to-End Object Detection System in Indoor Environments Using Lightweight Neural Network. Trait. Signal 2024, 41, 2711. [Google Scholar] [CrossRef]
  72. Ekanayake, B.; Wong, J.K.W.; Fini, A.A.F.; Smith, P.; Thengane, V. Deep Learning-Based Computer Vision in Project Management: Automating Indoor Construction Progress Monitoring. Proj. Leadersh. Soc. 2024, 5, 100149. [Google Scholar] [CrossRef]
  73. Stjepandić, J.; Sommer, M. Object Recognition Methods in a Built Environment. In DigiTwin: An Approach for Production Process Optimization in a Built Environment; Stjepandić, J., Sommer, M., Denkena, B., Eds.; Springer Series in Advanced Manufacturing; Springer International Publishing: Cham, Switzerland, 2022; pp. 103–134. ISBN 978-3-030-77538-4. [Google Scholar]
  74. Wang, Q.; Liu, H.; Peng, W.; Li, C. Accurate Detection of the Workers and Machinery in Construction Sites Considering the Occlusions. In International Conference on Neural Computing for Advanced Applications; Zhang, H., Ke, Y., Wu, Z., Hao, T., Zhang, Z., Meng, W., Mu, Y., Eds.; Communications in Computer and Information Science; Springer Nature Singapore: Singapore, 2023; Volume 1869, pp. 546–560. ISBN 978-981-9958-43-6. [Google Scholar]
  75. Wang, Q.; Liu, H.; Peng, W.; Tian, C.; Li, C. A Vision-Based Approach for Detecting Occluded Objects in Construction Sites. Neural Comput. Appl. 2024, 36, 10825–10837. [Google Scholar] [CrossRef]
  76. Chen, H.; Hou, L.; Zhang, G.K.; Wu, S. Using Context-Guided Data Augmentation, Lightweight CNN, and Proximity Detection Techniques to Improve Site Safety Monitoring under Occlusion Conditions. Saf. Sci. 2023, 158, 105958. [Google Scholar] [CrossRef]
  77. Liang, H.; Seo, S. Automatic Detection of Construction Workers’ Helmet Wear Based on Lightweight Deep Learning. Appl. Sci. 2022, 12, 10369. [Google Scholar] [CrossRef]
  78. Zhang, J.; Qian, S.; Tan, C. Automated Bridge Crack Detection Method Based on Lightweight Vision Models. Complex Intell. Syst. 2023, 9, 1639–1652. [Google Scholar] [CrossRef]
  79. Liu, W.; Zhou, L.; Zhang, S.; Luo, N.; Xu, M. A New High-Precision and Lightweight Detection Model for Illegal Construction Objects Based on Deep Learning. Tsinghua Sci. Technol. 2024, 29, 1002–1022. [Google Scholar] [CrossRef]
  80. Zoph, B.; Cubuk, E.D.; Ghiasi, G.; Lin, T.-Y.; Shlens, J.; Le, Q.V. Learning Data Augmentation Strategies for Object Detection. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12372, pp. 566–583. ISBN 978-3-030-58582-2. [Google Scholar]
  81. Shi, M.; Chen, C.; Xiao, B.; Seo, J. Vision-Based Detection Method for Construction Site Monitoring by Integrating Data Augmentation and Semisupervised Learning. J. Constr. Eng. Manag. 2024, 150, 04024027. [Google Scholar] [CrossRef]
  82. Rashid, K.M.; Louis, J. Times-Series Data Augmentation and Deep Learning for Construction Equipment Activity Recognition. Adv. Eng. Inform. 2019, 42, 100944. [Google Scholar] [CrossRef]
  83. Kim, H.-S.; Seong, J.; Jung, H.-J. Optimal Domain Adaptive Object Detection with Self-Training and Adversarial-Based Approach for Construction Site Monitoring. Autom. Constr. 2024, 158, 105244. [Google Scholar] [CrossRef]
  84. Weng, X.; Huang, Y.; Li, Y.; Yang, H.; Yu, S. Unsupervised Domain Adaptation for Crack Detection. Autom. Constr. 2023, 153, 104939. [Google Scholar] [CrossRef]
  85. Koo, M.; Kim, T.; Lee, M.; Kim, K.; Kim, H. Domain Adaptation Through Weak and Self-Supervision for Small Object Segmentation in Construction Site Monitoring. SSRN 5200114, 2025. Available online: https://papers.ssrn.com/abstract=5200114 (accessed on 4 June 2025).
  86. Heslinga, F.G.; Ruis, F.; Ballan, L.; van Leeuwen, M.C.; Masini, B.; van Woerden, J.E.; den Hollander, R.J.; Berndsen, M.; Baan, J.; Dijk, J. Leveraging Temporal Context in Deep Learning Methodology for Small Object Detection. In Proceedings of the Artificial Intelligence for Security and Defence Applications, Amsterdam, The Netherlands, 17 October 2023; Volume 12742, pp. 134–145. [Google Scholar]
  87. Jang, H.-C.; Jang, H. A Study on the Web Building Assistant System Using GUI Object Detection and Large Language Model. In Proceedings of the Annual Symposium of KIPS 2024 (ASK 2024 Spring Conference), Pyeongchang, Republic of Korea, 23–25 May 2024; KIPS: Seoul, Republic of Korea, 2024; pp. 830–833. [Google Scholar]
  88. Pu, H.; Yang, X.; Li, J.; Guo, R. AutoRepo: A General Framework for Multimodal LLM-Based Automated Construction Reporting. Expert Syst. Appl. 2024, 255, 124601. [Google Scholar] [CrossRef]
  89. Ahmadi, E.; Muley, S.; Wang, C. Automatic Construction Accident Report Analysis Using Large Language Models (LLMs). J. Intell. Constr. 2025, 3, 1–10. [Google Scholar] [CrossRef]
  90. Ding, Y.; Ma, J.; Luo, X. Applications of Natural Language Processing in Construction. Autom. Constr. 2022, 136, 104169. [Google Scholar] [CrossRef]
  91. Wu, C.; Li, X.; Guo, Y.; Wang, J.; Ren, Z.; Wang, M.; Yang, Z. Natural Language Processing for Smart Construction: Current Status and Future Directions. Autom. Constr. 2022, 134, 104059. [Google Scholar] [CrossRef]
Figure 1. Research framework.
Figure 2. Region of interest and distance-relationship illustration: (a) ROI defined directly beneath the detected boom; and (b) distance relationship D_R visualized between the boom and the pouring concrete.
Figure 3. Skeletal-based pouring height measurement system.
Figure 4. Concrete pouring detection comparison: (a) YOLOv11 complete failure due to boundary ambiguity; (b) Ambient Detection success overcoming boundary ambiguity; (c) YOLOv11 misdetection of non-target objects; and (d) Ambient Detection accurate detection eliminating misdetections.
Table 1. Performance comparison between YOLOv11 and Ambient Detection.

Method | Correctly Recognized Images | Accuracy (%) | F1-Score (%) | Misdetection Rate (%) | Complete Failure Rate (%) | Processing Time (ms/image)
YOLOv11 Object Detection | 188/232 | 81.03 | 89.52 | 6.03 | 12.93 | 45
Ambient Detection | 216/232 | 93.10 | 96.43 | 1.72 | 5.17 | 65
Table 2. Contingency table for McNemar’s test comparing YOLOv11 and Ambient Detection.

Category | Description | Number of Images
A | Both YOLOv11 and Ambient Detection succeeded | 180
B | YOLOv11 succeeded, Ambient Detection failed | 8
C | Ambient Detection succeeded, YOLOv11 failed | 36
D | Both YOLOv11 and Ambient Detection failed | 8
Total |  | 232
Table 3. Ablation study: impact of each relationship factor on detection performance (232 test images).

Configuration | Detection Success | Success Rate (%) | Total Failures | Misdetections | Complete Failures
D_norm + A_norm + S_R | 216/232 | 93.10 | 16 | 4 | 12
D_norm + A_norm | 206/232 | 88.79 | 26 | 6 | 20
D_norm + S_R | 203/232 | 87.50 | 29 | 7 | 22
A_norm + S_R | 187/232 | 80.60 | 45 | 11 | 34
D_norm only | 190/232 | 81.90 | 42 | 10 | 32
A_norm only | 175/232 | 75.43 | 57 | 14 | 43
S_R only | 178/232 | 76.72 | 54 | 14 | 40