Detection and Evaluation of Construction Cracks through Image Analysis Using Computer Vision

Abstract: The introduction of artificial intelligence methods and techniques in the construction industry has fostered innovation and constant improvement in the automation of monitoring and control processes at construction sites, although some areas still require further study. This paper proposes a method to determine the criticality of cracks in concrete samples. The proposed method uses a previously trained YOLOv4 neural network to identify concrete cracks. Then, the region of interest, determined by the bounding box resulting from the neural network classification, is extracted. Finally, the extracted image is converted to negative grayscale so that the number of white pixels above a certain threshold can be counted, automatically allowing the system to characterize the extent and criticality of the fracture. The classification module reached an accuracy between 98.36% and 99.75% when identifying five types of concrete crack failures in 1132 images. A qualitative analysis of the results obtained from the characterization module shows a promising alternative for evaluating the criticality of concrete cracks.


Introduction
In recent years, the construction industry has sought to develop new methodologies to improve its processes, with innovation playing a substantial role supported by new equipment and the rapid, continuous development of technologies. Innovative methodologies improve construction processes and guide the automation of the monitoring and control of construction works, yielding a better-quality final product.
In line with the automated monitoring of construction processes, relevant information can be collected by installing conventional and thermographic (heat-sensitive) cameras, generating images and videos that can be used to detect cracks, microcracks, and other construction problems related to concrete [1], one of the most widely used materials in construction [2]. Although previous studies have used the YOLO family to detect and classify cracks [3], they are limited and do not analyze the detected cracks in concrete to measure the criticality of the failure. On this basis, this work proposes to detect and evaluate construction cracks through image analysis using computer vision.

State of the Art
Several researchers have evaluated the use of artificial intelligence in the construction industry. Refs. [4,5] evaluated the YOLOv4 object identification and classification algorithm to identify objects of interest at construction sites based on images from drones and static cameras. Eight classes of objects were identified in 1000 drone images and 1046 static camera images of a construction site, with accuracies ranging from 78.8% to 82.8% and from 73.56% to 93.76%, respectively. Another study compared measurements taken with a thermographic camera and with a reflex camera to evaluate temperatures; a variation of 2 °C was obtained in the openings and 5 °C in the insulated and thick walls.
A method based on the analysis of thermal images that consists of three phases is proposed by [15]: collecting original and thermal images, estimating the position of the captured image, and updating the 3D model of the construction site in the BIM software. The proposal was applied to a building under construction in Suzhou, China. In conclusion, when processing the original images, it failed to identify the elements of the building due to low image quality, poor light conditions, and image noise. In the case of the mixture of the original and thermal images, a high percentage of object detection (95%) was obtained.
On the other hand, ref. [16] evaluated the application of machine learning and image analysis algorithms to analyze metallic structures from thermographic images. It was found that minor defects can be detected when they are closer to the surface. Therefore, it was suggested to use higher resolution images to obtain more pixels per fault image.
Regarding the characterization of fractures, there is extensive literature on the subject. Still, we can highlight the seminal work of [17], who sought to provide a quantitative description of the fracture surface in polymer-based concrete and concluded that the geometry of a surface fracture was dependent on the scale of observation. More recent works, such as that of [18], sought to understand and characterize the evolution of a concrete fracture process using the Digital Image Correlation technique; they concluded that the size of the fracture area or zone in a piece of concrete is related to the size of the constructive element. On the other hand, ref. [19] used an X-ray scanner to generate computed tomography images to assess the evolution of damage in the internal structure of the concrete under stress situations. Finally, ref. [20] unified the techniques of fractal dimensions and neural networks (YOLOv5, among others) to segment the fractures. However, although the computational complexity of the chosen method was relatively low, they were unable to measure or estimate the size of the fracture.
In this line of research, the development of mechanisms for the automation of the control and monitoring of construction processes is proposed based on the use of artificial intelligence and the capture of images from conventional cameras (PTZ and Bullet) and infrared cameras. The aim is to develop an alert and prediction model that automatically and constantly monitors the construction process during its execution, with the ability to detect deficiencies or failures (cracks) in the concrete.

General Operation
The research's general methodological process is shown in Figure 1. As a first step, the types of failures in concrete structures that fall within the scope of the investigation were determined, as well as the predominant characteristics that allow the detection of such failures by various methods.
The second step was an experimental stage, in which concrete blocks were created considering certain thresholds in the variables analyzed (inputs, process, temperature, times, among others) that have a high probability of generating the failures. For this stage, images were taken in a controlled environment. The purpose of this stage was to generate images for a subsequent analysis that allowed the manual evaluation of the essential characteristics of the failures to be identified in an automated way. Once the visual attributes to be searched for were determined, the artificial intelligence model that identifies the distinctive features of the images was selected.
If the selected artificial intelligence model belongs to the type of supervised learning model, a training stage is necessary based on the images obtained from the concrete specimens (steps three to four). This experimental stage includes a sub-process called focal loss (step five) to reduce the potential imbalance in the dataset between positive and negative images. Since detection speed is not considered a relevant variable in this study, the pruning technique will not be applied. Data augmentation techniques (like brightness modification or images of various sizes) will be used, as proposed by [3]. Once good indicators are obtained (e.g., mAP > 70%), it can proceed to the operational phase, where images of actual construction to be analyzed are obtained using the same thermographic camera and the previously trained model. Finally, as the last step, it is necessary to validate if the results obtained by the system are within acceptable thresholds.

YOLOv4
The YOLO ("You Only Look Once") neural network model, version 4, was used for this research. YOLO was presented in 2016 as a single convolutional network for detecting and classifying objects in images and videos in real-time [21]. Using the entire image, it can simultaneously generate multiple bounding boxes with their respective object classification accuracy scores. This results in a higher processing speed without compromising detection accuracy. During the training phase, the model was configured according to the number of classes to be trained; for the first training process, two classes were used, so it was set to 4000 batches based on the formula "n° classes × 2000", while for the rest of the training five classes were used, so 10,000 batches, based on the previous work by [4].
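The batch-count rule quoted above ("n° classes × 2000") matches the convention commonly documented for custom YOLOv4 training in Darknet. A minimal sketch follows; the `steps` and `filters` values reflect the usual Darknet guidance and are assumptions for illustration, not values reported in this paper:

```python
# Sketch of the configuration rule "max_batches = n_classes x 2000"
# described above, plus related Darknet conventions (assumed, not
# taken from the paper's actual .cfg files).

def yolo_cfg_values(n_classes: int) -> dict:
    max_batches = n_classes * 2000          # 2 classes -> 4000, 5 -> 10,000
    return {
        "max_batches": max_batches,
        # learning-rate decay points, typically 80% and 90% of max_batches
        "steps": (int(max_batches * 0.8), int(max_batches * 0.9)),
        # filters in the conv layers preceding each [yolo] block
        "filters": (n_classes + 5) * 3,
    }

print(yolo_cfg_values(2)["max_batches"])   # 4000, as used in the first training
print(yolo_cfg_values(5)["max_batches"])   # 10000, as used with five classes
```

With two classes this reproduces the 4000 batches used in the first training, and with five classes the 10,000 batches used afterwards.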

Instruments
The images needed to apply the proposed methodology were obtained from the thermal camera in Table 1 and from the videos obtained by the video surveillance equipment in Table 2.


Training
A neural network is trained with the objects and classes that must be identified. This stage is executed once per project. Figure 2 shows the objects to be identified within the images obtained from the construction site. A frame extraction algorithm was used on the videos where the objects of interest were found to obtain images with the elements to be identified. Then, the neural network was trained with the resulting images.
The frame extraction process takes as input the videos obtained from the video surveillance cameras. A total of 2929 images were obtained from the video compilation process, including 1611 of beams and 1318 of test tubes. A total of 1132 images contained concrete cracks. The resulting images, shown in Figure 6, have a dimension of 3840 × 2160 pixels with a resolution of 96 dpi. YOLOv4 then resizes input images to 416 × 416 pixels in its initial processing stages. Lower input image resolutions can cause fuzzy features in small cracks, which is not recommended [3]. Algorithm 1 [24] was executed in the IDLE development environment with the Python programming language, version 3.7.
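The paper does not reproduce Algorithm 1 [24]. A minimal OpenCV-based sketch of such a frame extraction step is shown below, under the assumption that one frame is kept every fixed number of frames (the actual sampling rule is not stated in the text; function names and the output path pattern are illustrative):

```python
def sample_indices(total_frames: int, step: int) -> list:
    """Indices of the frames kept when saving one frame every `step` frames."""
    return list(range(0, total_frames, step))

def extract_frames(video_path: str, out_dir: str, step: int = 30) -> int:
    """Save every `step`-th frame of a surveillance video as a PNG.

    Returns the number of frames saved. OpenCV is imported lazily so the
    sampling helper above stays dependency-free.
    """
    import cv2  # assumed available, as in typical Python CV pipelines
    cap = cv2.VideoCapture(video_path)
    saved = 0
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of video (or read error)
            break
        if idx % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.png", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```

For a 100-frame clip sampled every 30 frames, `sample_indices(100, 30)` yields frames 0, 30, 60, and 90.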

Manual Image Classification
LabelImg was used for manual classification. LabelImg is an open-source graphical image tool on GitHub that provides image annotation mechanisms by labeling bounding boxes of objects. This tool is available for Linux, Windows, and macOS operating systems. It is written in Python and uses Qt as a graphical interface [25].
According to [26], cracks can be classified according to their origin and moment of appearance:
• Cracks originated in the plastic state: cracks originated from plastic settlement due to four factors: little coating and excessive diameters in the steel, changes in consistency in continuous pours, displacement of the formwork, and deformation of the supporting ground.
• Cracks originated in the hardened state, among which are cracks originated by spontaneous movements caused by: contraction due to carbonation and thermal shrinkage; numbness due to thermal expansion, excessive oxidation of reinforcing steel, or excess expansive agent in cement; and the alkali-aggregate reaction.
Another case of cracks in the hardened state is produced by loads caused by compression, traction, bending, shear, and torsion stresses. Figure 7 shows the types of compression failures that can be generated in concrete.
The images obtained from the process indicated in Section 2.5.1 were saved in a folder with the classes.txt file containing the classified classes. The manual image classification process was conducted with 1132 images and five classes (Beam_bending_failure, Column_type2_failure, Column_type3_failure, Column_type4_failure, and Column_type5_failure) to classify the types of cracks in beams and test tubes within the images. Table 2 shows the five classified classes and the region of interest where the compression failures were generated. Figure 8 shows how the region of interest was selected using the software LabelImg.
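When exporting in YOLO format, LabelImg stores each bounding box as a plain-text line of the form `class x_center y_center width height`, with coordinates normalized to the image size. A small sketch of that conversion (the example box and class index are hypothetical, chosen only to match the paper's 3840 × 2160 frames and five-class list):

```python
def to_yolo_line(class_id: int, x_min: int, y_min: int, x_max: int, y_max: int,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel bounding box to the normalized YOLO .txt annotation
    format (class x_center y_center width height) that LabelImg produces."""
    xc = (x_min + x_max) / 2 / img_w   # box center, as a fraction of width
    yc = (y_min + y_max) / 2 / img_h   # box center, as a fraction of height
    w = (x_max - x_min) / img_w        # box width, normalized
    h = (y_max - y_min) / img_h        # box height, normalized
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A hypothetical crack region in a 3840x2160 frame, labeled with class
# index 3 (e.g., Column_type4_failure in the class list above):
print(to_yolo_line(3, 1920, 540, 2880, 1620, 3840, 2160))
# -> 3 0.625000 0.500000 0.250000 0.500000
```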

Neural Network Training
All training processes were performed locally on a computer with an Intel Core i7 processor, 64 GB RAM, and NVIDIA GeForce RTX 2080 graphics card.
The manually classified images and their .txt files were distributed randomly in two folders for training and validation. The training folder contained 70% of the images, and the validation folder included 30%.
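The 70/30 split described above can be sketched as follows; the fixed seed and the filename pattern are illustrative assumptions, not details from the paper:

```python
import random

def split_dataset(images, train_frac: float = 0.7, seed: int = 42):
    """Shuffle image filenames and split them into training/validation lists."""
    files = list(images)
    random.Random(seed).shuffle(files)          # reproducible shuffle
    cut = round(len(files) * train_frac)        # 70% boundary
    return files[:cut], files[cut:]

# With the paper's 1132 classified images this reproduces the
# 792-image training folder and 340-image validation folder:
train, val = split_dataset([f"IMG{i}.jpg" for i in range(1132)])
print(len(train), len(val))  # 792 340
```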
In the first instance, initial training was carried out with 200 images, so the training folder contained 140 images and the validation folder 60. The training was carried out with only the classes necessary for the domain under study, "Crack" and "No_crack." The training process for the 200 images lasted approximately 20 h, producing four ".weights" files. The yolov4_custom_best.weights file had the best mAP (mean average precision) result, with 89.77%. This file was then used to evaluate the detection and classification of the manually classified classes "Crack" and "No_crack" on two images, IMG75 and IMG175. Figure 9 shows the classification and detection of the classes "No_Crack" and "Crack" in images (a) and (b), respectively. The first class was detected with 98% accuracy and the second with 99% accuracy.


Four training sessions were carried out for training with the total number of images (1132 files). All images and corresponding .txt files were distributed among the training (792 images) and validation (340 images) folders.

The training was processed only with the classes necessary for the domain under study. In the case of this research, five classes were used: Beam_bending_failure, Column_type2_failure, Column_type3_failure, Column_type4_failure, and Column_type5_failure. This process was performed with Darknet, an open-source neural network framework written in C and CUDA [27].
The best mAP (mean average precision) result from the training process was that of training no. 3, with 99.75%, as shown in Table 3.
The yolov4_custom_best.weights file from training no. 3 was used to evaluate the detection and classification of the objects manually classified in Section 2.5 on two images, IMG903 and IMG1076. Figure 10 shows the classification and detection of the objects "Column_type4_failure" and "Beam_bending_failure" in images (a) and (b), respectively. Both objects were detected with 100% accuracy.

Fracture Characterization
Static images were considered to identify fractures in the concrete, since the dynamics of the fracture were not intended to be evaluated. Unlike previous fracture characterization works that use fractal dimensioning [18] or X-ray devices [20], in this work the classification results from the neural network were based on the X- and Y-axis coordinates of the bounding box. Then, using computer vision functions, the image containing only the previously detected fracture was extracted, converted to grayscale, and then to its negative, so that the fracture appeared as pixels tending toward white. The number of pixels whose gray chromatic values were above an arbitrary value (in this case, 160, where 255 represents white) was then counted, such that the greater the number of "white" pixels, the greater the estimated area of the fracture and, consequently, its criticality. Figures 11-13 show the classified image resulting from the neural network with the bounding box (a) and its classification accuracy value, 100% for the detection and correct classification of the classes "Column_type3_failure", "Beam_bending_failure", and "Column_type4_failure", respectively. The fracture section extracted from the bounding box is shown in (b), and the resulting image converted to grayscale and negative in (c). The number of pixels above the arbitrarily defined threshold (160) was 650 in Figure 11c, 62 in Figure 12c, and 7257 in Figure 13c.

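The negative-grayscale pixel counting described above can be sketched as follows, assuming the bounding-box region has already been cropped and converted to grayscale (NumPy is assumed; this is a minimal sketch, not the paper's own implementation):

```python
import numpy as np

def crack_white_pixels(gray_crop: np.ndarray, threshold: int = 160) -> int:
    """Count 'white' pixels in the negative of a grayscale fracture crop.

    gray_crop: 2-D uint8 array (the bounding-box region, already grayscale).
    The negative is taken so dark crack pixels become bright, then pixels
    above the threshold (default 160, where 255 is white) are counted.
    """
    negative = 255 - gray_crop.astype(np.int32)
    return int(np.count_nonzero(negative > threshold))

# Toy example: a mostly light surface (value 200) crossed by a dark
# crack line (value 20); the crack inverts to 235, above the threshold.
crop = np.full((10, 10), 200, dtype=np.uint8)
crop[4, :] = 20
print(crack_white_pixels(crop))  # 10 crack pixels counted
```

The larger this count, the larger the estimated fracture area within the bounding box, matching the criticality heuristic described in the text.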

Discussion
The mAP value results were higher than 96.6% (Table 3), indicating a high precision of the neural network model (YOLOv4) used to identify cracks in concrete. As such, it is possible to use this framework in contexts where the classes to be identified, in this case, cracks, have visual characteristics with poorly defined elements, unlike, for example, a car or a person. On the other hand, it is considered that the number of images used for training (792) is an important characteristic. Furthermore, a greater number and variety of images introduced during training can improve model performance. For example, refs. [10] and [28] used between 3000 and 3500 images, while [9] and [29] used 40,000 and 12,000 images, respectively.
Although the YOLOv4 neural network model has been previously used in other investigations of construction processes, such as those of [4][5][6][7][8], its use for detecting cracks in concrete has been studied to a lesser extent. Research by [3] using YOLOv5 obtained high precision (mAP = 0.976), achieved using focal loss, pruning, and data scaling techniques during image preprocessing. In the present investigation, pruning was not used, and the dataset was relatively limited: 1132 images, with 792 used for training. Since the results obtained (mAP between 96% and 99%) are above the minimum expected threshold (mAP = 0.70), they suggest that a model with better performance characteristics, such as YOLOv5, YOLOv6, or YOLOv7, together with a greater number and variability of training images, could achieve even better performance.
The possibility that the current model suffers from underfitting (high bias and low variation) may cause the lower performance of the model; however, it can be overcome either by modifying the parameters of the model (for example, by increasing the number of epochs or stages) or by increasing the training data.
Regarding the characterization of the fracture, for which a simple count of pixels with a chromatic value greater than 160 was used, the results indicate that the approach is feasible under certain conditions. First, the general "whiteness" of the negative grayscale image affects the pixel count, as can be inferred from Figure 13c, which contained a comparatively higher number of white pixels than the other two samples (7257 in Figure 13c versus 650 and 62 pixels in Figures 11c and 12c, respectively). This evidences the need to calculate the average "white" value to determine the threshold. Second, while image size was not a variable in this study, it can affect the results if images of different sizes are compared against each other.
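The averaging suggested above could, for instance, set the threshold relative to each crop's own intensity statistics instead of the fixed value 160. The sketch below (threshold halfway between the mean and the maximum of the negative image) is an illustrative variant, not a method evaluated in the paper; NumPy is assumed:

```python
import numpy as np

def adaptive_crack_count(gray_crop: np.ndarray) -> int:
    """Count crack pixels with a threshold derived from the crop itself:
    halfway between the mean and the maximum of the negative image,
    so an overall 'whiter' negative does not inflate the count."""
    negative = 255 - gray_crop.astype(np.float64)
    threshold = (negative.mean() + negative.max()) / 2.0
    return int(np.count_nonzero(negative > threshold))

# Two toy crops with the same 10-pixel crack (value 20) but different
# background brightness; the adaptive threshold yields the same count
# for both, whereas a fixed threshold of 160 would not.
bright_bg = np.full((10, 10), 200, dtype=np.uint8)
bright_bg[4, :] = 20
dark_bg = np.full((10, 10), 80, dtype=np.uint8)
dark_bg[4, :] = 20
print(adaptive_crack_count(bright_bg), adaptive_crack_count(dark_bg))  # 10 10
```

With the fixed threshold of 160, the dark-background crop would count all 100 pixels (its background inverts to 175), illustrating the bias discussed in the paragraph above.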


Conclusions
This study demonstrates the feasibility of utilizing a specialized neural network model, YOLOv4, to effectively detect and assess various types of cracks and fractures in concrete beams and columns. The achieved precision, measured by mean average precision (mAP), ranged from 96.62% to 99.75%. The model was trained on a modest set of high-resolution images (792 images, 3840 × 2160 pixels, 96 dpi).
To further enhance the neural network's performance, the following recommendations are proposed:

• Consider adopting more recent YOLO model iterations, such as YOLOv6 or YOLOv7.
• Fine-tune the model's operational parameters to prioritize performance gains over shorter training times.
• Apply pruning to reduce the computational cost of the neural network, as well as other preprocessing and data augmentation techniques.
• Expand the training dataset significantly, encompassing over 2000 images.
• Introduce greater diversity to the images, incorporating different concrete compositions, varied angles, and lighting conditions.
Concurrently, an alternative mechanism demanding minimal computational resources has been suggested for evaluating fracture criticality. This approach involves a grayscale conversion and a comparison of pixel chromatic values to gauge the significance of a fracture based on the pixels near the target in the converted grayscale image. A comprehensive analysis is recommended to validate the proposed method's precision, contrasting automatic pixel-based assessments with manual or visual evaluations.
However, it is important to note some limitations of the proposed method that could be part of relevant future research:

• The method is limited to static evaluations and does not account for dynamic fracture changes over time.
• Factors like the size of the analyzed structural element, lighting conditions, porosity of the building material, and camera quality and positioning influence the outcomes. Additional experimentation with these variables is essential to establish standardized fracture criticality scales.
• The implemented methodology is limited to applying the pre-existing YOLOv4 algorithm to a customized database for detecting and evaluating cracks in construction, so preprocessing techniques such as pruning have not been applied.
In conclusion, the proposed mechanism is not designed to offer an exact quantitative assessment of crack size. Instead, its primary purpose is to identify regions warranting closer in situ examination. This application holds particular significance for hard-to-reach structures like bridges or dams, where automated preliminary identification via drones proves advantageous.