Automated Crack Detection in 2D Hexagonal Boron Nitride Coatings Using Machine Learning

Abstract: Characterizing defects in 2D materials, such as cracks in chemical vapor deposited (CVD)-grown hexagonal boron nitride (hBN), is essential for evaluating material quality and reliability. Traditional characterization methods are often time-consuming and subjective and can be hindered by the limited optical contrast of hBN. To address this, we utilized a YOLOv8n deep learning model for automated crack detection in transferred CVD-grown hBN films, using MATLAB's Image Labeler and Supervisely for annotation and training. The model demonstrates promising crack-detection capabilities, identifying cracks of varying sizes and complexities, with loss curve analysis revealing progressive learning. However, a trade-off between precision and recall highlights the need for further refinement, particularly in distinguishing fine cracks from multilayer hBN regions. This study demonstrates the potential of machine learning (ML)-based approaches to streamline 2D material characterization and accelerate their integration into advanced devices.


Introduction
Two-dimensional (2D) materials possess a unique set of properties due to their reduced dimensionality, making them promising candidates for replacing traditional materials in cutting-edge electronics, photonics, and nanoelectromechanical systems [1]. Hexagonal boron nitride (hBN) is a promising dielectric material due to its wide bandgap (>5 eV), high transparency, and unique mechanical properties [2–4]. Its broad range of applications spans optoelectronics, solid-state neutron detectors, field-effect transistors, tunneling devices, electron emitters, deep UV emitters, photonic devices, switching/memory devices, supercapacitors, and environmental monitoring [5–8]. The ability to fabricate heterostructures with tailored properties further amplifies the potential of hBN and other 2D materials [9,10]. Notably, hBN serves as a complementary 2D substrate for graphene-based electronics, enhancing their performance [11,12]. In these heterostructures, hBN also plays a crucial role by encapsulating other 2D materials such as graphene, which enables the study of new physical phenomena [13,14].
Chemical vapor deposition (CVD) has emerged as the most suitable technique for synthesizing atomically thin hBN films and heterostructures on a large scale while maintaining material quality. CVD is used to synthesize hBN films on various metallic substrates, including Cu, Ni, and Pt [2,15–19]. The fabrication and characterization of devices based on 2D materials frequently require non-metal substrates, which in turn calls for transfer techniques that move the synthesized material from the growth substrate to the target substrate. CVD growth followed by wet-etch transfer enables the integration of large-area, high-quality hBN films into advanced electronic devices, such as dielectric layers for graphene devices, field-effect transistors, and thermoelectric devices [3,12,16,20].
However, the wet transfer process can introduce defects such as cracks, wrinkles, and polymer residues, which can significantly degrade the material's properties and hinder its performance in devices [21]. Cracks, in particular, can severely compromise the mechanical strength and barrier properties of hBN and introduce charge-scattering sites [22–24]. These cracks may arise from stresses during transfer or from bubbles formed during the etching process that strain the film [25]. Current crack-detection methods, which rely on manual inspection and sophisticated characterization tools, are time-consuming, costly, and prone to human error, especially with large-area samples or high-throughput production. While wrinkles and polymer residues present their own challenges, this study focuses specifically on automating the detection of cracks in transferred CVD-grown hBN films, as crack identification holds immediate value for quality control and for accelerating the integration of hBN into advanced devices. The limitations of current crack-detection methods highlight the need for an automated, rapid, and cost-effective approach that enables efficient quality control and facilitates the widespread adoption of hBN in advanced electronic devices.
To fully harness the potential of hBN and other 2D materials, efficient and reliable characterization techniques are essential. However, the large-scale characterization of 2D materials remains a challenge due to the complexity of traditional methods. The atomic-level thinness of these materials demands specialized techniques that can be intricate and time-consuming [26]. The lack of standardized characterization protocols further hinders the comparison of results across studies. Moreover, 2D materials are highly sensitive to defects, the underlying substrate, and their interfaces with other materials, making consistent characterization even more difficult [27].
Traditional characterization techniques often involve laborious manual assessment and rely heavily on domain expertise, leading to potential subjectivity, errors, and slow analysis. Optical microscopy, though a common initial tool due to its speed, accessibility, and non-destructive nature, has limitations [28]. These include resolution constraints, difficulty in providing precise quantitative information, and sensitivity to external factors like the substrate, which can complicate the analysis of 2D materials [29]. Unlike graphene, hBN has a wide bandgap that results in low optical contrast, especially on standard SiO₂ substrates. Additionally, the contrast fluctuates across the visible spectrum, with a near-zero value in the green region, where the human eye is most sensitive [30]. Hence, hBN characterization requires more time and domain expertise. To overcome these limitations and accelerate hBN's widespread adoption, we introduce a machine learning approach capable of rapid and automated crack detection in CVD-grown hBN.
The rapid development of machine learning (ML), along with new algorithms, growing datasets, and increased computational power, is revolutionizing various research areas [31–36]. ML is also transforming 2D materials science. By analyzing complex datasets and relationships between properties, ML can streamline characterization efforts that are traditionally time-consuming and require domain expertise. This makes ML particularly valuable for predicting properties, guiding the discovery of novel 2D materials, and optimizing their integration into cutting-edge applications [37]. Importantly, ML accelerates the development of 2D materials by streamlining data collection, analysis, and the exploration of structure-property relationships. This overcomes the limitations of traditional methods that often rely on extensive, time-consuming experiments, ultimately leading to the discovery of new materials and tailored optimization for specialized applications [38,39].
While ML applications in the optical characterization of 2D materials are still evolving, much of the existing work centers on graphene and exfoliated samples, primarily distinguishing the thickness of graphene flakes and identifying different 2D material flakes and their heterostructures [40–47]. Ramezani et al. [48] focused on identifying exfoliated hBN flakes via deep learning, while our previous study [49] utilized unsupervised models to distinguish multilayer hBN regions. Although these models have shown promise, their original designs may not be ideal for identifying cracks in CVD-grown hBN, where crack morphologies vary with the transfer process and substrate and where other defects such as wrinkles or residues can interfere. ML-based analysis has also been applied to identify point defects in 2D materials, but these methods require sophisticated tools such as transmission electron microscopy (TEM) [50–52].
Our study shifts the focus specifically to hBN, addressing the growing use of large-area CVD synthesis in the production of 2D materials [2,15,16]. With large-area transfer to arbitrary substrates now demonstrated [53,54], our research aims to tackle the bottleneck of efficient large-area characterization of these transferred CVD-grown hBN samples. By developing an ML-based approach, we strive to streamline this process and facilitate the widespread application of hBN.
To address the challenge of crack detection in CVD-grown hBN, this study uses the YOLOv8n deep learning model. Our proposed algorithm can process optical microscope images with a resolution of 1024 × 768 pixels in just 291 ms, enabling the real-time detection of cracks in hBN films. To the best of our knowledge, this study is among the first to apply ML for crack identification in transferred CVD-grown hBN samples. This work has the potential to streamline quality control and accelerate the use of hBN in advanced devices.

Materials and Methods
Having highlighted the limitations of traditional hBN crack detection and the promise of machine learning, we now present the methodology designed to address these challenges. This section describes an integrated workflow encompassing material synthesis, dataset acquisition, annotation, model development and optimization, and evaluation. Our solution, outlined in Figure 1, uses the YOLOv8n object detection model to identify crack regions within transferred CVD-grown hBN samples. The following subsections detail each stage of the process, from sample preparation and data collection to model training, tuning, and performance assessment.
For sample preparation, a PMMA solution (4.5% in anisole) was spin-coated onto 1 × 1 cm² hBN/Cu foil sections at 2500 rpm for 1 min to create a protective layer. The PMMA/hBN/Cu samples were then placed on a 0.15–0.2 M potassium persulfate etchant solution to dissolve the copper foil, leaving the PMMA/hBN film floating. Thorough rinsing with DI water (5 min per cycle, repeated three times) removed residual etchant. The floating PMMA/hBN film was carefully transferred to a prepared Si/SiO₂ substrate (1.5 × 1.5 cm²). Air-drying for 30 min followed by heating at 100 °C for 20 min promoted adhesion. Finally, the sample was immersed in acetone for 30 min to dissolve the PMMA layer, leaving the hBN film on the Si/SiO₂ substrate. This process yielded a total of 10 multilayer hBN samples for subsequent characterization and analysis.

Dataset Acquisition
To build the image dataset, a VK-X250 confocal laser scanning microscope (CLSM) (Keyence Corp, Itasca, IL, USA) with a 10× objective was used to capture optical images of the transferred hBN films. A total of 150 images were collected, focused specifically on multilayer hBN regions. To ensure consistency and comparability within the dataset, uniform camera settings, image size, and intensity were maintained during acquisition. Images were initially acquired in VK4 file format and then converted to the widely compatible PNG format using the multi-file analysis application software (VK-H1XME). This conversion preserved the high resolution for subsequent analysis. Each image covers an area of 1.4 mm × 1 mm.

Annotation
Careful data preprocessing and labeling are crucial for developing accurate supervised models. MATLAB R2023b's Image Labeler was used to annotate the images, defining regions of interest that serve as ground truth. The annotated images were saved in both MATLAB project file and JSON formats for flexibility. The Supervisely platform was then used to further refine the annotations. For robust model evaluation, the dataset was divided into two groups:

• Training Set (92 images): The core dataset used to train the machine learning model.

Model Development and Optimization
The pre-trained YOLOv8n-det model provided by Ultralytics was fine-tuned to detect crack regions within the dataset [57]. The YOLOv8 architecture represents a series of enhancements and extensions introduced by Ultralytics to the YOLOv5 framework. These improvements primarily focus on scaling adjustments and architectural refinements, aiming to augment the model's performance and capabilities. The architecture consists of three major parts: backbone, neck, and head.

Backbone
The backbone is composed of a series of convolutional neural network (CNN) layers that follow the Cross-Stage Partial (CSP) design, which allows more effective training at reduced computational cost [58]. The model uses a C2f module consisting of two ConvModule blocks and n DarknetBottleneck blocks, which lets the model collect richer gradient flow information. Each ConvModule consists of Conv-BN-SiLU (convolution, batch normalization, and SiLU activation), and n is the number of bottlenecks, as shown in Figure 2. Additionally, the model adopts the Spatial Pyramid Pooling-Fast (SPPF) module, which improves inference speed.

Neck
Typically, networks with more layers are able to extract a greater range of feature information, which can enhance the quality of dense predictions. Yet, when networks become overly deep, they may start to lose crucial spatial details of objects, particularly if there are excessive convolution operations, which could lead to a loss of information about smaller objects. To mitigate this, it is beneficial to employ a Feature Pyramid Network (FPN) [60] and a Path Aggregation Network (PAN) [61], which enable the fusion of features across multiple scales. As depicted in Figure 2, the neck component of the architecture integrates features from various network levels. This process enriches the feature information in the higher levels thanks to additional layers, while the initial layers retain more spatial details due to having undergone fewer convolutions.

Head
The model features a decoupled head, separating the processes of classification and detection into distinct components. The architecture simplifies this by maintaining only the branches for classification and regression. Unlike techniques that use a predefined set of anchors to ascertain object locations by calculating offsets, this model implements an anchor-free method. This method locates the center of the object and estimates the distances from it to the edges of the bounding box, predicting the object's location without relying on anchors.
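The anchor-free decoding described above can be illustrated with a minimal sketch. The function name `decode_box` and the argument order (left, top, right, bottom) are assumptions made for illustration and are not taken from the Ultralytics code base.

```python
from typing import Tuple


def decode_box(cx: float, cy: float,
               left: float, top: float,
               right: float, bottom: float) -> Tuple[float, float, float, float]:
    """Convert a center point plus four edge distances to (x1, y1, x2, y2).

    Hypothetical helper: the distances are measured from the predicted
    center to the left, top, right, and bottom edges of the box.
    """
    return (cx - left, cy - top, cx + right, cy + bottom)


# A center at (100, 50) with distances 20 / 10 / 30 / 15 pixels.
print(decode_box(100.0, 50.0, 20.0, 10.0, 30.0, 15.0))  # (80.0, 40.0, 130.0, 65.0)
```

Because the four distances are predicted directly, no anchor shapes need to be chosen in advance, which is convenient for elongated objects such as cracks.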

Loss
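The YOLOv8 algorithm employs the Task-Aligned Assigner from TOOD [62] to designate positive and negative samples. It selects positive samples using a weighted combination of the classification and regression scores:

$$t = s^{\alpha} \times u^{\beta} \tag{1}$$

where $s$ is the predicted score for the given class label, $u$ is the Intersection over Union (IoU) between the predicted and actual bounding boxes, and $\alpha$ and $\beta$ are exponents that weight the two terms. The model comprises separate classification and regression branches. For classification, Binary Cross-Entropy (BCE) Loss is used:

$$\mathrm{Loss}_{n} = -w\left[y_{n}\log x_{n} + (1 - y_{n})\log(1 - x_{n})\right] \tag{2}$$

where $w$ represents the weight, $y_{n}$ is the true label, and $x_{n}$ is the model's predicted value.

For regression, the model applies Distribution Focal Loss (DFL) and Complete IoU (CIoU) Loss. The DFL is aimed at refining the probability distribution around the object $y$:

$$\mathrm{DFL}(S_{n}, S_{n+1}) = -\left[(y_{n+1} - y)\log S_{n} + (y - y_{n})\log S_{n+1}\right] \tag{3}$$

where $S_{n}$ and $S_{n+1}$ are the probabilities for the two values $y_{n}$ and $y_{n+1}$ that bracket the ground truth and are computed as

$$S_{n} = \frac{y_{n+1} - y}{y_{n+1} - y_{n}}, \qquad S_{n+1} = \frac{y - y_{n}}{y_{n+1} - y_{n}} \tag{4}$$

CIoU Loss integrates an additional term to account for the aspect ratio of the bounding boxes [59]:

$$\mathrm{Loss}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha\nu \tag{5}$$

where $\rho(b, b^{gt})$ is the distance between the centers of the predicted and ground-truth boxes, $c$ is the diagonal length of the smallest box enclosing both, and $\alpha$ is a positive trade-off weight (distinct from the exponent in Equation (1)), with $\nu$ as the measure of aspect ratio consistency, which is defined by

$$\nu = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2} \tag{6}$$

Here, $w$ represents the width of the bounding box, $h$ represents its height, and the superscript $gt$ marks the ground-truth box.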
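To make the two regression terms concrete, the following is a minimal pure-Python sketch of the CIoU loss and the DFL for a single box and a single target value. It follows the standard published definitions rather than the Ultralytics implementation; the function names, the (x1, y1, x2, y2) box convention, the unit-spaced bins in `dfl_loss`, and the small `eps` stabilizer are all assumptions made for illustration.

```python
import math


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + center-distance penalty + aspect-ratio penalty."""
    iou = iou_xyxy(pred, gt)
    # Squared distance between the two box centers.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / (hg + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


def dfl_loss(y, s_n, s_n1):
    """DFL for a target y lying between the unit-spaced bins floor(y) and floor(y) + 1."""
    y_n = math.floor(y)
    y_n1 = y_n + 1
    return -((y_n1 - y) * math.log(s_n) + (y - y_n) * math.log(s_n1))


print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes: loss is 0
print(ciou_loss((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes: loss exceeds 1
print(dfl_loss(2.3, 0.7, 0.3))                      # probabilities matching the target
```

For a target of 2.3, the DFL is smallest when the two neighboring bins receive probabilities 0.7 and 0.3, which is the behavior the bracketing probabilities are meant to encourage.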

Data Augmentation and Model Optimization
The model was trained on the Supervisely platform. A variety of data augmentation techniques were employed to enhance the robustness and generalizability of the model. These techniques included the following.

(a) HSV (Hue, Saturation, Value) Augmentation: This method adjusts the color properties of images to simulate a wider range of lighting conditions and object appearances. The transformations can be represented mathematically as

$$H' = H + \Delta H, \qquad S' = S + \Delta S, \qquad V' = V + \Delta V \tag{7}$$

where $H$, $S$, and $V$ are the original hue, saturation, and value components of the image pixels, respectively. $H'$, $S'$, and $V'$ are the augmented components, and $\Delta H$, $\Delta S$, and $\Delta V$ represent small, random perturbations applied to each channel.

(b) Translation: This technique shifts the image by a certain number of pixels horizontally and vertically, introducing variability in object positioning within the frame. The translation operation can be described by the transformation matrix

$$T_{\mathrm{trans}} = \begin{bmatrix} 1 & 0 & \Delta x \\ 0 & 1 & \Delta y \\ 0 & 0 & 1 \end{bmatrix} \tag{8}$$

where $\Delta x$ and $\Delta y$ denote the horizontal and vertical displacements, respectively.

(c) Scaling: Scaling alters the size of the image, simulating objects at different distances from the viewer. This operation can be mathematically represented by the scaling matrix

$$T_{\mathrm{scale}} = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{9}$$

where $\alpha$ and $\beta$ are scaling factors for the width and height of the image, respectively.

(d) Flipping Operations: Flipping operations mirror the image horizontally, vertically, or both to simulate different orientations of objects. The flipping transformation can be represented as a reflection matrix, for example, for horizontal flipping:

$$T_{\mathrm{flip}} = \begin{bmatrix} -1 & 0 & W \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{10}$$

where $W$ is the width of the image, ensuring the flipped image remains within the original dimensions.
Each of these augmentation techniques introduces variations in the training dataset, thus enabling the model to learn more generalized features and improving its performance on unseen data.
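As a small worked example of the geometric augmentations, the sketch below builds translation, scaling, and horizontal-flip transforms as 3 × 3 homogeneous matrices and applies them to one pixel coordinate. The matrix forms are the usual textbook ones and are assumed here; the helper names and the sample values are hypothetical and are not the settings used in training.

```python
def translation(dx, dy):
    """Shift by (dx, dy) pixels."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def scaling(alpha, beta):
    """Scale width by alpha and height by beta."""
    return [[alpha, 0.0, 0.0], [0.0, beta, 0.0], [0.0, 0.0, 1.0]]


def horizontal_flip(width):
    """Mirror about the vertical axis, keeping x within [0, width]."""
    return [[-1.0, 0.0, width], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def apply(matrix, x, y):
    """Apply a 3x3 homogeneous transform to the point (x, y)."""
    vec = (x, y, 1.0)
    out = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    return out[0], out[1]


print(apply(translation(5, -3), 100, 40))     # (105.0, 37.0)
print(apply(scaling(2, 0.5), 100, 40))        # (200.0, 20.0)
print(apply(horizontal_flip(1024), 100, 40))  # (924.0, 40.0)
```

In a detection setting the same transform has to be applied to the bounding-box corners as well as to the image, so that the annotations stay aligned with the augmented cracks.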
The model was trained for 300 epochs in fine-tune mode on the Supervisely platform using an NVIDIA GeForce RTX 3080 GPU, an input image size of 640, a batch size of 8, an SGD optimizer, and the hyperparameters detailed in Table 1. Early stopping was used to select the best weights.
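For readers who want to reproduce a comparable run outside Supervisely, the settings stated above map onto the Ultralytics Python API roughly as sketched below. This is an assumed equivalent, not the configuration actually used in this study: the dataset file name `hbn_cracks.yaml` is hypothetical, the remaining hyperparameters from Table 1 are not reproduced, and no early-stopping patience is set because the text does not give one.

```python
# Settings taken from the text; everything else is left at library defaults.
TRAIN_ARGS = {
    "data": "hbn_cracks.yaml",  # hypothetical dataset description file
    "epochs": 300,
    "imgsz": 640,
    "batch": 8,
    "optimizer": "SGD",
}


def fine_tune(weights="yolov8n.pt", **overrides):
    """Fine-tune a pre-trained YOLOv8n detector with the settings above.

    The import is done lazily so that this file can be inspected without
    the ultralytics package being installed.
    """
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**{**TRAIN_ARGS, **overrides})
```

Calling `fine_tune()` starts training once the `ultralytics` package is installed and the dataset file exists; keyword overrides such as `fine_tune(epochs=50)` allow a quick trial run.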

Evaluation
Precision, recall, a confusion matrix, mAP50, and mAP50-95 were used to assess the model's performance during training. Precision measures the reliability of detections, that is, the fraction of identified cracks that are true positives (TPs). A false positive (FP) occurs when the detector wrongly identifies a crack. Recall measures how well the detector finds true cracks, reflecting its ability to avoid false negatives (FNs), where cracks are missed. These outcomes, together with true negatives (TNs), are visualized in the confusion matrix; TP, FP, and FN directly determine the precision and recall scores.

The formulas for precision and recall are defined as follows:

$$\mathrm{Precision}\ (P) = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{11}$$

$$\mathrm{Recall}\ (R) = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{12}$$

Confusion matrices provide a comprehensive summary of the model's predictions compared to ground truth labels. This visual tool facilitates the assessment of accuracy, precision, recall, and other performance metrics.
To provide an even more robust assessment, we extend our evaluation with the mAP50 and mAP50-95 scores. mAP50 (mean average precision at IoU 50%) measures the average precision of object detection across different classes at a specific Intersection over Union (IoU) threshold of 50%. IoU quantifies the overlap between the predicted bounding boxes and the ground truth annotations. A higher mAP50 value indicates better localization accuracy and precision of detected cracks, making it particularly useful for evaluating the model's performance when cracks exhibit varying sizes and shapes.

For a single class, average precision (AP) is the area under the precision-recall curve. mAP50 is computed by taking the mean of the AP values across all classes at an IoU threshold of 50%. The formulas for IoU and mAP50 are as follows:

$$\mathrm{IoU}(A, B) = \frac{\text{area of overlap between } A \text{ and } B}{\text{area of union of } A \text{ and } B} \tag{13}$$

$$\mathrm{mAP50} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_{i}^{50} \tag{14}$$

where $N$ is the number of classes and $\mathrm{AP}_{i}^{50}$ is the AP calculated at an IoU threshold of 50% for class $i$.
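The two quantities above can be computed with a few lines of code. The sketch below is a simplified illustration: it uses all-point interpolation of the precision-recall curve, whereas evaluation toolkits may sample the curve differently, and the function names and example values are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (recalls sorted in increasing order)."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
print(mean_average_precision([0.75]))             # 0.75
```

With a single "Crack" class, N equals 1 and mAP50 reduces to the AP of that class at an IoU threshold of 50%.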
The mAP50-95 score (mean average precision across IoU thresholds from 50% to 95%) goes beyond a single IoU threshold. It provides a comprehensive assessment of the model's performance across a range of IoU values, offering insights into its robustness and generalization ability. A higher mAP50-95 score signifies superior detection performance across diverse conditions, indicating the model's effectiveness in accurately detecting cracks.

The selection of appropriate evaluation metrics is crucial for obtaining a comprehensive understanding of a machine learning model's performance. This study uses a multi-faceted assessment based on precision, recall, the confusion matrix, mAP50, and mAP50-95. Together, these metrics reveal the model's reliability in identifying true cracks, its ability to minimize missed detections, and its performance across varying crack characteristics, providing a robust basis for model optimization in hBN characterization tasks.

Results
The following section analyzes the proposed YOLOv8n model's performance in crack-detection tasks. The aim is to assess the model's strengths, limitations, and learning dynamics through a combination of visual inspection, quantitative metrics, and loss curve analysis.

hBN Film Characterization
Figure 3 characterizes the multilayer hexagonal boron nitride films transferred onto a Si/SiO₂ substrate. Figure 3a shows an optical image of the multilayer hBN sample, while Figure 3b presents a scanning electron microscopy (SEM) image highlighting the edges of the hBN and the underlying Si/SiO₂ topography.
Figure 3c displays an atomic force microscopy (AFM) height image, which zooms in on the region shown in Figure 3a. The line scan profiles obtained from the AFM data are presented in Figure 3d, revealing the presence of two distinct hBN layers with different thicknesses. The first line, labeled "Line 1", corresponds to a thick hBN layer with an average thickness of approximately 11.45 nm, while the second line, "Line 2", represents a thinner hBN layer with an average thickness of 4.72 nm. To further characterize the sample, Raman spectroscopy measurements were performed. Figure 3e shows the Raman spectrum obtained from the crack regions, where the Si Raman mode is visible due to the absence of the hBN film, serving as a background signal. Figure 3f, in contrast, depicts the Raman spectrum acquired from the film areas. After applying a Voigt fitting procedure, the hBN film Raman shift is found to be 1375.5 cm⁻¹, which aligns with the expected Raman shift for multilayer hBN.
The combination of optical microscopy, SEM, AFM, and Raman spectroscopy provides a comprehensive understanding of the multilayer hBN sample's morphology, thickness, and structural properties. The varying thicknesses observed in the AFM line scans, along with the distinct Raman signatures from the crack and film regions, offer valuable insights into the quality and uniformity of the transferred hBN films. These characteristics highlight the challenges for crack detection.

Visualizing Model Performance: Qualitative Analysis of Crack Detection
Having characterized the hBN film's properties and potential defects, we now analyze the model's performance in identifying cracks within this sample. In the raw image shown in Figure 4a, the color contrast highlights various defects and hBN characteristics. Cracks appear distinctively as dark blue regions (purple arrows), while black regions (orange arrow) suggest the presence of residue, and brighter blue lines indicate wrinkles. Color variations also reveal thick (brighter) and thin hBN (lighter blue) areas. The model successfully identifies multiple cracks in Figure 4b, assigning bounding boxes and confidence scores. The zoomed-in views in Figure 4c,d demonstrate its ability to identify both single and intersecting cracks, even those with finer details. This robust performance against variations in crack size, orientation, clarity, and background complexity offers initial evidence of the model's potential for automated hBN characterization.
A more comprehensive picture of the model's strengths and weaknesses emerges in Figure 5. Notably, the model accurately detects and precisely bounds cracks of varying sizes (Figure 5a–f). Its ability to successfully localize even intersecting cracks (Figure 5c) further demonstrates its capabilities. The confidence score associated with each prediction adds a valuable dimension to the analysis by indicating the model's certainty.
However, limitations also exist. In some cases, the model misidentifies background multilayer hBN regions as cracks, particularly fine or low-contrast ones, leading to false positives (purple arrows in Figure 5b,f). Additionally, the precision of bounding boxes occasionally suffers, likely due to challenges in accurately outlining irregular crack patterns. Further refinement could address the observed variability in confidence scores assigned to similar-looking cracks, potentially improving robustness and reducing biases within the model's decision-making process.
These results demonstrate the model's promising ability to detect cracks of various types, but also highlight a need to refine its precision for fine cracks and complex crack patterns. A deeper analysis of the model's errors and decision-making, as revealed through quantitative metrics, will shed further light on areas for improvement.

Quantitative Analysis of Model Performance: Errors and Metrics
To gain a comprehensive understanding of the model's performance, a multi-metric analysis is essential. The confusion matrix, F1-confidence curve, and precision-recall (PR) curve offer interconnected insights into the model's ability to detect cracks.
The confusion matrix in Figure 6a provides valuable insights into the model's performance and areas for improvement. The model achieves a true positive rate (TPR) of 76%, correctly identifying 38 of the 50 cracks present in the dataset. This TPR is significant because it reflects the model's efficacy in recognizing genuine defects, a critical aspect of ensuring the integrity and reliability of 2D materials like hBN.
However, the matrix also reveals the model's tendency to misidentify 10 background areas (i.e., Si/SiO₂) as cracks, leading to a higher false positive rate (FPR) and lower precision. While this sensitivity (the model's ability to correctly identify actual cracks) ensures that the model errs on the side of caution, it highlights an area for improvement in distinguishing between true cracks and similar anomalies. Enhancing the model's ability to differentiate between genuine cracks and background irregularities is crucial for reducing unnecessary reviews and interventions in a manufacturing context, ultimately improving both efficiency and cost-effectiveness.
Additionally, the matrix reveals 12 instances where cracks were present but went undetected by the model (false negatives), indicating the need for enhanced sensitivity so that fewer cracks are missed, which is critical for quality control in the production of hBN. Given the configuration of the matrix, there is no direct calculation of true negatives (TNs), as the focus is primarily on the detection of cracks, and the background (non-crack areas) is not treated as a separate class. This setup emphasizes the model's application in scenarios where the primary concern is the reliable detection of cracks rather than the identification of non-crack areas.
The F1-confidence curve (Figure 6b) underscores the inherent trade-off between precision and recall. The model achieves its best balance between these metrics, with an F1 score of 0.76, at a confidence threshold of approximately 0.572. This threshold offers insights into optimizing the model's crack detection without excessive false alarms.
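As a short worked example, the confusion-matrix counts reported above (38 true positives, 10 false positives, and 12 false negatives) can be converted into precision, recall, and F1 as follows. The function name is hypothetical. Note that these values correspond to the single confidence threshold used to build the confusion matrix, so they need not coincide with the peak of the F1-confidence curve.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, F1) from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts read from the confusion matrix discussed in the text.
p, r, f1 = detection_metrics(tp=38, fp=10, fn=12)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# precision=0.792 recall=0.760 F1=0.776
```

The recall of 0.76 reproduces the 76% true positive rate quoted for the confusion matrix, while the precision of about 0.79 quantifies the effect of the 10 false positives.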
Figure 6c shows the precision-recall (PR) curve, which provides a more comprehensive view of the trade-off between precision and recall across different classification thresholds. The curve for the "Crack" class starts at a precision of around 0.809, aligning with the maximum F1 score observed in the F1-confidence curve. As the recall increases, the precision decreases, indicating that the model makes more false positive predictions when attempting to capture more true positive instances. The mean average precision (mAP) across all classes is 0.809 at an IoU threshold of 0.5, suggesting that the model performs reasonably well in balancing precision and recall for the "Crack" class.
These metrics collectively highlight the model's strengths and a recurring theme: the trade-off between precision and recall. This analysis sets the stage for investigating how this trade-off evolves during model training.
The results highlight the model's potential for automated crack-detection applications. Its ability to successfully detect diverse cracks, coupled with insights into the precision-recall balance, offers a strong foundation for optimization. Addressing the identified limitations would further bolster the model's reliability in practical crack inspection scenarios.

Discussion
The results presented demonstrate the YOLOv8n model's potential for crack detection in CVD-grown hBN. Its ability to identify cracks of varying sizes and complexities aligns with the urgent need for streamlined, automated characterization methods in the field of 2D materials. The model's performance, particularly in detecting diverse crack types, underscores the capabilities of machine learning techniques to address the limitations of traditional optical characterization approaches.
However, the observed trade-off between precision and recall warrants further consideration. This trade-off is consistent with findings in other object detection tasks, highlighting the inherent challenge of balancing the identification of true positives with the minimization of false alarms. The tendency towards false positives, particularly in identifying fine or low-contrast cracks, highlights the core challenge of this work: discriminating between cracks and complex multilayer hBN regions. This underscores the inherent difficulty of hBN characterization on standard substrates and emphasizes the need for even more precise detection techniques.
An analysis of the model's learning dynamics offers insights for optimization strategies. The steady decline in classification losses throughout training suggests the model's progressive knowledge acquisition without significant overfitting. Understanding how the precision-recall balance was learned could inform the use of techniques like class weighting or threshold adjustment to address the observed limitations.
The findings of this study have several implications for research on 2D materials. The demonstrated potential for automated crack detection facilitates the large-scale characterization of CVD-grown hBN. This has direct consequences for quality control and for ensuring material suitability in advanced device applications. Furthermore, the successful application of machine learning underscores its versatility in analyzing complex 2D materials datasets. This lays the groundwork for extending ML-based approaches to address other characterization challenges in the field.
Evaluating the quality of 2D-hBN films based on the detected cracks involves quantifying both the number and size of cracks. While a higher crack density generally suggests a higher degree of defects, the impact on material properties depends on specific application requirements. Even a single large crack can significantly compromise performance in certain applications. Incorporating factors like crack morphology and distribution into the quality assessment process would provide a more comprehensive understanding of the material's potential performance characteristics. Future work should focus on refining these quantitative quality evaluation methods based on the detected cracks, enabling a more robust and standardized assessment of 2D-hBN films for various applications.
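One possible starting point for such a quantitative evaluation is sketched below: it summarizes a list of detected bounding boxes as a crack count, a crack density, and a box-area fraction. This is an illustrative assumption rather than a method used in this study. The function name is hypothetical, the sketch assumes that the 1024 × 768 pixel images correspond to the 1.4 mm × 1 mm image area stated earlier, and bounding-box area is only an upper bound on true crack area (overlapping boxes are also counted twice).

```python
def crack_summary(boxes, img_w=1024, img_h=768, fov_w_mm=1.4, fov_h_mm=1.0):
    """Summarize detected boxes (x1, y1, x2, y2 in pixels) for one image."""
    sx = fov_w_mm / img_w  # millimeters per pixel, horizontal
    sy = fov_h_mm / img_h  # millimeters per pixel, vertical
    areas = [(x2 - x1) * sx * (y2 - y1) * sy for x1, y1, x2, y2 in boxes]
    fov_area = fov_w_mm * fov_h_mm
    return {
        "count": len(boxes),
        "density_per_mm2": len(boxes) / fov_area,
        "largest_box_mm2": max(areas, default=0.0),
        "box_area_fraction": sum(areas) / fov_area,
    }


# One hypothetical detection covering a quarter of the image.
print(crack_summary([(0, 0, 512, 384)]))
```

Reporting the largest box alongside the density reflects the point made above that a single large crack can matter more than many small ones.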
Despite the effectiveness of the YOLOv8n-det model for crack detection, the limitations of using rectangular bounding boxes should be considered. The model may face challenges in distinguishing between closely spaced cracks, as multiple cracks could be identified as a single entity within a bounding box. To address these limitations, future work should explore advanced object detection architectures, such as instance segmentation models (e.g., Mask R-CNN), which provide pixel-level segmentation of cracks instead of rectangular bounding boxes. Additionally, post-processing techniques to refine the detected bounding boxes, such as morphological operations or edge detection algorithms, could be developed to extract the actual crack contours and improve the accuracy of crack area estimation. Semantic segmentation models, which directly classify each pixel as belonging to a crack or background, could also be investigated to handle complex crack patterns and accurately distinguish between nearby cracks. Incorporating a diverse range of crack orientations, shapes, and proximity in the training data would further enhance the model's ability to handle these challenging scenarios.
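To illustrate why a rectangular box can overestimate crack area, the sketch below counts dark pixels inside a detected box using a plain intensity threshold. This is a much simpler stand-in for the morphological or edge-based post-processing suggested above. The function name and the threshold value are hypothetical, and the approach assumes cracks appear darker than the surrounding film in a grayscale image.

```python
def crack_area_in_box(gray, box, dark_threshold):
    """Estimate the crack pixel area inside one bounding box.

    gray           : 2D grayscale image as rows of intensity values
    box            : (x1, y1, x2, y2) in pixels, end-exclusive
    dark_threshold : pixels below this value count as crack (assumed value)
    Returns (crack_pixels, box_pixels, fill_ratio).
    """
    x1, y1, x2, y2 = box
    crack_px = sum(
        1 for row in gray[y1:y2] for value in row[x1:x2] if value < dark_threshold
    )
    box_px = (x2 - x1) * (y2 - y1)
    fill_ratio = crack_px / box_px if box_px else 0.0
    return crack_px, box_px, fill_ratio


# Synthetic 10 x 10 image: bright film (200) with a one-pixel-wide dark
# diagonal crack (50). The crack fills only a tenth of its bounding box.
image = [[50 if r == c else 200 for c in range(10)] for r in range(10)]
crack_px, box_px, fill_ratio = crack_area_in_box(image, (0, 0, 10, 10), 100)
```

In this synthetic case the box area exceeds the thresholded crack area by a factor of ten, which is the kind of discrepancy that pixel-level segmentation would avoid.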
Although this study focuses specifically on crack detection, given the critical impact of cracks on material properties and device performance, future work should expand the scope to the detection and characterization of multiple defect types, including residue and wrinkles, and should address the challenge of distinguishing cracks from complex MLhBN regions. Machine learning models capable of identifying and classifying various defects would provide a more holistic understanding of the material's quality and enable the targeted optimization of the synthesis and transfer processes. This would accelerate the characterization of 2D-hBN and facilitate its adoption in diverse applications.
While initial development required a labeled dataset and manual annotation, the trained model can now be applied to unlabeled datasets without further human intervention, making it well suited for large-scale analysis. The model's performance is expected to improve with exposure to larger and more diverse datasets, enhancing robustness and generalizability. Although initially trained on hBN, the model's approach is highly transferable to other 2D materials. The common practice of characterizing 2D materials on Si/SiO2 substrates aligns well with the model's design, which focuses on identifying cracks or discontinuities in a visually distinct layer on a substrate. While the model excels at detecting cracks in optical images, its direct application to studying the underlying mechanisms of crack propagation may be limited, as crack propagation involves complex factors that are not directly evident in simple optical images. However, the model's ability to identify cracks quickly and accurately could aid in collecting large datasets for further analysis, facilitating the investigation of fundamental principles governing crack formation and propagation in 2D materials.
Several promising avenues exist for future work. Firstly, expanding the dataset with a wider range of crack patterns and substrate contrasts would enhance the model's robustness in distinguishing fine cracks. Secondly, exploring data augmentation techniques tailored to MLhBN regions could further improve the model's discriminatory capabilities. Additionally, investigating ensemble methods that combine the strengths of multiple models offers the potential for greater generalizability and accuracy. Finally, the insights gained from this study pave the way for applying the proposed machine learning approach to crack detection in other 2D materials and even complex heterostructures, further accelerating the characterization and development of these advanced materials.

Conclusions
This study demonstrates the significant potential of a YOLOv8n-based approach to reliable crack detection in CVD-grown hBN. The model's success in identifying cracks of varying complexities offers a valuable tool for quality control in hBN production, ensuring its suitability for cutting-edge devices. The observed precision-recall trade-off presents an opportunity for further refinement, with data augmentation and class weighting techniques holding promise for enhancing accuracy. This work advances the practical application of machine learning in 2D materials characterization, streamlining processes and facilitating the widespread adoption of hBN in advanced technologies. By addressing these challenges, machine learning-based characterization has the potential to revolutionize 2D materials development, accelerating the discovery and optimization of novel materials for cutting-edge applications.

Figure 1. Accelerating hBN characterization: A machine learning workflow for automated crack detection. Our comprehensive workflow integrates material synthesis, image acquisition, meticulous annotation, the fine-tuning of a YOLOv8n deep learning model, and rigorous evaluation to streamline hBN quality control and accelerate its use in advanced devices.

Figure 2. Architecture of the YOLOv8 model: backbone, neck, and head with component modules. Reprinted with permission from Ref. [59]. 2019, Springer Nature.

Figure 3. Multi-technique characterization of transferred hBN film: (a) The optical image reveals the overall morphology; (b) SEM highlights the edge and substrate topography, and the image was obtained with the Everhart-Thornley Detector (ETD) at 5 kV accelerating voltage; (c) AFM height image; (d) line scans quantify thickness variations; (e) Raman (crack regions); (f) Raman (hBN film).

Figure 4 demonstrates the YOLOv8n model's promising capabilities in crack detection. In the raw image shown in Figure 4a, the color contrast highlights various defects and hBN characteristics. Cracks appear distinctively as dark blue regions (purple arrows), while black regions (orange arrow) suggest the presence of residue, and brighter blue lines indicate wrinkles. Color variations also reveal thick (brighter) and thin hBN (lighter blue) areas. The model successfully identifies multiple cracks in Figure 4b, assigning bounding boxes and confidence scores. The zoomed-in views in Figure 4c,d demonstrate its ability to accurately identify both single and intersecting cracks, even those with finer details. This robust performance against variations in crack size, orientation, clarity, and

Figure 4. Crack detection with proposed algorithm. (a) Raw image. (b) Model-generated crack detections with bounding boxes and confidence scores. (c,d) Close-ups demonstrate the accurate identification of varying crack types. Scale bars, (a,b) 200 µm.

Figure 5. Model-generated crack detections and analysis. (a-f) Model output with bounding boxes and confidence scores. Analysis highlights strengths and weaknesses. Scale bars, (a-f) 200 µm.

Figure 6. Multi-metric analysis of crack-detection model performance: (a) Raw confusion matrix reveals true positives, false positives, and potential class imbalance, where N/A values reflect that only the crack class is used for testing and the background is irrelevant for evaluation; (b) F1-Confidence curve pinpoints optimal confidence threshold; (c) precision-recall curve maps precision decay as recall increases, highlighting the model's overall performance trade-offs.

Figure 7 offers insights into how the previously discussed precision-recall trade-off evolved during the model's training process. Analysis of Figure 7a-c reveals promising trends. The steady decline in both the training and validation classification losses demonstrates the model's progressive learning in detecting correct cracks. In contrast, the validation box loss and distribution focal loss (dfl) saturate after a certain number of epochs. This saturation suggests that the model's predicted bounding boxes deviate slightly from the annotated bounding boxes. Given our limited dataset (92 images), the model might not encounter enough examples to generalize effectively across broader real-world data variations.

Figure 7. Multi-metric evaluation of the YOLOv8n model's training progress. (a) Classification, (b) bounding box, and (c) distribution focal loss curves reveal steady improvement; (d) precision-recall curves demonstrate strong performance; (e,f) increasing mAP50 and mAP50-95 scores indicate improving object detection accuracy.

Despite the promising learning trends in classification loss, the plateau in validation loss on box and dfl, resulting in differences between training and validation loss, suggests limited generalizability due to the small dataset and potential data variations not fully represented in the training set. The precision-recall curves (Figure 7d), with consistently high values and minimal fluctuations, reinforce a favorable balance of precision and recall throughout training. Furthermore, rising mAP50 and mAP50-95 scores (Figure 7e,f) indicate improving crack-detection accuracy, particularly at an IoU threshold of 0.5. While performance at stricter IoU thresholds warrants further investigation, these results collectively show the model's progressive learning and success in crack detection. The results highlight the model's potential for automated crack-detection applications. Its ability to successfully detect diverse cracks, coupled with insights into the precision-recall balance, offers a strong foundation for optimization. Addressing the identified limitations would further bolster the model's reliability in practical crack inspection scenarios.
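The role of the IoU threshold in these metrics can be made concrete with a short sketch. The code below computes the intersection over union of a predicted and an annotated box, then lists which of the thresholds from 0.50 to 0.95 (in steps of 0.05, the range averaged by mAP50-95) the prediction would satisfy. The function names and example boxes are hypothetical, and this is not the evaluation code used by the training platform.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds_passed(predicted, annotated):
    """Return the IoU thresholds in 0.50..0.95 that this prediction meets."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    iou = box_iou(predicted, annotated)
    return [t for t in thresholds if iou >= t]
```

For example, a predicted box `(2, 0, 12, 10)` against an annotated box `(0, 0, 10, 10)` has an IoU of 2/3, so it counts as a correct detection at the 0.5 threshold but not at 0.7 or above. A slightly misplaced box on a thin crack can therefore score well on mAP50 while lowering mAP50-95.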

Table 1. Hyperparameter settings for optimizing the YOLOv8n model's performance on Supervisely.