1. Introduction
China is the world’s largest producer and exporter of ceramic tiles, so the quality of Chinese tiles affects the international building ceramics industry. The continuous development of the national economy has increased demand for ceramic tiles, prompting improvements in both manufacturing processes and equipment. Chinese ceramic tile enterprises have already automated production, processing, and other stages [1].
Ceramic tile production involves the following main steps. Clay, quartz, and other raw materials are ground in a ball mill into a uniform, fine slurry, which is then sieved to remove large particles and impurities and to improve its uniformity and quality. The slurry is then left to age so that its water and mineral components distribute evenly, which improves plasticity and stability. Next, the material is formed into a green body, dried, glazed, and printed, and then fired at high temperature to form the ceramic tile. Finally, the tile is polished and further processed. Tile defects arise mainly in the firing stage, where firing temperature, firing time, kiln humidity, kiln density, and similar factors readily cause surface defects. To meet requirements for both strength and surface quality, finished tiles must be graded and classified before entering the market. Although the production and processing of ceramic tiles in China has been automated, surface defect detection still relies on manual inspection. This approach depends on the high-intensity work of assembly-line workers, whose prolonged exposure to bright lighting leads to visual fatigue.
Furthermore, variability in inspectors’ subjective judgment reduces the efficiency and stability of the detection process, so the results often fail to meet industrial demands [2]. It is therefore of great theoretical and practical value to propose a ceramic tile defect detection algorithm with strong generalization ability and to apply it to automated tile defect detection equipment.
Over the past few years, researchers in China and abroad have made significant progress on algorithms for tile defect detection. Sameer Ahamad et al. [3] applied morphological operations to images and used fuzzy rules to classify tile defects. Putri et al. [4] developed a fuzzy-logic method for detecting defects in ceramics. Matic et al. [5] introduced a real-time algorithm for segmenting biscuit tiles that efficiently separates the tiles from the background; this approach has been implemented on a biscuit tile production line. Another study [6] combined a sliding filter with automatic region growing to divide the preprocessed tile defect image into two regions, removed spurious interference points with morphological operations, and extracted crack feature parameters to detect three-dimensional tile defect structures against complex backgrounds. Quan Xiaoxia et al. [7] used a detection algorithm based on local-variance-weighted information: first, the difference between the tile image under test and a reference (standard) tile image is computed to obtain a difference image; next, the local variance between pixels is used to extract a preliminary overall contour of the defective area; finally, the defective area is refined with weighted average information that emphasizes its details, yielding the precise defective region of the tile. Wang [8] proposed a single-stage tile surface defect detection algorithm based on YOLOv5s (You Only Look Once version 5 Small), which combines backbone feature extraction with attention mechanisms and employs depthwise separable convolution to strengthen the network’s ability to characterize small-target defects and improve detection accuracy. Cao et al. [9] introduced a YOLOv5-based balanced multi-scale target scoring network for detecting tile surface defects, which integrates content-aware feature reorganization and a dynamic attention mechanism to improve detection performance.
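The three-step difference-image idea described for [7] can be sketched as follows. This is only one possible reading of that description, not the cited authors' implementation: the function names, the window radius, the normalization of the variance map, and the threshold are all assumptions introduced for illustration, and images are plain nested lists of grayscale values so that the example has no dependencies.

```python
def abs_difference(test_img, ref_img):
    """Step 1: per-pixel absolute difference between test and reference tile."""
    return [[abs(t - r) for t, r in zip(t_row, r_row)]
            for t_row, r_row in zip(test_img, ref_img)]


def local_variance(img, radius=1):
    """Step 2: variance of each pixel's (2*radius+1)^2 neighbourhood, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(window) / len(window)
            out[y][x] = sum((v - mean) ** 2 for v in window) / len(window)
    return out


def defect_mask(test_img, ref_img, radius=1, threshold=30.0):
    """Step 3 (assumed form): weight the difference by normalized local variance, then threshold."""
    diff = abs_difference(test_img, ref_img)
    var = local_variance(diff, radius)
    vmax = max(max(row) for row in var)
    if vmax == 0:
        # Identical images: no pixel is flagged as defective.
        return [[0] * len(row) for row in diff]
    return [[1 if d * (v / vmax) > threshold else 0
             for d, v in zip(d_row, v_row)]
            for d_row, v_row in zip(diff, var)]


# Hypothetical 5x5 tile with a single bright spot in the centre.
reference = [[100] * 5 for _ in range(5)]
test = [row[:] for row in reference]
test[2][2] = 160
mask = defect_mask(test, reference)
```

Because the approach compares each tile against a reference image, it presupposes that the two images are aligned and share the same pattern, which is one reason learned detectors are attractive for tiles with complex textures.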
The methods above have achieved some success in detecting surface imperfections in tiles. However, because tiles are large, defects are small, backgrounds are complex, and the color difference between defects and background is minimal, the features of white-spot and dark-spot defects are easily missed during local feature extraction. In addition, defects that differ only slightly from the background tend to lose semantic information during feature fusion. Finally, automated production lines impose specific requirements on detection speed and hardware compatibility, so the number of parameters and the amount of computation should be reduced as far as possible while still meeting the accuracy requirements. Under these constraints, detecting small-target defects on tile surfaces remains challenging. To address these issues, this paper introduces a method for detecting tile surface defects based on an enhanced version of YOLOv8, with the following main contributions:
Because white point and dark point block defects are small and their local features are hard to extract, CAACSPELAN (Context Anchor Attention Network Cross-Stage Partial Efficient Layer Aggregation Network) is proposed as a replacement for C2f (the CSP bottleneck with two convolutions used in YOLOv8), and ODConv (Omni-Dimensional Dynamic Convolution) [10] is employed to replace part of the standard convolutions;
To address the loss of semantic information about defective targets during feature fusion, the neck network is redesigned to use CGRFPN (Context-Guided Spatial Feature Reconstruction for Feature Pyramid Networks), which fuses features across levels and scales so that the model can better understand and distinguish different types of defects;
The proposed MPNWD (Minimum Points Normalized Wasserstein Distance) loss function enables the model to detect tiny defects that are usually difficult to capture, thus improving overall detection accuracy and reliability.
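To illustrate why a Wasserstein-based measure helps with tiny defects, the sketch below computes the normalized Wasserstein distance (NWD) as it is commonly defined for tiny-object detection and compares it with IoU. It is not the MPNWD loss itself, whose full form is not given in this section. Boxes are assumed to be in (cx, cy, w, h) format, and the constant `c` is a dataset-dependent scale whose value here is chosen only for illustration.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    Each box is modelled as a 2-D Gaussian with mean (cx, cy) and covariance
    diag(w^2/4, h^2/4); the squared 2-Wasserstein distance between two such
    Gaussians then has the closed form computed below.
    c is an assumed, dataset-dependent normalization constant.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
                  + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Plain intersection over union for (cx, cy, w, h) boxes, for comparison."""
    ax1, ax2 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay1, ay2 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx1, bx2 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by1, by2 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 4x4-pixel defect: the prediction is shifted by 4 pixels.
target = (10.0, 10.0, 4.0, 4.0)
pred = (14.0, 10.0, 4.0, 4.0)
iou_value = iou(target, pred)   # boxes only touch, so the overlap is zero
nwd_value = nwd(target, pred)   # still strictly between 0 and 1
nwd_loss = 1.0 - nwd_value      # a loss term of the usual 1 - similarity form
```

For a defect only a few pixels wide, a shift of a few pixels drives IoU to zero and gives no indication of how far off the prediction is, whereas NWD decays smoothly with the offset and so still distinguishes near misses from far ones.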