Article

Detection of Pitting Corrosion in Stainless-Steel Sheet Pile Walls Using Deep Learning

1 Institute of Agriculture, Niigata University, 8050 2-no-cho, Ikarashi, Nishi-ku, Niigata 950-2181, Japan
2 Nippon Steel Metal Products Co., Ltd., 4-14-1 Sotokanda, Chiyoda-ku, Tokyo 101-0021, Japan
3 Graduate School of Science and Technology, Niigata University, 8050 2-no-cho, Ikarashi, Nishi-ku, Niigata 950-2181, Japan
4 Graduate School of Sciences and Technology for Innovation, Yamaguchi University, 1677-1 Yoshida, Yamaguchi 753-8511, Japan
* Author to whom correspondence should be addressed.
Corros. Mater. Degrad. 2026, 7(2), 23; https://doi.org/10.3390/cmd7020023
Submission received: 6 February 2026 / Revised: 20 March 2026 / Accepted: 2 April 2026 / Published: 7 April 2026

Abstract

This study proposes a new deep learning-based approach for detecting pitting corrosion on stainless-steel sheet pile surfaces in drainage channels. Conventional ultrasonic thickness measurement methods cannot detect microscopic pitting corrosion that occurs before measurable thickness reduction. The research develops an automated detection system using visible images captured with smartphone cameras and U-net semantic segmentation. Two stainless steel grades (SUS410 and SUS430) were exposed for 5 years to a brackish water environment and analyzed. The deep learning approach achieved F1-scores of 0.831 (SUS410) and 0.808 (SUS430), outperforming binary thresholding methods (F1-scores: 0.407 and 0.329, respectively). Data augmentation improved performance by 1–3 percentage points. The method enabled non-destructive, quantitative assessment of early-stage corrosion using readily available equipment, providing a practical tool for infrastructure maintenance and long-term durability evaluation.

1. Introduction

Stainless-steel sheet pile walls serve as an advanced successor to conventional steel sheet piles in waterways and are expected to extend the service life of facilities and reduce the burden of maintenance operations. Compared with ordinary steel sheet piles, stainless-steel sheet piles exhibit exceptionally high corrosion resistance; consequently, thickness reductions cannot be detected by the ultrasonic thickness measurement devices typically used in routine corrosion surveys for approximately 10 years after installation. Alternative evaluation methods for minute corrosion are therefore required. In this study, the detection of pitting corrosion occurring on the surface of stainless-steel sheet pile walls using visible images was attempted.
Pitting corrosion has been detected by visual inspection, metallographic examination, mass loss, pit depth measurement, and non-destructive testing [1]. Visual inspection is defined as an inspection to determine the location and degree of pitting corrosion under ambient light [2]. The appearance of pitting corrosion is typically recorded before and after removal of the corrosion products. The advantages of this method are that it requires no special equipment and is relatively inexpensive to implement; its challenges are real-time image processing and position tagging for continuous surveys [3]. In metallographic examination and pit depth measurement, indices of pit depth, radius, and density are measured [2]. Scanning white-light interference microscopy [4,5,6] and laser scanning [7,8,9] have been employed to acquire three-dimensional information, and three-dimensional detection accuracy and indices related to stress corrosion cracking have been investigated. Simple models of pit shapes (rectangular, cylindrical, conical, and hemispherical) are used [10,11]. Image-based diagnoses range from studies evaluating the features of pitting corrosion in images to those improving the accuracy of pitting corrosion detection and the estimation of shape indices. Visual inspection by experts has been criticized as labor-intensive and lacking objective evaluation criteria. Moreover, if all forms of corrosion could be captured in images, image-based methods could complement electrochemical methods in elucidating the progression of corrosion. Motivated by this, the features of pitting corrosion in images have been evaluated by color, texture, and shape [12]. As for other indices, the classification of pitting corrosion and cracks by frequency and fractal characteristics has been attempted for predicting crack formation caused by the initiation and growth of pitting corrosion [13].
As an image analysis method for pitting corrosion detection, the position and diameter of pits have been estimated using the Hough transform, a circle detection method [14]. Among the deep learning methods proposed in recent years, pitting corrosion and general corrosion detection in painted structural members [15] and pitting corrosion detection in gas pipelines [16] have been reported. In all cases, high-accuracy multi-class classification was performed using large-scale datasets. Few previous studies have targeted the detection of pitting corrosion on steel materials, and no cases have been reported for stainless-steel sheet pile walls sampled from drainage channels after long-term service. Fiber optic sensors have also been investigated as a promising approach for pitting corrosion detection. Long-period fiber grating (LPFG) sensors coated with an Fe-C film have been shown to enable quantitative measurement of corrosion-induced mass loss [17], and optical frequency domain reflectometry (OFDR)-based distributed fiber optic sensors have been reported to allow continuous and quantitative assessment of pit depth and corrosion rate along steel surfaces [18,19]. However, these methods require the physical installation of sensors on the target structure, which limits their applicability to existing infrastructure.
The technical issue is “the lack of established evaluation methods for stainless-steel sheet pile walls where thickness reduction cannot be observed within a period of several to ten years”. The requirement to address this issue is the “development of an indicator that shows the degradation of mechanical performance of stainless-steel sheet pile walls”. From the observation results of the stainless-steel sheet pile walls subjected to exposure tests, pitting corrosion was confirmed at a stage prior to thickness reduction. Therefore, the pitting corrosion area ratio of stainless-steel sheet pile walls is proposed as an indicator of performance degradation. In this study, a high-accuracy detection method for pitting corrosion using deep learning with visible images acquired by a simple smartphone-mounted camera was developed.

2. Analytical Procedures

2.1. Definition of Pitting Corrosion in Stainless-Steel Sheet Pile Walls

2.1.1. Definition of Pitting Corrosion in Image Information

Supervised learning requires annotation (ground truth labels); therefore, a clear definition of pitting corrosion in images is required before annotation work can be performed. ISO 8044 (2015) defines pitting corrosion as “corrosion that causes cavities extending from the metal surface toward the interior” [20]. According to Sugimoto (1981), pitting corrosion is “a corroded area with a large diameter/depth ratio that forms when part of the surface of a metal material dissolves at a higher rate than other parts for some reason” [21]. However, only the surface opening can be observed, and regardless of the pit shape, only the area of the opening can be evaluated in image analysis. As shown in Figure 1, pits form in various shapes and sizes, so it should be noted that the size of the opening does not necessarily reflect the degree of corrosion below the surface. There is no consensus among researchers regarding a quantitative definition of pitting corrosion; Table 1 summarizes the annotation methods of similar previous studies.
A technical challenge is that corrosion and pitting corrosion detection by image processing has not established a clear correspondence with physical meaning [23]. Although image processing using artificial intelligence, machine learning, and deep learning shows high detection performance, the “pitting corrosion” defined by humans in images tends to be assigned semantically based on expert experience rather than on quantitative criteria. In this study, since the depth direction from the surface cannot be directly evaluated in images, pitting corrosion in images is defined as “changes in luminance values due to cavities extending toward the interior” (Figure 2).
To supplement this definition with quantitative evidence, pixel-level statistics were computed from the annotated dataset. As shown in Figure 3a, the grayscale intensity distributions of pitting corrosion (mean ± SD: 146.5 ± 49.3) and non-pitting corrosion regions (129.1 ± 23.8) partially overlap, indicating that only intensity is insufficient for unambiguous discrimination. By contrast, Figure 3b shows that pitting regions exhibit substantially higher brightness gradient magnitudes (38.3 ± 23.5 pixel/pixel) than non-pitting regions (14.0 ± 9.1 pixel/pixel), corresponding to a 2.73 times difference. Similarly, Figure 3c shows that the local intensity standard deviation within a 5-pixel radius is 1.92 times higher in pitting regions (32.6 ± 10.6) than in non-pitting regions (16.9 ± 5.6). Kolmogorov–Smirnov tests confirm that all distributions differed significantly between the two regions (all p < 0.001). These differences reflect the physical nature of pitting, as light incidence on a cavity creates a dark center/bright halo pattern with a steep luminance transition at the pit boundary, resulting in higher local gradient and texture contrast compared to the surrounding intact surface.
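The region statistics above can be sketched with plain NumPy. This is a minimal illustration, not the paper's exact analysis code: the window here is a 5 × 5 box (radius 2) as a stand-in for the paper's 5-pixel-radius neighborhood, and the gradient is a simple central-difference magnitude.

```python
import numpy as np

def region_statistics(gray, mask, radius=2):
    """Compare intensity, gradient magnitude, and local standard deviation
    between pitting (mask == True) and non-pitting regions of a grayscale
    image. Returns {metric: {region: (mean, std)}}."""
    gy, gx = np.gradient(gray.astype(float))
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)

    # Local standard deviation over a (2*radius+1)^2 sliding window,
    # with reflect padding so the output matches the input size.
    k = 2 * radius + 1
    pad = np.pad(gray.astype(float), radius, mode="reflect")
    win = np.lib.stride_tricks.sliding_window_view(pad, (k, k))
    local_std = win.std(axis=(-2, -1))

    stats = {}
    for name, arr in [("intensity", gray.astype(float)),
                      ("gradient", grad_mag),
                      ("local_std", local_std)]:
        stats[name] = {
            "pitting": (arr[mask].mean(), arr[mask].std()),
            "non_pitting": (arr[~mask].mean(), arr[~mask].std()),
        }
    return stats
```

On a synthetic image with a dark cavity in a flat background, the pitting region shows markedly higher gradient and local-std means, mirroring the contrast reported in Figure 3.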
Annotation data comprises array data of the same size as the visible image, with each pixel assigned as either 0 (non-pitting area) or 255 (pitting area) (Figure 4). By using dedicated annotation tools and by introducing ground truth, efficient and high-precision annotation becomes possible. It has been reported that the quality of ground truth data by annotation significantly affects the model quality [24]. It has even been pointed out that the data quality is more important than the algorithm itself during accuracy verification, so sufficient care must be taken to create high-quality annotation data.

2.1.2. Task Definition

To define specific tasks, it is necessary to consider the characteristics of the available training dataset. In this study, visible images, which can be collected easily, are used as input data, and annotation data are manually created from them. The deep learning method was selected based on the amount of available data, the required accuracy, and implementation feasibility. The conditions are as follows: only two region-of-interest images are available, the required accuracy must exceed that of conventional image processing methods, and the deep learning model must be practically implementable. The original photographs captured on-site contained non-uniform surface conditions, including areas with surface contamination and structural irregularities unsuitable for corrosion analysis; therefore, representative regions of approximately 60 mm × 270 mm were manually extracted from each original photograph, with one region per steel grade (SUS410 and SUS430), focusing on the exposed surfaces with confirmed pitting corrosion. These two cropped images serve as the basis for all subsequent deep learning procedures. The limited number of available original images reflects the restricted availability of stainless-steel sheet pile installations with sufficient service history to exhibit detectable pitting corrosion under real-world conditions. The accuracy of deep learning has improved in various domains such as image recognition, natural language processing, video analysis, and speech recognition, bringing innovation to machine learning [25]. This success is due not only to the ongoing development of methods but also to the preparation of large-scale datasets and the falling cost of GPUs that enable high-speed numerical computation. Because there are several basic tasks (Figure 5 and Table 2) for deep learning on images, selecting the optimal task for the purpose is important.
In this study, image segmentation (semantic segmentation) was adopted with the goal of calculating the pitting corrosion area ratio per image. Image classification assigns a single label to the entire image and provides no spatial information about the location or the extent of corrosion, making it unsuitable for area ratio calculations. Object detection localizes target regions using bounding boxes; however, bounding boxes enclose both corroded and non-corroded pixels within the rectangular boundary, resulting in a systematic overestimation of the corrosion area. Pixel-level delineation through image segmentation is therefore required. Semantic segmentation classifies each pixel into semantic categories, whereas instance segmentation identifies individual instances of objects in addition to category classification. Task selection is determined based on the necessity of individual identification, the necessity of handling overlapping objects, and the importance of overall area understanding. Semantic segmentation is adopted in this study because individual identification of each pit is not necessary, and the calculation of the total pitting corrosion area ratio based on ASTM standards [2] is the objective. According to ASTM G46 [2], the evaluation index is defined as the fraction of the surface area occupied by pitting corrosion. Since this index requires only the total corroded pixel area rather than the identification of individual pits, the additional computational overhead of instance segmentation provides no analytical benefit for the present purpose. For semantic segmentation, many methods have been proposed, including FCN [26], SegNet [27], U-net [28], and DeepLab [29]. U-net, which can achieve high-accuracy detection even with a limited dataset, was adopted. 
Based on these technical challenges, requirements, available data, and model selection, the task in this study was defined as “pitting corrosion detection of stainless-steel sheet pile walls by U-net using visible images”. This task enables the calculation of the ASTM-based evaluation indices of pitting corrosion density and area [2]: the pit density is the number of corrosion pits observed per unit of specimen surface, and the pitting area ratio is the ratio of the corroded area to the total surface area (%). These two indices represent fundamental quantitative measures for characterizing pitting corrosion severity within the ASTM G46 framework.
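Both indices can be computed directly from a predicted binary mask. The following is a minimal sketch (not the study's code): pits are counted as 4-connected components via a simple flood fill, and the area ratio is the fraction of pitting pixels.

```python
import numpy as np
from collections import deque

def pitting_indices(mask):
    """ASTM G46-style indices from a binary mask (255 = pitting):
    returns (pit count as 4-connected components, pitting area ratio in %)."""
    pit = mask == 255
    h, w = pit.shape
    seen = np.zeros_like(pit, dtype=bool)
    count = 0
    for i in range(h):
        for j in range(w):
            if pit[i, j] and not seen[i, j]:
                count += 1                      # new pit found; flood fill it
                q = deque([(i, j)])
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and pit[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
    area_ratio = 100.0 * pit.sum() / pit.size
    return count, area_ratio
```

Note that only the area ratio is needed for the semantic segmentation output; the component count is included to show how pit density would follow from the same mask.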

2.2. Deep Learning Workflow

2.2.1. Annotation

The annotation work was performed using dedicated image annotation software [30]. The annotation process began by loading visible images. Pitting corrosion areas were then manually identified and marked through careful visual inspection. Following this identification process, binary mask images were created with each pixel assigned either 0 for non-pitting areas or 255 for pitting areas. Finally, quality checks and verification of annotations were conducted to ensure accuracy and consistency.

2.2.2. Preprocessing (Data Splitting)

In supervised learning, the dataset is divided into three parts: training data, validation data, and test data. This division is necessary since the development of the model always requires settings to be tuned [31]. In addition to model weights, there is a need to select setting values called hyperparameters, such as the number of layers and layer sizes. In hyperparameter tuning, model performance on the validation data is used as a feedback signal to search for the optimal combination in the parameter space. If the model is evaluated only with validation data without preparing test data separately, performance on unknown data may be lower than expected due to information leakage; therefore, the final performance of the model is evaluated with test data, which is a completely unknown dataset.
The hold-out method is a basic splitting approach, but its performance estimates can be highly sensitive to the choice of split when the dataset is small. k-fold cross-validation mitigates this issue by averaging performance over k different splits, under an assumption of spatial independence between folds that may be violated when patches are derived from spatially contiguous regions of the same image. In this study, patches from the last row of each image were reserved as a fixed test set to ensure spatial separation from the training pool. To account for the variability introduced by random partitioning of the training pool, the entire pipeline was repeated 100 times with independent random seeds, and the distribution of F1-scores is presented as box plots.
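The splitting scheme above can be sketched as follows. This is an illustrative implementation under stated assumptions (non-overlapping patches on a regular grid, a hypothetical validation fraction); the paper's exact tiling parameters may differ.

```python
import random

def split_patches(image, patch=128, seed=0, val_frac=0.2):
    """Tile an image into non-overlapping patches on a grid, reserve the
    bottom row of patches as a fixed, spatially separated test set, and
    randomly split the remaining pool into training and validation sets."""
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    test = [(rows - 1, c) for c in range(cols)]                  # fixed test row
    pool = [(r, c) for r in range(rows - 1) for c in range(cols)]
    random.Random(seed).shuffle(pool)                            # seed-dependent split
    n_val = int(len(pool) * val_frac)
    val, train = pool[:n_val], pool[n_val:]
    crop = lambda rc: image[rc[0] * patch:(rc[0] + 1) * patch,
                            rc[1] * patch:(rc[1] + 1) * patch]
    return [crop(rc) for rc in train], [crop(rc) for rc in val], [crop(rc) for rc in test]
```

Repeating the pipeline with `seed = 0 … 99` reproduces the 100-run variability analysis: the test row stays fixed while the train/validation partition changes with each seed.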
For data augmentation to improve generalization performance, vertical flip and blur were applied using the Albumentations library [32] (Figure 6). These two operations were selected from a broader set of candidate augmentations, including horizontal flip, rotation, contrast transformation, and cropping, based on their contribution to improving the F1-score during the optimization process. The application probability of each operation was treated as a hyperparameter and optimized.
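For readers without Albumentations installed, the two selected operations reduce to simple array transforms. The sketch below shows dependency-free NumPy equivalents of a vertical flip and a small box blur; the study itself used the Albumentations implementations, whose blur kernel and parameters may differ.

```python
import numpy as np

def vertical_flip(img):
    """Flip the image top-to-bottom (equivalent of a VerticalFlip transform)."""
    return img[::-1].copy()

def box_blur(img, k=3):
    """Simple k x k box blur via a separable moving average (a stand-in for
    a Blur transform); edges use reflect padding so the size is preserved."""
    pad = k // 2
    out = np.pad(img.astype(float),
                 ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2),
                 mode="reflect")
    kernel = np.ones(k) / k
    # horizontal then vertical pass of the moving average
    out = np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="valid"), 1, out)
    out = np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="valid"), 0, out)
    return out
```

In the actual pipeline, each transform fires with an optimized probability per patch rather than being applied deterministically.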

2.2.3. Model

The deep learning model used in this study is U-net [28], a type of Fully Convolutional Network (FCN). FCNs are neural networks consisting only of convolutional and pooling layers, without fully connected layers. U-net is widely used as a basic semantic segmentation model. Deep learning is a mechanism that maps input values, such as images, to predicted values, such as labels (Figure 7). The processing that a layer performs on input data is stored in the weights of that layer. Learning consists of finding the optimal values of the weights in all layers of the network so that the inputs are accurately mapped to the corresponding target values. To control something, it must first be observable; that is, to control the output of a model, it is necessary to measure how much the output deviates from the expected value. This measurement is performed by the model’s loss function. The loss function is calculated from the model’s predicted values and the ground truth values, which then quantifies the model’s performance for that sample. The basic principle of deep learning is to use this loss value as a feedback signal to fine-tune the weight values. The weights are adjusted in the direction that reduces the loss value for the current sample, and the optimizer is responsible for executing this adjustment [31].
The U-net architecture consists of an encoder (contracting path) and a decoder (expansive path), as shown in Figure 8. The encoder performs feature extraction through repeated application of convolutions (Conv), Rectified Linear Unit activation (ReLU), batch normalization, and max pooling (2 × 2) for downsampling. The decoder performs upsampling through up-convolution (2 × 2) operations, followed by concatenation with the corresponding encoder features via skip connections and repeated convolutions. These skip connections establish direct connections from encoder to decoder, transmitting both local and global information while helping to preserve spatial information that may be lost during downsampling. The model takes input images of 128 × 128 pixels in size with 3 channels (RGB) and outputs binary segmentation masks with 1 channel. The loss function used is binary cross-entropy, and the optimizer is Adam [33].
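The binary cross-entropy loss named above has a compact closed form. The following NumPy sketch reproduces it for checking intermediate values; framework implementations (Keras, PyTorch) compute the same quantity with additional numerical safeguards.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Pixel-wise binary cross-entropy, the loss used to train the U-net.
    y_true holds 0/1 labels; y_pred holds sigmoid outputs in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))
```

A maximally uncertain prediction (all 0.5) yields ln 2 ≈ 0.693 per pixel, while a near-perfect prediction drives the loss toward zero; the Adam optimizer moves the weights to shrink this value.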

2.2.4. Hyperparameter Tuning

In hyperparameter tuning, the search space grows as the number of parameters to be adjusted increases. Intuition about which hyperparameters to adjust and what values to set is usually gained through experience. Because this process has a significant impact on performance and consumes a great deal of time, technologies for the automatic optimization of hyperparameters are being studied. The optimization procedure is essentially the same for any method: set a combination of hyperparameters and build the corresponding model; train on the training data and evaluate performance on the validation data; select the next combination of hyperparameters to train and evaluate; and repeat until the combination that maximizes validation performance is found. Basic optimization methods include grid search, random search, and Bayesian optimization (Figure 9). Grid search tries all combinations of the parameters and adopts the combination with the best value of the objective function; it is effective when the search range is small and the objective function can be computed quickly [34]. Random search randomly selects values from the candidate range of each parameter and tries them. Grid search and random search are computationally fast because they do not model the objective function; however, they do not leverage information from previous trials to guide the search, so their efficiency degrades as the search space expands with the number of hyperparameters. In contrast, Bayesian optimization constructs a probabilistic surrogate model that is updated after each trial, enabling the algorithm to focus evaluations on promising regions of the search space. Although each iteration incurs additional computational overhead, this approach identifies high-performing configurations with substantially fewer trials than exhaustive or random sampling.
Given that each trial in this study requires full model training, which is computationally expensive, Bayesian optimization was selected to balance search efficiency with the cost of functional evaluation. The optimization was implemented using Optuna [35], with the Tree-structured Parzen Estimator (TPE) as the surrogate model and Expected Improvement (EI) as the acquisition function.
As shown in Figure 9d, it was confirmed that this combination efficiently minimized the objective value during the search. The tuned hyperparameters included the image size (ranging from 32 × 32 to 512 × 512 pixels), the number of filters, the kernel size, the batch size, and the maximum number of epochs.
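All three search strategies share the same trial loop described above; only the "suggest" step differs. The skeleton below illustrates that loop with random search (Optuna's TPE sampler would replace the uniform draw with a surrogate-guided one); the search space and the toy objective, which peaks at hypothetical values batch_size = 16 and patch = 128, are illustrative only.

```python
import random

def tune(objective, space, n_trials=50, seed=0):
    """Generic hyperparameter trial loop. `space` maps parameter names to
    lists of candidate values; `objective` returns a score to maximize
    (e.g. validation F1). Random search is used for the suggest step."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in space.items()}   # suggest
        score = objective(params)                               # train + validate
        if score > best_score:                                  # keep the best trial
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective with a single optimum (hypothetical values for illustration).
space = {"batch_size": [4, 8, 16, 32], "patch": [64, 128, 256]}
objective = lambda p: -abs(p["batch_size"] - 16) / 32 - abs(p["patch"] - 128) / 256
```

In the study itself the equivalent loop is driven by Optuna with a TPE surrogate and Expected Improvement acquisition, so far fewer full training runs are needed per good configuration.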

2.2.5. Evaluation Metrics

When adjusting the model, attention must be paid to overfitting and generalization. Overfitting is a phenomenon in which the model shows high performance during training, but its performance decreases on unknown data. Generalization refers to the performance of a trained model on unknown data, and improving generalization performance is a major goal in machine learning. Figure 10 shows the relationship between training time and loss value. During the model adjustment process, the loss value on the validation data initially decreases as training progresses, but after reaching a minimum at a certain point, it tends to increase again. Immediately after the start of training, as the loss value on the training data decreases, the loss value on the validation data also decreases. At this stage, learning is insufficient, and there is still room for improvement in the model (underfitting); however, once learning on the training data exceeds a certain number of epochs, generalization performance no longer improves, and the loss value on the validation data begins to increase, indicating that overfitting has begun. An overfitted model has learned patterns and noise specific to the training data, resulting in decreased prediction performance on unknown data. Identifying the point at which the model achieves maximum generalization performance, that is, the boundary between underfitting and overfitting, is one of the most important factors for maximizing the final model performance. To find this point, methods such as early stopping, which periodically evaluates the validation data during training and stops training when the loss value is minimized, are effective.
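Early stopping with a patience window can be sketched in a few lines. This is a generic illustration (the patience value is an assumption, not taken from the paper):

```python
def early_stopping(val_losses, patience=5):
    """Return the epoch whose weights would be restored: training stops once
    the validation loss has failed to improve for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch      # new minimum on validation data
        elif epoch - best_epoch >= patience:
            break                               # no improvement for `patience` epochs
    return best_epoch
```

Applied to a typical U-shaped validation curve, the function returns the epoch at the underfitting/overfitting boundary described above.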
Performance evaluation methods for classification tasks include the confusion matrix, accuracy, precision, recall, and F1-score. The confusion matrix comprises the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) between the annotation data and the prediction data. When the target is pitting corrosion, if the annotation class is pitting and the prediction is also pitting, the pixel is a true positive; if the prediction is not pitting, it is a false negative. When the annotation class is not pitting, a prediction of pitting is a false positive, and a prediction of not pitting is a true negative. The confusion matrix is used to examine the classification results; however, concise indices are needed to compare multiple models. For model detection performance in accuracy verification, the F1-score is used to account for data imbalance. The F1-score is the harmonic mean of precision and recall, providing a balanced evaluation metric that is especially useful when the class distribution is imbalanced.
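These definitions translate directly into a pixel-wise computation on the two masks. A minimal sketch (equivalent metrics are available in libraries such as scikit-learn):

```python
import numpy as np

def f1_score(annotation, prediction):
    """Precision, recall, and F1 for binary masks where 255 marks pitting."""
    a, p = annotation == 255, prediction == 255
    tp = np.sum(a & p)        # pitting predicted as pitting
    fp = np.sum(~a & p)       # non-pitting predicted as pitting
    fn = np.sum(a & ~p)       # pitting missed by the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because pitting pixels are a small minority of each image, the F1-score is far more informative here than accuracy, which a model could inflate simply by predicting "non-pitting" everywhere.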
To evaluate the performance of the deep learning approach, a conventional image processing method based on binary thresholding was used as a control group. The binary thresholding method begins by converting RGB images to grayscale, followed by applying threshold processing with separate ranges for low luminance (0–120) and high luminance (120–255). This process creates 25 binary images for the low luminance range and 28 images for the high luminance range. These binary images are then synthesized in 700 different combinations, and the optimal images are extracted for each block. Finally, accuracy verification is performed to select the combination with the maximum F1-score. This conventional method serves as a baseline to demonstrate the improvement achieved by the deep learning approach. The comparison includes accuracy, precision, recall, and F1-score metrics for both methods.
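The core of the thresholding baseline can be sketched as follows. The threshold step size and the use of logical OR to combine one low-range and one high-range mask are assumptions for illustration; the paper's 700 combinations imply a richer combination rule.

```python
import numpy as np

def threshold_masks(gray, step=5):
    """Candidate binary masks from the two luminance ranges used by the
    baseline: low thresholds (gray <= t, t in 0-120) flag dark pit centers,
    high thresholds (gray >= t, t in 120-255) flag bright halos. Each
    low/high pair is merged here with logical OR; the step size is an
    assumed value for illustration."""
    low = [gray <= t for t in range(step, 121, step)]
    high = [gray >= t for t in range(120, 256, step)]
    return [lo | hi for lo in low for hi in high]
```

Accuracy verification then scores every candidate mask against the annotation and keeps the combination with the maximum F1-score, which serves as the baseline value reported for each steel grade.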

3. Target Structure and Measurement Methods

3.1. Environmental Conditions and Steel Grades

3.1.1. Site Overview

The target stainless-steel sheet pile installation site is the O drainage channel in Niigata Prefecture, Japan. This site represents an agricultural drainage channel where the revetment steel sheet piles of an operational drainage channel were replaced with stainless-steel sheet piles, and approximately 5 years have elapsed since their installation. The installation setup is shown in Figure 11.
The O drainage channel is located in a brackish water environment with relatively high chloride ion concentrations. The drainage channel serves agricultural water management purposes and experiences significant water level fluctuations throughout the irrigation season. The water quality conditions at the test site are presented in Table 3. The chloride ion concentration of 120 mg/L, recorded as a single measurement on August 28, 2023, falls within the brackish water classification range (50–500 ppm) and is considerably higher than typical freshwater environments, suggesting a substantially more aggressive corrosive environment. This chloride concentration increases electrical conductivity and extends the wetting time, resulting in a more severe corrosive environment. Previous studies have indicated that the annual corrosion rate for ordinary steel in this environment is approximately 0.15 mm (150 μm), which is more than twice that observed in freshwater environments [36].
The stainless-steel sheet pile materials examined in this study consisted of two grades with distinct metallurgical characteristics and corrosion-resistance properties. The first material, SUS410, is a martensitic stainless steel with a Pitting Index (PI) value of 11. Its chemical composition is of the Fe-Cr type, with approximately 12% chromium. This material was fabricated as a lightweight steel sheet pile of the LSP3D type with a plate thickness of 5 mm. The second material, SUS430, is a ferritic stainless steel exhibiting a higher Pitting Index value of 16. Its chemical composition is likewise of the Fe-Cr type, but with a higher chromium content of approximately 17%. The Pitting Index (PI) is calculated using the formula
PI = Cr + 3.3 × Mo + 16 × N,
where Cr, Mo, and N are the weight percentages of chromium, molybdenum, and nitrogen in the alloy composition. This index provides a quantitative measure of a stainless steel’s resistance to pitting corrosion, with higher values indicating superior resistance. The higher PI value of SUS430 (16, compared with 11 for SUS410) indicates superior pitting corrosion resistance, attributable primarily to its higher chromium content. The additional 5% chromium in SUS430 enhances the stability and protective quality of the passive oxide film that forms on the steel surface, providing greater resistance to breakdown and pit initiation in chloride-containing environments.
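The formula is a one-liner in code. The compositions in the usage comment are hypothetical round numbers for illustration, not the certified compositions of the tested coils; for plain Fe-Cr grades such as SUS410 and SUS430, Mo and N are negligible, so PI is essentially the chromium content.

```python
def pitting_index(cr, mo=0.0, n=0.0):
    """Pitting Index: PI = Cr + 3.3*Mo + 16*N (element contents in wt%)."""
    return cr + 3.3 * mo + 16.0 * n

# e.g. a Cr-only grade: pitting_index(17.0)
# e.g. a hypothetical Mo/N-alloyed grade: pitting_index(18.0, mo=2.5, n=0.14)
```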
Both materials exhibit high corrosion resistance due to their passive films, with corrosion primarily manifesting as localized corrosion, such as pitting corrosion and crevice corrosion, rather than the uniform corrosion observed in ordinary carbon steels. This localized corrosion behavior is characteristic of stainless steels, where the passive film provides excellent general protection but remains vulnerable to breakdown at specific sites under aggressive conditions, such as high chloride concentrations. For comparison purposes, ordinary steel (SS400) sheet piles were also installed at the site alongside the stainless-steel sheet piles (Figure 12). After 5 years of exposure to the brackish water environment, the ordinary steel showed significant general corrosion with floating rust formation, particularly in the water level fluctuation zone where alternating wet–dry cycles and oxygen availability accelerated the corrosion rate. In contrast, the stainless-steel sheet piles maintained relatively sound conditions with only microscopic pitting corrosion. This dramatic difference in corrosion behavior validates the superior performance of stainless-steel materials in aggressive aqueous environments and demonstrates the practical benefits of using corrosion-resistant alloys in agricultural water management infrastructure, despite their higher initial material costs.

3.1.2. Preliminary Investigation by Prior Measurements

Prior to the image-based analysis, exposure tests were conducted to evaluate the corrosion resistance of stainless-steel sheet piles under actual service conditions. The exposure test involved installing cold-formed steel sheet piles in the operational environment and investigating their corrosion behavior over time. Thickness measurements were performed using ultrasonic thickness gauges. The measurement process involved constructing a temporary cofferdam around the target revetment; excavating the soil layer approximately 50 cm below the surface; drying the steel sheet pile surfaces; removing adhered dirt by water washing; and conducting thickness measurements using an Olympus 38DL digital ultrasonic thickness gauge (Olympus Corporation) equipped with a D798-J probe, in accordance with JIS Z 2355-1:2016 [37]. The measurement resolution of the instrument was 0.01 mm, which constitutes the minimum detectable thickness change under the applied measurement conditions. For stainless-steel sheet piles (both SUS410 and SUS430), no systematic general thickness reduction exceeding the detection limit of 0.01 mm was observed after 5 years of exposure (Figure 13), indicating that the uniform corrosion rate was less than 0.002 mm/year. This is consistent with the characteristic corrosion behavior of stainless steel, in which the passive film suppresses uniform dissolution while leaving the material susceptible to localized pitting corrosion.
However, visual inspection after surface cleaning revealed important findings. At 1 year after installation, the surface exhibited a smooth, silver, glossy appearance after removal of the adhered mud and iron particles, with no corrosion visible on inspection. At 3 years after installation, the surface showed slightly reduced gloss and minor pitting corrosion was confirmed, though the overall condition remained sound. At 5 years after installation, brown discoloration was observed in the water level fluctuation zone, and, after cleaning, fine pitting corrosion was confirmed on the surface (Figure 14). The observed corrosion pits were hemispherical in shape, with both diameter and depth less than 1 mm, and their distribution was concentrated mainly in the water level fluctuation zone. A clear difference between steel grades was observed: SUS410 showed 137 pits per 50 mm × 50 mm area, whereas SUS430 showed 24 pits over the same area. The exposure test results demonstrated a critical limitation of conventional ultrasonic thickness measurement: microscopic pitting corrosion that occurs as a precursor to thickness reduction cannot be detected by ultrasonic thickness gauges. This finding establishes the necessity for image-based evaluation methods that can detect microscopic pitting corrosion at these early stages, quantify the pitting corrosion density and area, and provide evaluation indices for performance degradation before measurable thickness reduction occurs.

3.1.3. Surface Cleaning Before Image Acquisition

To enable proper surface inspection and image acquisition, the following preparatory work was conducted, similar to that performed for the ultrasonic thickness measurements. First, a temporary cofferdam was constructed around the target revetment section to enable dewatering and access to the steel sheet pile surface. Then, the soil layer in the subsurface zone was excavated to a depth of approximately 50 cm to expose the steel sheet pile surface in the water level fluctuation zone, where pitting corrosion had been confirmed. After dewatering, the steel sheet pile surface was allowed to dry completely to enable proper visual inspection and image acquisition.

3.1.4. Image Acquisition and Imaging Device

The target position for photography was the surface of the stainless-steel sheet pile in the water level fluctuation zone where the occurrence of pitting corrosion had been confirmed. This zone represents the most aggressive corrosion environment due to its alternating wet–dry cycles, oxygen concentration cells, highest chloride ion exposure, and maximum corrosion rate location. Photography was conducted under natural daylight without artificial lighting equipment. This approach was chosen to simplify field measurement requirements, as it eliminates the need for specialized lighting apparatus that may be impractical to deploy in confined revetment inspection environments. It should be noted that natural light involves variability due to factors such as the time of day and weather conditions; therefore, the systematic characterization of illumination effects on detection performance remains a subject for future investigation. Images were captured at an approximate distance of 50 cm from the steel sheet pile surface. This distance provides an adequate field of view for pitting corrosion detection, sufficient spatial resolution for microscopic features, practical working distance for field inspections, and a consistent scale across different measurement locations.
A 12-megapixel digital camera mounted on a smartphone was used as the imaging device. The selection of a smartphone camera was intentional, to demonstrate that high-accuracy pitting corrosion detection can be achieved with readily available equipment; that specialized inspection equipment is not required; that the method can be easily implemented in practical field inspections; and that cost-effective solutions are feasible for routine monitoring. The captured images featured a high resolution suitable for detecting sub-millimeter features; a wide aperture (f/1.8) admitting sufficient light; a fast shutter speed (1/448 s) minimizing motion blur; a low ISO sensitivity (32) minimizing image noise; and a standard RGB format compatible with deep learning frameworks. The images used for analysis consisted of two images of the stainless-steel sheet pile after 5 years of exposure (Figure 15). From the photographed images, regions of approximately 60 mm × 270 mm were extracted at locations where the surface was exposed. This extraction process focused on representative areas with confirmed pitting corrosion, provided sufficient data for training and validation, and maintained a consistent scale and resolution. The extracted regions were used for deep learning-based pitting corrosion detection. The images were further divided into smaller patches (32 × 32, 64 × 64, 128 × 128, 256 × 256, or 512 × 512 pixels) for training and evaluation. Data segmentation into training, validation, and test sets was performed at the patch level, not at the original image level. After dividing each original image into patches, the patches were spatially partitioned: patches from the lower region of each image were designated as the test set, while the remaining patches were used for training and validation, as illustrated in Figure 16.
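The patch subdivision and spatial train/test partitioning described above can be sketched as follows (a minimal illustration; the image dimensions and the use of the lowest patch row as the held-out region are assumptions for the example):

```python
import numpy as np

def split_into_patches(img, size=128):
    """Divide an image into non-overlapping size x size patches,
    discarding any remainder at the right/bottom edges."""
    h, w = img.shape[:2]
    patches = []
    for row, y in enumerate(range(0, h - size + 1, size)):
        for x in range(0, w - size + 1, size):
            patches.append((row, img[y:y + size, x:x + size]))
    return patches

def spatial_split(patches, n_rows, test_rows=1):
    """Hold out the lowest `test_rows` patch rows as the test set;
    the remaining patches go to training/validation."""
    test = [p for row, p in patches if row >= n_rows - test_rows]
    trainval = [p for row, p in patches if row < n_rows - test_rows]
    return trainval, test

img = np.zeros((640, 384, 3), dtype=np.uint8)  # dummy RGB image
patches = split_into_patches(img, 128)         # 5 rows x 3 cols = 15 patches
trainval, test = spatial_split(patches, n_rows=5)
print(len(trainval), len(test))                # 12 3
```

Splitting at the patch level while holding out a contiguous spatial region, as done here, avoids leakage of near-identical neighboring patches between the training and test sets.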

4. Results and Discussion

4.1. Dataset Configuration and Model Optimization

Preliminary analyses with the epoch number fixed at 10, varying the pixel dimensions of the subdivided images, were conducted to determine the optimal image division size for deep learning-based pitting detection (Figure 16). Figure 17 presents the distribution of F1-scores across the different image sizes using box-and-whisker plots. The boxes represent the interquartile range (Q1 to Q3); the whiskers indicate the minimum and maximum values (excluding outliers); the central line shows the median; and × marks represent mean values. The analysis revealed that images subdivided into 128 × 128 pixels exhibited the highest F1-scores with minimal distribution variance, demonstrating both the highest accuracy and the greatest stability. Consequently, 128 × 128 pixels was adopted as the standard image subdivision size for subsequent analyses, as this configuration provided an optimal balance between detection performance and computational efficiency for our dataset and model architecture. The epoch number was fixed at 10 in this preliminary screening to ensure comparison across image sizes under identical training conditions; including the epoch count as an additional variable at this stage would substantially increase the computational cost per trial, making an exhaustive exploration of the image-size space impractical. This screening served to narrow the search space to a tractable range before proceeding to the joint optimization of the remaining hyperparameters.
Using the 128 × 128 pixel image subdivision, 100 trials were performed to evaluate the robustness of our deep learning approach. In this optimization stage, the training epoch count was treated as a free hyperparameter and jointly optimized alongside the network channel depth, kernel size, batch size, and data augmentation parameters using Bayesian optimization with the Tree-structured Parzen Estimator (TPE) as the surrogate model. This joint optimization addresses the concern that the epoch count was fixed in the preliminary image-size screening and ensures that the final model configuration reflects the optimal combination of all relevant hyperparameters. The F1-scores showed a mean value of 0.753, a median of 0.760, a maximum of 0.816, and a minimum of 0.636. The relatively small standard deviation of 0.034 indicates consistent and stable detection performance across multiple iterations. To ensure reproducibility across all 100 trials, model weight initialization was fixed using a random seed. Each trial was assigned a unique combination of hyperparameters by the Bayesian optimizer, and the augmentation behavior was fully determined by the sampled probabilities for blur and vertical flip, introducing no additional uncontrolled stochasticity. Figure 18 illustrates the detection results for pitting corrosion on SUS410 stainless-steel sheet pile surfaces using U-net, while Figure 19 presents the corresponding results for SUS430. Each figure displays four images in vertical sequence: (a) the original RGB image; (b) the ground truth annotation image; (c) the predicted image generated by deep learning; and (d) an overlay image combining ground truth and predictions for accuracy assessment.
The overlay images in both Figure 18 and Figure 19 prominently display green labels representing true positives (TP), indicating accurate pitting corrosion detection for both SUS410 and SUS430. Conversely, blue labels (false negatives, FN) and red labels (false positives, FP) appear minimally, demonstrating that detection omissions and misclassifications were limited. This visual assessment confirms the high accuracy of the deep learning-based detection approach. Table 4 and Table 5 present the quantitative performance metrics derived from the confusion matrices of SUS410 and SUS430, respectively. Given the highly imbalanced nature of pitting corrosion detection, where pitting corrosion regions occupy only a small fraction of the total image area, the F1-score serves as the most appropriate evaluation metric. The precision values were approximately 0.77 for both steel grades, indicating that about 77% of the detected pitting locations were correctly identified, thereby demonstrating relatively low false positive rates. The recall values showed SUS410 at 0.903 and SUS430 at 0.848, with SUS410 demonstrating superior performance. This higher recall for SUS410 indicates that the model successfully detected the majority of actual pitting occurrences, resulting in fewer false negatives. The F1-scores, representing the harmonic mean of precision and recall, exceeded 0.8 for both materials (0.831 for SUS410; 0.808 for SUS430), demonstrating a well-balanced detection performance. The slightly higher F1-score for SUS410 suggests that the model maintained both precision and recall at relatively high levels for this steel grade, effectively capturing correct pitting locations with minimal omissions and false alarms. This performance difference may be attributed to the fact that SUS410 exhibited a higher frequency of pitting corrosion occurrence, resulting in a larger quantity of pitting examples in the training dataset. 
As shown in Table 6, SUS410 exhibited 1134 corrosion pits with an area rate of 3.573%, whereas SUS430 showed only 314 corrosion pits with an area rate of 0.714%. The greater availability of pitting corrosion data for SUS410 contributed to improved model learning and consequently enhanced detection performance for this steel grade.
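The reported metrics follow directly from the pixel-level confusion matrix. A minimal computation is shown below; the confusion-matrix counts are invented for illustration, chosen so that the resulting values match the SUS410 figures quoted above:

```python
def segmentation_metrics(tp, fp, fn):
    """Pixel-level precision, recall, and F1 from confusion-matrix counts.
    True negatives are ignored: in heavily imbalanced segmentation the
    background dominates, which is why F1 (not accuracy) is reported."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only (assumed for the example)
p, r, f1 = segmentation_metrics(tp=9030, fp=2700, fn=970)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.77 0.903 0.831
```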

4.2. Effect of Data Augmentation on Detection Performance

To evaluate the impact of data augmentation on detection accuracy, deep learning analyses without data augmentation were performed. Following the same protocol as the augmented approach, 100 trials were run. The F1-scores yielded a mean of 0.752, a median of 0.759, a maximum of 0.799, and a minimum of 0.614, with a standard deviation of 0.035, indicating stability comparable to that of the augmented approach. Figure 20 and Figure 21 present the detection results without data augmentation for SUS410 and SUS430, respectively, while Table 7 and Table 8 show the corresponding confusion matrix metrics.
Visual inspection of Figure 20 and Figure 21 reveals abundant green labels (true positives) for both steel grades, confirming high detection accuracy; however, slightly more blue labels (false negatives) and red labels (false positives) were observed, indicating marginally higher detection omissions and misclassifications compared with the data-augmented results. Precision values of 0.826 (SUS410) and 0.801 (SUS430) showed SUS410 with slightly superior performance, indicating better discrimination against false positives. Recall values were 0.781 (SUS410) and 0.792 (SUS430), with SUS430 marginally higher, suggesting that this model detected slightly more actual pitting occurrences with fewer false negatives. F1-scores were 0.803 (SUS410) and 0.796 (SUS430), with SUS410 slightly outperforming, though both values were nearly equivalent. This demonstrates that both steel grades exhibited relatively stable detection performance even without data augmentation.
Comparing the effects of data augmentation using F1-scores as the primary metric, SUS410 showed an F1-score of 0.831 with augmentation versus 0.803 without, while SUS430 showed 0.808 with augmentation versus 0.796 without. Data augmentation thus improved F1-scores by approximately 2.8 percentage points for SUS410 and 1.2 percentage points for SUS430. Furthermore, comparing the mean and median F1-scores across all 100 trials reveals consistent improvements of approximately 1.0 percentage point with data augmentation, confirming its effectiveness in enhancing model performance. These improvement magnitudes align well with documented performance gains from data augmentation in comparable image segmentation tasks. According to Buslaev et al. (2020), data augmentation techniques typically contribute performance improvements in the range of 2–7% across various computer vision applications [38]. Our observed improvements of 1.2–2.8 percentage points (approximately 1.5–3.5% relative improvement) fall within the lower-to-middle portion of this established range, indicating reasonable and consistent enhancement effects. The relatively modest improvement magnitude can be attributed to the limited dataset size (two images subdivided into blocks).
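As a concrete illustration, the two augmentation operations, blur and vertical flip, each applied with a sampled probability, might be implemented as follows (the 3 × 3 mean-filter blur and the NumPy implementation are assumptions; the paper does not specify these details):

```python
import numpy as np

rng = np.random.default_rng(0)

def box_blur3(img):
    """3x3 mean filter with edge replication on a 2D (grayscale) patch,
    a simple stand-in for the blur augmentation."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    return sum(padded[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)) / 9.0

def augment(patch, mask, p_blur=0.5, p_flip=0.5):
    """Blur the image (not the mask) with probability p_blur, then
    vertically flip image and mask together with probability p_flip."""
    if rng.random() < p_blur:
        patch = box_blur3(patch)
    if rng.random() < p_flip:
        patch, mask = np.flipud(patch), np.flipud(mask)
    return patch, mask

img = np.arange(16, dtype=float).reshape(4, 4)
msk = (img > 8).astype(int)
a_img, a_msk = augment(img, msk)
print(a_img.shape, a_msk.shape)  # (4, 4) (4, 4)
```

Note that geometric transforms such as the flip must be applied identically to the image and its annotation mask, whereas photometric transforms such as blur apply to the image alone.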

4.3. Comparison with Binary Thresholding Approach

To quantitatively evaluate the advantages of deep learning-based detection, our deep learning approach was compared with the binary thresholding method. The binary thresholding approach demonstrated significant limitations when applied to real-world stainless-steel sheet pile surfaces. The method relies on brightness value thresholds to differentiate pitting from non-pitting regions.
The principal challenge was the lack of homogeneous lighting across the captured images: field conditions introduced variations in surface brightness within and between images. Figure 22 illustrates the regional brightness variations within the images of both steel grades. The figure presents the average brightness values for different subdivision schemes: 72 divisions (128 × 128 pixels), 18 divisions, three divisions, and no division (whole image). The numerical values displayed in the subdivided regions clearly demonstrate systematic brightness gradients within single images. For SUS410, brightness values in the upper regions ranged from about 141 to 147, while the lower regions exhibited values as low as 99–110, a decrease of approximately 30–40%. Similarly, SUS430 showed upper-region brightness values of 140–153 that decreased to 92–113 in the lower regions, likewise an approximately 30–40% variation. This substantial brightness inhomogeneity arose from the natural lighting angles during outdoor image acquisition and the surface reflection properties of the stainless-steel sheet piles. This visualization clearly illustrates that applying uniform threshold values across such images would inevitably produce inconsistent results: threshold values optimized for darker regions would generate excessive false positives in brighter regions, while threshold values optimized for brighter regions would miss genuine pitting in darker regions.
The whole-image analysis approach applied a single pair of threshold values (upper and lower) uniformly across the entire image, and the two resulting binary images were then combined to produce the final detection result. Despite a comprehensive search over candidate threshold values, the approach encountered severe limitations due to the brightness variations documented in Figure 22. The optimal threshold pair represented a compromise that balanced detection performance across bright and dark regions, but this compromise inevitably produced poor performance in both. In darker regions, where brightness values were 30–40% lower than in brighter regions, the globally optimized threshold tended to over-detect, classifying normal surface variations, shadows, and contaminants as pitting. Conversely, the same threshold values failed to capture genuine pitting in brighter regions, whose brightness characteristics fell outside the detection range. The second major limitation concerned false positive generation, which proved to be a persistent and severe problem throughout the thresholding analysis. Surface contaminants, shadows from adjacent structures, and inherent surface texture variations produced brightness changes that mimicked pitting characteristics. Since the method applied uniform thresholds across the entire image without considering spatial context, it could not distinguish brightness variations caused by genuine pitting from those caused by surface artifacts or environmental factors. The third fundamental limitation was the context insensitivity inherent to the thresholding approach: binary thresholding treats each pixel independently, based solely on its brightness value, without considering the spatial context or surrounding features.
Genuine pitting often exhibits characteristic patterns, specifically shadow regions combined with reflective regions due to pit geometry, as illustrated in Figure 3; however, thresholding methods cannot capture these complex spatial relationships, as they lack the capability to learn and recognize such feature combinations.
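This failure mode can be reproduced on a toy example (all pixel values below are invented for illustration): a single global threshold pair cannot serve both halves of an image with a strong brightness gradient.

```python
import numpy as np

def threshold_detect(gray, t_low, t_high):
    """Flag pixels darker than t_low (pit shadows) or brighter than
    t_high (pit reflections), then OR the two binary maps -- the
    whole-image thresholding scheme described in the text."""
    return (gray <= t_low) | (gray >= t_high)

# Toy surface: dark left half (~100), bright right half (~145),
# one genuine pit in each half (locally ~40 counts darker).
gray = np.full((4, 8), 145.0)
gray[:, :4] = 100.0
gray[1, 1] = 60.0    # pit in the dark half
gray[1, 6] = 105.0   # pit in the bright half

# A threshold tuned for the dark half misses the bright-half pit...
strict = threshold_detect(gray, t_low=80, t_high=200)
# ...while one loose enough to catch it floods the entire dark half
# with false positives.
loose = threshold_detect(gray, t_low=110, t_high=200)
print(strict.sum(), loose.sum())  # 1 17
```

Only two of the 17 "loose" detections are genuine pits; the rest are background pixels of the darker half, mirroring the false-positive pattern seen in Figure 23 and Figure 24.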
Figure 23 and Figure 24 present the detection results using whole-image binary thresholding for SUS410 and SUS430, respectively, while Table 9 and Table 10 show the corresponding performance metrics. Visual examination of Figure 23 and Figure 24 reveals dramatically different detection patterns compared with the deep learning predictions. The green labels (true positives) are substantially fewer for both steel grades, indicating that the thresholding method correctly identified only a minority of the actual pitting locations. In addition, both figures show extensive red labeling (false positives) in the left and right portions of the images. This pattern corresponds directly to the brightness variations documented in Figure 22: regions with lower brightness values generated numerous false detections as the globally optimized threshold classified normal surface characteristics as pitting. The combination of extensive false positives and moderate false negatives produced the characteristic pattern of poor precision with moderately low recall observed in the quantitative metrics. As shown in Table 9 and Table 10, precision values were extremely low at 0.318 (SUS410) and 0.258 (SUS430), well under half the precision achieved by deep learning (0.770–0.771). Recall values were 0.565 (SUS410) and 0.456 (SUS430), substantially lower than the 0.903 (SUS410) and 0.848 (SUS430) achieved by deep learning. The resulting F1-scores were 0.407 (SUS410) and 0.329 (SUS430), far below the 0.831 (SUS410) and 0.808 (SUS430) achieved by deep learning. These F1-scores reflect the combined impact of poor precision and low recall, indicating fundamentally inadequate detection performance for practical applications.

4.4. Summary and Practical Implications

No prior study has targeted pitting corrosion detection on stainless-steel sheet pile walls under field conditions, precluding a direct methodological comparison. The binary thresholding method therefore serves as the methodological baseline in this study. Binary thresholding requires no training data and is straightforward to implement; however, as demonstrated in Section 4.3, it is fundamentally limited by sensitivity to brightness inhomogeneity, yielding F1-scores of 0.407 (SUS410) and 0.329 (SUS430). The proposed deep learning approach sacrifices this simplicity in exchange for substantially higher detection accuracy (F1 > 0.80) and robustness to natural lighting variation inherent in field inspection conditions.
From a practical deployment perspective, the method offers several critical advantages for field inspection of in-service infrastructure. First and most importantly, the approach demonstrates exceptional robustness to natural variations in imaging conditions that inevitably occur during field acquisition. The comparison with binary thresholding clearly illustrates this advantage. While threshold-based methods require carefully controlled lighting conditions or extensive parameter re-optimization for each image or image region, the deep learning model maintained high detection accuracy despite brightness inhomogeneity arising from uncontrolled natural lighting angles and surface reflection properties. This robustness is valuable for inspecting operational structures where standardizing illumination conditions is impractical or impossible [39,40,41]. Second, the use of smartphone cameras for image acquisition eliminates the need for specialized imaging equipment, reducing implementation costs and simplifying operator training. In recent years, infrastructure monitoring has shifted toward more decentralized and participatory approaches. Smartphone-based sensing has emerged as a powerful tool for detecting structural damage and monitoring vibrations [42,43,44,45]. Furthermore, crowdsourced infrastructure monitoring—where citizens act as sensors—is becoming a mainstream strategy for large-scale urban maintenance, enabling rapid and low-cost screening of civil structures [46]. The optimal 128 × 128 pixel patch size (corresponding to a physical area of approximately 47 × 47 mm at the ~50 cm imaging distance used) provides practical guidance for field inspection protocols. This patch size captures sufficient detail to detect sub-millimeter pitting corrosion while maintaining reasonable computational requirements for batch processing.
Third, the automated detection system provides an objective, quantitative assessment of corrosion extent, reducing the subjectivity inherent in visual inspection and enabling consistent monitoring across multiple sites and time periods. The high recall (>0.84) ensures that significant corrosion is reliably detected, while the precision values (~0.77) indicate that false positive rates remain manageable, keeping the human verification burden acceptable for operational deployment.
Despite the promising results demonstrated in this study, several important limitations must be acknowledged. The most critical limitation concerns the limited size and diversity of the training dataset. The study utilized only two full-scale images captured from a single exposure test site, which were subdivided into smaller blocks for training and evaluation. While this dataset was sufficient to demonstrate proof-of-concept and establish the substantial performance advantages over conventional methods, it represents a relatively narrow sampling of the environmental conditions, surface states, and corrosion severities that would be encountered across operational infrastructure networks [47,48].
Unlike conventional steel structures where corrosion occurs relatively quickly and extensively, stainless-steel sheet piles exhibit excellent corrosion resistance, with pitting developing gradually over years to decades of service [36]. The adoption of stainless-steel sheet piles in agricultural water management infrastructure has been historically limited due to higher initial material costs compared to conventional steel alternatives. Consequently, the number of existing installations with sufficient service history to exhibit detectable pitting corrosion remains quite restricted. Field sites suitable for acquiring training data—those with stainless-steel sheet piles that have been in service long enough to develop measurable pitting under real environmental conditions—are scarce. This situation may evolve favorably in the coming years. As infrastructure managers increasingly recognize the lifecycle cost advantages of corrosion-resistant materials, particularly when considering long-term maintenance and replacement costs, the adoption of stainless-steel sheet piles is expected to expand [49]. This growing deployment provides both motivation and opportunity for developing automated corrosion monitoring capabilities. As more stainless-steel sheet pile installations enter service and accumulate exposure history, opportunities for systematic data collection across diverse sites, environmental conditions, and service durations will increase correspondingly.
Future research should prioritize several complementary directions to address the current limitations and expand practical utility. First and most urgently, expanding the training dataset to encompass broader ranges of environmental conditions, surface states, and corrosion severities would substantially improve model generalization and reliability across operational deployment scenarios. This expansion should systematically capture variations in lighting conditions, surface conditions, and pitting characteristics to build comprehensive training datasets that represent the full range of inspection scenarios. The demonstrated robustness of the current model, despite limited training data, suggests that a relatively modest dataset expansion could yield substantial practical benefits. Integrating the corrosion detection system with structural analysis models could translate detected pitting area ratios directly into remaining service life estimates or load-bearing capacity predictions, enabling data-driven infrastructure management decisions [50,51]. The relationship between surface pitting characteristics (density, depth, area ratio) and structural performance is well-established for many corrosion scenarios. Establishing similar relationships for stainless-steel sheet piles through combined monitoring and structural testing programs could enable the deep learning detection system to provide engineering-relevant outputs rather than purely descriptive outputs (pitting area percentage).

5. Conclusions

This research successfully developed an automated pitting corrosion detection system for stainless-steel sheet piles using deep learning and smartphone-based imaging. The key findings and implications are:
  • The U-net-based semantic segmentation approach achieved F1-scores exceeding 0.80 for both steel grades, with precision around 0.77 and recall above 0.84. This represents improvements of approximately 42 (SUS410) and 48 (SUS430) percentage points over the conventional binary thresholding method, demonstrating the superiority of deep learning for this application.
  • The method offers exceptional robustness to natural variations in field imaging conditions, uses accessible smartphone cameras, and provides objective quantitative assessments. The optimal 128 × 128 pixel resolution captures sub-millimeter pitting while maintaining reasonable computational requirements.
  • The results confirmed the superior corrosion resistance of SUS430 (PI = 16) over SUS410 (PI = 11), with a significantly lower pitting density and area ratio, validating the effectiveness of higher chromium contents in aggressive brackish water environments.
  • The primary limitation is the restricted training dataset size (two images from a single site). This scarcity reflects the limited number of existing stainless-steel sheet pile installations with sufficient service history to develop detectable pitting under real environmental conditions.
The demonstrated capability to detect early-stage corrosion using accessible technology provides a practical foundation for data-driven infrastructure management and lifecycle cost optimization in agricultural water management systems.

Author Contributions

Conceptualization, T.S. and N.O.; methodology, T.S., N.O. and K.S.; software, K.S.; validation, N.O.; formal analysis, N.O. and K.S.; investigation, T.S., N.O., Y.F. and T.H.; resources, T.S. and N.O.; data curation, N.O.; writing—original draft preparation, T.S. and N.O.; writing—review and editing, T.S.; visualization, N.O.; supervision, T.S.; project administration, T.S.; funding acquisition, T.S. and N.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Authors Norihiro Otaka and Yuji Fujimoto were employed by the Nippon Steel Metal Products Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Phull, B. Evaluating pitting corrosion. In ASM Handbook: Corrosion—Fundamentals, Testing, and Protection; ASM International: Materials Park, OH, USA, 2003; Volume 13A, pp. 545–548. [Google Scholar]
  2. ASTM G46-05; Standard Guide for Examination and Evaluation of Pitting Corrosion. ASTM International: West Conshohocken, PA, USA, 2005.
  3. Caines, S.; Khan, F.; Shirokoff, J. Analysis of pitting corrosion on steel under insulation in marine environments. J. Loss Prev. Process Ind. 2013, 26, 1466–1483. [Google Scholar] [CrossRef]
  4. Sasidhar, K.N.; Ahuja, R.; Lukas, C.; Sridharan, K. Convolutional neural network for automated quantitative analysis of non-destructively acquired three-dimensional corrosion pit morphology data. Scr. Mater. 2025, 262, 116660. [Google Scholar] [CrossRef]
  5. Qi, X.; Lian, Y.; Wang, Y.; Lu, Z. Simulation-driven end-to-end deep learning method for white-light interference topography reconstruction. Photonics 2025, 12, 702. [Google Scholar] [CrossRef]
  6. Ghahari, S.M.; Davenport, A.J.; Rayment, T.; Suter, T.; Tinnes, J.P.; Padovani, C.; Hammons, J.; Stampanoni, M.; Marone, F.; Mokso, R. In situ synchrotron X-ray micro-tomography study of pitting corrosion in stainless steel. Corros. Sci. 2011, 53, 2684–2687. [Google Scholar] [CrossRef]
  7. Khodabux, W.; Liao, C.; Brennan, F. Characterisation of pitting corrosion for inner sections of offshore wind foundations using laser scanning. Ocean Eng. 2021, 230, 109079. [Google Scholar] [CrossRef]
  8. Zhang, W.; Wan, W.; Ren, Q.; Liu, Z.; Zhang, X.; Zhao, L.; Yang, L.; Chai, S.; Shi, M.; Wang, H.; et al. High-throughput quantitative characterization of pitting corrosion in 907A steel based on a multidimensional information strategy combined with deep learning image identification. Prog. Nat. Sci. Mater. Int. 2025, 35, 917–933. [Google Scholar] [CrossRef]
  9. Bing, H.; Li, S. Point cloud data-driven modelling of high-strength steel wire corrosion pits considering orientation features. Constr. Build. Mater. 2024, 449, 138451. [Google Scholar] [CrossRef]
  10. Hu, Z.; Hua, L.; Liu, J.; Min, S.; Li, C.; Wu, F. Numerical simulation and experimental verification of random pitting corrosion characteristics. Ocean Eng. 2021, 240, 110000. [Google Scholar] [CrossRef]
  11. Wang, R. On the effect of pit shape on pitted plates, Part II: Compressive behavior due to random pitting corrosion. Ocean Eng. 2021, 236, 108737. [Google Scholar] [CrossRef]
  12. Choi, K.Y.; Kim, S.S. Morphological analysis and classification of types of surface corrosion damage by digital image processing. Corros. Sci. 2005, 47, 1–15. [Google Scholar] [CrossRef]
  13. Pidaparti, R.M.; Aghazadeh, B.S.; Whitfield, A.; Rao, A.S.; Mercier, G.P. Classification of corrosion defects in NiAl bronze through image analysis. Corros. Sci. 2010, 52, 3661–3666. [Google Scholar] [CrossRef]
  14. Wang, Y.; Cheng, G. Application of gradient-based Hough transform to the detection of corrosion pits in optical images. Appl. Surf. Sci. 2016, 366, 9–18. [Google Scholar] [CrossRef]
  15. Liu, C.; Tian, L.; Wang, P.; Yu, Q.Q.; Song, L.; Miao, J. Non-destructive detection and quantification of corrosion damage in coated steel components under different illumination conditions. Expert Syst. Appl. 2025, 282, 127854. [Google Scholar] [CrossRef]
  16. Malashin, I.; Tynchenko, V.; Nelyub, V.; Borodulin, A.; Gantimurov, A.; Krysko, N.V.; Shchipakov, N.A.; Kozlov, D.M.; Kusyy, A.G.; Galinovsky, A. Deep learning approach for pitting corrosion detection in gas pipelines. Sensors 2024, 24, 3563. [Google Scholar] [CrossRef]
  17. Chen, Y.; Tang, F.; Bao, Y.; Tang, Y.; Chen, G. A Fe-C coated long-period fiber grating sensor for corrosion-induced mass loss measurement. Opt. Lett. 2016, 41, 2306–2309. [Google Scholar] [CrossRef]
  18. Xu, L.; Shi, S.; Huang, Y.; Yan, F.; Wang, X.; Wilson, R.; Zhang, D. Quantification and assessment of steel pitted corrosion using OFDR-based distributed fiber optic sensors. Measurement 2025, 256, 118519. [Google Scholar] [CrossRef]
  19. Tan, X.; Fan, L.; Huang, Y.; Bao, Y. Detection, visualization, quantification, and warning of pipe corrosion using distributed fiber optic sensors. Autom. Constr. 2021, 132, 103953. [Google Scholar] [CrossRef]
  20. ISO 8044:2015; Corrosion of Metals and Alloys—Basic Terms and Definitions. International Organization for Standardization: Geneva, Switzerland, 2015.
  21. Sugimoto, K. Fundamental aspects of localized corrosion. Met. Surf. Finish. 1981, 32, 355–365. (In Japanese) [Google Scholar] [CrossRef]
  22. Roberge, P.R. Corrosion Engineering; McGraw-Hill: New York, NY, USA, 2008; p. 708. [Google Scholar]
  23. Xia, D.H.; Song, S.; Tao, L.; Qin, Z.; Wu, Z.; Gao, Z.; Luo, J.L. Material degradation assessed by digital image processing: Fundamentals, progress, and challenges. J. Mater. Sci. Technol. 2020, 53, 146–162. [Google Scholar] [CrossRef]
  24. Halevy, A.; Norvig, P.; Pereira, F. The unreasonable effectiveness of data. IEEE Intell. Syst. 2009, 24, 8–12. [Google Scholar] [CrossRef]
  25. Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing: Birmingham, UK, 2017. [Google Scholar]
  26. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; IEEE: Boston, MA, USA, 2015; pp. 3431–3440. [Google Scholar]
  27. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
  28. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  29. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  30. FastLabel Inc. FastLabel: AI Data Platform for Annotation and MLOps. 2026. Available online: https://app.fastlabel.ai (accessed on 5 February 2026).
  31. Chollet, F. Deep Learning with Python, 2nd ed.; Manning Publications: Shelter Island, NY, USA, 2021. [Google Scholar]
  32. Albumentations Team. Albumentations: Fast and Flexible Image Augmentations. 2026. Available online: https://albumentations.ai (accessed on 5 February 2026).
  33. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  34. Sano, S.; Akiba, T.; Imamura, H.; Ohta, T.; Mizuno, N.; Yanase, T. Black-Box Optimization with Optuna; Ohmsha: Tokyo, Japan, 2023. (In Japanese) [Google Scholar]
  35. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; ACM: Anchorage, AK, USA, 2019; pp. 2623–2631. [Google Scholar] [CrossRef]
  36. Otaka, N.; Fujimoto, Y.; Asano, I.; Kawabe, S.; Hagiwara, H.; Suzuki, T. Evaluation of Corrosion Characteristics of Stainless-Steel Sheet Piles in Agricultural Drainage Canals by Exposure Tests. Trans. JSIDRE 2023, 91, I_203–I_209. [Google Scholar] [CrossRef]
  37. JIS Z 2355-1:2016; Non-Destructive Testing—Ultrasonic Thickness Measurement—Part 1: Measurement Method. JSA: Tokyo, Japan, 2016.
  38. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and flexible image augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  39. Bondar, D.; Basova, Y.; Vodka, O.; Machado, J. Mobile-focused spatial inspection of industrial parts using 2D image processing and LiDAR. Measurement 2026, 266, 120522. [Google Scholar] [CrossRef]
  40. Kim, J.W.; Choi, H.W.; Kim, S.K.; Na, W.S. Review of image-processing-based technology for structural health monitoring of civil infrastructures. J. Imaging 2024, 10, 93. [Google Scholar] [CrossRef]
  41. Kaur, R.; Karmakar, G.; Xia, F.; Imran, M. Deep learning: Survey of environmental and camera impacts on Internet of Things images. Artif. Intell. Rev. 2023, 56, 9605–9638. [Google Scholar] [CrossRef]
  42. O’Byrne, M.; Pakrashi, V.; Schoefs, F.; Ghosh, B. Damage assessment of built infrastructure using smartphones. In Civil Engineering Research in Ireland; CERI: Dublin, Ireland, 2018. [Google Scholar]
  43. Chen, Z.; Chen, J. Mobile imaging and computing for intelligent structural damage inspection. Adv. Civ. Eng. 2014, 2014, 483729. [Google Scholar] [CrossRef]
  44. Perez, H.; Tah, J.H.M. Deep learning smartphone application for real-time detection of defects in buildings. Struct. Control Health Monit. 2021, 28, e2751. [Google Scholar] [CrossRef]
  45. Ozer, E.; Kromanis, R. Smartphone prospects in bridge structural health monitoring: A literature review. Sensors 2024, 24, 3287. [Google Scholar] [CrossRef] [PubMed]
  46. Chen, X.; Wang, B.; Chen, J.; Zhang, X.; Liu, S.; Zhou, G.; Li, P.; Zhao, X. Innovative life-cycle inspection strategy of civil infrastructure: Smartphone-based public participation. Struct. Control Health Monit. 2023, 2023, 8715784. [Google Scholar] [CrossRef]
  47. Coelho, L.B.; Zhang, D.; Van Ingelgem, Y.; Steckelmacher, D.; Nowé, A.; Terryn, H. Reviewing machine learning of corrosion prediction from a data-oriented perspective. npj Mater. Degrad. 2022, 6, 8, Correction in npj Mater. Degrad. 2022, 6, 72. [Google Scholar] [CrossRef]
  48. Das, A.; Dorafshan, S.; Kaabouch, N. Autonomous image-based corrosion detection in steel structures using deep learning. Sensors 2024, 24, 3630. [Google Scholar] [CrossRef]
  49. Otaka, N.; Fujimoto, Y.; Asano, I.; Yamauchi, Y.; Hagiwara, T.; Suzuki, T. Material Design and Life Cycle Cost Assessment for Extra Long-term Durability of Agricultural Canal Revetments. J. Jpn. Soc. Irrig. Drain. Rural Eng. 2023, 91, 801–804. (In Japanese) [Google Scholar] [CrossRef]
  50. Ahmad, S.; Ahmad, S.; Akhtar, S.; Ahmad, F.; Ansari, M.A. Data-driven assessment of corrosion in reinforced concrete structures embedded in clay-dominated soils. Sci. Rep. 2025, 15, 22744. [Google Scholar] [CrossRef] [PubMed]
  51. Franciosi, M.; Kasser, M.; Viviani, M. Digital twins in bridge engineering for streamlined maintenance and enhanced sustainability. Autom. Constr. 2024, 168, 105834. [Google Scholar] [CrossRef]
Figure 1. Typical form of pitting corrosion [1,22].
Figure 2. Definition of pitting corrosion within images.
Figure 3. Pixel-level statistical distributions of pitting and non-pitting regions.
Figure 4. Annotation data preparation.
Figure 5. Basic tasks for deep learning targeting images.
Figure 6. Data augmentation methods.
Figure 7. Workflow of deep learning.
Figure 8. U-net architecture.
Figure 9. Basic optimization methods for hyperparameter tuning.
Figure 10. Evaluation of generalization and overfitting by learning curve.
Figure 11. Installation of the stainless-steel sheet pile walls.
Figure 12. Appearance of stainless-steel sheet pile walls and ordinary steel sheet pile walls after 5 years of exposure in the O drainage channel.
Figure 13. Remaining plate thickness profiles of SS400, SUS410, and SUS430 sheet piles measured by ultrasonic thickness gauge after 5 years of exposure.
Figure 14. Surface condition of stainless-steel sheet piles (exposure period 5 years, O drainage channel).
Figure 15. Measurement range selection and full sections of stainless-steel sheet piles for analysis.
Figure 16. Example of dataset split based on image size.
Figure 17. Box-and-whisker plot showing F1-score distribution for different image sizes (32, 64, 128, 256, 512 pixels).
Figure 18. Deep learning-based pitting detection results for SUS410.
Figure 19. Deep learning-based pitting detection results for SUS430.
Figure 20. Deep learning-based pitting corrosion detection results without data augmentation for SUS410.
Figure 21. Deep learning-based pitting corrosion detection results without data augmentation for SUS430.
Figure 22. Average brightness values depending on the number of divisions.
Figure 23. Threshold-based pitting corrosion detection results for SUS410.
Figure 24. Threshold-based pitting corrosion detection results for SUS430.
Table 1. Previous related studies on pitting corrosion detection.
Author, Year | Target | Method | Purpose | Annotation
Liu et al., 2025 [15] | Corrosion of painted steel structural members | Semantic segmentation (YOLOv8-G) using smartphone images | Evaluation of the influence of illumination conditions on images; quantification of corroded areas | Images correspond to five stages of the corrosion process of painted steel structural members
Malashin et al., 2024 [16] | Pitting corrosion in gas pipelines | Binary classification using Gaussian process (GP) and Sequential Model-based Algorithm Configuration (SMAC) | Identification of images showing pitting corrosion | No description of classification
Wang and Cheng, 2016 [14] | X80 pipeline steel specimens immersed in NaCl solution | Circular detection using gradient-based Hough transform | Automated recognition of pit location and diameter; simplification of statistical analysis | Diameter measured using image editing software (size depends on microscope magnification)
Table 2. Deep learning tasks for images.
Term | Definition
Image Classification | Assigning one or more labels to an image. Example: does this image contain pitting corrosion?
Image Segmentation | Dividing an image into multiple areas. Example: is any given pixel pitting corrosion?
Object Detection | Drawing rectangles called bounding boxes around target objects in an image and associating each rectangle with a class.
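The distinction between the first two tasks can be illustrated with a minimal sketch; the tiny mask below is hypothetical and is not output from the paper's model. An image-level classifier collapses the prediction to one answer per image, while segmentation keeps a per-pixel answer, which is what allows the pitted area to be quantified.

```python
# A predicted segmentation mask: 1 = pixel classified as pitting corrosion.
mask = [[0, 0, 1],
        [0, 1, 1],
        [0, 0, 0]]

# Image classification collapses the mask to a single image-level label...
contains_pitting = any(any(row) for row in mask)

# ...while segmentation retains the per-pixel decision, so the pitted
# area can be measured directly.
pitted_pixels = sum(sum(row) for row in mask)
total_pixels = sum(len(row) for row in mask)

print(contains_pitting, pitted_pixels, total_pixels)  # True 3 9
```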
Table 3. Water quality conditions at O drainage channel, measured on 28 August 2023, concurrent with pitting corrosion image acquisition.
Item | Unit | Measured Value
Sulfate ion | mg/L | 21
Nitrate ion | mg/L | 3.7
Chloride ion | mg/L | 120
pH | - | 6.9
Table 4. Detection accuracy metrics for SUS410 using deep learning with data augmentation.
Steel Grade | Precision | Recall | F1-Score
SUS410 | 0.770 | 0.903 | 0.831
Table 5. Detection accuracy metrics for SUS430 using deep learning with data augmentation.
Steel Grade | Precision | Recall | F1-Score
SUS430 | 0.771 | 0.848 | 0.808
Table 6. Number of pitting corrosions and area rate for each steel grade.
Steel Grade | Number of Pitting Corrosions | Area Rate (%)
SUS410 | 1134 | 3.573
SUS430 | 314 | 0.714
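Pit counts and area rates of the kind reported in Table 6 can be derived from a binary pitting mask by connected-component labeling. The paper does not specify its counting procedure, so the sketch below is an illustrative assumption: a pure-Python 4-connected breadth-first search over a small hypothetical mask.

```python
from collections import deque

def count_pits(mask):
    """Count 4-connected pit regions and the pitted-area rate (%) in a
    binary mask (list of lists; 1 = pitting pixel, 0 = sound surface).
    Returns (number_of_pits, area_rate_percent)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    pits, pitted = 0, 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                pits += 1  # a new, previously unvisited pit region
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:  # flood-fill the whole region
                    cy, cx = queue.popleft()
                    pitted += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return pits, 100.0 * pitted / (h * w)

# Two separate pits in a 4x5 mask: 3 + 2 = 5 pitted pixels of 20 total.
demo = [[1, 1, 0, 0, 0],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0]]
print(count_pits(demo))  # (2, 25.0)
```

In practice a library routine such as connected-component labeling from an image-processing package would replace the hand-written search; the logic is the same.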
Table 7. Detection accuracy metrics for SUS410 using deep learning without data augmentation.
Steel Grade | Precision | Recall | F1-Score
SUS410 | 0.826 | 0.781 | 0.803
Table 8. Detection accuracy metrics for SUS430 using deep learning without data augmentation.
Steel Grade | Precision | Recall | F1-Score
SUS430 | 0.801 | 0.792 | 0.796
Table 9. Detection accuracy metrics for SUS410 using threshold method.
Steel Grade | Precision | Recall | F1-Score
SUS410 | 0.318 | 0.565 | 0.407
Table 10. Detection accuracy metrics for SUS430 using threshold method.
Steel Grade | Precision | Recall | F1-Score
SUS430 | 0.258 | 0.456 | 0.329
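As a sanity check, the F1-scores in Tables 4, 5, and 7–10 follow from the reported precision and recall via the harmonic mean, agreeing to within the rounding of the published three-digit values:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) triples from Tables 4, 5, and 7-10.
reported = [
    (0.770, 0.903, 0.831),  # SUS410, deep learning with augmentation
    (0.771, 0.848, 0.808),  # SUS430, deep learning with augmentation
    (0.826, 0.781, 0.803),  # SUS410, deep learning without augmentation
    (0.801, 0.792, 0.796),  # SUS430, deep learning without augmentation
    (0.318, 0.565, 0.407),  # SUS410, threshold method
    (0.258, 0.456, 0.329),  # SUS430, threshold method
]

for p, r, f in reported:
    # Tolerance allows for the published precision/recall being
    # themselves rounded to three decimal places.
    assert abs(f1_score(p, r) - f) < 1.5e-3, (p, r, f)
```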
Share and Cite

Suzuki, T.; Otaka, N.; Shibano, K.; Fujimoto, Y.; Hagiwara, T. Detection of Pitting Corrosion in Stainless-Steel Sheet Pile Walls Using Deep Learning. Corros. Mater. Degrad. 2026, 7, 23. https://doi.org/10.3390/cmd7020023