Semi-Supervised Learning Method for the Augmentation of an Incomplete Image-Based Inventory of Earthquake-Induced Soil Liquefaction Surface Effects

Abstract: Soil liquefaction often occurs as a secondary hazard during earthquakes and can lead to significant structural and infrastructure damage.


Introduction
Liquefaction is a type of ground failure in which increased pore pressure in granular soil reduces effective stress and can result in ejected silt to fine sand and water, settlement, lateral spreading resulting in cracking, and/or bearing capacity failure. Surface manifestations of liquefaction, including ejecta, settlements, and cracking due to lateral spreading, are often identified during field reconnaissance after earthquake events as evidence of liquefaction in the subsurface. These surface manifestations of liquefaction are associated with liquefiable sediments in the upper few meters [1]. Liquefaction inventories (historical occurrence records) are used for developing semi-empirical and empirical models for liquefaction prediction, including regional and global susceptibility models commonly used in practice for rapid response and loss estimation [2,3]. These liquefaction occurrence inventories generally provide observed locations as points or polygons (areas) of liquefaction occurrence. Most often, the liquefaction occurrence is equivalent to the existence of a surface manifestation of liquefaction, which has been observed in the field or remotely.
Existing liquefaction inventories are generally presented as binary labels (either the location had evidence of liquefaction (1) or did not (0)), such as in the Next Generation Liquefaction (NGL) project [4]. Other efforts have proposed alternate liquefaction inventory formats, including several recent Geotechnical Extreme Events Reconnaissance (GEER) Association efforts, which result in near-spatially complete polygon-based maps of liquefaction occurrence [5–8]. In [9], the authors have also developed inventories that map the spatial extent of liquefaction by providing polygons around ejecta and surface cracking. Development of regional and global liquefaction occurrence probability models requires spatially complete and unbiased (consistent) inventories of historical liquefaction occurrence records for model training, validation, and optimization [2,10]. Alternatively, light detection and ranging (LiDAR) datasets of surface settlements, a more quantitative measure of liquefaction, have also been used [11,12]. A benefit of these alternate liquefaction inventories is an estimate of the scale of liquefaction, which is necessary to map impact and fragility. This paper examines an efficient way to develop a spatial inventory of liquefaction surface manifestations using aerial imagery.
Liquefaction (ejecta, cracking, etc.) can be identified at a regional scale using satellite optical imagery or high-resolution aerial imagery in a site-specific manner [13]. The liquefaction identification can be carried out by visual interpretation of post-event images [14,15] or by inspection of both pre- and post-event images [16]. However, any visual process can be tedious even for an expert and take a significant amount of time, sometimes resulting in incomplete and spatially limited datasets. In addition, this approach can be highly inconsistent because it depends on the experience and judgment of human inspectors and other contributing factors such as data quality and resolution. Thus, automated liquefaction detection methods are needed to provide consistent, spatially complete liquefaction inventory datasets after major earthquakes.
Automated mapping methods can be developed based on pairs of pre-event and post-event data or on post-event data only. Satellite multispectral bands provide essential information about land cover types, which can be helpful for identifying liquefaction ejecta [9,13,17–21]. Change detection methods use pairs of pre-event and post-event imagery for liquefaction detection [17,21–23]. These methods are sensitive to image acquisition orientations (viewing geometry) of satellites, different light and weather conditions, seasonal variations, resolution mismatch, cloud coverage of pre- or post-event images, etc. As an alternative, post-event optical imagery can be used to map liquefaction using supervised learning algorithms, including advanced machine learning [24,25] and deep learning-based methods [26,27]. Successful deployment of supervised learning techniques depends on the availability, quantity, and quality of labeled (annotated) data.
In this research, semi-supervised learning is demonstrated as an alternative mapping strategy for developing liquefaction inventories. Semi-supervised self-training methods are helpful when datasets are partially labeled, and they alleviate the requirement for large-scale labeled data by learning from the labeled data (like supervised learning algorithms) and from the unlabeled data to capture the patterns within the dataset. This improves the generalizability of the model to other feature types which have limited or no presence in the labeled portion of the data. These methods have been used for several applications, including semantic image segmentation [28], and they have been shown to improve predictive accuracy with additional unlabeled data in image classification frameworks [29,30]. In [31], the authors have also shown the superior performance of semi-supervised learning for post-disaster building damage assessment via pre- and post-event satellite imagery.
Within the semi-supervised algorithm, this research leverages the benefits of various image transformations applied to the color channels. Image processing techniques such as texture analysis and statistical measurements of color channels have been used for different purposes, including raveling and crack detection in asphalt pavements [32,33], flood damage mapping [34], vision-based rock type classification [35], and lithological mapping by optimal integration of such characteristics with multi-sensor remote sensing data [36]. Color and textural features of imagery have also been used for liquefaction detection via tile-based deep convolutional neural networks [26,27]. This research explores these methods via semi-supervised pixel-based classification.
This study demonstrates a semi-supervised self-training classification for developing a spatially complete and consistent liquefaction inventory using a limited set of labeled liquefaction polygons and high-resolution aerial imagery. Because high-resolution aerial imagery has only three color bands (red, green, and blue), data transformations across four categories (color transformations, texture analysis, statistical indices, and dimensionality reduction techniques) are leveraged to increase the information content. This method, which uses semi-supervised learning and image-derived information to detect liquefaction for the first time, is compared to a simpler supervised learning approach. Using the results of this study, a spatially complete and consistent liquefaction inventory can be created.

Datasets

Region of Study
The 22 February 2011 (Mw 6.2) Christchurch earthquake struck directly beneath Christchurch City. Although the Christchurch earthquake had a smaller magnitude than the 2010 Darfield earthquake, it caused a higher amount of damage in northern and eastern Christchurch [37] because of its proximity to the city and its occurrence directly beneath liquefaction-prone land. Examination of the aerial and satellite images and extensive ground surveys has confirmed the presence of widespread liquefaction in rural areas and the Christchurch urban area [38–40].

Imagery
The aerial imagery used in this research comprises three spectral (color) bands (RGB) with a very high spatial resolution (10 cm). The images were captured on 24 February 2011 (2 days after the earthquake) by New Zealand Aerial Mapping (NZAM) at 1700 m above the earth's surface using the UltraCam XP digital aerial camera. The 1785 orthorectified RGB GeoTIFF images published by [41] represent a ground surface area of 33.3 ha in Christchurch. The size of each vertical image tile is 7200-by-4800 pixels (720-by-480 m). Figure 1 shows the locations of the tiles used in this paper, and Figure 2 shows the tiles in a close view. The tile shown in Figure 2a is used for model development and evaluation. Then, the same approach is applied to the tile in Figure 2b.

Liquefaction Labels
Liquefaction inventories have historically been developed from field investigations [5,8] and visual labeling of satellite and aerial imagery [6,42]. The lack of a standardized protocol for documenting earthquake-induced ground failure creates uncertainties in how surface manifestations are mapped. Liquefaction surface manifestations vary in shape, size, and color. In [42], the authors present a polygon-based liquefaction inventory that includes the 2011 Christchurch earthquake. The [42] inventory includes large-scale polygon delineations to show the broad extent of liquefaction-induced impact but does not mimic the individual surface manifestations (e.g., ejecta).
In [43], Sanon et al. (2022) developed an alternative polygon-based inventory that also includes the 2011 Christchurch earthquake. The [43] inventory is based on the delineation of individual liquefaction surface manifestations so that the inventory can be used to train computer-vision-based algorithms. In [43], the authors discuss the challenges and limitations of creating a consistent and spatially complete inventory. One rule in the [43] inventory is to include only liquefaction labels that are identified with high certainty and sufficient visibility of boundaries. They also delineated only liquefaction occurrences which appeared to be in their original form, not liquefaction ejecta piled on the roadsides [43]. As a result, the [43] inventory is a partially labeled, spatially incomplete inventory of liquefaction surface manifestations with high quality and certainty.
An accuracy assessment of the model is performed on the tile in Figure 3a, which shows the labels by [43] overlaying the image tile presented in Figure 2a (model evaluation tile). To calculate the model accuracy in this study, spatially complete labels are created by manually drawing polygons (Figure 3b). As [43] discussed, visual labeling of liquefaction has inherent uncertainty due to the ambiguity of boundaries and the inconsistency between labelers. Thus, to minimize human inconsistency, a single expert labeled the entire model validation tile (Figure 3b). This study aims to use the high-certainty liquefaction polygons labeled by [43] in Figure 3a to map the unlabeled portion of the tile using semi-supervised learning methods, augmenting the partially labeled liquefaction data into a spatially complete and consistent inventory.
The [43] labels used in this study (Figure 3a,c), as well as the validation labels created in this study (Figure 3b), are summarized in Table 1. Model application is evaluated on a second tile, shown in Figure 3c with the labels from [43]. The model application tile was labeled by a different person than the model evaluation tile [43]. The model application tile is used to apply the same method performed on the model development tile without validating the resulting map via ground-truth validation data. Visual inspection of the polygon labels in the two tiles (Figure 3a,c) shows differences in the labeling strategies of different experts, an indicator of the inconsistencies resulting from human labeling. No pre-processing or quality assurance was performed on the imagery and the ground-truth polygons, as the imagery is of very high resolution, and the polygons were drawn by [43] on the same sets of images, leading to minimal to no mismatch between the polygon boundaries and the liquefaction features in the aerial images.
In [9], the authors demonstrated that building roofs are highly variable in terms of colors and materials. As a result, roofs have a high spectral variation that makes it challenging to discriminate them from other classes in a pixel-wise classification framework due to the overlap with other classes. Using a building footprint mask to remove buildings from the classification process increased accuracy significantly in the liquefaction mapping study by [9]. Thus, this study also uses building footprints as prior known geospatial information to exclude buildings and improve classification results.
The geospatial layer of building footprints used in this study was provided by [44]. The dataset has some errors (identified from visual inspection) resulting from changes in land use since the building footprints were created, mismatches between building boundaries and the imagery used, or incompleteness of the building footprints (missing building polygons). For the tiles used in this research, and to enhance the validation process, a few building polygons were created to improve the accuracy and completeness of the building footprint. Figure 4 shows the building footprints used in this study to mask the buildings from the imagery.

Land Use and Land Cover ROIs
The classification approach used in this study is designed to be a multi-class classification. Thus, training data of different classes of land use and land cover are needed to train the classifier to distinguish them from liquefaction ejecta. In addition, the proposed multi-class method can also provide insight into the classes which lead to the most misclassifications of liquefaction. Thus, high-quality training data are collected by drawing region of interest (ROI) polygons over seven different land cover and land use types, including trees, vegetation, soil, water, shadow, roads, and pavements/driveways. Drawing ROIs was carried out via ArcGIS Pro using the polygon drawing tool in a way that the boundary of each polygon is inside the boundary of the object of interest, so no mixed pixels or out-of-class pixels are sampled for that class. It is recommended that enough ROIs be drawn to cover the widest possible range of pixel values for each class of objects in the tile.

Methodology
The proposed methodology presented in this study uses color, statistical, and texture transformations on high-resolution aerial imagery, along with dimensionality reduction techniques, to create a multi-band dataset from a three-band color image. The flowchart of the methodology is provided in Figure 5. The high-certainty, spatially incomplete liquefaction labels, combined with spatially incomplete labels for seven additional land cover and land use classes sampled by ROIs and a complete building footprint mask, are used as input data. The building footprint is used to mask the building pixels, so they are not included in the training process. It has been observed that liquefaction surface manifestations present as light-colored (dry sand) and dark-colored (wet sand) [9]. Thus, the approach also includes Fuzzy C-Means clustering to create two liquefaction label clusters. The fuzzy approach is chosen to remove outliers and mixed pixels from the liquefaction classes based on an individual pixel having a low probability of belonging to either of the two liquefaction classes. The model is evaluated using accuracy statistics based on a second dataset of spatially complete liquefaction labels (Figure 3b). The Minimum Redundancy Maximum Relevance (MRMR) method is used to identify relevant color, statistical, and texture transformations from the imagery, plus the best outputs of dimensionality reduction techniques, to separate the 9 classes (2 liquefaction and 7 land cover). Finally, a semi-supervised linear discriminant analysis classifier is trained to classify the unlabeled portion of the partially labeled image. A few combinations of features are tested, and the preferred model with the best set of selected features is introduced. The steps of the methodology are discussed below.

Fuzzy C-Means Clustering of Liquefaction Data
The liquefaction labels in the training dataset are visually developed based on expert knowledge. The study by [9] discovered that the surface effects of liquefaction are bimodal in terms of spectral signature and hence should be classed as either wet (darker) or dry (lighter) liquefaction. This is because the water content of these two types differs. Thus, we have considered creating two liquefaction labels using clustering methods based on the spectral characteristics of liquefaction. In [43], the authors also discussed that the liquefaction ejecta boundaries, which are hand drawn by the expert as polygons, are uncertain, resulting in imperfect labels, also known as out-of-class or mixed pixels. Particularly in coarse-spatial-resolution imagery, pixels may be a mix of two or more classes, introducing uncertainty and ambiguity into the labels. Soft classification approaches aid in assessing uncertainties and ambiguity in transition zones between different types of land cover [18]. Fuzzy logic is a soft classification approach which can be used to extract a single land cover class from out-of-class or mixed pixels. Thus, in this research, the Fuzzy C-Means (FCM) clustering method is used to convert the liquefaction labels to a binary liquefaction label and to improve the labels for the out-of-class and mixed pixels.
Fuzzy C-Means (FCM) is a clustering technique where each data point belongs to a cluster with some degree specified by a membership grade, and the sum of the memberships for each pixel must be equal to 1 [45]. FCM allows each data point to belong to multiple clusters with varying degrees of membership. By iteratively updating the cluster centers and the membership scores for each data point, FCM moves the cluster centers to the right location within a dataset. This iteration is based on minimizing an objective function representing the distance from any given data point to a cluster center, weighted by that data point's membership grade.
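In standard FCM notation (not given explicitly in the text), with pixel feature vectors x_i, cluster centers c_j, membership grades u_ij, and fuzzifier exponent m > 1, the objective function and the iterative update rules described above can be written as:

```latex
J_m = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{m}\,\lVert x_i - c_j\rVert^{2},
\qquad \text{subject to } \sum_{j=1}^{C} u_{ij} = 1 \;\; \forall i,
```

```latex
u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j\rVert}{\lVert x_i - c_k\rVert} \right)^{\frac{2}{m-1}} \right]^{-1},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} .
```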
The hyperparameters of the FCM algorithm used in this study include an exponent of 0.2 for the fuzzy partition matrix, a minimum improvement in the objective function of 0.00001, and a maximum of 100 iterations. Euclidean distance was used to compute distances, and no initial cluster centers were defined.
The result of this step in the methodology is a more consistent set of training labels, including two classes of liquefaction. After running the fuzzy clustering, a probability threshold of 0.55 was used to filter out the low-certainty pixels from the labeled liquefaction data of each class (dark and light). The threshold is chosen to be low enough to remove only pixels with a very low probability of belonging to either class.
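As a concrete sketch, the clustering-and-thresholding step above can be written in NumPy as follows. This is a minimal illustration, not the authors' code: the standard membership-update equations require a fuzzifier exponent m > 1, so m = 2 is used here, while the convergence tolerance (0.00001), iteration cap (100), and the 0.55 certainty threshold follow the values reported above. Function names are illustrative.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Minimal Fuzzy C-Means. Returns (centers, U), where U[i, j] is the
    membership grade of pixel i in cluster j (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                   # memberships sum to 1
    prev = np.inf
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]  # weighted cluster means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                           # avoid division by zero
        obj = float((Um * d ** 2).sum())                # objective function value
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)        # membership update
        if abs(prev - obj) < tol:                       # minimum improvement check
            break
        prev = obj
    return centers, U

def filter_low_certainty(U, threshold=0.55):
    """Keep only pixels whose highest membership grade meets the threshold,
    mirroring the 0.55 filter applied to the dark/light liquefaction labels."""
    return U.max(axis=1) >= threshold
```

Applied to the RGB (or transformed) values of the labeled liquefaction pixels, this yields two clusters (dark and light ejecta) plus a mask that drops mixed or out-of-class pixels.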

Feature Extraction from RGB Image
RGB wavelengths are highly correlated and thus contain limited spectral contrast information. In this research, a variety of image processing, analysis, and transformation techniques are applied to include additional information which can be beneficial in finding new liquefaction features. Table 2 provides a brief overview of the feature extraction techniques used in this study, categorized into 4 groups: color transformations (3 features with 10 total bands), texture analysis (4 features with 7 total bands), statistical indices (7 features with 7 total bands), and dimensionality reduction techniques (3 features with 7 total bands).
Common applications of indices derived from multi-band remote sensing images have been to reduce topography and illumination differences within temporal remote sensing images and to enhance a certain class of interest, such as minerals, water, and soil content. When generating an index from an image, the data dimensionality is indirectly reduced to one band signifying, for instance, a specific feature or material [18]. These additional bands extract information contained in the original RGB channels, presenting uncorrelated information in a new set of bands.
The color transformation features extracted from the image include Hue-Saturation-Value (HSV), decorrelation stretch, and cyan-magenta-yellow-black (CMYK). Color transformations have been used previously to improve the mapping of objects of interest via remote sensing data, for instance, to map maize plants [46]. The texture analysis techniques considered in this research include convolution and correlation filters, Gabor filters, and wavelet transformation. The statistical indices presented in Table 2 include entropy, gradient weight, sum of squares, mean absolute deviation, variance, standard deviation, and range filters. This study uses grayscale transformation, Minimum Noise Fraction (MNF) rotation transforms, and Principal Component Analysis (PCA) as dimensionality reduction techniques. Texture transformation, statistical analysis, and dimensionality reduction techniques have also been used previously to detect objects of interest via remote sensing data, for instance, to map leafless trees in plantations [47].
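To illustrate how a three-band image can be expanded into a multi-band stack, the sketch below computes an HSV color transformation, a grayscale band, and a simple range (statistical) filter with NumPy. This is a hedged example of the general idea, not the exact feature set or implementation used in the study; function names are illustrative.

```python
import numpy as np

def rgb_to_hsv(img):
    """Vectorized RGB -> HSV for float arrays in [0, 1] (color transformation)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    v = img.max(axis=-1)                     # value (brightness)
    c = v - img.min(axis=-1)                 # chroma
    s = np.where(v > 0, c / np.where(v > 0, v, 1), 0)   # saturation
    h = np.zeros_like(v)
    mask = c > 0
    rmax = mask & (v == r)
    gmax = mask & (v == g) & ~rmax
    bmax = mask & ~rmax & ~gmax
    h[rmax] = ((g - b)[rmax] / c[rmax]) % 6  # hue sector depends on max channel
    h[gmax] = (b - r)[gmax] / c[gmax] + 2
    h[bmax] = (r - g)[bmax] / c[bmax] + 4
    return np.stack([h / 6.0, s, v], axis=-1)

def local_range(band, k=3):
    """Range filter (max - min in a k x k window), a simple statistical index."""
    pad = k // 2
    p = np.pad(band, pad, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(p, (k, k))
    return win.max(axis=(-2, -1)) - win.min(axis=(-2, -1))

def feature_stack(rgb):
    """Stack RGB with HSV bands, a grayscale band, and a range-filter band."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])   # luminance grayscale
    hsv = rgb_to_hsv(rgb)
    rng_band = local_range(gray)
    return np.concatenate([rgb, hsv, gray[..., None], rng_band[..., None]], axis=-1)
```

The same pattern extends to the other transformations in Table 2 (decorrelation stretch, Gabor filters, MNF, PCA, etc.), each contributing one or more bands to the stacked dataset.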

Selected entries of Table 2 include the following:
- HSV (color transformation, 3 bands): hue (another word for color), saturation (chroma, the intensity or purity of a hue), and brightness (value, the relative degree of black or white mixed with a given hue).
- Minimum Noise Fraction rotation transforms (dimensionality reduction): used to determine the inherent dimensionality of image data, to segregate noise in the data, and to reduce the computational requirements for processing [53].
- Gabor filter (texture analysis, 4 bands): a filter bank representing a linear Gabor filter that is sensitive to textures with a specified wavelength and orientation [54].
- Wavelet transform (texture analysis, 1 band): single-level discrete 2-D wavelet transform derived by taking the tensor products of the one-dimensional wavelet and scaling functions, leading to a decomposition into 4 components; the approximation coefficients are used as a single band [55].
- Convolution filter (texture analysis, 1 band): convolution applies a filter (kernel function), a small matrix of numbers slid over an image, performing a mathematical operation at each position to create a new filtered output image [56].
- Correlation filter (texture analysis, 1 band): GLCM-derived correlation as a measure of how correlated a pixel is to its neighbors over the whole image; range = [−1, 1] [57].

The color, statistical, and texture transformations and dimensionality reduction outputs are stacked for the training image and sampled at the locations of the updated liquefaction labels and the pixels with ROI labels, and the building pixels are removed. The multi-band stacked dataset is then normalized before being fed to the semi-supervised classification algorithm. Normalization significantly reduces the impact of data range on the final liquefaction prediction model and increases the processing and convergence speed [58]. Using the mean (µ) and standard deviation (σ) of individual features, the data of all features are normalized using Equation (1):

x_norm = (x − µ)/σ (1)
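A per-band implementation of this z-score normalization is a one-liner; the sketch below adds a guard against zero-variance bands, which is an added safeguard rather than a detail stated in the text.

```python
import numpy as np

def normalize_bands(stack):
    """Equation (1): z-score each band using its own mean and standard deviation."""
    mu = stack.mean(axis=(0, 1), keepdims=True)
    sigma = stack.std(axis=(0, 1), keepdims=True)
    # guard against constant bands (zero standard deviation)
    return (stack - mu) / np.where(sigma > 0, sigma, 1.0)
```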
The machine learning-based classification algorithm proposed in this study is designed to not only learn from the RGB channels of the imagery but also to analyze the extracted color, texture, statistical, and dimensionality reduction output bands. Due to the large number of bands, a feature ranking/selection method is required to find the most relevant features. Feature selection avoids redundancy and irrelevance by removing low-weight features. The higher the feature weight, the better the feature can separate the classes.
The methodology used herein implements the Minimum Redundancy Maximum Relevance (MRMR) method [59], which is used to identify important predictors for the multi-class classification problem. The MRMR algorithm finds an optimal set of mutually and maximally dissimilar features that can represent the response variable effectively. The algorithm minimizes the redundancy and maximizes the relevance of the feature set to the response variable. It quantifies redundancy and relevance heuristically using mutual information: the pairwise mutual information of the features and the mutual information between each feature and the response variable. The mutual information between two variables measures how much the uncertainty of one variable can be reduced by knowing the other. The MRMR algorithm uses this definition to compute the mutual information values used in its iterative process to rank the features by their classification importance [60].
The MRMR algorithm in this study was applied to all land use and land cover classes plus the two liquefaction classes (9 classes in total) and ranks features by the following steps. (1) Select the feature with the highest relevance to start creating the list of important features. (2) Find features with non-zero relevance and zero redundancy, complementing the feature chosen in step 1, and add the most relevant of these to the list of selected features. If none of the features in this step has zero redundancy, select the most relevant feature among the non-zero-redundancy features. (3) Repeat step 2 until no feature with zero redundancy remains among the unselected ones. (4) Select the most relevant feature among the non-zero-redundancy features, and add it to the selected list. (5) Repeat step 4 until all unselected features have zero relevance. (6) Add zero-relevance features to the list of selected features in a random order as low-ranked features.
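The ranking procedure can be sketched with a histogram-based mutual information estimate and a greedy score; implementations differ in whether they combine relevance and redundancy as a difference (MID) or a quotient (MIQ), and the difference scheme is shown here. This is an illustrative NumPy sketch, not the implementation used in the study.

```python
import numpy as np

def mutual_info(a, b, bins=8):
    """Histogram-based mutual information estimate between two variables."""
    joint = np.histogram2d(a, b, bins=bins)[0]
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px * py)[nz])).sum())

def mrmr_rank(X, y, bins=8):
    """Greedy MRMR ranking: at each step, pick the feature maximizing
    relevance(feature, y) minus its mean redundancy with selected features."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y, bins) for j in range(n_feat)])
    selected, remaining = [], list(range(n_feat))
    while remaining:
        scores = []
        for j in remaining:
            red = (np.mean([mutual_info(X[:, j], X[:, s], bins) for s in selected])
                   if selected else 0.0)
            scores.append(relevance[j] - red)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

A redundant copy of an already-selected band scores poorly even when it is highly relevant, which is the behavior the zero-redundancy preference in the steps above encodes.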
The MRMR feature ranking is carried out separately on each of the 4 categories of features extracted, and the selection of features to be used for the classification step is conditioned to be among the high-weight features by plotting the feature weights, investigating the change in the slope of the curve, and choosing the features showing a considerable contribution to class separability. Additionally, to avoid redundancy in the features used, a cap of two features per category is considered in this study. Then, the high-ranked features per category are added to the RGB channels to evaluate the performance of the classification model. Finally, the preferred combination of features is introduced based on the accuracy results. We used equal observation weights for the samples used in the feature selection process via MRMR, and we used the empirical probability distribution derived from the observed samples as the prior probability in the feature selection model.

Semi-Supervised Self-Training Classification via Linear Discriminant Analysis
After feature selection, the semi-supervised classification model for liquefaction prediction is developed. The classifier was trained to distinguish liquefaction (two classes) and non-liquefaction (seven classes) pixels using the selected non-color features alongside the color imagery. Semi-supervised learning combines aspects of supervised learning, where all the training data are labeled, and unsupervised learning, where true labels are unknown.
Discriminant analysis is used as the semi-supervised classification method, which assumes that different classes generate data based on different Gaussian distributions [61]. To train the classifier, the fitting function estimates the parameters of a Gaussian distribution for each class and finds the smallest calculated misclassification cost to predict the classes of new (unlabeled) data. The linear discriminant analysis (LDA) used in this research computes the sample mean of each class. Then it computes the sample covariance by first subtracting the sample mean of each class from the observations of that class and taking the empirical covariance matrix of the result [62].
The semi-supervised algorithm proposed herein iteratively trains an LDA classifier; this iterative process is called self-training. First, the function trains the LDA classifier on the labeled data alone and then uses that classifier to make label predictions for the unlabeled data. At each iteration, the algorithm calculates scores for the predictions and treats the high-probability predictions whose scores are above a certain threshold as true labels (pseudo-labels) for the next training iteration of the classifier. This process repeats until the label predictions converge and all pixels are labeled (final labels).
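A minimal sketch of this self-training loop, using scikit-learn's LDA as the base learner (the 0.9 posterior-probability threshold is an assumption for illustration, not the paper's value):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def self_train_lda(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=5):
    """Iterative self-training with an LDA base learner.

    Starts from the labeled pixels, predicts the unlabeled pixels, and
    promotes predictions whose posterior probability exceeds the
    threshold to pseudo-labels for the next training iteration.
    """
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pending = X_unlab.copy()
    for _ in range(max_iter):
        clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
        if len(pending) == 0:
            break
        proba = clf.predict_proba(pending)
        keep = proba.max(axis=1) >= threshold
        if not keep.any():
            break
        X_train = np.vstack([X_train, pending[keep]])
        y_train = np.concatenate(
            [y_train, clf.classes_[proba[keep].argmax(axis=1)]])
        pending = pending[~keep]
    # Final pass over the augmented training set.
    return LinearDiscriminantAnalysis().fit(X_train, y_train)
```

scikit-learn also ships a generic `SelfTrainingClassifier` wrapper with similar behavior, which could wrap the LDA learner instead of the hand-rolled loop above.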
The choice of a discriminant analysis classifier as the learner in the proposed semi-supervised learning approach was based on the type of problem (multi-class classification), the computational speed (runtime), and the testing accuracy achieved when trying several potential learners (K-nearest neighbor, SVM, ensemble methods, etc.). In the LDA implementation, we used an iteration limit of 5, as shown in our published codes, because we found the model insensitive to the number of iterations, and high accuracy was achieved with a very low number of iterations.
Since the proposed method is pixel-based, a post-processing step is performed after classification to avoid pixelated output maps, remove noise, and smooth the boundaries of predicted features. Median filtering is used in the post-processing step with a window size of 15-by-15 pixels, which is neither very small nor very large compared to the size of the liquefaction features. It should be noted that masking the buildings was a pre-processing step, which masks building pixels from the beginning, whereas median filtering is a post-processing step in this study, which removes noise and isolated liquefaction predictions from the final binarized liquefaction/non-liquefaction map.
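A sketch of the 15-by-15 median filtering step, using SciPy's `median_filter` for illustration (only the window size is taken from the text; the toy map is ours):

```python
import numpy as np
from scipy.ndimage import median_filter

# A 15-by-15 median filter applied to the binarized
# liquefaction/non-liquefaction map removes isolated pixel
# predictions and smooths feature boundaries.
binary_map = np.zeros((60, 60), dtype=np.uint8)
binary_map[20:45, 20:45] = 1  # a coherent liquefaction feature
binary_map[5, 5] = 1          # an isolated noise pixel
smoothed = median_filter(binary_map, size=15)
```

The isolated pixel is removed (its 15-by-15 neighborhood is mostly zeros, so the median is zero), while the interior of the coherent feature is preserved.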

Model Evaluation and Comparative Analysis
After running the classification with the self-trained classifier on the whole image, except for pixels within building footprints, the model is evaluated against the binary liquefaction labels. The binary classification takes liquefaction as the positive class, and the confusion matrix assigns each prediction to one of four categories: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP and TN represent liquefaction (1) and non-liquefaction (0) correctly classified, respectively, whereas FP represents non-liquefaction misclassified as liquefaction, and FN represents liquefaction misclassified as non-liquefaction.
The accuracy indices used to compare the different models are precision, recall, F1 score, and overall accuracy. Precision, or positive predictive value (PPV), is the proportion of observations classified as positive (liquefaction) that are correct; a high precision indicates that the model has a high probability of being correct when it predicts the positive class. Recall (sensitivity, or true positive rate) is the proportion of actual positive samples that are correctly classified, quantifying how sensitive the model is to identifying actual positive samples. Because precision and recall assess different aspects of the model, an index that combines the two is also used: the F1 score is the harmonic mean of precision and recall, reaching its best value at 1 and worst at 0. The precision, recall, and F1 score formulas are provided in Equations (2)-(4).
Precision = TP/(TP + FP) (2)

Recall = TP/(TP + FN) (3)

F1 = 2 × Precision × Recall/(Precision + Recall) (4)

Different models are tested for liquefaction classification performance in this study. First, four models based on the high-ranked features extracted from the four feature categories are evaluated and compared with the model using only the RGB channels. Then, features from the categories with improved classification performance are combined, and the classification performance is again compared with the other models, including the one using only RGB. Finally, the preferred model is presented, and its performance is also compared with a supervised learning approach using the same linear discriminant analysis method to investigate the impact of the semi-supervised learning approach.
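The four indices can be computed directly from the confusion-matrix counts; for example:

```python
def binary_metrics(tp, tn, fp, fn):
    """Precision, recall, F1, and overall accuracy from a binary
    confusion matrix (liquefaction = positive class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy
```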

Training Data
The training data include the two clusters of liquefaction pixels and the seven sets of ROI pixels. Figure 6 shows an example of the clustered liquefaction pixels, demonstrating how the fuzzy approach helps detect and remove low-probability pixels. In Figure 6a, the liquefaction ejecta have been traced within polygons as part of [43].

There are some notable observations in Figure 8 about the two liquefaction classes. The light liquefaction class has the highest pixel values on average compared to the other classes. The median values of the individual RGB channels in the light liquefaction class are even higher than the 75th percentiles of all other classes, including the class of buildings, in which bright roof pixels are common. The RGB mean of the dark liquefaction class is closest to the RGB mean of the soil class; however, the individual box plots of their RGB channels differ in the distance between the medians of the channels within each class. Another observation is that while the green channel has the higher median value for the trees and vegetation classes, the red channel dominates in all the other classes. Since some liquefaction pixels are in vegetated areas, this might be a helpful distinguishing aspect.

Feature Extraction and Ranking
Visualizations of the extracted features are provided via Figures 9-12 for a small region within the model development tile (Figure 3a) categorized by color transformations (Figure 9), texture analysis (Figure 10), statistical indices (Figure 11), and dimensionality reduction techniques (Figure 12).

Feature ranking was carried out by running the MRMR algorithm separately on each category of extracted features, and the high-ranked features of each category were considered for model development. The results of the feature ranking are provided in Table 3. As discussed in Section 2.2.3, a minimum score of 0.5 was required to select a feature, and the number of selected features per category was capped at two to avoid redundancy and overfitting of the classification model.
The results of the feature ranking are presented in Table 3 and led to the selection of seven high-ranked features: the black and magenta channels of CMYK as the two color transformation outputs; the approximation coefficients of the wavelet transform as the only texture-derived band; the sum of squares and gradient weight as the two statistical indices; and the first two principal components as the two bands from the dimensionality reduction outputs.
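The selection rule (minimum score of 0.5, at most two features per category) can be sketched as follows; all scores other than the four quoted later in the text (0.818, 0.813, 0.794, 0.791) are hypothetical placeholders:

```python
def select_per_category(scores_by_category, min_score=0.5, cap=2):
    """Keep at most `cap` features per category whose MRMR score
    meets the minimum threshold (threshold 0.5, two per category,
    mirroring the selection rule used for Table 3)."""
    selected = {}
    for category, scores in scores_by_category.items():
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        selected[category] = [name for name, s in ranked if s >= min_score][:cap]
    return selected
```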

Semi-Supervised Classification
To understand the impact of each selected feature set by category, the first step was to develop four classification models, each using one category of selected features, via the semi-supervised discriminant analysis classification algorithm presented in Section 2. A model using only the RGB channels (Model 1) was also developed for comparative analysis, to determine which feature categories best help distinguish the two liquefaction classes (dark and light) from the other classes of land cover and land use. After running the multi-class classification, the labels were converted to binary (liquefaction versus non-liquefaction) in order to evaluate accuracy against the validation labels.
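The conversion from the nine-class output to binary liquefaction labels can be sketched as below; the numeric class ids are assumptions for illustration only:

```python
import numpy as np

# The nine-class predictions are collapsed to a binary map: the two
# liquefaction classes (here assumed to carry label ids 0 and 1 for
# "dark" and "light") become 1, and the seven non-liquefaction
# classes become 0.
LIQUEFACTION_CLASSES = {0, 1}  # assumed label ids, for illustration

def to_binary(labels):
    labels = np.asarray(labels)
    return np.isin(labels, list(LIQUEFACTION_CLASSES)).astype(np.uint8)
```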
The F1 score and overall accuracy indices were calculated to compare the performance of the evaluated models and are presented in Figure 13. All models improved on the RGB-only model (Model 1) except Model 3 (RGB + color transformation). One explanation for the relatively inferior performance of Model 3 could be the high correlation of the black and magenta channels with the RGB color channels, leading to redundancy in the data. The improvement resulting from adding the texture component (Model 4) was only marginal. The highest model performance was for Model 2 (RGB + dimensionality reduction), with Model 5 (RGB + statistical indices) in second place.
Next, an integrated model (Model 6) was developed using the combination of selected features, including the principal components and statistical indices, and showed superior performance among all developed models. The preferred model (Model 6) has the highest overall accuracy, 88.73%. Compared with Model 1's accuracy of 84.53%, Model 6 yields ~1.45 million more correctly predicted pixel labels across the image tile, resulting in significant corrections to the spatial extent of liquefaction over the image tile.

Next, an integrated model (Model 6) was developed by using the combination of selected features, including principal components and statistical indices, which showed superior performance among all developed models.The preferred model (Model 6) has the highest overall accuracy of 88.73%.It means that compared to Model 1′s accuracy of 84.53%, Model 6 leads to ~1.45 million more correctly predicted pixel labels across the image tile, resulting in significant corrections in the spatial extent of liquefaction over the image tile.A large score value indicates that the corresponding predictor is important.Also, a drop in the feature importance score represents the confidence of feature selection, meaning that the MRMR algorithm recognizes that adding additional features is not helpful to the classification.Based on this logic, the statistical indices seem more effective overall compared to the other three categories of features.In contrast, the texture components seem to be the least effective among the categories.It should be noted that the texture components might have been of higher importance if the buildings had not been masked as they have a texture that can be detected and learned by texture analysis techniques.
Comparing all the extracted features, regardless of category, the first component of the PCA has the highest score of 0.818, followed very closely by the sum of squares (statistical index) with a score of 0.813. The approximation coefficients of the wavelet transform (texture analysis) and the black channel (color transformation) also have very high feature scores, at 0.794 and 0.791, respectively.

Then, to analyze the benefit of the semi-supervised learning approach versus supervised learning, the same datasets and features used to develop Model 6 (the preferred model) were used to train and validate a linear discriminant analysis classifier (the same classifier used in the semi-supervised method). The results of the comparative analysis are provided in Figure 14. The proposed semi-supervised self-training classifier outperforms supervised learning, with higher precision, F1 score, and overall accuracy and a very close recall score. This is likely due to the benefit of learning not only from the labeled data but also from the unlabeled data. The overall accuracy increased from 87.32% to 88.73%, corresponding to ~0.49 million more correctly predicted pixel labels.

Liquefaction Map Visualization
Using Model 6, the classification was applied to the model development tile (Figure 15), showing the resulting map of liquefaction ejecta; the predictions are also visually compared with the [43] labels (Figure 15a) and the validation labels (Figure 15b). In addition, Figure 15c provides the final map derived from the supervised learning approach.


Figure 15d also provides the map of liquefaction predicted via Model 1, the base model using only the RGB channels. Figure 16b is an example of boundary correction for a large-scale feature partially labeled by [43], perhaps due to discontinuities or uncertainties in determining the extent of the feature. This example also shows some model over-predictions on bare soils, introduced in Section 2 as the class most statistically similar to the dark liquefaction class. Figure 16c also shows the power of the model in developing an almost spatially complete inventory of liquefaction occurrences using the proposed semi-supervised learning with partially labeled data.
Investigating individual features in Figure 16d shows that Model 6 corrects the over-predictions made by Model 1 and predicts several missing liquefaction features that Model 1 did not predict. Model 6 also improved the boundaries and the quality of the predictions within the liquefaction polygons, even correcting labels where non-liquefaction areas were included within liquefaction polygons. The other observation from Figure 16d is that small features of dark liquefaction within highly vegetated (darker green) areas are more likely to be missed by the model than those in driveways with dark backgrounds.
Figure 16e shows relatively poorer performance of the model in detecting light liquefaction compared to dark liquefaction features. The reason is that the tile used for model development contains significantly fewer light liquefaction pixels than dark liquefaction pixels, which can prevent the classification model from learning enough from the limited data. Another notable point in this example is the tendency of the model to over-predict liquefaction features by expanding their boundaries over vegetated areas and detecting very shallow traces of liquefaction that were not included as ground-truth labels in the validation liquefaction polygons. A final observation in this example is the confusion of dark asphalt and pavement with the dark liquefaction class, leading to some over-prediction.

Application and Discussion
The same approach was applied to another tile (Figure 2b) to check its applicability to a new image. The classification output is compared with the [43] labels via Figures 17 and 18, which show a few examples of model predictions. In this image, contrary to the model development tile, a significant number of light liquefaction pixels are available. Visual inspection of the maps shown in Figure 18 indicates that the algorithm works well in detecting liquefaction features, but with some over-predictions across the tile. Figure 18a is an excellent example of the model's performance in detecting liquefaction traces as liquefaction; since the partial labels by [43] include such pixels, this is the expected performance. The same observation applies to Figure 18b, where a polygon was drawn by [43] over the liquefaction traces.
Figure 18c shows excellent performance of the model in detecting liquefaction occurrences across the region and completing the partially labeled inventory. However, Figure 18d indicates some inconsistent labels between asphalt and liquefaction, which can be due to the scarcity of road pixels in this tile compared to the model development tile. The authors recommend one of the following ways to deal with this issue. (1) The area of training data collection should be large enough (several tiles should be used) for the classifier to capture as many spectral and spatial patterns as possible within the data; it is recommended that ROIs be drawn in multiple tiles to gather a variety of pixels per class, leading to a better-trained classifier and more generalized classification performance. (2) If the number of tiles is low (less than 10, for instance), the tile-based approach used in this study is still recommended, but each tile should be investigated specifically for the land use and land cover types it contains and the amount of spatial coverage of each class. The ROIs should then be drawn such that enough pixel samples are collected for the model to be informed of the individual classes. Otherwise, if there are specific minor classes, they can either be fully labeled and excluded from the training process, or merged with the most similar class after an investigation of statistical and spectral similarity.
In summary of the significance and novelty of this research, the advanced machine learning and image processing methods used in the study made it possible to spatially complete and augment the partial inventory of liquefaction labels in the image library provided by [43] with acceptable accuracy. The proposed methodology can be applied in other domains where labeled, high-quality training data are limited and semi-supervised algorithms are needed to classify data due to the difficulty of labeling.

Conclusions
The aim of this paper was to generate spatially consistent and complete maps of liquefaction ejecta from a small number of visually labeled liquefaction surface effects in high-resolution aerial imagery, with the aid of advanced computer vision and machine learning algorithms, and a semi-supervised self-training classification method was presented. A very important aspect of the methodology presented in this study is the use of building footprints to exclude building pixels from the process, especially given the proposed iterative self-training procedure. However, this depends on the availability of good-quality building polygons with recently updated, spatially complete, and sufficiently accurate boundaries. Such building footprints are not always available; thus, the authors highly recommend that a review of the building footprint quality be carried out first and that the geospatial footprints be revised if needed by (1) deleting polygons where no buildings are present in the imagery; (2) adding polygons where they are missing for buildings that are visible in the images; (3) shifting polygons in space for geometric correction due to the view/capture angle of the airborne device; and (4) correcting the boundaries of building polygons when there is a considerable mismatch with the building in the image.
Furthermore, it should be noted that the applicability of this method depends on capturing post-event imagery in time, before the surface dries and before human interference removes liquefaction-related surface effects. Additionally, other sources of surface moisture should not be present to interfere with liquefaction ejecta effects. Liquefaction-impacted areas should also be investigated in terms of the types of liquefaction surface effects present. If there is a significant difference between the impacted area and the two classes of liquefaction presented in this research (dark and light), a third class of liquefaction (for instance, a mix of water and soil with a very high water content) might be considered upon investigation of its benefits to the presented semi-supervised modeling approach.
In terms of potential future studies to improve the performance of the proposed approach and to enhance the information content of the data, the authors recommend that additional sources of information be considered for addition to the optical image bands and image-derived bands used in this paper. Additional informative bands can include geospatial liquefaction model probability outputs derived from geospatial proxies with known relationships to the occurrence of earthquake-induced soil liquefaction, as indicators of earthquake load, soil density, and saturation [2,10], as well as information derived from geophysical techniques, liquefaction case histories, site investigations, and magnitude-distance empirical relationships [63,64].
Future research can investigate the applicability of the semi-supervised approach to other earthquake-induced ground failures, e.g., landslides, as well as to surface effects and damage resulting from other types of natural disasters, e.g., flooding, wildfire, and hurricane damage. Feature selection over the various image-derived bands presented in this paper can be performed on other disaster-related datasets and problems (landslide mapping, for instance), and the effective input parameters for the semi-supervised model for each specific damage type can be reported and compared to the features selected in this research.
As a summary of the significance and novelty of this research, the advanced machine learning and image processing methods used in the study made it possible to spatially complete and augment the partial inventory of liquefaction labels in the image library provided by [43] with acceptable accuracy. The proposed methodology can be applied in other domains where labeled, high-quality training data are limited and there is a need for semi-supervised algorithms to classify data due to the difficulty of labeling.

Conclusions
The aim of this paper was to generate spatially consistent and complete maps of liquefaction ejecta from a small number of visually labeled liquefaction surface effects in high-resolution aerial imagery, with the aid of advanced computer vision and machine learning algorithms. A semi-supervised self-training classification method was presented and validated in this study and showed superior performance in the detection of liquefaction surface effects compared to the supervised learning approach. This paper also investigated using RGB image-derived products as inputs into the classification algorithm, such as transformations based on color, texture, statistics, and dimensionality reduction. The results showed enhanced performance of liquefaction labeling when selected statistical indices and dimensionality reduction outputs were used alongside the RGB channels of the imagery. Texture and color transformation components did not impact the model performance considerably. Feature ranking was carried out via the MRMR algorithm, which reduces the redundancy and increases the relevance of the features to the classes of interest. A fuzzy clustering approach was proposed for grouping liquefaction pixels into the two classes of dark and light liquefaction, increasing the certainty of the labels by removing outliers and mixed pixels from the labeled liquefaction training data. It is recommended that this approach be performed in a tile-based manner and that high-quality building footprints be used to achieve accurate results from the presented semi-supervised approach. The advanced machine learning and image processing methods used in the study made it possible to spatially complete and augment the partial inventory of liquefaction labels in the image library provided by Sanon et al. (2022) [43] with acceptable accuracy. The proposed methodology can be applied in other domains where labeled, high-quality training data are limited and there is a need for semi-supervised algorithms to classify data due to the difficulty of labeling.
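The self-training loop summarized above can be illustrated with a minimal sketch. The nearest-centroid base classifier, the softmax-over-distances confidence proxy, and the 0.9 threshold are illustrative assumptions; the study's actual classifier and confidence measure differ.

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    """Minimal self-training sketch: fit a nearest-centroid classifier on
    labeled pixels, promote high-confidence predictions on unlabeled
    pixels to labels, and repeat until nothing new qualifies."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    for _ in range(max_iter):
        if len(remaining) == 0:
            break
        classes = np.unique(y_lab)
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(remaining[:, None, :] - centroids[None], axis=2)
        p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)  # pseudo-probabilities
        conf, pred = p.max(axis=1), classes[p.argmax(axis=1)]
        keep = conf >= threshold
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, remaining[keep]])
        y_lab = np.concatenate([y_lab, pred[keep]])
        remaining = remaining[~keep]
    return X_lab, y_lab

# Two well-separated synthetic classes; 40 labeled, 100 unlabeled samples.
rng = np.random.default_rng(1)
X0 = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y0 = np.array([0] * 20 + [1] * 20)
Xu = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
X_all, y_all = self_train(X0, y0, Xu, threshold=0.9)
print(len(y_all))  # grows beyond the 40 initially labeled samples
```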

Figure 1 .
Figure 1. The location of the two aerial image tiles used in this study in Christchurch, New Zealand. The resolution of the images is 10 cm, and each tile is 720-by-480 m in size. The image tiles are all projected to the New Zealand Transverse Mercator (NZTM) system. The left and right images are shown in detail in Figure 2a,b, respectively.

Figure 2 .
Figure 2. (a) The image tile used for model development and evaluation. (b) The image tile used for further model application. Both tiles have a resolution of 10 cm, and each tile is 720-by-480 m in size. Linear stretching is performed on the image for better visualization contrast. Row and column indicate pixel number along each axis of the image shown in the figure.

Figure 3 .
Figure 3. (a) The image tile used for model development and evaluation with Sanon et al. (2022) [43] liquefaction polygons (in red) overlaid. (b) The image tile with the complete labels (in magenta) created for model validation in this study. (c) The image tile used for model application with Sanon et al. (2022) [43] liquefaction polygons (in red) overlaid. Row and column indicate pixel number along each axis of the image shown in the figure.

Figure 4 .
Figure 4. Building footprints (in yellow) provided by LINZ [44], used to mask out the buildings in this study for (a) the model development tile, and (b) the model application tile. The version shown in this figure was modified by adding a few polygons for missing buildings and removing some polygons for which no building was observed in the imagery. Row and column indicate pixel number along each axis of the image shown in the figure.

Figure 5 .
Figure 5. Flowchart of the proposed method, showing how multi-band multi-class labeled data are created and fed to the semi-supervised learning method to complete the partially labeled liquefaction data [43].

2.2.1. Fuzzy C-Means Clustering of Liquefaction Data

[Feature definitions from Table 2: a linear, pixel-wise operation for visual enhancement, Output = T × (Input − mean) + mean, where T is a covariance-derived transformation matrix; C, M, Y, and K represent the cyan, magenta, yellow, and black components of the CMYK color space; grayscale is derived from the RGB channels as Grayscale = (0.2989 × R) + (0.5870 × G) + (0.1140 × B).]

The pixels within those polygons are included as training data and used in the Fuzzy C-Means clustering. In Figure 6b, the results of the Fuzzy C-Means clustering are shown; the non-liquefaction pixels on the running path were omitted successfully. The liquefaction clusters are grouped into dark and light liquefaction ejecta. The performance of the algorithm in clustering the dark and light liquefaction pixels is visually acceptable. At the end of this step, the training data for liquefaction are more consistent and clustered based on pixel-based spectral properties.
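The Fuzzy C-Means step can be sketched as follows. This is a minimal NumPy implementation, not the study's code; the synthetic dark/light pixel intensities and the 0.8 membership cutoff for removing uncertain pixels are illustrative assumptions.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal Fuzzy C-Means sketch (fuzzifier m): returns cluster centers
    and the membership matrix U, whose rows sum to one."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))          # standard FCM membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Hypothetical liquefaction pixel intensities: dark vs. light ejecta.
rng = np.random.default_rng(2)
dark = rng.normal(60, 5, (200, 3))
light = rng.normal(200, 5, (200, 3))
X = np.vstack([dark, light])
centers, U = fuzzy_cmeans(X, c=2)
confident = U.max(axis=1) >= 0.8               # drop mixed/uncertain pixels
lo_c, hi_c = sorted(float(v) for v in centers.mean(axis=1))
print(abs(lo_c - 60) < 5, abs(hi_c - 200) < 5)  # True True
print(float(confident.mean()) > 0.9)            # True
```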

Figure 6 .
Figure 6. (a) An example of the liquefaction ejecta polygons drawn by Sanon et al. (2022) [43]. (b) Fuzzy C-Means clustering results for the example image tile with dark and light liquefaction. The dark red and orange colors indicate dark and light liquefaction, respectively. Uncertain pixels are removed from the liquefaction training data based on their low probability of belonging to either the dark or light liquefaction class.

Representative ROIs (polygons) were drawn for seven classes over the image tiles, as presented in Figure 7. In Figure 8, box plots of the RGB channels of the collected data for these seven classes are compared with the clustered liquefaction pixels and the pixels located within the building footprints (which are not used in the classification process). It is observed that the buildings have the widest range of values among the classes, overlapping with multiple classes. This reconfirms the benefit of masking out the buildings from the training data. Shadow pixels, as expected, have the lowest mean among the classes and are closest to the class of trees. From a class separability point of view, considering the box plot quantiles and the RGB mean line, the closest classes are roads and pavements/driveways. Although the class of pavements/driveways has a wider range of values, the median values of the individual RGB channels deviate to some degree, while the RGB means differ only slightly. It should be noted that even if these two classes were similar enough to confuse the classifier, this should not harm the final performance of the model for the purpose of this study, which is liquefaction detection, especially since the RGB means of the liquefaction classes (dark and light liquefaction) are sufficiently different from these two classes (roads and pavements/driveways). There are some notable observations in Figure 8 about the two liquefaction classes. The light liquefaction class has the highest pixel values on average compared to the other classes. The median values of the individual RGB channels in the light liquefaction class are even higher than the 75th percentiles of all other classes, including the class of buildings, in which bright roof pixels are common. The RGB mean of the dark liquefaction class is closest to that of the soil class; however, looking at the individual box plots of their RGB channels, a difference is observed in the distances between the channel medians within each of the dark liquefaction and soil classes. Another observation is that while the green channel has the higher median values for the trees and vegetation classes, the red channel dominates in all the other classes. Since some liquefaction pixels are in vegetated areas, this might be a helpful distinguishing aspect.
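The per-class comparison behind Figure 8 reduces to channel-wise statistics per class. A hypothetical sketch follows; the class values are synthetic, chosen only to mimic the ordering described above (light liquefaction brightest, shadow darkest).

```python
import numpy as np

# Hypothetical per-class RGB training pixels (0-255 scale).
rng = np.random.default_rng(3)
classes = {
    "light_liquefaction": rng.normal(220, 10, (300, 3)),
    "soil": rng.normal(120, 15, (300, 3)),
    "shadow": rng.normal(40, 8, (300, 3)),
}
# Per-class channel medians and the overall RGB mean, as in Figure 8.
summary = {
    name: {
        "channel_medians": np.median(px, axis=0).round(1),
        "rgb_mean": float(px.mean()),
    }
    for name, px in classes.items()
}
brightest = max(summary, key=lambda k: summary[k]["rgb_mean"])
print(brightest)  # light_liquefaction
```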

Figure 7 .
Figure 7. The ROIs drawn to collect training data for different land use and land cover classes on the two tiles used in this paper, including trees (dark green), vegetation (light green), soil (light brown), shadow (pink), water (light blue), roads (black), and pavements/driveways (dark blue). Row and column indicate pixel number along each axis of the image shown in the figure. (a) Model development tile, and (b) model application tile.

Figure 8 .
Figure 8. Comparative box plots of the individual RGB channels per class. The mean of the RGB channels of each class is also calculated and plotted on the figure in magenta.

Figure 10 .
Figure 10. Feature extraction outputs via texture analysis techniques. (a) The RGB image with Sanon et al. (2022) [43] polygons (in red) and the validation polygons used in this study (in magenta). (b-e) Gabor filters generated at 4 orientations (0, 45, 90, and 135 degrees, respectively) with a wavelength of 5 pixels/cycle. (f) Approximation coefficients of the two-dimensional Haar wavelet transform with symmetric extension mode (half point): boundary value symmetric replication. (g) Convolution filter. (h) Correlation filter.
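The Gabor filter bank in panels (b-e) can be approximated as follows. Only the wavelength of 5 pixels/cycle and the four orientations come from the caption; the kernel size and Gaussian sigma are illustrative assumptions.

```python
import numpy as np

def gabor_kernel(wavelength, theta, size=15, sigma=4.0):
    """Real part of a Gabor kernel: a cosine carrier at the given
    orientation (radians) and wavelength (pixels/cycle) under a
    Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# Four orientations at a wavelength of 5 pixels/cycle, as in Figure 10b-e.
bank = [gabor_kernel(5, np.deg2rad(a)) for a in (0, 45, 90, 135)]
print(len(bank), bank[0].shape)  # 4 (15, 15)
```

Each kernel would then be convolved with the image (e.g., via `scipy.ndimage.convolve`) to produce one texture band per orientation.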

Figure 12 .
Figure 12. Feature extraction outputs via dimensionality reduction techniques. (a) The RGB image with Sanon et al. (2022) [43] polygons (in red) and the validation polygons used in this study (in magenta). (b) Grayscale image. (c-e) First, second, and third bands of the PCA method's output, respectively. (f-h) First, second, and third bands of the MNF method's output, respectively.
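Band-wise PCA as in panels (c-e) can be sketched as an eigendecomposition of the band covariance matrix, treating pixels as samples and bands as variables (MNF differs in that it first whitens by an estimated noise covariance). The synthetic correlated-band stack below is an illustrative assumption.

```python
import numpy as np

def pca_bands(stack, n_components=3):
    """Project a (rows, cols, bands) stack onto the leading eigenvectors
    of the band covariance matrix."""
    h, w, b = stack.shape
    X = stack.reshape(-1, b).astype(float)
    X -= X.mean(axis=0)                      # center each band
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return (X @ vecs[:, order]).reshape(h, w, n_components)

# Synthetic RGB-like stack: three strongly correlated bands plus noise.
rng = np.random.default_rng(4)
base = rng.random((32, 32, 1))
rgb = np.concatenate([base * s for s in (1.0, 0.8, 0.6)], axis=2)
rgb += rng.normal(0, 0.01, rgb.shape)
out = pca_bands(rgb, n_components=3)
var = out.reshape(-1, 3).var(axis=0)
print(out.shape)                  # (32, 32, 3)
print(var[0] / var.sum() > 0.95)  # first component captures most variance: True
```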

Figure 13 .
Figure 13. Binary classification accuracy results of the proposed semi-supervised model using different combinations of features, calculated by comparing the binarized classification labels with the validation liquefaction labels shown in Figure 3b. The darker green color in the heatmap table indicates superior performance by the model.
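The binarized-map accuracy comparison in Figures 13 and 14 can be sketched as a pixel-wise confusion count; the 4 x 4 toy masks below are illustrative.

```python
import numpy as np

def binary_scores(pred, truth):
    """Pixel-wise comparison of a binarized liquefaction map against
    validation labels: accuracy, precision, recall."""
    pred, truth = pred.astype(bool).ravel(), truth.astype(bool).ravel()
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    return {
        "accuracy": (tp + tn) / pred.size,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1                      # 4 liquefaction pixels
pred = truth.copy()
pred[1, 1] = 0                           # one missed pixel
pred[0, 0] = 1                           # one false alarm
scores = binary_scores(pred, truth)
print(scores["accuracy"], scores["recall"])  # 0.875 0.75
```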

Figure 14 .
Figure 14. Binary classification accuracy results of the proposed semi-supervised model versus the supervised model using the preferred combination of features, calculated by comparing the binarized classification labels with the validation liquefaction labels shown in Figure 3b. The darker green color in the heatmap table indicates superior performance by the model.

Figure 15 .
Figure 15. (a) Binary classification results of Model 6 (preferred model) with Sanon et al. (2022) [43] liquefaction polygons (in red) overlaying the model prediction (in yellow). (b) Model 6 output (in yellow) compared with validation labels (in magenta). (c) Supervised classification output with validation labels overlaid. (d) Model 1 (RGB-based model) classification output with validation labels overlaid. Row and column indicate pixel number along each axis of the image shown in the figure.

Figure 16 .
Figure 16 presents some examples of the performance of Model 6, compared with the base Model 1, with a closer view of different locations within the model development image tile. Figure 16a is an example showing how the over-prediction of Model 1 is reduced by Model 6. In addition, Figure 16a shows that objects such as machinery are excluded from the model prediction, even though they were included within the validation liquefaction polygons due to the difficulty of excluding them when drawing a polygon around such a large-scale liquefaction feature.

Figure 16 .
Figure 16. Binary classification results of Model 1 (RGB-based model, in the middle column) and Model 6 (preferred model, in the right column) with Sanon et al. (2022) [43] liquefaction polygons (in thick red) and validation labels (in magenta) overlaying the model predictions (in yellow). (a-e) Example regions in the model development tile visualized for evaluation of the predictions.

Figure 17 .
Figure 17. The model application image tile with binary classification results of Model 6 (preferred model, in the right column) and Sanon et al. (2022) [43] liquefaction polygons (in thick red). Row and column indicate pixel number along each axis of the image shown in the figure.

Figure 18 .
Figure 18. The model application example images with binary classification results of Model 6 (preferred model, in the right column) and Sanon et al. (2022) [43] liquefaction polygons (in thick red).

Table 1 .
Spatial coverage of the polygons used per image tile.

Table 2 .
Extracted features used in the study.

Table 3 .
MRMR feature ranking results. The rank column in the table indicates rank within the category, not the overall rank.
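The MRMR ranking behind Table 3 can be illustrated with a greedy sketch. Note the assumptions: actual MRMR implementations typically score relevance and redundancy with mutual information, whereas absolute Pearson correlation is used here for brevity, and the synthetic features are illustrative.

```python
import numpy as np

def mrmr_rank(X, y, n_select=3):
    """Greedy MRMR sketch: at each step pick the feature maximizing
    relevance (|correlation with the class label|) minus mean redundancy
    (|correlation| with already-selected features)."""
    n_feat = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Synthetic features: f0 relevant, f1 nearly a copy of f0 (redundant),
# f2 irrelevant noise, f3 relevant but less redundant with f0.
rng = np.random.default_rng(5)
y = rng.integers(0, 2, 400).astype(float)
f0 = y + rng.normal(0, 0.3, 400)
f1 = f0 + rng.normal(0, 0.05, 400)
f2 = rng.normal(0, 1, 400)
f3 = y + rng.normal(0, 0.6, 400)
X = np.column_stack([f0, f1, f2, f3])
order = mrmr_rank(X, y, n_select=3)
print(order[0] in (0, 1))  # the most relevant feature is picked first: True
```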