3.1. Validation and Benchmarking
First, the Jezero Crater region is analyzed, which has previously been studied by Pletl et al. [9] and Gao et al. [10]. To evaluate the clustering performance, we apply two established internal validation metrics commonly used in unsupervised learning: the Calinski–Harabasz (CH) index [38] and the Davies–Bouldin (DB) index [39]. These metrics quantify the quality of a clustering result based on intra-cluster compactness and inter-cluster separation. The CH index yields higher values when clusters are dense and well-separated [38], whereas the DB index penalizes overlapping or poorly separated clusters, with lower values indicating better validity [39].
To draw an evidence-based conclusion, we follow the approach of Pletl et al. [9], in which the metric values are averaged across all examined cluster numbers to mitigate fluctuations. The results are presented in Table 1.
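The averaging step can be illustrated with a minimal sketch. It assumes that the embedding is clustered with k-means and scored with the scikit-learn implementations of the three indices; the function name, the cluster range of 2 to 7, and the synthetic stand-in data are hypothetical and not taken from the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def mean_validity_scores(embedding, cluster_numbers, seed=0):
    """Average CH, DB and SC over several k-means cluster counts."""
    scores = {"CH": [], "DB": [], "SC": []}
    for k in cluster_numbers:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
        scores["CH"].append(calinski_harabasz_score(embedding, labels))  # higher is better
        scores["DB"].append(davies_bouldin_score(embedding, labels))     # lower is better
        scores["SC"].append(silhouette_score(embedding, labels))         # higher is better
    return {name: float(np.mean(values)) for name, values in scores.items()}


# Synthetic stand-in for a low-dimensional embedding (assumption, not CRISM data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0]])
embedding = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

averages = mean_validity_scores(embedding, range(2, 8))  # hypothetical cluster range
print(averages)
```

Averaging over the cluster range gives one value per index and configuration, so two pipelines can be compared without depending on a single choice of k.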
According to all three indices (the silhouette coefficient SC, CH, and DB), the UMAP algorithm with optimized parameters outperforms the reference method. Applying the stripe filter improves two of the evaluated metrics but worsens the DB metric, indicating that the overall contribution of the filter is rather minor. Nevertheless, the combination of optimized UMAP parameters and the filter suggests a generally more robust clustering performance, although the impact of each individual component may vary depending on the dataset. Since this configuration produces the optimal values for the Jezero region, subsequent analyses are carried out with the stripe filter applied. In the next step, a silhouette plot is analyzed to evaluate the method’s ability to identify the optimal number of clusters. Once again, the results are compared with those reported by Pletl et al. [9].
For this purpose, the silhouette coefficient (SC) is plotted against the number of clusters. In Figure 3a, the SC reaches a peak of 0.50 at four clusters, suggesting that this configuration provides the most coherent and well-separated groupings in the dataset. In contrast, the reference results by Pletl et al. [9], shown in Figure 3b, exhibit generally lower SC values across all tested cluster numbers, with a maximum value that remains below 0.45. The SC values of the optimized pipeline lie between 0.42 and 0.50 and thus consistently surpass the previous benchmarks. This indicates that the combination of parameter tuning in UMAP and the application of the stripe correction filter not only improves clustering compactness but also enhances inter-cluster separability. Additionally, the automated process successfully identifies the optimal cluster count based on the SC, eliminating the need for manual interpretation of plots. These results suggest a more reliable clustering outcome, particularly for geologically complex regions like Jezero Crater.
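The automated choice of the cluster count can be sketched as a maximum search over the SC curve. The function name is hypothetical, and the SC values below are illustrative placeholders chosen only to lie in the reported range of 0.42 to 0.50 with a peak at four clusters; they are not the values behind Figure 3.

```python
def select_cluster_count(sc_by_k):
    """Return the cluster count with the highest silhouette coefficient."""
    if not sc_by_k:
        raise ValueError("no silhouette values given")
    # On ties, the first cluster count encountered wins.
    return max(sc_by_k, key=sc_by_k.get)


# Illustrative placeholder values (assumption), not read from Figure 3.
sc_curve = {2: 0.44, 3: 0.47, 4: 0.50, 5: 0.46, 6: 0.43, 7: 0.42}
best_k = select_cluster_count(sc_curve)
print(best_k)
```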
Next, a qualitative analysis complements the quantitative evaluation. For this purpose, the minerals extracted from the SCM are compared with an expert map.
To assign a mineral label to a class, a representative mean spectrum for that class is first calculated. All pixels are grouped according to their classes C. For each wavelength band, the pixel values of a class are summed and averaged. The resulting spectrum is then compared with reference spectra from a library specifically created for CRISM data, provided by Ehlmann et al. [40]. A mean spectrum is assigned to a mineral by evaluating its numerical similarity with each spectrum stored in the database; the reference spectrum with the highest similarity is taken as the identified mineral. To quantify the numerical similarity between spectra, the spectral angle is calculated. This metric is commonly used in the comparison of hyperspectral images and is described by Agarla et al. [41]. Using Equations (13) and (14), the spectral angle APPSA is calculated, where minimal values indicate a high degree of similarity and lower error.
Here, m and n denote the number of pixels in the horizontal and vertical directions, respectively, while p represents the number of spectral channels. The two spectra being compared are the reference spectrum and the average spectrum of a class. By applying the comparison to an image consisting of a single pixel (m = n = 1), this metric can be used to compare the two spectral profiles directly.
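The labeling procedure can be sketched as follows. The sketch uses the standard spectral angle (the arccosine of the normalized dot product), which is assumed here to correspond to the single-pixel case of the APPSA. All function names, the toy cube, and the two library entries are hypothetical and serve only as illustration.

```python
import numpy as np


def class_mean_spectra(cube, labels):
    """Mean spectrum per class; cube is (rows, cols, bands), labels is (rows, cols)."""
    pixels = cube.reshape(-1, cube.shape[-1])
    flat = labels.ravel()
    return {int(c): pixels[flat == c].mean(axis=0) for c in np.unique(flat)}


def spectral_angle(a, b):
    """Angle in radians between two spectra; 0 means identical spectral shape."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def rank_library(mean_spectrum, library, top=5):
    """Library entries sorted by increasing spectral angle to the mean spectrum."""
    angles = {name: spectral_angle(mean_spectrum, ref) for name, ref in library.items()}
    return sorted(angles.items(), key=lambda item: item[1])[:top]


# Toy 2 x 2 image with 3 bands and two classes (assumption, not CRISM data).
cube = np.array([[[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
                 [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]])
labels = np.array([[0, 0], [1, 1]])
library = {  # hypothetical reference spectra
    "hypothetical_mineral_a": np.array([4.0, 6.0, 8.0]),
    "hypothetical_mineral_b": np.array([1.0, 0.0, 0.0]),
}

means = class_mean_spectra(cube, labels)
best = {c: rank_library(spectrum, library)[0][0] for c, spectrum in means.items()}
print(best)
```

Because the spectral angle ignores the overall scale of a spectrum, a class whose mean spectrum is a brightness-scaled copy of a library entry is still matched to that entry.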
The expert map used for comparison is provided by Gao et al. [10]. It contains six classes, five of which are associated with minerals, complemented by an unclassified category. To ensure an accurate assessment of the post-processing performance, the comparison is conducted using the previously identified optimal number of four clusters. Additionally, an SCM generated analogously to [10] using six clusters is shown for comparison in Figure 4.
The resulting class labels for (a) are listed in Table A2 in descending order of their similarity. An analysis of the results for (a) reveals a substantial overlap between the classification based on the database comparison and the expert assessment. For classes 1 to 3, the identified minerals align with the expert interpretation. Notably, in class 1, both iron- and magnesium-rich olivines are identified. However, for class 4, the mineral smectite is not among the five minerals with the highest similarity.
The second region analyzed is the FRT0000AA7D dataset from the Mawrth Vallis region, which was previously studied by Bishop et al. [
42]. As previously noted, the availability of expert maps is limited. Consequently, no reference values for the clustering metrics are available for this region. To enable evaluation of the pipeline in this region, we first compute the metrics without applying UMAP optimization or the stripe filter. Subsequently, the impact of these two modifications is calculated analogously to the Jezero region, as summarized in
Table 2.
In contrast to the Jezero region, using the optimized UMAP parameters did not result in a clearly pronounced performance improvement. Although higher values were achieved for two of the metrics, the CH index decreased. The effect of applying the stripe filter is particularly noteworthy. While the CH value increased considerably in this case, the crucial SC value dropped significantly, and the DB value decreased slightly. This indicates that the application of the stripe correction filter can also have a negative impact on cluster performance. A method for controlling the application of the stripe filter is discussed in
Section 3.2 based on a global analysis. Since the application of the optimized UMAP parameters without the stripe filter yields the best results, the following qualitative analysis is conducted using these settings.
In the qualitative analysis, the generated SCM shown in
Figure 5 exhibits strong visual agreement with the reference map provided by Bishop et al. [
42]. The results of the mineralogical database comparison are again summarized in
Table A3. In addition to these visual correspondences, mineralogical overlaps are also apparent. For the class that appears in red in the expert map and is labeled as Fe/Mg-smectite, the database comparison identifies Al-smectite as well as a dominance of Fe/Mg-olivines. This likely indicates either a mixed mineralogy (smectite + primary silicates + evaporites) or spectral misclassification within the SCM. The turquoise class, which the expert interprets as Al-phyllosilicates, corresponds well with the database results yielding alunite and kaolinite. Additional minerals such as Mg-carbonate, gypsum, and Fe-sulfates suggest associated carbonates, evaporites or secondary sulfates. The greenish class of poorly crystalline aluminosilicates is largely associated with the black class of the SCM.
However, the corresponding black class in the expert map was not subjected to mineralogical interpretation, making a definitive assessment difficult. These discrepancies highlight a limitation of a purely database-driven mineralogical interpretation. Alternative approaches that could be implemented in future work are hence discussed in
Section 4.
The third region analyzed is an area within Nili Fossae, using the FRT00003E12 dataset. As a reference, a map based on mineral indicators, created according to [
18] and published by Mustard et al. [
43], is utilized. This region presents a particular challenge because, unlike the previously analyzed areas, it predominantly consists of only two distinct mineralogical components.
Analogous to the Mawrth Vallis region, no quantitative reference is available for this area, so a baseline without UMAP optimization and without the stripe filter is determined first. The corresponding results are presented in Table 3 and show a similar pattern to the two previously analyzed regions. Here as well, the use of the optimized UMAP parameters leads to an improvement, in part a substantial one, across all clustering metrics. Of particular note is the comparatively strong increase in the SC value. When applying the stripe filter, however, a deterioration in clustering performance is observed for the Nili Fossae region. While the SC and CH values decrease considerably in some cases, the DB value at least remains roughly unchanged rather than worsening. Overall, the results confirm the general trend that UMAP optimization tends to have a positive effect on clustering performance, whereas the application of the stripe filter exerts a rather negative influence.
Since a quantitative analysis of this region has not been conducted so far, the assessment is based solely on comparing the two mineralogical maps, which are shown in
Figure 6. Additionally, it draws on the mineralogical interpretation of the SCM by comparing it with the spectral library, as summarized in
Table A4. Particularly noteworthy is the class highlighted in both representations by a reddish coloration. Here, there is a high degree of agreement between the SCM and the reference map, with even fine structural details aligning in both. Furthermore, comparison of the representative mean spectrum with the spectral library suggests that iron- and magnesium-rich olivines exhibit the greatest similarity. For the second class, however, the database comparison does not provide a correct identification, and slight discrepancies in the cluster structures between the two maps can be observed. While phyllosilicates were identified by the authors of the reference map, the database comparison instead yields ices and gypsum as the minerals with the highest spectral similarity. This could be attributed to the spectral resemblance of these materials. Nonetheless, aside from a few inconsistencies in the central region, the SCM reproduces these structures closely. The discrepancy in the central region may be attributable to the fact that, unlike the clustering approach, mineral indicators need not provide complete coverage of the area.
3.2. Evaluation of the Stripe Filter Approach
In this section, we evaluate the capability of UMAP and k-means, in combination with a striping filter, to robustly cluster regions of the Martian surface. This approach is particularly relevant because it is intended to form the basis for a fully automated process.
As shown in
Figure 7b, many regions exhibit vertically oriented striping patterns in their clustering results. The intensity of these patterns ranges from purely vertical stripes to mixed forms in which additional cluster structures can be observed. These outcomes arise when the previously described clustering procedure is conducted without using the striping filter.
The left panel, Figure 7a, shows the results after applying the striping filter. Notably, each region improves markedly after the filter is applied. In several cases, the filter effectively removes the dominance of vertical artifacts and allows cluster boundaries to emerge more clearly, particularly in regions where mixed cluster structures were previously obscured. Although the maps differ in the extent to which striping is reduced, the patterns become more interpretable. The remaining variation points to a dataset-dependent filter performance and emphasizes the need for a quantitative criterion to assess its effectiveness.
To assess whether the noise level in a dataset can be effectively handled by the striping filter, we use the noise_variance_ attribute of the scikit-learn PCA implementation [44], which follows the probabilistic interpretation of PCA by Tipping and Bishop [45]. The attribute provides an estimate of the isotropic noise variance, computed before filtering. Conventional clustering metrics are unsuitable for this purpose, as the resulting stripes, while visually appearing as compact clusters, are mineralogically meaningless. If the noise level exceeds a defined threshold, the filter is unlikely to produce substantial improvements.
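A minimal sketch of this noise estimate is given below. It assumes that the hyperspectral cube is flattened to a pixel-by-band matrix before fitting; the function name, the default number of components, and the synthetic data are hypothetical, since the exact settings are not stated here. In scikit-learn, noise_variance_ equals the average of the eigenvalues of the data covariance that are not retained as components.

```python
import numpy as np
from sklearn.decomposition import PCA


def estimate_noise_variance(cube, n_components=10):
    """Isotropic noise estimate of a (rows, cols, bands) cube via probabilistic PCA."""
    pixels = cube.reshape(-1, cube.shape[-1])  # one row per pixel
    pca = PCA(n_components=n_components, svd_solver="full").fit(pixels)
    return float(pca.noise_variance_)


# Synthetic cube (assumption): rank-3 signal plus isotropic noise with variance 0.2**2.
rng = np.random.default_rng(0)
rows, cols, bands, rank = 40, 50, 30, 3
signal = rng.normal(size=(rows * cols, rank)) @ rng.normal(size=(rank, bands))
noise = rng.normal(scale=0.2, size=(rows * cols, bands))
cube = (signal + noise).reshape(rows, cols, bands)

noise_variance = estimate_noise_variance(cube, n_components=rank)
print(noise_variance)  # expected to be close to 0.04 for this synthetic example
```

The estimate depends on the chosen number of components: if too few are retained, residual signal variance is counted as noise.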
As established in
Section 3.1, the use of the filter may be unnecessary and could even reduce the performance of the clustering process. For this reason, it is beneficial to establish a quantitative threshold value that indicates whether the application of the stripe correction filter is warranted. Furthermore, a threshold can be defined above which the clustering results are likely to be affected by residual stripe artifacts, potentially leading to erroneous classifications.
To determine such thresholds, a test dataset comprising 25 regions representing diverse Martian surface types was analyzed, and the noise variance was computed for each region. The regions and their corresponding values are listed in
Table A5 in the Appendix. Furthermore, these regions are classified into three different classes according to the filter efficiency as follows.
Class 1 includes regions that do not exhibit any stripe artifacts during clustering even without applying the filter. Among these are, for instance, the regions examined in
Section 3.1. Class 2 contains those regions that initially show stripe artifacts but can be corrected by applying the filter. This also includes regions in which minimal residual artifacts remain (see FRT00008A4A in
Figure 7). Class 3 considers regions that still display significant residual artifacts even after applying the stripe filter. An example of this is the dataset FRT0001784D.
For determining a threshold value for the application of the filter, the distributions of classes 1 and 2 are examined, as shown in
Figure 8. Class 1 shows relatively low and consistent noise variance values, while class 2 exhibits a higher mean noise variance and a wider spread, suggesting increased variability among datasets. The two ranges overlap: the upper limit of class 1 (0.17) lies above the lower limit of class 2 (0.09). This overlap is not undesirable, as it reflects the natural variability of CRISM observations and suggests that the transition between low and moderate noise levels is continuous rather than discrete. However, since omitting the filter for class 2 regions has more severe effects than redundantly applying it to class 1 regions, a value close to the minimum of class 2 is preferred. Based on the examination of both point distributions, a value of 0.10 is therefore proposed as the threshold for filter application and is integrated into our framework. In general, all regions exceeding this value are processed using the filter.
The second threshold is intended solely as additional information for the user, serving as an indicator of potentially compromised clustering results that may require manual inspection. For this purpose, the class boundaries between classes 2 and 3 are again analyzed. It is observed that class 3 exhibits a much broader distribution, whereas class 2 is more compactly distributed around a mean value of 0.21. Considering the point distributions, it is proposed that clustering results may become unreliable when the noise variance exceeds a threshold of 0.40. Since this threshold serves only an informative purpose, it is chosen to be higher than the maximum value of class 2 and close to the mean of class 3.
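The two thresholds can be combined into a simple decision rule, sketched below. The constant and function names are hypothetical; only the values 0.10 and 0.40 are taken from the text.

```python
FILTER_THRESHOLD = 0.10       # above this noise variance, apply the stripe filter
RELIABILITY_THRESHOLD = 0.40  # above this, flag the result for manual inspection


def plan_processing(noise_variance):
    """Decide on filter use and on a reliability warning from the noise variance."""
    return {
        "apply_stripe_filter": noise_variance > FILTER_THRESHOLD,
        "flag_for_inspection": noise_variance > RELIABILITY_THRESHOLD,
    }


# Three illustrative noise levels: low, moderate, and high.
for value in (0.05, 0.21, 0.45):
    print(value, plan_processing(value))
```

The second flag does not change the processing; it only informs the user that residual stripe artifacts are likely.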
In summary, the filter yields artifact-free clustering results in 18 out of 25 regions, demonstrating a high level of robustness across diverse datasets. Moreover, it almost completely corrects artifacts in 13 out of 20 affected regions. Only 7 regions continue to exhibit more pronounced artifacts. However, even in these cases, the resulting cluster structures remain sufficiently coherent to enable at least partial interpretation. Overall, the filtering procedure can therefore be considered effective, with potential for further improvement particularly in regions with noise variance values above 0.40, where additional refinements may enhance performance.