Article

Benchmarking Point Cloud Feature Extraction with Smooth Overlap of Atomic Positions (SOAP): A Pixel-Wise Approach for MNIST Handwritten Data

1 Strategy and Development Services, Metropolia University of Applied Sciences, Myllypurontie 1, 00920 Helsinki, Finland
2 College of Industrial Technology, Nihon University, 1-2-1 Izumicho, Narashino 275-8575, Japan
3 Department of Design and Data Science, Research Center for Space Science, Advanced Research Laboratories, Tokyo City University, 3-3-1 Ushikubo-Nishi, Tsuzuki-ku, Yokohama 224-8551, Japan
* Authors to whom correspondence should be addressed.
AppliedMath 2025, 5(2), 72; https://doi.org/10.3390/appliedmath5020072
Submission received: 27 February 2025 / Revised: 21 May 2025 / Accepted: 3 June 2025 / Published: 13 June 2025
(This article belongs to the Special Issue Optimization and Machine Learning)

Abstract

In this study, we introduce a novel application of the Smooth Overlap of Atomic Positions (SOAP) descriptor for pixel-wise image feature extraction and classification, using MNIST handwritten digits as a benchmark for SOAP-based point cloud feature extraction. By converting 2D images into 3D point sets, we compute pixel-centered SOAP vectors that are intrinsically invariant to translation, rotation, and mirror symmetry. We demonstrate how the descriptor’s hyperparameters—particularly the cutoff radius—significantly influence classification accuracy, and show that the high-dimensional SOAP vectors can be efficiently compressed using PCA or autoencoders with minimal loss in predictive performance. Our experiments also highlight the method’s robustness to positional noise, exhibiting graceful degradation even under substantial Gaussian perturbations. Overall, this approach offers an effective and flexible pipeline for extracting rotationally and translationally invariant image features, potentially reducing reliance on extensive data augmentation and providing a robust representation for further machine learning tasks.

1. Introduction

Feature extraction is a central step in machine learning and data analysis, shaping the quality and relevance of the input data for a given task. In the field of image processing, training robust models often requires addressing the challenges posed by spatial transformations such as translation, rotation, and mirror symmetry [1]. These transformations can significantly alter pixel intensities and spatial relationships within an image, making it difficult for machine learning models to generalize effectively. To mitigate these issues, data augmentation techniques are commonly employed [2,3], but they introduce their own limitations:
  • Translation invariance: Images may undergo shifts in spatial position, causing pixel values to move across the image grid. Training models to handle translation typically involves augmenting the dataset with translated versions of the original images.
  • Rotation invariance: Images can appear in different orientations. Achieving robustness to rotations requires augmenting the dataset with rotated images, increasing computational cost and memory requirements.
  • Mirror symmetry: Certain images may appear as mirror reflections. Training models to handle such transformations often involves flipping the images horizontally or vertically, further expanding the dataset.
While these augmentation techniques are effective to some extent, they are computationally expensive and do not inherently guarantee invariance [4]. There is a growing need for feature extraction techniques that are intrinsically invariant to such transformations, reducing the reliance on augmentation and enhancing model efficiency.
In quantum chemistry and materials science, the Smooth Overlap of Atomic Positions (SOAP) descriptor [5,6,7] has revolutionized the way local structural environments around atoms are encoded. Originally designed to represent atomic configurations in molecular and crystalline systems, SOAP has found success in a variety of machine learning tasks, including potential energy surface modeling [8], molecular similarity analysis [9], and structure–property predictions [10].
SOAP encodes structural information by representing atomic environments as high-dimensional, rotationally and translationally invariant features derived from smooth atomic density overlaps. These descriptors are computed using expansions in angular basis and radial basis functions, creating a rich representation of the local geometry and chemistry around atoms. Their continuous, differentiable nature makes SOAP particularly attractive for machine learning workflows that require robust and transferable representations.
While SOAP has primarily been applied to atomistic systems, this work presents a novel application of SOAP descriptors to the domain of image analysis. Specifically, we propose using SOAP-inspired spectra for pixel-wise feature extraction, introducing a new methodology for representing local pixel environments in images. Analogous to atomic neighborhoods, each pixel can be treated as a “local environment” characterized by the intensity values and spatial relationships of its neighboring pixels. By extending the principles of SOAP to these pixel neighborhoods, we derive rotationally, translationally, and mirror invariant descriptors capable of capturing rich, spatially-aware features.
The novelty of this approach lies in its ability to bridge concepts from quantum chemistry with computer vision, creating a new paradigm for pixel-wise feature extraction. Unlike traditional image descriptors that rely on predefined filters or convolutional kernels, the SOAP framework offers a fundamentally different perspective by encoding the spatial “overlap” of pixel distributions. This enables the extraction of high-dimensional features that are both robust to noise and sensitive to local variations, making them ideal for complex tasks such as segmentation, classification, and object recognition.
In this paper, we predict MNIST handwritten data [11], pixel-wise, using SOAP spectra as a feature extraction technique. We observe that the correlation matrix of the SOAP vectors reveals a high degree of correlation among its elements. To address this, we measure the compression efficiency of the SOAP descriptors by comparing three methods: linear autoencoding, principal component analysis (PCA), and deep autoencoding [12,13]. Additionally, we analyze the prediction accuracy in relation to the degree of compression. Finally, we evaluate the robustness of the approach by introducing noise into the dataset by perturbing pixel positions with Gaussian random distributions [14] and assess the predictive performance under these conditions.
Using the mathematical rigor and invariance properties of SOAP, this study introduces a novel feature extraction technique that offers a new perspective in image processing. To our knowledge, this is the first application of SOAP-based methodologies in the context of pixel-wise image analysis. This interdisciplinary approach not only enhances the toolbox of image processing techniques but also demonstrates the potential for repurposing advanced descriptors from quantum chemistry for entirely new domains.
In this paper, the content is organized as follows: Section 2 reviews related work in the field. Section 3 presents the methodology, including an overview of how the SOAP descriptor is computed and the process of mapping 2D images to 3D point clouds. Section 4 details four experiments: (1) optimizing SOAP parameters using Monte Carlo methods and evaluating classification accuracy on the MNIST handwritten digits dataset; (2) analyzing the compressibility of SOAP vectors; (3) assessing the robustness of the SOAP method under positional perturbations; and (4) comparing the performance of SOAP and convolutional neural networks (CNNs) on localized image regions, both with and without data augmentation. Section 5 outlines potential directions for future work, and Section 6 concludes the paper.

2. Related Work

Data augmentation has long served as a crucial technique in machine learning for mitigating the challenges of limited data and overfitting. Initially introduced as a statistical method to facilitate maximum likelihood estimation from incomplete data [15,16], augmentation techniques soon found applications in Bayesian analysis [15] and later evolved to become a staple in modern machine learning workflows. Early approaches in image processing, for instance, focused on perturbing data through affine transformations to simulate different viewpoints and enhance training datasets [17]. These geometric transformations—comprising rotations, translations, and mirror reflections—were adopted to instill invariance in convolutional neural networks (CNNs), despite the increased computational and memory overhead that comes with augmenting the dataset with multiple modified copies of each image.
The evolution of data augmentation techniques saw the integration of more sophisticated methods such as elastic distortions [17], color space adjustments, and noise injection, all aimed at enhancing the diversity of training data. These methods have been instrumental in addressing issues such as class imbalance, where techniques like the Synthetic Minority Over-sampling Technique (SMOTE) generate new synthetic examples by interpolating between minority class samples [18]. Such synthetic oversampling methods have proven particularly effective in domains where data scarcity is pronounced, including medical diagnosis and signal processing [19].
More recent research has turned to generative models, such as Generative Adversarial Networks (GANs) [20], to produce high-fidelity synthetic data. These approaches have not only been applied to image classification tasks but also extended to the augmentation of biological and mechanical signals, thereby enhancing model performance in applications ranging from EEG-based emotion recognition [21] to industrial control systems [22].
Despite the broad success of these data augmentation strategies, a persistent challenge remains: while the augmentation process can enrich the dataset, it does not inherently confer invariance to spatial transformations such as translation, rotation, and mirror symmetry. This limitation often necessitates large-scale data augmentation to achieve robustness, which in turn incurs significant computational costs.
Another influential development in pixel-wise machine learning is the U-Net architecture, which has become a benchmark for image segmentation tasks, particularly in biomedical imaging [23]. U-Net employs an encoder–decoder structure with skip connections that efficiently combine low-level spatial information with high-level semantic features, enabling precise localization and robust segmentation even with limited training data. Its success has spurred numerous variants and inspired a range of applications in pixel-level prediction tasks. However, U-Net and similar architectures typically rely on extensive data augmentation and complex network designs, which can be computationally demanding and may still not guarantee complete invariance to spatial transformations [23].
While traditional data augmentation techniques and architectures such as U-Net enhance model robustness by artificially expanding training datasets and leveraging complex encoder–decoder frameworks, SOAP-inspired descriptors operate on a fundamentally different principle. Rather than relying on extensive augmentation to enforce invariance, SOAP directly encodes spatial relationships through its mathematical formulation. By embedding invariance to translation, rotation, and mirror symmetry at the feature representation level, SOAP circumvents the need for excessive data manipulation and augmentation. This not only streamlines model training but also provides a more structured and theoretically grounded approach to capturing local geometric patterns in image data.

3. Methodology

Our objective is to extract local information at each pixel of an image by computing its SOAP vector (or SOAP spectrum). In this section, we describe how SOAP spectra are acquired and how the images are projected from 2D to 3D to make that possible. An overview of our methodology is shown in Figure 1.

3.1. SOAP Formulation

The Smooth Overlap of Atomic Positions (SOAP) descriptor provides a robust framework for encoding local environments, representing them as rotationally, translationally, and mirror-symmetry invariant features. Originally designed for quantum chemistry applications, the SOAP descriptor was adapted in this study for pixel-wise feature extraction in images. This section outlines the mathematical formulation of SOAP.

3.1.1. Density Function

We describe the local environment around a reference point $\mathbf{r}_o$ using a density function $\rho_o$, where the contributions from surrounding points within the cutoff radius (the hyperparameter $r_{\text{cut}}$) are smoothly distributed through Gaussian smoothing:
$$\rho_o(x_o, y_o, z_o) = \sum_i \exp\left( -\frac{\lVert \mathbf{r}_o - \mathbf{R}_i \rVert^2}{2\sigma_p^2} \right),$$
where $\exp$ is the exponential function, $o$ denotes a local reference point, $\mathbf{R}_i = (x_i, y_i, z_i)$ are the positions of neighboring points, $\sigma_p$ is a hyperparameter that determines the width of the Gaussian smoothing, and $\mathbf{r}_o = (x_o, y_o, z_o)$. An example can be seen in Figure 2b.
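For illustration, the density above can be evaluated at a query point with a few lines of NumPy. This is a minimal sketch of our own (not the paper's code), assuming neighbors beyond $r_{\text{cut}}$ have already been filtered out; the positions and $\sigma_p$ value are placeholders:

```python
import numpy as np

def local_density(r, R, sigma_p):
    """Evaluate the Gaussian-smeared density rho_o at a point r,
    given neighbor positions R (an (N, 3) array) and the smoothing
    width sigma_p. Neighbors beyond r_cut are assumed pre-filtered."""
    d2 = np.sum((R - r) ** 2, axis=1)        # squared distances ||r - R_i||^2
    return np.sum(np.exp(-d2 / (2.0 * sigma_p ** 2)))

# Toy example: density at the origin from three neighboring points.
R = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.5, 0.5, 0.5]])
print(local_density(np.zeros(3), R, sigma_p=3.0))
```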

3.1.2. Spatial Basis Function

The spatial basis function $\Phi_{nlm}^o(x_o, y_o, z_o)$ is defined as the product of two components: a radial function $g_{nl}^o(r_o)$ and an angular function $Y_{lm}^o(\theta_o, \phi_o)$. These components are combined as follows:
$$\Phi_{nlm}^o(x_o, y_o, z_o) = \Phi_{nlm}^o(r_o, \theta_o, \phi_o) = g_{nl}^o(r_o)\, Y_{lm}^o(\theta_o, \phi_o),$$
where $g_{nl}^o(r_o)$ captures the radial variation, while $Y_{lm}^o(\theta_o, \phi_o)$ encodes the angular dependence in spherical coordinates; here $r_o = \sqrt{x_o^2 + y_o^2 + z_o^2}$ is the distance from the reference point, $\theta_o = \arccos(z_o / r_o)$ is the polar angle, and $\phi_o = \operatorname{atan2}(y_o, x_o)$ is the azimuthal angle. (The function $\operatorname{atan2}(y, x)$ computes the angle $\theta$ between the positive $x$-axis and the point $(x, y)$, returning a value in $(-\pi, \pi]$. Unlike $\arctan(y/x)$, $\operatorname{atan2}(y, x)$ accounts for the signs of both arguments to determine the correct quadrant of $\theta$.) See Figure 3 for an example.

3.1.3. Radial Basis Functions

The radial basis functions $g_{nl}^o(r_o)$ capture the radial dependencies of the local environment. These functions may either depend on the angular number $l$ (denoted $g_{nl}^o(r_o)$) or be independent of $l$ (denoted $g_n^o(r_o)$). Orthonormality is a key property of these basis functions, ensuring that the expansion coefficients are unique and non-redundant (see Figure 4). (A set of functions $\{f_i\}$ is orthonormal if it satisfies $\langle f_i, f_j \rangle = \int_a^b f_i(x) f_j(x)\, dx = 0$ for $i \neq j$ (orthogonality) and $\langle f_i, f_i \rangle = \int_a^b f_i^2(x)\, dx = 1$ (normalization).)

3.1.4. Angular Basis Functions

The angular basis functions Y l m o ( θ o , ϕ o ) encode the angular dependencies of the local environment. These functions are constructed to represent directional information and are parameterized by two indices: l, which controls the level of angular detail, and m, which distinguishes variations within each level. This is analogous to the frequencies of sine and cosine functions around a sphere. In our case, we use spherical harmonics Y l m ( θ , ϕ ) as the angular basis functions (See Figure 5).
A key property of the angular basis functions is their orthonormality, ensuring that the components of the representation remain independent and non-redundant. Additionally, spherical basis functions depend only on angular coordinates, meaning they are invariant to scaling of the input vector: for any constant a,
$$Y_{lm}(x, y, z) = Y_{lm}(ax, ay, az).$$

3.1.5. SOAP Expansion Coefficients

The expansion coefficients c n l m o are key to representing the local environment in the SOAP formulation. These coefficients quantify the projection of the local environment density function ρ o ( x o , y o , z o ) onto the spatial basis functions Φ n l m o ( x o , y o , z o ) , which combine radial and angular components. This projection ensures that the complex spatial information encoded in ρ o is transformed into a compact and expressive feature representation:
$$c_{nlm}^o = \int \rho_o(x_o, y_o, z_o)\, \Phi_{nlm}^o(x_o, y_o, z_o)\, dx_o\, dy_o\, dz_o.$$
The orthonormality of the basis functions ensures that these coefficients are unique and non-redundant, making them an efficient and interpretable representation of the local environment. An example of an integrand is shown in Figure 6.

3.1.6. SOAP Power Spectrum

The SOAP power spectrum is a descriptor that is rotationally, translationally, and mirror invariant. It is computed as the inner product of the expansion coefficients over m, capturing the essential characteristics of the local environment. The power spectrum is defined as:
$$P_o^{\text{SOAP}} = p_{nn'l}^o = \pi \sqrt{\frac{8}{2l+1}}\; \bar{\mathbf{c}}_{nl} \cdot \mathbf{c}_{n'l} = \pi \sqrt{\frac{8}{2l+1}} \sum_m \bar{c}_{nlm}^o\, c_{n'lm}^o,$$
where $\bar{c}_{nlm}^o$ is the complex conjugate of $c_{nlm}^o$. The inner product over $m$ ensures that the power spectrum encodes information about the radial and angular dependencies while removing orientation-specific components.
The resulting descriptor, P o SOAP , is a high-dimensional, invariant feature vector that represents the local environment around the reference point r o . This invariance is critical for tasks requiring consistent feature extraction across different orientations and positions.
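For readers who prefer code to index notation, the following sketch (ours, assuming the coefficients are stored in a dense, zero-padded complex array) contracts the expansion coefficients into the invariant power spectrum:

```python
import numpy as np

def power_spectrum(c):
    """Contract expansion coefficients c[n, l, m] (complex, with the m axis
    zero-padded to 2*l_max + 1 entries) into the SOAP power spectrum
    p[n, n', l] = pi * sqrt(8 / (2l + 1)) * sum_m conj(c_nlm) c_n'lm."""
    p = np.einsum("nlm,klm->nkl", np.conj(c), c)          # inner product over m
    l = np.arange(c.shape[1])
    return np.real(p * (np.pi * np.sqrt(8.0 / (2.0 * l + 1.0))))

# Keeping only the unique pairs n <= n' yields the final feature vector of
# length n_max * (n_max + 1) / 2 * (l_max + 1).
```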
More detailed examples of the radial basis functions, angular basis functions, and coefficients, including their computation and role in feature construction, are provided in Appendix A.

3.2. Converting Images to 3D Points and Computing SOAP Descriptors

To adapt the SOAP formulation for image analysis, the pixel intensities of 2D images are converted into 3D point representations. These 3D points serve as the input for computing SOAP descriptors. This section explains the methodology for these steps.

3.2.1. Converting Gray-Scale Images to 3D Points

Each image is represented as a collection of 3D points, where the x and y coordinates correspond to the pixel positions in the image, and the z-coordinate is derived from the gray-scale pixel intensity: the raw values (at most 255) are divided by 10, giving z values of at most 25.5, independent of Gaussian scaling. Algorithm 1 describes the procedure for generating the 3D representation, including an optional Gaussian displacement to account for variability or noise in the data. Variable descriptions are shown in detail in Table A1 and Table A2 in Appendix B.
For a single M 0 × M 1 image, the intensity values of each pixel are scaled and mapped to the z-axis, while the x and y coordinates retain their pixel positions. A Gaussian displacement with standard deviation σ disturbance is applied to the x, y, and z coordinates to introduce variability. This random displacement is particularly useful when studying the divergence of SOAP features under small perturbations. Only non-zero intensity pixels are considered in the transformation, ensuring computational efficiency by excluding irrelevant regions.
The result of this process is a 3D structure F k , where each point ( x i , y i , z i ) corresponds to a pixel in the original image. This step bridges the gap between the 2D image space and the 3D local environments required for SOAP descriptor computation (Figure 7).
Algorithm 1 ConvertImagesToXYZ (3D structure)
Inputs:
● $\{I_k\}$: A collection of n images, each of size $M_0 \times M_1$, where $I_k$ is a single image.
● $\sigma_{\text{disturbance}}$: Standard deviation for Gaussian displacement.
Outputs:
● Points $\{F_k\} = \{(x_i, y_i, z_i)\}$ derived from the intensity values of each pixel in each image, with optional random displacement. This is the input for Algorithm 2.

function ImageToXYZ($I$, $\sigma_{\text{disturbance}}$)    ▹ Converts a single $M_0 \times M_1$ image $I$ into 3D points.
    Initialize an empty list $F$ for points.
    for $i \leftarrow 0$ to $M_0 - 1$ do
        for $j \leftarrow 0$ to $M_1 - 1$ do
            $z \leftarrow I[i, j] / 10$    ▹ Squash intensity from 255 to a scale closer to the image dimension.
            if $z > 0$ then    ▹ Skip pixels whose intensity is zero.
                $x \leftarrow j + \mathcal{N}(0, \sigma_{\text{disturbance}}^2)$
                $y \leftarrow i + \mathcal{N}(0, \sigma_{\text{disturbance}}^2)$
                $z \leftarrow z + \mathcal{N}(0, \sigma_{\text{disturbance}}^2)$
                Append $(x, y, z)$ to $F$.
            end if
        end for
    end for
    return $F$
end function

▹ Main procedure: convert all images $\{I_k\}$ into 3D point sets.
for $k \leftarrow 1$ to $n$ do
    $F_k \leftarrow$ ImageToXYZ($I_k$, $\sigma_{\text{disturbance}}$)
end for
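A direct Python transcription of Algorithm 1 might look as follows (a sketch using NumPy; the function and variable names are ours):

```python
import numpy as np

def image_to_xyz(image, sigma_disturbance=0.0, rng=None):
    """Convert one gray-scale image (2D array) into 3D points: x, y from
    the pixel grid, z = intensity / 10, zero-intensity pixels skipped,
    optional Gaussian displacement on all three coordinates."""
    rng = rng or np.random.default_rng()
    points = []
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            z = image[i, j] / 10.0
            if z > 0:
                dx, dy, dz = rng.normal(0.0, sigma_disturbance, size=3)
                points.append((j + dx, i + dy, z + dz))
    return np.asarray(points)

# Main procedure: F = [image_to_xyz(I, sigma_disturbance=0.0) for I in images]
```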

3.2.2. Computing SOAP Descriptors for Image 3D Structures

Once the images are converted into 3D structures, SOAP descriptors are computed for each point in the 3D space. Algorithm 2 provides a detailed procedure for this computation. Each 3D structure F k is processed to extract per-point SOAP descriptors, capturing the local spatial arrangement of points.
For each point r o = ( x o , y o , z o ) in F k , the SOAP formulation outlined earlier is applied. Using the radial basis functions g n l o ( r o ) and angular basis functions Y l m o ( θ , ϕ ) , the density function is expanded into orthonormal basis functions. The expansion coefficients c n l m o are then used to compute the power spectrum P o SOAP as described in Equation (6).
The result is a matrix P k , where each row represents the SOAP descriptor P o for a single point in the 3D structure. The descriptors encode local spatial patterns in a manner invariant to rotations, translations, and mirror symmetry. By aggregating the SOAP descriptors across all points, a comprehensive feature set for the image is obtained, enabling pixel-wise classification and other machine learning tasks.
This approach leverages the robustness of SOAP descriptors to provide a high-dimensional, invariant representation of image features, merging methodologies from quantum chemistry and computer vision.
Algorithm 2 Compute SOAP Matrices for a Set of 3D Structures
Inputs:
● A collection of 3D structures $\{F_k\}_{k=1}^{n}$, where each structure $F_k$ is an $N_k \times 3$ matrix containing coordinates in 3D space:
$$F_k = \begin{pmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ \vdots & \vdots & \vdots \\ x_{N_k} & y_{N_k} & z_{N_k} \end{pmatrix}.$$
● Parameters for the SOAP descriptor: $\{r_{\text{cut}}, n_{\max}, l_{\max}, \sigma_p\}$.
Outputs:
● A set of SOAP matrices $\{P_k\}_{k=1}^{n}$, where each $P_k$ is an $N_k \times d$ matrix holding the per-point SOAP vectors $\{P_o\}$ for $F_k$. In other words,
$$P_k = \begin{pmatrix} P_{o_1} \\ P_{o_2} \\ \vdots \\ P_{o_{N_k}} \end{pmatrix}, \quad P_o \in \mathbb{R}^d.$$
This will be the input for Algorithm 3.

▹ Algorithm Steps
Initialize an empty set $\{P_k\}_{k=1}^{n}$ to store all SOAP matrices.
for $k \leftarrow 1$ to $n$ do
    Let $F_k$ be of size $(N_k \times 3)$.
    Define $P_k$ as an $(N_k \times d)$ matrix.
    for $o \leftarrow 1$ to $N_k$ do    ▹ Compute SOAP vector $P_o$ for the o-th point of structure $F_k$.
        $P_o \leftarrow \text{SOAP}\left(F_k;\, r_{\text{cut}}, n_{\max}, l_{\max}, \sigma_p\right)$    ▹ Compute the SOAP spectrum for each non-zero pixel.
        Insert $P_o$ (a $1 \times d$ vector) into the o-th row of $P_k$.
    end for
    Store $P_k$ in the overall result set $\{P_k\}_{k=1}^{n}$.
end for
return $\{P_k\}_{k=1}^{n}$    ▹ Each $P_k$ is a matrix capturing SOAP descriptors of $F_k$.
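In practice, this step can be carried out with the DScribe package [6,24] together with ASE, treating every 3D-pixel as a single-species “atom”. The sketch below is ours; keyword names such as r_cut/n_max/l_max and the centers argument follow DScribe 2.x and may differ in other versions:

```python
import numpy as np
from ase import Atoms
from dscribe.descriptors import SOAP

soap = SOAP(species=["H"], r_cut=63.0, n_max=7, l_max=10, sigma=3.0,
            periodic=False)                      # d = 308 for these values

def soap_matrix(F_k):
    """Compute the N_k x d SOAP matrix for one structure F_k (an (N_k, 3)
    array), centering one descriptor on every 3D-pixel."""
    system = Atoms(symbols=["H"] * len(F_k), positions=F_k)
    return soap.create(system, centers=F_k)      # one row per point

# structures: list of (N_k, 3) arrays produced by Algorithm 1.
# P = [soap_matrix(F) for F in structures]
```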
Algorithm 3 Random Extraction of SOAP Spectra with Rescaling
Inputs:
● Collection of SOAP vectors from 3D structures, $\{P_k\}_{k=0}^{N_k - 1} = \{P_0, \ldots, P_{N_k - 1}\}$.
Outputs:
● Extracted and rescaled SOAP descriptors $X \in \mathbb{R}^{T \times d}$ and corresponding labels $y \in \mathbb{R}^T$, with the rescaler $s_{RR}$. This will be the dataset for the experiments.

Initialize $X \leftarrow 0^{T \times d}$
Initialize $y \leftarrow 0^{T}$
for $t \leftarrow 1$ to $T$ do
    Pick a random index $r \in \{0, \ldots, N_k - 1\}$
    Select a random descriptor $P_t \in \mathbb{R}^d$ from file $P_r$
    $X[t, :] \leftarrow P_t$
    $y[t] \leftarrow r$
end for
$(X, s_{RR}) \leftarrow \text{RobustRescalor}(X)$
return $X$, $y$, $s_{RR}$
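A compact Python rendering of Algorithm 3 might look as follows; we use scikit-learn's RobustScaler as a stand-in for the paper's RobustRescalor, which is an assumption on our part:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

def extract_random_spectra(P, T, rng=None):
    """Draw T random SOAP vectors from the per-structure matrices P
    (a list of (N_k, d) arrays) and rescale them robustly."""
    rng = rng or np.random.default_rng()
    d = P[0].shape[1]
    X = np.zeros((T, d))
    y = np.zeros(T, dtype=int)
    for t in range(T):
        r = int(rng.integers(len(P)))            # random structure index r
        X[t] = P[r][rng.integers(len(P[r]))]     # random descriptor from P_r
        y[t] = r                                 # label, as in Algorithm 3
    scaler = RobustScaler().fit(X)               # plays the role of s_RR
    return scaler.transform(X), y, scaler
```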

4. Experiments and Results

To evaluate the performance of SOAP-based pixel-wise classification, a series of experiments were conducted. This section details the creation of the training datasets and outlines the experimental setups used to validate the methodology.
To systematically assess the effectiveness of SOAP-based pixel-wise classification, we conduct four key experiments. The first experiment focuses on hyperparameter optimization and pixel predictions, where we explore the impact of various SOAP descriptor parameters on model performance to determine an optimal configuration, then predict the classification of each pixel on the validation and test sets. The second experiment investigates SOAP vector compression, evaluating different dimensionality reduction techniques—including PCA, linear autoencoding, and deep autoencoding—to quantify the trade-off between compression and classification accuracy. The third experiment examines robustness to pixel position perturbations, introducing Gaussian noise to pixel coordinates to assess the stability of SOAP-based feature extraction in the presence of spatial distortions. Finally, the fourth experiment compares SOAP-based models with convolutional neural networks on localized image regions, trained with and without rotation augmentation.

4.1. Training Data Preparation

The training data for the experiments was derived from the SOAP descriptors computed for the 3D structures obtained from the MNIST dataset of handwritten digits. This dataset consists of 60,000 gray-scale images of size 28 × 28 pixels. Each image was converted into a 3D structure following the methodology described in Section 3.2. In total, 120,000 random SOAP spectra were collected and split 80:20 into training and validation sets. The test set was built from the 10,000 gray-scale images of the MNIST test set, from which 10,000 random SOAP spectra were collected. The processes for generating the datasets in each experiment are detailed in Algorithm 3.
As for the training and validation, a collection of SOAP matrices, $\{P_k\}$, was computed using the DScribe Python package, version 2.1.1 [6,24], and then randomly sampled to extract $T = 12 \times 10^4$ descriptors. These descriptors form the training feature matrix $X \in \mathbb{R}^{T \times d}$. Each SOAP vector $P_t$ was assigned a corresponding label $r$, indicating its association with the r-th digit class in the MNIST dataset.
By using the MNIST dataset, this study leverages the well-established benchmark for handwritten digit recognition, enabling a rigorous evaluation of the proposed methodology and facilitating comparisons with other approaches.
To ensure numerical stability and facilitate model convergence, the SOAP descriptors were rescaled using a robust rescaling procedure, RobustRescalor, which adjusts the data based on the distribution of feature values. The rescaled descriptors and their corresponding labels constitute the final dataset, $(X, y)$; the fitted rescaling parameters $s_{RR}$ are retained for use in subsequent experiments.
The creation of this dataset ensures diversity in the sampled descriptors and maintains a balanced representation across the different input structures, facilitating robust model training and evaluation.

4.2. Experiment 1: Hyperparameter Optimization for SOAP and Predictions

4.2.1. Objective

The objective of this experiment is to identify the optimal SOAP descriptor parameters ( r cut , n max , l max , and σ p ) for pixel-wise digit classification and to evaluate their impact on model performance. This experiment aims to establish the sensitivity of the model to these parameters and determine the configurations that maximize validation accuracy while minimizing redundancy.

4.2.2. Methods

In this experiment, we employed a hyperparameter search using a Monte Carlo sampling strategy [25] over 168 trials. Monte Carlo sampling refers to a probabilistic technique where values are drawn randomly from specified distributions rather than systematically exploring all possible combinations (as in grid search). This approach allows for efficient exploration of high-dimensional hyperparameter spaces by sampling diverse configurations, reducing the computational cost associated with exhaustive search. The search explored the following ranges: the neighborhood radius r cut was varied between 2 and 100, the radial basis count n max between 2 and 11, the angular resolution l max between 2 and 11, and the Gaussian width σ p between 1 and 10. The model architecture used was the pixel-wise classification model (see Figure 8), trained on 120,000 data points consisting of SOAP descriptors derived from MNIST images. The training protocol included the Adam optimizer [26] with a learning rate of 0.001, a batch size of 128, and 300 training epochs, with the data split into 80% for training and 20% for validation. Validation accuracy served as the primary metric, and the influence of individual hyperparameters was analyzed by correlating them with accuracy trends across the trials.
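The sampling step of such a Monte Carlo search can be written in a few lines. The sketch below (ours) assumes uniform sampling over the stated ranges, since the paper specifies only the ranges themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_soap_config():
    """Draw one random SOAP configuration from the Experiment 1 ranges."""
    return {
        "r_cut": rng.uniform(2, 100),
        "n_max": int(rng.integers(2, 12)),   # 2..11 inclusive
        "l_max": int(rng.integers(2, 12)),
        "sigma_p": rng.uniform(1, 10),
    }

trials = [sample_soap_config() for _ in range(168)]
# For each trial: build SOAP features, train the pixel-wise classifier
# (Adam, lr = 1e-3, batch 128, 300 epochs), record validation accuracy.
```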

4.2.3. Results

The best combination of hyperparameters, listed in Table A1, yielded a validation accuracy of 0.6844. Notably, $r_{\text{cut}}$ exerted the greatest influence on accuracy, while $\sigma_p$ performed best between 2 and 5. The parameters $n_{\max}$ and $l_{\max}$ had less impact, provided they were larger than approximately 6. With the optimal parameters, the pixel-wise SOAP spectrum has $n_{\max}(n_{\max}+1)/2 \times (l_{\max}+1) = 7(7+1)/2 \times (10+1) = 308$ components. The results are summarized in Table 1, and the confusion matrix on the test set is shown in Figure 9. Figure 10 presents scatter plots illustrating the relationships between the different hyperparameters ($n_{\max}$, $l_{\max}$, $r_{\text{cut}}$, and $\sigma_p$) and their effect on validation accuracy.
Figure 11 shows examples of handwritten digits that are relatively easy to classify, indicating near-perfect predictions on clear and unambiguous shapes. These images highlight situations where SOAP-based features can successfully capture local environments without requiring additional data augmentation (e.g., rotation or flipping).
Figure 12 demonstrates a challenging case where a handwritten 7 (Figure 12a) can be rotated 90 degrees and mirror-flipped (Figure 12b), causing the model to misclassify it as a 4. This misclassification arises because SOAP features do not inherently distinguish between these symmetries.
Figure 13 displays another example where points far from the handwritten shape (digit 3) tend to be predicted less accurately. These edge points do not strongly resemble any digit, indicating that SOAP features, while robust, still depend on local geometry and can produce errors on pixels far from the number’s main structure.
Finally, Figure 14 shows ambiguous shapes of handwritten digits (e.g., a 4 and a 9) that can confuse not only the model but also human observers. In such cases, even the most sophisticated feature extraction approaches may fail if the digit is too ambiguous.

4.2.4. Discussion

The results underscore the importance of selecting an appropriate r cut and ensuring σ p lies in the range of 2–5 for improved accuracy. By leveraging SOAP features, our model does not require augmentation for training, such as rotation, translation, or mirror flipping. This is because SOAP naturally encodes local geometric information of each pixel or point in the handwritten digits.
However, the same property that makes SOAP robust against certain transformations also introduces challenges when symmetrical orientations are key to correct identification. For instance, as shown in Figure 12, a handwritten digit 7 rotated 90 degrees and mirror-flipped closely resembles a 4. Humans also tend to misinterpret it in such an orientation [27], but in deep learning-based models without built-in symmetry handling, such misclassifications can be frequent. Moreover, SOAP struggles with highly ambiguous handwriting (see Figure 14), although this limitation is not unique to SOAP.
Computational cost must also be taken into account, as the pipeline introduces additional steps not present in typical CNN-based methods—such as the conversion of image data into 3D point clouds and the calculation of SOAP descriptors, which involve relatively high computational complexity (see Appendix A).
In summary, SOAP-based feature extraction presents a strong option for digit classification tasks, particularly for reducing the need for data augmentation. It is especially effective for clear, unambiguous shapes and for learning from relatively limited data. Yet, there are limitations for SOAP (e.g., not being able to distinguish between 6 and 9 sometimes due to rotational invariance), and additional strategies to account for orientation or symmetries may be required to further improve accuracy.

4.3. Experiment 2: SOAP Vector Compression and Impact on Prediction Accuracy

4.3.1. Objective

The high-dimensional nature of SOAP descriptors (308 dimensions in our optimal configuration) introduces computational challenges for downstream machine learning tasks. This experiment evaluates the compressibility of SOAP vectors by comparing three encoding methods—principal component analysis (PCA), linear autoencoding, and deep autoencoding—and quantifies the trade-off between compression ratio and reconstruction accuracy. We further analyze how compression impacts the performance of digit classification.

4.3.2. Methods

For this experiment, we use a subset of 120,000 SOAP descriptors from Experiment 1, divided into training (80%) and validation (20%) sets. The compression techniques considered include PCA, which performs linear dimensionality reduction via singular value decomposition; a linear autoencoder, implemented as a single-layer neural network with $h_e$ hidden units and linear activation (see Figure 15); and a deep autoencoder, which employs a non-linear architecture with an encoder $f_e : \mathbb{R}^{308} \to \mathbb{R}^{308/h_m} \to \mathbb{R}^{h_e}$ and a decoder $f_d : \mathbb{R}^{h_e} \to \mathbb{R}^{308/h_m} \to \mathbb{R}^{308}$, where $h_m \in \{2, 4, 10\}$ controls the hidden layer capacity (see Figure 16). The evaluation metrics include the reconstruction loss, measured as the mean squared error (MSE) [28] between the original and reconstructed SOAP vectors, and the classification accuracy of Model A (from Experiment 1) when using the compressed features. All autoencoders are implemented using the Adam optimizer with a learning rate of 0.0001 and a batch size of 512, and are trained for 10,000 epochs.
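To make the three compressors concrete, the following sketch sets them up side by side (ours; we use scikit-learn and PyTorch, and the deep autoencoder's intermediate width of $308/h_m$ is our reading of the architecture above):

```python
import torch.nn as nn
from sklearn.decomposition import PCA

d, h_e, h_m = 308, 100, 4          # h_m in {2, 4, 10}

# PCA: optimal linear subspace via SVD, no gradient training required.
pca = PCA(n_components=h_e)        # pca.fit(X_train); reconstruct via inverse_transform

# Linear autoencoder: one hidden layer of h_e units, linear activations.
linear_ae = nn.Sequential(nn.Linear(d, h_e), nn.Linear(h_e, d))

# Deep autoencoder: non-linear encoder f_e and decoder f_d with an
# intermediate width of d / h_m (our interpretation of the text).
deep_ae = nn.Sequential(
    nn.Linear(d, d // h_m), nn.ReLU(),
    nn.Linear(d // h_m, h_e), nn.ReLU(),   # encoder output: h_e dimensions
    nn.Linear(h_e, d // h_m), nn.ReLU(),
    nn.Linear(d // h_m, d),                # reconstruction
)
# Training (per the text): Adam, lr = 1e-4, batch size 512, MSE loss.
```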

4.3.3. Results

Figure 17 reveals strong correlations between SOAP vector components, suggesting significant redundancy and motivating compression to eliminate redundant dimensions without sacrificing predictive power. Figure 18 shows the relationship between the encoding dimension $h_e$ and the reconstruction loss: PCA and the linear autoencoder perform identically for $h_e < 200$, with PCA becoming superior at higher dimensions due to its optimal linear subspace identification, while the deep autoencoder outperforms both linear methods for $h_e < 50$ by leveraging non-linear mappings to preserve information. Furthermore, Figure 19 demonstrates the impact of compression on classification accuracy: for high dimensions ($h_e > 50$), all methods achieve more than 95% of the baseline accuracy (308 dimensions), with PCA slightly outperforming the autoencoders, whereas under aggressive compression ($h_e < 50$), test accuracy drops sharply and deep autoencoding outperforms PCA.

4.3.4. Discussion

SOAP vectors exhibit substantial redundancy, enabling compression to approximately 100 dimensions (one-third of the original size) with nearly no loss in accuracy. The key findings include computational efficiency—since principal component analysis (PCA) provides optimal compression for $h_e > 50$, requiring no training and minimal implementation effort—and performance in the high compression regime, where deep autoencoders outperform linear methods for $h_e < 50$, albeit at the cost of increased model complexity. Moreover, for MNIST classification, compressing to $h_e = 100$ results in nearly no loss in prediction accuracy (98% of the baseline). This analysis confirms that SOAP’s rotational and translational invariance does not preclude efficient compression, and it indicates that the choice between linear and non-linear compression depends on the target dimensionality and acceptable accuracy trade-offs. Future work could explore hybrid approaches or task-specific compression to further optimize this balance.

4.4. Experiment 3: Robustness to Pixel Position Perturbations

4.4.1. Objective

To evaluate the robustness of SOAP-based feature extraction against noise, we introduce Gaussian perturbations to pixel positions and measure the impact on validation accuracy. This experiment tests whether the method gracefully degrades with increasing noise, thereby reflecting its stability in real-world scenarios with imperfect data.

4.4.2. Methods

In our approach, noise is injected into each image by perturbing the pixel coordinates $(x, y)$ and the intensity-derived $z$-values with additive Gaussian noise according to
$$\mathbf{r}_i' = \mathbf{r}_i + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_{\text{disturbance}}^2 \mathbf{I}),$$
where $\sigma_{\text{disturbance}}$ controls the noise magnitude (tested over a range from 0.1 to 5.0 in 20 logarithmic steps). The dataset consists of 10,000 MNIST test images converted to 3D structures with noise, using the same SOAP parameters as in Experiment 1 ($r_{\text{cut}} = 63$, $n_{\max} = 7$, $l_{\max} = 10$, $\sigma_p = 3$), and the model employed is our three-layer prediction model (see Figure 8). The primary metric for evaluation is the validation accuracy as a function of $\sigma_{\text{disturbance}}$.
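The perturbation itself amounts to a one-line NumPy operation applied per structure; the sketch below (ours) also builds the logarithmic sweep of noise magnitudes:

```python
import numpy as np

# 20 logarithmically spaced noise levels from 0.1 to 5.0.
sigmas = np.logspace(np.log10(0.1), np.log10(5.0), 20)

def perturb(points, sigma, rng):
    """Apply r_i' = r_i + eps, eps ~ N(0, sigma^2 I), to an (N, 3) array."""
    return points + rng.normal(0.0, sigma, size=points.shape)

rng = np.random.default_rng(0)
# For each sigma: rebuild the 3D structures, recompute SOAP descriptors,
# and evaluate validation accuracy with the fixed trained model.
```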

4.4.3. Results

The results, as shown in Figure 20, indicate that validation accuracy decreases smoothly with increasing σ disturbance . At a noise level of σ disturbance = 1.0 , accuracy remains at 92% of the baseline (i.e., the case when σ disturbance = 0 ), demonstrating robustness to moderate noise; however, performance drops to chance levels (approximately 51%) at σ disturbance = 3.9 , a point where local pixel neighborhoods are irrecoverably distorted. Additionally, a critical threshold is observed: accuracy declines sharply beyond σ disturbance = 0.5 .

4.4.4. Discussion

These findings demonstrate that SOAP-based features exhibit gradual performance degradation under controlled noise, confirming their stability for practical applications. The smooth decline in accuracy, rather than a catastrophic failure, validates the method’s suitability for scenarios with noisy data and positional uncertainty, and suggests that future work could couple SOAP with denoising techniques to further enhance robustness.

4.5. Experiment 4: Comparison of SOAP and CNN Under Rotation Augmentation

4.5.1. Objective

The objective of this experiment is to compare the performance of SOAP descriptors and CNN-based models under rotation-based data augmentation. Specifically, we evaluate how both approaches perform when trained on datasets with and without rotation augmentation and tested on uniformly rotated versions of the MNIST dataset.

4.5.2. Methods

Two training datasets were constructed from the MNIST handwritten digits training set. For the first dataset, we randomly selected 6000 images and applied random in-plane rotations between 0 and 360 degrees. From each rotated image, a local patch of size 15 × 15 pixels was extracted, centered on a pixel with a non-zero value to ensure the region contained part of a digit.
The second training dataset consisted of 6000 samples selected in the same manner, but without applying any rotation. In both cases, the local patches were cropped from the original 28 × 28 images.
The test set consisted of 10,000 local 15 × 15 patches extracted from the MNIST test set. Each test image was randomly rotated between 0 and 360 degrees. Importantly, in all cases, only the center pixel of each 15 × 15 patch was used for label prediction. For the SOAP model, this means that a SOAP spectrum was computed exclusively at the center of each local image region, corresponding to the pixel to be classified.
SOAP descriptors were parameterized following the tuning approach of Experiment 1, here with $r_{\text{cut}} = 42$, $n_{\max} = 5$, $l_{\max} = 10$, and $\sigma_p = 2$.
For the CNN baseline, we used a standard architecture comprising two convolutional layers followed by fully connected layers. The network consisted of: a convolutional layer with 16 filters of size 3 × 3 , followed by ReLU activation and 2 × 2 max pooling; a second convolutional layer with 32 filters of size 3 × 3 , again followed by ReLU activation and 2 × 2 max pooling; a flattening step; a fully connected layer with 64 hidden units and ReLU activation; and finally, a 10-way softmax output layer for classification. This network operates directly on the 15 × 15 image patches.
Training was performed with an 80:20 train–validation split, using a batch size of 2048 and a learning rate of 0.0001, for 1000 epochs. Testing was conducted using the model that achieved the best validation accuracy. Results were averaged over 10 independent runs to reduce variance. Performance was evaluated by incrementally increasing the training set size from 100 to 6000 samples, in steps of 100.
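A PyTorch rendering of this baseline might look as follows (a sketch; padding and other unstated details are our assumptions):

```python
import torch.nn as nn

# CNN baseline for 1-channel 15x15 patches; unpadded convolutions assumed.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),   # 15x15 -> 13x13 -> 6x6
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),  # 6x6 -> 4x4 -> 2x2
    nn.Flatten(),
    nn.Linear(32 * 2 * 2, 64), nn.ReLU(),
    nn.Linear(64, 10),   # logits; softmax is folded into the cross-entropy loss
)
# Training setup from the text: 80:20 split, batch size 2048, lr = 1e-4,
# 1000 epochs, best-validation checkpoint used for testing.
```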

4.5.3. Results

Figure 21 shows the test accuracy curves for both the SOAP and CNN models under the two training conditions: (a) training without rotation augmentation, and (b) training with rotation augmentation. Each point on the curve represents the average test accuracy over 10 independent runs.

4.5.4. Discussion

The results clearly show that when the training set does not include rotation augmentation, the SOAP model significantly outperforms the CNN model on rotated test images. This supports the idea that SOAP inherently encodes rotational invariance, reducing the need for explicit augmentation.
However, when both models are trained with rotation-augmented data, their performance becomes comparable for training sizes larger than approximately 1000 samples. This demonstrates that CNNs can learn rotational robustness with sufficient data and augmentation, although at the cost of requiring more training samples and additional data augmentation.
While this experiment used simple geometric augmentation (in-plane rotation) on a relatively clean dataset, in real-world applications such transformations are often more complex and difficult to engineer. In such cases, the invariance properties of SOAP could offer practical advantages. That said, the computational cost of SOAP—including the conversion of each local patch into a 3D point cloud and the evaluation of the descriptor—must be taken into account (see Appendix A).

5. Future Work

While this study has demonstrated the potential of SOAP-based descriptors for pixel-wise classification as a benchmark, several extensions and improvements can be explored in future work. One intriguing direction is the adaptation of SOAP for RGB images rather than gray-scale. Since SOAP includes species as a hyperparameter, different channels of an RGB image could be encoded using distinct species. For instance, one could draw an analogy by assigning the red, green, and blue channels to chemical species such as hydrogen (H), helium (He), and lithium (Li), respectively. This approach may introduce a richer feature space by allowing inter-channel interactions to be represented in a way similar to multi-species atomic environments.
Another key limitation of SOAP is its inherent invariance to symmetry transformations, which may discard crucial orientation-dependent information. To address this, a strategy of forced symmetry breaking could be employed. One possible method is to introduce auxiliary points near each pixel, such as a structured line below a handwritten digit, to provide directional context. This additional information could help encode spatial orientation, enabling the descriptors to retain some asymmetry where needed.
Beyond pixel-wise classification, future work could explore leveraging SOAP vectors to construct global representations for entire images. For example, one could compute an aggregate representation by averaging SOAP vectors across all pixels in an image, creating a holistic descriptor that remains invariant yet captures key structural patterns. Alternatively, more sophisticated approaches such as graph neural networks could be applied to learn higher-order relationships between SOAP descriptors, potentially enhancing performance in global classification tasks.
Additionally, in this study, we utilized the SOAP power spectrum, which provides a robust yet relatively compact representation of local environments. However, SOAP also offers a more expressive addition known as the bispectrum, which retains higher-order structural correlations and can encode more intricate geometric details. Future work could investigate whether incorporating the SOAP bispectrum leads to improved classification performance, particularly in tasks where capturing finer structural nuances is critical.
Finally, another potential avenue is the direct application of SOAP-based descriptors to point cloud classification tasks. Given that SOAP was originally designed for atomic-scale modeling, its extension to three-dimensional point clouds in computer vision could be a natural progression. This could involve adapting SOAP to tasks such as 3D object recognition, scene reconstruction, or LiDAR data analysis, where local geometric structures play a crucial role in classification.
These directions illustrate the versatility of SOAP-based feature extraction and open up exciting possibilities for extending its applications beyond gray-scale image classification to more complex and structured data representations.

6. Conclusions

In this work, we have demonstrated how the Smooth Overlap of Atomic Positions (SOAP), originally developed for atomic-scale modeling in chemistry and materials science, can be adapted to extract pixel-wise descriptors for images. By viewing each pixel as a local “environment” and lifting 2D image data into 3D space, we obtain SOAP vectors that capture rich local structure while maintaining invariance to translation, rotation, and mirror symmetry. One of the primary strengths of this method is that it obviates the need for extensive data augmentation for these transformations, allowing us to train models effectively without having to create or include rotated, translated, or mirrored variants of the input images. However, if mirror-flipping information (or other orientation-dependent cues) is intrinsically relevant to the classification task, SOAP’s invariant nature can become a limitation, since it effectively discards such distinguishing orientation-specific features, such as distinguishing between the numbers 6 and 9.
Our experiments on MNIST show that careful tuning of SOAP hyperparameters, especially the cutoff radius, is critical for optimal classification performance. Furthermore, we have illustrated the high compressibility of SOAP features via PCA and autoencoders, reducing dimensionality without significantly degrading predictive accuracy. We also investigated the robustness of SOAP-based descriptors to positional noise. Perturbing the pixel coordinates with Gaussian noise revealed a smooth decline in accuracy, confirming that SOAP gracefully handles moderate spatial uncertainties. This resilience is valuable for real-world datasets where image acquisition or labeling may be imperfect.
A major strength of this approach is its general applicability to any set of data points that can be projected into 3D space. Beyond images, the same pipeline can potentially be readily applied to diverse domains such as 3D object recognition, geospatial data analysis, or even higher-dimensional biomedical images where pixel or voxel intensities can be mapped into spatial coordinates. By combining inherent invariance, robust local feature encoding, and flexible dimensionality reduction, SOAP-based descriptors provide a framework for learning tasks that rely on capturing local patterns in a manner invariant to common image transformations. The results presented here open a potential avenue for future work in computer vision and related fields, where the capacity to incorporate sophisticated descriptors from quantum chemistry can lead to robust, efficient, and interpretable representations.

Author Contributions

Conceptualization, E.V.M. and Y.O.; Methodology, E.V.M.; Software, E.V.M.; Writing—original draft, E.V.M.; Writing—review & editing, M.H. and H.T.; Supervision, Y.O., M.H. and H.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Tables of encoding dimensions vs. MSE losses for Experiment 2 can be found at [29].

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Example of C’s

In this appendix, we derive the closed form of SOAP using spherical harmonics as the angular basis functions and Gaussian Type Orbital (GTO) functions as the radial basis functions [24] (see Figure A1).
The spherical harmonics are defined as:
$$Y_{lm}(\theta, \phi) = (-1)^m \sqrt{\frac{(2l+1)}{4\pi} \frac{(l-m)!}{(l+m)!}}\, P_{lm}(\cos\theta)\, e^{i m \phi},$$
where $-l \leq m \leq l$ and $P_{lm}(x)$ are the associated Legendre polynomials.
The GTO radial basis functions are defined as:
$$g_{nl}(r) = \sum_{b=1}^{N_b} \beta_{lbn}\, r^l e^{-\alpha_{bl} r^2},$$
where the $\alpha_{bl}$ are hyperparameters that need to be designed, and the $\beta_{lbn}$ are orthonormalization constants, which can be obtained as:
$$\beta_{lnn'} = S_l^{-1/2},$$
where
$$S_{lnn'} = \int_0^\infty r^{2l} e^{-(\alpha_{ln} + \alpha_{ln'}) r^2}\, dr = \frac{1}{2\,(\alpha_{ln} + \alpha_{ln'})^{(2l+1)/2}}\, \Gamma\!\left(\frac{2l+1}{2}\right)$$
is the overlap matrix and $\Gamma$ is the gamma function.
By using the density function:
$$\rho(\mathbf{r}) = \sum_{p=1}^{N_p} e^{-\frac{|\mathbf{r} - \mathbf{R}_p|^2}{2\sigma_p^2}},$$
where $\mathbf{R}_p = (x_p, y_p, z_p)$ and $R_p = \sqrt{x_p^2 + y_p^2 + z_p^2}$, we can obtain a closed form of the coefficients in Equation (4) by integration:
$$c_{nlm} = \lambda_{lm} (-1)^m \sqrt{(2\pi\sigma_p^2)^3} \sum_{b=1}^{N_b} \frac{\beta_{lbn}}{\sqrt{(1 + 2\alpha_{lb}\sigma_p^2)^{2l+3}}} \sum_{p=1}^{N_p} e^{-\frac{\alpha_{lb}}{1 + 2\alpha_{lb}\sigma_p^2} R_p^2}\, (x_p + i y_p)^m\, R_p^{l-m} \sum_{k=m}^{l} \xi_{lmk}\, z_p^{k-m} R_p^{m-k},$$
where
$$\lambda_{lm} = \frac{1}{2^l} \sqrt{\frac{(2l+1)(l-m)!}{4\pi (l+m)!}}, \qquad \xi_{lmk} = \frac{\left(\frac{l+k-1}{2}\right)!}{(k-m)!\,(l-k)!\,\left(\frac{l+k-1}{2} - l\right)!}.$$
When $l + k$ is even, $\xi_{lmk} = 0$. The $\alpha_{bl}$ are hyperparameters whose values depend on $r_{\text{cut}}$ and on a design choice. How the parameters are chosen in the DScribe package [6,24] is shown in Algorithm A1. (Alternatively, other packages such as QUIP [7,30] use different basis functions.)
The computational complexity of the SOAP descriptor, as implemented in dscribe, can be expressed as O ( N p L max N b 2 N s ) . Here, N p is the number of evaluated points (e.g., pixels or atomic sites), N b is the number of radial basis functions, L max is the maximum angular momentum quantum number (which determines the number of spherical harmonics), and N s is the number of chemical species present in the system. In our case, since the input consists of gray-scale MNIST images and only a single type of pixel intensity is treated as a “species”, we set N s = 1 .
The quadratic dependence on N b and linear dependence on L max make hyperparameter tuning particularly important, as these terms directly influence both the expressiveness and computational cost of the descriptor. While efficient, the SOAP calculation remains significantly more expensive than pretrained CNN embeddings, especially when scaled to high-resolution images or large datasets.
Figure A1. The SOAP algorithm takes a Gaussian-smeared representation of points and computes a rotational, translational, and mirror-symmetric invariant vector $P_i$ (SOAP power spectrum) at a given reference point, typically located at an existing data point. This SOAP vector encodes the structural environment surrounding the reference point.
Algorithm A1 GetBasisGTO($r_{\text{cut}}$, $n_{\max}$, $l_{\max}$)
Inputs:
● $r_{\text{cut}} \in \mathbb{R}^+$: The radial cutoff distance.
● $n_{\max} \in \mathbb{N}$: The number of GTO radial basis functions.
● $l_{\max} \in \mathbb{N}$: The maximum angular momentum quantum number.
Outputs:
● $\{\alpha_{l,i}\}$: A $(l_{\max}+1) \times n_{\max}$ array of radial decay exponents.
● $\{\beta_{l,i,j}\}$: A $(l_{\max}+1) \times n_{\max} \times n_{\max}$ array of Löwdin-orthonormalization factors.

function GetBasisGTO($r_{\text{cut}}$, $n_{\max}$, $l_{\max}$)
    threshold $\leftarrow 10^{-3}$    ▹ Fixed decay threshold for the Gaussian functions.
    Initialize the array $\{a_i\}_{i=1}^{n_{\max}}$
    for $i \leftarrow 1$ to $n_{\max}$ do
        $a_i \leftarrow 1 + \frac{(i-1)(r_{\text{cut}} - 1)}{n_{\max} - 1}$    ▹ Equally spaced radial points from 1 to $r_{\text{cut}}$.
    end for
    Initialize $\alpha_{l,i} \leftarrow 0$ for $l = 0, \ldots, l_{\max}$ and $i = 1, \ldots, n_{\max}$
    Initialize $\beta_{l,i,j} \leftarrow 0$ for $l = 0, \ldots, l_{\max}$ and $i, j = 1, \ldots, n_{\max}$
    for $l \leftarrow 0$ to $l_{\max}$ do
        for $i \leftarrow 1$ to $n_{\max}$ do
            $\alpha_{l,i} \leftarrow -\frac{\ln(\text{threshold} / a_i^l)}{a_i^2}$    ▹ Choose $\alpha_{l,i}$ so that $a_i^l \exp(-\alpha_{l,i} a_i^2) = \text{threshold}$.
        end for
        Initialize the matrix $M_{i,j}$ for $i, j = 1, \ldots, n_{\max}$
        for $i \leftarrow 1$ to $n_{\max}$ do
            for $j \leftarrow 1$ to $n_{\max}$ do
                $M_{i,j} \leftarrow \alpha_{l,i} + \alpha_{l,j}$
            end for
        end for
        Initialize the matrix $S_{i,j}$ for $i, j = 1, \ldots, n_{\max}$
        for $i \leftarrow 1$ to $n_{\max}$ do
            for $j \leftarrow 1$ to $n_{\max}$ do
                $S_{i,j} \leftarrow 0.5\, \Gamma\!\left(l + \tfrac{3}{2}\right) M_{i,j}^{-\left(l + \frac{3}{2}\right)}$
            end for
        end for
        Compute the inverse $S^{-1}$ of $S$    ▹ Use any standard matrix inversion algorithm.
        Compute $\beta^{\text{temp}} \leftarrow \sqrt{S^{-1}}$    ▹ The matrix square root of $S^{-1}$ (Löwdin orthonormalization).
        if any entry of $\beta^{\text{temp}}$ is complex then
            raise an error: “Could not calculate real-valued normalization factors.”
        end if
        for $i \leftarrow 1$ to $n_{\max}$ do
            for $j \leftarrow 1$ to $n_{\max}$ do
                $\beta_{l,i,j} \leftarrow \beta^{\text{temp}}_{i,j}$
            end for
        end for
    end for
    return $\{\alpha_{l,i}\}$, $\{\beta_{l,i,j}\}$
end function
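For reference, a NumPy transcription of Algorithm A1 is sketched below (ours, mirroring DScribe's GTO basis construction; the helper name is our own):

```python
import numpy as np
from scipy.special import gamma
from scipy.linalg import sqrtm

def get_basis_gto(r_cut, n_max, l_max, threshold=1e-3):
    """Compute GTO decay exponents alpha and Loewdin factors beta
    following Algorithm A1."""
    a = np.linspace(1.0, r_cut, n_max)               # equally spaced radial points
    alphas = np.zeros((l_max + 1, n_max))
    betas = np.zeros((l_max + 1, n_max, n_max))
    for l in range(l_max + 1):
        # Choose alpha_{l,i} so that a_i^l * exp(-alpha_{l,i} a_i^2) = threshold.
        alphas[l] = -np.log(threshold / a**l) / a**2
        m = alphas[l][:, None] + alphas[l][None, :]  # M_ij = alpha_i + alpha_j
        s = 0.5 * gamma(l + 1.5) * m ** (-l - 1.5)   # overlap matrix S
        beta = sqrtm(np.linalg.inv(s))               # Loewdin orthonormalization
        if np.iscomplexobj(beta) and np.abs(beta.imag).max() > 1e-8:
            raise ValueError("Could not calculate real-valued normalization factors.")
        betas[l] = np.real(beta)
    return alphas, betas
```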

Appendix B. Table of Variables

Table A1. List of variables for SOAP descriptor computation.

Variable | Type/Dimension | Description
$\{F_k\}_{k=1}^n$ | Collection of $N_k \times 3$ matrices | The 3D structures.
$F_k$ | $N_k \times 3$ matrix | The k-th 3D structure containing coordinates $(x, y, z)$ of each 3D-pixel.
$\{r_{\text{cut}}, n_{\max}, l_{\max}, \sigma_p\}$ | Scalars | Parameters defining the SOAP descriptor computation.
$P_k$ | $N_k \times d$ matrix | SOAP descriptors for each 3D-pixel in the k-th structure.
$P_o$ | $1 \times d$ vector | SOAP descriptor for the o-th 3D-pixel in structure $F_k$.
$k$ | Integer | Index for iterating over each structure ($1 \leq k \leq n$).
$o$ | Integer | Index for iterating over each 3D-pixel within a structure ($1 \leq o \leq N_k$).
$\{P_k\}_{k=1}^n$ | Collection of $N_k \times d$ matrices | Output set of SOAP descriptors for all structures.
Table A2. Variables for SOAP extraction with rescaling.

Variable | Type/Dim | Description
$X$ | $T \times d$ | Final collection of extracted and rescaled SOAP descriptors, where in our case $T = 1.2 \times 10^5$ for the training and validation data and $T = 1 \times 10^4$ for the test data.
$y$ | $T$ | Labels for each row of $X$.
$P_t$ | $1 \times d$ | A single descriptor randomly chosen from $P_r$.
$s_{RR}$ | $p$ | Robust rescale parameters for later use.

References

1. Quiroga, F.; Ronchetti, F.; Lanzarini, L.; Bariviera, A.F. Revisiting data augmentation for rotational invariance in convolutional neural networks. In Modelling and Simulation in Management Sciences: Proceedings of the International Conference on Modelling and Simulation in Management Sciences (MS-18); Springer: Berlin/Heidelberg, Germany, 2020; pp. 127–141.
2. Maharana, K.; Mondal, S.; Nemade, B. A review: Data pre-processing and data augmentation techniques. Glob. Transit. Proc. 2022, 3, 91–99.
3. Omae, Y.; Saito, Y.; Fukamachi, D.; Nagashima, K.; Okumura, Y.; Toyotani, J. Impact of chest radiograph image size and augmentation on estimating pulmonary artery wedge pressure by regression convolutional neural network. AIP Conf. Proc. 2023, 2872, 120065.
4. Yoo, J.; Kang, S. Class-adaptive data augmentation for image classification. IEEE Access 2023, 11, 26393–26402.
5. Bartók, A.P.; Kondor, R.; Csányi, G. On representing chemical environments. Phys. Rev. B 2013, 87, 184115.
6. Himanen, L.; Jäger, M.O.J.; Morooka, E.V.; Canova, F.F.; Ranawat, Y.S.; Gao, D.Z.; Rinke, P.; Foster, A.S. DScribe: Library of descriptors for machine learning in materials science. Comput. Phys. Commun. 2020, 247, 106949.
7. Caro, M.A. Optimizing many-body atomic descriptors for enhanced computational performance of machine learning based interatomic potentials. Phys. Rev. B 2019, 100, 024112.
8. Jäger, M.O.J.; Morooka, E.V.; Federici Canova, F.; Himanen, L.; Foster, A.S. Machine learning hydrogen adsorption on nanoclusters through structural descriptors. npj Comput. Mater. 2018, 4, 37.
9. De, S.; Bartók, A.P.; Csányi, G.; Ceriotti, M. Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 2016, 18, 13754–13769.
10. Caruso, C.; Cardellini, A.; Crippa, M.; Rapetti, D.; Pavan, G.M. TimeSOAP: Tracking high-dimensional fluctuations in complex molecular systems via time variations of SOAP spectra. J. Chem. Phys. 2023, 158, 214302.
11. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
12. Gewers, F.L.; Ferreira, G.R.; Arruda, H.F.D.; Silva, F.N.; Comin, C.H.; Amancio, D.R.; Costa, L.F. Principal component analysis: A natural approach to data exploration. ACM Comput. Surv. 2021, 54, 1–34.
13. Berahmand, K.; Daneshfar, F.; Salehi, E.S.; Li, Y.; Xu, Y. Autoencoders and their applications in machine learning: A survey. Artif. Intell. Rev. 2024, 57, 28.
14. Malik, J.S.; Hemani, A. Gaussian random number generation: A survey on hardware architectures. ACM Comput. Surv. 2016, 49, 1–37.
15. Tanner, M.A.; Wong, W.H. The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 1987, 82, 528–540.
16. Wei, L. Empirical Bayes test of regression coefficient in a multiple linear regression model. Acta Math. Appl. Sin. 1990, 6, 251–262.
17. Simard, P.Y.; Steinkraus, D.; Platt, J.C. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), Edinburgh, UK, 3–6 August 2003.
18. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
19. Elreedy, D.; Atiya, A.F. A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance. Inf. Sci. 2019, 505, 32–64.
20. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27.
21. Bao, G.; Yan, B.; Tong, L.; Shu, J.; Wang, L.; Yang, K.; Zeng, Y. Data augmentation for EEG-based emotion recognition using generative adversarial networks. Front. Comput. Neurosci. 2021, 15, 723843.
22. Chen, L.; Li, Y.; Deng, X.; Liu, Z.; Lv, M.; Zhang, H. Dual auto-encoder GAN-based anomaly detection for industrial control system. Appl. Sci. 2022, 12, 4986.
23. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Springer: Munich, Germany, 2015; pp. 234–241.
24. Laakso, J.; Himanen, L.; Homm, H.; Morooka, E.V.; Jäger, M.O.J.; Todorović, M.; Rinke, P. Updates to the DScribe library: New descriptors and derivatives. J. Chem. Phys. 2023, 158, 234802.
25. Shapiro, A. Monte Carlo sampling methods. In Handbooks in Operations Research and Management Science; Elsevier: Amsterdam, The Netherlands, 2003; Volume 10, pp. 353–425.
26. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
27. Kumar, V. Pruning distorted images in MNIST handwritten digits. arXiv 2023, arXiv:2307.14343.
28. Hodson, T.O.; Over, T.M.; Foks, S.S. Mean squared error, deconstructed. J. Adv. Model. Earth Syst. 2021, 13, e2021MS002681.
29. Zenodo Dataset. Available online: https://zenodo.org/records/14916887 (accessed on 24 February 2025).
30. Klawohn, S.; Darby, J.P.; Kermode, J.R.; Csányi, G.; Caro, M.A.; Bartók, A.P. Gaussian approximation potentials: Theory, software implementation and application examples. J. Chem. Phys. 2023, 159, 174108.
Figure 1. In our study, we take gray-scale MNIST handwritten images as input and project them into 3D point clouds. These points are then processed using the SOAP algorithm to generate SOAP spectra: feature vectors that encode local environments while remaining invariant to rotation, translation, and mirror symmetry. Each vector can be labeled and used for classification or regression tasks with models such as feed-forward neural networks. Due to SOAP’s inherent symmetry invariance, data augmentation for rotation, translation, and flipping is unnecessary during training. Additionally, since some SOAP components are highly correlated, dimensionality reduction techniques such as autoencoding or PCA can be applied for compression.
Figure 2. (a) Illustration of our approach for extracting features from a pixel. P_o^a and P_o^b represent independent SOAP vectors that encode local structural information up to a distance of r_cut. (b) Example of a density function with two sample points. (a) A 10 × 10 image with two SOAP spectra; the information contained in each SOAP spectrum is determined solely by the hyperparameter r_cut, and information beyond that radius is not captured. (b) Example of a cross section of a density function ρ_o with two points, P_1 = (0, 0, 1) and P_2 = (1, 1, 1), with σ_p = 0.5.
Figure 3. Example of a cross section of a spatial basis function, using spherical harmonics and a GTO radial basis function. (a) Real Φ_{053} with a cross section at z = 1.0. (b) Imag Φ_{053} with a cross section at z = 1.0.
Figure 4. Example of radial basis functions with r_cut = 10.
Figure 5. Spherical harmonics as angular basis functions. (a) Real Y_{53} with a cross section at z = 1.0. (b) Imag Y_{53} with a cross section at z = 1.0.
Figure 6. Example of a cross section of an integrand ρ_o × Φ_{053}^o, using the density function in Figure 2b. (a) Real ρ_o Φ_{053}^o with a cross section at z = 1.0. (b) Imag ρ_o Φ_{053}^o with a cross section at z = 1.0.
Figure 7. Example of a projection of a 2D image to a 3D structure.
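A minimal sketch of such a projection, assuming each nonzero pixel (column, row) becomes a point (x, y, z) with its intensity as the z-coordinate; the exact intensity scaling used in the paper may differ:

    import numpy as np

    def image_to_points(img, threshold=0.0):
        # img: 2D grayscale array. Keep only pixels above the threshold.
        ys, xs = np.nonzero(img > threshold)
        z = img[ys, xs].astype(float)        # pixel intensity as z-coordinate
        return np.column_stack([xs, ys, z])  # F_k: an (N_k x 3) matrix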
Figure 8. Pixel-wise prediction model used for all experiments. ReLU activation functions and Dropout (0.1) were used for the hidden layers.
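For illustration, a pixel-wise classifier of this type can be sketched in PyTorch. The input dimension 308 matches the uncompressed SOAP vectors discussed later; the hidden-layer widths here are assumptions, as the actual widths are specified in the figure.

    import torch.nn as nn

    # Feed-forward pixel-wise classifier: SOAP vector in, 10 digit classes out.
    model = nn.Sequential(
        nn.Linear(308, 256), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(128, 10),
    )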
Figure 9. Confusion matrix for 10,000 randomly selected test images from the MNIST handwritten data, normalized by row. Accuracy: 0.6863; Recall: 0.6863; Precision: 0.6821; F1 Score: 0.6832. It can be seen, for example, that distinguishing 6 from 9 is particularly hard, because SOAP cannot distinguish between their symmetries.
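The reported numbers can be reproduced from the predictions with scikit-learn; weighted averaging is assumed here, consistent with the reported recall equaling the accuracy:

    import numpy as np
    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 precision_recall_fscore_support)

    def report(y_true, y_pred):
        cm = confusion_matrix(y_true, y_pred)
        cm_row_norm = cm / cm.sum(axis=1, keepdims=True)  # normalize each row
        prec, rec, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average="weighted")
        return {"accuracy": accuracy_score(y_true, y_pred),
                "recall": rec, "precision": prec, "f1": f1,
                "confusion_matrix": cm_row_norm}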
Figure 10. Comparison of scatter plots showing relationships between the parameters (n_max, l_max, r_cut, and σ_p) and their effect on validation accuracy. Each plot visualizes one pair of parameters, with color indicating the validation accuracy achieved. r_cut is the most important parameter, while σ_p tends to perform well between 2 and 5. (a) Scatter plot of n_max vs. l_max colored by validation accuracy. (b) Scatter plot of r_cut vs. l_max colored by validation accuracy. (c) Scatter plot of r_cut vs. n_max colored by validation accuracy. (d) Scatter plot of σ_p vs. l_max colored by validation accuracy. (e) Scatter plot of σ_p vs. n_max colored by validation accuracy. (f) Scatter plot of σ_p vs. r_cut colored by validation accuracy.
Figure 11. Predictions on the validation set for easy-to-classify shapes. For clear and unambiguous shapes, the model is very accurate. (a) Handwritten 0. (b) Handwritten 1. (c) Handwritten 2. (d) Handwritten 3. (e) Handwritten 4. (f) Handwritten 5. (g) Handwritten 6. (h) Handwritten 7. (i) Handwritten 8. (j) Handwritten 9.
Figure 12. Because SOAP cannot distinguish between certain symmetries, the model misclassifies the rotated and flipped 7 as a 4. (a) Handwritten 7. (b) The same image as (a), but rotated clockwise 90 degrees and mirror flipped on the x-axis.
Figure 13. (a) A simple handwritten 3. (b) A 3D projection of the pixel data, illustrating that points far from the primary shape are predicted less accurately. (a) Handwritten 3. (b) 3D projection of (a).
Figure 14. Examples of highly ambiguous handwritten digits (4 and 9). Even human observers may find these shapes confusing. (a) Ambiguous handwritten 4. (b) Ambiguous handwritten 9.
Figure 15. Our linear autoencoder–decoder model.
Figure 16. Our deep autoencoder–decoder model.
Figure 17. Correlation matrix of the SOAP vectors for the 120,000 samples. Many of the elements are correlated, which suggests that the representation is highly compressible.
Figure 18. PCA achieves the lowest MSE from 308 down to around 200 dimensions, at which point PCA and the linear model become identical. Below around 175 dimensions, deep autoencoding becomes more accurate, and there is little difference between h_m = 2, 4, and 10. (a) Encoding MSE from 0 to 308, linear scale. (b) Encoding MSE from 0 to 308, logarithmic scale. (c) Encoding MSE from 0 to 100, logarithmic scale. (d) Encoding MSE from 0 to 200, logarithmic scale.
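The PCA baseline in this comparison can be sketched as a reconstruction-MSE curve over compression dimensions:

    import numpy as np
    from sklearn.decomposition import PCA

    def pca_reconstruction_mse(X, dims):
        # Reconstruction MSE for each compression dimension d >= 1 (cf. Figure 18).
        mse = {}
        for d in dims:
            pca = PCA(n_components=d).fit(X)
            X_hat = pca.inverse_transform(pca.transform(X))
            mse[d] = float(np.mean((X - X_hat) ** 2))
        return mse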
Figure 19. The linear model and the deep model perform similarly, with PCA giving slightly better accuracy above h_e = 50 and deep autoencoding giving slightly better accuracy below h_e = 50. (a) Test accuracy for compression dimensions between 0 and 308 with the linear model (PCA) and deep autoencoding (h_m = 2). (b) Test accuracy for compression dimensions between 0 and 50 with the linear model (PCA) and deep autoencoding (h_m = 2).
Figure 20. Validation accuracy with noise.
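A minimal sketch of the positional perturbation used for this robustness test, assuming i.i.d. Gaussian noise added to every 3D-pixel coordinate:

    import numpy as np

    def perturb_points(F, sigma_noise, rng=None):
        # F: (N_k x 3) point set; sigma_noise: standard deviation of the noise.
        rng = np.random.default_rng() if rng is None else rng
        return F + rng.normal(0.0, sigma_noise, size=F.shape)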
Figure 21. Comparison of SOAP and CNN test accuracy under rotation. Each training image consists of a 15 × 15 local patch extracted from MNIST digits. All test images are randomly rotated. Classification is performed on the center pixel of each patch. Training sizes range from 100 to 6000 samples, in steps of 100. Each point represents the average over 10 training runs. (a) Test accuracy using non-augmented training data; test images are randomly rotated. (b) Test accuracy using both rotated training and test data.
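A sketch of the random test-image rotation, assuming uniformly distributed angles and bilinear interpolation:

    import numpy as np
    from scipy.ndimage import rotate

    def random_rotation(img, rng=None):
        # Rotate a 2D patch by a uniformly random angle;
        # reshape=False keeps the original (e.g., 15 x 15) patch size.
        rng = np.random.default_rng() if rng is None else rng
        return rotate(img, angle=float(rng.uniform(0.0, 360.0)),
                      reshape=False, order=1)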
Table 1. Test values for the optimal hyperparameters found by Monte Carlo search using validation accuracy as the benchmark. The parameters found are r_cut = 63, n_max = 7, l_max = 10, σ_p = 3. The validation accuracy was 0.6844, while the test accuracy was 0.6863.

Accuracy | Recall | Precision | F1 Score
0.6863 | 0.6863 | 0.6821 | 0.6832
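A minimal sketch of such a Monte Carlo (random) hyperparameter search; the sampling ranges and trial count are assumptions, and evaluate is a user-supplied callable that extracts SOAP features with the given parameters, trains the pixel-wise model, and returns validation accuracy:

    import random

    def monte_carlo_search(evaluate, n_trials=200, seed=0):
        rng = random.Random(seed)
        best_params, best_acc = None, -1.0
        for _ in range(n_trials):
            # Sampling ranges below are illustrative assumptions.
            params = {"r_cut": rng.uniform(5.0, 100.0),
                      "n_max": rng.randint(1, 10),
                      "l_max": rng.randint(0, 10),
                      "sigma_p": rng.uniform(0.5, 10.0)}
            acc = evaluate(params)
            if acc > best_acc:
                best_params, best_acc = params, acc
        return best_params, best_acc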
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
