Mathematical Approaches for the Characterization and Analysis of Molecular Markers in the Study of the Progression and Severity of Amyotrophic Lateral Sclerosis

Carracciuolo, Luisa; D’Amora, Ugo; Dubbioso, Raffaele; Fasolino, Ines

doi:10.3390/appliedmath6020022


¹ Institute of Polymers, Composites and Biomaterials (IPCB) of the National Research Council (CNR), 80078 Pozzuoli, Italy

² Department of Neurosciences, Reproductive Sciences and Odontostomatology, University of Naples Federico II, 80131 Naples, Italy

* Author to whom correspondence should be addressed.

AppliedMath 2026, 6(2), 22; https://doi.org/10.3390/appliedmath6020022

Submission received: 9 December 2025 / Revised: 28 January 2026 / Accepted: 29 January 2026 / Published: 5 February 2026


Abstract

Amyotrophic Lateral Sclerosis (ALS) is a progressive neurodegenerative disorder for which, despite its severity, no validated biomarker currently exists to support early diagnosis; this limits therapeutic effectiveness and patient survival. In this context, mathematical modeling becomes essential: it allows us to maximize the information obtainable from a limited number of samples, identify patterns that may not be directly observable, and estimate the relative contribution of different molecular markers to ALS progression. In this work, we propose methods for qualitatively and quantitatively evaluating the relevance of selected biomarkers in ALS classification and disease-state identification. These methods lay the foundations for a protocol useful for constructing “digital twins” of the entire process of study, diagnosis, and treatment of the disease from the perspective of innovative precision medicine.

Keywords:

Amyotrophic Lateral Sclerosis; molecular markers; Principal Components Analysis; Multiple Imputation by Chained Equations; deep neural networks

1. Introduction

Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive adult-onset neurodegenerative disorder that primarily targets upper and lower motor neurons in the cerebral cortex and spinal cord [1]. Despite its severity, no validated biomarker currently exists to support early diagnosis, limiting therapeutic effectiveness and patient survival. The predominantly sporadic nature of ALS, with only 5–10% of cases being hereditary, further complicates early detection. On average, life expectancy is approximately five years from symptom onset [2]. Although traditionally considered a “pure” motor disease, ALS also involves sensory and autonomic dysfunction, as demonstrated in punch skin biopsy studies [3,4]. Reductions in intraepidermal nerve fiber density have been reported in roughly 80% of patients, highlighting peripheral nervous system involvement [5,6]. Emerging evidence also implicates neuroinflammation, characterized by activated microglia, astrogliosis, and immune infiltration, in the spinal cord and peripheral nerves of both ALS patients and transgenic mouse models [7].

In this context, identifying early, reliable biomarkers is crucial to reducing diagnostic delays. One promising candidate pathway is the NOD-, LRR- and pyrin domain-containing protein 3 (NLRP-3) inflammasome. Although NLRP-3 activation has been extensively described in other neurodegenerative diseases, its role in ALS has only recently gained attention [8]. Pathological studies show NLRP-3 activation in the brain and spinal cord of SOD1G93A mice and in human ALS tissue [9]. In SOD1 models, increased NLRP-3 correlates with active caspase-1 and with elevated interleukin-18 (IL-18) and IL-1β [10]. Upregulation of microglial NLRP-3 has also been observed in TDP-43Q331K mice, and in vitro studies demonstrate that TDP-43 can trigger microglial NLRP-3-mediated inflammation that is toxic to motor neurons [11].

Building on this rationale, the present work proposes the application of mathematical tools capable of extracting relevant insights from molecular data. A comprehensive biological characterization of ALS would ideally require the collection of a large number of skin biopsies and blood samples across multiple disease stages. However, such extensive sampling is often impractical due to the invasive nature of biopsies, patient burden, and clinical resource constraints. Mathematical modeling therefore becomes essential: it allows us to maximize the information obtainable from a limited number of samples, identify patterns that may not be directly observable, and estimate the relative contribution of different molecular markers to ALS progression.

Within this framework, we developed methods for qualitatively and quantitatively evaluating the relevance of selected biomarkers in ALS classification and disease-state identification. To generate the molecular dataset required for model training and validation, we used human skin biopsy and blood serum as minimally invasive tools to investigate the pathophysiological mechanisms underlying peripheral nerve degeneration in ALS. Specifically, we quantified inflammatory responses, such as NLRP-3 inflammasome activation and cytokine production, alongside oxidative stress indicators (Superoxide Dismutase, SOD expression) and pathological phospho-TDP-43 accumulation. Skin and blood samples from ALS patients and healthy controls were processed using complementary analytical approaches, including Western blotting, ELISA, and fluorescence imaging.

By integrating these molecular measurements with mathematical analysis, our approach enables a more robust interpretation of limited clinical data and supports the identification of biomarker patterns that could be used in future diagnostic applications.

In particular, the novelty of our work, to the best of our knowledge regarding the current state-of-the-art, lies in an interdisciplinary approach that combines biochemical data from ALS patients with advanced computational techniques—such as Multiple Imputation Methods (MIMs), Principal Component Analysis (PCA), and neural networks (NNs)—to

Integrate MIM, PCA, and NN within the ALS biomedical context;
Use PCA, BiPlots, and NNs to interpret biomarker relevance, also applying some simple techniques to open the NN “black box”;
Combine a mathematical–biological workflow as a foundation for future “digital twin” construction.

To the best of our knowledge, this study is the first to formally combine mathematically interpretable multivariate modeling with biochemically validated correlations among oxidative, neuroinflammatory, and neurodegenerative biomarkers of ALS across both serum and peripheral tissues. Previous studies have shown that peripheral cutaneous nerve degeneration can mirror neurodegenerative mechanisms occurring in the central nervous system [5]. However, unlike earlier investigations, we specifically quantified protein biomarkers within peripherally innervated tissue and directly compared them with circulating serum markers. This integrative approach lays the groundwork for the future development of minimally invasive strategies (such as skin biosensors) aimed at detecting ALS progression and disease severity through cutaneous, peripherally innervated tissue.

The paper is structured as follows: Section 2 describes the biochemical analysis and the mathematical framework used for data analysis; Section 3 presents case-study results illustrating the application of these methods; Section 4 discusses the implications of these findings; and Section 5 summarizes the study’s contributions and outlines future research directions.

2. Materials and Methods

2.1. Collection of Skin Biopsies and Serum

Sixty patients diagnosed with ALS within eighteen months from symptom onset and classified as “definite”, “probable” or “probable laboratory-supported” according to the revised El Escorial criteria were enrolled at the ALS Center of Federico II University, along with thirty-two healthy controls selected from the patients’ caregivers. Individuals under the age of eighteen and those with coexisting conditions that could affect the peripheral nervous system (e.g., glucose intolerance, diabetes, thyroid disorders, vitamin deficiencies) were excluded.

This study was designed as an exploratory analysis to investigate biochemical alterations in skin biopsies and serum, which required highly specific inclusion criteria and standardized sampling procedures, as reported in similar studies [12], providing sufficient statistical power for the primary outcomes.

2.2. Biochemical Assays

Skin biopsies and blood samples were collected from the sixty ALS patients to assess the expression levels of inflammasome pathway components in comparison to healthy controls. Specifically, two 3 mm punch biopsies were taken at baseline (T0) from the thigh, leg, and fingertip on the side most affected by the disease. Samples were fixed in cold Zamboni’s solution (Sigma Aldrich, Milan, Italy) and cryoprotected in 20% sucrose phosphate-buffered saline (PBS, Sigma Aldrich, Milan, Italy). Then, skin biopsies were homogenized for Western blot, ELISA, and SOD assays. Blood was drawn into tubes containing anticoagulant and centrifuged (1000–2000× g for 10 min in a refrigerated centrifuge) to obtain serum samples.

2.3. Mathematical Tools

Since the 1960s, figures such as John Wilder Tukey (the inventor, along with J. W. Cooley, of the FFT algorithm, one of the most elegant, beautiful, and powerful in the history of mathematics) argued that data analysis could be considered a discipline that could also be independent of mathematics. David Donoho, who was Tukey’s student and is the author of some works related to the present and future of Data Science [13,14], in a position lecture presented at AMS Math Challenges 2020 [15], summarized Tukey’s position as follows:

Data are here now; they will be coming on more and more in the future. We must analyze them, often using very humble means, and insistence on mathematics—for example, on deep mathematical analysis and proof—will likely distract us from recognizing these fundamental points.

But, in the same lecture, Donoho states

… while over the last forty years, data analysis has developed by separating itself from its mathematical parent, there is now need for a reconnection, which might yield benefits both for mathematics and for data analysis.

and

We are now in a setting where many very important data analysis problems are high-dimensional. Many of these high-dimensional data analysis problems require new or different mathematics. A central issue is the “curse of dimensionality”, which has ubiquitous effects through out the sciences.

Donoho expressed well the idea that, given the complexity of data, effective and efficient data analysis can no longer ignore a robust knowledge of mathematical approaches (both old and new).

In the remainder of this section, mathematical tools and approaches are briefly introduced as strategies for solving three problems that are highly relevant to clinical research: imputation of missing data, data dimensionality reduction, and building predictive models from data.

Clinical research frequently has missing data. When the values of the variables of interest are not measured or documented, this is known as missing data. A patient’s unwillingness to answer a particular question, a patient’s loss to follow-up, an investigator or mechanical error, or a doctor’s failure to order certain investigations for a patient are some of the reasons why data may be missing [16,17,18,19]. Replacing the missing values of variables with reasonable values is one way to get around the limits of a full case analysis. Because one is imputing a value of the variable for those individuals with missing data on that variable, this method is known as “imputation” [20].

Large-scale patient data, genomic research, and medical imaging produce enormous volumes of high-dimensional data in healthcare research that need to be processed effectively. These datasets become computationally costly and difficult to analyze in the absence of appropriate dimensionality reduction. The term “dimensionality reduction” describes methods for generating a set of principal variables by reducing the high-dimensional space of the original data into a low-dimensional space. Healthcare researchers can improve disease prediction models and facilitate more precise clinical decision-making by extracting important features from complicated datasets through dimensionality reduction. Dimensionality reduction, for instance, aids in precision medicine by identifying important genetic markers linked to diseases in genomic investigations. Principal Component Analysis (PCA) is the best statistical method to reduce dimensionality and remove redundancy if every feature is equally relevant to the investigation [21].

Building a predictive model from data [22,23] has once again become a necessary approach, given the large amount of data available from which to extract knowledge. Techniques for creating these models, particularly those based on machine learning, are quickly becoming crucial for the interpretation of intricate clinical and other data as well as for clinical decision support. The link between input and output signals of a complex system can be approximated using neural networks (NNs), which are highly parameterized, non-linear models built from collections of processing units known as neurons. NNs are sometimes referred to as black boxes, even though they can be more effective predictors than more traditional models (like linear regression). NN models are much harder to interpret than linear techniques, and it can be difficult to determine which descriptors (predictors) are most significant and how they relate to the property being modeled [24].

In the following part of this section, some methods will be proposed and briefly described. They could be used to solve the aforementioned problems afflicting clinical research based on data.

2.3.1. Handling Incomplete Data

As mentioned above, missing data are widespread in medical research, but the problem also affects many other contexts: see, for example, the long list available in [25], which, in addition to many problems related to medicine and health, includes applications in management sciences, politics, psychology and sociology.

For all these applications, a multiple imputation approach to the missing data problem is proposed, since it has proven effective in multiple contexts [17]. In particular, some form of the so-called “chained equation approach” is proposed.

Here, we describe and use a version of that approach known as “Multivariate Imputation by Chained Equations (MICE)” [20,25,26]. Given the variables used in the imputation process, MICE operates on the assumption that missing data are “Missing At Random (MAR)”, meaning that the probability that a value is missing depends only on observed values and not on unobserved values [26]. Although using MICE when data are not MAR may lead to biased results, some research indicates that, even in these circumstances, MICE produces less biased estimates than naive approaches to managing the same censored values [17].

Definition 1

(The Multivariate Imputation by Chained Equations (MICE) method). Let Y be a multivariate variable based on p univariate variables, among which k are incomplete, and let Y be represented by n observations. Then,

$$Y = (Y_{1}, \dots, Y_{k}, \bar{Y}_{1}, \dots, \bar{Y}_{p-k}),$$

where $\{Y_{j}, j = 1, \dots, k\}$ denotes the set of the incomplete variables and $\{\bar{Y}_{j}, j = 1, \dots, p-k\}$ denotes the set of the $p-k$ complete variables. Let the observed and missing parts of $Y_{j}$ be denoted by $Y_{j}^{obs}$ and $Y_{j}^{mis}$, respectively, so that

$$Y^{obs} = (Y_{1}^{obs}, \dots, Y_{k}^{obs}, \bar{Y}_{1}, \dots, \bar{Y}_{p-k})$$

and

$$Y^{mis} = (Y_{1}^{mis}, \dots, Y_{k}^{mis})$$

stand for the observed and missing data in Y. Let

$$Y_{-j} = (Y_{1}, \dots, Y_{j-1}, Y_{j+1}, \dots, Y_{k}, \bar{Y}_{1}, \dots, \bar{Y}_{p-k})$$

denote the multivariate variable consisting of all the variables in Y except $Y_{j}$.

Suppose that Y is a partially observed random sample from the p-variate multivariate distribution $P(Y \mid \theta)$. We assume that the multivariate distribution of Y is completely specified by $\theta$, a vector of unknown parameters.

The MICE method aims to obtain the multivariate distribution of $\theta$, either explicitly or implicitly. Given the parameters $\theta_{j}, j = 1, \dots, k$, which are specific to the respective conditional variable $Y_{j}$, the MICE algorithm (see Algorithm A1 in Appendix A.1) implementing such method obtains the posterior distribution of $\theta$ by sampling iteratively from conditional distributions of the form

$$\begin{matrix} P(Y_{1} \mid Y_{-1}, \theta_{1}) \\ \vdots \\ P(Y_{k} \mid Y_{-k}, \theta_{k}). \end{matrix}$$

Given a model $M_{j}(Y_{j}^{obs}, Y_{-j}; \theta_{j}), j = 1, \dots, k$, defined by parameter $\theta_{j}$, to impute from observed (and already imputed) marginal distributions, the iterative procedure can be considered a Gibbs sampler that is used m times to calculate m independent samples ${}^{(i)}Y, i = 1, \dots, m$ of the complete variable, where each ${}^{(i)}Y$ inherits the observed part $Y^{obs}$ of Y and fills the missing part $Y^{mis}$ of Y with “imputed” observations. That is, if $Y_{j}^{(t)} = (Y_{j}^{obs}, Y_{j}^{*(t)})$ is the j-th imputed variable at iteration t,

$${}^{(i)}Y = (Y_{1}^{(maxit)}, \dots, Y_{k}^{(maxit)}, \{\bar{Y}_{j'}\}_{j' = 1, \dots, p-k}).$$

All the m independent samples ${}^{(i)}Y, i = 1, \dots, m$ can be collected in the set X for further analysis.
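The chained-equation loop of Definition 1 can be illustrated with a deliberately small sketch. It is not Algorithm A1: it assumes only two variables, uses a simple linear regression with Gaussian noise as the imputation model, and plugs in the fitted regression parameters rather than drawing them from their posterior. The function names `fit_line` and `mice_two_variables` and the toy data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit y = a + b * x; also returns the residual standard deviation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(n - 2, 1)) ** 0.5
    return a, b, sd

def mice_two_variables(rows, m=5, maxit=10, seed=0):
    """Return m completed copies of rows (lists [y1, y2], None marks a missing value)."""
    rng = random.Random(seed)
    n = len(rows)
    missing = [[row[j] is None for row in rows] for j in range(2)]
    samples = []
    for _ in range(m):
        cur = [list(row) for row in rows]
        # Starting imputations: random draws from the observed values of each variable.
        for j in range(2):
            observed = [row[j] for row in rows if row[j] is not None]
            for i in range(n):
                if missing[j][i]:
                    cur[i][j] = rng.choice(observed)
        # Chained equations: re-impute each variable given the current state of the other.
        for _ in range(maxit):
            for j in range(2):
                other = 1 - j
                obs_idx = [i for i in range(n) if not missing[j][i]]
                a, b, sd = fit_line([cur[i][other] for i in obs_idx],
                                    [cur[i][j] for i in obs_idx])
                for i in range(n):
                    if missing[j][i]:
                        cur[i][j] = a + b * cur[i][other] + rng.gauss(0.0, sd)
        samples.append(cur)
    return samples

# Hypothetical toy data with missing entries in both variables.
rows = [[1.0, 2.1], [2.0, None], [3.0, 6.2], [None, 8.1],
        [5.0, 9.8], [6.0, 12.2], [7.0, None], [8.0, 15.9]]
samples = mice_two_variables(rows, m=5, maxit=10)
```

Each element of `samples` plays the role of one completed dataset: the observed entries are left untouched and only the missing ones are filled in.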

Regarding the selection of the number of imputations m, the desired “relative efficiency” $E_{MI}(m, \lambda)$ of MI estimates [27] should be considered. $E_{MI}(m, \lambda)$ can be evaluated by using the following formula [17]:

$$E_{MI}(m, \lambda) = \left(1 + \frac{\lambda}{m}\right)^{-1}, \tag{1}$$

where $\lambda$ is the rate of missing data. See Table A1 in Appendix A.1 for a listing of some values of $E_{MI}(m, \lambda)$.
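As a quick numerical illustration of Equation (1), the following sketch evaluates the relative efficiency for a few pairs $(m, \lambda)$. The function name `relative_efficiency` and the chosen values are illustrative only, and $\lambda$ is assumed to be expressed as a fraction in $[0, 1]$.

```python
def relative_efficiency(m: int, lam: float) -> float:
    """Relative efficiency E_MI(m, lambda) = (1 + lambda / m) ** -1 from Equation (1)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to be a fraction in [0, 1]")
    return 1.0 / (1.0 + lam / m)

# Print a small table: one row per missing-data rate, one column per number of imputations.
for lam in (0.1, 0.3, 0.5):
    cells = ", ".join(f"m={m}: {relative_efficiency(m, lam):.3f}" for m in (3, 5, 10, 20))
    print(f"lambda={lam}: {cells}")
```

For a fixed $\lambda$, the efficiency increases with m and approaches 1, which is why a moderate number of imputations is often considered adequate.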

To overcome problems related to skewed data or data not described by linear models, the Predictive Mean Matching (PMM) method [28] is proposed as a flexible model $M_{j}(\cdot, \cdot; \theta_{j})$ for imputation. As described in [29], PMM is based on the following steps:

Model Building: A predictive model $\hat{y} = f(X)$ is built using complete cases to estimate the relationship between the target variable $y$ (to be imputed) and a set of predictor variables represented by X.

Prediction Generation: This model is employed to predict values for both complete and incomplete cases. Let $\hat{y}_{i}$ be the predicted value for the i-th observation of $y$.

Matching Process: For each incomplete case, PMM identifies one or several complete cases whose predicted values are closest in distance to the incomplete case’s predicted value. For each missing instance i, calculate

$$j_{dmin} = \arg\min_{j \in C} \mathrm{dist}(\hat{y}_{i}, \hat{y}_{j}),$$

where $C$ is the index set of all complete observations.

Imputation by Matching: Replace the missing value $y_{i}$ with the observed value of the best-matching candidate:

$$y_{i} = y_{j_{dmin}}.$$

Optionally, multiple matches can be pooled to inject further randomness and variability into the imputation process.
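The four PMM steps can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: a single predictor, an ordinary least-squares line as the predictive model, and the absolute difference as the distance. The names `fit_line`, `pmm_impute` and `n_donors` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Model building: ordinary least squares for y = a + b * x (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def pmm_impute(x, y, n_donors=1, rng=None):
    """Return a copy of y in which each None is replaced by an observed donor value."""
    rng = rng or random.Random(0)
    complete = [i for i, v in enumerate(y) if v is not None]
    a, b = fit_line([x[i] for i in complete], [y[i] for i in complete])
    # Prediction generation: predicted values for complete and incomplete cases.
    y_hat = [a + b * xi for xi in x]
    imputed = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Matching: complete cases whose predictions are closest to y_hat[i].
            donors = sorted(complete, key=lambda j: abs(y_hat[i] - y_hat[j]))[:n_donors]
            # Imputation: copy the observed value of a randomly chosen donor.
            imputed[i] = y[rng.choice(donors)]
    return imputed

# Hypothetical toy data: y is missing for the third and fifth observations.
x = [1.0, 2.0, 3.2, 4.0, 5.1, 6.0]
y = [1.1, 1.9, None, 4.2, None, 6.1]
y_imputed = pmm_impute(x, y, n_donors=1)
```

Because imputed values are always copied from observed ones, PMM never produces values outside the observed range; setting `n_donors` above 1 corresponds to pooling several matches.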

2.3.2. Interpreting Multidimensional Data by Principal Components Analysis

In several fields, large datasets are becoming more and more common. Such datasets must be considerably reduced in dimensionality in an interpretable manner while maintaining the majority of the data’s content in order to be interpreted. Although many methods have been developed for this purpose, one of the oldest and most used is “Principal Component Analysis (PCA)”. Its concept is straightforward: lower a dataset’s dimensionality while maintaining as much “variability” (i.e., statistical information) as feasible [30].

PCA’s goal is to extract the important information from a set of observed data, described by several inter-correlated variables, to represent it as a set of new orthogonal variables called “principal components”, and to display the pattern of similarity of the observations and of the variables as points in appropriate maps called BiPlot [31]. The main uses of PCA are descriptive, rather than inferential [30]. “PCA allows us to simultaneously describe the association between variables, as well as the resemblance among individuals. PCA can also be regarded to as a dimension reduction technique of quantitative variables, often employed as an intermediate step towards a subsequent model building phase [32]”. Mathematically, PCA depends upon the solution to an eigenproblem or, alternatively, upon the singular value decomposition (SVD) of the (centered) data matrix. A complete description of the PCA approach can be found in [30,33]. For an informal but more descriptive approach in introducing PCA, we also suggest reading a paper by Aluja et al. [32]. Here, we propose some definitions and descriptions that are useful for our discussion of the proposed results.

Definition 2

(Principal Components Analysis (PCA) of a dataset Correlation Matrix). Let $D$ be a dataset with observations on p numerical variables for each of n entities or individuals. These data values define an $n \times p$ data matrix $X$, whose j-th column is the vector $x_{j}$ of observations on the j-th variable. Let $Z$ be the normalized data matrix of $X$; then, its generic element $z_{ij}$ is given by

$$z_{ij} = \frac{y_{ij}}{s_{j}},$$

where $y_{ij} = x_{ij} - \bar{x}_{j}$, $\bar{x}_{j}$ is the mean value of the n observations of variable j, and $s_{j}$ is the standard deviation of the elements $\{y_{ij}\}_{i = 1, \dots, n}$, i.e., $s_{j} = \sqrt{\frac{\sum_{i} y_{ij}^{2}}{n - 1}}$.

Let $R$ be the sample covariance matrix related to $Z$; $R$ is a positive semi-definite matrix. Then, it has an eigen-decomposition such as the following one:

$$R = A \Lambda A^{T}, \tag{2}$$

where the p column vectors $a_{j} \in \Re^{p}$ of matrix $A$ are the p linearly independent eigenvectors of $R$ defining an orthonormal set of vectors, i.e.,

$$a_{j}^{T} a_{j'} = \left\{\begin{matrix} 1, & \text{if } j = j', \\ 0, & \text{otherwise.} \end{matrix}\right.$$

Let us define the set of vectors $b_{j}, j = 1, \dots, p$ as

$$b_{j} = Z a_{j}. \tag{3}$$

Since the matrix $R$ coincides with the correlation matrix of $X$, the vectors defined in (3) are called Principal Components (PCs) of the Correlation Matrix of the dataset $D$, and the PC approach based on those vectors is called a “Normalized PCA”.

It can be shown [30,31] that $A$ is a solution of the following optimization problem:

$$A = \underset{\begin{matrix} Q : Q^{T} Q = I \\ Z Q \text{ orthogonal} \end{matrix}}{\arg\max} \ \mathrm{var}(Z Q). \tag{4}$$

Equation (4) expresses the essence of Principal Component Analysis, which consists in identifying a “projection matrix $Q$ with orthonormal columns” that transforms the “cloud of points”, representing the observations $X$ in a p-dimensional space, in such a way that the new configuration, represented in an r-dimensional space with $r \leq p$, is as close as possible to the original configuration (i.e., the distances among all different points are preserved as much as possible; see Figure 1). Let us call the new space a “Factorial Space”, or a “Factorial Plane” if $r = 2$.

When variables used in a dataset have different units of measurement, it is common practice to begin by standardizing the variables as in Definition 2. Correlation matrix PCs are therefore the best option for datasets where various scale variations are possible for each variable, since they are invariant to linear changes in units of measurement. For all these reasons, some statistical software assumes by default that a PCA means a normalized PCA [30].

In standard PCA terminology, the elements of the eigenvectors $a_{j}$ are commonly called “PC loadings”, while the elements of vectors $b_{j}$ are called “PC scores”, as they are the values that each individual would score on a given PC [30].

It is noteworthy that PCA is related to a Singular Value Decomposition (SVD) of the matrix $Z$ (see Proposition A1 in Appendix A.2).
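A normalized PCA as in Definition 2 can be sketched with NumPy as follows. This is an illustrative sketch, not the code used in the study: the function name `normalized_pca` and the synthetic data are assumptions, and the eigenvalues are sorted in decreasing order by convention.

```python
import numpy as np

def normalized_pca(X):
    """Normalized PCA: returns eigenvalues, loadings A, scores B and the normalized matrix Z."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Y = X - X.mean(axis=0)               # centred data y_ij
    s = Y.std(axis=0, ddof=1)            # column standard deviations s_j
    Z = Y / s                            # normalized data matrix
    R = Z.T @ Z / (n - 1)                # correlation matrix of X
    eigvals, A = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]    # reorder so the largest eigenvalue comes first
    eigvals, A = eigvals[order], A[:, order]
    B = Z @ A                            # PC scores, columns b_j = Z a_j
    return eigvals, A, B, Z

# Synthetic data (hypothetical): 50 individuals, 4 variables, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 2.0 * X[:, 0]
eigvals, A, B, Z = normalized_pca(X)

# Link with the SVD of Z: squared singular values divided by (n - 1) give the eigenvalues.
svals = np.linalg.svd(Z, compute_uv=False)
eigvals_from_svd = svals ** 2 / (Z.shape[0] - 1)
```

The columns of `A` hold the PC loadings and the columns of `B` the PC scores; the sample variance of each score column equals the corresponding eigenvalue.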

Diagonal elements of matrix $\Lambda$ can be used to evaluate the variance of the projection $\mathrm{projection}(z_{j}, a_{j})$ of column vectors $z_{j}$ of $Z$ (i.e., the variables) on the computed components $a_{k}, k = 1, \dots, p$. In particular,

$$\mathrm{var}(\mathrm{projection}(z_{j}, a_{j})) = \Lambda_{jj}. \tag{5}$$

When a normalized PCA is considered, the total variance is

$$Var_{Tot} = \sum_{j = 1}^{p} \mathrm{var}(\mathrm{projection}(z_{j}, a_{j})) = \sum_{j = 1}^{p} \Lambda_{jj} = p.$$

It is also possible to define the $\mathrm{Relevance}(j)$ of the component j, $j = 1, \dots, p$, in representing the total variance $Var_{Tot}$, as well as the cumulative relevance $\mathrm{CumulativeRelevance}(j)$ of the first j components, as

$$\mathrm{Relevance}(j) = \frac{\Lambda_{jj}}{p} \times 100, \tag{6}$$

$$\mathrm{CumulativeRelevance}(j) = \sum_{i = 1}^{j} \mathrm{Relevance}(i). \tag{7}$$
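Equations (6) and (7) amount to simple arithmetic on the eigenvalues. The sketch below uses a hypothetical set of eigenvalues of a normalized PCA with $p = 4$ (so that they sum to 4); the function names are illustrative.

```python
from itertools import accumulate

def relevance(eigenvalues):
    """Percentage of the total variance p represented by each component (Equation (6))."""
    p = len(eigenvalues)
    return [100.0 * lam / p for lam in eigenvalues]

def cumulative_relevance(eigenvalues):
    """Running sum of the relevances of the first j components (Equation (7))."""
    return list(accumulate(relevance(eigenvalues)))

# Hypothetical eigenvalues of a 4 x 4 correlation matrix, sorted in decreasing order.
eigenvalues = [2.4, 1.0, 0.4, 0.2]
rel = relevance(eigenvalues)
cum = cumulative_relevance(eigenvalues)
for j, (r, c) in enumerate(zip(rel, cum), start=1):
    print(f"PC{j}: relevance = {r:.1f}%, cumulative = {c:.1f}%")
```

A common use of the cumulative relevance is to choose the smallest number of components whose cumulative value exceeds a chosen threshold.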

Thanks to the reformulation (A2) of PCA, one could assert that PCA “… is at heart a dimensionality-reduction method, whereby a set of p original variables can be replaced by an optimal set of q derived variables, the PCs.” [30], where $q < p$. In fact, consider the “Reduced-Rank” approximation $Z_{q}$ of matrix $Z_{r}$ (where $q < r$) (see Proposition A2 in Appendix A.2).

The BiPlot is a helpful tool for data analysis that makes it possible to visually evaluate the structure of data matrices. It is particularly useful in Principal Component Analysis, where the BiPlot may exhibit variances and correlations of the variables, as well as inter-unit distances and unit clustering [34]. Also, thanks to PCA, this graphing method can exploit the opportunity offered by PCA to approximate the data matrix by a matrix product of dimension 2 [32]. In a BiPlot, the individuals $z_{i}$ and the variables $z_{j}$ of a normalized data matrix $Z \in \Re^{n \times p}$, defined as in Definition 2, are graphically represented, respectively, as points and as vectors (i.e., arrows) in a bidimensional Cartesian system (see Proposition A3 in Appendix A.2 for its description). The BiPlot is based on an approximation $\hat{Z}$ of $Z$ defined by the product

$$\hat{Z} = \hat{Z}_{1} \hat{Z}_{2}^{T},$$

where $\hat{Z}_{1} \in \Re^{n \times 2}$ and $\hat{Z}_{2} \in \Re^{p \times 2}$, and where the rows of $\hat{Z}_{1}$ and of $\hat{Z}_{2}$ represent, respectively, the individuals and the variables.

As suggested in [30], and thanks to the definition of the bidimensional Cartesian system $C^{(j_{1}, j_{2})}$ on which the BiPlot is based (see Proposition A3 in Appendix A.2), the BiPlot has the following properties:

The cosine of the angle between any two vectors representing variables is the coefficient of correlation between those variables.
Similarly, the cosine of the angle between any vector representing a variable and the axis representing a given PC is the coefficient of correlation between those two variables.
The inner product between the markers for individual i and variable j gives the value of individual i on variable j. The practical implication of this result is that orthogonally projecting the point representing individual i onto the vector representing variable j recovers the value $z_{i j}$ .
The Euclidean distance between the markers for individuals i and $i^{'}$ is proportional to the “Mahalanobis distance” [35] between them (see [33] for more details).

Roughly speaking,

Interpreting Points: The relative location of the points can be interpreted. Points that are close together correspond to observations that have similar scores on the components displayed in the plot. To the extent that these components fit the data well, the points also correspond to observations that have similar values on the variables.

Interpreting Vectors: Both the direction and length of the vectors can be interpreted. Vectors point away from the origin in some direction. A vector points in the direction that is most highly correlated with the variable, given the principal components under consideration. The squared multiple correlation between the projected variable and the components under consideration determines the vector’s length. As a result, variables with comparable response profiles and meanings within the context of the data are represented by vectors pointing in the same direction. The observations with the largest values of a variable are those whose points project farthest in the direction in which the vector points. The observations with the smallest values are those that project at the opposite end, and those projecting near the middle have average values.
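The markers of a BiPlot can be computed from the SVD of the normalized matrix. The sketch below assumes one common scaling, in which the individuals are represented by $\sqrt{n-1}$ times the first two left singular vectors and the variables by the first two right singular vectors scaled by the singular values over $\sqrt{n-1}$; the scaling of Proposition A3 may differ. The function name `biplot_markers` and the synthetic data are hypothetical.

```python
import numpy as np

def biplot_markers(Z):
    """Return (Z1, Z2): 2-D markers of the individuals (rows) and of the variables (columns)."""
    n = Z.shape[0]
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    Z1 = np.sqrt(n - 1) * U[:, :2]               # points: one row per individual
    Z2 = Vt[:2, :].T * s[:2] / np.sqrt(n - 1)    # arrows: one row per variable
    return Z1, Z2

# Synthetic data (hypothetical): 30 individuals, 3 variables, two of them correlated.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X[:, 2] += X[:, 0]
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

Z1, Z2 = biplot_markers(Z)
Z_hat = Z1 @ Z2.T   # rank-2 approximation of Z on which the BiPlot is based

# Cosines of the angles between variable arrows, to be compared with the correlations.
unit = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
cosines = unit @ unit.T
```

With this scaling, `Z2 @ Z2.T` is the rank-2 approximation of the correlation matrix, so the cosines between arrows approximate the correlations; the agreement is exact only when the two displayed components capture all of the variance.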

2.3.3. Building Predictive Model from Data by Neural Networks

Neural networks, a cornerstone of artificial intelligence (AI) and machine learning, are computational models built from data inspired by the structure and function of the human brain [36]. The concept of neural networks dates back to the mid-20th century, but it was not until the advent of powerful computers and the availability of large datasets in the 21st century that neural networks truly flourished.

A neural network consists of layers of neurons (nodes), where each neuron is a function that takes inputs, processes them using weights, biases, and an activation function, and produces an output. A deep neural network (DNN) can be considered the result of stacking more than one layer of neurons one after another. Figure 2 shows an example of a DNN composed of 3 layers. The numbers $N_{1}$, $N_{2}$ and $N_{3}$ of “neurons” at each of the three layers are, respectively, 3, 4 and 2. If the number L of layers of the DNN is equal to three, the DNN is called a “shallow neural network”.
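The way each neuron combines inputs, weights, a bias and an activation function can be made concrete with a small forward pass through a network of the same shape as the example (3, 4 and 2 neurons). This is a sketch under stated assumptions: the weight values are arbitrary, the activation is taken to be the hyperbolic tangent, and the bias of each neuron is stored as the last entry of its weight row.

```python
import math

def layer(x_prev, W, sigma=math.tanh):
    """Output of one layer: sigma applied to each weighted sum of the inputs plus the bias."""
    augmented = list(x_prev) + [1.0]   # append 1 so the last weight of a row acts as bias
    return [sigma(sum(w * v for w, v in zip(row, augmented))) for row in W]

def forward(x, weights, sigma=math.tanh):
    """Apply the layers one after another, feeding each output to the next layer."""
    out = list(x)
    for W in weights:
        out = layer(out, W, sigma)
    return out

# Hypothetical weights: W2 maps 3 inputs to 4 neurons, W3 maps 4 neurons to 2 outputs.
W2 = [[0.1, -0.2, 0.3, 0.0],
      [0.4, 0.1, -0.1, 0.2],
      [-0.3, 0.2, 0.2, -0.1],
      [0.0, 0.5, -0.4, 0.1]]
W3 = [[0.2, -0.1, 0.3, 0.1, 0.0],
      [-0.2, 0.4, 0.1, -0.3, 0.05]]

output = forward([1.0, 2.0, 3.0], [W2, W3])
```

Each weight matrix has one row per neuron of its layer and one column per neuron of the previous layer, plus one extra column for the bias.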

We recall that the term “Data fitting” denotes the process of constructing a mathematical function

g : ℜ^{n} \to ℜ^{k}

(the model) that has “the best fit” to a series of data points

{\{(x_{i}, y_{i})\}}_{i = 1, \dots m} : x_{i} \in ℜ^{n}, y_{i} \in ℜ^{k}

. Curve fitting can involve either interpolation where an exact fit to the data is required (i.e.,

g (x_{i}) = y_{i}, \forall i = 1, \dots, M

) or smoothing in which a “smooth” g function is constructed that approximately fits the data (i.e.,

∥{[g (x_{i})]}_{i = 1, \dots m} - {[y_{i}]}_{i = 1, \dots m}∥ < ϵ

) for some small value for

ϵ

and some norm

∥\cdot∥

defined on

ℜ^{m}

. The most widely used approach, especially in the field of machine learning based on neural networks, to build and use a data-driven model is related to a “Data fitting” smoothing process, where a function

f_{α} : ℜ^{n} \to ℜ

, defined through a set of k parameters

α = {\{α_{j}\}}_{j = 1 \dots, k}

, should be determined by a “learning” process on known information to be subsequently used to “predict/describe” new ones. In detail, this looks as follows:

Learning phase Given a set of m data points

{\{(x_{i}, y_{i})\}}_{i = 1, \dots m}

, let

f_{α} : ℜ^{n} \to ℜ

be a function defined by a set of k parameters

α = {\{α_{j}\}}_{j = 1 \dots, k} : α_{j} \in ℜ

, organized in the vector

α \in ℜ^{k}

. The aim of the learning process, given a loss (or “cost function”) C, is to compute the following minimum:

α_{B e s t} = a r g m i n_{α \in ℜ^{k}} C ({\{(f_{α} (x_{i}), y_{i})\}}_{i = 1, \dots m})

(8)

Predict phase After learning from known information, the best fit function

f_{α_{B e s t}}

, where

α_{B e s t}

is the solution of problem (8), can be used to “predict/describe” unknown information

y_{u n k n o w n}

about new data

x_{n e w}

as follows:

y_{u n k n o w n} = f_{α_{B e s t}} (x_{n e w})

(9)

Definition 3 defines the form of the fitting function

f^{N}

related to a DNN composed of L layers.

Definition 3

(Fitting function

f^{N}

of a DNN). Let

N

be a DNN composed of L layers and let

N_{l}

be the number of “neurons” in the l-th layer of

N

. The fitting function

f^{N}

of

N

has the form of the following function compositions:

Let

f : A \to B

and

g : B \to C

be two functions. Then, the composition of f and g, denoted by

g \circ f

, is defined as the function

g \circ f : A \to C

given by

(g \circ f) (x) = g (f (x)), \forall x \in A

.

Given a set of M functions

\{f_{i} | i = 1, \dots, M\}

such that

f_{i} : A_{i} \to A_{i + 1}

, the symbol

◯_{i = 1}^{M} f_{i}

denotes the composition

◯_{i = 1}^{M} f_{i} = f_{M} \circ f_{M - 1} \circ \dots \circ f_{2} \circ f_{1} .

f^{N} (x, {\{w_{i^{l} j^{l}}^{(l)}\}}_{\begin{matrix} l = 2, \dots, L \\ i^{l} = 1, \dots, N_{l} \\ j^{l} = 1, \dots, N_{l - 1} + 1 \end{matrix}}) = ◯_{l = 2}^{L} f_{l}^{N} (x^{l - 1}, {\{w_{i^{l} j^{l}}^{(l)}\}}_{\begin{matrix} i^{l} = 1, \dots, N_{l} \\ j^{l} = 1, \dots, N_{l - 1} + 1 \end{matrix}}),

(10)

where

w_{i^{l} j^{l}}^{(l)}

is the so-called “weight” from the

j^{l}

-th “neuron” in the

(l - 1)

-th layer to the

i^{l}

-th “neuron” in the l-th layer [37]. Each function

f_{l}^{N}

is defined as the composed function

f_{l}^{N} (x^{l - 1}, {\{w_{i^{l} j^{l}}^{(l)}\}}_{\begin{matrix} i^{l} = 1, \dots, N_{l} \\ j^{l} = 1, \dots, N_{l - 1} + 1 \end{matrix}}) = σ (y^{l})

(11)

where

σ (y)

is the so-called “activation function” and where

\begin{matrix} y^{l} & = & W^{(l)} [\begin{matrix} x^{l - 1} \\ 1 \end{matrix}] \end{matrix}

(12)

\begin{matrix} σ (y^{l}) & = & {(σ (y_{i^{l}}^{l}))}_{i^{l} = 1, \dots, N_{l}} \end{matrix}

(13)

\begin{matrix} W^{(l)} & = & {(w_{i^{l} j^{l}}^{(l)})}_{\begin{matrix} i^{l} = 1, \dots, N_{l} \\ j^{l} = 1, \dots, N_{l - 1} + 1 \end{matrix}} \end{matrix}

(14)

\begin{matrix} x^{1} & = & x \end{matrix}

(15)

The value

N = \sum_{l = 2}^{L - 1} N_{l}

of the total number of nodes in the hidden layers (

1 < l < L

) is called “Complexity of DNN

N

”.
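The composition in Equations (10)–(15) can be illustrated with a minimal Python sketch (the analysis in this paper is carried out in R; Python is used here only for illustration). The weights below are hypothetical, the layer sizes follow the example of Figure 2 (3, 4 and 2 neurons), and the sigmoid is assumed as activation function, whereas Definition 3 leaves the activation generic.

```python
import math

def sigmoid(t):
    """Assumed activation function; Definition 3 does not fix a specific one."""
    return 1.0 / (1.0 + math.exp(-t))

def layer(x_prev, W, activation=sigmoid):
    """One layer: y = W [x_prev; 1] (Eq. 12), activation applied componentwise (Eq. 13)."""
    augmented = list(x_prev) + [1.0]  # the appended 1 carries the bias column of W
    return [activation(sum(w * v for w, v in zip(row, augmented))) for row in W]

def forward(x, weights, activation=sigmoid):
    """Composition of the layer functions for l = 2, ..., L (Eq. 10), with x^1 = x (Eq. 15)."""
    out = list(x)
    for W in weights:
        out = layer(out, W, activation)
    return out

# Hypothetical weights for N_1 = 3, N_2 = 4, N_3 = 2.
W2 = [[0.1 * (i + j) for j in range(4)] for i in range(4)]  # N_2 x (N_1 + 1)
W3 = [[0.2 * (i - j) for j in range(5)] for i in range(2)]  # N_3 x (N_2 + 1)

output = forward([1.0, 0.5, -0.5], [W2, W3])
print(output)  # two values, one per output neuron
```

Each matrix has one more column than the previous layer has neurons, which matches the index range of j^l in Equation (14).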

The success of such models is also due to their approximation capabilities for which neural networks (NNs) are known as “Universal Approximators” [38,39,40,41,42] in the sense that they can approximate arbitrarily well any continuous function of n variables on a compact domain [42]. Theorem A1 (see Appendix A.3) expresses a universal approximation concept in a more formal way [42,43].

If just the “shallow neural networks”

N_{θ} (n, k), θ \in Θ : L = 3

are considered, some results about the computational complexity of computing an

O (ϵ)

approximation of a function g by a fitting function defined on

N_{θ} (n, k)

are available from [42] and are stated by Theorem A2 in Appendix A.3.

Theorem A2 (see Appendix A.3) states that “shallow neural networks” are able to give any desirable approximation

ϵ

to any function

g \in W_{m}^{n}

at a computational cost that grows exponentially with n unless the smoothness of the approximant is increased.

In [42], Poggio et al. address questions such as “Which classes of functions can it approximate and learn well?”, giving the message that

… deep networks have the theoretical guarantee, which shallow networks do not have, that they can avoid the “curse of dimensionality” for an important class of problems, corresponding to … hierarchically local compositional functions where all the constituent functions are local in the sense of bounded small dimensionality.

Since an optimization problem as in (8) should be solved in the context of machine learning based on DNNs to find a good approximation

f \in F (n, k)

of a function g that “models data”, particular attention should be paid to the conditions that guarantee the existence of a solution to the optimization problem and to the effectiveness of the algorithms used to compute such a solution numerically [44,45].

In the context of machine learning [46,47], the most commonly used algorithm to compute the solution of problem (8) is the gradient descent algorithm [48], based on the Steepest Descent method [49]. Gradient descent (see Algorithm A2 in Appendix A.3) is an iterative algorithm for unconstrained mathematical optimization, used for finding a local minimum of a differentiable multivariate function. It is based on the idea of taking repeated steps in the opposite direction of the gradient

\nabla_{α} C (α)

(or approximate gradient) of the function

C (α)

at the current point, as this is the direction of steepest descent.

The gradient

\nabla_{α} C (α)

is a vector in

ℜ^{k}

defined as

\nabla_{α} C (α) = [\begin{matrix} \frac{\partial C (α)}{\partial α_{1}} \\ ⋮ \\ \frac{\partial C (α)}{\partial α_{k}} \end{matrix}]

It is possible to guarantee the convergence to a local minimum under certain assumptions on the function C (for example, C is a convex function and

\nabla C

is a Lipschitz continuous function) and particular choices of the step size

γ_{n}

.
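As an illustration of the iteration described above, the following Python sketch applies gradient descent with a fixed step size to a simple convex cost whose gradient is Lipschitz continuous. The cost function, the starting point, the step size and the number of iterations are hypothetical choices made for this example and are not taken from Algorithm A2.

```python
def gradient_descent(grad, alpha0, step, n_iter):
    """Repeatedly move against the gradient: alpha <- alpha - step * grad(alpha)."""
    alpha = list(alpha0)
    for _ in range(n_iter):
        g = grad(alpha)
        alpha = [a - step * gi for a, gi in zip(alpha, g)]
    return alpha

def grad_C(alpha):
    """Gradient of the hypothetical cost C(a) = (a_1 - 1)^2 + 2 (a_2 + 2)^2."""
    return [2.0 * (alpha[0] - 1.0), 4.0 * (alpha[1] + 2.0)]

# The gradient has Lipschitz constant 4, so a fixed step of 0.1 (below 1/4) converges.
alpha_best = gradient_descent(grad_C, [0.0, 0.0], step=0.1, n_iter=200)
print(alpha_best)  # close to the unique minimizer (1, -2)
```

For a non-convex cost, such as the one arising from a DNN, the same iteration can only be expected to approach a local minimum.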

The “Learning phase” in the context of a DNN then aims to compute “the best values” for the “weights”

w = {(w_{i^{l}, j^{l}}^{(l)})}_{\begin{matrix} l = 2, \dots, L \\ i^{l} = 1, \dots, N_{l} \\ j^{l} = 1, \dots, N_{l - 1} + 1 \end{matrix}}

, given a “cost function”

C (w)

, by using Algorithm A2, taking into consideration the form of the “fitting function” f defined in Definition 3. Due to the nature of the cost function C and fitting function f, the gradient estimation needed at line 6 of Algorithm A2 can be computed by the “backpropagation method” [50].

Some studies aimed at “illuminating the NN black box” contribute to the literature on the interpretability of such models [51,52]. Among them, we mention those that use the set of weights

w

to interpret predictor variable contributions in neural networks, such as the Garson and Olden methods, the second being an evolution of the first [52,53]. The Olden method is the more adaptable of the two for assessing variable relevance. Relevance is determined by multiplying the raw input-hidden and hidden-output connection weights for each input and output neuron and then summing the products across all hidden neurons. Unlike Garson’s technique, which only takes into account the absolute magnitude, this method preserves the relative contribution of every connection weight in terms of both magnitude and sign. For instance, Garson’s approach, being based on absolute magnitudes, may produce misleading outcomes when connection weights change sign (e.g., from positive to negative) between the input-hidden and hidden-output layers, since such weights can have a canceling effect. The ability of Olden’s algorithm to evaluate neural networks with multiple hidden layers and response variables is an additional advantage. In the case of a shallow neural network, the Olden method calculates the relevance coefficient

r_{i}^{q}

of the i-th input factor for the q-th output using the products of the connection weights among the input layer, hidden layer and output layer neurons of the neural network via the following formula:

r_{i}^{q} = \frac{α_{i}^{q}}{\sum_{i^{'} = 1}^{n} α_{i^{'}}^{q}},

(16)

where

\begin{matrix} α_{i}^{q} & = & \sum_{j = 1}^{N} \frac{w_{i j}^{(2)} w_{j q}^{(3)}}{β_{j}}, \\ β_{j} & = & \sum_{i^{'} = 1}^{n} w_{i^{'} j}^{(2)} . \end{matrix}
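A minimal Python sketch of Equation (16), exactly as written above, may help to see how signed weights can cancel. The index convention is an assumption read off the formula: W2[i][j] is the weight from input i to hidden neuron j, and W3[j][q] is the weight from hidden neuron j to output q. The weight values are hypothetical, and the sketch does not guard against a zero denominator.

```python
def relevance(W2, W3, q):
    """Relevance coefficients r_i^q of Equation (16) for output q."""
    n = len(W2)   # number of input factors
    N = len(W3)   # number of hidden neurons
    beta = [sum(W2[i][j] for i in range(n)) for j in range(N)]
    alpha = [sum(W2[i][j] * W3[j][q] / beta[j] for j in range(N)) for i in range(n)]
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output.
W2 = [[1.0, 2.0],
      [3.0, 2.0]]
W3 = [[2.0],
      [-1.0]]

r = relevance(W2, W3, q=0)
print(r)  # [0.0, 1.0]
```

In this example the two contributions of the first input have opposite signs and cancel, so its relevance is zero, whereas a method based on absolute magnitudes would assign it a positive share.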

In agreement with Brownlee [54], who stated

… The objective of a neural network is to have a final model that performs well both on the data that we used to train it (e.g., the training dataset) and the new data on which the model will be used to make predictions …,

the main aim in building such models is to solve the challenging problem of defining models that can “generalize well” to new data [54]. Two cases indicate that the model fails in “generalization”: “Overfit” and “Underfit” models.

Underfit Model: A model that performs poorly on the training dataset and does not reliably predict future observations.

Overfit Model: A model that corresponds too closely or exactly to a particular set of data and may therefore fail to fit additional data or to predict future observations reliably [55].

Underfitting can be addressed by increasing the “capacity of the model”. Increasing such capacity is easily achieved by changing the structure of the model, such as adding more layers and/or more nodes to layers [54]. While an underfitting problem can be easily solved, the same is not true for overfitting. Nonetheless, two ways can be considered to approach an overfit NN-based model: training the network on more examples and changing the capacity of the network. As summarized by Brownlee [54], when the amount of data available for training is limited, only the option of changing the network’s capacity remains, in one of the following two ways:

By changing the network structure (number of weights). This is called “structural stabilization” [54].
By changing the network parameters (values of weights) through the use of “regularization techniques”, which add a penalty term to the “Cost function” in order to constrain the values of the weights [54].

The term “regularization” is borrowed from the context of numerical analysis, where “regularization approaches” are used to transform an ill-posed problem into a more “stable” one. A problem is said to be ill-posed if small changes in the given information cause large changes in the solution. This instability with respect to the data makes solutions unreliable because small measurement errors or uncertainties in parameters may be greatly magnified and lead to wildly different responses. For an introductory description of the main regularization techniques in the context of NN-based models, we suggest reading Brownlee [54].

In summary, despite the limitations related to the stability, convergence and interpretability of the algorithms on which NNs are based, NNs can be considered very powerful tools to model (and approximate well) general phenomena, thanks to their ability to express non-linear models and their nature as “Universal Approximators”. In this study, we therefore evaluate the applicability of a shallow neural network of complexity N to build models for “supervised classification” (see Section 3.5) for the following reasons:

By Theorem A2, shallow networks can give any desired approximation of order $ϵ$ to any function in $W_{m}^{n}$.
No information exists about the observed data that would allow the model to be reformulated as a hierarchically local compositional function, which would in turn justify a deep structure of an NN.

The choice of the values of complexity N is guided by a compromise (see Equation (A13)) between the computational cost, which grows exponentially with the dimension n of the observations’ space, and the smoothness of the approximant, which conditions model generalization.

3. Results

This section will describe and comment on the tests performed to analyze which of the markers described in Section 3.1, and represented by “mathematical” variables introduced in Section 3.2, can be considered most useful and significant in identifying the disease state.

Given the high level of missing data, the data were first subjected to an imputation process (see Section 3.3) in order to be then (1) analyzed using PC-based approaches (see Section 3.4) and (2) used to build NN-type predictive models for supervised classification (see Section 3.5). Figure 3 shows the flow-chart of the different phases, involving the mathematical tools introduced in Section 2, of the qualitative and quantitative analyses of the data.

Figure 3. Flow-chart of the different phases, involving mathematical tools introduced in Section 2, of qualitative and quantitative analyses of data.

Figure 4. R code of the imputation phase.

3.1. Biochemical Data Description

Results from biochemical assays are summarized in Table 1. Table 1 shows a higher level of IL-18 (ELISA assay) in the skin and serum of ALS patients compared to healthy controls. By contrast, higher levels of IL-1

β

(ELISA assay) were observed only in the serum of ALS patients compared to controls. A more evident increase in NLRP-3 expression (Western blot analysis) in the skin biopsies of ALS patients compared to controls was detected, thus suggesting inflammasome cascade activation. Results from the ELISA kit assay demonstrated an increase in TGF-

β

levels in the serum of patients affected by ALS. In the skin biopsies, these levels were even higher than those detected in the serum, thus highlighting the importance of the skin as a strategic tool to predict ALS progression. At the same time, no particular differences were observed in IL-10 (an anti-inflammatory cytokine) levels (ELISA assay) in the serum and skin of ALS patients. We observed significantly increased levels of IL-6 (ELISA assay) only in the serum of patients affected by ALS. Furthermore, no significant differences were observed in NEK-7 expression (Western blot analysis) in the skin biopsies of ALS patients compared to controls. By contrast, a two-fold increase in p-TDP-43 expression (Western blot analysis) in the ALS skin biopsies compared to controls was observed, thus confirming the hypothesis of a correlation between NLRP-3 activation and nerve degeneration (p-TDP-43). Finally, we found that SOD activity was highly inhibited in the serum of ALS patients compared to healthy controls.

3.2. Mathematical Data and Computational Environment Description

A subset of the collected data is considered, consisting of data from fifty-six sick patients and seventeen healthy ones, for a total of seventy-three observations. For each of the patient observations, the set of 14 biochemical markers described in Section 3.1 is represented as mathematical variables, to which the Status variable is added to indicate the sick (Status = 1) or healthy (Status = 0) condition. In Table 2, such variables are listed together with the following:

The name (column 1) and related marker description (column 5);
The number of available data for each marker (column 2) and its percentage of the total number of observations (column 3);
The percentage of missing data for each marker (column 4).

All the tests were performed by using packages (whose full list is shown in Table 3) available for R, a well-known language and environment for statistical computing. As described by its authors, R is an integrated suite of software facilities for data manipulation, calculation, and graphical display, where the term “environment” is intended to characterize it as a fully planned and coherent system designed around a true computer language, and allowing users to add additional functionality by defining new procedures [56]. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Furthermore, R packages exist that can implement strategies for introducing “parallelism” in R algorithms [56].

All the presented results are obtained by using computing resources whose details are listed in Table 4.

3.3. Imputation Phase

In this subsection, some details are given about the imputation phase of the data described in Section 3.2. The imputation process was performed by procedures from the mice R package.

Missingness in the dataset may follow a Missing At Random (MAR) mechanism, meaning that the probability that a value is missing is related to observed patient characteristics, rather than to the unobserved molecular measurements themselves. For example, patients with more advanced ALS or with greater functional impairment might be less able, or less willing, to undergo invasive procedures such as skin biopsies. In such cases, the missing molecular data are systematically associated with disease severity or functional status, which are variables available in the dataset, consistent with an MAR pattern.

The following 6 markers have been eliminated due to collinearity with the remaining ones, as suggested by the mice procedure from the mice R package [25]: IL6Skin, IL10Serum, pTDP43GAPDH, NEK7bActin, IL1BSkin, IL1BSerum.

We observe that the mice R procedure, by default, checks for collinearity when fitting the imputation model. If a variable is almost “collinear” with another one (the correlation level between variables is above a certain threshold), it is removed and not imputed [57]. The find.collinear internal MICE procedure is used to establish collinearity between variables and is based on correlation matrix computation [58].
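The idea of a correlation-based collinearity check can be sketched in Python as follows. This is only an illustration of the principle and not the implementation used by mice: the function names, the threshold value and the data are hypothetical, and missing values are not handled.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def find_collinear_pairs(columns, threshold=0.99):
    """Return pairs of variables whose absolute correlation exceeds the threshold.

    The threshold is a hypothetical value chosen for this example.
    """
    names = list(columns)
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            r = pearson(columns[names[a]], columns[names[b]])
            if abs(r) > threshold:
                pairs.append((names[a], names[b]))
    return pairs

# Hypothetical data: B is almost a multiple of A, C is unrelated to both.
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 4.0, 6.1, 8.0, 10.1],
    "C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = find_collinear_pairs(data)
print(pairs)  # [('A', 'B')]
```

In a procedure of this kind, one variable of each flagged pair would be dropped before fitting the imputation model.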

The remaining 8 markers, together with the Status variable, are organized in an R data frame Y whose structure is described on lines 5–14 in Figure 4.

The censored data Y are then imputed by the mice procedure (see line 19 in Figure 4), whose output (the imp object) contains, among other things, information about the following:

The method used for the imputation of each univariate variable (see lines 23–27), which is Predictive Mean Matching (PMM) (pmm) for all variables that must be imputed and is the empty method “” for the Status variable, which does not require imputation since it does not contain censored data. Details about the implementation of PMM in the mice package can be found in [57].
The Predictor Matrix (see line 28), which is a square matrix of size ncol(Y) containing 0/1 data. Each row in the Predictor Matrix identifies which predictors are to be used for the variable in the row name. In our approach, the Status variable is used as a predictor for all the censored variables, although it does not itself need to be imputed, because it can be considered a relevant variable. We note that the general advice from MICE experts is to include as many relevant variables as possible, including their interactions [25].

The number of imputations is chosen to be

m = 30

according to maximum efficiency

E_{M I} (m, λ)

obtainable when the percentage

λ

of censored data is greater than

70 %

(see Table A1 in Appendix A.1). The maximum number of iterations of the iterative process of the MICE algorithm (see line 6 of Algorithm A1 in Appendix A.1) is instead set to the default value of the mice procedure (that is

m a x i t = 20

), since such a value is often considered sufficient [25]. The random seed used in the imputation step by the mice procedure is seed = 23109 (see line 19 in Figure 4).
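To give a feeling for how the efficiency depends on the number of imputations, the following Python sketch evaluates Rubin's relative efficiency 1 / (1 + λ/m). It is an assumption of this example that the efficiency defined in Appendix A.1 has this classical form; the function name is hypothetical.

```python
def mi_efficiency(m, lam):
    """Rubin's relative efficiency of m imputations for a fraction lam of missing information.

    Assumed form: 1 / (1 + lam / m); the definition in Appendix A.1 may differ.
    """
    return 1.0 / (1.0 + lam / m)

for m in (5, 10, 20, 30):
    print(m, round(mi_efficiency(m, 0.7), 4))
```

Under this assumption, with λ = 0.7 the efficiency rises from about 0.88 for m = 5 to about 0.98 for m = 30, which is consistent with choosing a larger number of imputations when the missingness is high.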

Once the MICE algorithm terminates, “it is important to inspect the imputations. In general, a good imputed value is a value that could have been observed had it not been missing [25]”. So, to check whether the imputations created by the MICE algorithm are plausible, density plots of both the observed and imputed values of all variables can be drawn. In Figure 5, the density trends of imputed (red lines) and observed data (blue lines) for each of the 8 considered markers are plotted. Such plots, obtained by the R procedure densityplot (see lines 46–48 in Figure 4) using the density procedure from the R package stats, are based on the kernel density estimate (KDE) of the probability density of a univariate variable [59]. The plots show no discernible, meaningful disparities between the observed data and the imputed values.

At the end of the data imputation process, a data frame X comprising

73 \times 30 = 2190

samples is obtained, whose structure is described on lines 54–63 in Figure 4.

3.4. Interpretation Phase

To qualitatively study the relevance of the molecular markers introduced in Section 3.2 in the identification of the ALS disease state, we study the “relations” between the variables associated with such markers and the variable Status representing the disease state. To this end, a normalized PCA is performed on the dataset of complete observations obtained by the MI described in Section 3.3.

PCs are computed by using the procedure princomp from the R stats package. This procedure is based on the computation of the eigenvalues of the covariance matrix of the input data frame passed to it.

Two types of tests were performed on such data:

Test set n. 1: Normalized PCA of the full data frame X obtained at the end of the MI phase described in Section 3.3 to identify the correlations between the marker variables and the disease state one. See lines 2–20 of the code listed in Figure 6. This analysis aims to identify which marker variables are more correlated with the disease state.

Test set n. 2: Normalized PCA of the data frame XMarkers obtained from X preserving just the variables associated with the markers. See lines 25–38 of the code listed in Figure 6. This analysis aims to identify which groups of marker variables are more effective in distinguishing the disease state.

The test results are described below:

The “Proportion of Variance” values listed in Figure 6 for each component $j = 1, \dots, p$ (where $p = 9$ for Test set n. 1 and $p = 8$ for Test set n. 2) represent the $R e l e v a n c e (j)$ defined by Equation (6). Figure 7 shows the trend of $R e l e v a n c e (j)$ as a function of j for Test set n. 1 (a) and Test set n. 2 (b). Furthermore, the “Cumulative Proportion” values represent the $C u m u l a t i v e R e l e v a n c e (j)$ defined by Equation (7).
Regarding Test set n. 1, in Figure 8, PC-based BiPlots of the vectors ${(z_{j})}^{∠_{j_{1}, j_{2}}}$, representing the variable $z_{j}$ in an underlying Cartesian system $C_{T e s t S e t 1}^{(j_{1}, j_{2})}$ (see Section 2.3.2), are shown. The following list of $(j_{1}, j_{2})$ for PC-based BiPlots is considered: $j_{1} = 1$, $j_{2} = 2, \dots, 9$. In each BiPlot, the color of the vectors ${(z_{j})}^{∠_{j_{1}, j_{2}}}$ is tied to their “representation accuracy” of the variable $z_{j}$ in a related Cartesian system $C_{T e s t S e t 1}^{(j_{1}, j_{2})}$. In Figure 9, PC-based BiPlots of the vectors ${(z_{j})}^{∠_{j_{1}, j_{2}}}$ and the individuals $z_{i}$, as projected onto the Cartesian system $C_{T e s t S e t 1}^{(j_{1}, j_{2})}$ (see Section 2.3.2), are shown. The following list of $(j_{1}, j_{2})$ for PC-based BiPlots is considered: $j_{1} = 1$, $j_{2} = 2, \dots, 9$. In each BiPlot, the gray color intensity of the vectors ${(z_{j})}^{∠_{j_{1}, j_{2}}}$ is tied to their “representation accuracy”.
Regarding Test set n. 2, in Figure 10, PC-based BiPlots of the vectors ${(z_{j})}^{∠_{j_{1}, j_{2}}}$ and the individuals $z_{i}$, as projected onto the Cartesian system $C_{T e s t S e t 2}^{(j_{1}, j_{2})}$ (see Section 2.3.2), are shown. The following list of $(j_{1}, j_{2})$ for PC-based BiPlots is considered: $j_{1} = 1$, $j_{2} = 2, \dots, 8$. In each BiPlot, the gray color intensity of the vectors ${(z_{j})}^{∠_{j_{1}, j_{2}}}$ is tied to their “representation accuracy”.
In Figure 9 and Figure 10, the ellipses plotted “around” the individuals represent concentration ellipses [35] in normal probability, where the percentage of the included individuals is $95 %$. Such plots are based on the stat_ellipse procedure from the R ggplot2 package [60].

Below is a series of considerations based on the tests carried out:

■

From lines 16 and 34 of the code listed in Figure 6, we observe that the first five components of the PCs from both the test sets express about

77 %

of the “information” contained in each dataset correlation matrix. Each of the components from the second to the fifth expresses

10 %

.

■

The choice of the couple

(j_{1}, j_{2})

for the PC-based BiPlots is based on combinations of

j_{1}, j_{2}

that expresses the same amount of “information” when

j_{2} = 2, 3, 4, 5

: about 35–40%.

■

From Figure 8, related to Test set n. 1, the following can be observed:

High values for the “representation accuracy” $c o s 2$ of the variables Status and SodSerum in all the considered BiPlot spaces, confirming that any of the considered spaces can be potentially useful to get information about the relations among such variables.
That in almost all the considered BiPlot spaces (i.e., excluding the “less significant” one), the variables Status and SodSerum are highly correlated.
High values of “representation accuracy” are also present for both the variables IL18Skin and IL18Serum which express a good level of correlation with the Status variable (for example, see Figure 8a,b,d–f).
Just some of the considered BiPlot spaces (see Figure 8b,c,g) show a good correlation level for the variable NLRP3bActinSkin (which has a medium level “representation accuracy”) with the Status one. Meanwhile, some other BiPlot spaces (see Figure 8c,d,f) show a good correlation level for the variable IL6Serum with Status, although the “representation accuracy” of IL6Serum is not so high.
That the variables TGFBSerum and IL10Skin appear almost uncorrelated with Status in almost all the considered BiPlot spaces where their “representation accuracy” can be considered appreciable (see Figure 8a–c). The variable TGFBSkin shows a behavior, with respect to the Status variable, that is difficult to interpret: high correlation where its “representation accuracy” is not so high (e.g., see Figure 8a) and low correlation where its “representation accuracy” has a bigger value (e.g., see Figure 8b).

■

From Figure 9, related to Test set n. 1, as a confirmation of the effectiveness of representing individuals in a BiPlot space, it can be observed that the presence of the Status variable separates sick individuals from healthy ones well.

■

The same considerations on the “individual separation” are not possible when the Status variable is not used (Test set n. 2). Indeed, from Figure 10, we observe that the “concentration ellipses” for sick and healthy individuals can overlap substantially. We can also observe that, in the majority of BiPlot spaces (which also coincide with the spaces that “carry the greatest amount of information”), the orientation and center of the “concentration ellipses” for sick individuals seem to be particularly conditioned by the variables SodSerum, IL18Serum, IL18Skin, NLRP3bActinSkin and TGFBSkin (see Figure 10a–d).

3.5. Provisional Method for Supervised Classification Based on Neural Network

To quantitatively study the relevance of the molecular markers introduced in Section 3.2 in the identification of the ALS disease state, we can analyze the results of NN-type predictive models for supervised classification. See Figure 11 for the R code used to perform the analysis.

This approach is based on the following steps:

Define and “Learn” an NN-type model to classify sick and healthy individuals on the basis of markers from some known individual states.

The first action is to define the model

g : X \to Y

, clarifying what is the domain

X

and co-domain

Y

of g. In particular, we call

X

the space of parameters and define it as

X = \{(x_{1}, \dots, x_{N_{P a r}}), x_{i} \in ℜ\},

where

N_{P a r} = 8

and where each

x_{i}

represents a marker according to the association defined in Table 5. The space

Y = \{(y_{1}, y_{2}), y_{i} \in \{0, 1\}\}

is based on the couples

\begin{matrix} (y_{1}, y_{2}) & = & \{\begin{matrix} (0, 1) & if individual is classified as sick, \\ (1, 0) & if individual is classified as healthy . \end{matrix} \end{matrix}

(17)

The second action is to refine the definition of g as an NN-based model; thus, we assume the following form for g:

g = h (f_{s i g m o i d} (x; w, N)),

where

f_{s i g m o i d} (x; w, N) \in F^{L = 3, σ = σ_{s i g m o i d}} (N_{P a r}, 2)

and where

F^{L = 3, σ = σ_{s i g m o i d}} (N_{P a r}, 2)

is the subset of

F (N_{P a r}, 2)

defined as

F^{L = 3, σ = σ_{s i g m o i d}} (N_{P a r}, 2) = \{f^{N_{θ} (N_{P a r}, 2)}, θ \in Θ : L = 3, σ = σ_{s i g m o i d}\} .

We consider that

σ_{s i g m o i d}

is a function defined as

\begin{matrix} σ_{s i g m o i d} (x) & = & \frac{1}{1 + e^{- x}} . \end{matrix}

The sigmoid function is a nonlinear activation function widely used in neural networks and other computational models. Its output range makes it particularly suitable for modeling probabilities, as it can map any real value to the

(0, 1)

interval, a property leveraged in binary classification problems [61].

Then,

f_{s i g m o i d} (x; w, N)

is a function “defined” by a shallow network whose “activation function” is the so-called “Sigmoid function”, whose “complexity” N is equal to the number of neurons of the single “hidden layer”, and whose set of weights is

w = {\{w_{i, j}^{(2)}\}}_{i = 1, \dots, N_{P a r}, j = 1, \dots, N} \cup {\{w_{i, j}^{(3)}\}}_{i = 1, \dots, N, j = 1, \dots, 2} .

A shallow network is preferable since, at present, no information about purely local interactions among neurons is available (see considerations about the “Curse of Dimensionality” in Section 2.3.3 and Theorem A2 in Appendix A.3). Moreover, given the limited value of

n = N_{P a r} = 8

, the complexity N remains limited to an order of

O (10^{3})

(see Figure 12) also because the considered use case does not require a very small value for

ϵ

:

ϵ = O (10^{- 1})

could be considered a good value for

ϵ

since we have to just discriminate between the values of function

f_{s i g m o i d} (x; w, N)

that are “near” zero or one. Figure 12 shows the trend of shallow neural network complexity N providing the accuracy of at least

ϵ

when

n = N_{P a r} = 8

[42].

Moreover, h is a function defined according to a Winner-Takes-All (WTA) strategy:

\begin{matrix} h : z \in ℜ^{2} \mapsto h (z) \in Y & where & h (z) = \{\begin{matrix} (0, 1) & i f & z_{2} = max (z_{1}, z_{2}), \\ (1, 0) & i f & z_{1} = max (z_{1}, z_{2}) . \end{matrix} \end{matrix}
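The WTA function h can be sketched in Python as follows. The definition above does not say what happens when the two components are equal, so resolving ties in favor of the first listed case, (0, 1), is an assumption of this example.

```python
def winner_takes_all(z):
    """Map a pair (z_1, z_2) to the class code of its largest component.

    Returns (0, 1) (classified as sick) if z_2 is the maximum,
    and (1, 0) (classified as healthy) otherwise.
    Ties are resolved in favor of (0, 1): an assumption, not stated in the text.
    """
    z1, z2 = z
    return (0, 1) if z2 >= z1 else (1, 0)

print(winner_takes_all((0.2, 0.9)))  # (0, 1)
print(winner_takes_all((0.7, 0.1)))  # (1, 0)
```

Because only the larger of the two outputs matters, the network outputs need to be accurate only to the extent that the correct component dominates.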

Verify NN-type model: The aim is to assess how the model describes individuals (the “test individuals”

{\{x_{i}^{T e s t}, y_{i}^{T e s t}\}}_{i = 1, \dots m_{T e s t}}

) not used to “Learn” the model (the individuals used for learning are identified by

{\{x_{i}^{L e a r n}, y_{i}^{L e a r n}\}}_{i = 1, \dots m_{L e a r n}}

). To “Learn” the parameters (the weights

w

) defining the function of the form

f_{s i g m o i d} (x; w, N)

, we use the procedure mlp from the R RSNNS package. If data related to “test individuals” are passed to this procedure (see line 18 in Figure 11), it has the advantage that the “Cost function” on both the learning and test sets is evaluated after each iteration (also called “epoch”) of the iterative process used to compute the minimum of the “Cost function” (see Algorithm A2). The values of the “Cost function” can later be used to analyze the training process, for example by the plotIterativeError procedure from the RSNNS package (see line 20 in Figure 11). Training and test data can both be obtained from the input data X by the procedures splitForTrainingAndTest and normTrainingAndTestSet from the RSNNS package (see lines 13 and 14 in Figure 11). Data frame X is the output of the imputation process (see code in Figure 4) and collects data from

73 \times 30 = 2190

individuals.

The considered “Cost Function” is

C ({\{(f_{s i g m o i d} (x_{i}; w, N), y_{i})\}}_{i = 1, \dots m}) = \sum_{i = 1}^{m} {∥f_{s i g m o i d} (x_{i}; w, N) - y_{i}∥}^{2} .

Such “Cost function” is known as the “Summed Squared Error (SSE)”, i.e., the sum of the squared errors of all patterns (the set of all the observations

{\{x_{i}^{\cdot}, y_{i}^{\cdot}\}}_{i = 1, \dots m_{\cdot}}

) for every epoch.
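The SSE defined above can be written in a few lines of Python. The predictions and targets used here are hypothetical values chosen so that the result is easy to check by hand.

```python
def sse(predictions, targets):
    """Sum over all patterns of the squared Euclidean distance between output and target."""
    return sum(
        sum((p - t) ** 2 for p, t in zip(pred, target))
        for pred, target in zip(predictions, targets)
    )

# Hypothetical network outputs and their targets, coded as in Equation (17).
predictions = [(0.9, 0.1), (0.2, 0.8)]
targets = [(1, 0), (0, 1)]

cost = sse(predictions, targets)
print(cost)  # approximately 0.10
```

Here the first pattern contributes 0.01 + 0.01 and the second 0.04 + 0.04, for a total of about 0.10.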

See Figure 13 for results from the plotIterativeError procedure related to the neural network learning phase [53,62] for different values of neural network complexity N. The procedure plots the SSE on the training set for every epoch (black line). If a test set is provided, its SSE is also shown in the plot (red line), normalized by dividing it by the test set ratio (which is the number of patterns in the test set divided by the number of patterns in the training set) [63].

See Figure 14 for the regression error plots obtained by the plotRegressionError procedure from the RSNNS package on test data (see lines 28 and 29 in Figure 11) for different values of complexity N. The predictions on test data are obtained with the predict procedure from the R RSNNS package (see line 28 in Figure 11). The plotRegressionError procedure generates a regression plot that illustrates the quality of the regression: target values are on the x-axis and fitted/predicted values on the y-axis. The optimal fit would yield a line through zero with gradient one. This optimal line is shown, together with a linear fit to the actual data [63]. Since a classification is performed, ideally only the points $(0, 0)$ and $(1, 1)$ would be populated.
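
The comparison between the optimal line and the linear fit can be sketched in Python as follows. This is an illustrative least-squares fit on hypothetical targets and predictions, not a reproduction of plotRegressionError.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical binary targets and fitted network outputs.
targets = [0, 0, 0, 1, 1, 1]
fitted = [0.1, 0.0, 0.2, 0.9, 1.0, 0.8]

slope, intercept = linear_fit(targets, fitted)
# The optimal fit has slope 1 and intercept 0; the gap measures the error.
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The closer the fitted line is to slope one and intercept zero, the closer the predictions cluster around the ideal points.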

To analyze efficiency, “confusion matrices” are used. Such matrices are obtained by the confusionMatrix procedure from the RSNNS package (see lines 44–47 in Figure 15). In a confusion matrix $C = {(c_{i, j})}_{i, j = 1, \dots, n_{classes}}$, related to a classification problem of m individuals into $n_{classes}$ classes, the entry $c_{i, j}$ is the number of times the network classified an individual of the i-th class as a member of the j-th one; the off-diagonal entries ($i \neq j$) therefore count the misclassifications. See Figure 16 for the “Mean values of Confusion Matrix Efficiency” $E^{Mean} (C)$, obtained as the mean of the 10 values of Confusion Matrix Efficiency $E^{k} (C), k = 1, \dots, 10$, computed by a 10-fold cross-validation process [54] implemented by the code listed in Figure 15, for both the training and test sets of individuals, as a function of the complexity N of the NN, where $E (C)$ is defined as

E (C) = \frac{\sum_{i = 1}^{n_{classes}} c_{i, i}}{m} .

$E (C)$ represents the fraction of classification successes over the total number of individuals.
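
A minimal Python sketch of the efficiency computation follows, assuming class labels coded as integers starting from zero. The function names and the toy labels are hypothetical and do not come from the RSNNS package.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """c[i][j] counts individuals of class i classified as class j."""
    c = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        c[t][p] += 1
    return c


def efficiency(c):
    """Fraction of correctly classified individuals: trace(C) / m."""
    m = sum(sum(row) for row in c)
    return sum(c[i][i] for i in range(len(c))) / m


# Hypothetical labels for m = 10 individuals (0 = healthy, 1 = sick).
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
predicted_labels = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

c = confusion_matrix(true_labels, predicted_labels, n_classes=2)
e = efficiency(c)
print(c, e)
```

Here two individuals fall in off-diagonal cells, so eight out of ten are counted as successes.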

In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples, often referred to as “folds”. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining $k - 1$ subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. A 10-fold cross-validation is commonly used [64] but, in general, k is a free parameter.
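
The partitioning and averaging steps can be sketched in Python as below. The scoring function is a hypothetical placeholder standing in for "train on the training folds, evaluate on the held-out fold"; only the fold logic is the point of the example.

```python
import random


def k_fold_indices(m, k, seed=0):
    """Randomly partition the indices 0..m-1 into k folds whose
    sizes differ by at most one."""
    indices = list(range(m))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(m, k, score_fn, seed=0):
    """Average score_fn(train_idx, test_idx) over the k folds."""
    folds = k_fold_indices(m, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fn(train_idx, test_idx))
    return sum(scores) / k


def dummy_score(train_idx, test_idx):
    """Placeholder score: share of individuals held out in this fold."""
    return len(test_idx) / (len(train_idx) + len(test_idx))


folds = k_fold_indices(2190, 10)
mean_score = cross_validate(2190, 10, dummy_score)
print(len(folds), len(folds[0]), mean_score)
```

With 2190 individuals and k = 10, each fold holds 219 individuals and every individual is used for validation exactly once.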

In Table 6, we show the Mean Brier Score [65] of the predicted values on test data. These mean values are obtained by the BrierScore procedure from the DescTools R package, using the same cross-validation process described above (see lines 49–53 in Figure 15).
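
For reference, the Brier score of a binary classifier is the mean squared difference between the predicted probability and the observed outcome (0 is a perfect score). A minimal Python sketch on hypothetical values, independent of the DescTools implementation:

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probabilities
    and observed binary outcomes (lower is better)."""
    pairs = list(zip(outcomes, probabilities))
    return sum((p - o) ** 2 for o, p in pairs) / len(pairs)


# Hypothetical outcomes (1 = sick) and predicted probabilities.
outcomes = [1, 0, 1, 1]
probabilities = [0.9, 0.2, 0.6, 0.5]

score = brier_score(outcomes, probabilities)
print(f"Brier score = {score:.3f}")
```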

Results presented in Figure 16 indicate a very good level of efficiency $E (C)$ in both the training and test phases. Values listed in Table 6 also indicate a good level of the Brier Scores for test data.

From Figures 13 and 14, we observe that, for higher values of N, the neural network fitting function $f_{sigmoid} (x; w, N)$ suffers from an “overfitting” problem. High values of N seem to be related to fitting functions with a low level of “smoothness” [38]. All the results presented in those figures suggest choosing $N \leq 700$ to prevent such a problem. The use of “regularization techniques”, aimed at limiting the growth of the weight values, also seems to be useful. Plots in Figure 17 show results from the plotIterativeError procedure when “Weight Decay” is used in the learning procedure [66] (see lines 24 and 25 in Figure 11). These plots show that the divergent behavior of the error on test data during the learning phase is less prominent.
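
The effect of weight decay on the size of the weights can be illustrated with a one-parameter Python sketch. It assumes a generic L2 penalty added to the gradient step; the weight-decay learning function actually used through RSNNS may differ in detail, and the cost function, step size and decay value below are hypothetical.

```python
def descend(grad, w0, step, decay, maxit):
    """Gradient descent with an L2 weight-decay term:
    w <- w - step * (grad(w) + decay * w)."""
    w = w0
    for _ in range(maxit):
        w = w - step * (grad(w) + decay * w)
    return w


def grad_cost(w):
    """Gradient of the toy cost C(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_plain = descend(grad_cost, w0=0.0, step=0.1, decay=0.0, maxit=200)
w_decay = descend(grad_cost, w0=0.0, step=0.1, decay=1.0, maxit=200)
print(w_plain, w_decay)
```

Without decay the iteration settles at the unpenalized minimizer w = 3; with decay it settles at the smaller value w = 6 / (2 + decay) = 2, which is the sense in which the penalty limits weight growth.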

Analyze the relative importance of marker variables: to assess the relevance of markers in identifying sick status. See Figure 18 for results from an Olden-type Analysis of the neural network learning phase [53,62]. The plots are obtained for different values of N by the olden procedure from the NeuralNetTools package [67] (see line 32 in Figure 11). That procedure implements Olden’s connection weights method described in Section 2.3.3. The Olden-type Analysis, by computing the relevance coefficient $r_{i = 1, \dots, N_{Par}}^{2}$ for all the $N_{Par}$ parameters, finds that the parameters numbered 3, 1, 7, and 8, respectively associated with the markers NLRP3bActinSkin, IL18Skin, IL6Serum, and SodSerum (see Table 5), are the most relevant in identifying sick status (second output variable out_var=’Output_1’, see line 32 in Figure 11).
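
The idea behind the connection weights method can be sketched in Python for a network with a single hidden layer and a single output: the importance of an input is the sum, over the hidden neurons, of the products of its input-to-hidden weight and the corresponding hidden-to-output weight. The weights below are hypothetical, bias terms are ignored, and the relevance coefficient used in this paper may be a normalized or squared variant of this raw quantity.

```python
def olden_importance(w_input_hidden, w_hidden_output):
    """Raw connection-weight importance of each input.

    w_input_hidden[i][h]: weight from input i to hidden neuron h.
    w_hidden_output[h]:   weight from hidden neuron h to the output.
    """
    return [
        sum(w_ih * w_ho for w_ih, w_ho in zip(row, w_hidden_output))
        for row in w_input_hidden
    ]


# Hypothetical weights: 3 inputs, 2 hidden neurons, 1 output.
w_input_hidden = [[0.5, -1.0], [2.0, 0.5], [-0.2, 0.1]]
w_hidden_output = [1.0, -2.0]

importance = olden_importance(w_input_hidden, w_hidden_output)
# Rank inputs by absolute importance, most relevant first.
ranking = sorted(range(len(importance)), key=lambda i: -abs(importance[i]))
print(importance, ranking)
```

Unlike methods based on absolute weights, the sign of each product is kept, so an input can be ranked as relevant with either a positive or a negative influence on the output.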

4. Discussion

Based on the results presented in this study, both the qualitative and quantitative analyses described in Section 3.4 and Section 3.5 indicate that the most significant molecular markers for identifying and characterizing ALS disease are NLRP-3 in skin, IL-18 in skin, IL-6 in serum, SOD in serum, TGF-β in skin, and pTDP-43 in skin.

For example, the correlations highlighted by PCA can be interpreted as follows. The activated NLRP-3 (pyroptosis) in skin amplifies local inflammation through the release of IL-18 [68]. Thus, IL-18 in the dermis is a direct marker of inflammasome activation. IL-6 is also strictly connected to the pyroptosis cascade: it is correlated with an inflammatory response in the acute phase and associated with oxidative stress and SOD dysfunction. In this scenario, altered SOD triggers activation of TGF-β with an imbalance of neurogenesis and neurodegeneration, and stimulates NLRP-3, releasing IL-18 in tissues (including skin), thus contributing to disease progression [69]. Finally, pTDP-43 deposits in skin activate inflammatory responses, including NLRP-3, IL-6, IL-18 and TGF-β. This interpretation highlights how PCA dimensions capture biologically relevant pathways, reinforcing the clinical significance of our multivariate approach.

From a biochemical perspective, the findings suggest that NLRP-3, as a component of the inflammasome cascade, may play a role in ALS pathogenesis, and its gene expression levels could serve as a biomarker for predicting ALS onset and progression in human patients. Moreover, the detection of pathological TDP-43 aggregates in skin biopsies from living ALS patients, compared to controls, points to a strong link between inflammasome activation and neurodegenerative processes.

Additionally, persistent alterations in the TGF-β system associated with NLRP-3 activation have emerged as a potential therapeutic target for ALS. Our results also revealed elevated IL-6 levels in the serum of ALS patients. To further investigate oxidative stress related to ALS disorders, we assessed total SOD antioxidant activity in ALS patient serum to determine whether SOD maintains its protective role. The results showed a marked inhibition of SOD activity in ALS patients, confirming that oxidative stress contributes significantly to ALS development and progression by suppressing antioxidant enzyme function. Finally, among all cytokines analyzed, IL-18 production appears to be the most specific marker of inflammasome activation, closely linked to TDP-43 aggregation triggered by NLRP-3 activation.

In our study, the multivariate models based on the combination of mathematical and biochemical methods made it possible to identify the most influential contributors to ALS disease classification and diagnosis (NLRP-3 in skin, IL-18 in skin, IL-6 in serum, TGF-β in skin, SOD in serum, and pTDP-43 in skin). These markers are known to be involved in biological processes, e.g., neuroinflammation and oxidative stress, which aligns with the current understanding of ALS pathophysiology. Furthermore, the combination of neuroinflammation markers (NLRP-3 and its related cytokines), an oxidative stress marker (SOD) and a neurodegeneration marker (pTDP-43) improved classification accuracy, suggesting potential synergistic roles in disease progression. In addition, this study was performed on tissues obtained from living ALS patients, complemented by matched serum measurements.

Previous investigations have predominantly relied on animal models (e.g., SOD1G93A mice/rats) [9,70] and post-mortem human tissues [71], which are invaluable for mechanistic insight but limited for real-time diagnosis and treatment decisions. To our knowledge, this is the first study to combine mathematically interpretable multivariate modeling with biochemically validated correlation between oxidative, neuroinflammatory and neurodegenerative ALS biomarkers across serum and peripheral tissue, enabling biologically grounded diagnosis. Indeed, by measuring biochemically validated pathway nodes in viable patient tissue, our work establishes a new, minimally invasive diagnostic approach that can support ALS severity stratification and progression monitoring, thereby informing therapeutic management in the clinical setting.

5. Conclusions and Future Directions

To overcome some limitations of this study, which are principally due to (1) the high missingness, (2) the modest sample size, and (3) the limited generalizability of the findings to wider and more complete cohorts, new studies already in progress will focus on the following:

Acquiring wider and more complete datasets for which an effective assessment of the missingness type of the data (MNAR, MCAR, or MAR) [16] is available.
Proposing and validating evolutions of the NN-based models described in this study that can solve the generalization problems related to the small cohort sizes typical of clinical settings.

However, all the above considered, this study should be interpreted as a methodological and exploratory analysis rather than a definitive validation ready for clinical application. It provided us with a preliminary opportunity to design a “proof-of-concept”, which we envision as the starting point for future work.

Regarding the identification of the most promising mathematical methodologies and techniques for ALS characterization, we will take into account the “state-of-the-art” as described in the literature [72,73]. Thanks to a deeper study of the “state-of-the-art”, our work will evolve with activities aimed at developing predictive models that can improve the knowledge of the phenomena underlying the origin and evolution of ALS. It could also lay the foundations for the definition of a protocol useful for constructing “digital twins” [74,75] of the entire process of study, diagnosis, and treatment of the disease from the perspective of innovative precision medicine. In more detail, we intend to investigate all the aspects related to the “generalization” of the NN-based models, in the sense introduced at the end of Section 2.3.3, with particular attention to “regularization techniques” and to the NN structure. Indeed, in addition to carrying out comparative tests with classic classification methods such as Random Forest [76] (see the first, very preliminary results presented in Appendix B.1), we intend to investigate the use of graph neural networks (GNNs), which are considered very effective tools for learning on graph-structured data since they have shown cutting-edge performance in a variety of applications, including biological network modeling and social network analysis [77]. Compared to analyzing data separately, the special capability of graphs to capture the structural relationships between data allows for the extraction of additional insights. The use of GNNs could also lay the basis for DNNs able to overcome the “curse of dimensionality” problem, since the fitting functions of DNNs are based on hierarchically compositional functions that are local in the sense of bounded small dimensionality (see Section 2.3.3).
Other key tools of potential interest for building data-driven models are recurrent neural networks (RNNs): a class of neural networks designed for processing sequential data, such as text, speech, and time series [78].

Considering the aims of our work, related to ALS biomarker identification and patient stratification, a multi-view classification [36], combined with a “Features Selection” operation [79], seems to be the most promising approach [80,81], given the multi-omics nature of the problem. The combination of GNNs with RNNs will be investigated as a tool to predict ALS disease progression starting from data collected as time series, the so-called “longitudinal data” (see [82] for an example of this kind of application).

From the point of view of Computational Scientists, our work is intended as a useful case study for the development of algorithms, such as the optimization algorithms that underlie and are of interest to many other application contexts, in the new High-Performance Computing (HPC) environments made available in the nascent Exascale Computing Era [23,83,84,85,86,87,88,89,90].

Furthermore, the application of ML techniques in critical areas, such as healthcare, has made it necessary to understand their underlying mechanisms and their often opaque outputs. Explainable AI (XAI) [91] has emerged as a response to this demand, as it seeks to develop methods for explaining AI systems and their outputs “illuminating the NN black box”. Indeed, the increasing use of AI-based systems, especially in critical areas, has made XAI an area of study with significant practical and ethical value.

Author Contributions

Conceptualization, L.C. and I.F.; methodology, L.C.; software, L.C.; validation, L.C., U.D., R.D. and I.F.; data curation, U.D., R.D. and I.F.; writing—original draft preparation, L.C.; writing—review and editing, L.C., U.D., R.D. and I.F.; funding acquisition, R.D. and I.F. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge support from Progetto MUR PRIN2022 INFLAMM-ALS, Grant No. 2022BNZLMN.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Comitato Etico Campania 3 of Ufficio Locale Comitato Etico-A.O.U. “Federico II” del (protocol number 100/17/ES01 (14 May 2017), protocol number 151/2023 (13 September 2023)).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy reasons.

Acknowledgments

Luisa Carracciuolo is a member of the “Gruppo Nazionale Calcolo Scientifico—Istituto Nazionale di Alta Matematica (GNCS–INdAM)”. This work was carried out using the computational resources available at the scientific datacenter of the University of Naples Federico II.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ALS	Amyotrophic Lateral Sclerosis
PCA	Principal Component Analysis
MI	Multiple Imputation
NN	Neural Network

Appendix A. Mathematical Tools—Further Material

Appendix A.1. Handling Incomplete Data

Algorithm A1 The MICE algorithm.

1: procedure MICE(Y, k, p, m, ${\{M_{j} (Y_{j}^{obs}, Y_{- j}; θ_{j})\}}_{j = 1, \dots, k}$, X)
2:   Input: Y, k, p, m, ${\{M_{j} (Y_{j}^{obs}, Y_{- j}; θ_{j})\}}_{j = 1, \dots, k}$
3:   Output: X
4:   parallel for $i = 1, \dots, m$ do
5:     Initialize $Y_{- 1}^{(0)}$ ▹ Impute, by a simple method, each univariate variable $Y_{j}, j = 2, \dots, k$
6:     for $t = 1, \dots, maxit$ do
7:       for $j = 1, \dots, k$ do
8:         $θ_{j}^{* (t)} \leftarrow P (θ_{j} | Y_{j}^{obs}, Y_{- j}^{(t)})$ ▹ Estimate parameter $θ_{j}$ defining model $M_{j} (\cdot, \cdot; θ_{j})$
9:         $Y_{j}^{* (t)} \leftarrow M_{j} (Y_{j}^{obs}, Y_{- j}^{(t)}; θ_{j}^{* (t)})$ ▹ Impute missing values of $Y_{j}$ by $M_{j} (\cdot, \cdot; θ_{j}^{* (t)})$
10:       end for
11:     end for
12:     ${}^{(i)}Y \leftarrow ({\{Y_{j}^{(maxit)}\}}_{j = 1, \dots, k}, {\{{\bar{Y}}_{j^{'}}\}}_{j^{'} = 1, \dots, p - k})$ ▹ Form the i-th complete data sample ${}^{(i)}Y$
13:     $X \leftarrow X \cup {}^{(i)}Y$ ▹ Add to the set X of completed data samples
14:   end parallel for
15: end procedure
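
The chained-equations loop can be illustrated with a deliberately simplified Python sketch. It assumes only two variables, uses a plain least-squares line as the imputation model, and replaces the posterior draw of the parameters with a deterministic fit, so it produces a single completed dataset rather than m multiply imputed ones. It is not the mice R implementation, and all names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def chained_imputation(rows, maxit=20):
    """Fill None entries of a two-column dataset by cycling
    regression imputations over the columns."""
    k = 2
    missing = [[r[j] is None for r in rows] for j in range(k)]
    data = [list(r) for r in rows]
    # Initialisation: fill each gap with the observed column mean.
    for j in range(k):
        observed = [r[j] for r in rows if r[j] is not None]
        mean_j = sum(observed) / len(observed)
        for i, r in enumerate(data):
            if missing[j][i]:
                r[j] = mean_j
    # Chained sweeps: re-impute each column from the other one.
    for _ in range(maxit):
        for j in range(k):
            other = 1 - j
            obs = [r for i, r in enumerate(data) if not missing[j][i]]
            slope, intercept = fit_line([r[other] for r in obs],
                                        [r[j] for r in obs])
            for i, r in enumerate(data):
                if missing[j][i]:
                    r[j] = intercept + slope * r[other]
    return data


# Hypothetical data lying on y = 2x, with one gap per column.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (None, 8.0), (6.0, 12.0)]
completed = chained_imputation(rows)
print(completed[2], completed[3])
```

On this toy dataset the sweeps settle on the values consistent with the underlying linear relation. A proper multiple imputation would add random draws at the parameter and imputation steps and repeat the whole procedure m times.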

Table A1. Relative Efficiency of Multiple Imputation with an Increasing Number of Imputations and Proportion of Censored Data (see table 1 in [17]).

	Proportion of Censored Data ( $λ$ )
No. of Imputations ( $m$ )	10%	30%	50%	70%	90%
3	0.97	0.91	0.86	0.81	0.77
5	0.98	0.94	0.91	0.88	0.85
10	0.99	0.97	0.95	0.93	0.92
20	1.00	0.99	0.98	0.97	0.96
30	1.00	0.99	0.98	0.98	0.97
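
The entries of Table A1 are consistent with Rubin's relative-efficiency formula, $RE = {(1 + λ / m)}^{- 1}$. The short Python sketch below assumes that this is the formula behind the table and recomputes its values; the function and variable names are hypothetical.

```python
def relative_efficiency(missing_fraction, n_imputations):
    """Rubin's relative efficiency of m imputations versus
    infinitely many: 1 / (1 + lambda / m)."""
    return 1.0 / (1.0 + missing_fraction / n_imputations)


fractions = [0.1, 0.3, 0.5, 0.7, 0.9]
table = {
    m: [round(relative_efficiency(lam, m), 2) for lam in fractions]
    for m in (3, 5, 10, 20, 30)
}
for m, row in table.items():
    print(m, row)
```

The gain from additional imputations flattens quickly, which is the usual argument for a moderate number of imputations even when the missing fraction is high.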

Appendix A.2. Interpreting Multidimensional Data by Principal Components Analysis

Proposition A1

(Relation between PCA and SVD of the matrix $Z$). Any matrix $Z$ of dimension $n \times p$ and rank r (where, necessarily, $r \leq \min (n, p)$) can be written as

Z = U Δ V^{T},

(A1)

where $U$ and $V$ are $n \times r$ and $p \times r$ matrices with orthonormal columns ($U^{T} U = I_{r} = V^{T} V$, with $I_{r}$ the $r \times r$ identity matrix) and $Δ$ is an $r \times r$ diagonal matrix.

Since $R = Z^{T} Z$, from Equation (A1) the following relations follow:

\begin{matrix} R & = & Z^{T} Z \\  & = & {[U Δ V^{T}]}^{T} [U Δ V^{T}] \\  & = & V Δ U^{T} U Δ V^{T} \\  & = & V Δ^{2} V^{T} \\  & = & V Λ V^{T}, \end{matrix}

(A2)

where, if $r = p$, $Λ = Δ^{2}$ and $V = A$.

Proposition A2

(Reduced rank approximation $Z_{q}$ of a matrix $Z$ whose rank is equal to r (where $q < r$)). To build a “Reduced rank approximation $Z_{q}$” of a matrix $Z$ whose rank is equal to r, the Truncated Singular Value Decomposition (TSVD) can be used. TSVD represents the best approximation of an arbitrary matrix in the spectral and Frobenius norms [83]. The matrix $Z_{q} = U_{q} Δ_{q} V_{q}^{T}$, where $U_{q}$ and $V_{q}$ are the matrices built on the first q columns of $U$ and $V$, and $Δ_{q}$ is the diagonal matrix built on the top-left $q \times q$ block of the matrix $Δ$, is then the best rank-q approximation of the matrix $Z$. For the matrix $Z_{q}$, the following relations hold:

\begin{matrix} {∥Z - Z_{q}∥}_{2} & = & Δ_{q + 1, q + 1}, \end{matrix}

(A3)

\begin{matrix} {∥Z - Z_{q}∥}_{F} & = & \sqrt{\sum_{j = q + 1}^{r} Δ_{j j}^{2}}, \end{matrix}

(A4)

where ${∥\cdot∥}_{2}$ and ${∥\cdot∥}_{F}$ denote, respectively, the L2 (spectral) and Frobenius norms.
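
As a small worked check of the two truncation errors (an illustrative matrix chosen here, not data from the study), take a diagonal matrix whose singular values are 3, 2 and 1, and truncate it at $q = 1$:

```latex
Z = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
Z_{1} = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
{∥Z - Z_{1}∥}_{2} = 2, \quad
{∥Z - Z_{1}∥}_{F} = \sqrt{2^{2} + 1^{2}} = \sqrt{5} .
```

The spectral-norm error equals the largest discarded singular value, while the Frobenius-norm error collects all the discarded singular values.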

Proposition A3

(Description of the bidimensional Cartesian system $C^{(j_{1}, j_{2})}$ used by a BiPlot representation). Let us consider the reduced-rank approximation $\hat{Z} = Z_{(j_{1}, j_{2})}$ that is built retaining only the $j_{1}$-th and the $j_{2}$-th singular values of $Z$. Then, for the matrices ${\hat{Z}}_{1}$ and ${\hat{Z}}_{2}$, the following expressions can be considered:

\begin{matrix} {\hat{Z}}_{1} & = & U_{(j_{1}, j_{2})}, \end{matrix}

(A5)

\begin{matrix} {\hat{Z}}_{2}^{T} & = & Δ_{(j_{1}, j_{2})} V_{(j_{1}, j_{2})}^{T}, \end{matrix}

(A6)

where $U_{(j_{1}, j_{2})}$ and $V_{(j_{1}, j_{2})}$ are the matrices built on the $j_{1}$-th and the $j_{2}$-th columns of $U$ and $V$, and $Δ_{(j_{1}, j_{2})}$ is the diagonal matrix built on the $j_{1}$-th and the $j_{2}$-th diagonal elements of $Δ$. We recall that, if $j_{1} = 1$ and $j_{2} = 2$, we obtain the reduced-rank approximation $\hat{Z} = Z_{q = 2}$ based on TSVD.

Consider the bidimensional Cartesian system $C^{(j_{1}, j_{2})}$ whose axes coincide with the two orthonormal vectors represented by the columns of the matrix $V_{(j_{1}, j_{2})}$. Each individual $z_{i}$ is represented as a point $P_{i}^{I (C1, C2)} = (IC1_{i}, IC2_{i})$ whose coordinates are

\begin{matrix} IC1_{i} & = & b_{i j_{1}}, \end{matrix}

(A7)

\begin{matrix} IC2_{i} & = & b_{i j_{2}}, \end{matrix}

(A8)

where $b_{i j_{l}}$ is the i-th component of the vector $b_{j_{l}}$, defined as

b_{j_{l}} = Z {(V_{(j_{1}, j_{2})})}_{l}, l = 1, 2,

with ${(V_{(j_{1}, j_{2})})}_{l}$ the l-th column of $V_{(j_{1}, j_{2})}$.

Each variable $z_{j}$, instead, is represented as a vector ${(z_{j})}^{∠_{j_{1}, j_{2}}}$, starting at the origin of the Cartesian system and ending at a point $P_{j}^{V (C1, C2)} = (VC1_{j}, VC2_{j})$ whose coordinates are

\begin{matrix} VC1_{j} & = & c_{1 j}, \end{matrix}

(A9)

\begin{matrix} VC2_{j} & = & c_{2 j}, \end{matrix}

(A10)

where $c_{l j}$ is the j-th component of the vector $c_{l}$, defined as

c_{l} = \frac{1}{\sqrt{n}} {(Δ_{(j_{1}, j_{2})} V_{(j_{1}, j_{2})}^{T})}_{l}, l = 1, 2,

with ${(\cdot)}_{l}$ here denoting the l-th row.

The squared length in L2 norm of the vector ${(z_{j})}^{∠_{j_{1}, j_{2}}}$ is defined as the $cos2 (z_{j}, V_{(j_{1}, j_{2})})$ of the projection of the variable $z_{j}$ in the bidimensional Cartesian system whose axes coincide with the two orthonormal vectors ${(V_{(j_{1}, j_{2})})}_{l = 1, 2}$, i.e.,

cos2 (z_{j}, V_{(j_{1}, j_{2})}) = {∥{(z_{j})}^{∠_{j_{1}, j_{2}}}∥}_{2}^{2} = c_{1 j}^{2} + c_{2 j}^{2} .

(A11)

$cos2 (z_{j}, V_{(j_{1}, j_{2})})$ shows the “representation accuracy” of the variable $z_{j}$ in the space spanned by the components ${(V_{(j_{1}, j_{2})})}_{l = 1, 2}$.

Appendix A.3. Building Predictive Model from Data by Neural Networks

Theorem A1

(DNN as Universal Approximator). Let $Θ$ be the space of the tuples $θ$ of the parameters

θ = (L, {\{N_{l}\}}_{l = 2, \dots, L}, {\{w_{i^{l} j^{l}}^{(l)}\}}_{\begin{matrix} l = 2, \dots, L \\ i^{l} = 1, \dots, N_{l} \\ j^{l} = 1, \dots, N_{l - 1} + 1 \end{matrix}}, σ)

defining the DNN $N_{θ} (n, k)$ acting on input $x \in ℜ^{n}$ and output $y \in ℜ^{k}$. Let $F (n, k)$ be the set

F (n, k) = \{f^{N_{θ} (n, k)} : θ \in Θ\}

of all the functions $f^{N_{θ} (n, k)}$ whose form is defined as in Equation (10).

If σ is a measurable function, then $F (n, k)$ is fundamental in the space $C (n, k)$ of all the continuous functions $g : ℜ^{n} \rightarrow ℜ^{k}$ if and only if σ is not a polynomial.

Theorem A2

(Curse of Dimensionality Theorem). Let $F^{L = 3} (n, k)$ be the subset of $F (n, k)$ defined as

F^{L = 3} (n, k) = \{f^{N_{θ} (n, k)}, θ \in Θ : L = 3\} .

If σ is a measurable function and is not a polynomial, then $F^{L = 3} (n, k)$ is fundamental in the space $C (n, k)$, and for every $ϵ > 0$ and every function $g \in C (n, k)$, there is a function $f^{N_{θ} (n, k)} \in F^{L = 3} (n, k)$ such that

∥g - f^{N_{θ} (n, k)}∥ < ϵ,

(A12)

where the considered norm $∥\cdot∥$ is defined as

∥g (x)∥ = \sup_{x \in ℜ^{n}} |g (x)| .

Moreover, let $W_{m}^{n}$ be the set of all the functions $g (x)$ in $C (n, k)$ with continuous partial derivatives of orders up to $m : 1 \leq m < \infty$, such that

∥g (x)∥ + \sum_{1 \leq {|k|}_{1} \leq m} ∥D^{k} g (x)∥ \leq 1,

where $D^{k}$ denotes the partial derivative indicated by the multi-integer $k \geq 1$, and ${|k|}_{1}$ is the sum of the components of $k$. If σ is infinitely differentiable, for every $g \in W_{m}^{n}$, the complexity N of a network that provides accuracy at least ϵ to g is

N = O (ϵ^{- \frac{n}{m}}),

(A13)

and this is the best possible.
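
To give a feeling for the bound (A13), the Python sketch below evaluates the order of magnitude $ϵ^{- n / m}$ for a few input dimensions. The hidden constant of the O-notation is ignored and the values of ϵ, n and m are hypothetical, so the numbers only show the growth trend.

```python
def shallow_complexity_order(epsilon, n, m):
    """Order of magnitude eps^(-n/m) of the number of units needed by a
    shallow network (hidden constant of the O-notation ignored)."""
    return epsilon ** (-n / m)


# Accuracy 0.1 for functions with m = 2 continuous derivatives.
orders = {n: shallow_complexity_order(0.1, n, 2) for n in (2, 4, 10)}
for n, order in orders.items():
    print(f"n = {n:2d}: about {order:.0f} units")
```

Each additional pair of input variables multiplies the estimate by ten in this setting, which is the exponential dependence on n referred to as the “curse of dimensionality”.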

Algorithm A2 The Gradient Descent algorithm.

1: procedure GradientDescent($C (α)$, $α_{0}$, $α_{Best}$, $maxit$)
2:   Input: C, $α_{0}$, $maxit$
3:   Output: $α_{Best}$
4:   $n \leftarrow 1$
5:   repeat
6:     $α_{n} \leftarrow α_{n - 1} - γ_{n} \nabla_{α} C (α_{n - 1})$
7:     $n \leftarrow n + 1$
8:   until $n > maxit$
9:   $α_{Best} \leftarrow α_{n - 1}$ ▹ Last computed iterate, i.e., $α_{maxit}$
10: end procedure
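
A minimal Python sketch of the same iteration follows, assuming a constant step size $γ_{n} = γ$ and a simple quadratic cost whose gradient is known in closed form. The cost function and all names are hypothetical and serve only to show the update rule.

```python
def gradient_descent(grad, alpha0, step, maxit):
    """Iterate alpha <- alpha - step * grad(alpha) for maxit steps
    and return the last iterate."""
    alpha = list(alpha0)
    for _ in range(maxit):
        g = grad(alpha)
        alpha = [a - step * gi for a, gi in zip(alpha, g)]
    return alpha


def grad_cost(alpha):
    """Gradient of C(alpha) = (alpha_0 - 1)^2 + 2 * (alpha_1 + 3)^2."""
    return [2.0 * (alpha[0] - 1.0), 4.0 * (alpha[1] + 3.0)]


alpha_best = gradient_descent(grad_cost, [0.0, 0.0], step=0.1, maxit=200)
print(alpha_best)
```

For this convex cost the iterates approach the minimizer (1, -3). For the non-convex cost functions of neural networks, the same update generally reaches only a local minimum that depends on the starting point and the step size.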

Appendix B. Mathematical Tools—Further Results

Appendix B.1. Random Forest Parameter Relevance

This appendix describes some very preliminary results on the use of methods other than those based on NNs to assess parameter relevance in our case study.

These results were obtained using the procedures of the randomForest package of R. The R code used is listed in Figure A1.

Figure A2 shows such relevance as measured by the Mean Decrease Accuracy metric. For details about this metric and the randomForest R package, refer to Liaw et al. [92]. The presented results confirm those presented in the main part of this study.

Figure A1. R code of the Random Forest analysis.

Figure A2. Parameter relevance in identifying sick status, evaluated by the Random Forest training process.

References

Feldman, E.L.; Goutman, S.A.; Petri, S.; Mazzini, L.; Savelieff, M.G.; Shaw, P.J.; Sobue, G. Amyotrophic lateral sclerosis. Lancet 2022, 400, 1363–1380. [Google Scholar] [CrossRef] [PubMed]
Van Es, M.A.; Hardiman, O.; Chio, A.; Al-Chalabi, A.; Pasterkamp, R.J.; Veldink, J.H.; van den Berg, L.H. Amyotrophic lateral sclerosis. Lancet 2017, 390, 2084–2098. [Google Scholar] [CrossRef]
Piccione, E.A.; Sletten, D.M.; Staff, N.P.; Low, P.A. Autonomic system and amyotrophic lateral sclerosis. Muscle Nerve 2015, 51, 676–679. [Google Scholar] [CrossRef] [PubMed]
Pugdahl, K.; Fuglsang-Frederiksen, A.; de Carvalho, M.; Johnsen, B.; Fawcett, P.R.W.; Labarre-Vila, A.; Liguori, R.; Nix, W.A.; Schofield, I.S. Generalised sensory system abnormalities in amyotrophic lateral sclerosis: A European multicentre study. J. Neurol. Neurosurg. Psychiatry 2007, 78, 746–749. [Google Scholar] [CrossRef] [PubMed]
Nolano, M.; Provitera, V.; Manganelli, F.; Iodice, R.; Stancanelli, A.; Caporaso, G.; Saltalamacchia, A.; Califano, F.; Lanzillo, B.; Picillo, M.; et al. Loss of cutaneous large and small fibers in naive and L-dopa-treated PD patients. Neurology 2017, 89, 776–784. [Google Scholar] [CrossRef]
Truini, A.; Vergari, M.; Biasiotta, A.; La Cesa, S.; Gabriele, M.; Di Stefano, G.; Cambieri, C.; Cruccu, G.; Inghilleri, M.; Priori, A. Transcutaneous spinal direct current stimulation inhibits nociceptive spinal pathway conduction and increases pain tolerance in humans. Eur. J. Pain 2011, 15, 1023–1027. [Google Scholar] [CrossRef]
Liu, Z.; Cheng, X.; Zhong, S.; Zhang, X.; Liu, C.; Liu, F.; Zhao, C. Peripheral and Central Nervous System Immune Response Crosstalk in Amyotrophic Lateral Sclerosis. Front. Neurosci. 2020, 14, 575. [Google Scholar] [CrossRef]
Voet, S.; Srinivasan, S.; Lamkanfi, M.; van Loo, G. Inflammasomes in neuroinflammatory and neurodegenerative diseases. EMBO Mol. Med. 2019, 11, EMMM201810248. [Google Scholar] [CrossRef]
Johann, S.; Heitzer, M.; Kanagaratnam, M.; Goswami, A.; Rizo, T.; Weis, J.; Troost, D.; Beyer, C. NLRP3 inflammasome is expressed by astrocytes in the SOD1 mouse model of ALS and in human sporadic ALS patients. Glia 2015, 63, 2260–2273. [Google Scholar] [CrossRef]
Debye, B.; Schmülling, L.; Zhou, L.; Rune, G.; Beyer, C.; Johann, S. Neurodegeneration and NLRP3 inflammasome expression in the anterior thalamus of SOD1(G93A) ALS mice. Brain Pathol. 2018, 28, 14–27. [Google Scholar] [CrossRef]
Zhao, W.; Beers, D.R.; Bell, S.; Wang, J.; Wen, S.; Baloh, R.H.; Appel, S.H. TDP-43 activates microglia through NF-κB and NLRP3 inflammasome. Exp. Neurol. 2015, 273, 24–35. [Google Scholar] [CrossRef] [PubMed]
Nowowiejska, J.; Baran, A.; Pryczynicz, A.; Hermanowicz, J.M.; Sieklucka, B.; Pawlak, D.; Flisiak, I. Gasdermin B (GSDMB) in psoriatic patients—A preliminary comprehensive study on human serum, urine and skin. Front. Mol. Biosci. 2024, 11, 1382069. [Google Scholar] [CrossRef] [PubMed]
Donoho, D. 50 Years of Data Science. J. Comput. Graph. Stat. 2017, 26, 745–766. [Google Scholar] [CrossRef]
Donoho, D. Data Science at the Singularity. Harv. Data Sci. Rev. 2024, 6, 1–51. Available online: https://hdsr.mitpress.mit.edu/pub/g9mau4m0 (accessed on 26 January 2026). [CrossRef]
Donoho, D.L. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lect. 2000, 1, 32. Available online: https://www.researchgate.net/publication/220049061_High-Dimensional_Data_Analysis_The_Curses_and_Blessings_of_Dimensionality (accessed on 26 January 2026).
Haukoos, J.S.; Newgard, C.D. Advanced Statistics: Missing Data in Clinical Research—Part 1: An Introduction and Conceptual Framework. Acad. Emerg. Med. 2007, 14, 662–668. [Google Scholar] [CrossRef]
Haukoos, J.S.; Newgard, C.D. Advanced Statistics: Missing Data in Clinical Research—Part 2: Multiple Imputation. Acad. Emerg. Med. 2007, 14, 669–678. [Google Scholar] [CrossRef]
Newgard, C.D.; Lewis, R.J. Missing Data: How to Best Account for What Is Not Known. JAMA 2015, 314, 940–941. [Google Scholar] [CrossRef]
Li, P.; Stuart, E.A.; Allison, D.B. Multiple Imputation: A Flexible Tool for Handling Missing Data. JAMA 2015, 314, 1966–1967. [Google Scholar] [CrossRef]
Austin, P.C.; White, I.R.; Lee, D.S.; van Buuren, S. Missing Data in Clinical Research: A Tutorial on Multiple Imputation. Can. J. Cardiol. 2021, 37, 1322–1331. [Google Scholar] [CrossRef]
Weeraratne, N.; Hunt, L.; Kurz, J. Optimizing PCA for Health and Care Research: A Reliable Approach to Component Selection. arXiv 2025, arXiv:2503.24248. [Google Scholar] [CrossRef]
Bradley, W.; Kim, J.; Kilwein, Z.; Blakely, L.; Eydenberg, M.; Jalvin, J.; Laird, C.; Boukouvala, F. Perspectives on the integration between first-principles and data-driven modeling. Comput. Chem. Eng. 2022, 166, 107898. [Google Scholar] [CrossRef]
Carracciuolo, L.; D’Amora, U. Mathematical Tools for Simulation of 3D Bioprinting Processes on High-Performance Computing Resources: The State of the Art. Appl. Sci. 2024, 14, 6110. [Google Scholar] [CrossRef]
Zhang, Z.; Beck, M.W.; Winkler, D.A.; Huang, B.; Sibanda, W.; Goyal, H.; on behalf of AME Big-Data Clinical Trial Collaborative Group. Opening the black box of neural networks: Methods for interpreting neural network models in clinical applications. Ann. Transl. Med. 2018, 6, 216. [Google Scholar] [CrossRef]
van Buuren, S.; Groothuis-Oudshoorn, K. MICE: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. 2011, 45, 1–67. [Google Scholar] [CrossRef]
Azur, M.J.; Stuart, E.A.; Frangakis, C.; Leaf, P.J. Multiple imputation by chained equations: What is it and how does it work? Int. J. Methods Psychiatr. Res. 2011, 20, 40–49. [Google Scholar] [CrossRef]
Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 1987; p. 258. [Google Scholar] [CrossRef]
Little, R.J.A. Missing-Data Adjustments in Large Surveys. J. Bus. Econ. Stat. 1988, 6, 287–296. [Google Scholar] [CrossRef]
Lee, S. Mastering Predictive Mean Matching for Accurate Data Imputation. 2025. Available online: https://www.numberanalytics.com/blog/predictive-mean-matching-data-imputation-guide (accessed on 26 January 2026).
Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
Abdi, H.; Williams, L.J. Principal component analysis. WIREs Comput. Stat. 2010, 2, 433–459.
Aluja, T.; Morineau, A.; Sanchez, G. PCA4DS: Principal Component Analysis for Data Science. 2018. Available online: https://pca4ds.github.io (accessed on 26 January 2026).
Jolliffe, I.T. Principal Component Analysis; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2002.
Gabriel, K.R. The Biplot Graphic Display of Matrices with Application to Principal Component Analysis. Biometrika 1971, 58, 453–467.
Friendly, M.; Monette, G.; Fox, J. Elliptical Insights: Understanding Statistical Methods through Elliptical Geometry. Stat. Sci. 2013, 28, 1–39.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Nielsen, M. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2016; Available online: http://neuralnetworksanddeeplearning.com/index.html (accessed on 26 January 2026).
Augustine, M.T. A Survey on Universal Approximation Theorems. arXiv 2024, arXiv:2407.12895.
Mhaskar, H.N. Approximation properties of a multilayered feedforward artificial neural network. Adv. Comput. Math. 1993, 1, 61–80.
Chui, C.K.; Li, X.; Mhaskar, H.N. Limitations of the approximation capabilities of neural networks with one hidden layer. Adv. Comput. Math. 1996, 5, 233–243.
Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195.
Poggio, T.; Mhaskar, H.; Rosasco, L.; Miranda, B.; Liao, Q. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. Int. J. Autom. Comput. 2017, 14, 503–519.
Leshno, M.; Lin, V.Y.; Pinkus, A.; Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 1993, 6, 861–867.
Bubeck, S. Convex Optimization: Algorithms and Complexity. Found. Trends Mach. Learn. 2015, 8, 231–357.
Garrigos, G.; Gower, R.M. Handbook of Convergence Theorems for (Stochastic) Gradient Methods. arXiv 2024, arXiv:2301.11235.
Hochreiter, S.; Younger, A.S.; Conwell, P.R. Learning to Learn Using Gradient Descent. In Artificial Neural Networks—ICANN 2001; Dorffner, G., Bischof, H., Hornik, K., Eds.; Springer: Berlin/Heidelberg, Germany, 2001; pp. 87–94.
Amari, S.I. Backpropagation and stochastic gradient descent method. Neurocomputing 1993, 5, 185–196.
Poliak, B. Introduction to Optimization; Translations series in mathematics and engineering; Optimization Software, Publications Division: New York, NY, USA, 1987.
Curry, H.B. The method of steepest descent for non-linear minimization problems. Quart. Appl. Math. 1944, 2, 258–261.
Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
Pentos, K. The methods of extracting the contribution of variables in artificial neural network models—Comparison of inherent instability. Comput. Electron. Agric. 2016, 127, 141–146.
Gevrey, M.; Dimopoulos, I.; Lek, S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 2003, 160, 249–264.
Olden, J.D.; Jackson, D.A. Illuminating the “black box”: A randomization approach for understanding variable contributions in artificial neural networks. Ecol. Model. 2002, 154, 135–150.
Brownlee, J. Better Deep Learning: Train Faster, Reduce Overfitting, and Make Better Predictions; Machine Learning Mastery: San Juan, PR, USA, 2018.
Oxford English Dictionary. Overfitting Noun. 2024. Available online: https://www.oed.com/dictionary/overfitting_n?tl=true (accessed on 26 January 2026).
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013; ISBN 3-900051-07-0. Available online: http://www.R-project.org/ (accessed on 26 January 2026).
MICE Package Reference Manual. mice.impute.pmm Imputation by Predictive Mean Matching. 2025. Available online: https://cran.r-project.org/web/packages/mice/refman/mice.html#mice.impute.pmm (accessed on 26 January 2026).
MICE Package Developer. find.collinear Internal MICE Procedure. 2024. Available online: https://github.com/amices/mice/blob/7df3487a56cd49dffc9192a8ebe9697bfa7258ca/R/internal.R#L84-L91 (accessed on 26 January 2026).
Rosenblatt, M. Remarks on Some Nonparametric Estimates of a Density Function. Ann. Math. Stat. 1956, 27, 832–837.
Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016; Available online: https://ggplot2-book.org/ (accessed on 26 January 2026).
Choudhury, A.D.; Banerjee, R.; Kimbahune, S.; Pal, A. Chapter 3—Sensor signal analytics. In New Frontiers of Cardiovascular Screening Using Unobtrusive Sensors, AI, and IoT; Choudhury, A.D., Banerjee, R., Kimbahune, S., Pal, A., Eds.; Academic Press: Cambridge, MA, USA, 2022; pp. 61–89.
Olden, J.D.; Joy, M.K.; Death, R.G. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol. Model. 2004, 178, 389–397.
Bergmeir, C.; Benítez, J.M. Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS. J. Stat. Softw. 2012, 46, 1–26.
McLachlan, G.; Do, K.; Ambroise, C. Analyzing Microarray Gene Expression Data; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 2004.
Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3.
Gupta, A.; Lam, M. The weight decay backpropagation for generalizations with missing values. Ann. Oper. Res. 1998, 78, 165–187.
Beck, M.W. NeuralNetTools: Visualization and Analysis Tools for Neural Networks. J. Stat. Softw. 2018, 85, 1–20.
Xu, W.; Huang, Y.; Zhou, R. NLRP3 inflammasome in neuroinflammation and central nervous system diseases. Cell. Mol. Immunol. 2025, 22, 341–355.
Peters, S.; Zitzelsperger, E.; Kuespert, S.; Iberl, S.; Heydn, R.; Johannesen, S.; Petri, S.; Aigner, L.; Thal, D.R.; Hermann, A.; et al. The TGF-β System As a Potential Pathogenic Player in Disease Modulation of Amyotrophic Lateral Sclerosis. Front. Neurol. 2017, 8, 669.
Gugliandolo, A.; Giacoppo, S.; Bramanti, P.; Mazzon, E. NLRP3 Inflammasome Activation in a Transgenic Amyotrophic Lateral Sclerosis Model. Inflammation 2018, 41, 93–103.
Arseni, D.; Hasegawa, M.; Murzin, A.G.; Kametani, F.; Arai, M.; Yoshida, M.; Ryskeldi-Falcon, B. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 2022, 601, 139–143.
Grollemund, V.; Pradat, P.-F.; Querin, G.; Delbot, F.; Le Chat, G.; Pradat-Peyre, J.-F.; Bede, P. Machine Learning in Amyotrophic Lateral Sclerosis: Achievements, Pitfalls, and Future Directions. Front. Neurosci. 2019, 13, 135.
Chia, R.; Moaddel, R.; Kwan, J.Y.; Rasheed, M.; Ruffo, P.; Landeck, N.; Reho, P.; Vasta, R.; Calvo, A.; Moglia, C.; et al. A plasma proteomics-based candidate biomarker panel predictive of amyotrophic lateral sclerosis. Nat. Med. 2025, 31, 3440–3450.
Singh, M.; Fuenmayor, E.; Hinchy, E.P.; Qiao, Y.; Murray, N.; Devine, D. Digital Twin: Origin to Future. Appl. Syst. Innov. 2021, 4, 36.
Kamel Boulos, M.N.; Zhang, P. Digital Twins: From Personalised Medicine to Precision Public Health. J. Pers. Med. 2021, 11, 745.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Networks 2009, 20, 61–80.
Tealab, A. Time series forecasting using artificial neural networks methodologies: A systematic review. Future Comput. Inform. J. 2018, 3, 334–340.
Theng, D.; Bhoyar, K.K. Feature selection techniques for machine learning: A survey of more than two decades of research. Knowl. Inf. Syst. 2023, 66, 1575–1637.
Rad, H.; Su, Z.; Trinh, A.; Newton, M.A.; Shamsani, J.; Karim, A.; Sattar, A.; NYGC ALS Consortium. Amyotrophic lateral sclerosis diagnosis using machine learning and multi-omic data integration. Heliyon 2024, 10, e38583.
Wang, T.; Shao, W.; Huang, Z.; Tang, H.; Zhang, J.; Ding, Z.; Huang, K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 2021, 12, 3445.
Nguyen, M.; He, T.; An, L.; Alexander, D.C.; Feng, J.; Yeo, B.T.T. Predicting Alzheimer’s disease progression using deep recurrent neural networks. NeuroImage 2020, 222, 117203.
Carracciuolo, L.; Mele, V. New Strategies Based on Hierarchical Matrices for Matrix Polynomial Evaluation in Exascale Computing Era. Mathematics 2025, 13, 1378.
Carracciuolo, L.; Lapegna, M. Implementation of a non-linear solver on heterogeneous architectures. Concurr. Comput. Pract. Exp. 2018, 30, e4903.
Mele, V.; Constantinescu, E.M.; Carracciuolo, L.; D’Amore, L. A PETSc parallel-in-time solver based on MGRIT algorithm. Concurr. Comput. Pract. Exp. 2018, 30, e4928.
Carracciuolo, L.; Mele, V.; Szustak, L. About the granularity portability of block-based Krylov methods in heterogeneous computing environments. Concurr. Comput. Pract. Exp. 2021, 33, e6008.
Carracciuolo, L.; Casaburi, D.; D’Amore, L.; D’Avino, G.; Maffettone, P.; Murli, A. Computational simulations of 3D large-scale time-dependent viscoelastic flows in high performance computing environment. J. Non-Newton. Fluid Mech. 2011, 166, 1382–1395.
Carracciuolo, L.; D’Amore, L.; Murli, A. Towards a parallel component for imaging in PETSc programming environment: A case study in 3-D echocardiography. Parallel Comput. 2006, 32, 67–83.
Murli, A.; D’Amore, L.; Carracciuolo, L.; Ceccarelli, M.; Antonelli, L. High performance edge-preserving regularization in 3D SPECT imaging. Parallel Comput. 2008, 34, 115–132.
D’Amore, L.; Constantinescu, E.; Carracciuolo, L. A Scalable Space-Time Domain Decomposition Approach for Solving Large Scale Nonlinear Regularized Inverse Ill Posed Problems in 4D Variational Data Assimilation. J. Sci. Comput. 2022, 91, 59.
Longo, L.; Brcic, M.; Cabitza, F.; Choi, J.; Confalonieri, R.; Del Ser, J.; Guidotti, R.; Hayashi, Y.; Herrera, F.; Holzinger, A.; et al. Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Inf. Fusion 2024, 106, 102301.
Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22. Available online: https://journal.r-project.org/articles/RN-2002-022/RN-2002-022.pdf (accessed on 26 January 2026).

Figure 1. Cloud of points in the original p-dimensional space (a) and the cloud of transformed points in the new Factorial Space (b). $d_{X}(i, i')$ and $d_{F}(i, i')$ denote the distance between two different points $i$ and $i'$ in the original space and in the Factorial Space, respectively. Credits: Aluja et al. [32].

Figure 2. An example of a DNN composed of $L = 3$ layers. The numbers $N_{1}$, $N_{2}$ and $N_{3}$ of “neurons” in the three layers are 3, 4 and 2, respectively. Credits: [37].

Figure 5. Kernel density estimates for the marginal distribution of the $m = 30$ imputed data (red lines) and available data (blue line) for each of the 8 considered markers.

Figure 6. R code of the PCA phase.

Figure 7. Proportion of variance explained by each principal component with the status parameter included (a) and excluded (b).

Figure 8. PCA cos2 in the plane of each pair of principal components: (a) components 1–2, (b) components 1–3, (c) components 1–4, (d) components 1–5, (e) components 1–6, (f) components 1–7, (g) components 1–8, (h) components 1–9. Status parameter included.

Figure 9. PCA scattered biplot of both parameter cos2 and clustered observations in the plane of each pair of principal components: (a) components 1–2, (b) components 1–3, (c) components 1–4, (d) components 1–5, (e) components 1–6, (f) components 1–7, (g) components 1–8, (h) components 1–9. Status parameter included.

Figure 10. PCA scattered biplot of both parameter cos2 and clustered observations in the plane of each pair of principal components: (a) components 1–2, (b) components 1–3, (c) components 1–4, (d) components 1–5, and (e) components 1–6. Status parameter excluded. Only the first six, most significant PCs are considered.

Figure 11. R code of the NN-based analysis.

Figure 12. Trend of the shallow neural network complexity $N$ providing an accuracy of at least $ε$, with $N_{Par} = 8$.

Figure 13. Convergence error plot for the test data. (a) $N = 100$, (b) $N = 300$, (c) $N = 500$, (d) $N = 700$, (e) $N = 900$, (f) $N = 1100$.

Figure 14. Regression error plot for the test data. (a) $N = 100$, (b) $N = 300$, (c) $N = 500$, (d) $N = 700$, (e) $N = 900$, (f) $N = 1100$.

Figure 15. R code of the cross-validation process used to evaluate mean values for Confusion Matrix Efficiency.

Figure 16. Mean confusion matrices’ efficiency.

Figure 17. Convergence error plot for the test data where the learning function is based on “weight decay”. (a) $N = 100$, (b) $N = 300$, (c) $N = 500$, (d) $N = 700$, (e) $N = 900$, (f) $N = 1100$.

Figure 18. Parameters’ relevance in identifying sick status evaluated by an Olden-type analysis. (a) $N = 100$, (b) $N = 300$, (c) $N = 500$, (d) $N = 700$, (e) $N = 900$, (f) $N = 1100$.

Table 1. Results from biochemical assays performed on skin biopsies and serum from healthy and ALS patients. Results are expressed as mean ± standard deviation.

Patients	IL-18 Skin (pg/mL)	IL-18 Serum (pg/mL)	IL-1β Skin (pg/mL)	IL-1β Serum (pg/mL)	NLRP-3/β-Actin Ratio (Skin)	TGF-β Skin (pg/mL)	TGF-β Serum (pg/mL)
Healthy	12.2 ± 5.2	340.7 ± 130.9	24.9 ± 2.1	25.6 ± 1.3	1.4 ± 0.5	1224.8 ± 490.3	114.9 ± 62.5
ALS	131.8 ± 62.3	701.7 ± 380.2	27.5 ± 9.3	31.9 ± 8.5	15.9 ± 10.1	2469.4 ± 1432.8	138.2 ± 100.6
Patients	IL-10 Skin (pg/mL)	IL-10 Serum (pg/mL)	IL-6 Skin (pg/mL)	IL-6 Serum (pg/mL)	NEK7/β-Actin Ratio	p-TDP43/GAPDH Ratio	Sod Serum (Inhibition Percentage)
Healthy	6.4 ± 2.0	23.8 ± 2.1	22.0 ± 1.3	5.1 ± 1.4	0.9 ± 0.1	1.2 ± 0.3	49.2 ± 28.7
ALS	7.2 ± 2.5	24.2 ± 5.5	27.5 ± 7.5	42.2 ± 33.1	1.0 ± 0.1	3.0 ± 0.8	77.5 ± 14.2

Table 2. List of variables associated with biochemical markers.

Variable Name	Available Data (n)	Available Data (%)	Missing Data (%)	Marker Name
`IL18Skin`	28	38	62	IL-18 Skin (pg/mL)
`IL18Serum`	52	71	29	IL-18 Serum (pg/mL)
`IL1BSkin`	8	10	90	IL-1β Skin (pg/mL)
`IL1BSerum`	8	10	90	IL-1β Serum (pg/mL)
`NLRP3bActinSkin`	26	35	65	NLRP-3/β-actin ratio (Skin)
`TGFBSkin`	22	30	70	TGF-β Skin (pg/mL)
`TGFBSerum`	23	31	69	TGF-β Serum (pg/mL)
`IL10Skin`	11	15	85	IL-10 Skin (pg/mL)
`IL10Serum`	21	28	72	IL-10 Serum (pg/mL)
`IL6Skin`	26	35	65	IL-6 Skin (pg/mL)
`IL6Serum`	31	42	58	IL-6 Serum (pg/mL)
`NEK7bActin`	11	15	85	NEK7/β-actin ratio
`pTDP43GAPDH`	13	17	83	p-TDP43/GAPDH ratio
`SodSerum`	26	35	65	Sod serum (% of inhibition)
Total	306	29	71

Table 3. Details of used software. All URLs were accessed on 26 January 2026.

Name	Version	Reference Manual
`R` Environment	4.1.3	https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
`MICE` Package	3.18.0	https://cran.r-project.org/web/packages/mice/mice.pdf
`stats` Package	4.1.3	https://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf
`RSNNS` Package	0.4-17	https://cran.r-project.org/web/packages/RSNNS/RSNNS.pdf
`NeuralNetTools` Package	1.5.3	https://cran.r-project.org/web/packages/NeuralNetTools/NeuralNetTools.pdf
`ggplot2` Package	3.5.2	https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf
`DescTools` Package	0.99.50	https://cran.r-project.org/web/packages/DescTools/DescTools.pdf

Table 4. Hardware and software specs of computing resources used for tests.

Processor type	Intel Xeon Gold 6240R CPU@2.40 GHz
Number of cores	48
OS	Linux CentOS 7

Table 5. Molecular markers’ list and parameter associations.

Marker Name	Parameter ID
`IL18Skin`	1
`IL18Serum`	2
`NLRP3bActinSkin`	3
`TGFBSkin`	4
`TGFBSerum`	5
`IL10Skin`	6
`IL6Serum`	7
`SodSerum`	8

Table 6. Mean Brier score as a function of NN complexity N.

N	Mean Brier Score
100	0.0100457
300	0.0100457
500	0.0109589
700	0.0095890
900	0.0146119
1100	0.0127854
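
To help read the values in Table 6, the following is a minimal sketch of the Brier score in its common binary form (Brier, 1950). The symbols $n$, $p_i$ and $o_i$ are illustrative and not taken from the article, and this form is an assumption: the article may use a different variant, such as the multi-class form that also sums over classes.

```latex
% Assumed binary form; n, p_i, o_i are illustrative symbols, not the article's notation.
% p_i : predicted probability of the positive class for observation i
% o_i : observed outcome (1 if positive, 0 otherwise)
\mathrm{BS} \;=\; \frac{1}{n} \sum_{i=1}^{n} \left( p_i - o_i \right)^{2},
\qquad p_i \in [0, 1], \quad o_i \in \{0, 1\}
```

Under this form the score lies in $[0, 1]$, lower is better, and a constant forecast $p_i = 0.5$ scores exactly $0.25$. For comparison, all the mean values in Table 6 are below $0.015$.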

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Carracciuolo, L.; D’Amora, U.; Dubbioso, R.; Fasolino, I. Mathematical Approaches for the Characterization and Analysis of Molecular Markers in the Study of the Progression and Severity of Amyotrophic Lateral Sclerosis. AppliedMath 2026, 6, 22. https://doi.org/10.3390/appliedmath6020022
