Sensitivity Analysis of Artificial Neural Networks Identifying JWH Synthetic Cannabinoids Built with Alternative Training Strategies and Methods

Abstract: This paper presents the alternative training strategies we tested for an Artificial Neural Network (ANN) designed to detect JWH synthetic cannabinoids. In order to increase the model performance in terms of output sensitivity, we used the Neural Designer data science and machine learning platform combined with the Python programming language. We performed a comparative analysis of several optimization algorithms, error parameters and regularization methods. Finally, we performed a new goodness-of-fit analysis between the testing samples in the data set and the corresponding ANN outputs in order to investigate their sensitivity. The effectiveness of the new methods combined with the optimization algorithms is discussed.


Introduction
Artificial neural networks (ANNs) contain a set of parameters that can be adjusted to perform different tasks. These structures have universal approximation properties, which means that, given a sufficiently large architecture, they can in general approximate a function up to a desired degree of accuracy [1][2][3][4].
In this article, we present a series of deep learning training and optimization strategies applied to improve the performance of an ANN identifying JWH synthetic cannabinoid class membership. In order to increase the system sensitivity, we trained and optimized an initial model on four new architectures. For this purpose, we used the data science and machine learning platform Neural Designer. The best version was implemented in the Python 3.10 programming language for further development and improvement.
The classification efficiencies (output results) obtained for several combinations of algorithms, error parameters and regularization methods were compared. The goodness of fit between the testing samples and the corresponding ANN outputs was also analyzed. The effectiveness of the methods was evaluated and is presented in detail.

Materials and Methods
The initial input database of 150 synthetic chemicals included JWH synthetic cannabinoids, other synthetic cannabinoids and other substances of abuse. These designer drugs were divided into three classes referred to as "Class 1-JWH", "Class 2-non-JWH Cannabinoids" and "Class 3-Others". The group of positives contained 50 JWH synthetic cannabinoids, while the group of negatives included 100 compounds, i.e., 50 non-JWH cannabinoids and 50 other substances of forensic interest [5].
We used the quantitative structure-activity relationship (QSAR) method to estimate and predict the pharmacokinetics, drug-likeness and medicinal chemistry friendliness of each input compound by calculating 300 molecular descriptors in terms of their physical and chemical properties, as well as 50 indices characterizing their chemical absorption, distribution, metabolism, excretion and toxicity (ADMET) activity. The descriptors were selected from three blocks, i.e., topological, 3D-MoRSE (molecule representation of structure based on electron diffraction) and toxicity [6].
Only the 150 most relevant descriptors were selected and used for the final computational and modelling stage. Hence, the input database was a matrix consisting of 150 samples × 150 variables. The shape, feature and target types of the data set, including the list of the computed and tested input molecular descriptors, were presented in a previously published article [7].
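The ranking criterion behind "most relevant" is not specified here; as a minimal illustrative sketch, the selection could be performed by absolute Pearson correlation with the class label (the function and variable names below are hypothetical):

```python
import numpy as np

def select_top_descriptors(X, y, k=150):
    """Rank descriptors by absolute Pearson correlation with the
    class label and keep the k most relevant ones.

    X : (n_samples, n_descriptors) matrix of molecular descriptors
    y : (n_samples,) numeric class labels
    """
    # Correlation of each descriptor column with the target
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:k]  # indices of the k best descriptors
    return X[:, top], top

# Example: 150 samples x 350 raw descriptors reduced to 150 x 150
X_raw = np.random.rand(150, 350)
y = np.random.randint(0, 3, 150)        # three classes encoded as 0, 1, 2
X_selected, kept = select_top_descriptors(X_raw, y, k=150)
print(X_selected.shape)                 # (150, 150)
```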
The data set was divided into three subsets of samples: training, selection and testing. Hence, we used 90 training samples (60%), 30 selection samples (20%) and 30 testing samples (20%). To discover redundancies between the input variables, we used a correlation matrix, whose entries are numerical values between −1 and 1 expressing the strength of the relationship between two variables [8]. The layer types most frequently used in our classification model were the perceptron layer, the probabilistic layer and the scaling and bounding layers. The objective of the selection was to find the best-performing network architecture in terms of system sensitivity.
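A minimal NumPy sketch of the 60/20/20 split and of flagging redundant inputs from the correlation matrix; the 0.9 redundancy threshold is an assumption, not a value given in the paper:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def split_dataset(X, y, train=0.6, selection=0.2):
    """Shuffle and split into training/selection/testing subsets."""
    idx = rng.permutation(len(X))
    n_train = int(train * len(X))        # 90 samples for 150 inputs
    n_sel = int(selection * len(X))      # 30 samples
    tr, se, te = np.split(idx, [n_train, n_train + n_sel])
    return (X[tr], y[tr]), (X[se], y[se]), (X[te], y[te])

def redundant_pairs(X, threshold=0.9):
    """Return pairs of inputs whose |correlation| exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)  # entries lie in [-1, 1]
    i, j = np.where(np.triu(np.abs(corr) > threshold, k=1))
    return list(zip(i, j))
```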
To avoid underfitting and overfitting, the neuron selection algorithm responsible for finding the optimal number of neurons in the networks was the growing neurons algorithm [9]. The Neural Designer data science and machine learning platform was used to generate the mathematical expressions represented by the ANNs in order to export and incorporate them into the Python 3.10 programming language in the so-called production mode.
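Schematically, the growing neurons algorithm trains networks of increasing hidden-layer size and keeps the one with the lowest selection error; the `train_and_evaluate` callable below is a hypothetical stand-in for what Neural Designer performs internally:

```python
def growing_neurons(train_and_evaluate, max_neurons=10):
    """Increase the hidden-layer size until all candidates are tried,
    returning the architecture with the lowest selection error.

    train_and_evaluate(n) -- hypothetical helper that trains an ANN
    with n hidden neurons and returns its error on the selection set.
    """
    best_n, best_error = 1, float("inf")
    for n in range(1, max_neurons + 1):
        selection_error = train_and_evaluate(n)
        if selection_error < best_error:   # keep the improvement
            best_n, best_error = n, selection_error
    return best_n, best_error
```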
Our general training strategy consisted of two different concepts, i.e., the loss index and the optimization algorithm. The error was the essential term in the loss expression. The most important errors that we estimated were the sum squared error, the mean squared error, the root mean squared error, the normalized squared error and the Minkowski error. We used the L1 and L2 regularization methods, which add to the loss the sum of the absolute values of all the parameters and the sum of the squares of all the parameters in the ANN, respectively. The loss index was measured on the data set and could be represented as a hyper-surface with the parameters as coordinates (see Figure 1) [10].
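For reference, minimal NumPy implementations of the error terms and regularization penalties listed above; the Minkowski exponent p = 1.5 is a common default and an assumption here:

```python
import numpy as np

def sum_squared_error(y_true, y_pred):
    return np.sum((y_true - y_pred) ** 2)

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

def normalized_squared_error(y_true, y_pred):
    # Squared error normalized by the total variance of the targets
    return sum_squared_error(y_true, y_pred) / np.sum((y_true - y_true.mean()) ** 2)

def minkowski_error(y_true, y_pred, p=1.5):
    return np.sum(np.abs(y_true - y_pred) ** p)

def loss_index(error, params, l1=0.0, l2=0.0):
    """Loss index = error term + L1/L2 regularization terms."""
    return error + l1 * np.sum(np.abs(params)) + l2 * np.sum(params ** 2)
```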
In order to train the ANN, we generated a sequence of parameter vectors so that the loss index was reduced at each iteration of the algorithm.
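A minimal sketch of this iterative scheme, using plain gradient descent for illustration (the `loss` and `gradient` callables, learning rate and stopping rule are assumptions, not the paper's settings):

```python
import numpy as np

def train(loss, gradient, theta0, learning_rate=0.01,
          max_iterations=1000, tolerance=1e-6):
    """Generate a sequence of parameter vectors theta_0, theta_1, ...
    such that the loss index decreases at each iteration."""
    theta = np.asarray(theta0, dtype=float)
    previous = loss(theta)
    for _ in range(max_iterations):
        theta = theta - learning_rate * gradient(theta)  # parameter update
        current = loss(theta)
        if previous - current < tolerance:  # stop when no longer improving
            break
        previous = current
    return theta
```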

Results
Five different optimization algorithms were used and compared, each with different calculation and storage requirements: gradient descent [11], conjugate gradient, quasi-Newton method, the Levenberg-Marquardt algorithm [12] and adaptive linear momentum [13].

In order to scale the inputs, we calculated the following parameters: the minimum, the maximum, the mean and the standard deviation (see Table 1). The ANN architecture of version 1 is presented in Figure 2. The architectures of the following versions (2, 3 and 4) also consisted of a single hidden perceptron layer and had the same input and output layers as version 1; their hidden layers contained three (version 2), one (version 3) and six (version 4) nodes, respectively. We used the adaptive moment estimation (version 1), Levenberg-Marquardt (version 2), gradient descent (version 3) and conjugate gradient (version 4) optimization algorithms, as well as the growing neurons selection method (all versions) with the L1 (versions 2, 3 and 4) and L2 (version 1) regularization methods.
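A minimal sketch of the input scaling step, computing the four statistics per input and applying mean/standard-deviation scaling (the choice of standardization over min-max scaling is an assumption here):

```python
import numpy as np

def scaling_parameters(X):
    """Per-descriptor minimum, maximum, mean and standard deviation."""
    return {"minimum": X.min(axis=0), "maximum": X.max(axis=0),
            "mean": X.mean(axis=0), "std": X.std(axis=0)}

def scale_inputs(X, params):
    """Standardize each input using the stored mean and deviation."""
    return (X - params["mean"]) / np.where(params["std"] == 0, 1, params["std"])
```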


Discussion
The confusion matrices, calculated for each architecture and 30 testing samples, are presented in Tables 2-5, and the error results are highlighted in Table 6. The activation functions used were the hyperbolic tangent (version 1), the rectified linear unit (versions 2, 3 and 4) and softmax (all versions). In order to test and compare the performances of the analyzed ANNs, we used the weighted averages derived from the confusion matrix, i.e., the accuracy, the recall and the F1 score (see Table 7) [14].
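A minimal sketch of deriving these weighted-average metrics from a multi-class confusion matrix, assuming rows hold true classes and columns hold predicted classes; the 3 × 3 example values are illustrative, not the paper's results:

```python
import numpy as np

def weighted_metrics(cm):
    """Weighted-average recall and F1 score plus overall accuracy from a
    confusion matrix, where cm[i, j] counts true class i predicted as j."""
    cm = np.asarray(cm, dtype=float)
    support = cm.sum(axis=1)            # samples per true class
    tp = np.diag(cm)                    # correctly classified per class
    recall = tp / support
    precision = tp / cm.sum(axis=0)
    f1 = 2 * precision * recall / (precision + recall)
    weights = support / support.sum()   # class weights for the average
    return {"accuracy": tp.sum() / cm.sum(),
            "recall": float(weights @ recall),
            "f1": float(weights @ f1)}

# Example: a 3-class matrix over 30 testing samples
cm = [[10, 0, 0], [1, 9, 0], [0, 0, 10]]
print(weighted_metrics(cm))             # accuracy = 29/30 ≈ 0.967
```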


Conclusions
In terms of system performance, the results obtained for the four ANNs designed to recognize the class identity of JWH synthetic cannabinoids lead to the following conclusions:
1. Accuracy [(true positives + true negatives)/total instances]: compared with the accuracy (93.3%) obtained for the initial ANN model presented in a previous article, the amended version 1 generated a higher score (96.7%), while the other three ANNs generated lower scores (86.7% for the amended version 3 and 90.0% for the amended versions 2 and 4). With 30 testing samples, 96.7% corresponds to 29 correctly classified compounds, versus 28 for the initial model's 93.3%.