Deep Neural Fuzzy System Oriented toward High-Dimensional Data and Interpretable Artificial Intelligence

Abstract: Fuzzy systems (FSs) are popular and interpretable machine learning methods, represented by the adaptive neuro-fuzzy inference system (ANFIS). However, they have difficulty dealing with high-dimensional data due to the curse of dimensionality. To effectively handle high-dimensional data and ensure optimal performance, this paper presents a deep neural fuzzy system (DNFS) based on the subtractive clustering-based ANFIS (SC-ANFIS). Inspired by deep learning, the SC-ANFIS is proposed and adopted as a submodule to construct the DNFS in a bottom-up way. Through the ensemble learning and hierarchical learning of submodules, the DNFS can not only achieve faster convergence, but also complete the computation in a reasonable time with high accuracy and interpretability. By adjusting the deep structure and the parameters of the DNFS, the performance can be improved further. This paper also presents an in-depth study of the structure and the combination of the submodule inputs for the DNFS. Experimental results on five regression datasets with various dimensionalities demonstrated that the proposed DNFS can not only overcome the curse of dimensionality, but also achieve higher accuracy, less complexity, and better interpretability than previous FSs. The superiority of the DNFS over other recent algorithms is also validated, especially when the dimensionality of the data is higher. Furthermore, the DNFS built with five inputs for each submodule and two inputs shared between adjacent submodules had the best performance. The performance of the DNFS can be improved by distributing the features with high correlation with the output across the submodules. Given the results of the current study, it is expected that the DNFS can be used to solve general high-dimensional regression problems efficiently, with high accuracy and better interpretability.


Introduction
Fuzzy systems (FSs) originated from the fuzzy set theory proposed by Zadeh in 1965 [1] and are based on fuzzy rules. FSs are interpretable and intuitive methods in which each fuzzy rule matches a region of the input-output data [2]. As soft computing techniques, FSs have achieved great success in dealing with numerous problems of uncertainty, such as the seismic vulnerability assessment of buildings [3][4][5], the prediction of the irrigation water infiltration rate [6], the automatic classification of crop disease images [7], etc.
However, analysis of the research status reveals three main challenges in developing optimal FSs: (1) Optimization: Optimizing FSs within a valid period of time to achieve higher accuracy and faster convergence is worthy of in-depth study, and also challenging. (2) The curse of dimensionality: The number of fuzzy rules increases exponentially with the dimensionality of the input, so the computation cannot be completed within a reasonable time. Hence, it is difficult for FSs to deal with high-dimensional problems. (3) Interpretability: Interpretability is the advantage that distinguishes FSs from other machine learning models. However, as the number of fuzzy rules increases, the interpretability of FSs suffers. Therefore, how to solve the high-dimensional problem while ensuring better interpretability is one of the current bottlenecks.
In recent years, great efforts have been made to improve FSs [8]. Evolutionary algorithms (EAs) [9,10], the gradient descent (GD) algorithm [11], and GD plus least squares estimation (LSE) [12] have been proposed to optimize FSs. Although EAs can search for the optimal solution given enough iterations, their computational cost is too expensive to be suitable for the optimization of FSs. The adaptive neuro-fuzzy inference system (ANFIS) proposed by Jang and trained by GD plus LSE [12] has been widely applied in national energy demand forecasting [13,14], flood sensitivity forecasting [15], geographic temperature forecasting [16], heart disease classification [17], and so on. However, the convergence of GD or GD plus LSE is very slow, and they easily fall into local optima.
In view of the above problems, new techniques based on the ANFIS optimized by EAs have been emerging. For example, Azar et al. proposed an improved ANFIS based on the Harris hawks optimization evolutionary algorithm and demonstrated its effectiveness on the prediction of the longitudinal dispersion coefficient of natural rivers [18]. Xu et al. proposed an ANFIS-PSO method based on an improved particle swarm optimization algorithm and successfully applied it to the evaluation of tool wear [19]. Kaur et al. combined the genetic algorithm with the ANFIS to further improve the prediction accuracy of the waterborne disease cholera [20]. In addition, a method of optimizing the ANFIS based on an improved firefly algorithm and the differential evolution optimization algorithm was proposed by Balasubramanian [21], which has been proven to be effective in the application of medical disease prediction. Ehteram et al. used three optimization algorithms (the sine-cosine algorithm, particle swarm optimization algorithm, and firefly algorithm) to optimize the ANFIS to improve the prediction accuracy [22]. However, all the above methods only focus on the optimization of FSs and still struggle with high-dimensional data.
In order to enable FSs to process high-dimensional data, principal component analysis (PCA) [23] is widely used for dimensionality reduction. Razin et al. combined the ANFIS with PCA to predict the Iranian ionospheric time series, which can shorten the convergence time and obtain the optimal solution [24]. Phan et al. solved the problem of predicting the fracture pressure of defective pipelines more effectively through the combination of PCA and the ANFIS [25]. Meanwhile, it is worth noting the efficient training algorithm named MBGD-RDA proposed by Wu et al. for TSK FSs in 2020 [26]. PCA was applied to constrain the maximum input dimensionality to five, and several novel techniques were proposed in MBGD-RDA. However, the performance of the above methods suffers because PCA discards important features, especially for high-dimensional data. Therefore, the curse of dimensionality has not been fundamentally solved for FSs.
The primary target of this study was to enable FSs to effectively solve high-dimensional regression problems while ensuring performance and interpretability. The main contributions of this paper are as follows: (1) Inspired by deep learning, the subtractive clustering-based ANFIS (SC-ANFIS) is proposed and adopted as a submodule to construct the deep neural fuzzy system (DNFS) in a bottom-up way; (2) Combined with ensemble learning and hierarchical learning, the DNFS, which can solve general high-dimensional regression problems efficiently with high accuracy and interpretability, is proposed; (3) The effects of the deep structure and the combination of submodule inputs on the performance of the DNFS are studied in depth; (4) The effectiveness and superiority of the DNFS are validated on five real-world datasets with various dimensionalities.
The remainder of this paper is organized as follows: Section 2 introduces the proposed DNFS algorithm. Section 3 describes the datasets and performance indices used in the experiments. Section 4 presents the experimental results to verify the effectiveness of the DNFS. Section 5 draws the conclusion and points out the directions for future research.

The SC-ANFIS Submodule
The structure of the ANFIS is composed of an adaptive neural network and an FS. It not only inherits the adaptive learning ability of neural networks, but also keeps the interpretability of FSs. The ANFIS can adjust the parameters according to prior knowledge so that the predicted values are closer to the target values, which has achieved great success in many applications [27]. The SC-ANFIS is proposed in this paper, which applies subtractive clustering (SC) to construct a Sugeno fuzzy inference system in the ANFIS. The SC-ANFIS can effectively avoid the combinatorial explosion of fuzzy rules when the dimensionality of the input is very high. In addition, the fuzzy rules generated by SC are more consistent with the data than those obtained without clustering. The input space can be divided appropriately, and the number of membership functions (MFs) and the parameters for each input domain can be reasonably determined [28]. There are two other well-known methods to construct fuzzy inference systems: (1) grid partitioning; (2) fuzzy c-means clustering [29]. It has been proven that SC is better than other algorithms [30,31], and it was adopted as the method to generate the fuzzy inference system in the ANFIS.
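The rule-generation step can be illustrated with a minimal sketch of Chiu-style subtractive clustering (written in Python for brevity; the paper's experiments use MATLAB). The radius `ra` and the stopping ratio are illustrative parameter choices, not values taken from the paper.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, ratio=0.15):
    """Chiu-style subtractive clustering: every centre found seeds one
    fuzzy rule, so the rule count follows the data rather than growing
    exponentially with the input dimensionality."""
    alpha = 4.0 / ra**2
    beta = 4.0 / (1.5 * ra)**2               # wider radius for potential reduction
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    P = np.exp(-alpha * d2).sum(axis=1)      # initial potential of each point
    first_peak = P.max()
    centers = []
    while P.max() > ratio * first_peak:      # stop once potentials fade out
        c = int(P.argmax())
        centers.append(X[c])
        P = P - P[c] * np.exp(-beta * d2[:, c])  # suppress nearby potentials
    return np.array(centers)
```

Each returned centre, together with a width derived from `ra`, defines one Gaussian MF per input and hence one fuzzy rule of the Sugeno system.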
There are several types of MFs in the ANFIS (e.g., triangular, trapezoidal, generalized bell, Gaussian, etc.) that can be applied [32]. The MF used in the SC-ANFIS is Gaussian, considering that it is nonzero, simple, and smooth and has fewer parameters than other MFs [33]. Furthermore, relevant studies have demonstrated that the Gaussian MF performs better than others in many nonlinear complex problems [34,35]. The general ANFIS structure, which has five layers and two inputs, is as follows:

Layer 1: Calculate the MF values for each input domain:

$O_i^1 = \mu_{A_i}(x_1), \ i = 1, 2; \qquad O_i^1 = \mu_{B_{i-2}}(x_2), \ i = 3, 4 \qquad (1)$

where $x_1, x_2$ are the inputs and $\mu_{A_i}(x)$ and $\mu_{B_{i-2}}(x)$ are the corresponding MF values, which can be expressed as in Equation (2):

$\mu(x) = \exp\!\left(-\frac{(x - c_i)^2}{2\sigma_i^2}\right) \qquad (2)$

where $c_i$ and $\sigma_i$ are the parameters of the Gaussian MF that determine its shape for each input;

Layer 2: Multiply the MF values of the inputs to obtain the firing strength of the corresponding fuzzy rule:

$O_i^2 = w_i = \mu_{A_i}(x_1)\,\mu_{B_i}(x_2), \ i = 1, 2 \qquad (3)$

Layer 3: Normalize the firing strength of each fuzzy rule, which represents the weight of the rule among all rules:

$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \ i = 1, 2 \qquad (4)$

Layer 4: Calculate the output of each fuzzy rule:

$O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x_1 + q_i x_2 + r_i) \qquad (5)$

where $\{p_i, q_i, r_i\}$ are the consequent parameters of each fuzzy rule;

Layer 5: Aggregate the outputs of all fuzzy rules and calculate the final output:

$O^5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \qquad (6)$

where $O_i^k$ is the output of node i in the k-th layer.
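As a concrete illustration of the five-layer computation, the following sketch evaluates a two-input Sugeno ANFIS with Gaussian MFs (Python is used for illustration; the paper's implementation is in MATLAB, and pairing one rule per MF pair is a simplification):

```python
import numpy as np

def gauss(x, c, s):
    # Gaussian MF of the SC-ANFIS: exp(-(x - c)^2 / (2 s^2))
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def anfis_forward(x1, x2, mf1, mf2, conseq):
    """Forward pass of a two-input Sugeno ANFIS.
    mf1/mf2: list of (c, sigma) per input; conseq: (p, q, r) per rule."""
    # Layer 1: membership degrees for each input domain
    a = [gauss(x1, c, s) for c, s in mf1]
    b = [gauss(x2, c, s) for c, s in mf2]
    # Layer 2: rule firing strengths (one rule per MF pair here)
    w = [ai * bi for ai, bi in zip(a, b)]
    # Layer 3: normalisation of the firing strengths
    wbar = [wi / sum(w) for wi in w]
    # Layers 4-5: weighted Sugeno consequents, summed into the output
    return sum(wb * (p * x1 + q * x2 + r) for wb, (p, q, r) in zip(wbar, conseq))
```

Because the normalized weights sum to one, rules with identical consequents reproduce that consequent exactly, which is a quick sanity check on the implementation.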

The Structure of the DNFS Algorithm
This paper employed the SC-ANFIS as a submodule to construct the DNFS. By dividing the high-dimensional input into several m-dimensional groups, the results are obtained efficiently by the individual submodules.
The number of inputs for each submodule (m) and the number of inputs shared between adjacent submodules (n) need to be initialized, so as to construct DNFSm-n. Figure 1 shows the general structure diagram of DNFS5-2.
The structure of DNFSm-n is defined as follows:
Layer 1: The number of inputs of each submodule is m, and n inputs are shared between adjacent submodules. Each submodule works separately and obtains its own result;
Layer 2: The outputs of the submodules in the first layer are merged into a new dataset, which is used as the input of the second layer. The inputs are grouped in the same way as in the first layer;
Similarly, the inputs of layer L consist of the outputs of the submodules of layer L−1, grouped into submodules in the same way as in the first layer. The submodules of each layer are built bottom-up until the inputs of the last layer are only enough to build one submodule, which produces the final result.
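The grouping can be sketched as a sliding window over the feature indices with stride m − n; this is an assumed reading of the layer definition (the paper does not give the grouping formula explicitly), and the wrap-around padding mirrors the supplementation rule described in the implementation steps:

```python
def group_inputs(d, m, n):
    """Assign feature indices 0..d-1 to submodules of m inputs, with n
    inputs shared between adjacent submodules (assumed stride of m - n).
    A short trailing group is padded with features from the first group."""
    step = m - n
    groups, start = [], 0
    while start < d:
        g = list(range(start, min(start + m, d)))
        k = 0
        while len(g) < m:          # supplement a short final group
            g.append(k)
            k += 1
        groups.append(g)
        if start + m >= d:         # last window reached the final feature
            break
        start += step
    return groups
```

For a 13-feature dataset, DNFS5-0 yields three disjoint groups, while DNFS5-2 yields four overlapping groups, matching the intuition that sharing inputs deepens and widens the structure.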

The Implementation Steps of the DNFS Algorithm
This section mainly introduces the implementation steps of the DNFS algorithm. The flowchart of the proposed DNFSm-n is shown in Figure 2. The execution process of DNFSm-n is as follows:
Step 1. Data preprocessing: Normalize each numerical feature with mapminmax, and divide the dataset into a training set and a test set;
Step 2. Define the structure of the DNFS: Determine m, n, the number of layers, and the number of submodules in each layer;
Step 3. Group the training set and the test set: Divide the training set and the test set into several groups of m dimensions, each group corresponding to one submodule;
Step 4. Train the submodules: Traverse each submodule of the current layer, and feed the corresponding group of the training set into it for training;
Step 5. Test the submodules: Feed the corresponding group of the test set into each trained submodule for testing;
Step 6. Determine whether the DNFS is complete: If the current layer is the last one of the DNFS structure, take the current outputs as the final output and go to Step 7. Otherwise, merge the outputs of the current layer into a new dataset, adopt it as the input of the next layer, and return to Step 3;
Step 7. Obtain the final result.
There is a special case in Step 3: a group needs to be supplemented if there are not enough remaining features to constitute m-dimensional inputs. The main situations are as follows:
• If this happens when the training set and the test set are grouped in the first layer, features are selected in turn from the first group already divided to supplement it;
• If this happens when they are grouped in the second through the last layer, all the outputs of the upper layer are sorted by error value in ascending order, and the outputs with the smallest errors are selected according to the number of missing inputs.
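Steps 3-7 can be condensed into a minimal end-to-end sketch (Python for illustration; `fit_submodule` is a hypothetical stand-in for SC-ANFIS training, and, to keep the sketch short, a short trailing group is simply left short rather than supplemented as described above):

```python
import numpy as np

def dnfs_fit_predict(X_train, y_train, X_test, m, n, fit_submodule):
    """Bottom-up DNFS pipeline: group the features, train one submodule
    per group against the same target, stack the submodule outputs as the
    next layer's inputs, and stop once a single submodule remains."""
    Tr, Te = X_train, X_test
    while True:
        d = Tr.shape[1]
        # sliding-window grouping with stride m - n (simplified)
        groups = [list(range(s, min(s + m, d))) for s in range(0, d, m - n)]
        preds_tr, preds_te = [], []
        for g in groups:
            predict = fit_submodule(Tr[:, g], y_train)   # train on this group
            preds_tr.append(predict(Tr[:, g]))
            preds_te.append(predict(Te[:, g]))
        if len(groups) == 1:              # last layer: a single submodule
            return preds_te[0]
        Tr = np.column_stack(preds_tr)    # outputs feed the next layer
        Te = np.column_stack(preds_te)
```

Any regressor factory can be plugged in as `fit_submodule` to exercise the layer logic before wiring in the actual SC-ANFIS training.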

The Datasets
To demonstrate that the proposed DNFS can effectively solve high-dimensional regression problems, this paper selected five representative datasets with various dimensionalities from the UCI Machine Learning Repository; the specific information of the datasets is shown in Table 1. Sixty percent of the samples were randomly selected for training and the remaining forty percent for testing.

Performance Index
To evaluate the performance of the DNFS algorithm comprehensively, four performance indices were introduced: the mean absolute error (MAE), root mean squared error (RMSE), Akaike information criterion (AIC) [36], and symmetric mean absolute percentage error (SMAPE). MAE, RMSE, and SMAPE mainly measure precision, while AIC considers precision and simplicity simultaneously [37]. They are defined as:

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|$

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$

$\mathrm{SMAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \frac{|\hat{y}_i - y_i|}{(|y_i| + |\hat{y}_i|)/2}$

$\mathrm{AIC} = n \ln\!\left(\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2\right) + 2k$

where n is the number of samples, ŷ and y are the predicted and true values, respectively, and k is the number of optimizable parameters. The number of parameters of a submodule equals the number of antecedent parameters (A × T × S) plus the number of consequent parameters (R × (S + 1)), where A is the number of parameters per MF, T is the number of MFs in each input domain, S is the input dimensionality, and R = T^S is the number of fuzzy rules. The number of parameters of the DNFS equals the sum over its submodules. To reflect the comprehensive performance of each model, this paper also defined an evaluation method that ranks each index; the final score is the sum of the scores over all indices. The experimental platform was a desktop computer running MATLAB 2020a and 64-bit Windows 10 Education, with an Intel Core i7-9700 CPU @ 3.00 GHz, 16 GB of memory, and a 512 GB solid-state drive.
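The indices and the parameter count can be written down directly (a sketch in Python; the AIC form n·ln(MSE) + 2k is the standard regression variant and is assumed to match [36]):

```python
import numpy as np

def metrics(y, yhat, k):
    """MAE, RMSE, SMAPE (in %), and AIC for n samples and k parameters."""
    n = len(y)
    e = yhat - y
    mae = np.abs(e).mean()
    rmse = np.sqrt((e ** 2).mean())
    smape = 100.0 / n * np.sum(np.abs(e) / ((np.abs(y) + np.abs(yhat)) / 2))
    aic = n * np.log((e ** 2).mean()) + 2 * k   # assumed n*ln(MSE) + 2k form
    return mae, rmse, smape, aic

def submodule_params(A, T, S):
    # antecedent A*T*S plus consequent R*(S+1) parameters, with R = T**S rules
    return A * T * S + T ** S * (S + 1)
```

For Gaussian MFs (A = 2) with T = 2 MFs per input and S = 5 inputs, `submodule_params` gives 212, consistent with the MBGD-RDA parameter count quoted later in the paper.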

The Effect of the Number of Submodule Inputs on the DNFS
This section mainly focuses on the effect of the number of submodule inputs on the DNFS for the first three datasets. The corresponding DNFSs with 3, 4, and 5 submodule inputs are DNFS3-0, DNFS4-0, and DNFS5-0, respectively.
The performance comparison of the DNFSs with different submodule inputs on the first three test datasets is shown in Tables 2-4. The prediction charts of the three DNFSs on the first three test datasets are shown in Figure 3. The analysis of the results is as follows: DNFS5-0 outperformed DNFS3-0 and DNFS4-0 in prediction accuracy, so its comprehensive score was always the highest among the three cases. The MAE, RMSE, and SMAPE of DNFS5-0 were smaller than those of the other two algorithms on the three datasets. As shown in Figure 3, DNFS5-0 also achieved the best fitting effect, while the other two algorithms performed relatively poorly.
In terms of model complexity, DNFS5-0 had the fewest layers and submodules, followed by DNFS4-0 and DNFS3-0. It can be concluded that, with fewer submodule inputs, more submodules and layers need to be built to decompose the high-dimensional data. Meanwhile, DNFS5-0 had the most parameters and DNFS3-0 the fewest on average. The AIC of DNFS3-0 was the minimum for the first two datasets, while DNFS5-0 obtained the best AIC on Dataset No. 3.
The average computational time that DNFS3-0, DNFS4-0, and DNFS5-0 spent on the three datasets was 31.21 s, 36.44 s, and 58.32 s, respectively. It can be concluded that DNFS3-0 is the most efficient method among the three cases.
Based on the analysis above, DNFS5-0 can ensure optimal performance with less complexity. Its advantage became more evident as the dimensionality increased, since it divides the high-dimensional data into submodules more rapidly. Therefore, the DNFS algorithm with five submodule inputs was researched further.

The Effect of the Number of Inputs Shared by Adjacent Submodules on the DNFS
The target of this section is to explore the performance of the DNFS with different shared inputs on the first three datasets.
As shown in Tables 5-7, DNFS5-2 had the minimum MAE, RMSE, and SMAPE on the three test datasets, which indicates that the best prediction accuracy was achieved by DNFS5-2. As shown in Figure 4, DNFS5-2 clearly achieved the best fitting effect on the three test datasets among the three cases.
Based on the experimental results, it can also be found that the structures of DNFS5-0 and DNFS5-1 were simpler than that of DNFS5-2. The more inputs that are shared, the more layers and submodules are required for the same number of submodule inputs. Consequently, DNFS5-2 had the most layers, submodules, and parameters on average, which led to its poor AIC value.
When the comprehensive performance is taken into consideration, the superiority of DNFS5-2 is obvious. Its score was the highest among the three DNFSs, and it can process the high-dimensional data within a valid period of time. On the whole, the DNFS without shared inputs had a simpler structure, and that with two shared inputs had better prediction accuracy.

The Effect of the Combination of Submodule Inputs on the DNFS
To reveal the effect of the combination of submodule inputs on the DNFS, this paper introduced DNFS5-2-Random. The implementation steps of DNFS5-2-Random are as follows: each time DNFS5-2 is executed, the input of each layer is randomly scrambled. After DNFS5-2 is executed in this way 10 times, the optimal input order is taken as the final order of DNFS5-2, and the corresponding results are obtained.
Meanwhile, in order to validate the effectiveness and superiority of the DNFS, MBGD-RDA, the latest training algorithm for TSK FSs, was compared with the DNFS. MBGD-RDA was implemented using the MATLAB code provided in [26], and its initial learning parameters were consistent with [26], which were shown to be optimal. In addition, the radial basis function (RBF) network [38], generalized regression neural network (GRNN) [39], and long short-term memory (LSTM) [40], representing general machine learning models, were introduced to further reveal the superiority of the DNFS. They were implemented mainly by calling the newrb, newgrnn, and trainNetwork functions of the Deep Learning Toolbox in MATLAB 2020a, with the learning parameters mainly set to the functions' default values.

The analysis of the effect of different combinations of submodule inputs: By analyzing the structure of the DNFS, the combination of inputs for each submodule of the first layer plays a key role in improving the performance of the DNFS. Therefore, we performed a Pearson correlation analysis between the output variable and the first-layer inputs obtained in the experiments. Figure 5 shows the correlation analysis between the output and the inputs, ordered sequentially and randomly, on the five datasets. It can be seen that the input variables with high correlation values were relatively concentrated, in both the sequential and random cases, on Dataset No. 1. Through the analysis of the results obtained, it can be concluded that the features with a higher correlation with the output should be dispersed into each submodule, so that the performance of each submodule is balanced and the overall performance improved.

The analysis of the performances of the DNFS and the other algorithms: The performance comparison of the DNFS with the other algorithms on the five test datasets is shown in Tables 8-12.
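The balancing conclusion can be sketched as a simple heuristic: rank the features by |Pearson r| with the output and deal them round-robin into the first-layer groups, so each submodule receives some highly informative inputs. The round-robin deal is an illustrative choice, not the paper's exact procedure.

```python
import numpy as np

def disperse_by_correlation(X, y, m):
    """Deal feature indices round-robin into groups of up to m, strongest
    |Pearson r| with the output first, so every first-layer submodule gets
    some highly correlated inputs (the balancing idea behind Figure 5)."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))              # strongest correlation first
    n_groups = int(np.ceil(X.shape[1] / m))
    groups = [[] for _ in range(n_groups)]
    for i, j in enumerate(order):
        groups[i % n_groups].append(int(j))     # round-robin deal
    return groups
```

With two strongly correlated features and two weak ones, the two strong features land in different groups, which is exactly the dispersal the paper recommends.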
The prediction charts for the different algorithms are shown in Figures 6-10. Through observation and analysis, the following information can be obtained: As shown in Figures 6-10, DNFS5-2 achieved the best fitting effect among the algorithms. Meanwhile, Tables 8-12 show that DNFS5-2 had fewer parameters and a simpler structure than RBF, GRNN, and LSTM, which validated that DNFS5-2 has less complexity and higher interpretability. On the contrary, the parameters of the three machine learning models increased exponentially with the input dimensionality, so their interpretability was very poor. In addition, MBGD-RDA constrained the maximum input dimensionality to five, so that its number of parameters was 212 (5 × 2 × 2 + 2^5 × (5 + 1) = 212), given Gaussian MFs with two per input domain. On all five datasets, MBGD-RDA had far fewer parameters than the other algorithms, resulting in the minimum AIC. However, the prediction accuracy of MBGD-RDA was not optimal, probably because of the loss of important features due to PCA, which greatly affected its performance.
The average computational time that DNFS5-2, MBGD-RDA, RBF, GRNN, and LSTM spent on the five datasets was 52.99 s, 50.39 s, 13.31 s, 1.23 s, and 18.22 s, respectively. Obviously, GRNN was the most efficient method among all the algorithms. Although the efficiency of DNFS5-2 was relatively poor, it achieved the best performance within a valid period of time, which previous FSs could not. Additionally, based on Tables 8-12, the computational cost of each algorithm increased with the number of features and samples.

The analysis of the generalization ability of the DNFS: The performance comparison of the DNFS on the five datasets for training and testing is shown in Figure 11. Figure 11a shows the test MAE comparison of the DNFS between the training and testing levels, while Figure 11b,c shows the corresponding RMSE and SMAPE comparisons, respectively. It can be seen that the DNFS achieved excellent performance on both the training and testing levels, and its performance on the two levels was similar, in particular for Datasets No. 1 and No. 3. It is evident that the DNFS has excellent generalization performance given enough samples; however, the test performance is slightly worse than the training performance when there are fewer samples. Therefore, regularization will be introduced in future research to enhance the generalization ability of the DNFS.

Conclusions
FSs are well-known machine learning models, but have difficulty dealing with high-dimensional data. This paper proposed the DNFS to enable FSs to effectively solve high-dimensional regression problems while ensuring accuracy and interpretability. Inspired by deep learning, the SC-ANFIS was proposed and adopted as a submodule to construct the structure of the DNFS in a bottom-up way. This paper also performed an in-depth study of the deep structure and the combination of submodule inputs to further improve the performance of the DNFS.
The experimental results on five real-world regression datasets indicated the following:
1. The DNFS had less complexity and better interpretability. The number of parameters of the DNFS was far smaller than that of the other algorithms, especially when the dimensionality of the input was very high, so the structure of the DNFS is simpler and more interpretable;
2. The DNFS can solve high-dimensional regression problems in a more reasonable time than previous FSs, while ensuring excellent performance;
3. The DNFS with five submodule inputs and two shared inputs had the best comprehensive performance. Additionally, by dispersing the features with a high correlation with the output into each submodule, a better improvement of the DNFS could be achieved.
Additionally, this paper proposes future research directions. On the one hand, how to reduce the time consumption under the same structure will be further considered; removing the submodules with poor performance or too many fuzzy rules is the current preliminary idea, which is likely to reduce the computational cost and improve the accuracy. On the other hand, how to determine the optimal input combination for each submodule is another future direction.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: