FNNS: An Effective Feedforward Neural Network Scheme with Random Weights for Processing Large-Scale Datasets

Abstract: The size of datasets is growing exponentially as information technology advances, and it is becoming increasingly important to provide efficient learning algorithms for neural networks that handle massive amounts of data. Due to their potential for handling huge datasets, feed-forward neural networks with random weights (FNNRWs) have drawn a lot of attention. In this paper, we introduce an efficient feed-forward neural network scheme (FNNS) with random weights for processing massive datasets. The FNNS divides large-scale data into subsets of the same size, and each subset derives the corresponding submodel. According to the activation function, the optimal range of input weights and biases is calculated. The input weights and biases are randomly generated in this range, and an iterative scheme is used to evaluate the output weights. The MNIST dataset was used as the basis for experiments. The experimental results demonstrate that the algorithm has a promising future in processing massive datasets.


Introduction
Feed-forward neural networks (FNNs) have gained increasing attention in recent years because of their flexible structural design and strong representational capacity. Feed-forward neural networks [1], which have adaptive characteristics and universal approximation capability, have been widely used in regression and classification. In addition, they offer models for studying a wide range of natural and artificial processes and have been used in numerous technical and scientific domains [2]. In traditional neural network theory, all the parameters of FNNs, such as input weights, biases, and output weights, need to be adjusted under specific conditions. The hierarchy of the network structure, however, makes this process complex and inefficient. The usual approach is a gradient-based optimization method, such as the BP algorithm, but such methods typically suffer from local minima, slow convergence, and sensitivity to the learning rate. In addition, some parameters, such as the hidden node count or the learning algorithm parameters, need to be tuned manually. To solve this series of problems, Schmidt, Kraaijveld, and Duin first proposed the FNNRW in 1992 [3]. Since the input weights and biases are randomly drawn from a uniform distribution on [−1, 1], the output weights may be evaluated and estimated using the well-known least-squares approach. Many simulation results in the literature show that the randomized model achieves higher performance than the fully adaptive model while providing simpler implementation and faster training [4].
Theoretically, it is clear that the capability of global approximation cannot be guaranteed by the random distribution of input weights and biases [5,6]. For a variety of reasons, many random learning algorithms emerge endlessly. Ref. [7] suggested a feed-forward neural network learning method with random weights. Ref. [8] studied the sparse algorithm of random weight networks and its applications. Ref. [9] presented a random single hidden layer feed-forward neural network metaheuristic optimization research. The authors of [10] carried out a study on distributed learning of random vector function chain networks. Ref. [11] proposed a probability learning algorithm based on a random weight neural network for robust modeling.
In addition, Ref. [12] provided a complete discussion of the randomization methods for neural networks. In order to ensure the universal approximation property, constraints on the random weights and biases have driven the development of neural network randomization methods [13]. However, data are in a period of rapid expansion due to the ongoing advancement of information technology. The resulting problem is that the number of data samples or hidden-layer nodes in the NNRW model becomes very large, so computing the output weights is very time-consuming. In response to this problem, there have been many studies on large-scale data modeling over the past few decades. Ref. [14] explained how to efficiently train language models using neural networks on big datasets. Ref. [15] provided a kernel framework for energy-efficient large-scale classification and data modeling. Ref. [16] examined a population density method that makes large-scale neural network modeling possible. Ref. [17] presented a framework for parallel computing to train massive neural networks. Ref. [18] presented a multiprocessor computer for simulating neural networks on a huge scale. Reducing the size of the dataset by subsampling is perhaps the easiest method for dealing with enormous datasets. Osuna E, Freund R, and Girosi F were the first to suggest this decomposition technique [19,20]. Bao-Liang Lu and Ito, M. (1999) also proposed this method [21], which is used to solve pattern classification problems. However, the method used for large-scale data processing in this paper is similar to the way the Bayesian committee SVM proposed by Tresp et al. handles large-scale data [22,23]. In this approach, the dataset is split into equal-sized parts, a model is generated from each subset, each submodel is trained independently, and their outputs are combined to obtain the final decision.
This study examines a feed-forward neural network model for big datasets that uses random weights and a decomposition approach. In this study, the data were divided into tiny subsets of the same size at random, and each subset was then utilized to generate the associated submodel. The weights and biases of the hidden nodes that determine the nonlinear feature mapping are set randomly and are not learned in the feed-forward neural network with random weights. It is crucial to pick the right interval when selecting the weights and deviations. This topic has not been fully discussed in many studies. The method used in this paper calculates the optimal range of input weights and biases according to the activation function, and each submodel initializes the same input weight and biases within the optimal range. At the same time, an iterative scheme is adopted to evaluate the output weight.
The rest of this article is divided into the following sections. Section 1 introduces the traditional random weight feed-forward neural network learning algorithm. Section 2 details the work related to this paper. Section 3 describes in detail the optimized random weight feed-forward neural network learning algorithm proposed in this paper. In Section 4, the experimental simulation results are shown, the algorithm's performance is examined and appraised, and the possibility of an engineering application is discussed. Section 5 concludes this paper and outlines future work.

The Related Work
This section introduces the development of feed-forward neural networks (FNNs) and related works. It first discusses the history of artificial neural networks (ANNs) and the use of feed-forward neural networks in real-world applications, then discusses random weight feed-forward neural networks and their optimization, and finally presents our optimization strategy.
An artificial neural network (ANN), also referred to as a neural network (NN), is a mathematical model of hierarchically distributed information processing that imitates the behavioral characteristics of animal brain neural networks [24]. Through the relationships between neurons, mainly by adjusting the connections between a large number of internal nodes, it achieves the purpose of data processing [25]. The logician W. Pitts and the neurophysiologist W.S. McCulloch created the mathematical MP model of neural networks in 1943. The age of artificial neural network research began when they proposed a formal mathematical description of neurons and the network structure approach through the MP model, and demonstrated that a single neuron can carry out logical functions [26]. Artificial neural networks have many model structures, and feed-forward neural networks are only one of them [27].
Frank Rosenblatt created the perceptron, an artificial neural network, in 1957. The perceptron is a simple neural network in which the neurons are arranged in layers and each neuron is connected only to the previous layer: the output of the previous layer is received and passed on to the next layer, with no feedback between neurons within a layer. This is the earliest form of a feed-forward neural network (FNN). The feed-forward neural network is one of the most popular and rapidly evolving artificial neural networks due to its straightforward construction. The study of feed-forward neural networks started in the 1960s, and both theoretical and practical advances have been made. The FNN can be regarded as a multilayer perceptron with full connections between adjacent layers, and it is a typical deep learning model. With large data samples it performs outstandingly and can solve problems that some traditional machine learning models cannot. However, deep learning models are complex, and with small data samples the process is difficult to explain. The FNN shares these characteristics, so it is mainly used in scenarios with large datasets. A feed-forward neural network-based approach for creating rocket trajectories online is presented in [28], where the trajectory is approximated using the neural network's nonlinear mapping capability. In [29], deep feed-forward neural networks are used to study source term inversion of nuclear accidents, and the Bayesian MCMC technique is used to examine the DFNN's prediction uncertainty when the input parameters are uncertain. In [30], combined with chaotic encryption of the polarization division multiplexing OFDM/OQAM system, a feed-forward neural network is used to increase data transmission security and realize a huge key space.
In [31], a feed-forward backpropagation artificial neural network is used to predict the force response of linear structures, which helps researchers understand the mechanical response properties of complex joints with special nonlinearities. In [32], the initial weight approach and the construction algorithm are combined to form a novel method of feed-forward neural network multi-class classification that can achieve high success rates. In [33], hybrid MVO and FNN were improved for fault identification of WSN cluster head data. It is clear that feed-forward neural networks are used in a variety of industries.
Feed-forward neural networks demonstrate the superiority of mathematical models in a variety of applications, but as the size of datasets keeps growing, the original performance of feed-forward neural networks cannot keep up with the demands of engineering. As a result, many researchers have focused on developing improved feed-forward neural networks to deal with large-scale datasets. The feed-forward neural network is optimized primarily from two aspects: on the one hand, the selection of the random weight range is used to enhance the performance of the algorithm, as mentioned in [33][34][35][36][37]; on the other hand, large datasets are processed using sample selection methods, as described in [38,39]. The method of random weight optimization is the one most commonly used by scholars. Because they offer effective learning capabilities, feed-forward neural networks are frequently employed in mathematical modeling [34]. Recently, advanced stochastic learning algorithms have gradually been developed; a feed-forward neural network with random hidden nodes was proposed in [34], where weights and biases are generated randomly depending on the input data and the kind of activation function, allowing the model's level of generalization to be controlled. In [35], an iterative training solution for large-scale datasets is developed, and a regularization model is used to initially generate a learning model with improved generalization ability, achieving good applicability and effectiveness on large-scale datasets. In [36], a distributed learning algorithm for feed-forward neural networks with random weights is proposed using an event-triggered communication scheme; to reduce needless transmission overhead, the method adopts a discrete-time zero-gradient-sum strategy and introduces an event-triggered communication approach.
In [37], a new method for initializing feed-forward neural network weights is proposed, which linearizes the whole network at the equilibrium point to set the initial weights and biases. Ref. [40] offered a linear algebraic approach based on the Cauchy inequality that guarantees the outputs of neurons lie in the active region, speeds up convergence, and drastically decreases the number of algorithm iterations. Other scholars address these topics from the sampling side: Ref. [38] combines Monte Carlo simulations (MCSs) with efficient sampling for feed-forward neural networks (FNNs), and the authors of [39] put forward an incremental learning method based on a hybrid fuzzy neural network framework that improves the performance of the algorithm from the perspective of the dataset.
Feed-forward neural network optimization has drawn a lot of interest in the era of big data. Current optimization work mainly either randomizes the weights or reduces the sample size, but each direction improves performance only in isolation: sampling-based feature extraction is not exhaustive in practice, and using random weights alone on large-scale datasets yields poor performance. In order to handle enormous datasets, this research therefore suggests a random weight feed-forward neural network scheme based on a decomposition approach. This scheme preserves the integrity of the dataset while retaining the random-weight performance of feed-forward neural networks.

FNNRW Learning Algorithm
Feed-forward neural networks have been widely used in many fields. FNNRWs are generally described as

f(x) = ∑_{i=1}^{N} β_i g(ω_i x + b_i), (1)

where N is the number of hidden nodes; x = [x_1, x_2, · · · , x_n]^T ∈ R^n is the input; ω_i = [ω_i1, ω_i2, · · · , ω_in] ∈ R^n and β_i ∈ R are the input and output weights connecting the ith hidden node and the output node, respectively; b_i ∈ R is the bias; and g(·) is the activation function, which generally adopts the common sigmoid function shown in Equation (2):

g(z) = 1 / (1 + e^{−z}). (2)
In FNNRWs, as shown in Figure 1, input weights and biases are distributed uniformly at random, ω_i ∼ U(ω_min, ω_max), b_i ∼ U(b_min, b_max). The well-known least-squares approach may be used to analytically compute the output weights:

min_β ‖Hβ − T‖², (3)

which gives

β = (H^T H)^{−1} H^T T. (4)

The least-squares problem is typically ill-posed, though, so the ℓ2 regularization method can be used to solve this kind of problem, i.e.,

min_β ‖Hβ − T‖² + µ‖β‖², (5)

where µ > 0 is a regularization factor. If µ is given such that H^T H + µI is invertible, then the minimizer of Equation (5) is easily described as

β = (H^T H + µI)^{−1} H^T T, (6)

where I represents the identity matrix.
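For concreteness, the baseline FNNRW training described above can be sketched in a few lines of NumPy (a minimal illustration; the function names `train_fnnrw` and `predict` are ours, and the uniform sampling on [−1, 1] follows the classic scheme):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_fnnrw(X, T, N=50, mu=1e-3, rng=None):
    """Classic FNNRW: random input weights/biases, least-squares output weights.
    X: (n_samples, n_features) inputs, T: (n_samples, n_outputs) targets."""
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    # Input weights and biases drawn uniformly from [-1, 1] and never trained
    W = rng.uniform(-1.0, 1.0, size=(n_features, N))
    b = rng.uniform(-1.0, 1.0, size=N)
    H = sigmoid(X @ W + b)  # hidden-layer output matrix
    # Regularized least squares: beta = (H^T H + mu*I)^{-1} H^T T
    beta = np.linalg.solve(H.T @ H + mu * np.eye(N), H.T @ T)
    return W, b, beta

def predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming an explicit matrix inverse, which is both faster and numerically safer than computing (H^T H + µI)^{−1} directly.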

Improved FNNRW Learning Algorithm
In large-scale data, as shown in Figure 2, the sample is randomly divided into m parts, s = {s_1, s_2, · · · , s_m}. For each subset s_i, the corresponding submodel is derived and the same input weights and biases are initialized. The hidden-layer output matrix H_si is then calculated, for which H_si^T H_si is positive definite. The whole problem can be described as

min_β ∑_{i=1}^{m} ‖H_si β − T_si‖², (7)

where H_si and T_si denote the hidden output matrix and target output of the ith local model, respectively.
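The random equal-size partition can be sketched as follows (an illustrative helper; the name `split_dataset` is ours, and for simplicity leftover samples that do not divide evenly are dropped):

```python
import numpy as np

def split_dataset(X, T, m, rng=None):
    """Randomly partition (X, T) into m equal-sized subsets s_1..s_m.
    Leftover samples that do not divide evenly by m are dropped."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))        # random shuffle of sample indices
    size = len(X) // m                   # common subset size
    parts = np.split(idx[: size * m], m)
    return [(X[p], T[p]) for p in parts]
```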
In this algorithm, the most common sigmoid function is used for the activation function. For convenience, the activation function can be denoted as

g(x) = 1 / (1 + e^{−(ωx+b)}). (9)

The parameters ω and b govern the position and shape of the graph of g(x) along the x axis. The derivative of the activation function is shown in Equation (10):

g′(x) = ω g(x)(1 − g(x)). (10)

When ω > 0, the derivative is also greater than zero, so the slope of g(x) is greater than zero. Similarly, when ω < 0, the slope of the activation function g(x) is less than zero, so ω can be used as the slope parameter of g(x). According to Figure 3, when x = 0 and b = 0, g(x) = 0.5; when x = 1, g(x) = r. Setting g(1) = r with b = 0, we get

r = 1 / (1 + e^{−ω}). (11)

After transformations we obtain

ω_1 = ln(r / (1 − r)). (12)

Assuming that the fitting curve of the activation function is not as flat as that of the sigmoid function, then

|ω| ≥ |ω_1|. (13)

In order to determine the range of parameters more accurately, let ω_2 = s · ω_1, where s > 1 is used to define the maximum input weight and the steepest part of the activation function. The slope parameter of the ith activation function may be drawn from the range

ω_i ∈ [ω_1, ω_2]. (14)

After substituting Equation (12) into Equation (14), we obtain

ω_i ∈ [ln(r/(1 − r)), s · ln(r/(1 − r))]. (15)

Parameter s determines the steepest part of the activation function, and its specific value is determined by the target function.
As for the determination of parameter b, when x ∈ [0, 1], according to Figure 3 the inflection point −b/ω of g(x) should lie inside the input domain, so we can get

0 ≤ −b/ω ≤ 1. (16)

Following transformations, we get

b = −ωx, x ∈ [0, 1]. (17)

For x = 0 we get the first boundary of b: b_1 = 0; for x = 1, we get the second boundary of b: b_2 = −ω. It can be seen that bias b depends on the input weight ω, so the range of bias b can be obtained:

b ∈ [min(0, −ω), max(0, −ω)]. (18)

The formulas above delimit the optimal range when the input weights and biases are determined at random.
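A sketch of how these ranges might be instantiated for a one-dimensional input, reading ω_1 as a slope magnitude with a random sign (the helper names `weight_bias_ranges` and `sample_params_1d` are ours, and this sign handling is one plausible interpretation of the derivation above):

```python
import numpy as np

def weight_bias_ranges(r=0.1, s=3.0):
    """Slope-magnitude bounds from the derivation: |w| between w1 and s*w1.
    r controls how saturated g may be at x = 1 (with b = 0); s > 1 scales
    the steepest admissible slope."""
    w1 = np.log((1.0 - r) / r)  # |ln(r/(1-r))|, written with positive sign
    return w1, s * w1

def sample_params_1d(N, r=0.1, s=3.0, rng=None):
    rng = np.random.default_rng(rng)
    w1, w2 = weight_bias_ranges(r, s)
    # slope magnitudes in [w1, w2], with random signs
    w = rng.uniform(w1, w2, size=N) * rng.choice([-1.0, 1.0], size=N)
    # b between b1 = 0 and b2 = -w, so the inflection point -b/w stays in [0, 1]
    b = rng.uniform(np.minimum(0.0, -w), np.maximum(0.0, -w))
    return w, b
```

With the experimental settings r = 0.1 and s = 3, this gives slope magnitudes between ln 9 ≈ 2.197 and 3 ln 9 ≈ 6.592.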
In the process of determining the output weight, randomly initialize the output matrix as β(0) and, respectively, calculate the local and global gradients:

∇f_si(β) = H_si^T (H_si β − T_si) + σβ, (19)

∇f(β) = ∑_{i=1}^{m} ∇f_si(β). (20)

At the same time, calculate the global objective:

f(β) = ∑_{i=1}^{m} f_si(β). (21)

Each local model is locally optimized during each iteration:

β(t+1) = arg min_β { f_si(β) − ⟨∇f_si(β(t)) − ∇f(β(t)), β − β(t)⟩ }. (22)

The idea of Bregman divergence is presented in order to better comprehend this local model. In this algorithm, for each f_si(β), there is

f_si(β) = ½ ‖H_si β − T_si‖² + (σ/2) ‖β‖², (23)

where σ > 0 is the regularization parameter. Accordingly, the Bregman divergence is

D_{f_si}(β, β(t)) = f_si(β) − f_si(β(t)) − ⟨∇f_si(β(t)), β − β(t)⟩. (24)

According to the above equation, Equation (22) can be changed to

β(t+1) = arg min_β { D_{f_si}(β, β(t)) + ⟨∇f(β(t)), β − β(t)⟩ }. (25)

Since f_si is quadratic, Taylor expansion transforms the Bregman divergence into

D_{f_si}(β, β(t)) = ½ ⟨β − β(t), (H_si^T H_si + σI)(β − β(t))⟩. (26)

Thus, Equation (22) can be further transformed into

β(t+1) = arg min_β { ½ ⟨β − β(t), (H_si^T H_si + σI)(β − β(t))⟩ + ⟨∇f(β(t)), β − β(t)⟩ }. (27)

The ultimate output weight may therefore be calculated from the preceding derivation as

β(t+1) = β(t) − (H_si^T H_si + σI)^{−1} ∇f(β(t)). (28)

These are the improvement approaches this study employed. Algorithm 1 displays the particular FNNS algorithm, and Figure 4 displays the flowchart of the algorithm.
Step 1: Randomly divide the large-scale dataset into m subsets of the same size. Step 2: Determine the optimal range of input weights and biases according to the activation function. Step 3: Randomly initialize the same input weights ω_i and biases b_i within this range.
Step 4: Calculate the H si hidden layer output matrix and initialize the β(0) output matrix at random.
Step 5: Calculate the required components such as local gradients and global gradients.
Step 6: Calculate the output weight. Step 7: If the change in the output weight is no greater than the threshold ε, or the maximum number of iterations is reached, then break; else repeat Step 5 and Step 6.
Step 8: Train the network using the calculated weights.
Step 9: Return the result.
Figure 4. An optimized FNNRW learning algorithm flowchart. The graphic depicts each phase of the FNNRW optimization process, with the chosen approach corresponding to each step. The flowchart illustrates the improved FNNRW algorithm's execution process, the condition judgment at its start, and the condition at its conclusion.
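Steps 4 through 7 can be sketched as averaged gradient descent over the m subsets (a simplified first-order sketch with our function name `fnns_output_weights`; the paper's exact local update, e.g., its Bregman-based step, may differ):

```python
import numpy as np

def fnns_output_weights(H_list, T_list, sigma=0.05, mu=1e-3,
                        eps=1e-3, max_iter=500):
    """Iteratively evaluate output weights over m subsets.
    H_list[i], T_list[i]: hidden output matrix and targets of subset s_i.
    sigma: regularization parameter, mu: learning rate, eps: stop threshold."""
    N = H_list[0].shape[1]
    n_out = T_list[0].shape[1]
    beta = np.zeros((N, n_out))  # beta(0); could also be random
    for _ in range(max_iter):
        # local gradients of f_si(b) = 1/2||H_si b - T_si||^2 + sigma/2 ||b||^2
        grads = [H.T @ (H @ beta - T) + sigma * beta
                 for H, T in zip(H_list, T_list)]
        g = sum(grads) / len(grads)   # averaged (global) gradient
        beta_new = beta - mu * g      # gradient step with learning rate mu
        converged = np.linalg.norm(beta_new - beta) <= eps
        beta = beta_new
        if converged:
            break
    return beta
```

The loop stops early when successive output weights change by less than ε, mirroring the stopping condition of Step 7.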

Results and Discussion
The effectiveness of the suggested algorithm is tested in this section. A Pentium(R) dual-core E5400 processor clocked at 2.70 GHz with 2 GB of memory was used for all tests in the MATLAB 7.10.0 (2010a) environment. The activation function used by the algorithm is the sigmoid function g(x) = 1/(1 + e^{−x}). Because the scheme targets large-scale data processing, the dataset used in this paper is the MNIST dataset. The MNIST dataset (Mixed National Institute of Standards and Technology database) is a sizable collection of handwritten digits collected and organized by the National Institute of Standards and Technology. It includes a training set of 60,000 examples and a test set of 10,000 examples. In comparison to other datasets, MNIST, a publicly accessible handwritten digit dataset, has small images, requires relatively little computing power, allows a neural network with fewer layers to be built, is amenable to computer arithmetic, and offers sufficient quantity, high discrimination, and low noise. Consequently, it was selected as the experiment's dataset. To test the algorithm's efficacy, the samples are first separated into m equal subsets and all samples are normalized. During the experiment, the parameters r = 0.1 and s = 3 were selected in the process of calculating the input weights and biases. Additionally, the regularization parameter σ = 0.05, the learning rate µ = 10^{−3}, and the threshold ε = 10^{−3} were used. The following charts intuitively show the performance of the optimized algorithm.
Tables 1 and 2 compare the accuracy of the FNNRW learning algorithm and the improved FNNRW learning algorithm when the number of subsets m is 10 and 20, respectively. As shown in Tables 1 and 2, the accuracy of the optimized FNNRW learning algorithm exceeds that of the FNNRW learning algorithm as the sample size or the number of hidden-layer nodes increases. Figures 5 and 6 display the accuracy of the optimized FNNRW learning algorithm and the accuracy of the FNNRW learning algorithm as the amount of training increases, comparing training accuracy and test accuracy, respectively. The advantage of the optimized FNNRW learning algorithm in terms of accuracy can be seen very intuitively in the figures. Figures 7 and 8 show the performance advantage of the optimized FNNRW learning algorithm in terms of relative error.

Engineering Applications
Data have grown more quickly as science and technology have advanced, and in the big data era people can obtain a great deal of pertinent information. For instance, environmental factors that are regularly monitored offer a significant dataset for coal mine safety. The feed-forward network random weighting algorithm presented in this paper is based on the decomposition method, which is more adaptable: it divides a large dataset into local datasets rather than reducing it by sampling, preserves the integrity of the original dataset, and is thus better suited for real-world engineering applications.

Conclusions and the Future Work
In this paper, a feed-forward neural network model with random weights based on decomposition technology is examined for large-scale datasets. It is based on the feed-forward neural network with random weights (FNNRW). Once the samples are separated into subsets of the same size, each subset of the data is used to create the associated submodel. The technical contribution of this study is to optimize the generation process of the three parameters (input weights, biases, and output weights) in order to increase overall performance. The input weights and biases in FNNRWs are set randomly and are not learned, so the choice of the proper interval is crucial. The optimal value range is computed in this study from the activation function, and initializing the input weights and biases within this range boosts performance as a whole. In addition, an iterative approach is employed to overcome the difficulty of evaluating the output weights of the random model. The MNIST dataset was used as the basis for experiments. The outcomes of the experiments demonstrate the algorithm's efficacy.
The performance of the suggested feed-forward neural network model with random weights for large-scale datasets based on decomposition technology has greatly improved; however, the following has to be done in the future: (a) Despite its minor flaws, the suggested method can still be improved. Future work will boost the algorithm's performance further: the experimental data used in the paper are two-dimensional, so subsequent work will start from the dimensionality to improve the algorithm's adaptability, filter the dataset to improve its quality and thereby indirectly improve algorithm performance, and continue to improve the algorithm structure in order to satisfy more application requirements. (b) Only the MNIST dataset was tested, and future data will be enormous compared with the dataset used in the experiment. The performance of the method will be examined in subsequent studies using bigger datasets. (c) Every algorithm improvement is eventually applied in everyday life, and our original goal was to use high-performing algorithms in real situations. In our upcoming work, we will concentrate on applying the algorithm to more technical domains in addition to improving its performance.