A Maximally Split and Adaptive Relaxed Alternating Direction Method of Multipliers for Regularized Extreme Learning Machines

Abstract: One of the significant features of extreme learning machines (ELMs) is their fast convergence. However, in the big data environment, the ELM based on the Moore-Penrose matrix inverse still suffers from excessive computational loads. Leveraging the decomposability of the alternating direction method of multipliers (ADMM), a convex model-fitting problem can be split into a set of sub-problems that can be executed in parallel. Using a maximally splitting technique and a relaxation technique, the sub-problems can be split into multiple univariate sub-problems. On this basis, we propose an adaptive parameter selection method that automatically tunes the key algorithm parameters during training. To confirm the effectiveness of this algorithm, experiments are conducted on eight classification datasets. We verify the effectiveness of this algorithm in terms of the number of iterations, the computation time, and the acceleration ratios. The results show that the proposed method greatly improves the speed of data processing while increasing parallelism.


Introduction
The extreme learning machine (ELM) has been extensively applied in many areas [1] due to its fast learning ability and satisfactory generalization performance. The regularized ELM (RELM) [2] is an extended variant of the standard ELM [3] which improves the generalization performance and stability of ELMs by adding a regularization term to the loss function [4]. However, the dimension and the volume of data have increased significantly with the development of big data. When the number of training samples and the number of hidden-layer nodes are especially large, the output matrix of the ELM model becomes particularly large. Therefore, the ELM computation based on the Moore-Penrose matrix inverse requires enormous storage and calculation, significantly increasing the computational complexity of the ELM.
To address the above problems, several enhanced ELMs have been proposed. By decomposing the data matrix into a set of smaller block matrices, Wang et al. [5] adopted a clustering technique with a message-passing interface to train the block-matrix-based ELM in parallel with the aim of improving computing efficiency. Liu et al. [6] proposed a Spark-distributed parallel computing mechanism to achieve a parallel transformation of ELMs. Chen et al. [7] used a clustering technique with GPUs to parallelize ELMs. Based on the Spark framework, Duan et al. [8] improved the learning speed of the ELM when processing large-scale data by dividing the dataset. All the methods discussed above focus on computation schemes of the RELM using parallel or distributed hardware structures. However, the matrix-inversion-based (MI-based) solution process has low efficiency and high computational complexity, leading to slow convergence [9]. Therefore, none of the methods discussed above can solve the problem of the low efficiency of RELMs in the big data scenario.
The alternating direction method of multipliers (ADMM) is a powerful computational framework for separable convex optimization. It has been extensively applied in many fields owing to its fast processing speed and convergence performance [10]. Wang et al. [11] used the ADMM to solve the center selection problem in fault-tolerant radial-basis-function networks. Wei et al. [12] applied the ADMM to neural networks to solve the problem of slow training for large-scale data. Wang et al. [13] applied the ADMM to SVMs to achieve distributed learning by splitting the training samples. Luo et al. [14] exploited the decomposability of the ADMM so that the regularized least-squares (RLS) problem of the RELM could be split into a set of sub-problems executed in parallel, thereby improving computation efficiency. Li et al. [15] used the ADMM to solve the predictive control problem of a distributed model, which gives the model a fast-response ability. Xu et al. [16] formulated the training of a quantized recurrent neural network language model as an optimization problem and applied the ADMM to improve its convergence speed.
One of the main problems of the classical ADMM is its convergence speed. In general, the numerical performance of the ADMM largely depends on an effective solution of the sub-problems; there can be several different sub-problem splitting representations in practical applications. Thus, a generalization of the N-block ADMM is needed, because the classical ADMM algorithm is only suitable for solving two-block convex optimization problems and cannot fully exploit the sub-problem structure.
To further improve the ADMM convergence speed and generalization performance, several extended variants of the ADMM have been presented, including the generalized ADMM [17-20] and the relaxed ADMM (RADMM) [21]. Lai et al. achieved fast convergence by using a novel relaxation technique that modifies the ADMM into a highly parallelized structure. Based on the RADMM, Xiaoping et al. [22] proposed a maximally split and relaxed ADMM (MS-RADMM) that splits the model coefficients to improve convergence and parallelism. Su et al. [23] introduced a binary splitting operator into the ADMM; the optimal solution of the original problem is obtained through the iterative calculation of intermediate operators, improving the convergence speed. Ma et al. [24] used an MS-RADMM with a highly parallel structure to optimize a 2D FIR filter and provided a practical scheme for algorithm parameter setting. Hou et al. [25] utilized a tunable step-size algorithm to accelerate the convergence of the MS-ADMM. However, the convergence speed of the ADMM largely depends on the choice of parameters in the iterative process. For this reason, we propose an adaptive parameter selection method that uses an improved Barzilai-Borwein spectral gradient method to automatically tune the algorithm parameters and achieve an optimal convergence speed.
For the implementation of the MS-RADMM, we propose an adaptive parameter selection method for jointly tuning the penalty and relaxation parameters. Our main contributions are as follows: (1) Improving global convergence: a non-monotonic Wolfe-type strategy is introduced into the memory gradient method, and the global optimal solution is approached by combining the iteration information of the current and past multiple function points. (2) Solving the sub-problems: the Barzilai-Borwein spectral gradient method is optimized by adding step-size selection constraints, which simplifies the computation of the MS-RADMM sub-problems and improves the convergence speed.

Fundamentals of the RELM and the ADMM
With the increase in the volume and complexity of datasets, the number of training samples N and the number of hidden nodes L become very large. As such, MI-based solutions require enormous memory space and suffer from excessive computational loads. To address these challenges, the ADMM is used to handle the convex model-fitting problem of the ELM.

RELM Method
As a training framework for solving single hidden-layer neural networks [26], the ELM has a good learning speed and generalization performance. For an m-category classification problem, assuming that the training samples are x_j (j = 1, ..., N) and the number of hidden-layer nodes is L, the ELM model output is given by

Hβ = T,

where H = [g(w_i · x_j + s_i)] is the N × L hidden-layer output matrix, w_i and s_i are the input weight and bias of the ith hidden node, g(·) is the activation function, β is the output weight matrix, and T denotes the target output matrix of the network. The actual performance of the ELM depends on the number of neurons in the hidden layer. If the number of neurons is too small, the extracted information is insufficient, and it is hard to generalize and reflect the inherent structure of the data. If the number is too large, the network structure is too complex, which reduces the generalization performance.
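As a concrete illustration, the hidden-layer output matrix H can be sketched as below. This is a minimal example assuming a sigmoid activation and input weights/biases fixed at random, as is standard for ELMs; the matrix and dataset names are illustrative.

```python
import numpy as np

def elm_hidden_output(X, W, s):
    """Compute the ELM hidden-layer output matrix H = [g(w_i . x_j + s_i)].

    X: (N, d) training samples; W: (d, L) random input weights;
    s: (L,) random biases; g is the sigmoid activation here.
    """
    return 1.0 / (1.0 + np.exp(-(X @ W + s)))

rng = np.random.default_rng(0)
N, d, L = 100, 8, 20
X = rng.standard_normal((N, d))
W = rng.standard_normal((d, L))   # input weights, fixed at random, never trained
s = rng.standard_normal(L)        # hidden-node biases, fixed at random
H = elm_hidden_output(X, W, s)
print(H.shape)  # (100, 20)
```

Only the output weights β are trained; W and s stay fixed, which is what gives the ELM its fast learning speed.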
To further improve the generalization performance and stability, regularization theory is introduced into the ELM to minimize both the training error and the norm of the output weight matrix β [27-29]. The RELM solves for the output weight β in the following RLS problem:

min_β (1/2)||Hβ − T||²_F + (µ/2)||β||²_F,

where ||·||_F denotes the Frobenius norm, and µ > 0 is a regularizer that controls the tradeoff between the loss function and the regularization term. However, the MI-based RELM leads to an excessive computational load, particularly in problems concerning high-dimensional data. An effective way to solve large-scale data processing problems is through parallel or distributed optimization methods. The ADMM is a powerful technique for large-scale convex optimization.
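The MI-based solution of this RLS problem has the well-known closed form β = (HᵀH + µI)⁻¹HᵀT. A minimal sketch (with synthetic H and T) shows both the formula and why it scales poorly: solving the L × L system costs O(L³) time and O(L²) memory, which is the bottleneck the ADMM-based methods avoid.

```python
import numpy as np

def relm_output_weights(H, T, mu):
    """MI-based RELM solution: beta = (H^T H + mu I)^{-1} H^T T.

    Solving this L x L linear system costs O(L^3); for very large L or N
    the storage and computation become prohibitive.
    """
    L = H.shape[1]
    return np.linalg.solve(H.T @ H + mu * np.eye(L), H.T @ T)

rng = np.random.default_rng(1)
H = rng.standard_normal((200, 30))   # synthetic hidden-layer output matrix
T = rng.standard_normal((200, 3))    # synthetic target matrix (3 classes)
beta = relm_output_weights(H, T, mu=0.1)
print(beta.shape)  # (30, 3)
```

The result satisfies the optimality condition Hᵀ(Hβ − T) + µβ = 0 of the RLS objective, which is a quick way to sanity-check any solver for this problem.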

ADMM for Convex Optimization
As a computational framework for solving constrained optimization problems, the ADMM achieves a good convergence speed and a parallel structure. The ADMM [30] decomposes a large global problem into multiple local sub-problems, and the solution of the global problem is obtained by coordinating the solutions of the sub-problems. The following convex model-fitting problem is studied:

min_x f(Ax − t) + r(x),

where t is the target output vector, f(·) is a convex loss function, and r(·) is a regularization term. By defining the equality constraints z_i = a_i x_i, where a_i is the ith column of A and x_i is the ith coefficient, the model-fitting problem (3) can be transformed into

min_{x,z} f(Σ_i z_i − t) + Σ_i r(x_i)   s.t.   z_i = a_i x_i,   i = 1, ..., L.

The augmented Lagrangian of problem (5) is

L_ρ(x, z, λ) = f(Σ_i z_i − t) + Σ_i r(x_i) + Σ_i ⟨λ_i, a_i x_i − z_i⟩ + (ρ/2) Σ_i ||a_i x_i − z_i||²_F,

where ρ > 0 is the penalty parameter, and λ_i ∈ R^{N×m} is the dual variable.
The ADMM uses the Gauss-Seidel iteration method [31] to minimize the augmented Lagrangian function over the optimization variables x and z and updates the dual variable λ according to the multiplier method. The iterative solution process of the model-fitting problem is readily obtained as

x_i^{k+1} = argmin_{x_i} r(x_i) + (ρ/2)||a_i x_i − z_i^k + λ_i^k/ρ||²_F,
z^{k+1} = argmin_z f(Σ_i z_i − t) + (ρ/2) Σ_i ||a_i x_i^{k+1} − z_i + λ_i^k/ρ||²_F,
λ_i^{k+1} = λ_i^k + ρ(a_i x_i^{k+1} − z_i^{k+1}).

The global optimal solution is obtained by alternately updating the variables x and z [32].
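The alternating pattern above can be sketched on a small quadratic instance. The following is a minimal two-block scaled ADMM for the ridge problem min (1/2)||Ax − t||² + (µ/2)||x||² with the splitting z = Ax; it is an illustration of the x/z/λ alternation, not the paper's maximally split scheme, and the penalty ρ and iteration count are illustrative choices.

```python
import numpy as np

def admm_ridge(A, t, mu, rho=10.0, iters=1000):
    """Two-block scaled ADMM for min 0.5*||Ax - t||^2 + 0.5*mu*||x||^2
    with the splitting z = Ax (u is the scaled dual, u = lambda / rho)."""
    N, n = A.shape
    x, z, u = np.zeros(n), np.zeros(N), np.zeros(N)
    M = mu * np.eye(n) + rho * A.T @ A   # x-update system, factorable once
    for _ in range(iters):
        x = np.linalg.solve(M, rho * A.T @ (z - u))   # x-update
        Ax = A @ x
        z = (t + rho * (Ax + u)) / (1.0 + rho)        # z-update (prox of loss)
        u = u + Ax - z                                # dual (multiplier) update
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 10))
t = rng.standard_normal(50)
x_admm = admm_ridge(A, t, mu=0.5)
x_direct = np.linalg.solve(A.T @ A + 0.5 * np.eye(10), A.T @ t)
print(np.max(np.abs(x_admm - x_direct)))   # small: iterates match the direct solution
```

Note that the x-update here still solves an n × n system; the maximal splitting of the next section removes exactly this coupling by reducing each x-update to a scalar operation.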

Maximally Split and Relaxed ADMM
The numerical performance of the ADMM largely depends on the efficient solving of the sub-problems [33]. A maximally split technique and a relaxation technique are used to speed up the ADMM convergence [34].
The MS-ADMM splits the model-fitting problem into multiple univariate sub-problems flexibly and at a reasonable scale. It reconstructs the matrix-operation-based method so that each sub-problem contains only one scalar component, giving the MS-ADMM an ideal, highly parallel structure. By considering the L-partition ADMM [35], the matrix A is reduced to its column vectors a_i, and the vector coefficient x is reduced to scalar coefficients x_i. These scalar updates play an important role in improving the parallel computing efficiency and the highly parallel structure of the MS-ADMM.
On the basis of the MS-ADMM, the MS-RADMM [36] is obtained by adopting a relaxation technique. It reconstructs the convergence conditions; past iterates are taken into account in the next iteration, which gives the MS-RADMM linear convergence. The relaxation enlarges the equality-constraint residuals in (7): each occurrence of a_i x_i^{k+1} in the z- and λ-updates is replaced by the relaxed term α a_i x_i^{k+1} − (α − 1) z_i^k, where α > 0 is the relaxation parameter, magnifying the equality-constraint residuals.

Scalar MS-ARADMM
The efficiency of the MS-RADMM depends strongly on the choice of the penalty and relaxation parameters.A suitable parameter selection scheme is key to improving the computational efficiency of the MS-RADMM.
We propose an adaptive parameter selection method for the MS-RADMM and obtain the MS-ARADMM. The MS-ARADMM automatically tunes the key algorithm parameters to improve the convergence speed. Convergence is measured using the primal and dual residuals, defined as

γ^k = Ax^k − z^k (primal residual),   d^k = ρ Aᵀ(z^k − z^{k−1}) (dual residual).

From the perspective of the convergence principle, when the algorithm approaches the optimal solution, the residuals γ^k and d^k approach zero. The specific termination conditions are

||γ^k||_F ≤ tol   and   ||d^k||_F ≤ tol,

where tol represents the stop tolerance and is a constant; its specific value can be set according to the acceptable error range. Considering the time cost of the experiments, the stop tolerance is set to 10^{−3} in this paper; the stop tolerance relates only to the error accuracy and does not depend on the dataset.

Spectral Adaptive Step-Size Rule
The spectral adaptive step-size rule is derived by studying the close relationship between the RADMM [37] and the relaxed Douglas-Rachford splitting (DRS) [38].
For problem (5), assume that local linear models of ∂f(x) and ∂r(x) at iteration k are given by

∂f(x) ≈ θ_k x + ψ,   ∂r(x) ≈ γ_k x + φ,

where θ_k > 0 and γ_k > 0 are the local curvature estimates of f and r, respectively, and ψ, φ are constants.
According to the equivalence of the RADMM and the DRS, the linear model is fitted to the gradient of the objective by using DRS theory for problem (13). To obtain the optimal step-size with zero residuals on the model problem, i.e., so that the residuals of f(x) + r(x) vanish, the optimal penalty parameter for the linear model is given by

ρ_k = 1/√(θ_k γ_k),

and the optimal relaxation parameter under this optimal penalty parameter is readily found to be

α_k = 1 + 2√(θ_k γ_k)/(θ_k + γ_k).
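These closed forms, known from the adaptive relaxed ADMM literature, are cheap to evaluate per iteration. A minimal sketch (assuming exactly the two formulas ρ = 1/√(θγ) and α = 1 + 2√(θγ)/(θ + γ); curvature values are illustrative):

```python
import numpy as np

def spectral_parameters(theta, gamma):
    """Spectral (locally optimal) penalty and relaxation parameters
    for curvature estimates theta, gamma > 0 of f and r."""
    geo_mean = np.sqrt(theta * gamma)
    rho = 1.0 / geo_mean                          # optimal penalty
    alpha = 1.0 + 2.0 * geo_mean / (theta + gamma)  # optimal relaxation
    return rho, alpha

rho, alpha = spectral_parameters(2.0, 0.5)
print(rho, alpha)  # 1.0 1.8
```

Note that α is largest (α = 2) when θ = γ, i.e., when the two curvatures agree, and shrinks toward 1 as they diverge, so the relaxation is strongest exactly when the local model is best conditioned.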

Estimation of Step-Size
The local curvature estimates θ_k and γ_k can often be estimated simply from the results of iteration k and an earlier iteration k_0. The initial value of the spectral step-size is calculated from the local curvature estimates, and the ADMM modifies the spectral step-size by updating the dual variables during the iterations so as to reach the best penalty and relaxation parameters. Define the dual-variable and subgradient differences between iterations k and k_0, scaled by a parameter σ.
When solving an unconstrained optimization problem, the dual variables λ_k and the spectral step-size affect the convergence performance of the MS-RADMM. At present, a line search is commonly used to select θ_k and γ_k. The oscillation phenomenon can be overcome by adopting a non-monotonic technique. However, when the initial value is taken near a local valley of the function, it is easy to become stuck at a local extremum.
To avoid being trapped in a local optimum, a non-monotonic Wolfe-type line search strategy is incorporated into the memory gradient method [40]. By combining the iteration information of the current and past multiple points, the global convergence of the algorithm is improved.
The dual variable update rule is derived from the λ-update in (7). Combined with the idea of the Barzilai-Borwein gradient method, the spectral step-size θ_k is readily obtained as [41]

θ_k = ⟨Δf^k, Δλ^k⟩ / ⟨Δλ^k, Δλ^k⟩.

The spectral step-size γ_k is obtained likewise.
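The Barzilai-Borwein estimation can be sketched from the two difference vectors. This is an assumption-laden illustration: `d_f` and `d_lam` stand for the subgradient and dual-variable differences, and the hybrid of the steepest-descent (SD) and minimum-gradient (MG) estimates shown here follows the common spectral-ADMM recipe; the paper's exact safeguard constants may differ.

```python
import numpy as np

def bb_curvature(d_f, d_lam, eps=1e-12):
    """Barzilai-Borwein-style curvature estimate from the subgradient
    difference d_f and the dual-variable difference d_lam."""
    sd = (d_f @ d_f) / max(d_f @ d_lam, eps)      # steepest-descent estimate
    mg = (d_f @ d_lam) / max(d_lam @ d_lam, eps)  # minimum-gradient estimate
    # Hybrid rule: use MG when the two estimates roughly agree,
    # otherwise blend them for robustness.
    return mg if 2.0 * mg > sd else sd - 0.5 * mg

# When d_f is an exact multiple of d_lam, both estimates recover the factor.
d_f = np.array([1.0, 2.0, 0.5])
d_lam = np.array([0.5, 1.0, 0.25])
print(bb_curvature(d_f, d_lam))  # 2.0
```

In the exactly linear case d_f = θ d_lam, SD and MG coincide at θ, which is the consistency property the local linear model relies on.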

Parameter Update Rules
In the case where the linear model assumptions break down or an unstable step-size is produced, a correlation criterion is employed to verify the local linear model assumptions. Define θ_k^cor as the correlation between Δf^k and Δλ^k, and γ_k^cor as the analogous correlation for the r-side differences. The spectral estimates are accepted, and the penalty and relaxation parameters are updated using the optimal formulas above, only when these correlations exceed the threshold ε_cor; otherwise, the previous parameter values are retained. The curvature-estimation threshold ε_cor is a constant, set to 0.2 in this paper with reference to [41]. This threshold avoids the problem of inaccurate curvature estimation and ensures convergence.
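The safeguarding logic can be sketched as below. This is a simplified two-branch version of the rule (accept both estimates or keep the previous parameters); the function name is illustrative and ε_cor = 0.2 follows the paper.

```python
import numpy as np

def safeguarded_update(theta, gamma, theta_cor, gamma_cor,
                       rho_prev, alpha_prev, eps_cor=0.2):
    """Accept the spectral estimates only when the correlation test
    validates the local linear model; otherwise fall back to the
    previous penalty and relaxation parameters."""
    if theta_cor > eps_cor and gamma_cor > eps_cor:
        geo_mean = np.sqrt(theta * gamma)
        rho = 1.0 / geo_mean
        alpha = 1.0 + 2.0 * geo_mean / (theta + gamma)
        return rho, alpha
    return rho_prev, alpha_prev

# High correlations: the spectral estimates are accepted.
print(safeguarded_update(1.0, 1.0, 0.9, 0.9, rho_prev=2.0, alpha_prev=1.5))  # (1.0, 2.0)
# A low correlation: fall back to the previous parameters.
print(safeguarded_update(1.0, 1.0, 0.1, 0.9, rho_prev=2.0, alpha_prev=1.5))  # (2.0, 1.5)
```

The fallback branch is what makes the adaptive scheme safe: when the local linear model is unreliable, the iteration simply behaves like the fixed-parameter MS-RADMM.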

RELM Based on the Scalar MS-ARADMM
The MS-ARADMM is employed to solve the convex model-fitting problem of the RELM and improves the convergence speed of the RELM.

Scalar MS-RADMM for RELM
For an m-category classification problem, the calculation of the RELM objective function (3) is equivalent to (4). First, the hidden-layer output matrix H is acquired by using the RELM. Then, the MS-ARADMM algorithm is used to solve for the optimal output weight of the RELM. The iteration process is given in (27), where H_j and H_i are the jth row and the ith column of the matrix H, k represents the number of iterations, and m represents the number of columns in the matrix. A schematic diagram of the model is shown in Figure 1.

Simulation Experiment and Result Analysis
The MS-ARADMM-based RELM is used to train single hidden-layer feedforward neural networks (SLFNs) on eight datasets. The performances of the MS-ARADMM, the MS-AADMM, and the RB-ADMM are evaluated by convergence speed and time cost. The specifications of the datasets are shown in Table 1.

Performance Analysis of Adaptive Parameter Selection Methods
According to the principle of iterative gradient computation, the computational cost of each iteration is essentially the same, so the total number of iterations is positively correlated with the time cost. In other words, the convergence performance of an algorithm can be evaluated by comparing its iterative convergence curves, and the time cost can likewise be analyzed from those curves.
In order to verify the convergence of the adaptive parameter selection method, numerical experiments were carried out under the same environmental conditions and compared with currently popular improved Barzilai-Borwein algorithms (MBBH, NABBH, and MTBBH). The effectiveness of each method was evaluated by comparing the total number of iterations at termination.
The MBBH algorithm modifies the standard Barzilai-Borwein step-size [42] to have specific quasi-Newton characteristics. However, the curvature condition is not enforced, and the generated approximate Hessian matrix cannot meet the iterative requirements, which limits the speed of the algorithm.
The NABBH algorithm improves the convergence speed by simplifying the computational complexity of the Hessian inverse operation [43]; that is, only the inverse matrix of the first-derivative matrix of the function is calculated, and the second-derivative matrix is omitted. A step-size selection strategy is designed to speed up the convergence of the algorithm. However, this algorithm fails to converge if the condition of monotonic decrease is not met at each iteration.
The MTBBH algorithm realizes monotonic descent by replacing the exact Hessian with a positive-definite data matrix. However, due to the adoption of a non-monotonic technique, the algorithm easily falls into local optima.
For problems that tend to fall into local extrema, a new Barzilai-Borwein-type gradient method is proposed by modifying the original Barzilai-Borwein step-size. By introducing a non-monotonic Wolfe-type strategy into the memory gradient method, the global optimal solution is obtained, and convergence speed is further improved by adding step-size constraints [43]. In theory, the proposed adaptive parameter selection method has better global convergence and convergence speed.
A comparison of the performances of the MBBH, the NABBH, the MTBBH, and the proposed algorithm was made through tests. Table 2 and Figure 2 show the simulation results of the different methods, which confirm our theoretical analysis. According to Table 2, under different constraint conditions, the proposed method terminates with the smallest number of iterations, indicating that it has the fastest convergence speed. Figure 2 also clearly shows that the proposed method has better global convergence and non-monotonicity than the other algorithms.

Convergence Analysis
The key performance measures of a classification model are convergence speed and accuracy. Given the big data background, this paper focuses on the convergence speed of the model in algorithm optimization. In order to evaluate the effectiveness of the proposed algorithm, its convergence is assessed by comparing the time cost, the number of iterations to convergence, and the classification accuracy with newer improved ADMM algorithms.
The proposed MS-ARADMM is compared with the MS-AADMM and RB-ADMM methods on eight datasets. The experiments are conducted with the same termination conditions. The evaluation indicators include the number of iterations and the computational time.
The RB-ADMM algorithm decomposes the objective function of the model into a loss function and a regularization function; it uses the ADMM to transform the least-squares problem into a least-squares problem without a regularization term so as to improve the calculation speed of the model. However, this method does not fully utilize the model structure of the ADMM, leading to slow convergence.
The MS-AADMM adopts a tunable step-size to accelerate convergence. However, parameter selection plays an important role in the convergence of the algorithm, and inappropriate parameter selection may prevent the algorithm from converging.
The MS-ARADMM is realized by employing an adaptive parameter selection method to improve the convergence speed.
Given the hidden-layer output matrix H, the optimal output weights of the MS-ARADMM are calculated with (27). The output weights of the RB-ADMM and the MS-AADMM are updated by (28).

Comparison of Convergence of MS-ARADMM and RB-ADMM
The difference between the output weight updates (27) and (28) is that all iterations in the MS-ARADMM are scalar variable updates. The scalar update method simplifies sub-problem solving, thus improving the convergence. Although the RB-ADMM can adaptively choose penalty parameters to improve convergence to a certain extent, it suffers from several flaws. The performance of the RB-ADMM can vary widely with the problem size. Furthermore, without a suitable choice of the residual balancing factor, the algorithm may not converge. To address these problems, the MS-ARADMM implements adaptive parameter selection by adding step-size selection constraints, thereby improving the convergence.
The simulation results are given in Table 3. As can be seen from Table 3, with the optimization of the algorithm, the time and the number of iterations needed to process large-scale data decrease, which means that the proposed algorithm has a better convergence speed. To quantify the improvement of the MS-ARADMM algorithm, the improvement ratios calculated from the results in Table 3 are given in Table 4. As Table 4 shows, the convergence speed of the MS-ARADMM is increased over that of the RB-ADMM by an average of 99.3032% on the two-category datasets, by an average of 98.4375% on the six-category datasets, and by an average of 96.7624% on the ten-category datasets.

Comparison of Convergence of MS-ARADMM and MS-AADMM
As with the calculation of β in the MS-ARADMM, the introduction of the scalar variable update method into the MS-AADMM leads to much more efficient computation. However, parameter selection must still be addressed, and the MS-AADMM does not take into account that relaxation techniques can further accelerate convergence. The MS-ARADMM simplifies the calculation by designing an adaptive parameter selection method that jointly adjusts the penalty and relaxation parameters.
From Table 3, the convergence speed becomes progressively faster. This can also be seen from the convergence speed improvements in Table 4: the convergence speed of the MS-ARADMM is increased over that of the MS-AADMM by an average of 69.2445% on the two-category datasets, by an average of 71.7948% on the six-category datasets, and by an average of 48.9966% on the ten-category classification datasets.
For the PCMAC, Pendigits, and Optical-Digits datasets, the limited dimension and size of the data lead to smaller improvements in the convergence speed. For instance, among the ten-category datasets, the USPS dataset achieves an improvement of 83.8709%, whereas the Optical-Digits dataset achieves only 14.6341%. This large difference arises because the MS-ARADMM is best suited to large-scale optimization problems: the size of the Optical-Digits dataset is 64 × 5620, whereas that of the USPS dataset is 256 × 9298.

Convergence Rate Comparison
Implicit in the MS-ARADMM is the assumption that automatically tuning the parameters achieves optimal performance. On this basis, we show that the MS-ARADMM generally converges faster than the other algorithms.
The convergence performance of the different algorithms is compared on eight benchmark datasets. Table 3 and Figure 3 show the simulation results of the three algorithms. The results are in full agreement with the theoretical analysis. According to Table 3, the MS-ARADMM algorithm has the lowest computational cost and the fewest iterations on all datasets among all algorithms. From Figure 3, with a maximum of 2000 iterations and an error tolerance of 10^{−3}, the MS-ARADMM meets the termination condition within the smallest number of iterations.

Parallelism Analysis
Parallelism is an important indicator for evaluating the convergence speed of ADMM algorithms. High parallelism can effectively relieve the computational burden and improve algorithm efficiency. To verify that the MS-ARADMM has a better convergence speed, simulations are carried out on the datasets. The parallelism of the MS-ARADMM is evaluated by analyzing the GPU acceleration ratios and the relationship between the acceleration ratios and the number of CPU cores.

Parallel Implementation on Multicore Computers
Using a maximally splitting technique, the RLS problem can be maximally split into univariate sub-problems that can be executed in parallel, leading to a highly parallel structure.
To verify our theoretical analysis, experiments are conducted on the Gisette dataset on different multicore computers. The relationship between the acceleration ratio and the number of cores is characterized by the acceleration ratio R, defined as the single-core runtime divided by the n-core runtime. The experiments are carried out on three multi-core computers, whose hardware configurations are, respectively, an Intel Core i7-10700 8-core CPU @ 2.9 GHz, an Intel Core i7-4790 4-core CPU @ 3.60 GHz, and an Intel Core i7-8700 6-core CPU @ 3.2 GHz.
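The acceleration ratio R is straightforward to compute from measured runtimes; a one-line sketch (the runtimes below are illustrative, not measured values from the paper):

```python
def acceleration_ratio(t_single_core, t_n_core):
    """R = single-core runtime / n-core runtime; the ideal value on n cores is n."""
    return t_single_core / t_n_core

# Illustrative: an 80 s single-core run finishing in 20 s on 4 cores
# would give the ideal linear speedup R = 4.
print(acceleration_ratio(80.0, 20.0))  # 4.0
```

Comparing measured R against the core count n shows how close the parallel implementation comes to ideal linear scaling.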
The results on the three computers are shown in Figure 4. From Figure 4, the relationship between the acceleration ratios and the number of CPU cores is close to the lower bound, demonstrating the high parallelism of the MS-ARADMM.

Parallel Implementation on GPU
High parallel performance is an important index for evaluating the convergence performance of an algorithm: it effectively alleviates the computational pressure and further improves operational efficiency. Through internal multi-process parallel computing, the GPU can reach speeds an order of magnitude higher than the CPU; its strong floating-point arithmetic capability can greatly improve the computing speed of the ADMM and shorten the calculation time.
In the case of high-dimensional data, the MI-based RELM requires a large amount of storage and computation. To verify the high parallelism of the algorithm, parallel accelerated experiments on the MS-ARADMM-based and MI-based RELMs are run on an NVIDIA GeForce GT 730 graphics card. The parallel implementations on the GPU use the gpuArray function in the MATLAB toolbox.
The MS-ARADMM-based RELM splits the model-fitting problem into a set of univariate sub-problems that can be executed in parallel.Its convergence speed is improved by the parameter selection scheme.Theoretically, the MS-ARADMM has good convergence speed and parallelism.
The simulation results for all of the datasets are given in Table 5. From Table 5, on all the datasets except USPS, Pendigits, and Optical-Digits, the computational cost of the MS-ARADMM-based RELM is much smaller than that of the MI-based RELM. On all of the datasets, the computational cost of the MS-ARADMM-based RELM is much smaller than that of the MI-based RELM when implemented on the GPU. The acceleration ratio of the MI-based method is about 5.3443, whereas that of the MS-ARADMM is about 23.5065, more than four times that of the MI-based method.

Accuracy Analysis
Classification accuracy is an important indicator of classifier performance. Accuracy was compared for the MS-ARADMM-based, MS-AADMM-based, and MI-based RELMs.
Table 6 compares the training accuracy and the testing accuracy of the MS-ARADMM-based, the MS-AADMM-based, and the MI-based RELMs. From Table 6, we can see that the classification accuracy of the MS-ARADMM is not adversely affected. From Tables 3 and 6, under approximately identical training and testing accuracy, the computational time of the MS-ARADMM is less than those of both the MI-based and the MS-AADMM-based methods. Thus, the convergence speed of the MS-ARADMM is greatly improved for large-scale optimization problems.

Conclusions
In this paper, an MS-ARADMM algorithm is proposed to solve the RLS problem in the RELM. Its novelty is twofold: (1) a non-monotonic Wolfe-type strategy is introduced into the memory gradient method to improve global convergence; (2) a step-size selection constraint is added to simplify the computation of the MS-RADMM sub-problems. Since the MS-ARADMM is a convex optimization method with superlinear global convergence, it ensures a fast response and a globally optimal solution of the RELM, making it more suitable than other ADMM methods for the distributed computation of large-scale convex optimization problems of the RELM.
We focused on the influences of the parameters ρ and α on the convergence performance of the RELM model. To verify the performance of the proposed algorithm, we applied it to various large-scale classification datasets and compared the simulation results with the methods reported in Tables 2 and 3. The results confirm that the computational efficiency of the RELM model is clearly improved, especially on large-scale convex optimization problems. The MS-ARADMM algorithm thus enhances the convergence speed through its simpler solution process.

Figure 1. Illustration of the MS-ARADMM-based RELM.

Learning Algorithm for the MS-ARADMM-Based RELM

By adding step-size selection constraints in the MS-ARADMM iteration, it is ensured that the penalty and relaxation parameters converge under the bounded conditions. The steps are shown in Algorithm 1.

Figure 2. Convergence comparison of different methods.

Table 2. Comparison of iterations for different methods.

Table 3. Comparison of the convergence speed of the RELM for different algorithms.

Table 4. Comparison of the convergence speed improvement of the MS-ARADMM and different methods.

Table 6. Training accuracy and testing accuracy.