Comprehensive Overview of Backpropagation Algorithm for Digital Image Denoising

Abstract: Artificial neural networks (ANNs) are relatively new computational tools used in the development of intelligent systems, some of which are inspired by biological neural networks, and they have found widespread application in solving a variety of complex real-world problems. ANNs boast enticing features as well as remarkable data processing capabilities. In this paper, a comprehensive overview of the backpropagation algorithm for digital image denoising is presented. We then give a probabilistic analysis of how different algorithms address this challenge, arguing that small ANNs can denoise small-scale texture patterns almost as effectively as their larger equivalents. The results also show that self-similarity and ANNs are complementary paradigms for patch denoising, as demonstrated by an algorithm that effectively complements BM3D with small ANNs, surpassing BM3D at a low cost. One of the most significant advantages of this learning technique is that, once trained, the network can restore digital images without prior knowledge of the degradation model (noise/blurring) that distorted them.


Introduction
Digital image denoising is an engineering and science area that investigates strategies for recovering an original scene from degraded data [1,2]. This field has long been studied in the signal processing, astronomy, and optics communities. Many of the techniques utilised in this domain have their roots in well-developed mathematical topics such as estimation theory, the solution of ill-posed inverse problems, linear algebra, and numerical analysis. The techniques utilised for image denoising model the degradations, which are mainly blur and noise, and apply an inverse approach to produce an approximation of the original scene.
However, in most real-world scenarios, significant a priori information regarding the blurring system's degradation is rarely accessible. As a result, utilising limited information about the imaging system, we must estimate both the genuine picture and the blur from the attributes of the degraded image. In this study, we looked at image noise [3][4][5], which is a specific sort of image degradation.
During acquisition, transmission, or recovery from the storage medium, images are frequently contaminated by noise. When an image is produced with a digital camera under low lighting or other limiting conditions, many dots can appear in the image (see Figure 1). The appearance of these dots reflects the random behaviour of noisy signals.
The goal of the supervised learning algorithm [6][7][8] for digital image denoising is to eliminate such noise. Digital image denoising is required because a noisy image is unpleasant to look at; moreover, some fine visual details may be mistaken for noise or vice versa. Different sorts of noise affect digital photographs [9][10][11].
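As a sketch of how supervised training pairs for a denoiser can be generated on the fly, the following Python snippet corrupts clean patches with additive white Gaussian noise. This is an illustration, not the paper's actual pipeline; the patch size and noise level σ are assumed values.

```python
import numpy as np

def make_training_pairs(clean_patches, sigma=25.0, rng=None):
    """Corrupt clean patches with additive white Gaussian noise on the fly.

    clean_patches: array of shape (N, patch_h, patch_w), values in [0, 255].
    sigma: assumed noise standard deviation.
    Returns (noisy, clean) pairs for supervised training.
    """
    rng = np.random.default_rng(rng)
    noise = rng.normal(0.0, sigma, size=clean_patches.shape)
    noisy = np.clip(clean_patches + noise, 0.0, 255.0)
    return noisy, clean_patches

# Example: 4 random 13x13 "clean" patches (13x13 is an arbitrary size)
clean = np.random.default_rng(0).uniform(0, 255, size=(4, 13, 13))
noisy, target = make_training_pairs(clean, sigma=25.0, rng=1)
```

Because corruption is applied at training time, an effectively unlimited number of (noisy, clean) pairs can be drawn from a fixed set of noise-free patches.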
Digital image denoising [12] is commonly thought of as a function that maps a noisy (corrupted) image to a cleaner version of that image. However, because the mapping from whole images to whole images is difficult, in practice we chopped the image into possibly overlapping patches and learned a mapping from a noisy patch to a clean patch. To denoise a particular noisy image, all of its patches were denoised independently using that map, and the restored patches were then recombined to obtain a denoised (cleaner) version of the original image.
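The patch-based pipeline just described can be sketched as follows. This is a minimal illustration with assumed patch size and stride; `denoise_patch` stands in for the learned patch-to-patch mapping, and overlapping contributions are recombined by averaging.

```python
import numpy as np

def denoise_by_patches(img, denoise_patch, size=8, stride=4):
    """Split img into overlapping patches, denoise each independently,
    and reassemble by averaging overlapping contributions."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros_like(img, dtype=float)
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            patch = img[i:i+size, j:j+size]
            out[i:i+size, j:j+size] += denoise_patch(patch)
            weight[i:i+size, j:j+size] += 1.0
    weight[weight == 0] = 1.0   # avoid division by zero for uncovered pixels
    return out / weight

# With the identity map as a stand-in "denoiser", the image is reproduced
# exactly wherever patches cover it.
img = np.arange(64, dtype=float).reshape(8, 8)
restored = denoise_by_patches(img, lambda p: p, size=4, stride=2)
```

Averaging overlaps is one simple recombination choice; weighted schemes are also common.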
One thing to keep in mind is that the size of the picture patches has an impact on the denoising function's quality. Many clean patches could explain a given noisy patch if the size of the image patches is small and the noise level is large. Here, we witnessed a necessary trade-off: very small patches lead to an easily modelled function but poor picture denoising results, whereas very big patches may lead to a better image denoising result but a difficult to model function [13][14][15][16][17].
Applying noise to a clean patch is not an injective, and thus not an invertible, operation. As a result, finding a perfect denoising function is difficult. This problem can be mitigated by lowering the noise level or increasing the size of the image patches, so that fewer clean patches could explain a given noisy patch. Large patches can thus achieve superior denoising outcomes to tiny ones, at least in theory [18][19][20].
We showed in this research that a simple MLP with supervised learning that maps noisy patches onto noise-free ones could reach state-of-the-art digital image denoising performance. In order to make this possible, the following components were integrated [21,22]:

• The MLP's capacity was chosen to be large enough to accommodate a suitable number of hidden layers and hidden layer neurons;
• The image patch size was selected to ensure that a noisy image patch has sufficient information to recover a noise-free version of the original input image;
• The chosen training set was large enough to allow for on-the-fly generation of training samples by corrupting noise-free patches with noise.

Artificial Neural Networks (ANNs)
An ANN, also known as a parallel distributed processing (PDP) or connectionist system, is a "black box" technique for predictive modelling: all characteristics defining the unknown circumstance must be provided to a trained ANN for identification (prediction) [19]. The advantages and disadvantages of ANNs are listed in Table 1. An ANN is designed to imitate the organisational principles of the central nervous system, in the hope that its biologically inspired computing skills make cognitive and sensory tasks easier and more satisfying to perform than with conventional serial processors. An ANN is an information processing system comprising a large number of highly interconnected processing neurons, inspired by the computing capacity of biological systems such as the human brain. These neurons collaborate to learn from the input data, coordinate internal processing, and optimise the final output. The ANN is thus a mathematical model of the human neural architecture that reflects its "learning" and "generalisation" capacities, and ANNs are a type of artificial intelligence [1]. An ANN is a classification system that can be implemented in either hardware or software; it is an adaptive system that modifies its structure, or the internal information that travels through the network, during the training phase. Because of qualities such as strong non-linear mapping capacity, high learning accuracy, and good robustness, ANNs are widely employed in various fields [23]. A multi-layer perceptron (MLP) is a feedforward ANN model that maps sets of input data to a set of relevant outputs. An MLP is a directed graph made up of multiple layers of nodes, each fully connected to the next. Except for the input nodes, each node is a neuron (or processing element) with a non-linear activation function. Backpropagation is the supervised learning approach used to train an MLP.
The MLP is a modified linear perceptron that can discriminate data that are not linearly separable [24][25][26].

Table 1. The ANN's advantages and disadvantages [27,28].

Advantages:
• For specific requirements, the hidden layer structure can be set up in a variety of ways;
• Multiple target outputs can be set without greatly increasing the challenge;
• A variety of challenges can be handled, classified as complex regression problems or classification tasks;
• Multivariable and non-linear issues can be addressed;
• Problems arising from a lack of knowledge or experience can be tackled;
• Applicable when the link between factors is unclear.

Disadvantages:
• The model's intricacy may result in over-fitting or under-fitting;
• There are no specific network design principles or recommendations;
• Low learning rate and local optima;
• Extrapolation beyond the data range performs poorly;
• Difficulty in expressing the decision's rationale;
• Weights and other essential factors are difficult to determine.

Problem-Specific Approach for ANN Development
The various phases in the problem-specific approach for ANN development constitute a cycle of six phases [29,30], shown in Figure 2:
• Phase 1 (Problem Description and Formulation): This phase is primarily reliant on a thorough grasp of the problem, especially the "cause-effect" relationships. Before deciding on a modelling technique, the advantages of ANNs over alternative techniques (if available) should be weighed;
• Phase 2 (System Design): This is the initial phase of the actual ANN design, in which the modeller selects the appropriate type of ANN and learning algorithm for the task. Data collection, data preprocessing to fit the type of ANN utilised, statistical data analysis, and data splitting into three distinct subsets (training, test, and validation subsets) are all part of this phase;
• Phase 3 (System Realisation): This phase entails training the network with the training and test subsets while also analysing the prediction error to gauge network performance. The design and performance of the final network can be influenced by the optimal selection of numerous parameters (e.g., network size, learning rate, number of training cycles, tolerable error, etc.). If practicable, breaking the problem down into smaller sub-problems and building an ensemble of networks could improve overall system accuracy. At this point, the modeller returns to Phase 2;
• Phase 4 (System Verification): Although network construction includes testing the ANN against test data while training is in progress, it is best practice (if data permit) to use the validation subset to assess the best network for its generalisation capabilities. The goal of verification is to ensure that the ANN-based model can respond appropriately to cases that were never used in network construction. Comparing the performance of the ANN-based model to that of other methodologies (if available), such as statistical regression and expert systems, is also part of this phase;
• Phase 5 (System Implementation): This phase entails integrating the network into a suitable operating system, such as a hardware controller or computer software. Before releasing the integrated system to the end user, it should be thoroughly tested;
• Phase 6 (System Maintenance): This phase entails upgrading the produced system as the environment or system variables change (e.g., new data), which necessitates a new development cycle.

Figure 2. The various phases in the problem-specific approach for ANN development.

Multi-Layer Perceptron ANNs
A Multi-Layer Perceptron (MLP) is an artificial neural network (ANN) made up of simple neurons called perceptrons. It is a feedforward ANN that maps sets of input data to a set of relevant outputs. The MLP is a deep ANN since it has three or more layers of non-linearly activated neurons (an input layer, an output layer, and one or more hidden layers), as shown in Figure 3. Weights are the data the neural network uses to solve a problem; they can be set to zero or calculated using a variety of methods. Here, the normalised initialisation is used:

W ~ U[ −√6/√(n_j + n_(j+1)), +√6/√(n_j + n_(j+1)) ],

where n_j and n_(j+1) are the numbers of neurons on the input side and output side of the layer, respectively.
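A minimal sketch of the normalised initialisation above (the Glorot/Bengio uniform scheme); the layer sizes used in the example are arbitrary:

```python
import numpy as np

def normalized_init(n_in, n_out, rng=None):
    """Normalised initialisation: weights drawn uniformly from
    [-sqrt(6)/sqrt(n_j + n_{j+1}), +sqrt(6)/sqrt(n_j + n_{j+1})],
    where n_j and n_{j+1} are the fan-in and fan-out of the layer."""
    rng = np.random.default_rng(rng)
    limit = np.sqrt(6.0) / np.sqrt(n_in + n_out)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Example: weight matrix for a layer with 64 inputs and 32 outputs
W = normalized_init(64, 32, rng=0)
```

The bound shrinks as the layer grows, keeping activation variance roughly constant across layers.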
In a neural network, weight initialisation is a critical criterion, as the weight values largely determine the network's overall performance. If the weights are all 0 (so that there is no effective link between input and output), the gradients calculated during training will also be 0, and the network will not learn. Further learning attempts with varying initial weights are therefore suggested in order to discover the value that minimises the cost function (minimum error).
An MLP is made up of numerous layers of neurons in a feedforward network (directed graph), with each layer fully connected to the one before it and each neuron equipped with a non-linear activation function. If an MLP has a linear activation function in all neurons, i.e., a linear function that maps the weighted inputs to each neuron's output, then any number of layers can be reduced to the usual two-layer input-output ANN [31].
The input layer neurons merely act as buffers, distributing the input signals x_i (i = 1, 2, . . . , n) to the hidden layer neurons. Each hidden layer neuron j (j = 1, 2, . . . , m), see Figure 3, sums its input signals x_i after weighting them with the strengths of the respective input-hidden layer connections w_ji (with w_kj denoting the weights between the hidden and output layers), adds the bias (θ_j: bias of the hidden layer; θ_k: bias of the output layer), and computes its output. The output y_k is obtained by composing the activation functions g_h (hidden layer activation function) and g_o (output layer activation function).
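The forward computation described here can be sketched in Python as follows. Sigmoid activations are assumed for both g_h and g_o, and the layer sizes in the example are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W_ji, theta_j, W_kj, theta_k, g_h=sigmoid, g_o=sigmoid):
    """One-hidden-layer forward pass in the notation of the text:
    hidden output O_j = g_h(sum_i w_ji x_i + theta_j),
    network output y_k = g_o(sum_j w_kj O_j + theta_k)."""
    O_j = g_h(W_ji @ x + theta_j)
    y_k = g_o(W_kj @ O_j + theta_k)
    return O_j, y_k

# With all-zero weights and biases, every sigmoid sees input 0,
# so every activation is 0.5.
x = np.array([0.5, -0.2, 0.1])
W_ji = np.zeros((4, 3)); theta_j = np.zeros(4)
W_kj = np.zeros((2, 4)); theta_k = np.zeros(2)
O_j, y = mlp_forward(x, W_ji, theta_j, W_kj, theta_k)
```

The all-zero example also illustrates the initialisation problem noted earlier: identical weights produce identical activations, so no neuron can specialise.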
Figure 4 shows the basic architecture of an MLP ANN.

Supervised Learning Algorithm
We fed input data and the corresponding correct data to the network in the supervised learning method. The input data are propagated forward through the network until they reach the output neurons, where they are compared to the correct data and the error is calculated. We do not need to change anything in the network if the output contains no or minimal error. If there is an error, we must modify the weights in the network to ensure that the network delivers accurate output in future learning, as shown in Figure 5. Supervised learning is the term for this method of weight modification. The backpropagation training technique, often known as the generalised delta rule, is one of the most commonly used supervised learning algorithms in real-world applications. MLP-ANNs are trained using the backpropagation (error backpropagation) algorithm, which follows the rules of supervised learning (error-corrective learning) [32][33][34]. The algorithm involves two passes through the MLP-ANN's distinct layers: forward and backward training. In forward training, an activity pattern is applied to the MLP-ANN's input neurons and its influence propagates through the network layer by layer, producing a set of outputs as the network's real-time response; the network's (synaptic) weights are fixed during this pass. During backward training, all weights are modified according to the supervised learning rule.
In order to generate an error signal, the network's real-time response is subtracted from the desired response. This error signal is then propagated back through the rest of the network.

All of the weights are adjusted to bring the network's real-time response statistically closer to the target response. In order to reduce the error, the weights are changed using the generalised delta rule [35,36].
The two most commonly used activation functions for the neurons in Figure 3 are the sigmoidal function (similar to a smoothed step function) and the tansig (hyperbolic tangent) activation function, σ(x). Both functions are continuously differentiable everywhere and have the following mathematical forms:

σ(x) = 1/(1 + e^(−x)),  tansig(x) = (e^x − e^(−x))/(e^x + e^(−x)).
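The two activation functions, together with the sigmoid derivative that appears in the delta terms of the error derivation, can be written as a small Python sketch:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: sigma(x) = 1 / (1 + exp(-x)), range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):
    """Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x), range (-1, 1)."""
    return np.tanh(x)

def sigmoid_prime(x):
    """Derivative sigma'(x) = sigma(x) * (1 - sigma(x)); this is the
    O(1 - O) factor that appears in the delta terms."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Both functions being smooth everywhere is what makes the gradient-based weight updates of backpropagation well defined.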

The Error Calculation
Given a set of training targets t_j and output layer activations O_j, we let the error of the MLP-ANN for a single training iteration be denoted by E and write it as

E = (1/2) Σ_j (t_j − O_j)².

We want to calculate ∂E/∂W^l_(jk), the rate of change of the error with respect to the given connective weight, so that we can minimise it, see Figure 6. Now consider two cases: the neuron is an output neuron, or it is in a hidden layer.
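The error computation is a one-liner in code (the example values are arbitrary):

```python
import numpy as np

def network_error(t, O):
    """E = 1/2 * sum_j (t_j - O_j)^2 over the output layer."""
    t, O = np.asarray(t, float), np.asarray(O, float)
    return 0.5 * np.sum((t - O) ** 2)

# Targets [1, 0] against outputs [0.8, 0.2]:
# E = 0.5 * (0.2^2 + 0.2^2) = 0.04
E = network_error([1.0, 0.0], [0.8, 0.2])
```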

Output Layer Neurons
Since t_k is a constant, differentiating E with respect to the weight W_jk gives

∂E/∂W_jk = (O_k − t_k) O_k (1 − O_k) O_j,

where X_k = Σ_j W_jk O_j is the input of neuron k and O_j is the output of neuron j. For notational purposes, δ_k is defined to be the expression (O_k − t_k) O_k (1 − O_k), so we can rewrite the above equation as

∂E/∂W_jk = δ_k O_j.

Hidden Layer Neurons
For a hidden layer neuron j, the error reaches it through every output neuron k that it feeds; since ∂X_k/∂O_j = W_jk, the chain rule gives

δ_j = O_j (1 − O_j) Σ_k δ_k W_jk.

Therefore: run the MLP-ANN forward with the input data to obtain the network response; compute δ_k for each output neuron; compute δ_j for each hidden-layer neuron; and update the weights and biases as

Δw = −η δ O,  then apply w + Δw → w and θ + Δθ → θ,

where η is the learning rate (usually less than 1), O is the output of the preceding layer, and the minus sign makes the update a descent step given the definition of δ_k above.

ANN Parameter
When designing a neural network, many distinct parameters must be determined:

Learning Rate (η)
The learning rate is a training parameter that regulates the size of the weight and bias adjustments during learning. The greater the value of η, the greater the step and the faster the convergence; if the value of η is small, the learning algorithm takes a long time to converge.
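The effect of η can be seen on a toy one-dimensional problem (purely illustrative; the function f(w) = w² is not from the paper):

```python
def gd_steps(eta, steps=20, w0=5.0):
    """Minimise f(w) = w^2 by gradient descent; the gradient is 2w.
    Larger eta reaches the minimum at w = 0 in fewer steps, until it
    becomes so large that the iterates overshoot and diverge."""
    w = w0
    for _ in range(steps):
        w -= eta * 2.0 * w
    return w

slow = gd_steps(eta=0.01)   # still far from the minimum after 20 steps
fast = gd_steps(eta=0.4)    # much closer after the same 20 steps
```

For this function any η above 1 makes each step multiply |w| by more than one, so the iteration diverges, illustrating the trade-off described above.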

Momentum
Momentum is simply a fraction µ of the prior weight update added to the recent one. The momentum parameter is employed to keep the system from settling into a local minimum, often known as a saddle point. A high momentum parameter can also aid in speeding up the system's convergence. Setting the momentum parameter too high, on the other hand, runs the danger of overshooting the minimum, causing the system to become unstable. A low momentum coefficient does not dependably avoid local minima and also slows down the system's training.

Delta Learning Rule
The delta learning rule is a rule for updating the weights of the inputs to artificial neurons in ANNs, and it is a special case of the backpropagation learning algorithm. Under this rule, the input weights are continuously updated using the difference (the delta) between the desired response and the real-time response, in such a way that the mean square error of the designed ANN is minimised. It is important to ensure that the input data set is well randomised: on a strictly ordered and structured data set, the designed ANN is incapable of learning the problem and cannot converge to the desired response. The update is

Δw = η δ^l O^(l−1),  w + Δw → w.

Gradient Descent (GD) Learning Rule
This learning strategy (similar to the delta rule) updates the network weights and biases in the direction in which the performance function decreases most rapidly, i.e., the negative of the gradient. It is very beneficial for functions with many dimensions. The new weight vector is recalculated as

w_(k+1) = w_k − η g_k,

where g_k is the gradient of the performance function at w_k. The minus sign denotes that the new weight vector w_(k+1) moves in the direction opposite to the gradient.
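The update rule reads, in code (a one-line sketch; E(w) = ½‖w‖² is an arbitrary example whose gradient happens to be w itself):

```python
import numpy as np

def gd_update(w, grad, eta=0.1):
    """w_{k+1} = w_k - eta * g_k: step against the gradient of the
    performance function."""
    return w - eta * grad

# One step on E(w) = 1/2 * ||w||^2, whose gradient is w:
w = np.array([1.0, -2.0])
w_next = gd_update(w, grad=w, eta=0.1)   # -> [0.9, -1.8]
```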

Gradient Descent Backpropagation with Momentum (GDM)
Momentum allows an ANN to respond to recent trends in the error surface as well as to the local gradient. Because of momentum, the ANN can disregard minor features in the error surface: an ANN without momentum may become stuck in a shallow local minimum, whereas an ANN with momentum can slip through it. Momentum is added to backpropagation learning by making each weight modification equal to the sum of a fraction of the previous weight change and the new change indicated by the gradient descent backpropagation rule. A momentum constant µ mediates the magnitude of the influence that the last weight change is allowed to have. When µ is set to 0, a weight change is determined purely by the gradient. When µ is 1, the new weight modification equals the previous weight modification, and the gradient is ignored. The new weight vector w_(k+1) is modified as follows [37,38]:

Δw_k = µ Δw_(k−1) − (1 − µ) η g_k,  w_(k+1) = w_k + Δw_k.

Variable learning rate gradient descent backpropagation uses a larger learning rate η when the designed ANN model is far from the desired response and a smaller η when it is near the desired response, in order to speed up convergence. The new weight vector w_(k+1) is then updated using the variable learning rate η_(k+1), computed as

η_(k+1) = β η_k,  where β = 0.7 if new error > 1.04 × (old error), and β = 1.05 if new error < 1.04 × (old error).
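Both rules can be sketched together. The momentum form Δw_k = µΔw_(k−1) − (1 − µ)ηg_k is one common convention that matches the limiting behaviours described above (µ = 0 gives a pure gradient step, µ = 1 repeats the previous step); the function names are assumptions for this sketch.

```python
import numpy as np

def gdm_update(w, grad, prev_dw, eta=0.1, mu=0.9):
    """Gradient descent with momentum: blend the previous weight change
    with the fresh gradient step.
    mu = 0 -> pure gradient step; mu = 1 -> repeat the previous step."""
    dw = mu * prev_dw - (1.0 - mu) * eta * grad
    return w + dw, dw

def adapt_eta(eta, new_err, old_err, up=1.05, down=0.7, thresh=1.04):
    """Variable learning rate: shrink eta (beta = 0.7) when the error grew
    by more than 4%, otherwise grow it (beta = 1.05), as in the text."""
    return eta * (down if new_err > thresh * old_err else up)

# mu = 0: the step is just -eta * grad.
w1, dw1 = gdm_update(np.array([1.0]), np.array([1.0]), prev_dw=0.0,
                     eta=0.1, mu=0.0)
```

In practice `prev_dw` is carried across iterations, and `adapt_eta` is applied once per epoch after comparing the new and old training errors.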
ANN learning iterations, referred to as epochs, consist of two phases [39]:
• Feedforward propagation simply calculates the output values for each training pattern;
• Backward propagation sends an error signal backward from the output layer to the input layer; the backpropagated error signal is used to alter the weights.

Results
Self-similarity is a fundamental property of nature, but its application in image processing is limited by the finite resolution of digital images and the presence of noise. We used five test images, see Figure 7. As a result (see Table 2), a self-similarity-guided denoising algorithm must adapt the patch size to the scale of the image content as well as to the noise level. Similar neighbours may not exist if a patch is too large in relation to its surroundings. If a patch is too small in comparison to the noise level, however, it is difficult to identify similar patches reliably among its nearby candidates using block-matching. It is thus difficult to restore small-scale texture patterns when there is a lot of noise.

ANNs are a feasible answer since they are specifically trained to approximate a conditional expectation, which makes them scale- and noise-aware. However, they require an even broader observation window to handle large-scale and often highly organised patterns appropriately on their own, which unfortunately comes at a considerable computational cost due to their extensive hidden layers and deep architectures.
To scale down big ANNs, one can exploit the complementary nature of self-similarity and ANNs. By using a self-similarity-based method for larger-scale patterns and small neural networks for smaller-scale patterns, one can have the best of both worlds without having to worry about the technical details of patch size. Nevertheless, a filter based on a hard scale classification is more prone to errors than the soft weighting scheme of the ideal conditional expectation decomposition, even where the accuracy of the texture detector of [11] has not been affected by noise.

Importance of ANN in Various Fields
ANNs have a number of advantages that make them ideal for a variety of issues and situations [40]:

• ANNs can learn and model non-linear and complicated interactions, which is critical because many of the relationships between inputs and outputs in real life are non-linear and complex;
• ANNs can generalise: after learning from the original inputs and their associations, the model may infer unknown relationships from unseen data, allowing it to generalise and predict on unknown data;
• ANNs do not impose any limits on the input variables, unlike many other prediction algorithms (such as constraints on how the inputs should be distributed). Furthermore, several studies have demonstrated that ANNs can better model heteroskedasticity, i.e., data with high volatility and non-constant variance, because of their capacity to learn latent relationships in the data without imposing any predefined form.
ANNs are important because of some of their great properties:
• Image Processing and Character Recognition: ANNs play an important role in image and character recognition because of their ability to take in a large number of inputs, process them, and infer hidden as well as complex, non-linear correlations. Character recognition, such as handwriting recognition, has a wide range of applications in fraud detection and even national security assessments. Image recognition is a rapidly evolving field with numerous applications, ranging from social media facial recognition to cancer detection in medicine to satellite data processing for agricultural and defence uses. Deep ANNs [6,7], which form the backbone of "deep learning", have now opened up new and revolutionary developments in computer vision, speech recognition, and natural language processing, prominent examples being self-driving cars [16,17,20,40];
• Forecasting: Forecasting is widely used in everyday company decisions (such as sales, financial allocation between products, and capacity utilisation), economic and monetary policy, and finance and the stock market. Forecasting problems are frequently complex; for example, predicting stock prices is a complicated problem with many underlying variables (some known, some unseen). Traditional forecasting models have flaws when it comes to accounting for these complicated, non-linear relationships. Given their capacity to model and extract previously overlooked characteristics and associations, ANNs, when used correctly, can provide a reliable alternative. ANNs also place no restrictions on the input and residual distributions, unlike classical models. Recent breakthroughs in the use of LSTMs and recurrent ANNs for forecasting, for example, are driving more research on the subject [4][5][6][11][40].

Conclusions and Future Work
We focused on the conditional expectation in this paper because it is not only essential for understanding most patch-based denoising techniques but also naturally easier to evaluate and approximate than the underlying law. Furthermore, we demonstrated through studies that small ANNs have two advantages: they can survive considerably higher noise on small-scale texture patterns than BM3D and its derivatives, and the notion of scale is clear due to their fixed input size. Moreover, the backpropagation learning algorithm is an efficient technique for denoising digital images without prior knowledge of the degradation model (noise/blurring). The algorithm works well once an ANN is trained properly, and more training data samples give better performance for the existing model. However, the ANN model used here becomes complicated if the number of hidden layers rises above the typical value, which degrades the performance of the network. Promising directions for future work include the use of a multiple-copy multi-layer perceptron (MC-MLP), patch selection, and self-constructing ANN models to obtain better results.