2.1. Frequency Domain Filtering of Overlapping Patches
For simplicity, but without loss of generality, we focus in this work on filtering grayscale images. Filtering of color images can be reduced to channel-wise processing of grayscale images, so this simplification does not restrict the applicability of the approach.
Let $J$ be an image. We may consider it as a superposition of a desired component $I$ and a noise component $N$. As we focus here on additive and multiplicative noises, the respective representations of this superposition are additive, $J = I + N$, and multiplicative, $J = I \cdot N$. The process of image filtering consists of finding an approximation $\tilde{I}$ of an ideal image $I$ such that the difference between a filtered image and the ideal (noise-free) image is minimized according to a given metric. Usually, the root mean square error (RMSE) is used as this metric.
Minimization of RMSE means maximization of the peak signal-to-noise ratio (PSNR):

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{L^2}{\mathrm{MSE}}\right),$$

where $L = 255$ is the maximum possible pixel intensity value for a grayscale 8-bit image. Specifically, PSNR is usually used to evaluate the quality of filtering.
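For concreteness, the quality metric above can be computed as follows (a minimal NumPy sketch; the function name `psnr` and the default `L=255` for 8-bit images are our illustrative choices):

```python
import numpy as np

def psnr(filtered, ideal, L=255.0):
    """Peak signal-to-noise ratio between a filtered and an ideal image.
    Minimizing the MSE (squared RMSE) maximizes this value."""
    mse = np.mean((filtered.astype(np.float64) - ideal.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(L ** 2 / mse)
```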
Many classical image filtering methods [18,19,20,21,22,23] usually utilize linear and nonlinear filtering in the spatial domain. However, filtering in the frequency domain can also be highly effective. For example, one of the most effective filters for Gaussian and speckle noise reduction—BM3D—processes images in the frequency domain. It is a common understanding that a visible noise component in the case of additive and multiplicative noise primarily affects higher frequencies. This can be observed in Figure 1.
Hence, filtering of this kind of noise should typically be performed via low-pass filtering, preserving low frequencies while carefully and selectively suppressing high-frequency components affected by noise. This means that we may consider the process of filtering in the frequency domain as the detection of high-frequency components affected by noise followed by their suppression. A crucial part of this process is the accurate identification of the components affected by noise and their correction. It is important to keep in mind that high-frequency components also contain information about small details and object boundaries. Therefore, while we are interested in noise filtering, we need to do our best to preserve useful information. By simply suppressing all high-frequency components, we may lose important information in a filtered image.
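To illustrate why naive suppression of all high frequencies is harmful, here is a deliberately crude frequency-domain low-pass filter (an illustrative sketch only, not the filter proposed in this work; `keep_ratio` is a hypothetical parameter):

```python
import numpy as np

def naive_lowpass(image, keep_ratio=0.25):
    """Keep only the lowest keep_ratio fraction of frequencies in each
    dimension and zero out the rest, then transform back."""
    F = np.fft.fftshift(np.fft.fft2(image))      # DC moved to the center
    h, w = image.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(h * keep_ratio / 2), int(w * keep_ratio / 2)
    mask = np.zeros_like(F)
    mask[cy - ry:cy + ry + 1, cx - rx:cx + rx + 1] = 1
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

Such a hard mask removes noise but also blurs edges and fine details, which motivates the selective, learned suppression discussed above.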
The design of filters is a sophisticated task. Some filters work effectively for a specific type of noise but prove to be ineffective for other types. This demands either modifying an existing filter or developing a new one. Such a situation drives the search for tools that enable the efficient design of new filters by implementing complex functional dependencies between the ideal and noisy images, thereby facilitating high-quality noise removal.
We employ here a neural network as such a tool. The idea behind the use of a neural network in image filtering is based on the ability of the network to learn from its environment. As a neural network may learn a lot of things from data, there are many reasons to believe that it may learn how to detect and suppress noise while preserving image details. Neural networks have successfully been applied to image filtering for more than a decade. Successful use of MLP for noise filtering was shown, for example, in [44,45]. So far, all applications of neural networks in image filtering have been in the spatial domain. That is, in all these applications a neural network developed a certain spatial domain filter. Here we suggest using a neural network for frequency domain filtering. We would like to use a neural network as a tool that is able to determine which frequencies are affected by noise and which are not or are less affected. Simultaneously, a neural network should be able to synthesize (design) convolutional kernels for noise filtering in the frequency domain resulting from the learning process. It is natural to use the MLMVN [47] for solving this problem, as it is a complex-valued network well suited to working with complex-valued data in the frequency domain.
Nowadays, images are typically large. Thus, on the one hand, processing a large digital image as a whole using a neural network can be a highly resource-consuming task. On the other hand, to design a robust filter through the learning process, we would need to use many patches from many different images rather than some large image(s) as a whole. Thus, our idea is to focus on filtering relatively small overlapping patches. Each patch in this case should be filtered as a whole, while the resulting image should be created by averaging over all the overlapping pixels.
Hence, we focus on training MLMVN to design frequency domain convolutional kernels based on taking the Fourier transform of a noisy patch from an artificially corrupted image as an input and the Fourier transform of a respective clean patch from a noise-free image as a desired output. This approach also simplifies the adaptation of a neural network to specific data, as fragments contain significantly fewer details compared to the entire image. While processing local regions, the global context of an image is preserved through the overlapping areas of the fragments. Since fragments are much smaller than an entire image, their processing requires considerably fewer computational resources. Additionally, each fragment can be treated as an independent unit, enabling the parallel processing of multiple fragments.
Thus, our idea is basically to reconstruct the Fourier transform of a noise-free patch from the Fourier transform of its noisy version using a neural network whose input is the Fourier transform of a noisy patch and whose desired output is the Fourier transform of the corresponding noise-free patch. To create a representative learning set, many clean images should be corrupted by noise; noisy patches should be randomly selected from each image and their Fourier transforms used as inputs; the corresponding clean patches should be selected, starting at the same coordinates, from the respective clean images, and their Fourier transforms used as desired outputs.
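The construction of such a learning set can be sketched as follows (an illustrative sketch; the function names, the noise parameter `sigma=0.1`, and the patch counts are assumptions, not the exact values used in this work):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(I, kind="gaussian", sigma=0.1):
    """Corrupt a clean image I (intensities in [0, 1]) with one of the two
    noise models considered here: additive Gaussian (J = I + N) or
    multiplicative speckle (J = I * (1 + N)), N being zero-mean Gaussian."""
    N = rng.normal(0.0, sigma, I.shape)
    return I + N if kind == "gaussian" else I * (1.0 + N)

def make_training_pairs(clean_images, patch=8, per_image=4, kind="gaussian"):
    """Build (noisy FT, clean FT) learning samples: corrupt each clean image,
    cut patches at random coordinates, and store the vectorized 2-D Fourier
    transforms of each noisy/clean patch pair."""
    inputs, targets = [], []
    for I in clean_images:
        J = corrupt(I, kind)
        for _ in range(per_image):
            y = rng.integers(0, I.shape[0] - patch + 1)
            x = rng.integers(0, I.shape[1] - patch + 1)
            inputs.append(np.fft.fft2(J[y:y+patch, x:x+patch]).ravel())
            targets.append(np.fft.fft2(I[y:y+patch, x:x+patch]).ravel())
    return np.array(inputs), np.array(targets)
```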
We suggest employing MLMVN with a single hidden layer and an output layer with the same number of output neurons as the number of Fourier coefficients in a respective patch’s Fourier transform. With this topology of MLMVN, its two layers of neurons perform the following tasks. Every neuron in a single hidden layer develops a frequency domain convolutional kernel through the learning process. Therefore, each neuron in a single hidden layer performs a frequency domain convolution, multiplying component-wise its weights by the respective Fourier transform coefficients of a patch to be processed.
Each neuron in the output layer estimates a respective Fourier coefficient of a noise-free patch based on the outputs of all hidden layer neurons.
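The role of a hidden neuron can be checked directly via the convolution theorem: multiplying its complex weights component-wise with the Fourier transform of a patch is equivalent to circularly convolving the patch with the kernel obtained as the inverse Fourier transform of the weights. A small numerical sketch (all names are illustrative):

```python
import numpy as np

# One hidden MLMVN neuron holds a complex weight vector w of length k = m*n.
# Its component-wise products with the vectorized Fourier transform of a patch
# realize a circular convolution of the patch with the kernel ifft2(w).
m = n = 4
rng = np.random.default_rng(1)
patch = rng.random((m, n))
w = rng.random((m, n)) + 1j * rng.random((m, n))   # a neuron's weights

# the neuron's component-wise products, reshaped back to the patch grid
freq = (w.ravel() * np.fft.fft2(patch).ravel()).reshape(m, n)
# equivalent spatial-domain result: circular convolution with ifft2(w)
spatial = np.real(np.fft.ifft2(freq))
```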
The selection of patches for constructing the training set and the learning process is described below in Section 2.2. After a learning set is created, the learning process should start. After a neural network develops its weights through the learning process, it should be used to filter images. To extract patches from an image, we use a window of size $m \times n$. By moving this window across the image using steps $\Delta x$ and $\Delta y$ (where $\Delta x$ corresponds to the step along the x-axis and $\Delta y$ to the step along the y-axis), we extract the intensities contained within the window into a patch. As a result of this operation, a set of overlapping patches is generated if $\Delta x < m$ and $\Delta y < n$. This procedure is illustrated in Figure 2.
For the reader’s convenience, we provide an algorithm (Algorithm 1) for extracting a set of overlapping patches from an image to be processed. It should also be mentioned that the provided algorithm describes the process of building a set of patches for the actual image filtering process.
Algorithm 1: Extracting a set of overlapping patches from the input image
1: Input:
2:   I: An image of size $H \times W$
3:   m: The patch width
4:   n: The patch height
5:   $\Delta x$: The step size along the x-axis
6:   $\Delta y$: The step size along the y-axis
7: Output:
8:   P: A set of overlapping patches
9: Procedure:
10:   $P \leftarrow \varnothing$
11:   for $y \leftarrow 0$ to $H - n$ by $\Delta y$ do
12:     for $x \leftarrow 0$ to $W - m$ by $\Delta x$ do
13:       Extract a patch p: $p \leftarrow I[y : y + n,\ x : x + m]$
14:       $P \leftarrow P \cup \{p\}$
15:     end for
16:   end for
17:   return P
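Algorithm 1 can be implemented, for instance, as follows (a sketch; we additionally return each patch's top-left coordinates, which are needed later for synthesis, and `dx`, `dy` denote the step sizes):

```python
import numpy as np

def extract_patches(I, m, n, dx, dy):
    """Slide an n-row by m-column window over image I with steps dx (along x)
    and dy (along y); patches overlap whenever dx < m or dy < n.
    Returns (top-left coordinates, patch) pairs."""
    H, W = I.shape
    patches = []
    for y in range(0, H - n + 1, dy):
        for x in range(0, W - m + 1, dx):
            patches.append(((y, x), I[y:y+n, x:x+m].copy()))
    return patches
```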
Since we perform processing in the frequency domain, after a set of overlapping patches P is built, we need to create $\hat{P}$—a set of Fourier transforms of these patches—in the following way: every patch $p \in P$ is transformed using the two-dimensional Fourier transform, and the result is vectorized. The vectorized Fourier transform of each patch should be used as an input sample for MLMVN. As was mentioned above, we employ a shallow MLMVN with a $k \to q \to k$ topology for processing frequency domain data. This means that a neural network consists of an input layer with k inputs, a single hidden layer with q neurons, and an output layer with k neurons. The number of inputs k, which is also equal to the number of neurons in the output layer, depends on the size of the image fragments being processed ($k = m \cdot n$). The number of hidden neurons q should be determined based on experimental testing.
It is important to make a remark about the activation functions we use in our network. Two activation functions have been considered so far for MVNs—discrete and continuous ones. However, both of these functions project a weighted sum of an MVN onto the unit circle [46]. In MLMVN, which is used here, all hidden neurons employ the standard continuous MVN activation function $P(z) = z/|z|$. At the same time, neither discrete nor continuous activation functions producing an output located on the unit circle can be used for output layer neurons. As our goal is to use MLMVN to estimate the Fourier transform of a clean patch, output neurons should produce an output not necessarily located on the unit circle. This requires the use of a different type of activation function capable of producing an output with an arbitrary (not necessarily unitary) magnitude. We used three such activation functions in our experiments: a linear activation function (when a weighted sum becomes an output), a complex-valued sigmoid activation function, and a complex-valued hyperbolic tangent activation function. The respective results and analysis of the network behavior with all three of these activation functions are presented in Section 3.
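The activation functions discussed above can be sketched as follows. The continuous MVN function projects the weighted sum onto the unit circle; for the complex-valued sigmoid and hyperbolic tangent we show one common "split" formulation (applied separately to the real and imaginary parts), which is an assumption here, not necessarily the exact formulation used in the experiments:

```python
import numpy as np

def mvn_continuous(z):
    """Continuous MVN activation: projects the weighted sum onto the unit circle."""
    return z / np.abs(z)

def linear(z):
    """Output-layer option 1: the weighted sum itself (arbitrary magnitude)."""
    return z

def complex_sigmoid(z):
    """Output-layer option 2 (split formulation, assumed): the logistic
    function applied separately to the real and imaginary parts."""
    s = lambda t: 1.0 / (1.0 + np.exp(-t))
    return s(z.real) + 1j * s(z.imag)

def complex_tanh(z):
    """Output-layer option 3 (split formulation, assumed): tanh applied
    separately to the real and imaginary parts."""
    return np.tanh(z.real) + 1j * np.tanh(z.imag)
```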
As a result of processing elements of $\hat{P}$ with a neural network, we obtain a set of vectorized filtered overlapping patches in the frequency domain. Its elements can be used to obtain a set of filtered overlapping patches in the spatial domain in the following way: each vectorized filtered patch is reshaped back to the patch size, converted to the spatial domain by the inverse two-dimensional Fourier transform, and cropped. The goal of cropping the resulting image patch in the spatial domain is to remove possible unwanted artifacts and distortions that occur while processing the boundary regions of an image. The resulting set of spatial-domain patches, denoted $\tilde{P}$, is used for the “synthesis” of a filtered image.
It is important to note that after processing a patch with MLMVN, we restore a zero-frequency coefficient of the Fourier transform by setting it equal to a zero-frequency coefficient of the respective input (unprocessed) patch. This step is valid because our modeled noise has a zero mean. Thus, we can preserve a respective mean over a patch that is being processed. This is important to avoid a shift in patch intensities, which would occur if the mean value were modified.
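This restoration step amounts to a one-line correction of the filtered Fourier transform (an illustrative sketch; the function name is ours):

```python
import numpy as np

def restore_dc(filtered_ft, noisy_ft):
    """Copy the zero-frequency (DC) coefficient of the unprocessed patch into
    the filtered patch's Fourier transform. Valid for zero-mean noise: the DC
    coefficient is proportional to the patch mean, so keeping it prevents an
    intensity shift in the filtered patch."""
    out = filtered_ft.copy()
    out[0, 0] = noisy_ft[0, 0]
    return out
```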
For reconstructing an image from the set of filtered patches, we use a rectangular window. By “moving” this window starting from the top-left corner with vertical step t and horizontal step r, and collecting all patches that lie inside the window at each step, we are able to reconstruct the corresponding fragment of the image. This should be carried out by combining the intensities of these overlapping patches. Intensities in areas where patches overlap are restored by applying an aggregation function. In this work, we employed median and mean aggregation functions. Below, an algorithm (Algorithm 2) for the “synthesis” of a filtered image from processed overlapping patches in the frequency domain is provided.
In general, the image noise filtering process proposed in this paper is illustrated in Figure 3.
The algorithm for the complete process of filtering a noisy image is outlined here (Algorithm 3).
Algorithm 2: “Synthesis” of a filtered image from processed overlapping patches
1: Input:
2:   $\tilde{P}$: A set of filtered overlapping patches
3:   m: The width of a patch
4:   n: The height of a patch
5:   t: The width of a window
6:   r: The height of a window
7:   $\phi$: An aggregation function
8: Output:
9:   $\tilde{I}$: A filtered image
10: Procedure:
11:   for $x \leftarrow 0$ to the image width by t do
12:     for $y \leftarrow 0$ to the image height by r do
13:       Collect the patches from $\tilde{P}$ that lie inside the current window
14:       Set the image intensities in the window, applying $\phi$ where patches overlap
15:     end for
16:   end for
17:   return $\tilde{I}$
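A straightforward (unoptimized) implementation of this synthesis step might look as follows; collecting all overlapping intensities per pixel keeps the aggregation function (mean or median) interchangeable (names are illustrative):

```python
import numpy as np

def synthesize(patches, H, W, agg=np.mean):
    """Rebuild an H-by-W image from filtered overlapping patches, each given
    as ((y, x), patch). Overlapping pixels are combined with an aggregation
    function (np.mean here; np.median works the same way)."""
    acc = [[[] for _ in range(W)] for _ in range(H)]   # intensities per pixel
    for (y, x), p in patches:
        ph, pw = p.shape
        for i in range(ph):
            for j in range(pw):
                acc[y + i][x + j].append(p[i, j])
    return np.array([[agg(c) if c else 0.0 for c in row] for row in acc])
```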
2.2. Organization of the Learning Process
The learning process we utilized in this work for training MLMVN is based on batch learning with validation. The main advantage of learning with validation is that it makes it possible to verify whether a neural network has developed a generalization capability. As learning with validation is based on the minimization of the validation error, it stops when a network is capable of generalizing with a desired accuracy. This also helps to avoid overfitting, which may often affect learning based on the minimization of the learning error (that is, the error on the learning set). Overfitting may occur from repeated attempts to “memorize” a learning set by achieving a low learning error. But the actual goal of learning is to develop a generalization capability (that is, to deal with the data that were not used to adjust the weights), not to “memorize” a learning set. This leads to improvement in robustness. In the learning process with validation, a dataset is divided into three subsets: a learning (training) subset, a validation subset, and a test subset. The training subset is used to adjust the weights, while the validation subset is used to verify whether a desired generalization capability has been achieved. Then the test subset is used to verify the results of the learning process.
Algorithm 3: The process of filtering noise in a digital image using MLMVN
1: Input:
2:   J: A noisy image
3:   m: The width of a patch
4:   n: The height of a patch
5: Output:
6:   $\tilde{I}$: A filtered image
7: Procedure:
8:   Normalize the image: $J \leftarrow J / 255$
9:   Split the image into patches forming the set P (Algorithm 1)
10:   for each patch $p \in P$ do
11:     Convert p from the spatial to the frequency domain: $\hat{p} \leftarrow \mathcal{F}(p)$
12:     Normalize $\hat{p}$
13:     Process $\hat{p}$ with the MLMVN
14:     Restore the zero-frequency coefficient
15:     Denormalize the processed $\hat{p}$
16:     Convert it back to the spatial domain: $\tilde{p} \leftarrow \mathcal{F}^{-1}(\hat{p})$
17:     $\tilde{P} \leftarrow \tilde{P} \cup \{\tilde{p}\}$
18:   end for
19:   Synthesize the filtered image $\tilde{I}$ from the set $\tilde{P}$ (Algorithm 2)
20:   return $\tilde{I}$
Model training was performed separately for each type of noise. To design the training, validation, and test sets, a set of grayscale images of different sizes was used. It was split into three disjoint subsets: $T_1$ (training), $T_2$ (validation), and $T_3$ (test). The training dataset consists of pairs of vectorized Fourier transforms of noisy and corresponding clean image patches. The training dataset was built using set $T_1$, which comprised 300 grayscale images. It is important to note that, as a part of preprocessing, we normalize the images by dividing their intensities by 255, so they have the range $[0, 1]$ after normalization. This is important because a neural network learns faster and generalizes better when it deals with normalized data. For each clean image $I \in T_1$, a noisy image $J$ was created by applying noise (additive Gaussian noise or multiplicative speckle noise). Additive Gaussian noise was modeled with three levels of the noise standard deviation, while multiplicative speckle noise was modeled with three levels expressed in terms of $\sigma$, the standard deviation of the ideal (clean) image. After applying the corresponding noise, h random patches of size $m \times n$ were picked from the clean image I. In our experiments, we selected 200 patches per image. The same number of patches, of the same size and at the same spatial coordinates, were picked from each corresponding noisy image J. As a result, two sets of image fragments were created: a set of clean patches and a set of noisy patches, where each patch is identified by its source image and the spatial coordinates of its top-left corner. Each element of the clean and noisy patch sets was converted from the spatial domain to the frequency domain by computing a two-dimensional Fourier transform and normalized by a constant factor. After vectorizing each image fragment in the frequency domain, the corresponding sets of frequency-domain samples were built. Thus, the training set for MLMVN was formed by pairing the vectorized Fourier transform of each noisy patch (an input sample) with the vectorized Fourier transform of the corresponding clean patch (the respective desired output).
The validation dataset was built using g full-size images from $T_2$. Additive Gaussian or multiplicative speckle noise was applied to each image from $T_2$, thereby creating a set of noisy images, similarly to how noise was added to create the training set. Using pairs of ideal and noisy images, the validation dataset was constructed as a set of (noisy image, clean image) pairs.
In our work, MLMVN training was performed using the batch learning algorithm proposed in [60,61]. This algorithm is based on a derivative-free approach and enables the correction of neuron weights across multiple learning samples simultaneously (that is, across an entire batch).
As noted above, correction of the network weights was performed using samples from the training dataset
L. During our experiments, we used a training dataset with 60,000 learning samples created from 300 grayscale images (as 200 patches were randomly selected from each of the 300 images). To employ the batch learning algorithm, learning samples were grouped into batches containing
b samples each. We employed a batch size of 20,000 learning samples. The justification for this choice is provided in
Section 3.2. Our learning process is based on the maximization of the mean value of the validation PSNR evaluated over the images from the validation set. During our experiments, we employed a validation set, which consists of five pairs of noisy and clean grayscale images.
After each batch learning step was completed, a validation step was performed. During the validation, image pairs from the validation dataset V were processed using the algorithm described in the previous section of this article (see Algorithm 3). The PSNR of the filtered image relative to the clean image was computed for each pair of images from V. The average PSNR value across all elements of V was compared with a pre-determined threshold: the learning process was stopped if the current average validation PSNR reached or exceeded the threshold value and continued otherwise. The algorithm for the learning process is outlined here (Algorithm 4).
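The validation-driven stopping rule can be sketched as follows (an illustrative skeleton only: `fit_batch` and `validate_psnr` stand in for the MLMVN batch update and the Algorithm 3 validation pass, and `max_epochs` is our safeguard, not part of the described procedure):

```python
def train_with_validation(batches, fit_batch, validate_psnr, threshold,
                          max_epochs=100):
    """Batch learning with validation (sketch): after every batch step,
    compute the mean PSNR over the validation images and stop as soon as it
    reaches the threshold."""
    for _ in range(max_epochs):
        for b in batches:
            fit_batch(b)                      # adjust the network weights
            if validate_psnr() >= threshold:  # mean PSNR over validation set
                return True                   # desired generalization reached
    return False                              # threshold never reached
```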
The final performance evaluation of the proposed filtering approach was performed using the images from the test set $T_3$. It is important to note again that these images were not used in the design of either the training set or the validation set.
Algorithm 4: The MLMVN learning process
1: Input:
2:   L: A training set
3:   V: A validation set
4:   $\mathrm{PSNR}_{val}$: The desired PSNR on the validation set
5:   $\mathrm{PSNR}_{batch}$: The desired PSNR on a batch
6:   n: The size of a batch
7:   iterations_limit: The maximum number of iterations per batch
8: Output:
9:   A trained MLMVN
10: Procedure:
11:   Build a set of batches B from the training set L
12:   repeat
13:     for each batch b in B do
14:       iterations $\leftarrow$ 0
15:       while the PSNR on b $< \mathrm{PSNR}_{batch}$ and iterations < iterations_limit do
16:         Update the MLMVN weights
17:         Process the samples from b with the MLMVN and evaluate the PSNR on b
18:         iterations $\leftarrow$ iterations + 1
19:       end while
20:       Process the images from V with the MLMVN (Algorithm 3) and compute the average validation PSNR
21:       if the average validation PSNR $\geq \mathrm{PSNR}_{val}$ then
22:         break
23:       end if
24:     end for
25:   until the average validation PSNR $\geq \mathrm{PSNR}_{val}$
26:   return A trained MLMVN