1. Introduction
Synthetic Aperture Radar (SAR) technology [1] is widely used to identify objects and scenarios on the Earth's surface in any weather condition, such as tracking ships and oil spills, terrain erosion, droughts and landslides, deforestation, and fires [2]. For all these reasons and more, the modern challenges addressed by SAR make it an extremely important technology. Current Automatic Target Recognition (ATR) works classify targets present in SAR images, which implies running an image reconstruction algorithm on the captured SAR echo data [3]. This approach is computationally expensive and, therefore, infeasible for time-sensitive applications such as defense and surveillance. Furthermore, each data transformation accumulates numerical errors that reduce the overall classification accuracy.
Using a neural network to recognize targets directly from the SAR echo data alone is a promising alternative for object classification, as it removes the need to process the data into an image, an often time-consuming operation, and eliminates a source of errors. Raw data classification is expected to achieve higher accuracy than SAR image classification, since the final image lacks details still present in the echo data. Using a neural network to process SAR echo data directly is crucial to improving the speed and reducing the cost of SAR ATR tasks and to guaranteeing that these solutions can be implemented on compact low-cost boards.
Neural network models to be executed onboard must be carefully designed to avoid high computing requirements and energy consumption. Therefore, the structure of the proposed neural network model was found through a design space exploration and mapped to an embedded low-cost, yet powerful, device for low energy consumption.
In SAR ATR, situations where the captured target configuration differs from the training data configuration can considerably degrade the accuracy of the results. These conditions are referred to as Extended Operation Conditions (EOC) [4]. Situations where the target configuration closely matches the data on which the network was trained are referred to as Standard Operation Conditions (SOC). In this work, both conditions were considered to show the robustness of the proposal.
Training a neural network for ATR tasks requires datasets that contain the raw SAR echo data of labeled targets. Unfortunately, SAR datasets are scarce in general, and even fewer meet these requirements. The only suitable dataset that could be found was Moving and Stationary Target Acquisition and Recognition (MSTAR), which contains raw data and images of a good variety of targets captured under various conditions. The targets are centered in each data sample, which simplifies ATR applications. It is also the most widely used dataset in the related works, which allows a more straightforward comparison of the accuracy results.
This work proposes an optimized neural network that classifies targets captured by SAR directly from the SAR echo data. A simple, yet effective, novel neural network is designed and trained to classify SAR echo data. The proposed network classifies data under both SOC and EOC with very high accuracy. The neural network was optimized and implemented on two single-board computers, the Khadas VIM3 and the Raspberry Pi 5, chosen for their small Size, Weight, and Power (SWaP) footprint. The experimental results demonstrate that the proposed networks can classify data from the MSTAR dataset [5] on embedded devices with accuracies above 99% in both SOC and EOC.
2. Related Work
Early machine learning algorithms for the classification of objects from SAR images focused on identifying features such as geometric features [6], histograms of oriented gradients [7], fusion features [8], scattering features [9], and statistical features [10], among others.
Recently, with the advent of deep learning, these features are learned automatically, achieving better performance. Object classification applied to SAR has been proposed as a means to extract information from captured scenes to aid in quicker and/or automated decision-making. Some works focus on target classification based on SAR images. The work in ref. [11] was one of the first approaches to consider a Convolutional Neural Network (CNN) for SAR image classification. They proposed the All-convolutional Network (A-ConvNet), a network with sparse layers to reduce the overfitting problem. They also established the SOC and EOC configurations adopted by most subsequent classification works that also use MSTAR [5]. SOC implies training with data captured at a 17° depression angle and testing with data captured at a 15° depression angle. EOC implies training with the same type of data but testing with 30° depression angle data. Any mention of SOC and EOC throughout this article refers to these conditions, unless stated otherwise. The network achieved an accuracy of 99.13% for SOC and 87.40% for EOC. The work also explores the introduction of random noise into the input data, observing a large impact on accuracy.
To deal with scarce SAR data, CHU-Net [12], a CNN with dropout to avoid overfitting on scarce data, was proposed to classify SAR images. When using all data, the model achieved an accuracy close to 99%, which dropped to 94% when trained with a small subset of the training data.
Another work addressing scarce data [13] proposed the Amplitude–phase CNN, which considers both the amplitude and phase of SAR data. This improves accuracy in EOC when trained with scarce data compared to methods that only consider the amplitude. The CNN achieved accuracies of 98.10% and 93.57% in SOC and EOC, respectively. A two-step approach was considered in ref. [14] to deal with a lack of training data. An initial step uses a CNN to extract features that train a second CNN for SAR image classification.
Hybrid models have also been considered for target classification of SAR images. Some integrated two types of deep learning models, such as ref. [15], where a CNN was combined with a Long Short-term Memory (LSTM) model. The CNN was used to extract features, enhanced with a spatial attention module, followed by an LSTM that fused features from adjacent azimuths when multiview images were present. This network achieved accuracies of 99.38% and 95.57% in SOC and EOC, respectively. Other hybrid approaches combined a deep learning model with traditional machine learning algorithms. In ref. [16], a histogram of oriented gradients was combined with a CNN enhanced with an attention mechanism. Two SAR ship datasets were tested: OpenSARShip [17] and FUSAR-Ship [18]. Their network, HOG-ShipCLSNet, achieved accuracies of 78.15% and 86.69% on these datasets, respectively.
In ref. [19], data augmentation techniques such as clutter transfer were applied to images in the MSTAR dataset [5] to improve the robustness of target recognition with a specially tuned ResNet18 network. The work achieved an accuracy of 97.2%; with non-ideal ResNet18 parameters and contrast balancing, the average accuracy was 88.5%. Experiments with changed background clutter and synthetically generated images were also considered, with a minor impact on accuracy.
Transformers and attention mechanisms are at the core of the most recent SAR image classifiers [20,21,22]. One of the most recent works [22] achieved accuracies of up to 99.79% in SOC and up to 98.52% in EOC on MSTAR [5]. The work also achieved an accuracy of approximately 84.25% across the different versions of the OpenSARShip dataset [17].
Deep learning-based methods are effective in identifying ships in SAR images. However, the echo data have to be preprocessed with correction functions and converted to an image, which is finally processed with a target detection algorithm. If the data are processed in a ground station, the transmission of echo data must also be included in the processing flow. For real-time target detection onboard, the onboard computational and energy requirements are high. Taking this into account, a new research direction has emerged, focused on ATR applied directly to the raw SAR data.
Recent works [23,24] have focused on ATR with raw Ground-based Synthetic Aperture Radar (GBSAR) data. GBSAR is a variation of SAR that is typically applied in indoor environments. These works used a custom-made sensor attached to a rail to capture small objects of different shapes and materials. In ref. [23], a modified ResNet18 network was trained to perform multilabel classification on three bottles of varying materials. Various experiments were conducted, such as different weight initializations and comparing raw data against image classification results. Raw data classification achieved the best results, with a mean F1 score of 88.24%. In ref. [24], the same modified ResNet18 was trained on GBSAR data with different polarizations mixed into the input in different ways, such as mixing the rows of data or appending horizontal-polarity data to vertical-polarity data, referred to as JOIN. The previously mentioned article [23] only used data with horizontal polarization. This work also created a Siamese model that combined the results of two separate ResNets, each trained on one polarization. The JOIN model achieved the highest accuracy, at 93.06%.
Fast Range-compressed Detection (FastRCDet), a novel lightweight network for ship detection that accepts range-compressed SAR echo data as its input, was proposed in ref. [25]. The network was conceived to detect ships onboard the SAR platform. The authors also proposed a network to adapt the data to the range-compressed domain. The lightweight network, with 2.49 M parameters, was able to detect ships with an average accuracy of 77.12%.
SAR image classification works achieve accuracies around 99% but require a SAR image formation step and complex CNN models. Works on SAR echo data classification are still in their infancy, but the results are promising. The work proposed in this paper contributes to ATR on raw SAR echo data. An optimized neural model is designed to achieve high accuracy, avoid overfitting due to scarce data, and reduce memory and computational complexity for onboard execution at low energy.
3. Proposed Method: Neural Network
The main goal of this work was to define the smallest and simplest possible neural network architecture that could produce highly accurate SAR classification results. All tested network sizes share the base architecture shown in Figure 1. Each hidden layer is dense and uses ReLU as the activation function. In this diagram, i corresponds to the number of input neurons, k corresponds to the number of output neurons, and j is the number of neurons in the first hidden layer. The number of outputs depends on the conditions the network is designed for: SOC or EOC. The k value depends on the dataset used and corresponds to the number of output classes. The j value is the main focus of the network size experiments; values between 60 and 20 were tested. By default, the i value is the size of a raw echo data sample in the dataset being used.
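For illustration, a minimal sketch of this base architecture follows. The paper does not name its deep learning framework, so Keras is assumed here; the softmax output and the 10-neuron second hidden layer are likewise assumptions, consistent with the categorical cross-entropy loss and the SOC configuration reported in Section 4.

```python
from tensorflow import keras

def build_model(i: int, j: int, j2: int, k: int) -> keras.Model:
    """Base architecture of Figure 1: two dense (fully connected)
    hidden layers with ReLU activations and a softmax output
    over the k target classes."""
    return keras.Sequential([
        keras.Input(shape=(i,)),
        keras.layers.Dense(j, activation="relu"),   # first hidden layer (j neurons)
        keras.layers.Dense(j2, activation="relu"),  # second hidden layer
        keras.layers.Dense(k, activation="softmax"),
    ])

# Example: the SOC configuration reported in Section 4
# (i = 25,000 input values, 20/10 hidden neurons, 7 classes).
model = build_model(i=25_000, j=20, j2=10, k=7)
```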
3.1. Dataset
The MSTAR dataset [5] is widely used for SAR ATR tasks. It contains labeled raw data and images of various armored vehicles. The dataset also contains data captured in different conditions, mainly different depression angles. It was chosen for its availability, popularity, and simplicity.
The most abundant data in the dataset had a depression angle of 17°. These data included 7 classes with 300 samples each: 2S1, BRDM_2, D7, SLICY, T62, ZIL131, and ZSU_23_4. These data were designated for training. Data with a depression angle of 15° were divided into the same 7 classes but only contained 274 samples each. Since the 15° and 17° depression angles are similar, the 15° data were used for testing the network in SOC. Only 4 classes had data with a 30° depression angle: 2S1, BRDM_2, SLICY, and ZSU_23_4. This extreme angle was chosen for testing the proposed networks in EOC.
Inspired by ref. [11], the data with varying depression angles were used to test the robustness of the proposed network. In SOC, 17° data were used to train the network on all 7 classes, and 15° data were used for testing. EOC refers to capture conditions that fall outside those expected, leading to lower-accuracy ATR results [26]. In this case, MSTAR data captured with a depression angle of 30° were used to test a network trained on 17° data, to see how the network handled these extreme conditions.
3.2. Preprocessing
An initial problem was identified with the data in the dataset: the size of the samples varied between classes, while the input size of the network must remain constant. In image classification tasks, it is common to resize the image to 224 × 224, not only to make the input size constant but also to reduce training and inference times. However, raw data should not be treated like an image. The chosen solution was to define a range of the data that had to at least contain the target (typically the center of the signal or, after image processing, the center of the image). This range of data is referred to as a “window”. Signals shorter than the window were zero-padded, while longer signals were cut.
Two of the classes, 2S1 and ZSU_23_4, had approximately the same size of 25,000 data values. The window was therefore defined as the range 0–25,000. The other classes were verified to confirm that a window of this range did not cut off the target, which is the object to classify. If other datasets share this size problem, the same method can be applied. Otherwise, in datasets with consistent raw data sample sizes, the window serves only the purpose described in the following subsection.
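A minimal NumPy sketch of this windowing step follows (a hypothetical implementation; the paper does not publish its preprocessing code):

```python
import numpy as np

def apply_window(signal: np.ndarray, start: int = 0, end: int = 25_000) -> np.ndarray:
    """Keep the [start, end) range of an echo signal.
    Shorter signals are zero-padded; longer ones are cut."""
    window = signal[start:end]
    if window.size < end - start:
        window = np.pad(window, (0, (end - start) - window.size))
    return window
```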
3.3. Model Optimization
The first experiment to shorten the input size involved reducing the size of the input data by altering the start and length of the data window. The window reductions have a limit, as the window must still contain the most important part of the signal: the target. For the MSTAR dataset, a window region of 1000–20,000 was determined to be suitable for all classes. This results in an input size of 19,000, a reduction of 6000 data values.
Figure 2 displays plots of raw data from randomly selected samples of 6 classes. It also contains a square drawn in each plot corresponding to the window region of 1000–20,000. The peaks in the middle of each signal correspond to the targets. SLICY, the class with the shortest samples, has a target that barely fits in the defined window interval. The network was trained without weight initialization or dropout.
Sampling of the input data was another aspect that was explored. This process consists of using only every Nth data value of the input, effectively reducing its size. Multiples of two between 2 and 8 were tested. In the MSTAR dataset’s case, with the shortened window and sampling with N > 1, the input size varied between 10,000 and 2500.
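Combined with the window, the sampling step reduces to a strided slice; a hypothetical sketch follows (the default values match the 1000–20,000 window and the N = 4 sampling used later):

```python
import numpy as np

def preprocess(signal: np.ndarray, start: int = 1_000, end: int = 20_000,
               n: int = 4) -> np.ndarray:
    """Window the signal to [start, end), zero-pad if needed,
    then keep every Nth value."""
    window = signal[start:end]
    if window.size < end - start:
        window = np.pad(window, (0, (end - start) - window.size))
    return window[::n]  # e.g., 19,000 values with n = 4 -> 4750 inputs
```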
4. Results
All the generated network models were trained using categorical cross entropy as the loss function, the Adam optimizer, a learning rate of 0.001, a batch size of 32, and ten epochs. The learning rate and number of epochs were chosen to match the size of the data, which is small compared to the usual 224 × 224 images in image classification tasks.
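In Keras (again an assumption, as the framework is not named), these settings correspond to the following compile/fit calls, reusing the `build_model` sketch from Section 3; `X_train`, `y_train`, `X_val`, and `y_val` are placeholders for the preprocessed echo windows and their labels:

```python
from tensorflow import keras

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(X_train, y_train, batch_size=32, epochs=10,
          validation_data=(X_val, y_val))
```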
The MSTAR dataset was used to experiment with the size of the network. A public tool called “mstar2raw” was used to separate the magnitude data from the full raw data samples. These data were then converted to .csv files.
4.1. Standard Operation Conditions
SOC accuracy was evaluated for different sizes of the architecture. The 17° depression angle data were split into training and validation groups (80/20), and the 15° depression angle data, present in the same dataset, were used for testing.
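The 80/20 split can be reproduced with, for example, scikit-learn (a hypothetical sketch; the paper does not state how the split was performed, so the stratification and seed are assumptions). `X17` and `y17` are placeholders for the preprocessed 17° samples and their class labels:

```python
from sklearn.model_selection import train_test_split

# The 15° data are held out entirely as the SOC test set.
X_train, X_val, y_train, y_val = train_test_split(
    X17, y17, test_size=0.2, stratify=y17, random_state=0)
```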
Table 1 shows the accuracy of each attempted network. The Layer 1 and Layer 2 columns show the number of neurons in the respective hidden layers. The table shows that smaller networks perform better, presumably because the dataset is small relative to the larger networks. However, the resulting accuracy differences are extremely small, so they are most likely within the margin of error.
The experiments ended with a network with 20 neurons in the first layer and 10 in the second layer. This stopping point was chosen because far worse accuracy results were expected for very small networks, especially after the reduction steps mentioned in Section 3.3, which are further explored in the following subsection.
This network is very simple, but the size of the input data implies a high number of parameters, since the input size is 25,000 and the input is fully connected to the neurons of the first hidden layer. The window size reduction described in Section 3 is applied in the following experiments.
The results of the window size reduction can be seen in Table 2. As can be observed from the results in the table, the model size can be reduced to one-third with a negligible degradation in accuracy.
Next, the sampling of the input data, as described in Section 3, was applied to further reduce the network. Table 3 shows the results of these experiments, all of which also used the window range of 1000–20,000. The resulting accuracy is extremely high in all SOC experiments.
4.2. Extended Operation Conditions
Unlike the 17° and 15° depression angle data, MSTAR only contained four classes of 30° angle data: 2S1, BRDM_2, SLICY, and ZSU_23_4. Due to this lower number of classes, the network was trained with the corresponding classes at a 17° depression angle. That is, the network was trained and tested on the same four class types. Similar to the previous experiments in SOC, several networks were trained to determine how small the network could be before noticeable drops in accuracy were observed. The 30° angle data were also used as the validation group to observe drops in accuracy. As in SOC, the initial experiments were performed with a window size of 0–25,000 and a sampling of 1. The results can be seen in Table 4. The network sizes are the same as in SOC (Table 1), so those columns were omitted. However, due to the difference in conditions between the training data and the testing data, the accuracy fluctuated more than in SOC. Therefore, new columns were added that display the final and best accuracies of each network, along with the epoch in which the best accuracy was achieved.
The results show a large decrease in average accuracy starting with the architecture that has 30 neurons in the first layer. Therefore, the network with 40 neurons in that layer was selected for further experimentation. Table 5 shows the results of this network with a window size of 1000–20,000 and a varying sampling number. The network sizes are included in the table for ease of reference.
4.3. Comparing Results to Related Works
In SOC, the smallest tested network still manages to achieve an accuracy of 99.896%. Therefore, the proposed network for SOC is the network with 20 neurons in the first hidden layer, 10 neurons in the second layer, a window size of 1000–20,000 on the input data, and a data sampling of N = 8.
In EOC, a small decrease in accuracy was observed in the smallest tested networks. The proposed network for EOC has 40 neurons in the first hidden layer, a window size of 1000–20,000, and a data sampling of N = 4. This configuration was chosen for its accuracy and reduced size; the network with 50 neurons in the first layer shares the same best accuracy but is larger. The best accuracy, 99.48%, was observed in the sixth epoch, before decreasing to 98.87% by the tenth epoch.
Yoon et al. [22] report values that come close, with 99.79% in SOC and 98.52% in EOC. However, their network takes an image as input, which requires the raw data to be processed into an image first, implying a more power- and time-consuming preprocessing step.
Among the related works that use raw data as input, the highest accuracy achieved was 93.06%, on GBSAR data using a modified ResNet. A common thread in these works is the application to raw SAR data of deep neural network architectures originally designed for image classification tasks. According to our results, simple fully connected neural networks can be trained to achieve higher target classification accuracy, even in EOC.
Table 6 and Table 7 compare the results of the proposed networks with other networks from the related works, considering SAR images and raw SAR echo data, respectively. The highest accuracies in each category (images and raw data) for each set of conditions, SOC and EOC, are in bold. Comparatively, the proposed network achieves a higher accuracy, but the difference in datasets and methodology prevents a fair comparison. The lack of source code availability for these works further hinders a fair comparison.
From the tables, it can be observed that the proposed models are considerably less complex (number of parameters and operations for a single inference) when compared to the other two works with direct processing of raw SAR echo data. The accuracy is also higher for the proposed models. However, the datasets were different in all cases, since the datasets considered in the compared works could not be accessed.
4.4. Embedded Device Implementation
The proposed networks were first implemented on a desktop computer with an AMD Ryzen 9 7900X3D Central Processing Unit (CPU) with 12 cores, 64 GB of Random-access Memory (RAM), and an Nvidia GeForce RTX 4080 running CUDA 12.0. To test their effectiveness in more limited conditions, the networks were also implemented on two embedded devices to compare their inference speeds and power consumption. All network implementations were written in Python 3.8.3. The tested devices were the Khadas VIM3 and a Raspberry Pi 5. The Khadas VIM3 has a quad-core ARM Cortex-A73 CPU running at 2.2 GHz, and the Raspberry Pi 5 has a quad-core ARM Cortex-A76 CPU running at 2.4 GHz.
Table 8 shows these metrics. The reported power consumption corresponds to the highest wattage observed during inference. Inference times were measured with batches of 32 raw data samples. EOC has a longer inference time due to the larger size of the network proposed for those conditions. The difference in power consumption between the NVIDIA Graphics Processing Unit (GPU) and the embedded devices highlights the different applications of such devices: a flying SAR-equipped platform could not effectively host such a GPU, as it would require too much power. Still, the GPU results are reported here to compare inference times with the related works.
Some of the related works also provided inference time metrics. These metrics are commonly reported in Frames per Second (FPS), so they were converted to milliseconds (ms = 1000/FPS) for comparison purposes.
Table 9 compares these results across the related works. Only Yoon et al. [22] reported results for the different conditions. The network of Yoon et al. [22], besides having the highest accuracy among the related works, also reports the quickest inference times. However, it is still slower than the network proposed in this article, which was executed on an inferior GPU (RTX 4080 vs. RTX 4090).
5. Conclusions
A small neural network has been proposed for onboard target detection directly from SAR echo data. This avoids a costly SAR image formation algorithm and allows efficient onboard execution at low power.
As the results show, the proposed network for SOC, the smallest tested network, achieves accuracies as high as 99.896%. EOC calls for a slightly larger network, which achieves an accuracy of 99.48%. Comparatively, the proposed networks achieve higher accuracy, but the difference in datasets and methodology prevents a fair comparison. In addition, the sizes of the proposed networks make them energy- and resource-efficient, facilitating their implementation on embedded devices for installation on the moving SAR platform.
Future work will be devoted to more complex scenes featuring overlapping and densely co-located targets. Additionally, different architectures can be further compared to narrow down the best methodology in these new experiments.