In this section, we detail the activity carried out to test the performance of the system. We first present the testing framework we have implemented and then discuss the numerical results.
5.1. Testing Framework
The overall framework considered in this paper has been implemented in GNU Radio 3.9 [16], with external MathWorks MATLAB [17] modules used to model the underwater channel according to the Stojanovic underwater channel model [15], as illustrated in the previous section. This channel module accounts for multipath, frequency-dependent loss, and Doppler shift, which typically characterize underwater channels. The simulation parameters used in the channel model are summarized in Table 1.
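As a rough illustration of the frequency-dependent loss mentioned above, the sketch below evaluates Thorp's empirical absorption formula together with a spreading term, a combination commonly used in Stojanovic-style attenuation models. Whether the MATLAB channel module uses exactly this expression is an assumption; the function names, the spreading factor of 1.5, and the unit convention (metres for spreading, kilometres for absorption) are illustrative choices, not values taken from Table 1.

```python
import math


def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption coefficient in dB/km, with f in kHz."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003


def path_loss_db(distance_m: float, f_khz: float, spreading_k: float = 1.5) -> float:
    """Spreading loss plus absorption loss in dB (illustrative convention).

    spreading_k = 1.5 is a hypothetical 'practical spreading' value.
    """
    spreading = spreading_k * 10 * math.log10(distance_m)
    absorption = (distance_m / 1000.0) * thorp_absorption_db_per_km(f_khz)
    return spreading + absorption


if __name__ == "__main__":
    for f in (10, 20, 30):
        print(f"{f} kHz over 1 km: {path_loss_db(1000.0, f):.1f} dB")
```

The loss grows with both distance and frequency, which is why the usable acoustic bandwidth shrinks on longer links.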
The GNU Radio implementation of our system includes a classification module, shown in Figure 2, which leverages convolutional neural networks (CNNs). CNNs are deep learning models designed to process grid-like data such as images, videos, and time series. They excel at identifying patterns and capturing spatial hierarchies using convolutional layers with filters.
Widely applied in tasks such as image recognition, natural language processing, and video analysis, CNNs are trained through supervised learning on labeled datasets. Their key advantages include position-invariant feature detection, fewer trainable parameters than fully connected networks, and the ability to learn complex data representations. Figure 2 highlights the pre-processing of each image and its conversion to a format accepted by the inference engine. The converted image is then fed to an ONNX Runtime CNN block, and the result is post-processed into a class label, which is later reused by the adaptation logic that triggers the transmission operations.
For simplicity, the standard MNIST dataset [18] has been used instead of a dedicated dataset of fish images. This, however, does not impact the applicability of the approach. With a different input, e.g., an RGB dataset, the larger images would increase inference latency, yet the transmission time and energy would remain unchanged because the system still sends only the compact class label. MNIST serves as a publicly available, well-controlled benchmark for validating the complete sense–infer–transmit pipeline. The same CNN architecture can be re-trained on RGB marine images using transfer learning [19], without requiring any modifications to the communication chain (Although classification energy increases with image size, a lightweight CNN model running on a Raspberry Pi 4, which typically operates at an active power between 2 W and 4 W, consumes approximately 1 mJ per inference for a 28 × 28 grayscale frame. When a lightweight CNN model, such as MobileNet or a custom architecture, is optimized through quantization or through model compression techniques such as pruning, the energy cost typically remains below 10 mJ for a 224 × 224 RGB frame. In contrast, transmitting the same 224 × 224 × 24-bit image, which is approximately 150 kB, over an EvoLogics 18/34 underwater modem at around 14 kbps across a 1 km link would require keeping the power amplifier active for roughly 90 s, resulting in an energy consumption on the order of several hundred joules. This is several orders of magnitude higher than the cost of local inference).
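The back-of-the-envelope comparison above can be reproduced with a few lines of arithmetic. The image size, bit rate, and 10 mJ inference bound come from the text; the transmit power `TX_POWER_W` is a hypothetical placeholder (it is not stated in the text), and protocol overhead is ignored, so the result is only an order-of-magnitude check.

```python
IMAGE_BYTES = 224 * 224 * 3      # 24-bit RGB frame, about 150 kB
BIT_RATE_BPS = 14_000            # approximate modem rate quoted in the text
INFERENCE_ENERGY_J = 0.010       # upper bound quoted for a 224 x 224 RGB frame
TX_POWER_W = 5.0                 # hypothetical transmit power, assumption only


def airtime_s(n_bytes: int, bit_rate_bps: float) -> float:
    """Time needed to push n_bytes through the link, ignoring overhead."""
    return n_bytes * 8 / bit_rate_bps


def tx_energy_j(n_bytes: int, bit_rate_bps: float, power_w: float) -> float:
    """Transmit energy as power multiplied by airtime."""
    return power_w * airtime_s(n_bytes, bit_rate_bps)


if __name__ == "__main__":
    raw = tx_energy_j(IMAGE_BYTES, BIT_RATE_BPS, TX_POWER_W)
    label = tx_energy_j(1, BIT_RATE_BPS, TX_POWER_W) + INFERENCE_ENERGY_J
    print(f"raw image: {airtime_s(IMAGE_BYTES, BIT_RATE_BPS):.1f} s, {raw:.0f} J")
    print(f"label + inference: {label * 1000:.1f} mJ")
```

Under these assumptions the raw image needs about 86 s of airtime, consistent with the "roughly 90 s" figure, while sending a one-byte label plus local inference stays in the millijoule range.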
We used a trained ONNX inference model from [20], which consists of two convolutional layers, each followed by bias addition, ReLU activation, and a max-pooling layer. After the convolution and pooling stages, the output is flattened (reshaped) and passed through a fully connected (dense) layer, which outputs a 10-dimensional vector of class scores. The trained CNN inference model receives the input image data and classifies it. The output of the CNN model, which contains the classification probabilities, is converted back into a tagged stream using the PDU to Tagged Stream block. The classification result is the index of the element with the highest predicted probability.
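The final post-processing step described above, i.e., turning the 10-dimensional score vector into a class index, can be sketched as follows. This is a minimal illustration in plain Python, not the code of the GNU Radio block; the function names are hypothetical, and the softmax normalization is an assumption made in case the model emits raw scores rather than probabilities.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(scores: List[float]) -> Tuple[int, float]:
    """Return (index, probability) of the most likely class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


if __name__ == "__main__":
    # In a real flowgraph the scores would come from the ONNX model, e.g. via
    # onnxruntime.InferenceSession(...).run(...); here they are made-up values.
    example_scores = [0.1, 0.3, 4.2, -1.0, 0.0, 0.5, 0.2, 7.9, 0.4, 1.1]
    label, confidence = predicted_class(example_scores)
    print(label, round(confidence, 3))
```

Only the resulting index needs to be carried forward to the adaptation logic, which is what keeps the transmitted payload compact.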
In the rest of this section, we associate the numbers from 1 to 9 with nine different classes of fish, from the most common type (1) to the rarest and most vulnerable type (9). The second component of our framework uses the classification result to select the communication parameters, such as packet length, gain, and modulation scheme, to be used in the communication process. An example of the choice of the communication parameters for the nine classes is reported in Table 2 (To increase the reliability of the designed system and to cope with possible classification anomalies caused by the detection of new, unclassified species, multiple back-up solutions can be designed. More specifically, in case of anomalies in data classification, additional data, for example the vocalizations of mammals, can be used to assess or correct the classification results by means of integrated audio data. Another approach, which can also complement the one described above, is to transmit the entire raw data in case of misclassification, while setting the transmission parameters so as to preserve data delivery in every case. The multiplicity of situations that can arise in an unknown scenario such as the marine one clearly highlights the importance of finding a tradeoff between reliable data monitoring and energy-efficient data transmission).
Table 2 groups the classes, each assumed to be associated with a species, based on the relevance and rarity of that species. More specifically,
Classes 1–3 (associated with common species). In this case, QPSK or BPSK modulation is considered (with QPSK, which has the higher spectral efficiency, occurring more often), together with a large packet length and high gain, to maximize throughput when acoustic disturbance is not a concern.
Classes 4–6 (associated with vulnerable species). In this case, QPSK or BPSK modulation is considered (with BPSK, which has the lower spectral efficiency, occurring more often), together with a medium packet length and medium gain, to preserve throughput while limiting acoustic disturbance, given the vulnerability of the considered species.
Classes 7–9 (associated with rare species). In this case, a fixed BPSK modulation is considered (lower spectral efficiency but better reliability), together with a short packet length and low gain, to minimize acoustic disturbance, given the fragility of the considered species.
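The three groups above amount to a lookup from class label to transmission profile. The sketch below shows one possible shape for such a configuration map. The names `TxConfig`, `CONFIG_MAP`, and `select_config` are hypothetical, and all numeric values are placeholders chosen only to respect the qualitative ordering described in the text (the actual settings are those of Table 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxConfig:
    modulation: str        # "BPSK" or "QPSK"
    packet_len_bytes: int  # payload length per packet
    gain: float            # transmit gain (arbitrary units)


# Placeholder values only: they follow the large/medium/short and
# high/medium/low pattern of the three groups, not the real Table 2 entries.
CONFIG_MAP = {
    1: TxConfig("QPSK", 30, 12.0),
    2: TxConfig("QPSK", 30, 11.0),
    3: TxConfig("BPSK", 30, 10.0),
    4: TxConfig("QPSK", 10, 8.0),
    5: TxConfig("BPSK", 10, 7.0),
    6: TxConfig("BPSK", 10, 6.0),
    7: TxConfig("BPSK", 3, 4.0),
    8: TxConfig("BPSK", 3, 3.0),
    9: TxConfig("BPSK", 2, 2.0),
}


def select_config(label: int) -> TxConfig:
    """Map a class label (1 to 9) to its transmission profile."""
    try:
        return CONFIG_MAP[label]
    except KeyError:
        raise ValueError(f"no transmission profile for class {label}") from None


if __name__ == "__main__":
    print(select_config(2))
    print(select_config(9))
```

Keeping the map as plain data means that a different policy only requires editing the table, not the transmitter logic.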
Observe that our approach is general enough to accommodate alternative ways of performing integrated sensing based on the semantics (i.e., the meaning) of the collected data. As an example, if the underwater nodes are equipped with hydrophones and the recorded sounds contain certain frequency components (e.g., those related to the vocalizations of certain species), appropriate sensors, e.g., a camera, can be activated, and a similar classification can take place with the corresponding assignment of communication parameters. This is because such frequencies could hint at the presence of a group of rare mammals, which can then be visually tracked. A more general layout of the whole framework is shown in Figure 3, which summarizes the processing chain: the sensed data (an image in our prototype) enter the CNN for inference; the resulting label is passed to a configuration map that chooses the modulation type, packet length, and transmit gain; these parameters drive the transmitter blocks, so that the data traverse the underwater channel and can be recovered by the receiver.
A software-defined radio (SDR) is a radio communication system where most key components that have traditionally been implemented in hardware are instead realized through software on a general-purpose computing platform. By using software to handle functions like modulation, demodulation, filtering, and signal processing, SDRs enable dynamic adaptation to new protocols, frequencies, and standards without requiring hardware changes.
One of the most widely used tools for SDR development is GNU Radio [16]. GNU Radio is a free and open-source development toolkit that provides signal-processing blocks to implement SDRs. It allows users to create radio systems through a graphical user interface (GUI) or by writing Python (version 3.12) scripts. The flexibility of GNU Radio lies in its modular architecture, where users can drag, drop, and connect pre-built signal-processing blocks to design complex radio systems.
GNU Radio is supported by a variety of hardware front-ends, such as universal software radio peripheral (USRP) devices and RTL-SDR dongles.
In the GNU Radio implementation of the SDR communication architecture, sketched in Figure 4, the payload comprising the classification results is read and passed through the Stream to Tagged Stream block.
This converts the input data stream into defined packet sizes by creating boundaries, thereby enforcing the selected packet length. The CRC32 block generates a cyclic redundancy check (CRC) for the payload to ensure data integrity. A header section is generated for the payload. The Tagged Stream Mux combines the header and payload blocks and forwards them to the constellation modulator, followed by processing through a low-pass filter, fractional resampler, and gain controller. The modulation type and gain parameters are determined based on the previous choices, according to the classification results.
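The framing steps described above (payload, CRC, header, multiplexing) can be mimicked in a few lines to show the idea. This is only a sketch: the 2-byte length header and the byte order are hypothetical choices and do not reproduce the exact header format or CRC byte layout of the GNU Radio blocks.

```python
import struct
import zlib
from typing import Optional


def build_frame(payload: bytes) -> bytes:
    """Append a CRC-32 to the payload and prepend a length header."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    body = payload + struct.pack("<I", crc)
    header = struct.pack(">H", len(body))  # hypothetical 2-byte length field
    return header + body


def parse_frame(frame: bytes) -> Optional[bytes]:
    """Return the payload if the length and CRC check out, otherwise None."""
    if len(frame) < 2:
        return None
    (length,) = struct.unpack(">H", frame[:2])
    body = frame[2:2 + length]
    if length < 4 or len(body) != length:
        return None
    payload = body[:-4]
    (crc,) = struct.unpack("<I", body[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        return None
    return payload


if __name__ == "__main__":
    frame = build_frame(bytes([7]))  # class label 7 as a one-byte payload
    print(frame.hex(), parse_frame(frame))
```

A single flipped bit in the payload or in the CRC field makes `parse_frame` return `None`, which corresponds to the packet being discarded by the CRC check at the receiver.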
The modulated complex samples are then transmitted across the Stojanovic underwater channel model [15] and received at the receiver block. After passing through the resampler, automatic gain control (AGC), and filter blocks, clock synchronization is performed by the Symbol Sync block, and, after equalization (In our system, we prefer a simplified design; thus, we use a linear equalizer. However, different choices are possible, e.g., a decision feedback equalizer (DFE), which would not, in any case, require any change to the behavior of the AI-driven controller), phase correction is achieved using the Costas loop. Demodulation is carried out by the constellation decoder, which passes the decoded bytes to the differential decoder. The Correlate Access Code block searches for the specific 64-bit access key pattern, and the Repack Bits block reassembles the bits in most significant bit (MSB) first order before forwarding them to the CRC Check block. The CRC Check block verifies the integrity of the received bits. Upon successful detection of the access key and verification of the CRC, the bits are repacked and the information is saved.
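The last receiver stages, i.e., locating the access code in the demodulated bit stream and repacking the following bits MSB first, can be illustrated with a short sketch. It is not the implementation of the Correlate Access Code or Repack Bits blocks; the 64-bit pattern `ACCESS_CODE` is an arbitrary placeholder and the `max_errors` tolerance is an illustrative parameter.

```python
from typing import List

ACCESS_CODE = bytes.fromhex("0123456789abcdef")  # arbitrary 64-bit placeholder


def unpack_msb(data: bytes) -> List[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def repack_msb(bits: List[int]) -> bytes:
    """Pack bits (MSB first) back into bytes, dropping any incomplete tail."""
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for i in range(0, usable, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def find_payload_start(bits: List[int], code_bits: List[int], max_errors: int = 0) -> int:
    """Return the index just after the first access-code match, or -1."""
    n = len(code_bits)
    for start in range(len(bits) - n + 1):
        window = bits[start:start + n]
        errors = sum(a != b for a, b in zip(window, code_bits))
        if errors <= max_errors:
            return start + n
    return -1


if __name__ == "__main__":
    code = unpack_msb(ACCESS_CODE)
    stream = [0, 1, 1] + code + unpack_msb(b"\x05\xa0")  # 3 stray bits, code, data
    start = find_payload_start(stream, code)
    print(start, repack_msb(stream[start:]).hex())
```

Allowing a small `max_errors` trades a higher false-alarm probability for robustness to bit errors inside the access code itself.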
5.2. Numerical Results
Underwater acoustic networks have strict energy constraints, requiring adaptive strategies to optimize resource utilization without reducing functionality. As illustrated in Figure 5, we preliminarily studied the feasibility of the proposed strategy in terms of achievable energy efficiency when compared with methodologies not employing AI-driven design. Three approaches were considered: an AI-optimized approach that does not account for the inference cost, an AI approach that accounts for the on-node inference load, and a baseline approach. The baseline approach uses static communication settings such as default modulation, packet size, and transmission power, which result in consistently high energy consumption. The AI-optimized approach dynamically adjusts communication parameters based on classification results from inference systems, excluding the energy costs of inference itself, to isolate the impact of parameter adaptation. In contrast, the AI approach with on-node inference load integrates lightweight on-node models to classify species locally, accounting for the incurred computational energy costs while simultaneously performing real-time communication parameter optimization. We took the inference time details from [21] to account for the computational overhead introduced by classification. Although on-node inference introduces overhead, the combined benefits of reduced data transmission and context-aware optimizations yield net energy savings (on the order of 40% as the number of iterations increases), as demonstrated in Figure 5. In this figure, each iteration on the x-axis corresponds to a complete communication cycle consisting of sensing, onboard classification using the CNN, and packet transmission over the acoustic link, for a total of 3000 repeated sense–infer–transmit operations. This underscores the viability of embedding inference-driven adaptability directly into nodes, balancing computational and communication costs for sustainable operation in dynamic underwater ecosystems. In the case of RGB images, the increased computational demand can be addressed using energy-efficient hardware accelerators, such as low-power GPUs or TPUs, in combination with lightweight architectures like MobileNet or EfficientNet. Quantization and pruning further reduce model complexity, while input dimensions can be controlled through simple pre-processing techniques such as cropping or resizing to regions of interest. Achieving high accuracy in marine environments largely depends on the quality and diversity of the training data. In our case, the ability of the model to generalize effectively to complex RGB images is significantly influenced by exposure to appropriate real-world underwater datasets. In contrast, when evaluated on the MNIST dataset, our lightweight 4-layer CNN model achieves an overall test accuracy of 99.1%, with individual class accuracies ranging from 98.6% to 99.7%.
To further evaluate the system performance, after assessing its suitability in terms of energy, we also considered other key performance metrics, including bit error rate (BER), throughput, and delay.
5.2.1. BER Analysis
Figure 6 shows the mean BER (in %) as a function of the number of sample points, i.e., iterations, where each iteration corresponds to one complete communication cycle. Confidence intervals have been computed using Student's t-distribution at a 95% confidence level and are shown as vertical bars in the plot.
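For reference, a confidence interval of this kind is usually obtained from the standard expression below. This assumes the textbook two-sided interval for the mean of $n$ BER samples $x_1, \dots, x_n$; the symbols $\bar{x}$, $s$, and $t_{0.975,\,n-1}$ are introduced here for illustration and are not taken from the original text.

```latex
\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```

Since the half-width scales as $1/\sqrt{n}$, halving the interval requires roughly four times as many iterations.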
Narrowing the confidence interval required at least approximately 1500 iterations. Note, however, that the BER remains around an average value of approximately 1.1%. Figure 7 reports the detailed distribution of BER values for both the BPSK and QPSK modulation alternatives. The x-axis shows the BER bins and the y-axis shows the percentage of occurrences within each bin. The majority of observed losses lie within the 0–1% range, which attests to the reliability of the system, especially considering that Forward Error Correction could also be exploited to further improve performance. Typically, BPSK exhibits a lower bit error rate than QPSK due to its intrinsic robustness in noisy conditions. In particular, for rare fish species, lower gain values and BPSK modulation are selected.
Similarly, Figure 8 shows the histograms of the BER for the different fish classes (1 to 9) and for both BPSK and QPSK modulations. For common fish species (low class values), no relevant environmental constraints emerge and, thus, higher gain and packet length are set. Here the difference is due to the modulation type: classes 1 and 2 exhibit the minimum losses, as compared to class 3, which uses BPSK and an overpowered gain value. Similarly, for intermediate fish species, from classes 4 to 6, BPSK exhibits a lower BER than QPSK, which requires more power and is less robust in noisy conditions. For rare/vulnerable fish species, from classes 7 to 9, only BPSK is employed. Although the BER remains below 1% on average, occasional spikes occur due to the combination of low gain values and large packet sizes. These results show how modulation selection, gain optimization, and packet sizing choices are inter-related and can significantly influence system reliability under varying environmental and operational constraints.
5.2.2. Throughput Analysis
Figure 9 illustrates the throughput (in bps) achieved both in cases of BPSK and QPSK modulations across various gain values, corresponding to different categories of fish species.
For rare species, low gain values combined with small packet lengths result in comparatively modest throughput levels, ranging from approximately 3700 bps to 5000 bps. Although these conditions represent a challenging scenario, the observed performance underscores the robustness of BPSK and its ability to operate even under limited power amplification. For intermediate fish species, BPSK and QPSK operate within gain ranges of approximately 5 to 9 and 6 to 9, respectively. Due to the increased packet size and slightly higher gain, the BPSK throughput rises sharply in the initial portion of this range and then stabilizes. By contrast, QPSK reaches a throughput of nearly 12,000 bps at the lower end of its gain range and stabilizes around 13,000 bps as the gain increases further. For common fish species, the highest gain values are employed to achieve substantially improved throughput.
BPSK stabilizes at approximately 14,250 bps, although occasional declines occur due to increased bit losses at very high gains. Meanwhile, QPSK initiates this segment around 16,000 bps and continues to show an upward trend, demonstrating its ability to leverage higher gain values and larger packet sizes for further throughput enhancement.
5.2.3. Delay Analysis
Figure 10 shows the end-to-end delay (in s) as a function of packet length (in bytes). Both BPSK and QPSK follow the general trend of decreasing delay as the packet length increases. However, BPSK consistently requires approximately twice as much time as QPSK. Smaller packets incur proportionally higher overheads than larger packets, so increasing packet size allows more useful information to be carried per packet and, thus, improves efficiency and reduces delay. Furthermore, the higher spectral efficiency of QPSK enables it to transmit more bits per symbol than BPSK, which further shortens the end-to-end delay. For packet lengths of 2 and 3 bytes (representing rare fish species), only BPSK modulation can be chosen; however, it provides the advantage of greater robustness at lower gains and in noisy conditions. For the remaining fish classes, the delay trend aligns closely with the general behavior described above.
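The two effects discussed above, i.e., per-packet overhead and bits per symbol, can be captured by a simple first-order model of the time needed to deliver a fixed amount of data. This is a sketch under stated assumptions, not the model behind Figure 10: the 12-byte overhead (roughly a 64-bit access code plus a CRC-32), the symbol rate, and the function name are hypothetical, and propagation and processing delays are ignored.

```python
import math


def transfer_delay_s(
    data_bytes: int,
    packet_len_bytes: int,
    bits_per_symbol: int,
    symbol_rate_baud: float = 1000.0,  # hypothetical symbol rate
    overhead_bytes: int = 12,          # hypothetical per-packet overhead
) -> float:
    """Airtime needed to send data_bytes split into fixed-size packets."""
    n_packets = math.ceil(data_bytes / packet_len_bytes)
    total_bits = n_packets * (packet_len_bytes + overhead_bytes) * 8
    return total_bits / (bits_per_symbol * symbol_rate_baud)


if __name__ == "__main__":
    for length in (2, 3, 10, 30):
        bpsk = transfer_delay_s(60, length, bits_per_symbol=1)
        qpsk = transfer_delay_s(60, length, bits_per_symbol=2)
        print(f"{length:>2} B packets: BPSK {bpsk:.3f} s, QPSK {qpsk:.3f} s")
```

In this model the delay falls as the packet length grows, because fewer packets share the same fixed overhead, and BPSK takes exactly twice as long as QPSK at equal symbol rate, in line with the factor of about two reported above.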