To train the architecture, we first build a database from the field campaign measurements. We used Matlab R2023b from MathWorks to preprocess a total of 180 days of raw data. Due to the large amount of available data, the database is organized day by day. One day of raw data contains 1.08 M samples. Each sample includes at least the three-axis accelerations of the measuring device, together with other measured variables such as the gyroscope, magnetometer and temperature readings. The resulting database contains a total of 194.4 M samples.
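A minimal sketch of this day-by-day organization is given below; the file names, the structure of the raw files and the loading helper are illustrative assumptions, not the actual preprocessing scripts.

% Sketch: organize the preprocessed field campaign data day by day.
% File names and the raw-file layout are illustrative assumptions.
nDays = 180;
for d = 1:nDays
    raw = load(sprintf('raw_day_%03d.mat', d));   % 1.08 M samples per day (assumed file)
    az  = raw.acc(:, 3);                          % vertical acceleration axis (assumed field)
    save(sprintf('db_day_%03d.mat', d), 'az');    % one database file per day
end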
Since this research focuses on extracting the fundamental frequency parameter and not its amplitude, and because the frequency components of the measured water flow are present in all three measured orthogonal axes, this work only processes the accelerations of the vertical axis of the instrument. For this reason, we use the Matlab fitting functionalities to obtain the fundamental frequency and its multiples over a single acceleration axis for a given sample length.
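A minimal sketch of this step, assuming a window of vertical accelerations az and the Curve Fitting Toolbox single-term sum-of-sines model (the exact fitting script used in the campaign is not reproduced here):

% Sketch: estimate the fundamental frequency of one window of vertical
% accelerations sampled at 12.5 Hz using the 'sin1' model a1*sin(b1*t + c1).
fs  = 12.5;                         % sampling frequency [Hz]
t   = (0:numel(az)-1).' / fs;       % time vector of the window
mdl = fit(t, az(:), 'sin1');        % Curve Fitting Toolbox sum-of-sines fit
f0  = mdl.b1 / (2*pi);              % fundamental frequency [Hz]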
As mentioned previously, the amount of data is very large compared with the number of classes/bandwidths to detect. Moreover, comparison of the available data reveals that, although the amplitudes of the water flow speed change substantially between time slots, between days throughout the month and across the complete measurement campaign, the fundamental frequencies behave similarly from day to day. For this reason, we have trained the DNN with data from a single day.
A day has 1,080,000 samples at 12.5 Hz. When a window length of 32 samples is set, the number of periods is 33,750. We used 10% of the periods for validation and 30% for testing purposes.
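A minimal sketch of this segmentation and split, assuming one day of vertical accelerations is stored in the vector day_az:

% Sketch: segment one day into 32-sample periods and split them 60/10/30
% into training, validation and test subsets.
winLen   = 32;
nPeriods = floor(numel(day_az) / winLen);            % 1,080,000 / 32 = 33,750
periods  = reshape(day_az(1:nPeriods*winLen), winLen, nPeriods).';

idx      = randperm(nPeriods);
nVal     = round(0.10 * nPeriods);                   % 10% for validation
nTest    = round(0.30 * nPeriods);                   % 30% for testing
valSet   = periods(idx(1:nVal), :);
testSet  = periods(idx(nVal+1:nVal+nTest), :);
trainSet = periods(idx(nVal+nTest+1:end), :);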
Optimization
This initial approach is based on using half of the number of inputs to build the fully connected layers. The question is whether this first approach can be optimized, because the target instrument uses an ultra-low-power microcontroller and, in stand-alone working mode, the power source is a 3600 mAh Li-ion battery.
A first optimization is to reduce the number of neurons used. To this end, we carried out several trials with all possible combinations of neurons, from 16 down to 5, in FE Stages 1 and 2. Given a combination of layer sizes, each trial takes approximately 1.5 min and convergence is not guaranteed. We automated the training process to repeat each combination up to 50 times, and the training for a given combination stops as soon as a trial converges. Our goal is not to obtain the best convergence, but rather to determine whether a solution exists.
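A minimal sketch of this automated search, assuming a hypothetical helper trainCandidate(n1, n2) that builds and trains the DNN with n1 and n2 neurons in FE Stages 1 and 2 and returns whether the trial converged:

% Sketch: try every neuron combination from 16 down to 5 in FE Stages 1 and 2,
% repeating each combination up to 50 times until one trial converges.
maxTrials = 50;
for n1 = 16:-1:5
    for n2 = 16:-1:5
        for trial = 1:maxTrials
            converged = trainCandidate(n1, n2);   % hypothetical training helper
            if converged
                fprintf('Solution exists for %d/%d neurons\n', n1, n2);
                break;                            % stop repeating this combination
            end
        end
    end
end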
As expected, the smaller the number of neurons in each layer, the longer convergence takes. We obtained a DNN with 5 and 10 neurons in FE Stages 1 and 2, respectively. In our experiments, the second layer cannot be reduced any further. The first layer, on the other hand, can be reduced to 4 neurons, but the loss function then rises by up to 1% and the number of iterations must be increased to 10,000.
A second optimization step is to reduce the number of other, non-neuron functionalities. In the proposed scheme (see Figure 4), the most expensive functions in terms of computational effort are the normalization functions. However, normalization functions are critical to keep the intermediate processed data/features within a known range. Moreover, the literature points out that they also speed up convergence during training.
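For reference, a sketch of the per-window operation such a layer performs, assuming it is a z-score normalization (the exact function used in Figure 4 may differ):

% Sketch: z-score normalization of one window of features w (illustrative).
w_norm = (w - mean(w)) ./ std(w);   % zero mean, unit standard deviation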
Our goal is not to define a new normalization function, but an open question is whether its use is necessary at all. To answer it, we studied the computing process through each layer of the DNN. First of all, the nature of the measurement process is analyzed. The acquired data are obtained from the vertical acceleration axis of a water current meter located in an ocean offshore infrastructure. Due to the gravitational tide, the water flow never becomes zero. From the point of view of the acquired data, this fact implies that the sign of the data never swaps: the water flow meter always provides either positive or negative values, but the sign never changes.
The first layer, labeled Fully Connected 1 in Figure 4, is composed of linear functions. We observed in all obtained solutions that the layer weights have the same sign. The spread between those weights is close to one decade, within the range [−0.43, 0.53], and the average is 0.02. On the other hand, Figure 10 shows the input distribution, which follows a normal distribution. Based on these characteristics, and since the objective of normalization is to conform the data to a normal distribution, the input data already follow a normal distribution and the processing is performed with linear functions. Therefore, we propose removing the Normalization 1 layer.
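This reasoning can be verified with a simple check before dropping the layer; a sketch assuming the training windows are stored in a matrix az_train:

% Sketch: compare the raw input distribution against a fitted normal density.
mu    = mean(az_train(:));
sigma = std(az_train(:));
histogram(az_train(:), 'Normalization', 'pdf'); hold on;
x = linspace(min(az_train(:)), max(az_train(:)), 200);
plot(x, normpdf(x, mu, sigma), 'LineWidth', 1.5);   % normpdf: Statistics Toolbox
legend('input data', 'fitted normal distribution');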
The training of the modified DNN converges in the same way without the removed layer. However, after verifying this solution, we observe that one of the weights of the Fully Connected 1 layer shows an abnormal variation: most of the values lie in the same range, while one value is four decades lower. In our experience, since the range of values is the same for all inputs, this behavior indicates that there are more neurons than necessary.
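A sketch of the check that exposes this behavior, assuming a trained network object net in which Fully Connected 1 is the second element of net.Layers (both the position and the threshold are assumptions):

% Sketch: flag weights of the Fully Connected 1 layer that are several
% decades smaller than the rest, a hint that a neuron is redundant.
W = net.Layers(2).Weights;               % assumed position of Fully Connected 1
ratio = abs(W) ./ max(abs(W(:)));
redundant = find(ratio < 1e-3);          % about three decades below the largest weight
fprintf('%d weight(s) look redundant\n', numel(redundant));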
Based on these clues, we remove Feature Extraction Stage 1 and run the training process again. This modification converts our proposed DNN into an artificial neural network (ANN). Therefore, from a practical point of view, this third proposal greatly reduces the computational effort. The training of the ANN is carried out using from 8 down to 3 neurons in the Fully Connected 2 layer. Figure 11 plots the results for 3 neurons in the Fully Connected 2 layer.
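A minimal sketch of the reduced network, expressed with Deep Learning Toolbox layers; the activation, the input normalization setting and the number of output classes are illustrative assumptions, since the exact output stage is defined in Figure 4:

% Sketch: the architecture after removing Feature Extraction Stage 1.
numClasses = 4;                                        % number of frequency bands (illustrative)
layers = [
    featureInputLayer(32, 'Normalization', 'none')     % one 32-sample window as input
    fullyConnectedLayer(3)                             % Fully Connected 2, reduced to 3 neurons
    reluLayer                                          % assumed activation
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];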
Despite the reduction in the number of neurons in the Fully Connected 2 layer, the training process converges using only 3 neurons. However, this convergence is achieved only once in every nine or more training trials. As expected, reducing the number of neurons pushes the training process to the limit. To improve the chance of success, we increased the total number of iterations from 5000 to 10,000. Figure 11 shows one example of convergence. In comparison with the DNN training process, although the validation process reaches 100% in the same way, the training process struggles to converge throughout; the proof of this is that the blue line in Figure 11 does not follow a smooth trajectory.
The loss function in Figure 11 reaches the same level as in the DNN training. However, its convergence trajectory presents several peaks at the points where the training encountered difficulties in optimizing the neural network. The disturbance around iteration 4200 is remarkable, where the convergence of the complete network dropped to 40%. Fortunately, the convergence consistently returns to the 10% level and the loss function stays under 0.2 after iteration 6700.