At the CNR (National Research Council of Italy) research area in Florence, Italy (43.818879° N, 11.201956° E; 50 m a.m.s.l.), the experimental equipment was installed on the roof of a building in a restricted-access area, about 18 m above ground level. The site is open on all sides, with no trees, buildings, or other structures rising above the horizon that could obstruct or distort the rainfall measurements.
2.1. Instrument Prototype Setup and Signal Processing
The system developed in this work was designed and implemented following a simple prototyping approach to assess the actual feasibility and usability of the measuring instrument. For this reason, in this first development, the system was assembled from off-the-shelf and reused components. As the architecture used in the development phase (Figure 1) shows, the computational resources are clearly oversized with respect to the real hardware and software needs; in fact, a recycled tower computer was used as the acquisition and processing system. However, the aim of this first prototype was to develop and experiment with such a measurement system and to assess the feasibility and effectiveness of the processing methods proposed in this work, i.e., to test processing procedures that are novel for this type of sensor, including machine-learning approaches for sensor calibration.
The sensitive element of the prototype is a piezoelectric sensor consisting of a unimorph piezo disc, with an active piezoelectric ceramic layer (30 mm) bonded to a brass substrate plate (50 mm). It is specifically designed to operate at ambient temperatures from −20 to 70 °C. The piezoelectric transducer is glued to the underside of a glass plate that shields the sensor from atmospheric agents. The glass plate is slightly inclined (about 15°) to let rainwater run off the sensing surface. The sensor is housed in a protective box containing the sensitive element and the wiring connections. Raindrops impact the upper face of the glass plate and transmit the vibrations to the underlying piezoelectric sensor, so that a voltage signal is generated across the poles of the piezoelectric element. This voltage signal, which is a function of the volume of the impacting drop [11], propagates through a bipolar (two-conductor) cable to the electronic acquisition and control system, an old personal computer (Compaq Evo D310). This computer was equipped with an integrated Analog Devices AD1981A AC’97 SoundMAX CODEC, an audio capture device used as the acquisition system and connected directly to the piezoelectric transducer. The audio device is integrated in the motherboard of the personal computer and is managed through the PCI (Peripheral Component Interconnect) standard bus. In this audio acquisition system, the analog signal is sampled, quantized, and coded for analog-to-digital conversion. Specifically, the signal is sampled at 8 kHz (i.e., a usable bandwidth up to the 4 kHz Nyquist frequency), and each sample is coded on 16 bits and represented as a floating-point value, a good compromise between the available spectral content and the size of the generated file. Output data samples have a maximum value of +1.0 and a minimum value of −1.0, and are linearly related to the input voltages to a good approximation [20]. The data produced by the acquisition system are then converted to the WAVE audio file format for further processing. Each output file covers an acquisition of 55 s, and the acquisition is repeated every minute; the remaining 5 s allow the system to assemble and correctly save the file.
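A minimal sketch of how one 55 s acquisition could be loaded for processing follows. It assumes (not stated in the text) that the WAVE files store mono 16-bit signed PCM at 8 kHz and that samples are scaled to [−1.0, +1.0) by dividing by 2^15; the helper name `read_normalized_wav` is hypothetical and not part of the original software.

```python
import wave

import numpy as np

FS_HZ = 8000                            # sampling frequency stated in the text
ACQ_SECONDS = 55                        # length of each acquisition file
SAMPLES_PER_FILE = FS_HZ * ACQ_SECONDS  # 440,000 samples per file


def read_normalized_wav(path):
    """Read a mono 16-bit PCM WAVE file and return (samples, fs).

    ASSUMPTION: 16-bit signed PCM, one channel. Samples are scaled to
    [-1.0, +1.0) by dividing by 2**15 = 32768.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM")
        fs = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAVE PCM data are little-endian signed 16-bit integers
    ints = np.frombuffer(raw, dtype="<i2")
    return ints.astype(np.float64) / 32768.0, fs
```

Dividing by 32768 maps the most negative code exactly to −1.0 and the most positive to just below +1.0, matching the ±1.0 range quoted for the acquisition device.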
Figure 2 depicts an example of the signal acquired by the PRS over a few seconds of a rainfall event. Note that, for visualization purposes and only in this case, the signal amplitude was re-normalized with respect to the maximum recorded amplitude. The amplitude of the output samples can change depending on the software amplification factor; therefore, this factor must be set once and not changed during operations in order to obtain comparable measurements.
Once the acquisition phase is finished and the WAVE file is available, the signal processing starts. A low signal intensity, comparable with the background noise of the system, indicates the absence of precipitation. A noise threshold must therefore be computed during no-rain periods to account for the instrumental and environmental noise specific to the site where the system operates. The selected threshold corresponds to the maximum amplitude recorded during selected no-rain periods. Any outliers, i.e., isolated values that far exceed the range of the most recurrent values, should be identified and removed. Only the signal samples whose amplitude exceeds the environmental noise threshold provide information about the occurrence of precipitation. The time of the first sample exceeding the threshold is taken as the start of a drop impact signal.

Such a signal (Figure 2) is characterized by a maximum peak followed by a damped oscillation that returns to the equilibrium condition. By computing the relative maxima of this trend, it is possible to reconstruct the envelope of the signal and hence its damped trend. The end of the drop signal is defined as the time at which the drop envelope no longer exceeds the threshold, namely the first relative maximum below the noise threshold. The mean length of a drop signal is known to be approximately 5–15 ms [21]. Therefore, a wider sliding window of 200 samples, corresponding to 25 ms at 8 kHz, is used to identify the start and the end of the drop signal. The local maximum value is taken as the drop maximum amplitude, a quantity that depends approximately on the drop diameter through a power law [12]. Within this time window, multiple drop impacts can occur, altering the duration of a single drop signal. A method to identify these multiple impacts has been implemented based on the inversion of the signal envelope: if a second drop impacts during the descending (relaxation) phase, before the envelope falls below the threshold, the signal undergoes a rapid rise and a new maximum can be detected. The time at which the envelope trend inverts is taken as the end of the first drop and the start of the second one.

The processing phase yields four output parameters for each 55 s acquisition period, namely the mean duration of the drop signals (mean drop duration), the mean of the drop signal maxima normalized with respect to the maximum value imposed by the audio acquisition device (mean drop amplitude), the number of identified drops (number of drops), and the sum of the absolute values of all samples in the acquisition (sample sum). The data flow diagram of the processing applied to each acquisition is shown in Figure 3.
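A minimal NumPy sketch of the drop segmentation described above follows. It is not the authors' code: the function names are hypothetical, the outlier rule (discarding no-rain samples whose magnitude exceeds `outlier_factor` times the 99.9th percentile of |x|) is an assumption, and the 200-sample window is used here simply as the maximum search length for the end of a drop.

```python
import numpy as np

FS_HZ = 8000
WINDOW = 200  # samples; 200 / 8000 Hz = 25 ms


def noise_threshold(no_rain, outlier_factor=3.0):
    """Max |x| over no-rain data after discarding outliers.

    ASSUMPTION: a sample is an outlier if its magnitude exceeds
    outlier_factor times the 99.9th percentile of |x|.
    """
    mag = np.abs(np.asarray(no_rain, dtype=float))
    cutoff = outlier_factor * np.percentile(mag, 99.9)
    return float(mag[mag <= cutoff].max())


def relative_maxima(mag):
    """Indices of the envelope nodes: samples larger than the previous
    sample and not smaller than the next one."""
    inner = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    return np.flatnonzero(inner) + 1


def detect_drops(x, threshold, window=WINDOW):
    """Return a list of (start, end, peak) tuples (sample indices, max |x|).

    start: first sample whose magnitude exceeds the threshold.
    end:   first envelope node below the threshold, or the envelope
           minimum preceding a new rise (second impact), or the end of
           the search window (ASSUMPTION), whichever comes first.
    """
    mag = np.abs(np.asarray(x, dtype=float))
    nodes_all = relative_maxima(mag)
    above = np.flatnonzero(mag > threshold)
    drops, i, n = [], 0, len(mag)
    while True:
        k = np.searchsorted(above, i)  # next above-threshold sample >= i
        if k == len(above):
            break
        start = int(above[k])
        stop = min(start + window, n)
        nodes = nodes_all[(nodes_all >= start) & (nodes_all < stop)]
        end, prev, descending = stop, None, False
        for p in nodes:
            if mag[p] <= threshold:
                end = int(p)  # envelope has decayed below the noise level
                break
            if prev is not None:
                if mag[p] < mag[prev]:
                    descending = True  # relaxation phase has started
                elif descending and mag[p] > mag[prev]:
                    end = int(prev)  # envelope inversion: a new impact
                    break
            prev = p
        drops.append((start, end, float(mag[start:end].max())))
        i = end  # a second impact starts where the first one ends
    return drops
```

The search for the next above-threshold sample uses `np.searchsorted` on the precomputed crossing indices rather than a per-sample Python loop, which keeps a 440,000-sample file tractable.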
For each 55 s WAVE file, the set of output parameters is computed and saved, so that an extensive dataset becomes available for further processing and comparison with other measurement systems.
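The four per-file parameters could be aggregated from the detected drops as in the following sketch. The input format (a list of `(start, end, peak)` tuples with peaks already normalized to the full-scale value 1.0), the hypothetical function name `output_parameters`, and the choice of returning 0.0 for the mean duration and amplitude when no drop is detected are all assumptions.

```python
import numpy as np

FS_HZ = 8000


def output_parameters(x, drops, fs=FS_HZ):
    """Compute the four per-file parameters from detected drops.

    x:     normalized samples of one 55 s acquisition.
    drops: list of (start, end, peak) tuples; start/end are sample
           indices, peak is the normalized maximum amplitude.
    ASSUMPTION: with no drops, mean duration and amplitude are set to 0.0.
    """
    x = np.asarray(x, dtype=float)
    n_drops = len(drops)
    if n_drops:
        durations_ms = [(end - start) * 1000.0 / fs for start, end, _ in drops]
        mean_duration = float(np.mean(durations_ms))
        mean_amplitude = float(np.mean([peak for _, _, peak in drops]))
    else:
        mean_duration = mean_amplitude = 0.0
    return {
        "mean_drop_duration_ms": mean_duration,
        "mean_drop_amplitude": mean_amplitude,
        "number_of_drops": n_drops,
        "sample_sum": float(np.abs(x).sum()),  # sum of |samples| in the file
    }
```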
Figure 4 shows an example of the outputs provided by the implemented instrument, together with the spatially and temporally co-located rain rate measurements obtained from a professional disdrometer, namely the OTT Parsivel 2 described in Section 2.2.
The correspondence between the outputs and the occurrence of precipitation seems evident, but the quantitative relationship between each single output parameter and rainfall intensity is weak, although each recorded parameter is related to some precipitation characteristics. In fact, while the number of drops can be regarded as a proxy of the drop number concentration, assuming power-law relations of the form $y = a\,D^{b}$, relating the raindrop diameter $D$ to the amplitude or the duration $y$ of the drop signal, implies that the sample sum, the mean drop amplitude, and the mean drop duration can be interpreted as different moments of the drop size distribution $N(D)$, whose order depends on the unknown exponents $b$ of the power laws. An approach that relates multiple output parameters of the sensor to the rain rate, which can also be regarded as a moment of the drop size distribution, is therefore justified.
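A short worked form of this moment argument, under stated assumptions: let $n(D)\,\mathrm{d}D$ be the number of drops with diameter in $[D, D+\mathrm{d}D]$ hitting the sensing area $S$ during the acquisition time $\Delta t$, related to the drop size distribution by $n(D) = S\,\Delta t\,v(D)\,N(D)$, with $v(D)$ the fall velocity. Assume per-drop power laws $A = a_A D^{\alpha}$ for the amplitude and $\tau = a_\tau D^{\beta}$ for the duration. The sample-sum line additionally assumes that the area under each drop signal scales as $A\,\tau$, a simplification introduced here for illustration rather than a result of the analysis.

```latex
\begin{aligned}
m_k &= \int_0^{\infty} D^{k}\, n(D)\, \mathrm{d}D, \\
N_{\mathrm{drops}} &= m_0, \qquad
\overline{A} = a_A \frac{m_{\alpha}}{m_0}, \qquad
\overline{\tau} = a_\tau \frac{m_{\beta}}{m_0}, \\
S_{\mathrm{sum}} &\propto \sum_{\mathrm{drops}} A\,\tau \;\propto\; m_{\alpha+\beta}, \\
R &= \frac{\pi}{6}\int_0^{\infty} D^{3}\, v(D)\, N(D)\, \mathrm{d}D
   = \frac{\pi}{6}\,\frac{m_3}{S\,\Delta t}.
\end{aligned}
```

Each sensor output thus samples a different moment (or ratio of moments) of the impacting-drop distribution, whereas $R$ depends on $m_3$; since $\alpha$ and $\beta$ are unknown, no single output maps directly onto $R$.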
Additional difficulties in finding an effective analytical relationship between the sensor output signal and rainfall intensity stem from the low-cost assembly and the rather basic signal processing. For instance, despite its inclination, the sensing surface becomes wet during rain events, and the water layer can damp the drop impact so that part of the kinetic energy is not transferred to the piezoelectric sensor, attenuating the signal and possibly altering the measured parameters. The water on the sensor also increases the noise level of the system, as can be seen in Figure 4 from the trend of the sample sum, which increases after the beginning of the precipitation and remains at higher levels even after the precipitation stops. Environmental noise also contributes to errors in the retrieved parameters, as there is no specific filter for this kind of noise. For example, we noticed that strong wind can generate false alarms or alter the sensor response. Analyzing and mitigating these unwanted effects would require sophisticated signal-processing tools, whose development effort would risk negating the low-cost nature of the sensor. Based on these considerations, the proposed calibration procedure relating impact measurements to rain rate was carried out using machine-learning methods applied to the set of output parameters described above, thus exploiting their connection with precipitation without any further processing.
2.2. Calibration with Machine-Learning Methods
The calibration of rainfall measurements from the PRS is based on the assumption that a relationship exists between the sensor responses and the rainfall intensity. In most cases, this relationship is monotonic (i.e., as the voltage of the samples acquired by the sensor increases, the rain rate increases, according to either linear or power laws [10]). However, based on the theoretical considerations expressed above and on the experience gained in this work, this monotonic relationship does not always hold, owing to many causes, such as prolonged use of the instrument in the field, external and internal noise, etc.
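For comparison with the ML approach, the conventional assumption of a monotonic power law, $R = a\,x^{b}$, between a sensor quantity $x$ (for example, the sample sum) and the rain rate can be fitted in a few lines. This is a hedged sketch, not the calibration of [10]: the function names are hypothetical, and the fit is ordinary least squares in log–log space, which minimizes relative rather than absolute errors and discards minutes with $x \le 0$ or $R \le 0$.

```python
import numpy as np


def fit_power_law(x, rain_rate):
    """Fit R = a * x**b by least squares on log(R) = log(a) + b*log(x).

    Only strictly positive (x, R) pairs are used. Returns (a, b).
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(rain_rate, dtype=float)
    ok = (x > 0) & (r > 0)
    if ok.sum() < 2:
        raise ValueError("need at least two positive (x, R) pairs")
    # polyfit with deg=1 returns [slope, intercept]
    b, log_a = np.polyfit(np.log(x[ok]), np.log(r[ok]), deg=1)
    return float(np.exp(log_a)), float(b)


def predict_power_law(x, a, b):
    """Rain rate predicted by the fitted power law (0 where x <= 0)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = a * x[pos] ** b
    return out
```

A single monotonic curve of this kind is exactly what breaks down when the wet surface or noise alters the sensor response, which motivates the multi-parameter ML calibration.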
In addition, the response of the sensor interface to the drop impact is not easy to model, in particular after the beginning of the precipitation, when the wet layer attenuates the impact of the falling drops, as discussed above. Irregular and unexpected sensor responses are more frequent when low-cost sensors are used [17,22].
We applied a software-based machine-learning method to the calibration of the PRS, since studies in the literature have shown that such approaches can effectively improve the performance of low-cost environmental sensors [23]. The availability of a co-located laser disdrometer allowed a direct comparison between the reference instantaneous rain rate and the response of the sensor under study. These reference data were used in the training and test steps of the machine-learning model setup. The laser disdrometer considered in the study is a second-generation PARSIVEL (particle size and velocity) disdrometer, manufactured by OTT GmbH (Kempten, Germany). Its optical sensor produces a horizontal sheet of light focused on a single photodiode. Particles passing through the light sheet partially block it, causing a short reduction of the output voltage from its clear-sky value (5 V). The amplitude of the reduction is proportional to the size of the drop, while the fall velocity is derived from the duration of the reduction. The manufacturer's software provides the number of drops in 32 diameter classes and 32 fall velocity classes of variable widths. Particle sizes range from 0.062 to 24.5 mm, and fall velocities from 0.05 to 20.8 m s⁻¹. However, the first two size classes, which correspond to sizes below 0.2 mm, were left empty because of their low signal-to-noise ratio. From this information, the drop size distribution can be obtained and the rain rate can be computed straightforwardly [8]. The PARSIVEL rain rate measurements are averaged over one minute of acquisition. For the training phase of the machine-learning (ML) approach, data from April 2019 were selected. The total rainfall in April 2019 was 66.8 mm, over eight rainy days. The full-month dataset acquired by both the low-cost sensor and the reference disdrometer was randomly split into two parts: 75% for the training phase and 25% for the test phase.
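The rain-rate computation from the disdrometer counts can be sketched as follows, assuming the per-class drop counts for one minute are available as an array. The hypothetical function below uses $R = \frac{6\pi\cdot 10^{-4}}{A\,\Delta t}\sum_i n_i D_i^{3}$ (R in mm h⁻¹, D in mm, A in m², Δt in s), which is the total drop volume divided by the sampling area and time. The nominal PARSIVEL² beam area of 180 mm × 30 mm is assumed, and the diameter-dependent edge correction of the effective area is deliberately omitted.

```python
import numpy as np


def rain_rate_from_counts(counts, diam_mm, area_m2=0.0054, dt_s=60.0):
    """Rain rate (mm/h) from disdrometer drop counts per diameter class.

    counts:  (n_diam, n_vel) or (n_diam,) drops counted during dt_s.
    diam_mm: class-centre diameters in mm.
    area_m2: ASSUMPTION: nominal 180 mm x 30 mm beam, no edge correction.
    """
    n = np.array(counts, dtype=float)  # copy, so the caller's data is untouched
    if n.ndim == 2:
        n = n.sum(axis=1)  # total drops per diameter class, all velocities
    n[:2] = 0.0  # first two size classes are left empty, as in the text
    d = np.asarray(diam_mm, dtype=float)
    # (pi/6) * sum(D^3) mm^3 / (A * 1e6 mm^2) per dt_s, scaled to one hour
    return 6.0 * np.pi * 1e-4 / (area_m2 * dt_s) * float(np.sum(n * d ** 3))
```

As a sanity check, 100 drops of 1 mm in one minute over 54 cm² give about 0.58 mm h⁻¹.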
Most machine-learning methods rely on data analysis and an empirical choice of the best regression or classification method. With this rationale, several machine-learning algorithms were tested for the sensor calibration. The selected methods (implemented in the Scikit-learn framework [24]), together with the Mean Absolute Errors (MAEs) obtained in the training and test phases, are listed in Table 1.
As expected, the joint use of the sensor output quantities and the reference measurements for training and testing showed that the instrument does not provide signal properties that are linearly correlated with the precipitation rate. In fact, the machine-learning methods based on linear models (i.e., Linear Regression and Partial Least Squares Regression) show poorer performance. A Support Vector Regression algorithm was also applied using a radial basis function as the kernel, i.e., a function that implicitly remaps the original sensor outputs into a new feature space in which the output is obtained as a linear combination. The performance of the Support Vector Regression is very similar to that of the linear models, demonstrating the difficulty of linearizing the relationship between the parameters provided by the sensor and the precipitation intensity. The Multi-Layer Perceptron Regressor (MLPR) neural network, well suited to non-linear relations, was tested with 500 passes over the training data; its MAE in both the training and the test phases is comparable to, or even worse than, that of the methods described above.
Even though these preliminary tests were carried out on all the methods listed in Table 1, the final analyses focus only on those that showed the best performance in the training and/or test phase: Decision Tree Regression (DTR), Random Forest (RF), and K-Nearest Neighbors Regression (KNNR). The first two methods follow similar approaches, since Random Forest is an extension of DTR that combines multiple decision trees; RF outperformed DTR in the test phase, while DTR outperformed RF in the training phase. The KNNR method, configured with five nearest neighbors, showed instead the worst training MAE of these three methods, but the best test MAE.
In light of these results, the selected methods were applied to a dataset completely independent of the one used for training and testing the ML models. Reference disdrometer data were available for this dataset as well. The results are shown and discussed in the next sections.
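Applying the three retained methods to an independent period could look like the following sketch. It assumes, which the text does not state, that the models are fitted on the calibration data and scored by MAE against the disdrometer rain rate of the new period; the function name `validate_on_independent` is hypothetical.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def validate_on_independent(X_cal, y_cal, X_new, y_new, seed=0):
    """Fit DTR, RF and KNNR (k=5) on calibration data and return the MAE
    of each on an independent dataset with its own reference rain rate."""
    models = {
        "DTR": DecisionTreeRegressor(random_state=seed),
        "RF": RandomForestRegressor(random_state=seed),
        "KNNR": KNeighborsRegressor(n_neighbors=5),
    }
    # fit() returns the fitted estimator, so predict() can be chained
    return {
        name: mean_absolute_error(y_new, m.fit(X_cal, y_cal).predict(X_new))
        for name, m in models.items()
    }
```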