4.2. System Structure
The structure of the proposed system is shown in Figure 2. Data flows are shown as lines with arrows; optional components and data flows are drawn with dashed lines.
The monitoring system consists of several autonomous measurement modules installed at the ponds of a fish farm, a LoRaWAN gateway, and a server with a database and dedicated software. Optionally, the system may contain actuator controllers, which allow it to take corrective actions on the monitored ponds. In most cases, those controllers only have to provide simple functions, such as starting and stopping an aerator or opening and closing an inlet or discharge.
Water condition should be measured at least at the three points of the pond mentioned above. In a large pond, there may be additional characteristic points for water parameter measurement, for example feeding areas, pits, bottom channels, and deep places. Therefore, the number of measurement modules per pond should be determined individually in each case.
Measured data are transmitted via the LoRaWAN gateway to a local server running dedicated software, which includes the LoRaWAN network server, a database, and client software. A detailed description of the software and its functioning is given below. The data obtained may be made accessible over an Internet connection using WebSocket or Message Queue Telemetry Transport (MQTT). The LoRaWAN network of the proposed system has a star topology; if the area is very large, a daisy chain topology may be used instead.
All data analysis and decision making described in Section 4.4 can be performed on the central server. However, in order to increase system reliability, it is preferable to distribute data processing between system nodes. Measurement modules may perform basic data processing, including outlier detection and prognosis of water parameters at the measurement point. Based on the data obtained, modeling of the hydrochemical regime of the pond can also be performed on each node of the system; the model of the regime may be based on differential equations or a neural network. More thorough data processing, including comparative analysis of current and historical data and storage of the data history, should be performed on the server. Additional data from the mobile testing system may also be used to increase the reliability of hydrochemical regime prediction. Such an approach decreases data flow in the system; moreover, even if one or several modules fail, the monitoring system as a whole will continue to operate. The final data processing has to be provided by custom client software.
Based on the data processing performed by the client software, corrective actions may be initiated. In the simplest case, this amounts to giving recommendations to service personnel; if optional actuator controllers are used, some of the measures may be applied automatically. The modeling, data processing, and decision-making process is described below.
4.3. Modeling of the Hydrochemical Regime of the Pond
A pond ecosystem in its basic form contains phytoplankton, zooplankton, and fish. Therefore, an analytical model of a pond ecosystem, based on the Lotka–Volterra and Michaelis–Menten equations [26], is defined by the following system of ordinary differential equations:
where P, Z, F are the biomasses of phytoplankton, zooplankton, and fish, respectively, expressed in g/m³; O is the DO concentration in mg/L;
are growth, trophic interaction, and mortality parameters,
are parameters of the oxygen balance,
is the DO saturation level of the water, which depends on its temperature
T, and
is the DO saturation level at 20 °C. Descriptions and typical and normal values of the growth, trophic interaction, and mortality parameters mentioned above are given in Table 3, Table 4, Table 5 and Table 6. The values and ranges shown in those tables are based on published experimental and theoretical studies in aquaculture, limnology, and ecological modeling, including classical predator–prey dynamics, functional responses, and oxygen dynamics in fish ponds [27,28,29]. Those sources provide reliable typical values and normal ranges for model parameters.
In state-space form, model (1) is expressed as
where
is the system’s state vector. This model, called the P–Z–F–O model, reflects key trophic and biochemical processes in the pond and allows prediction of its state while being computationally efficient for implementation in real-time systems. One can see from (2) that the pond ecosystem model is non-linear. Usually, only O (DO concentration) is measured, and thus the measurement vector is
. Values of state variables P, Z, and F usually cannot be measured directly and have to be estimated. To address that problem, the application of extended Kalman filtering (EKF) [30] is considered.
Hereinafter, notation
represents the estimate of
x at time
n given observations up to and including at time
, and
represents the estimate of
y at time
n. In the EKF, for each time-step
k, first a priori system state
and covariance
are predicted using formulae
where
is the state transition model, applied to the previous state of the system,
is the function that describes the right-hand side of (
2),
Q is the covariance of the process noise, and
P is the covariance matrix of the system. Then, innovation
, innovation covariance
, and optimal Kalman gain
are calculated, and a posteriori state
and covariance matrix
are estimated based on measured data as follows
where
I is the unit matrix,
is the observation model at time
k, and
is the covariance of the measurement noise at the same step.
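The prediction and correction steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the continuous model is discretized with an explicit Euler step, the measurement is a single scalar (as with one DO sensor), and all names (`ekf_step`, `jac`, `h`, `dt`) are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ekf_step(x: Vector, P: Matrix, z: float,
             f: Callable[[Vector], Vector],
             jac: Callable[[Vector], Matrix],
             h: Vector, Q: Matrix, R: float, dt: float
             ) -> Tuple[Vector, Matrix, float, float]:
    """One EKF cycle for a scalar measurement z = h . x + noise.

    f(x) is the right-hand side of the model, jac(x) its Jacobian.
    Returns the a posteriori state, its covariance, the innovation
    and the innovation variance.
    """
    n = len(x)
    # Prediction (assumption: explicit Euler discretization, step dt).
    dx = f(x)
    x_pred = [x[i] + dt * dx[i] for i in range(n)]
    J = jac(x)
    F = [[(1.0 if i == j else 0.0) + dt * J[i][j] for j in range(n)]
         for i in range(n)]
    Ft = [list(col) for col in zip(*F)]
    FPFt = mat_mul(mat_mul(F, P), Ft)
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    # Correction with the scalar measurement.
    innovation = z - sum(h[i] * x_pred[i] for i in range(n))
    PHt = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    HP = [sum(h[k] * P_pred[k][j] for k in range(n)) for j in range(n)]
    S = sum(h[i] * PHt[i] for i in range(n)) + R   # innovation variance
    K = [PHt[i] / S for i in range(n)]             # Kalman gain
    x_new = [x_pred[i] + K[i] * innovation for i in range(n)]
    # (I - K H) P_pred written element-wise.
    P_new = [[P_pred[i][j] - K[i] * HP[j] for j in range(n)]
             for i in range(n)]
    return x_new, P_new, innovation, S
```

Because the innovation variance is a scalar when only DO is measured, no matrix inversion is needed, which keeps the update cheap enough for a small MCU.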
In order to provide biological and physical correctness of parameter estimation, the following constraints are set: , where and are the minimal and maximal DO concentrations possible. Such constraints are standard for ecological models of water ecosystems and prevent incorrect state estimation in the process of Kalman filtering. After each EKF correction step, system state estimates are projected into the domain of valid values in order to ensure that above constraints are met.
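The projection step mentioned above can be illustrated as a simple clipping operation. This is a sketch only: the state order `[P, Z, F, O]`, the non-negativity of the biomasses, and the function name are assumptions made for illustration, and the DO bounds are passed in rather than fixed.

```python
from typing import List


def project_state(x: List[float], o_min: float, o_max: float) -> List[float]:
    """Clip an estimate [P, Z, F, O] to a physically valid domain.

    Assumption: biomasses are kept non-negative and the DO
    concentration is kept inside [o_min, o_max].
    """
    p, z, f, o = x
    return [max(p, 0.0), max(z, 0.0), max(f, 0.0),
            min(max(o, o_min), o_max)]
```

For example, `project_state([-0.1, 2.0, 3.0, 25.0], 0.0, 20.0)` returns `[0.0, 2.0, 3.0, 20.0]`.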
Indirect parameters (water clarity by Secchi S, chlorophyll concentration ) are used for:
initialization of initial state ;
checking of P and Z estimation adequacy;
correction of model parameters and matrix of noise process Q.
The model that empirically connects indirect parameters with phytoplankton biomass has the form
where
and
are empiric coefficients that depend on pond morphometry (depth, area, volume, etc.);
is noise or measurement error that models random factors. Zooplankton with biomass
Z consumes phytoplankton, therefore
P concentration will decrease with the increase in
Z. After the introduction of coefficient
that characterizes how much phytoplankton is consumed by a unit biomass of zooplankton, Equation (
10) takes the form:
where
is the “effective” phytoplankton biomass, which is “visible” through water clarity and chlorophyll. If
Z is small, the model reduces to the classic one:
.
The model presented above enables the estimation of Z through an inverse solution: given that values of key parameters—S and —are known, one can calculate the proportion of phytoplankton biomass consumed by zooplankton. The indirect parameters are measured at different frequencies: key parameters are sampled 1–3 times per day, while other parameters, that may be needed for model calibration, are typically gathered only once per vegetation season. While those low-frequency measurements are not included in the measurement vector during the normal operation of the EKF, they serve a critical methodological role: they are essential for constraining the model during initialization and for the periodic refinement of its parameters in dedicated calibration phases, thereby ensuring long-term prognostic fidelity.
Use of the proposed approach to pond ecosystem modeling allows one to estimate fish pond parameters using a minimal set of sensors. Moreover, as and are updated at each step of Kalman filtering, the pond model adapts to changes in the pond using measured data .
In a formal sense, the non-linear state-space model (2) is only partially observable, as only one state variable of the vector X (the DO concentration O) is measured directly. However, practical system observability is ensured by the structure of the ecological model and by the strong functional relationship between phytoplankton biomass and the oxygen balance of the pond.
To analyze the local observability of the non-linear system, an approach based on Lie derivatives is used. The Lie derivative matrix is defined as
where
is the Lie derivative of the observability function along the vector field
. The zeroth Lie derivative has the form
The first Lie derivative is determined by the oxygen balance dynamics
and its gradient by the state vector is
The second Lie derivative has the form
that contains non-linear combinations of variables P, Z, F via their own dynamics equations. For typical modes of pond operation, vectors
,
,
are linearly independent relative to variables P, Z, O, and as a result, the rank of the observability matrix
is not lower than 3 (locally and under non-equilibrium modes of system functioning). That allows one to reconstruct the phytoplankton and zooplankton biomass (P and Z) from the time series of DO measurements O.
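For reference, the general Lie-derivative construction used in this kind of analysis can be written as follows. The block gives only the textbook definitions and assumes the state ordering x = (P, Z, F, O) and the measurement function h(x) = O; the explicit expressions for the pond model depend on the right-hand side f of (2).

```latex
\begin{aligned}
\dot{x} &= f(x), \qquad x = (P,\ Z,\ F,\ O)^{\mathsf T}, \qquad h(x) = O,\\
L_f^{0} h &= h(x) = O,\\
L_f^{k} h &= \frac{\partial \left(L_f^{k-1} h\right)}{\partial x}\, f(x), \qquad k \ge 1,\\
\mathcal{O}(x) &=
\begin{bmatrix}
\nabla L_f^{0} h \\ \nabla L_f^{1} h \\ \nabla L_f^{2} h
\end{bmatrix},
\qquad \nabla L_f^{0} h = (0,\ 0,\ 0,\ 1).
\end{aligned}
```

Under these definitions, the first Lie derivative is simply the oxygen balance equation (the fourth component of f), and each further derivative brings in the dynamics of the variables that the previous one depends on.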
Fish biomass F has only an indirect influence on the oxygen balance (via respiration and higher-order trophic interactions) and manifests itself mostly in Lie derivatives of second or higher order. Thus, F is structurally observable, but with a lower sensitivity than the other state variables, i.e., it is weakly observable. As a result, instantaneous estimates of fish biomass have higher uncertainty than the other estimates, while long-term trends of F can still be reconstructed reliably.
One should note that the P–Z–F–O model (2) is introduced as a generalized state-space representation of dissolved oxygen dynamics. Phytoplankton and zooplankton are included as latent variables describing internal oxygen sources and sinks rather than as management targets. For practical fish farming scenarios, the model can be reduced to a simplified oxygen–fish interaction without altering the monitoring, filtering, or forecasting algorithms.
Identifiability of model parameters using only one sensor is limited; therefore, biological parameters , , , , , are set a priori using one-time field measurements. Accordingly, the EKF is used mainly to estimate the ecosystem state rather than for full online parameter identification.
As for Q and R, their initial values are determined as follows. The initial value of the measurement noise covariance R is defined by the characteristics of the sensors; typical values of the root mean square error for optical DO sensors are 0.05–0.1 mg/L, so
. The process noise matrix
Q models the system’s uncertainty and is estimated based on the expected speed of state variable change during one period of sampling
. The initial value of
Q is a diagonal matrix with elements
where
is a dimensionless uncertainty coefficient, usually 0.05–0.15, and
is the maximum expected value of
. In the process of filtering,
Q and
R are adaptively corrected based on the statistics of innovation
, defined by (
5). That allows the filter to adapt to season and weather changes and to compensate for ecosystem model uncertainty and sensor noise.
Thus, the proposed approach provides practically sufficient observability for monitoring tasks, early detection of critical hydrochemical states, and decision support in pond management. Even the system with one DO sensor, integrated with the dynamic pond model via EKF, can be used as a base for pond monitoring.
The study of the pond model is divided into two stages. Stage 1, presented in this article, is aimed at validating the algorithm on a minimal system configuration. The goal of this stage is to show that the ecosystem state can be reconstructed from just one integral parameter using model (1). That allows one to validate the core of the pond modeling algorithm on a minimal configuration, focus on the partial observability problem, and separate errors related to observability from those introduced by additional sub-models. This approach also ensures reproducibility of the study and creates a clear benchmark for further studies.
The proposed system prototype is designed to measure DO, pH, and water temperature. That allows one to use a more sophisticated model, which takes all three hydrochemical parameters into account and thus integrates the basic biological, physical, and chemical processes in the pond. Stage 2 will be devoted to this complex model. A preliminary study of the complex pond model performed by the authors showed that:
The complex model is a system of 6 non-linear differential equations and has more than 30 parameters, compared to the 13 parameters of (1).
Most of these parameters depend on water temperature T according to the Q10 (van ’t Hoff) rule, and some of the parameters depend on two or three variables.
The complex model is fully locally observable, and the rank of its observability matrix is 6. Therefore, dynamics of all three biomasses P, Z, and F can be restored.
Estimated computational complexity (see Appendix B) is approximately 400 floating point operations per second (FLOPS) for the basic model with the DO sensor only and 1350 FLOPS for the complex model.
Therefore, the implementation of Stage 2, which involves the use of a significantly more complex three-sensor model, raises two fundamental scientific and technical challenges: the calibration of over 30 additional model parameters and overcoming the significant increase in computational complexity. A comprehensive solution to these problems requires dedicated experimental procedures and the development of novel algorithmic approaches, which fall outside of the scope of the present study. Consequently, a complete three-sensor model constitutes a separate, subsequent investigation, logically extending the results obtained from the minimal configuration.
During modeling, anomalies in the system state are detected:
If any of the state variables violates physical limitations (e.g., or O > 20 mg/L, which is impossible).
If the Mahalanobis distance of the measured data prognosis is higher than the critical level (for 90% probability and number of degrees of freedom, equal to length of ).
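The two anomaly criteria above can be sketched for the single-sensor case as follows. With one measured quantity, the squared Mahalanobis distance of the innovation reduces to the squared innovation divided by its variance, and the critical level for 90% probability and one degree of freedom is the corresponding chi-square quantile (about 2.706). The function name and the upper DO limit argument are illustrative assumptions.

```python
CHI2_90_1DOF = 2.706  # 0.90 quantile of the chi-square distribution, 1 d.o.f.


def is_anomalous(innovation: float, innovation_var: float,
                 o_pred: float, o_max: float = 20.0,
                 threshold: float = CHI2_90_1DOF) -> bool:
    """Flag a measurement step as anomalous.

    innovation      measured DO minus predicted DO, mg/L
    innovation_var  innovation variance from the filter, (mg/L)^2
    o_pred          predicted DO concentration, mg/L
    """
    # Criterion 1: physical limits on the DO concentration.
    if o_pred < 0.0 or o_pred > o_max:
        return True
    # Criterion 2: squared Mahalanobis distance against the chi-square level.
    d2 = innovation * innovation / innovation_var
    return d2 > threshold
```

For a longer measurement vector, the scalar division would be replaced by a quadratic form with the inverse innovation covariance, and the threshold by the quantile for the corresponding number of degrees of freedom.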
Anomalous values are ignored when detected, and the system switches to the simple prognosis mode described in Section 4.4. Modeling error is estimated using the root mean square error (RMSE) of the state estimation. In numerical experiments performed in MATLAB, the true state of the system
is known; therefore, RMSE is calculated as follows
where
is the true (simulated) state vector and
is the state estimate obtained using the EKF. In a real-world scenario, when the true system state is not known, estimation error is calculated using the innovation sequence
and its statistical properties. Typically, EKF reduces state estimation error by 35–60% depending on noise parameters.
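As an illustration of the simulation-based error metric, a per-variable RMSE over a run can be computed as below. This is a generic sketch of the standard RMSE definition; whether the study aggregates the error per state variable or over the whole vector is not stated here, so the per-variable form is an assumption.

```python
import math
from typing import List, Sequence


def rmse_per_state(true_states: Sequence[Sequence[float]],
                   est_states: Sequence[Sequence[float]]) -> List[float]:
    """RMSE of each state variable over all time steps.

    Both arguments are sequences of state vectors, one per time step.
    """
    n_steps = len(true_states)
    n_vars = len(true_states[0])
    return [
        math.sqrt(sum((est_states[k][i] - true_states[k][i]) ** 2
                      for k in range(n_steps)) / n_steps)
        for i in range(n_vars)
    ]
```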
Based on the modeling performed as described above for a time range of 20–30 days, critical conditions in the pond are detected:
In addition, dimensionless system stability index
is estimated as
where
and
are standard deviations and mean values of system state parameters
over the ensemble of all states modeled in the given time range.
Modeling of the fish pond ecosystem was performed using MATLAB; the software may be obtained from the authors upon request. An example of pond modeling along with results is presented in Section 5.1.
4.4. Data Processing and Decision Making Algorithms
DO and pH do not remain constant during the day. DO increases during the day and decreases at night [33]. The pH level behaves similarly: it increases during the day until late afternoon and then decreases at night [34]. Typical changes in DO and pH in a fish pond are shown in Figure 3.
Most current studies [35,36] concentrate on DO and pH prediction using neural networks or a combination of neural networks and wavelets [36]. Such an approach, while effective for prediction of the hydrochemical regime, has some drawbacks:
The latter issue renders the monitoring system unreliable in the eyes of farm management, as a prognosis cannot be justified, and possible expenses and losses due to prognosis errors are harder to eliminate. Moreover, if hydrochemical regime parameters are measured once an hour, only 24 points per day are available for analysis, whereas 150 points were used for DO prognosis in [35]. Thus, use of another prognosis approach is desirable.
As a classical approach, least squares approximation is used to predict the value of a parameter and, thus, the water condition. The idea of least squares approximation is to minimize the difference between the approximation function and the measured parameter data:
where
,
are values of measurement time and measured values of the parameter in question, respectively;
n is the number of measured data
;
is the approximation curve,
are the coefficients of that curve. One should note that the moments of time
, at which measured data are registered, do not have to be equally spaced; however, in most cases, that is desirable.
Typically, approximation using (20) is performed using three curves:
i.e., a linear function, an exponential one, and a cubic curve. Most processes in nature are either linear or exponential in character; therefore, curve (21) or (22) is typically used.
Approximation correctness is estimated using a correlation coefficient between the measured data and approximation curve, which is expressed in the form
where
;
is the mean value of the approximated data, and
is the mean value of the measured data. If the value of the correlation coefficient r (24) is less than 0.8…0.9, approximation results cannot be considered reliable.
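A minimal sketch of the fitting and reliability check is given below. It assumes the linear curve has the form y = a + b·t and the exponential curve the form y = a·exp(b·t); these forms and all function names are assumptions for illustration. Note that the exponential fit here is done by linear regression on ln(y), which minimizes the squared error in log space rather than the original objective, and requires positive data.

```python
import math
from typing import Sequence, Tuple


def fit_linear(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares fit of y = a + b*t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def fit_exponential(t: Sequence[float],
                    y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a*exp(b*t) via a linear fit of ln(y); needs y > 0."""
    ln_a, b = fit_linear(t, [math.log(yi) for yi in y])
    return math.exp(ln_a), b


def correlation(y: Sequence[float], y_fit: Sequence[float]) -> float:
    """Correlation coefficient between measured and approximated data."""
    n = len(y)
    y_mean = sum(y) / n
    f_mean = sum(y_fit) / n
    num = sum((a - y_mean) * (b - f_mean) for a, b in zip(y, y_fit))
    den = math.sqrt(sum((a - y_mean) ** 2 for a in y)
                    * sum((b - f_mean) ** 2 for b in y_fit))
    return num / den
```

A fit would then be accepted only if `correlation(y, y_fit)` is not lower than the chosen threshold in the 0.8 to 0.9 range.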
In order to improve r, automatic detection and elimination of outliers (i.e., data measured with errors) has to be applied to the measured data before approximation. A review of outlier elimination methods is presented in [37]. Statistical, density-based, and cluster-based outlier detection methods require knowledge of the measured signal’s statistical characteristics, typical signal data density, or measurement data clustering, respectively. That, in turn, requires a significant volume of data to be measured and analyzed, as well as a powerful MCU, preferably with a floating point unit (FPU). For a distributed measurement system that monitors scalar values, distance-based outlier detection methods are the methods of choice, as they require neither significant computation power nor large datasets.
Distance-based outlier detection methods are based on the idea that data in every measured set should not differ from a given value (or one point from another) more than by some preset value. Therefore, two approaches are commonly used:
the value is assumed to be an outlier if it is higher or lower than a preset threshold(s);
the value is assumed to be an outlier if the absolute difference between that value and adjacent values is higher than a preset threshold.
Usually, a combination of both approaches is used in practice; in the second case, the threshold may be expressed as an absolute value or as a relative one. In addition, the measured point can be considered to be an outlier if the speed of the value change is higher than a given value. For a scalar value, outlier points along the timeline may be isolated or occur in groups. If the monitored parameter is measured with low frequency, outlier values will most probably be isolated points; the only exception is the case of sensor failure. Therefore, a simple algorithm that eliminates a data point if the absolute difference between its value and adjacent values is too high is assumed to be the basis of data processing.
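The simple elimination rule described above, combined with a range check, can be sketched as follows. The thresholds and the function name are illustrative assumptions; a point is treated as an isolated outlier when it differs from every available adjacent value by more than `max_step`.

```python
from typing import List, Sequence, Tuple


def remove_outliers(t: Sequence[float], y: Sequence[float],
                    max_step: float, lo: float, hi: float
                    ) -> Tuple[List[float], List[float]]:
    """Drop out-of-range points and isolated spikes from a time series."""
    keep_t: List[float] = []
    keep_y: List[float] = []
    n = len(y)
    for i in range(n):
        # Approach 1: absolute thresholds on the value itself.
        if not (lo <= y[i] <= hi):
            continue
        # Approach 2: difference to the adjacent measured values.
        neighbours = [y[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if neighbours and all(abs(y[i] - v) > max_step for v in neighbours):
            continue
        keep_t.append(t[i])
        keep_y.append(y[i])
    return keep_t, keep_y
```

For hourly DO readings `[6.0, 6.1, 9.5, 6.2, 6.3]` with `max_step=1.0`, only the spike of 9.5 is removed; a group of consecutive faulty readings (e.g., after a sensor failure) would need the range check or a rate-of-change check instead.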
When data approximation has been performed and is considered reliable, the time at which the DO level will reach the corresponding limiting values presented in Table 2 is calculated, as well as the time at which the pH value will become higher than 8.5 for carp, higher than 8.0 for salmon and trout, or lower than 6.5. The ratio of the pH level near the water surface to that near the pond bottom is also taken into account (it should not be close to 1.05 [1]). If temperature stratification is detected, then water in the pond is either mixed or discharged near the pond bottom, and aeration is applied. If any condition that leads to fish kill is predicted to occur within a predefined time interval, the respective corrective measures are also initiated. The decision-making algorithm for any water parameter value is shown in Figure 4.
Pseudocode of the algorithm presented above, implemented as the MakeDecision procedure, is as follows (Algorithm 1):
Algorithm 1 Decision-making algorithm based on the predicted value of the water parameter
procedure MakeDecision(Measured data, curve type, critical value)
    ▹ Measured data are usually given as an array of points; curve type is one of those defined by (21)–(23); the critical value for each water parameter is set according to the reasoning presented above in Section 3
    Eliminate outliers in measured data
    Approximate measured data with the selected curve
    Calculate the correlation coefficient r using (24)
    if r indicates a reliable approximation then
        Calculate time at which parameter will reach the critical value
        if that time < preset level then
            Start corrective measures
            return
        else
            Do not initiate corrective measures
            return
        end if
    else
        No reliable data: do not initiate corrective measures
        return
    end if
end procedure
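The decision procedure can be illustrated with a compact Python sketch restricted to the linear curve. It assumes that outliers have already been removed, that time is expressed in hours, and that a limit which has already been crossed is handled elsewhere by a plain threshold check; the function name, return labels, and default thresholds are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def make_decision(t: Sequence[float], y: Sequence[float], critical: float,
                  r_min: float = 0.8, lead_time_h: float = 4.0
                  ) -> Tuple[str, Optional[float]]:
    """Return (decision, hours until the critical value is reached)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    if sxx == 0.0 or syy == 0.0:
        return "no_reliable_data", None
    b = sxy / sxx                 # slope of y = a + b*t
    a = y_mean - b * t_mean
    # For a linear fit, the correlation between the data and the fitted
    # values equals the absolute correlation between t and y.
    r = abs(sxy) / math.sqrt(sxx * syy)
    if r < r_min or b == 0.0:
        return "no_reliable_data", None
    time_left = (critical - a) / b - t[-1]
    if time_left < 0.0:
        # Trend moves away from the limit (or the limit is already passed,
        # which is assumed to be caught by a separate threshold check).
        return "no_action", None
    if time_left < lead_time_h:
        return "start_corrective_measures", time_left
    return "no_action", time_left
```

With hourly DO readings falling from 6.0 to 4.0 mg/L over four hours and a critical value of 3.0 mg/L, the sketch predicts that the limit is reached in two hours and requests corrective measures.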
Preset levels of time intervals for taking corrective measures should be established at the level of 3–5 h before a predicted problem and corrected in practice. Corrective measures are usually finished when water condition is normalized.
Using approximation-based prediction, one has to note the following:
After any event that may severely change the condition of water in the pond (e.g., treating the pond with chemicals, introduction of fertilizers, aeration, very heavy rain, or moving fish from one pond to another), data measured in the pond before the event should not be used for prediction of hydrochemical regime parameters. In order to obtain a correct prognosis, one has to use data collected after the event.
Care must be taken when analyzing and comparing data collected in different ponds, especially when those ponds belong to different farms. Even if parameter values are similar, that does not mean that the parameter will behave similarly in both cases; e.g., different water supplies may significantly alter water parameter dynamics. The same is true for data collected in one pond but during different seasons (in summer and in winter, in spring and in summer, etc.).
If any new unusual trends manifest themselves in the data, the advice of an experienced fish farming practitioner is desirable, as those trends may be related to a problem that requires additional diagnosis and elimination.
One should also note that approximation-based prediction is a tool of operative control, and as such is suitable only for a short-term prognosis (usually for a period of a few hours). Therefore, for a long-term prognosis (e.g., over at least 20–30 days), use of the deterministic model presented in Section 4.3 is one of the possible solutions. A prognosis for a one-year production cycle, if needed, should be implemented on the server, as it requires vast datasets and significant computing power.
4.6. Software Description
Software of the system is to be organized in two levels:
lower-level firmware that provides measurement, initial data analysis, basic hydrochemical regime modeling and prognosis and transmits data to a higher level using LoRaWAN;
higher-level software that provides data collection, data storage in a database, comparative analysis, and service to clients via local network or Internet (if needed).
Lower-level firmware runs on the MCUs of the measurement modules. General functions of the firmware include water parameter measurement and automated outlier elimination. The pond model with the EKF and basic trend analysis should also be included in the firmware. Each measurement module operates as a Class A LoRaWAN device, which sends measurement data to the higher-level software at a given interval in order to save battery energy.
Most of the time, the module sleeps and initiates measurement only at preset moments of time. The measurement schedule may be changed if the system predicts fish kill (more frequent measurement), when water parameters are normalized (less frequent measurement), or at the user’s request. As stated above, basic data processing, including outlier detection and prognosis of water parameters at the measurement point, is performed by the module in order to reduce data flow. In order to decrease the influence of random noise, the readings taken at each scheduled measurement are averaged.
The general algorithm of the measurement module’s firmware operation is as follows (Algorithm 2):
Algorithm 2 General algorithm of the measurement module’s firmware operation
1: Power on module/Get it out of deep sleep mode
2: Read configuration (including curve type, critical value) from flash memory
3: Initialize sensors
4: Read data from sensors, average them
5: Turn sensors off
6: Use EKF (5)–(9) on measured data
7: Add corrected measurement results to Measured data array
8: for each sensor do ▹ Call routine that clears and processes measured data
9:     MakeDecision(Measured data, curve type, critical value)
10: end for
11: Save processed measurement data into the local flash
12: Send processed measurement data to higher level via LoRaWAN
13: Receive configuration or commands from higher level
14: if There are any configuration changes then
15:     Save changes to flash memory
16: else if Significant changes (chemical treatment, etc.) require data reset then
17:     Reset data buffers
18: else if Other command received then
19:     Process that command
20: end if
21: Set up wake-up timer for the next measurement cycle
22: Put module into deep sleep
23: Wait for wake-up event
24: Go to step 1
Such an algorithm of operation provides low power consumption and ensures that all data are processed at the lower level of the system before they are sent to the higher level. As for computational complexity, most of the calculations can be performed using single precision arithmetic; moreover, fixed point arithmetic may also be used, as 32 bit MCUs perform well with 32 bit integer arithmetic. As for real-time feasibility, if data are measured once an hour according to the recommendations presented above, processing can be considered real-time even if computation takes one to two minutes.
As a base for the firmware, the open-source LoRa Basics Modem LoRaWAN stack developed by Semtech [40] is to be used; its main advantage is its portability between different MCUs. In order to simplify the measurement unit, Activation By Personalization (ABP) is to be used to authenticate the unit in the LoRaWAN network. The user-defined static Device Address (DevAddr), Network Session Key (NwkSKey), and Application Session Key (AppSKey) will be stored in the end-device and may be changed using an external interface both during commissioning and in the process of operation. Over The Air Activation (OTAA) may also be used in the future if necessary.
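As an illustration of the commissioning side of ABP, the following sketch validates the three static values before they are written to an end-device. It relies only on the sizes fixed by LoRaWAN 1.0.x (a 32-bit DevAddr and two 128-bit AES session keys); the class and method names are hypothetical and do not belong to the Semtech stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbpCredentials:
    """Static ABP session parameters of one measurement module."""
    dev_addr: bytes    # 4 bytes
    nwk_s_key: bytes   # 16 bytes
    app_s_key: bytes   # 16 bytes

    @classmethod
    def from_hex(cls, dev_addr: str, nwk_s_key: str,
                 app_s_key: str) -> "AbpCredentials":
        """Parse hex strings and check field lengths."""
        creds = cls(bytes.fromhex(dev_addr),
                    bytes.fromhex(nwk_s_key),
                    bytes.fromhex(app_s_key))
        if len(creds.dev_addr) != 4:
            raise ValueError("DevAddr must be 4 bytes")
        if len(creds.nwk_s_key) != 16 or len(creds.app_s_key) != 16:
            raise ValueError("session keys must be 16 bytes each")
        return creds
```

Since ABP keys stay fixed for the lifetime of a session, such a check is best run once in the commissioning tool rather than on the module itself.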
Higher-level software consists of the LoRaWAN network server that receives measured data, LoRaWAN application server with a database that collects and stores those data, and a client software that may provide long-term pond modeling, data visualization, trend analysis, prognosis, and corrective measures.
In order to prevent data leakage to the Internet and unauthorized access to any devices that can change the pond condition (e.g., inlet and discharge), use of a local network is planned. Therefore, both LoRaWAN servers and the database should run on dedicated server hardware, which has to be operational and available around the clock.
As a LoRaWAN network server, a number of software packages may be used. Some of those include:
Multitech Network Server [45], and many others.
Most of those servers store data in a cloud; some of them operate in public networks with many restrictions and require payment for commercial usage. Of the servers listed above, ChirpStack is the only free and open-source LoRaWAN network server.
ChirpStack targets Linux, but using Docker and Docker Compose it can also be run under Windows and macOS. It provides both the network server and the application server and can use SQLite or PostgreSQL as a database. Both installers and ready-to-use Docker images are available on the download page of ChirpStack’s official site. In order to connect the LoRaWAN network server to external clients, HTTP, the WebSocket protocol, or MQTT may be used; in real-life scenarios, secure versions of those protocols should be used.
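A client that receives uplink events as JSON (for example, over MQTT) has to decode the application payload. The sketch below assumes the event carries a base64-encoded `data` field and a `deviceInfo.devEui` field, as in ChirpStack v4 JSON events; the 6-byte payload layout (DO, pH, and temperature as big-endian integers scaled by 100) is purely hypothetical and would have to match the actual firmware.

```python
import base64
import json
import struct
from typing import Dict, Union


def decode_uplink(event_json: str) -> Dict[str, Union[str, float]]:
    """Decode one uplink event into physical values.

    Assumed payload layout (hypothetical): uint16 DO*100,
    uint16 pH*100, int16 temperature*100, all big-endian.
    """
    event = json.loads(event_json)
    raw = base64.b64decode(event["data"])
    do_raw, ph_raw, temp_raw = struct.unpack(">HHh", raw)
    return {
        "dev_eui": event["deviceInfo"]["devEui"],
        "do_mg_l": do_raw / 100.0,
        "ph": ph_raw / 100.0,
        "temp_c": temp_raw / 100.0,
    }
```

Sending scaled integers instead of floating point values keeps the uplink short, which matters for LoRaWAN airtime limits.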
Client software of the monitoring system has to provide to its end-users the following features:
trend analysis of measured water parameters;
analysis and long-term prediction of hydrochemical regime using the model of the pond;
prognosis of fish kill based on that analysis;
and, optionally, initiation of corrective actions, including aeration and water exchange in the pond.
The functions mentioned above are easy to implement on any personal computer running Windows, macOS, or Linux. Mobile software, based on Android or iOS, may also provide real-time indication of water parameters and trend analysis. Any automated corrective actions (if they are possible) should be initiated by the monitoring system and acknowledged by the user (fish farm specialist or pond owner). Those actions may include water aeration, mixing water or discharge of lower layers of water in order to break stratification, etc. One should note that if the monitoring system can initiate any corrective measures, it is preferable that any actuator mechanisms which control aerators, water inlets, water discharges, and other equipment be physically inaccessible from external networks (especially from the Internet).
Finally, all results of water parameter prediction that indicate possible problems (including fish kill) and all corrective actions taken should be stored in an event log placed on the same server hardware that hosts the LoRaWAN servers. To that end, the same database server that is used to store measured data may be used.
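A minimal sketch of such an event log, using SQLite through the Python standard library, is shown below. The table layout, column names, and function names are assumptions made for illustration; a production system would more likely reuse the schema and server of the main measurement database.

```python
import sqlite3
from datetime import datetime, timezone


def open_event_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the event log database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS event_log (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               ts_utc    TEXT NOT NULL,
               pond      TEXT NOT NULL,
               parameter TEXT NOT NULL,
               event     TEXT NOT NULL,
               details   TEXT
           )"""
    )
    return conn


def log_event(conn: sqlite3.Connection, pond: str, parameter: str,
              event: str, details: str = "") -> int:
    """Append one prediction or corrective-action record; return its id."""
    ts = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO event_log (ts_utc, pond, parameter, event, details) "
        "VALUES (?, ?, ?, ?, ?)",
        (ts, pond, parameter, event, details),
    )
    conn.commit()
    return cur.lastrowid
```

Storing both predictions and the actions taken in one table makes it straightforward to review afterwards whether a corrective measure followed each warning.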