IoT Monitoring and Prediction Modeling of Honeybee Activity with Alarm

Abstract: A significant number of recent scientific papers have raised awareness of changes in the biological world of bees, problems with their extinction, and, as a consequence, their impact on humans and the environment. This work relies on precision beekeeping in apiculture and raises the scale of measurement and prediction results using the system we developed, which was designed to cover the beehive ecosystem. It is equipped with an IoT modular base station that collects a wide range of parameters from sensors on the hive and a bee counter at the hive entrance. Data are sent to the cloud for storage, analysis, and alarm generation. A time-series forecasting model capable of estimating the volume of bee exits and entrances per hour, which simulates dependence between environmental conditions and bee activity, was devised. The applied mathematical models based on recurrent neural networks exhibited high accuracy. A web application for monitoring and prediction displays parameters, measured values, and predictive and analytical alarms in real time. The predictive component utilizes artificial intelligence by applying advanced analytical methods to find correlations between sensor data and the behavioral patterns of bees, and to raise alarms should it detect deviations. The analytical component raises an alarm when it detects measured values that lie outside of the predetermined safety limits. Comparisons of the experimental data with the model showed that our model represents the observed processes well.


Introduction
Although humanity is constantly advancing technologically, this development influences the environment, inevitably changing it both intentionally and unintentionally. Nature runs its course, and our influence disturbs the normal natural processes, changing the balance of natural perfection. Modern approaches in agriculture, the application of pesticides, herbicides, other chemical agents, and artificial pollinators, the flowering of nature in periods, and untimely conditions have changed the ecosystem of nature itself and bee societies, which is the topic of our research. In addition to the aforementioned, diseases of bee colonies, Varroa destructor infections, the effects of pesticides and herbicides, lack of food in hives, the loss of the queen and significant losses caused by unusual changes in the environment, meteorological conditions, and the winter season also contribute to beekeeping problems. The challenge was to design and manufacture an improved monitoring and data analysis system that would process data with advanced data analysis techniques on the basis of experience gained from previous studies.
We cannot influence events in the environment of bees occurring in nature, but we can monitor, measure, and collect data. With the methodological application of software:
• a system for bee movement monitoring was constructed and installed, on the basis of which we could correlate independent and dependent indicators;
• a large set of sensors for monitoring conditions inside and outside the hive was installed, which collects a wide array of real-time parameters;
• a microcontroller-based IoT device was designed and constructed, which aggregates sensor readings and uploads data to the cloud;
• an AI-based computational module was created and deployed to the cloud backend, which enables real-time analytical and predictive assessment of data uploaded from the IoT device;
• a web frontend app was designed and created, which enables insight into real-time data from sensors at the hive and results from the AI module, namely, analytical and predictive warnings and alarms.
All listed components work as an integrated system that gives beekeepers and biologists insight into the wellbeing of bees, and allows for the monitoring of their behavioral patterns. Data are observed and analyzed depending on meteorological conditions, time of day, season, etc. Future work includes the possibility for taking actions such as hive entrance shutdown, ventilation, suggestions for hive relocation, and engagement of the automatic feeder. All components of the proposed system are described in detail here. Section 2 lists relevant references on the topic of beekeeping that discuss various parameters impacting the behavior and wellbeing of bees. Section 3 describes the hardware of the constructed IoT microcontroller station and connected sensors. Section 4 contains the description of the web application for the real-time data monitoring and display of warnings. Section 5 describes the relevant data that were collected from the hive for the construction of the AI model. Section 6 contains the description of the applied predictive models. Section 7 contains the experimental results, and Section 8 is the conclusion.

Related Work
In recent years, beekeepers have encountered problems with mass bee deaths [1,2] and bee migrations due to climate change, as well as the impact of weather conditions on flowering disorders in periods when bees should collect pollen. The use of various pesticides and herbicides in plant protection is a further problem: in periods of disturbed climatic conditions, high temperatures and humidity cause bees to be active in the later hours, when spraying is performed [3]. This study presents an advanced solution that could be applied for the intelligent monitoring of events around the bee habitat. It encompasses constant monitoring inside and outside a hive, real-time application, and artificial intelligence that includes in the analysis a large number of dependent and independent factors influencing the life of bees. Previous works [4][5][6][7][8] provide an excellent introduction to the issue and indicate a wide range of approaches, proposals, and analyses of various data, and the necessity and importance of including the influence of many factors on the movement and life of bees. The aim of most works was to fully understand the movement, work, and life of bees living in the hive, so that the beekeeper who takes care of them can raise, nurture, and monitor them with constant insight into their condition. In that way, the beekeeper could quickly react to changes through alarms triggered by intelligent monitoring when sudden changes, deviations, or potential problems predicted by the solution from this paper occur.
Precision beekeeping [9][10][11][12][13][14] is a term that has appeared in recent years referring to the development of online tools for the continuous monitoring and control of bee behavior using an individual approach to society, avoiding exposing bees to additional stress and unproductive activities. As monitoring each bee colony requires expensive resources and is complex, precision beekeeping offers a solution in the form of monitoring individual bee colonies and their immediate environment.
The mentioned works include important factors indicating their individual value, such as temperature and humidity, and their influence on swarming or feeding [15][16][17][18][19], vibration and sound [20][21][22][23][24][25][26], the presence of gases [27,28], rain and wind [29], the amount and intensity of daylight, and UV and IR radiation indices; this paper covers all these factors together. Recording and data collection in the mentioned works were performed as hourly or daily time series. The choice of hardware solutions that affect the accuracy of data in previous analyses and approaches [15,27,30,31] differs from the approach in this paper, where we relied on advanced methods and data analysis. It is very important that the analysis includes all dependent and independent influencing factors due to the complexity of the obtained results and the different methods of inference.
Electronics 2022, 11, 783
Regarding the application of artificial intelligence, the authors in [7] used a decision tree algorithm to classify the state of the hive. In order to maximize the identification of crucial colony activities, including healthy and unhealthy conditions, ten hive status classes were selected for this multi-class classification task. Our AI approach differs from the mentioned one, because we try to solve the regression problem and to draw a conclusion about the status of the hive based on the activity of the bees, i.e., whether the conditions in the hive are healthy or unhealthy. A similar approach was presented in [59,60], where deep neural networks were used to classify bee swarm activity from audio signals.
The monitored parameters are on a broad spectrum to indicate even the slightest significance of any time element, or any deviation or disturbance in relation to the natural environment in which bees normally function.
Parameters inside and outside the hive were monitored in very precise sequences of temperature, humidity, air quality (presence of various gases, smoke, carbon monoxide, etc.), noise, presence of different frequencies of sounds, shocks, vibrations, UV factors, IR factors, intensity and variations of daylight, wind intensity, all in correlation with the frequency (entrance and exits) of bees. Furthermore, one of the goals of the research was to use this system to indicate the range of influences of different factors and parameters, their intensity, and the mutual correlation of factors.

System Overview
The system consists of several hardware and software components (Figure 1). The main IoT unit, located at the hive, collects data from multiple sensors in and around the hive, and from a bee counting circuit located at the hive entrance. The main unit is based on Arduino Mega 2560 and ESP32 microcontroller boards. Data from the sensors and the bee counting circuit are timestamped and transmitted to the cloud database via a cellular modem. In order to prevent data loss, they are also saved on a local memory card.
A web application connects to the cloud database to enable the display of real-time and historical data. A decision-making system (DMS) also runs on the server, performing real-time data analysis and parameter prediction. This component can detect deviations from nominal parameters and accordingly generate alarms.
There are numerous solutions and tools for monitoring the movement of bees inside and outside the hive based on semiconductors, optical sensors, and photoresistors, for example [61,62], Arnia [63], Beecheck [64], the bee counter [65], and the honeybee counter [66]. Some solutions have exhibited problems due to a chosen approach to counting. The bee counting circuit presented in this work is based on a set of two photoreflecting resistors per gate, where both resistors must be triggered to detect one pass. Depending on the order of activation, the direction of movement in or out of the hive is determined. In order to avoid congestion, the circuit contains 24 gates, enabling bees to simultaneously enter and exit through all of them.
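The direction logic of a single gate can be illustrated with a short sketch. It is not the firmware used in this work; the sensor labels and the convention that the outer sensor triggering first means an entry are assumptions made only for illustration.

```python
# Hypothetical labels for the two photoreflective sensors of one gate.
OUTER = "outer"  # sensor closer to the outside of the hive (assumed)
INNER = "inner"  # sensor closer to the inside of the hive (assumed)


def count_passes(events):
    """Count entries and exits from an ordered stream of sensor triggers.

    A pass is counted only when both sensors of the gate are triggered;
    the order of activation gives the direction (outer then inner = entry).
    """
    entries = 0
    exits = 0
    pending = None  # first sensor of a pass that is not yet completed
    for sensor in events:
        if pending is None:
            pending = sensor
        elif sensor != pending:
            # Second, different sensor triggered: the pass is complete.
            if pending == OUTER:
                entries += 1
            else:
                exits += 1
            pending = None
        # Same sensor triggered again: keep waiting, no pass is counted.
    return entries, exits


print(count_passes([OUTER, INNER, INNER, OUTER]))
```

With 24 gates, one such counter per gate would run independently and the per-gate totals would be summed.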
The precise measurements of bee movements are the basis for reaching conclusions about the condition of hives, and they are related to every action. However, to obtain more precise movement results, bee movement data must be tied to dependent and independent variables inside and outside the hive.
An active beehive with the IoT main unit and sensors used for data collection in this research is shown in Figure 2.

Main Unit Architecture
As seen in Figure 3, the scheme of the data collection system consisted of a microprocessor-controlled IoT base station.

The base station was connected to sensors for measuring parameters and data. The sensor sets in charge of the conditions in the hive were specially arranged in several levels following the structure of the frames and floor in the hive. Sensors for measuring parameters outside the hive were placed in the outer part of the system, but they are protected from direct meteorological influences that could lead to measurement errors. The bee counting sensor array is located at the entrance to the hive, where there are gates with photoresistors for the passage of bees to detect their movement. The entire system is controlled by the main unit microprocessor, which communicates with microcontrollers and initiates the collection of data that are forwarded via GPRS to the cloud system and web database. The data are also written into local storage.
The system consists of sensors and microelectronic components, preferably designed to avoid interruptions in operation, since it involves a large number of sensors and auxiliary modules operating at different voltage levels. The system was designed with low power consumption in mind, enabling a self-sustainable operation via solar power and a battery. Figure 4 shows the schematic of the hardware and the main unit of the IoT base station, where the central component is an Arduino Mega microcontroller, expanded with an extension module to accommodate all necessary electrical connections. Most sensors were attached over the industry-standard SPI and I2C buses available on the Arduino Mega.
The system was installed in such a way to avoid disturbing the bee ecosystem and prevent the impact of direct exposure to the weather in order to avoid measurement errors. For example, individual sensors that are exposed to direct sunlight are protected by clear glass without UV stabilizers to avoid measurement errors. Sensors for the detection of gases, frequencies, and noise were placed in such a way that they could record without interference and without being affected by weather conditions or direct sunlight.
The bee counting sensor array consists of devices for detecting the frequency of the movement of bees at the entrance to the hive in several corridors in order to smoothly monitor the movement of entering and leaving the hive. These are photoreflective resistors in which reflection is interrupted during movement; thus, the direction of movement is detected. The ESP32 microcontroller board controls the operation of these sensors.
Sensors inside the hive were positioned in such a way that they could function without the danger of being obstructed by bee wax, as bees wax any unknown elements inside the hive to protect the colony.
Collected data from the hive represent the microclimate of the living environment of bees and are valuable because they allow for the differences in measurements with values obtained outside the hive to be observed.
In the background of the main unit of the IoT system that collects data from the bee counting array and the measurement system with sensors, there is a trained algorithm in charge of eliminating errors in measurements if they occur.

WebAPP for MAP
A web application was developed that shows current sensor readings from the hive (Figure 5). Measured values are stored in a web database in real time. The web interface contains indicators for predictive and analytical alarms. The predictive alarm is activated by the prediction algorithm on the basis of bee movement (more details in the following section).
The movement of bees is most often caused by feeding, meteorological factors, daily activities in relation to the same factors, and human activity. In this way, we formulated directly and indirectly dependent factors, and their interdependence.
Cells that display values in the application are dynamic and change colors in relation to the displayed values. The analytical alarm is triggered when the measurement value approaches the critical value, for example, in the case of high temperature and high humidity. When the temperature value inside the hive exceeds 35 °C [68] or the relative humidity is nearly 90%, the alarm is triggered. A push notification is sent informing about changes in the hive.
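The analytical alarm rule can be sketched as a simple threshold check. The two limits come from the text; the function name and the margin that turns "nearly 90%" into a number are hypothetical and chosen only for illustration.

```python
TEMP_LIMIT_C = 35.0        # inside-hive temperature limit from the text
HUMIDITY_LIMIT_PCT = 90.0  # relative humidity limit from the text


def analytical_alarms(inside_temp_c, inside_rh_pct, humidity_margin_pct=2.0):
    """Return the names of the limits that the current readings violate.

    humidity_margin_pct is an assumed margin: humidity raises the alarm
    once it comes within this many percentage points of the limit.
    """
    alarms = []
    if inside_temp_c > TEMP_LIMIT_C:
        alarms.append("temperature")
    if inside_rh_pct >= HUMIDITY_LIMIT_PCT - humidity_margin_pct:
        alarms.append("humidity")
    return alarms


print(analytical_alarms(36.2, 71.0))
```

A non-empty result would correspond to the situation in which a push notification is sent.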
Before the predictive modeling (AI) module is described, an overview of the collected dataset from previous steps is provided. All collected variables, and which sensor is responsible for collecting which data are described. Additionally, required data cleaning and variable transformations are described.

Dataset Description
Data were collected during the fall months; for the analysis in this paper, 20 successive days in October 2021 were used. Measurements were performed in 5 min intervals. Taking into consideration the small number of exits from the hive and the small changes in weather conditions within a period of 5 min, especially in the observed period, the measurements were consolidated into hourly values (24 per day). Final input features were obtained as the average value of all related values that belonged to the observed hour. Output values were obtained as the sum of all exits and entrances to the hive in that hour. Table 1 shows the structure of the dataset with all used variables and their descriptions. The entire dataset and source code are publicly available at https://gitlab.com/mali_banekg/beeactivityforecast (accessed on 15 January 2022). The BEECNT_message OUT variable represents the number of bees that came out of the hive, while BEECNT_message IN represents the number of bees that entered the hive. These two variables were used as output in our models. In this way, we connected dependent and independent indicators of bee movement.
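The hourly consolidation can be sketched as follows: input features are averaged and bee counts are summed within each hour. This is a minimal illustration rather than the published code; the row layout and the key names (`timestamp`, `features`, `bee_in`, `bee_out`) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_hourly(rows):
    """Consolidate 5 min measurements into hourly records.

    rows: iterable of dicts with 'timestamp' (datetime), 'features'
    (dict of sensor name -> float), 'bee_in' and 'bee_out' (int counts).
    """
    buckets = defaultdict(list)
    for row in rows:
        # Truncate the timestamp to the start of its hour.
        hour = row["timestamp"].replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(row)

    hourly = []
    for hour in sorted(buckets):
        group = buckets[hour]
        names = group[0]["features"].keys()
        hourly.append({
            "hour": hour,
            # Input features: average over the hour.
            "features": {n: mean(r["features"][n] for r in group) for n in names},
            # Output variables: total entrances and exits in the hour.
            "bee_in": sum(r["bee_in"] for r in group),
            "bee_out": sum(r["bee_out"] for r in group),
        })
    return hourly


sample = [
    {"timestamp": datetime(2021, 10, 1, 10, 0), "features": {"temp": 20.0}, "bee_in": 1, "bee_out": 2},
    {"timestamp": datetime(2021, 10, 1, 10, 5), "features": {"temp": 22.0}, "bee_in": 2, "bee_out": 0},
]
print(aggregate_hourly(sample))
```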
Counting the bees' entrances to and exits from the hive, and measuring the environmental conditions inside and outside of the hive are important for alarm initialization, and complement the results of other parameters that indicate the frequencies of movement of bees obtained from the sensory measurements of the immediate environment. Without measuring all the above factors, especially weather conditions, the number of bees in and out would not be of greater significance and would only be a statistical detail. All these variables collected from the environment to which the bees belong could be used for a model development that can very precisely predict bee movements.
Some of the used variables and output variable Bee_IN (y) are shown in Figure 6. On the basis of the time-series shape, it is obvious that some of these variables were important for our model, such as lux or outside humidity (when the humidity value is high, the bees do not leave the hive).
During the feature engineering phase, we took advantage of the fact that there are periods during the day (24 h) when bees are not active, and a new binary variable was created that represents the part of the day: daylight (5 a.m. to 6 p.m.) or night (other hours). If any activity was detected during the night period, we replaced it with the value 0. This could usually happen around the border hours, when few exits or entries are detected.
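The day and night rule can be sketched as follows. The hour boundaries are taken from the text; treating 6 p.m. as an exclusive upper bound and the key names (`hour_of_day`, `is_daylight`) are assumptions made for illustration.

```python
DAY_START_HOUR = 5   # 5 a.m., from the text
DAY_END_HOUR = 18    # 6 p.m., from the text; treated as exclusive (assumption)


def apply_daylight_rule(hourly_rows):
    """Add a binary 'is_daylight' flag and zero out night-time counts.

    hourly_rows: list of dicts with 'hour_of_day' (0-23), 'bee_in', 'bee_out'.
    Returns new dicts; the input rows are not modified.
    """
    result = []
    for row in hourly_rows:
        is_daylight = DAY_START_HOUR <= row["hour_of_day"] < DAY_END_HOUR
        cleaned = dict(row, is_daylight=int(is_daylight))
        if not is_daylight:
            # Activity recorded at night is replaced with 0.
            cleaned["bee_in"] = 0
            cleaned["bee_out"] = 0
        result.append(cleaned)
    return result


print(apply_daylight_rule([{"hour_of_day": 4, "bee_in": 3, "bee_out": 2}]))
```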
Output columns BEE_IN and BEE_OUT were transformed using the square root transformation (a so-called power transformation) because, in time-series analysis, this transformation is often used to stabilize the variance of a series. Logarithmic transformation was skipped because some of the values were equal to 0. Figure 7 shows the original time series of the bee exits (red) and the time series after the square root transformation is applied (blue). It is obvious that the number of outings on certain days was drastically reduced. This can be explained by the fact that weather conditions were probably worse on those days, for example, it was raining or it was windy.
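A minimal sketch of the square root transformation and its inverse is given below. The function names are hypothetical, and clipping negative model outputs to zero before squaring is an assumption, not a step described in the paper.

```python
import math


def sqrt_transform(counts):
    """Apply the square root transformation to non-negative counts."""
    return [math.sqrt(c) for c in counts]


def inverse_sqrt_transform(values):
    """Map transformed values (for example, forecasts) back to counts.

    Negative inputs are clipped to 0 first (an assumption), since a
    count cannot be negative.
    """
    return [max(v, 0.0) ** 2 for v in values]


print(sqrt_transform([0, 4, 9]))
```

Unlike the logarithm, the square root is defined at 0, which is why hours without any bee activity pose no problem.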

Methodology
Time series data is a collection of observations obtained through repeated measurements over time, as is the case here. Unlike regression predictive modeling, time series also add the complexity of a sequence dependence among the input variables. Recurrent neural networks (RNNs) are a powerful type of neural network designed to handle sequence dependence. The principal advantage of an RNN over a feed-forward artificial neural network (ANN) is that an RNN can model a sequence of records (i.e., a time series) so that each pattern can be assumed to be dependent on previous ones. On the other hand, comparisons against ETS (error, trend, seasonal) and ARIMA demonstrate that (semi-)automatic RNN models are not silver bullets, but they are nevertheless competitive alternatives in many situations [69].
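To make the notion of sequence dependence concrete, a series is commonly framed as pairs of a look-back window and the value that follows it before it is given to a recurrent model. The sketch below shows this framing only; the function name and the window length are hypothetical and are not taken from the paper.

```python
def make_windows(series, lookback):
    """Frame a series as (window of past values, next value) pairs."""
    inputs = []
    targets = []
    for start in range(len(series) - lookback):
        inputs.append(list(series[start:start + lookback]))
        targets.append(series[start + lookback])
    return inputs, targets


windows, next_values = make_windows([1, 2, 3, 4, 5], lookback=2)
print(windows, next_values)
```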
In this paper, we tested the above-mentioned approaches, which are among the most common and most promising methods in time-series forecasting, in order to predict bee exits from and entries to the hive. This information may be important during periods when fruits and vegetables are sprayed, so that we can close beehives when high activity is expected. First, we started with the traditional ARIMA approach [70]. ARIMA is an acronym that stands for autoregressive integrated moving average, which is a generalization of the simpler autoregressive moving average that adds the notion of integration. After that, we tested two more advanced approaches, Facebook Prophet [71] and recurrent neural networks (LSTM) [72]. In the following subsections, a short description of these techniques is given. For more details, we refer readers to the original papers.

ARIMA
An ARIMA model is a class of statistical models for analyzing and forecasting time-series data. The model is fitted to time-series data either to better understand the data or to predict future points in the series (forecasting). The acronym reflects the key aspects of the model itself:
• AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
• I: Integrated. The use of differencing of raw observations (i.e., subtracting the observation at the previous time step from the current observation) in order to make the time series stationary.
• MA: Moving average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
The standard notation is ARIMA(p, d, q), where the parameters p, d, and q are non-negative integers denoting the lag order (number of lag observations included in the model), the degree of differencing (number of times that the raw observations are differenced), and the order of the moving average (size of the moving average window), respectively. ARIMA assumes that the series is stationary once it has been differenced d times. A stationary time series is one whose properties do not depend on the time at which the series is observed. One way to determine more objectively whether differencing is required is to use a unit root test, a statistical hypothesis test of stationarity designed for this purpose. Here, the Dickey-Fuller test was used; the results for the output variables BEE_OUT and BEE_IN are presented in Table 2. We could overwhelmingly reject the null hypothesis of a unit root at all common significance levels. In other words, the observed time series were stationary.
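The differencing step and the unit root test described above can be illustrated with a minimal sketch. It assumes the simplest (non-augmented) Dickey-Fuller regression with a constant and uses synthetic data, not the BEE_OUT or BEE_IN series; the function names are hypothetical, and the paper does not state which implementation was used.

```python
import math
import random


def difference(series, d=1):
    """Apply first-order differencing d times (the "I" step of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def dickey_fuller_stat(series):
    """t-statistic of gamma in: delta_y[t] = alpha + gamma * y[t-1] + e[t]."""
    x = series[:-1]              # lagged levels y[t-1]
    y = difference(series, 1)    # first differences delta_y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gamma = sxy / sxx            # ordinary least squares slope
    alpha = mean_y - gamma * mean_x
    ssr = sum((yi - alpha - gamma * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stationary by construction
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)                             # random walk: has a unit root

CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, constant-only case

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
stat_walk_diff = dickey_fuller_stat(difference(walk))

for name, stat in (("noise", stat_noise), ("walk", stat_walk),
                   ("differenced walk", stat_walk_diff)):
    print(name, round(stat, 2), "reject unit root:", stat < CRITICAL_5PCT)
```

A statistic far below the critical value, as for the noise series and the differenced random walk, rejects the unit root hypothesis, which corresponds to choosing d = 0 for a series that is already stationary.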

Facebook Prophet
While ARIMA is an autoregressive forecasting method that fits a linear regression on lagged values and error terms, Facebook Prophet is a procedure for forecasting time-series data on the basis of an additive model in which nonlinear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. It is based on generalized additive models (GAMs), which provide a general framework for extending a standard linear model by allowing for nonlinear functions of each of the variables while maintaining additivity. Just like linear models, GAMs can be applied with both quantitative and qualitative responses.
In this model, three main components were used: trend, seasonality, and holidays. They were combined in the following equation:
$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$
where g(t) is the trend function that models nonperiodic changes in the value of the time series, s(t) represents periodic changes (e.g., weekly and yearly seasonality), and h(t) represents the effects of holidays that occur on potentially irregular schedules over one or more days. Error term $\varepsilon_t$ represents any idiosyncratic changes not accommodated by the model. The detected components for the entire BEE_OUT time series (trend, daily behaviour, and the influence of the added regressors) are shown in Figure 8. Similar plots were obtained for the entire BEE_IN time series (Figure 9). Facebook Prophet is very popular in time-series forecasting because it is robust to outliers, missing data, and dramatic changes in time series, whereas ARIMA is sensitive to noise and nonstationary signals. Outliers and missing data are certain to occur in such use cases, bearing in mind that equipment may sometimes break down.
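To make the additive structure concrete, the sketch below evaluates a trend, a daily seasonal term, and a holiday-style effect separately and sums them. All coefficients are hypothetical, the linear trend is a simplification of the piecewise trends Prophet fits, and the error term is omitted; this is not the fitted model from the paper.

```python
import math


def trend(t_hours, base=200.0, slope=0.5):
    """g(t): a plain linear trend (hypothetical coefficients)."""
    return base + slope * t_hours


def daily_seasonality(t_hours, a=80.0, b=-30.0):
    """s(t): first-order Fourier term with a 24-hour period."""
    angle = 2.0 * math.pi * t_hours / 24.0
    return a * math.sin(angle) + b * math.cos(angle)


def holiday_effect(t_hours, special_hours, effect=-50.0):
    """h(t): constant shift applied only at the listed hours."""
    return effect if t_hours in special_hours else 0.0


def additive_forecast(t_hours, special_hours=frozenset()):
    """y(t) without the error term: g(t) + s(t) + h(t)."""
    return (trend(t_hours)
            + daily_seasonality(t_hours)
            + holiday_effect(t_hours, special_hours))


# Hourly forecast for two days, with hour 30 marked as a special event.
forecast = [additive_forecast(t, special_hours=frozenset({30}))
            for t in range(48)]
print(forecast[0], forecast[30])
```

Because the components are simply added, each one can be inspected on its own, which is what the component plots in Figures 8 and 9 rely on.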
Here, we explore the problem of flexibly predicting Y on the basis of several predictors, $X_1, \ldots, X_p$. Possible input variables were carefully selected from Table 1. More information on the selected features is provided in the results section.

LSTM Model
A recurrent neural network (RNN) is a class of artificial neural networks in which connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. RNNs are distinguished by their memory, as they take information from prior inputs to influence the current input and output. While traditional deep neural networks assume that inputs and outputs are independent of each other, the output of a recurrent neural network depends on prior elements within the sequence. While future events would also be helpful in determining the output of a given sequence, unidirectional recurrent neural networks cannot account for these events in their predictions.
There are three types of recurrent neural network: simple (RNN), gated recurrent unit (GRU), and long short-term memory unit (LSTM). The differences among them are shown in Figure 10, but we omit the details because they are outside the scope of this paper. Long short-term memory (LSTM) networks were invented by Hochreiter and Schmidhuber in 1997 [72], and they have set accuracy records in multiple application domains. Here, LSTM cells were used for the time-series modeling.
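Although the internal details are outside the scope of the paper, a single LSTM step is short enough to sketch. The code below follows the standard LSTM cell equations (forget, input, and output gates plus a candidate state) on plain Python lists. The parameter layout and the all-zero demo weights are assumptions made for illustration; they are not the trained network from the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Return weights @ vector + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x  # concatenate previous hidden state and current input
    f = [sigmoid(v) for v in affine(params["Wf"], params["bf"], z)]    # forget
    i = [sigmoid(v) for v in affine(params["Wi"], params["bi"], z)]    # input
    o = [sigmoid(v) for v in affine(params["Wo"], params["bo"], z)]    # output
    g = [math.tanh(v) for v in affine(params["Wg"], params["bg"], z)]  # candidate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def zero_params(hidden_size, input_size):
    """All-zero weights: a hypothetical, untrained cell used only for the demo."""
    width = hidden_size + input_size
    params = {}
    for gate in ("f", "i", "o", "g"):
        params["W" + gate] = [[0.0] * width for _ in range(hidden_size)]
        params["b" + gate] = [0.0] * hidden_size
    return params


# Carry the state across a short sequence of two-feature inputs.
params = zero_params(hidden_size=1, input_size=2)
h, c = [0.0], [2.0]
for x in ([1.0, 3.0], [0.5, 2.0]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state c is what gives the network its memory: the forget gate scales the previous state, and the input gate decides how much of the new candidate is written into it.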

Experimental Setup and Evaluation
In order to test the robustness of the models, a time-series cross-validator was used in the experiments. The TimeSeriesSplit class from the scikit-learn library provides a simple interface for splitting time-series samples that are observed at fixed time intervals into training and test sets. In each split, the test indices must be higher than in the previous one; thus, shuffling in the cross-validator is inappropriate. In other words, this cross-validation object is a variation of KFold in which the k-th split returns the first k folds as the training set and the (k+1)-th fold as the test set.
a. ARIMA: In our experiments, different values of the p, d, and q parameters were tested, and the ARIMA model with the smallest RMSE was selected for further testing. For p, the values 0, 1, 2, 4, 6, 8, and 10 were tested, while d and q were tested for values ranging from 0 to 3. The combination of parameters that showed the best performance for the BEE_OUT output was (p, d, q) = (10, 0, 2); for BEE_IN, the combination (0, 0, 2) was selected.
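The splitting scheme and the ARIMA parameter grid described above can be sketched without any external library. The splitter below mimics the expanding-window behaviour of TimeSeriesSplit but is not the scikit-learn source, and `select_order` with its toy scoring function is a hypothetical stand-in for fitting each model and averaging its RMSE over the folds.

```python
from itertools import product


def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs in chronological order."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        # Training data is everything before the test block: no shuffling.
        yield list(range(start)), list(range(start, start + test_size))


# Parameter grid from the text: p in {0, 1, 2, 4, 6, 8, 10}; d and q in 0..3.
P_VALUES = (0, 1, 2, 4, 6, 8, 10)
D_VALUES = range(4)
Q_VALUES = range(4)
ORDER_GRID = list(product(P_VALUES, D_VALUES, Q_VALUES))


def select_order(order_grid, score):
    """Return the (p, d, q) order with the smallest score."""
    return min(order_grid, key=score)


def toy_score(order):
    """Hypothetical score that stands in for the mean RMSE over the folds."""
    p, d, q = order
    return abs(p - 10) + d + abs(q - 2)


splits = list(expanding_window_splits(n_samples=12, n_splits=3))
for train, test in splits:
    print("train up to", train[-1], "test", test)
print(len(ORDER_GRID), "candidate orders; selected:",
      select_order(ORDER_GRID, toy_score))
```

Each test block lies strictly after its training block, so a model is never evaluated on observations that precede the data it was fitted on.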

b. Facebook Prophet: The parameters with the greatest influence on bee movements, which were used to produce the prediction model, were temperature and relative humidity inside and outside of the hive, the presence of rain, air quality, the range and intensity of daylight, UV radiation, and night and day shifts. The forecast for the entire BEE_OUT time series is shown in Figure 11. This figure is given only to show that Facebook Prophet can successfully learn from the observed time series. In the results, the complete time-series forecast and the presented metrics are based on previously unseen data (the test dataset).
c. Recurrent neural networks: The first step is to prepare the BEE dataset for the LSTM. This involves framing the dataset as a supervised-learning problem and normalizing the input variables. The same variables used by the Facebook Prophet algorithm were also used here. The supervised-learning problem is framed as predicting the bee exit or entrance at the current hour (t) given the bee exit or entrance measurement and the weather conditions at the prior time step. After this transformation step, there are ten input variables (input series) and one output variable (bee exit or entrance at the current hour). We defined the LSTM with 50 neurons in the first hidden layer and 1 neuron in the output layer for predicting bee activity. The input shape was one time step with 10 features. Mean absolute error (MAE) was used as the loss function, with the efficient Adam version of stochastic gradient descent as the optimizer. The model was fit for 50 training epochs with a batch size of 20. Lastly, we monitored both training and test loss during the training phase and plotted both at the end of the run. The resulting loss curves during the training and validation phases for the BEE_OUT and BEE_IN outputs are shown in Figures 12 and 13, respectively.
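The framing and normalization steps described above can be sketched as follows. The two-column toy rows (a bee count and one weather value) are invented for illustration, whereas the paper uses ten input variables; the function names are hypothetical, and min-max scaling is assumed here because the text does not name the normalization method.

```python
def min_max_scale(column):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def frame_one_step(rows, target_index):
    """Pair the features at hour t-1 with the target value at hour t."""
    inputs = rows[:-1]
    targets = [row[target_index] for row in rows[1:]]
    return inputs, targets


def to_lstm_shape(inputs):
    """Wrap each sample as one time step: (samples, 1, features)."""
    return [[row] for row in inputs]


# Toy hourly rows: [bee exits, outside temperature] (hypothetical values).
rows = [[10, 0.2], [30, 0.4], [20, 0.6]]
inputs, targets = frame_one_step(rows, target_index=0)
shaped = to_lstm_shape(inputs)
scaled_counts = min_max_scale([row[0] for row in rows])
print(inputs, targets, scaled_counts)
```

In a real pipeline, the scaling bounds would normally be computed on the training folds only and then reused for the test fold, so that no information from the test period leaks into training.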
Separate time-series forecasts on the test set for each fold are shown in Figure 14. Machine learning (ML) applied to time-series data, in this case recurrent neural networks, is an efficient and effective way to analyze the data, apply a forecasting algorithm, and derive an accurate forecast. All aggregated results are shown in Table 3. The best results were achieved by using recurrent neural networks, where the average RMSE on the test sets was 426.49 for the BEE_OUT time series; for the BEE_IN time series, the RMSE was 378.464. Table 4 shows the summarized optimal parameters for all investigated methods.
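For reference, the two error measures used in this section, and their averaging over cross-validation folds, can be written in a few lines. The fold values below are invented for illustration and are not the results reported in Table 3.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_over_folds(fold_scores):
    """Average a metric over the test sets of all folds."""
    return sum(fold_scores) / len(fold_scores)


# Hypothetical hourly counts for two folds: (actual, predicted).
folds = [
    ([120, 300, 80], [100, 310, 95]),
    ([50, 400, 220], [70, 380, 200]),
]
fold_rmse = [rmse(actual, predicted) for actual, predicted in folds]
print(mean_over_folds(fold_rmse))
```

RMSE penalizes large hourly misses more strongly than MAE, so the two measures can rank models differently when a few hours are predicted badly.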

Conclusions
Comparisons of the experimental data against the model showed that our model represents the observed processes well. This is indicated by the results shown in the figures. According to the obtained results, the best model could achieve reliable bee activity prediction, with an error of only 8.9 missed bees per hour for exits from the hive and 7.8 missed bees per hour for entrances into it. We expect higher errors per hour when measurements are taken in the spring and summer months, and we expect that additional feature engineering can help improve the model.
Apiculture presents complex problems pertinent to the life and wellbeing of bees. This paper presented a complete system for the monitoring and predictive analysis of honeybee activity that addresses such problems. Our aim was to improve existing solutions and create a fully developed system that would address some of their shortcomings.
The presented system is based on the application of IoT data collection and monitoring, machine-learning algorithms for beehive activity prediction, and remote control via IoT that enables undertaking certain corrective actions inside hives.
The increased number of sensors in the presented system is an important improvement over existing solutions. Each individual parameter influences bees in a different way and to a different degree; however, when observed together and simultaneously, the parameters provide more complete insight into the analysis of the results.
The application of advanced MAP enables the detection of sudden deviations and disruptions to the normal life of bees, and the prediction of potential disturbing changes. We showed that, by applying advanced algorithms, high-precision predictions on a daily basis are possible. In this way, by employing a real-time monitoring application and push notifications of potential changes, the beekeeper has real-time insight into the conditions of the hives, and can react adequately to prevent unwanted outcomes.
There are some limitations to our approach. For example, the testing phase was conducted on two beehives, and the main data were collected from one hive that was not moved during the experiment. The experiment was conducted during a period in which there was no food from flowers and bee activity was lower than in spring.
In future work, the system will be upgraded with appropriate weight sensors, oxygen/carbon dioxide sensors, thermal sensors, automatic bee-feeding, ventilation, and gate-closing systems, and connectivity with other applications and solutions.
In future papers, we will provide extensive research that includes analysis of the influence of microwaves and the presence of electronic components. It is also necessary to include time as a distinct factor when drawing conclusions: observations over a longer period support experience-based conclusions, since every change, measurement, or analysis needs some time to pass before its results become meaningful.