Correlation of Climatic Factors with the Weight of an Apis mellifera Beehive

: The bee Apis mellifera plays an important role in the balance of the ecosystem. New technologies are used for the evaluation of hives, and to determine the quality of the honey and the productivity of the hive. Climatic factors, management, ﬂowering, and other factors affect the weight of a hive. The objective of this research was to explain the interrelationship between climatic variables and the weight of an Apis mellifera beehive using a vector autoregressive (VAR) model. The adjustment of a VAR model was carried out with seven climatic variables, and hive weight and its lags, by adjusting an equation that represents the studied hive considering all interrelationships. It was proven that the VAR (1) model can effectively capture the interrelationship among variables. The impulse response function and the variance decomposition show that the variable that most inﬂuences the hive weight, during the initial period, is the minimum dew point, which represents 5.33% of the variance. Among the variables analyzed, the one that most impacted the hive weight, after 20 days, was the maximum temperature, representing 7.50% of the variance. This study proves that it is possible to apply econometric statistical models to bee data and to relate them to climatic data, contributing signiﬁcantly to the area of applied and bee statistics.


Introduction
Through pollination, bees play an important environmental and economic role in human society [1]. Among all species, Apis mellifera is the most important [2,3]. It is estimated that global food production having an annual value of between USD 235 billion and 577 billion relies on direct contributions from pollinators [4]. In Europe alone, the economic influence of bee pollination is valued at 14.6 billion EUR per year [5].
However, the increase in the mortality of different bee species has caused concern among researchers, making it an important topic [6][7][8]. The decrease in the number of bees can harm the reproduction of plants, fruits, and seeds, which may have an impact on ecosystem processes, such as carbon sequestration and soil water retention [2], and have related economic impacts.
The decline in the bee population is caused by a number of factors, reinforcing the need for understanding of the impact of stressors on various causes and interactions between these factors [9]. Several factors can lead to bee mortality, such as climate change, pesticides, viruses, mites, pathogens, and reduced food resources and nests [10][11][12][13].
The creation of new research tools is an important step in the prediction of the impact of the management and survival practices on beehives [14,15]. As a result of the development of new technologies known as the Internet of Things (IoT), researchers have begun to add sensors and scales to hives, allowing remote control of the swarms [16,17]. This technological revolution became known as "precision beekeeping", where data such as begun to add sensors and scales to hives, allowing remote control of the swarms [16,17]. This technological revolution became known as "precision beekeeping", where data such as weight, temperature, humidity, wind, and rainfall are monitored [18]. Weight and temperature sensors have been used in longitudinal field studies of commercial colonies to detect differences in colony growth and behavior at different locations [19].
One of the main interests in studying the variation in the weight in a hive is to identify the impact on it of factors such as climatic effects [20]. The hive undergoes variation in its weight due to harvest, storage, food consumption (honey and pollen), development of new offspring, internal variation in water consumption, relative humidity, and the swarming, disappearance, and death of bees [21,22]. Ref. [11] showed clear evidence of a link between climate variability and winter bee mortality. In addition to reflecting the health and well-being of the swarm, the beehive weight represents the productivity of the colony [23][24][25].
In most cases, the use of sensors and, in particular, scales, in hives is undertaken in experimental hives, especially for scientific studies. One possible reason for the lack of use of these systems on a large scale in apiaries is the high price of installing them in all hives [26]. Data collected from the hives are used for various statistical applications, and mainly for the evaluation of variables that interfere with the swarm's behavior [16].
VAR (vector autoregressive) modeling, widely used in the field of econometrics, has not yet been applied to the hives of the Apis mellifera bee. This finding is based on a survey of international journals, published in the last thirty years, in the Scopus and Web of Science databases, which have made a significant and innovative contribution to the bee sector. Using simultaneous equations, VAR modeling helps in understanding the correlation and autocorrelation of the series, i.e., in the investigation of the interrelationship between the variables [27]. The use of these vectors contributes to the analysis of the complex structure of the relationship between the time series [28].
The research problem is presented in the form of a question: are VAR models capable of helping to understand the interrelationship between climate variables and the weight of Apis mellifera beehives? In the context presented, the objective of this article is divided into scientific (cognitive) and utilitarian elements. As a cognitive objective, we intend to adjust a VAR model capable of explaining the interrelationships between climatic variables and the weight of Apis mellifera beehives. In utilitarian terms, it is expected that this methodology will help beekeepers in decision making, especially at the right time for honey collection.

Materials and Methods
The steps for the development of the study are presented in Figure 1. In Figure 1, under Data Definition, the climatic variables that affect the development of the hive were established. The climatic variables that affect the development of the In Figure 1, under Data Definition, the climatic variables that affect the development of the hive were established. The climatic variables that affect the development of the hive are mainly temperature [29,30], dew point [31,32], humidity [21,33], wind speed [34,35], and pluviometric precipitation [36,37]. In step 2, climatological and hive weight data were collected. The data represent daily averages from hive sensors and historical climatological data that affect their development. The historical climatological data were taken from the Weather Underground website (www.wunderground.com, accessed on 19 January 2022), for the city of Asheville, NC, USA, where the hive is located. The city of Asheville has a humid temperate climate with mountainous topography [38]. Asheville is located at 35.49 • N, 82.61 • W, with an altitude of 650 m (3.28 ft) [39]. The region where the city is located has mild winters, cool summers, and annual rainfall of 130-200 cm/year (4.26-6.56 ft/year) [40]. The forests that make up the mountains of Asheville are populated by species of beeches, oaks, and pines [41]. These plant species contribute to the production of honey and resin (propolis), including in the hive used in this study.
The hive weight data were taken from the Hivetool website (hivetool.net, accessed on 18 January 2022), from 17 July 2017 to 18 November 2018, totaling 490 days. In these data, it is not well specified whether the species is Apis mellifera ligustica or Apis mellifera carnica. Table 1 presents all the variables used in the research, including their unit of measurement, abbreviation, and origin. In the next step, data adequacy was performed. Initially, control charts of the mean and standard deviation were applied to the hive weight variable to identify outliers, removing them from the database if necessary. The application of control charts assumes that the data are under statistical control. The limits for accepting the data are calculated according to Equations (1)-(3) [42].
where UCL corresponds to the Upper Control Limit; CL is the Central Line, equivalent to the Midline; LCL is the Lower Control Limit; X is the mean; and S is standard deviation. This technique was applied only to the beehive weight, because the data came from sensors installed in the hives, subject to failures caused by the environment and signal sending.
In the case of missing data for the variables, the regression method was used to replace incomplete values, thus completing all periods in the database with justifiable values. The regression method used was the norm.predict function of the multivariate imputation by chained equations (Mice) library of the R software, which is a linear regression with predicted values, using the code: "imp < −mice (mydata, method = "norm.predict", m = 1)".
To establish a degree of relationship and prove the correlation between variables, Pearson's linear correlation coefficient was used. This coefficient is defined by the ratio between the covariance and the square root of the product of the variances of X and Y [43].
The next technique was the development of the vector autoregressive (VAR) model. Before starting the VAR technique, the stationarity conditions of the series were checked, with the application of unit root tests: Augmented Dickey-Fuller (ADF) [44], Phillips Perron (PP) [45], and Kwiatkowski Phillips Schmidt Shin (KPSS) [46]. If the result shows a non-stationary condition, differences must be applied in order to make it stationary [47].
Another prerequisite for the VAR is to verify the relationship between the study variables using the Granger causality test (1986) [48][49][50]. Granger's causality test proves if there is a relationship between the variables that will compose the model [51]. If any variable has no causality, it is possible to remove it from the statistical model without causing any loss of explanation.
After verifying Granger's causality, it is possible to estimate a vector autoregressive (VAR) model. This model was developed by Sims [52]), and makes it possible to analyze the relationship between exogenous or endogenous variables, without distinction. The VAR model can be considered an extension of the auto-regressive (AR) model, which enables the prediction of values from two or more historical series through models of simultaneous equations [53]. The general equation for the VAR model is shown in Equation (4).
where Y t is the vector of the analyzed variable at time t, and C i corresponds to the constants, ij represents the autoregressive coefficients, where i and j = 1, 2, 3, . . . , n and s = 1, 2, 3, . . . , p. y t refers to a n-dimensional time series vector, where y 1t , . . . , y nt , and t = 1, 2, 3, . . . , ε it represents white noise, for i = 1, 2, 3, . . . , n. Residues are considered white noise when they present the following characteristics: zero mean, uncorrelated, and constant variance.
In order to estimate the order of these models, certain criteria are used, such as the Akaike information criterion (AIC) [54], Bayesian information criterion (BIC) [55], and Hannan Quinn (HQ) information criterion [56]. The selected model is the one that obtains the lowest value for the BIC criterion due to the large amount of observations in this study; otherwise the indicated criterion would be the AIC [57].
The impulse response function analysis, step 5, allows identification of the behavior of the endogenous variable after the application of random disturbances in the exogenous variables, and enables the visualization of the time of these effects on the main variable [58]. In this research, the representation of the impulse response function is via a graph, which promotes the identification of the impacts on the variables and the dissipation time.
Another way to interpret the effects between model variables is through the variance decomposition [59]. This technique stipulates a relative impact contribution for each variable in the system, with the objective of stipulating the relative effects of each one on the main variable in terms of variance [60].

Results and Discussion
The control charts were introduced to the data from the hive weight, using Equations (1)- (3). No outliers were identified as needing to be removed.
Correlation analysis was undertaken between the variable of interest, beehive weight, and the other variables, with a significance level of 5%. The result appears in Table 2. The variables are arranged in order of the correlation, from highest to lowest. Table 2 shows that the variables MaWS and MiWS do not have a statistically significant correlation (p < 0.05) with the variable beehive weight and, therefore, were removed from the study. The next step in the analysis was to adjust the VAR model to determine the interrelationship between the variables. In Figure 2, the variables are presented on a level basis.   In Figure 2, a stationary pattern in most variables is demonstrated, although there are oscillations, which are the reflexes of climatic variations, during the period analyzed. The stationarity of the variables, verified through the ADF, KPSS, and PP unit root tests, is shown in Table 3. Table 3. Stationary tests of the series on a level basis. In Figure 2, a stationary pattern in most variables is demonstrated, although there are oscillations, which are the reflexes of climatic variations, during the period analyzed. The stationarity of the variables, verified through the ADF, KPSS, and PP unit root tests, is shown in Table 3.  Table 3 shows that all variables are stationary in level terms. The next step was to verify the relationship between the study variables using the Granger causality test, with a significance level of 5%, as shown in Table 4. The variables are arranged in order of the correlation, from highest to lowest. The symbol ↔ indicates bidirectional causality between the variables analyzed, whereas the symbol → represents unidirectional interaction between the variables presented.
In Table 4, all variables show a Granger causality relationship, obeying the significance of 5%, making it possible to estimate the VAR model. Table 5 demonstrates the order of the VAR model.  Considering the sample size with 490 observations, the BIC criterion determines the number of lags of the endogenous variables to be included in the model. Taking this criterion into account, the lowest value was found for lag 1. This value also contributes to the HQ criterion.
The parameters of the VAR (1) model were estimated. The calculation of the beehive weight (BW) for a given period is given in Equation (5).
The coefficients of the adjusted VAR model, the values of the calculated t statistics, and the standard error of each coefficient are available in Table A1 in Appendix A.
Once the appropriate model was established, the impulse response function analysis was performed; this presents the behavior of the beehive weight variable after disturbances caused in MaT, MiT, MiH, MaH, MiDP, Pp, and MaDP, under the addition of two standard deviations, according to Figure 3.
In Figure 3, it is possible to observe that the response of the beehive weight variable after the application of random disturbances in the MaT variable presents a decay until the fifth instant; after the tenth period, it grows again, showing growth for 60 days. MaT is the variable that causes the most fluctuation in beehive weight since, between all factors, it is the one that takes the longest time to stabilize.
The response of the beehive weight variable to disturbances in the MiT variable initially shows growth until period 3, before stabilizing and showing stabilized growth after moment 5. It is perceived that the variable MiH imposes an increased response until moment 5 for the beehive weight variable, which grows smoothly until the 28th period, and later shows high stability.
The beehive weight variable shows an increased response until moment 5 and high stability after period 15, due to shocks applied to the MaH variable. The beehive weight response to disturbances in the MiDP variable declines sharply until period 5, maintaining a smoother decay until moment 35, and stabilizing afterwards.
The application of random disturbances in the Pp variable represents fast growth to the second period, before stabilizing until the fifth period, growing smoothly until moment 30, and then showing high stability. The MaDP variable promotes an increased response until moment 2 in the beehive weight variable, before decreasing smoothly until period 45 and then stabilizing.  The application of random disturbances in the Pp variable represents fast growth to the second period, before stabilizing until the fifth period, growing smoothly until moment 30, and then showing high stability. The MaDP variable promotes an increased response until moment 2 in the beehive weight variable, before decreasing smoothly until period 45 and then stabilizing. The next step is the variance decomposition, for which the results are shown in Table 6.  The next step is the variance decomposition, for which the results are shown in Table 6. According to Table 6, 87.26% of the variance in the beehive weight variable is explained by itself on the first day, followed by the minimum dew point (MiDP), which represents 5.33% of the variance, and maximum humidity (MaH), which explains 2.60%.
Over time, after the twentieth day, the variable that most impacts the hive weight variance is the maximum temperature, and the second biggest impact is caused by the minimum dew point (MiDP). The variables having the greatest impacts on the beehive weight variance on the twentieth day remain the most significant at the last moments of the analysis, after 60 days. The beehive weight variable explains 83.61% of its own variance, the maximum temperature variable represents 7.50%, and minimum dew point represents 3.21%. Over time, the proportion of beehive weight variance explained by the beehive weight variable itself gradually decreases, and the influence of other variables on its variance increases, especially the maximum temperature (MaT).
According to Equation (5), a higher minimum dew point makes it possible to increase the beehive weight. The dew point is defined as the temperature at which the water vapor condenses [61,62], and is considered a good indicator of temperature and humidity in the environment [63]. The minimum dew point means that water vapor condenses at lower temperatures and with a higher degree of humidity in the air. Therefore, the lower the dew point, the higher the humidity of the air and the lower the temperature. Extremely low temperatures have a negative impact on the development of the swarm [64,65].
The greatest relationship between beehive weight and the maximum temperature (MaT) is related to the normal functioning of the colony and, according to Equation (5), very high temperatures also negatively affect the weight of the cluster. Temperature has great importance for insects because it influences work activity and is directly linked to the regulation of the insects' growth and metabolism [66]. Apis mellifera strictly controls the internal temperature of its hive, keeping it between 91.4 and 96.8 • F [29,30].
The internal temperature control of the nest is performed by the bees themselves. In cold periods, the bees first make their cluster more compact to reduce heat loss, in addition to using metabolic heat to keep the hive warm [67,68]. In hot periods, the workers continuously vibrate their wings, thereby creating a draft to cool the nest. When the heat is very high, the workers collect quantities of water to be evaporated internally in the nest, together with the ventilation provided by their wings [69,70]. Another strategy is to vacate the colony and group outside the entrance, while some workers inside the colony continue the tasks of ventilation and evaporation [71].
This method of temperature control practiced by bees is called thermoregulation, in which bees use their body parts to regulate the internal temperature of the nest [72]. For the swarm to be successful in its development, bees depend on the realization of efficient thermoregulation [73].
The greater the work of the working bees to control the temperature, the greater the energy expenditure; consequently, they need a greater amount of food, such as nectar and pollen, to replace this expenditure [74,75]. This higher consumption of food by the hive decreases the reserves of honey, which may cause a reduction in the harvest period.

Conclusions
This research aimed to adjust a vector autoregressive (VAR) model to explain the interrelationships between climatic variables and the weight of an Apis mellifera hive. The objective was achieved successfully; thus, climatic variables were used to establish a VAR model that was capable of explaining the influence of temperature, dew point, humidity, and precipitation on the beehive weight. The beehive weight is directly related to the amount of honey produced and the health of the hive, such that healthy bee workers produce more.
The VAR (1) model adjusted to the climatic changes and beehive weight enabled the study of the impulse response function, which showed how the external interference in one of the variables has a propagation effect, and the variable's intensity, enabling identification of the impacts on the variables and the dissipation time. The variance decomposition showed the relative interference in the main hive weight variable of the variables of temperature, dew point, humidity, and precipitation. The impulse response function and variance decomposition prove that the development of the hive has a stronger relationship with the variables of temperature and dew point.
This research made three important contributions to the literature: (i) it demonstrated the possibility of applying statistical models to bee data and relating them to climatic data; (ii) it statistically determined the climatic variables having the greatest impact on the development of the hive (beehive weight); and (iii) it supported a discussion about the main impact variable on the weight of the hive, i.e., the temperature. This study provides an empirical basis and a discussion with the objective of having an impact on the academic community and professionals in the area, given the relevance and importance of the research topic.
Some limitations may be included and reviewed in future studies, such as: (a) the use of more hives as databases; (b) the use of hives from different locations; (c) the use of data on the source of honey production, that is, access to honey-bearing plants, including number of species, area, and efficiency; and (d) the employment of other statistical models, such as the Autoregressive Moving Average with Exogenous Inputs (ARMAX) model. This article can contribute both to academic research on statistical models of hives, and to professionals who work directly with this insect and intend to understand the impact of climatic variables on the development of the Apis mellifera hive.