3.1. Statistical Evaluation
After the average daily temperature had been obtained for each day of the three-year period, separately for each station, the nine means were compared. Each of the nine average daily temperatures was compared with each observation of the day using a set of statistical error measures. The term “error” is used here in a statistical sense: it refers to the deviation of each observation from the calculated mean and does not imply “incorrectness.” Several statistical indices have been used in the meteorological literature to evaluate forecasting processes; this analysis employs six of the most popular, as shown in Table 2 [4].
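The comparison of a candidate daily mean with the individual observations of the day can be sketched as follows. This is a minimal illustration only: the three measures computed here (mean bias, mean absolute error, and root-mean-square error) are common choices assumed for the example and are not necessarily the six indices of Table 2, and the function name and observation values are hypothetical.

```python
import math

def deviation_measures(observations, daily_mean):
    """Return bias, MAE and RMSE of the observations around a candidate daily mean."""
    deviations = [obs - daily_mean for obs in observations]
    n = len(deviations)
    bias = sum(deviations) / n                            # signed average deviation
    mae = sum(abs(d) for d in deviations) / n             # mean absolute error
    rmse = math.sqrt(sum(d * d for d in deviations) / n)  # root-mean-square error
    return {"bias": bias, "mae": mae, "rmse": rmse}

# Hypothetical 3-hourly observations for one day (degrees Celsius), not real station data.
obs = [10.0, 9.0, 11.0, 15.0, 18.0, 17.0, 14.0, 12.0]

# Candidate mean based on the daily maximum and minimum only.
arith_2 = (max(obs) + min(obs)) / 2
measures = deviation_measures(obs, arith_2)
print(measures)
```

Repeating the call for each candidate mean and each day would give, per station, one value of every measure for every averaging method.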
The statistical indices were applied to the data from the three stations for each of the eight calculated means on each day of the sample. An example of the derived index values is given in Table 3 for Hellinikon station. Because the six statistical indices represent different properties of the deviations, the averaging approaches were ranked by their relative values for each index in order to identify the averaging method that systematically leads to smaller overall deviations. Specifically, the error measures of each averaging method were evaluated using the classical method of assigning penalties, where a score of 1 is given to the mean with the smallest error and therefore the smallest penalty. Similarly, a score of 9 is given to the mean with the largest error measure and the highest penalty, indicating the largest deviation and the worst statistical outcome.
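The penalty assignment described above can be sketched as a simple ranking per index. The snippet is a minimal illustration under stated assumptions: the function name, the index labels, and the numerical values are hypothetical (they are not taken from Table 3), only three methods are shown instead of nine, signed measures are assumed to be ranked by absolute value, and ties are not treated specially.

```python
def rank_methods(errors):
    """Map {index: {method: error}} to {index: {method: rank}}, rank 1 = smallest error."""
    ranks = {}
    for index_name, by_method in errors.items():
        # Assumption: signed measures (e.g. a bias) are ranked by magnitude.
        ordered = sorted(by_method, key=lambda method: abs(by_method[method]))
        ranks[index_name] = {method: pos for pos, method in enumerate(ordered, start=1)}
    return ranks

# Hypothetical error values for illustration only.
errors = {
    "mae": {"ARITH-2": 2.9, "GEO-2": 2.7, "HARM-2": 2.8},
    "rmse": {"ARITH-2": 3.3, "GEO-2": 3.1, "HARM-2": 3.0},
}
ranks = rank_methods(errors)
print(ranks)
```

With nine averaging methods, the same function would return scores from 1 to 9 for each index.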
In Figure 1 (left), the ranking of the statistical indices is shown for Hellinikon station as calculated for each averaging method. The bar length signifies the rank value, while the colors indicate the statistical index used. The same analysis was performed for Elefsina and Tatoi stations, and the trends of the results were almost identical, at least with respect to the averaging methods that rank best or worst. Based on this classification, the means with eight observations (ARITH-8, GEO-8, and HARM-8) receive the lowest rank values (smallest errors) compared to the methods that are based on fewer observations. Statistically and climatologically, this is expected, since a larger sample of observations yields a more representative mean.
Moreover, the geometric mean of the eight observations is the best overall method, with minimal difference from the harmonic mean of the eight and a clear superiority over the arithmetic mean based on the same number of observations. To summarize the information extracted in the analysis, an overall ranking was also calculated based on all six indices (values ranging from 1 × 9 to 6 × 9) for each station, and the results are presented in Figure 1 (right). The data from all stations support the same outcome: the geometric mean of eight performs slightly better (even more so for the Hellinikon station), while the harmonic mean of eight ranks similarly, especially for the Elefsina and Tatoi stations. These conclusions are consistent with the general rule that applies to the three means: the harmonic mean is always smaller than or equal to the geometric mean, and the geometric mean is always smaller than or equal to the arithmetic mean, i.e., HARM ≤ GEO ≤ ARITH.
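The ordering HARM ≤ GEO ≤ ARITH can be checked numerically with a short sketch. The observation values below are hypothetical, and the conversion to kelvin is an assumption made here only so that all values are strictly positive, as the geometric and harmonic means require; the source does not state how this is handled in its own formulas.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Computed as exp(mean(log(v))); requires strictly positive values.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    # Requires non-zero values (strictly positive here).
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical eight 3-hourly observations (degrees Celsius), not real station data.
obs_celsius = [10.0, 9.0, 11.0, 15.0, 18.0, 17.0, 14.0, 12.0]
# Assumption: convert to kelvin so that every value is positive.
obs_kelvin = [t + 273.15 for t in obs_celsius]

arith = arithmetic_mean(obs_kelvin)
geo = geometric_mean(obs_kelvin)
harm = harmonic_mean(obs_kelvin)
ordering_holds = harm <= geo <= arith
print(harm, geo, arith, ordering_holds)
```

The three means coincide only when all observations are equal; the more the observations vary within the day, the further apart the means move.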
Moreover, it is evident from Figure 1 (right) that the arithmetic means of two and four observations are associated with larger errors. Specifically, the ARITH-4 mean exhibits the highest error ranking for all three stations. Similar results are obtained for the most commonly used ARITH-2, which is based on the maximum and minimum values and ranks second to last in performance. The difference in deviations between these two arithmetic means (based on 2 and 4 observations) and their corresponding geometric and harmonic means is apparent, with the harmonic mean exhibiting superior performance among the four-observation means and the geometric mean among the two-observation means.
The means of eight observations are clearly superior from a statistical perspective, with the geometric and harmonic means outperforming the arithmetic mean. In the case of four observations, the preferred mean is the harmonic mean, followed by the geometric mean. Based on the results, it is evident that the arithmetic mean of four observations has the highest measure of error and is therefore not recommended. In the case of the most widely used method that is based on the maximum and minimum values, the results favor the geometric mean with a significant advantage in almost all statistical error measures compared to the other two means.
In conclusion, to better capture the daily temperature variability, the calculation of the mean daily temperature should be based on as many observations as possible and not solely on the maximum and minimum values. If, however, only these two observations are available, the geometric mean is recommended. Furthermore, when only a limited number of observation reports are available per day at specific time intervals, the harmonic mean is suggested.
3.2. Applications
As mentioned earlier, there are several applications for the mean daily surface temperature. The conventional approach of temperature averaging is not only used to calculate the daily mean temperature but also serves as the basis for deriving various temperature-based climate indices. These indices include heating degree days, cooling degree days, etc., which are important for assessing residential heating and cooling requirements as well as agricultural activities. By capturing the spatiotemporal variations of these indices, valuable insights can be obtained regarding the temporal patterns and geographic distribution of heating and cooling needs and the suitability of different regions for agricultural practices [5].
The impact of the averaging method on heating degree days is briefly demonstrated in this section. According to the American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) [6] method, the degree days are defined as the difference between the mean daily temperature ($T_{mean}$) and the reference temperature ($T_b$). Therefore, the equation for heating degree days (HDD) is as follows:

$$\mathrm{HDD} = \sum_{i=1}^{N} \left(T_b - T_{mean,i}\right)^{+} \qquad (11)$$

where the sum runs over the $N$ days of the period and the superscript $+$ indicates that only positive differences are counted. $T_{mean}$ is obtained from the commonly used method through maximum and minimum daily temperatures (ARITH-2), while the reference temperature is set to 15.5 °C. The reference temperature has been established in European Union countries, as recommended by the United Kingdom’s Meteorological Service. This specific value is suitable for regions located at moderate latitudes and is followed by the European Environment Agency. In this application, HDD is additionally calculated with the daily mean temperature as derived from Formulas (3)–(10). The objective is to quantify how the use of a different temperature averaging method can affect the amount of HDD and, consequently, the heating allowance that could be attributed to the specific area. For Elefsina station, the annual HDD as calculated by Equation (11) is given in Table 4.
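The degree-day calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the common convention that only days with a mean temperature below the reference temperature contribute; the function name and the daily mean values are hypothetical and are not taken from Table 4.

```python
def heating_degree_days(daily_means, base_temp=15.5):
    """Sum the positive differences between the reference temperature and each daily mean."""
    # Assumption: days warmer than the reference temperature contribute zero.
    return sum(max(base_temp - t_mean, 0.0) for t_mean in daily_means)

# Hypothetical daily mean temperatures (degrees Celsius) for five days.
daily_means = [8.0, 12.5, 15.5, 18.0, 14.0]
hdd = heating_degree_days(daily_means)
print(hdd)
```

Running the same function on the daily means produced by each averaging method shows directly how the choice of mean changes the accumulated HDD: a method that gives systematically lower daily means accumulates more degree days over the heating season.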
The arithmetic means of two and four observations yield the lowest degree-day values, which consequently lead to a smaller economic impact on heating allowances. However, from a statistical perspective, these means exhibit poorer statistical error measures as they do not adequately capture the temperature fluctuations within a day. Such an important application with socioeconomic ramifications should be based on the most representative and reliable climatological input.