What Can You Do with 100 kWh? A Longitudinal Study of Using an Interactive Energy Comparison Tool to Increase Energy Awareness

Reducing the use of energy is important for several reasons, such as saving money and reducing impact on the climate. However, the awareness among non-experts of how much energy is required by different activities and appliances is generally low, which can lead to wrong prioritizations. In this study, we have developed an interactive tool to increase “energy awareness”, and performed a longitudinal study to evaluate its effect. A group of 58 students first did a test to benchmark their current energy awareness, where their current knowledge of energy used for 14 different activities, such as driving vehicles and using home appliances, was measured. They then tried the interactive learning tool for 10 min. Next, they did the same test immediately after trying the tool, then again one week after trying the tool, and finally again six months after trying the tool. The results showed a significant learning effect in energy awareness with a “huge” effect size of 2.25 immediately after the intervention, a “very large” effect size of 1.70 after one week, and a “large” effect size of 0.93 after six months. The results further showed that the respondents consistently underestimated what 100 kWh could be used for, and especially so for appliances and activities requiring little energy. Before the intervention, on average they underestimated how much 100 kWh could be used for by 95.2%, and six months after the intervention the underestimation was 86.8%.


Introduction
Energy use is an important topic for sustainability and climate change. While energy efficiencies can be created by optimization and dematerialization, promoting conservation behavior remains an important part of saving energy and avoiding rebound effects [1]. There has been much research during the last decade on using persuasive technologies and eco-visualizations to promote energy conservation. Energy conservation is a dominant topic in research on persuasive technologies for sustainability and on sustainable HCI (Human-Computer Interaction) [2]. Technologies such as smart meters make it possible to measure energy consumption and give users accurate feedback at both household and appliance level. Several research projects have used this data with persuasive intent to promote energy conservation. Many of the smart meter technologies, visualizations and applications created in this research area present quantitative energy data with watt hour (Wh) or kilowatt hour (kWh) as the main unit [3][4][5][6][7][8].
While kWh is a familiar term for everyone paying electricity bills or selecting an appliance, it is a fairly abstract unit that may be difficult to relate to, as explored by Karjalainen [9]. Even if abstract, it is still a relevant unit. Wood and Newborough have argued that "The kWh is already familiar to most consumers and although few understand this unit, thorough comprehension is not necessary for an effective display" [10]. However, this lack of understanding of quantitative information is part of a general lack of energy literacy [11,12]. While users know that leaving a lightbulb on all the time or charging their mobile phones requires energy, the lack of the energy literacy or awareness needed to understand the differences in scale may lead to absent or suboptimal conservation behavior, for example unplugging a mobile phone charger during the night in order to conserve energy, but keeping underfloor heating on during the summer.
This article explores the understanding of energy among consumers, and the use of simulation as a way of increasing awareness. A prototype called "KiloWhat?" was created that uses simulation and play to practice and learn about energy consumption, exploring the following research questions:
• RQ1: What is the learning effect of using such a tool?
• RQ2: Was the energy use under- or overestimated before and after the intervention?
An experiment was performed with 58 engineering students. The experiment contained five steps (see the Method section for a more extensive description):
(1) An assessment of their current understanding of kWh information, as a baseline.
(2) Using and experimenting with the KiloWhat prototype for 10 min.
(3) A second test immediately after using the tool to see how the level of knowledge had changed.
(4) A third test one week later to see how much knowledge had been lost.
(5) A final test six months later to see how much knowledge had been lost.
This article presents the design of the KiloWhat prototype, the method and results from the test, and discusses the implications of the results for designing education services concerning energy consumption.

Design of the Prototype
The prototype is an online service available at http://kilowh.at/. The main goals of the design are to: (1) Make quantitative kWh information easier to relate to by providing a learning experience where the users can translate kWh into everyday activities; (2) Help users to learn differences in scale between the energy consumption of different activities by allowing the users to play and compare between them.
The prototype is based on the code and idea of a similar prototype named "carbon.to", available at http://carbon.kilowh.at/, which explored the same concept for increasing greenhouse gas literacy [13].
The prototype makes it possible to explore and compare the energy use of 16 different activities, such as driving an electric car or using a television set. The activities are divided into five different categories. The values provided are average values gathered from a search in research databases and should only be seen as rough estimates, since there can be large differences within one category.
(1) Energy generation: hours of solar panels, kg of coal, hours running on a treadmill.
(2) Home and appliances: washing machine loads, hours with a light-emitting diode (LED) lamp on, hours with an incandescent light bulb on, hours with a fridge on, hours with Wi-Fi on, mobile phone charges, hours watching television.
(3) Transportation: km driving a gasoline car, km driving an electric car, km driving an electric assisted bike.
(4) Heating: hours heating a house with electric heaters, hours heating a house with a geothermal pump. This category was later removed from the analysis since we found that some participants interpreted it as averages over a year while others interpreted it as having a radiator turned on at maximum effect all the time.
(5) Food: hamburgers (energy needed to produce).
The questions are presented in Appendix A and the reference values used are listed in Appendix B. When the site is accessed, it shows by default 1 kWh compared to one of the activities, selected randomly. The users can change the unit on the right side to compare 1 kWh to other activities, and increase or decrease the amounts using the plus and minus buttons in the top-right corners. The users can also change the unit on the left side to compare different activities against each other or to see what would be needed to generate that amount of energy. The interface is shown in Figure 1.
Sustainability 2018, 10, x FOR PEER REVIEW
The main learning strategy used in this prototype is simulation, "enabling users to observe immediately the link between cause and effect" [14]. The intent is that by comparing kilowatt hours against different everyday units, and those units against each other, the users can simulate the energy use of those everyday actions and gain knowledge about them.
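The comparison idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the prototype's actual code (which is written in Ruby on Rails and JavaScript). The function name `convert`, the table `KWH_PER_UNIT`, and all numeric values are assumptions made for the example; they are not the reference values from Appendix B.

```python
# Energy needed (in kWh) for ONE unit of each activity.
# NOTE: placeholder values for illustration only; these are NOT the
# reference values listed in Appendix B.
KWH_PER_UNIT = {
    "kWh": 1.0,
    "mobile phone charge": 0.01,
    "hour of LED lamp": 0.01,
    "km by electric car": 0.2,
}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Express `amount` of one activity as the equivalent amount of another."""
    energy_kwh = amount * KWH_PER_UNIT[from_unit]  # go via kWh as common unit
    return energy_kwh / KWH_PER_UNIT[to_unit]


# Compare 1 kWh against an activity (the default view of the tool) ...
print(convert(1, "kWh", "km by electric car"))
# ... or two activities against each other.
print(convert(10, "km by electric car", "mobile phone charge"))
```

Routing every conversion through kWh keeps the table to one entry per activity, so adding a new activity would not require defining pairwise conversion factors.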
The software is a fork of the aforementioned carbon.to. It is developed using Ruby on Rails and JavaScript, and the code is available as open source on GitHub (https://github.com/zapico/kilowhat).
As a first design iteration, the prototype was tested with 20 users. The users tested the site and provided feedback. From this feedback, several improvements were added, including a better grouping of activities using colored categories, a unification of time units to hours and the inclusion of several extra activities.

Method
In order to evaluate the effect of the prototype, an experiment was designed. The respondents were 58 first-year engineering students who had recently started a five-year educational program within media technology. As part of an introductory course module, "Introduction to scientific research", they were required to participate in one scientific experiment. Out of the 58 students, 48 participated fully in the five parts of this experiment. In order to check which students had participated, the participation was not anonymous, but the students were informed that the email addresses collected would not be used to identify individuals. The students were also informed that the results would be used for research, and that they would be presented with the resulting paper with comments as a part of the course module. The only requirement was to participate; their actual results on the tests did not in any way affect the respondents' grades or credits.
The respondents were given instructions by email, and they could do the assignment anywhere and anytime within the given time window. The email described the aim of the prototype ("to increase energy awareness"), explained why increasing energy awareness is important, and outlined the setup of the study.
Their first assignment consisted of three tasks that they were required to complete in sequence within a window of two days, with an expected total time requirement of 30 min. The first task was to answer a questionnaire where they gave their best guess of what 100 kWh corresponded to for 16 different activities, for example "How many kilometers riding with an electric bike", "How many hours powering a TV" or "How many charges of a mobile phone" (see Appendix A). In order to help the students understand roughly how much 100 kWh corresponds to, they were also informed that the price of 100 kWh of electricity is roughly 100 Swedish crowns or about €10. The coincidence that 1 kWh roughly equals 1 Swedish crown makes it easy for users to assess if an estimate is reasonable, since money is relatively easy to relate to. They were instructed not to look up the answer anywhere while answering the questionnaire. The second task was to try out the prototype for exactly 10 min after watching a short screencast introducing the prototype. The third task was then to answer a copy of the first questionnaire. After one more week, and then again after six months, they were required to answer the same questionnaire in order to evaluate how much of the learning effect remained.
The first part of the analysis was to evaluate the learning effect. After receiving all responses to the questionnaire, the answers were edited [15]: respondents whom we had good reason to believe had not followed the instructions were removed from the analysis. Out of the 48 respondents who had answered all four questionnaires, four were removed since the log files showed that they had spent less than 10 min trying out the prototype. Furthermore, we found three respondents who for several questions had answered incorrectly by a factor of exactly 100, and whose average results were also off by a factor close to 100. This led us to believe they had answered what 1 kWh would have corresponded to instead of 100 kWh. However, these were still included in the analysis, since they had used the prototype as intended but misunderstood how it worked, thereby actually decreasing their energy awareness rather than increasing it.
The answers from the remaining 44 respondents were then analyzed, and both the p-value and the effect size of the learning were calculated.
The first relevant measure was the order of magnitude by which the responses deviated from the correct value, i.e., a response 10 times higher than the correct response is as good or bad as a response 10 times lower. Therefore, each student response was transformed by the function $|\log_{10}(X/C)|$, where $X$ was the student's answer and $C$ was the "correct" reference answer. For example, if the correct answer was 10 h, then a student answer of either 5 or 20 h would have yielded a value of 0.30; if the student's answer was either 1 or 100 h, both would have yielded a value of 1. Finally, the results of all 14 questions were averaged for each student. The resulting distributions were tested for normality using the Anderson-Darling normality test and were found to be roughly normally distributed and, therefore, paired t-tests could be used for analysis.
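The transformation and per-student averaging described above can be written out as a short Python sketch. The function names `abs_log_error` and `student_score` are chosen for this illustration and do not come from the study's own analysis scripts.

```python
import math
from statistics import mean


def abs_log_error(answer: float, reference: float) -> float:
    """Distance from the reference in orders of magnitude: |log10(X/C)|."""
    return abs(math.log10(answer / reference))


def student_score(answers, references) -> float:
    """Average the absolute log error over all questions for one student."""
    return mean(abs_log_error(x, c) for x, c in zip(answers, references))


# The worked example from the text: reference answer 10 h.
for guess in (5, 20, 1, 100):
    print(guess, round(abs_log_error(guess, 10), 2))
```

Because the measure is symmetric on a log scale, guesses of 5 h and 20 h receive the same score, as do guesses of 1 h and 100 h.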
The second relevant measure, used to determine whether the respondents underestimated or overestimated the energy use of the appliances and activities, was the same function but without the absolute value, i.e., $\log_{10}(X/C)$. In the examples above, an underestimation of the energy use (answers of 20 or 100 h) would result in values of 0.30 and 1 respectively, whereas an overestimation of the energy use (answers of 5 or 1 h) would result in values of −0.30 and −1 respectively.
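A minimal sketch of the signed measure follows, using the same example with a reference answer of 10 h. The function name `signed_log_error` is an assumption for this illustration. A positive value means the respondent believed 100 kWh would last longer than it does (energy use underestimated); a negative value means the opposite.

```python
import math


def signed_log_error(answer: float, reference: float) -> float:
    """Signed deviation log10(X/C); the sign gives the direction of the error."""
    return math.log10(answer / reference)


# Reference answer: 10 h.
print(round(signed_log_error(20, 10), 2))  # energy use underestimated
print(round(signed_log_error(5, 10), 2))   # energy use overestimated
```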

Results
The results for the two research questions are presented below.

RQ1: What Is the Learning Effect of Using Such a Tool?
The results from the first questionnaire (T0) showed that the energy awareness was low, as expected. The respondents' estimates were least accurate for the activities that consumed the least energy, such as Wi-Fi and charging mobile phones. The average estimate among the 616 data points generated by the 44 students who answered all four questionnaires as intended was 30.0 times higher or lower than the correct value (calculated as $10^{\mathrm{avg}(|\log_{10}(X/C)|)}$). Their estimates improved substantially in the questionnaire answered immediately after trying the prototype (T1), where the average estimate was instead only 2.62 times higher or lower than the correct value. A paired-samples t-test on the average score of each individual showed that scores were significantly better for test T1 immediately after trying the prototype (M = 0.419, SD = 0.416) than for the baseline test T0 before trying the prototype (M = 1.476, SD = 0.493), t(43) = 11.277, p < 0.001, Cohen's d = 2.248, which corresponds to a "huge" effect size [16].
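The "times higher or lower" figures follow from raising 10 to the mean absolute log deviation. A small Python sketch, using the means M reported in this section, illustrates the conversion; the function name `typical_factor` is an assumption for this example, and small differences from the published factors are to be expected because the reported means are rounded to three decimals.

```python
def typical_factor(mean_abs_log_error: float) -> float:
    """Convert a mean |log10(X/C)| into a 'times higher or lower' factor."""
    return 10 ** mean_abs_log_error


# Means reported in the text for the baseline (T0) and post-test (T1).
for label, m in (("T0", 1.476), ("T1", 0.419)):
    print(label, round(typical_factor(m), 2))
```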
The results after one more week (T2) were on average 5.66 times higher/lower than the reference value, which is a significant drop compared to the results of the second questionnaire, but a paired-samples t-test showed that the scores were still significantly better at test T2 (M = 0.753, SD = 0.480) than at test T0 before trying the prototype (M = 1.476, SD = 0.493), t(43) = 8.014, p < 0.001, Cohen's d = 1.697, which corresponds to a "very large" effect size [16].
Finally, after six months (T3) the average results were 11.87 times higher/lower than the reference value, and a paired-samples t-test showed that the scores were still significantly better (M = 1.074, SD = 0.437) than at test T0 before trying the prototype (M = 1.451, SD = 0.505), t(43) = 4.590, p < 0.001, Cohen's d = 0.929, which corresponds to a large effect size [17].
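For readers unfamiliar with the paired-samples t-test used above, the following Python sketch shows how the t statistic and degrees of freedom are obtained from per-student scores. The four score pairs are invented purely for illustration and are not data from the study; the function name `paired_t` is likewise an assumption.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev = sample SD (n - 1)
    return mean(diffs) / standard_error, n - 1


# Hypothetical per-student mean scores (NOT data from the study).
before = [1.5, 1.2, 1.8, 1.4]
after = [0.5, 0.4, 0.6, 0.3]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.3f}")
```

With 44 respondents the test has 43 degrees of freedom, which matches the t(43) values reported in the text.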
These statistics are presented in Table 1.

RQ2: Was the Energy Use Under- or Overestimated before and after the Intervention?
The results above used the absolute value of the deviations from the correct reference value, which is suitable for discussing the learning effect since an overestimation of a value should be considered equally incorrect as an underestimation. From a sustainability point of view, however, it is also important to know if respondents underestimate or overestimate how much energy appliances and activities use, both as a baseline (i.e., the energy awareness without an intervention) and to see if this possible bias can be partially or fully removed by using a tool such as the one described in this study.
When taking into account the direction (i.e., underestimation or overestimation) of the answers provided by the respondents, the results from the first questionnaire (T0) showed that what the 100 kWh could be used for was severely underestimated. The average value of the estimations (calculated as $\mathrm{avg}(\log_{10}(X/C))$) was −1.319, as can be seen in the row MEAN(T0) in Table 2. This means that they underestimated how much the 100 kWh could be used for by, on average, 95.2%.
The results from the questionnaire answered immediately after using the prototype (T1) showed a significant improvement in energy awareness (t(43) = 9.664, p < 0.001, Cohen's d = −1.865), which corresponds to a very large effect size [16]. The average value of the estimations (calculated as $\mathrm{avg}(\log_{10}(X/C))$) had increased from −1.319 to −0.191. This means that they now underestimated how much the 100 kWh could be used for by, on average, 35.6%, compared to the previous 95.2%.
The results from the questionnaire answered one week after using the prototype (T2) also showed a significant improvement in energy awareness compared to T0 (t(43) = 7.434, p < 0.001, Cohen's d = −1.615), which corresponds to a very large effect size [16]. The average value of the estimations (calculated as $\mathrm{avg}(\log_{10}(X/C))$) was now −0.373, which was an increase from T0 (−1.319) but a decrease from T1 (−0.191), as could be expected. This means that they now underestimated how much the 100 kWh could be used for by, on average, 57.6%.
Finally, the results from the questionnaire answered six months after using the prototype (T3) also showed a significant improvement in energy awareness compared to T0 (t(43) = 3.624, p < 0.001, Cohen's d = −0.753), which corresponds to a medium effect size [17]. The average value of the estimations (calculated as $\mathrm{avg}(\log_{10}(X/C))$) was now −0.878, which was an increase from T0 (−1.319) but a decrease from both T1 (−0.191) and T2 (−0.373), as could be expected. This means that they now underestimated how much the 100 kWh could be used for by, on average, 86.8%. The results are shown in Table 2.
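The percentages quoted above follow directly from the mean signed log deviations: a mean of $m$ corresponds to a typical answer of $10^{m}$ times the reference value, i.e., an underestimation of $(1 - 10^{m}) \cdot 100\%$. The short Python sketch below reproduces the four reported percentages; the function name `percent_underestimated` is an assumption for this illustration.

```python
def percent_underestimated(mean_signed_log_error: float) -> float:
    """Convert a mean log10(X/C) into a percentage underestimation."""
    return (1 - 10 ** mean_signed_log_error) * 100


# Mean signed deviations reported in the text for the four questionnaires.
reported = {"T0": -1.319, "T1": -0.191, "T2": -0.373, "T3": -0.878}
for label, m in reported.items():
    print(label, round(percent_underestimated(m), 1))
```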
Finally, looking in more detail at the 14 different use cases, the use cases where the 100 kWh would provide "a lot" of the unit in question (hours, kilometers or pieces) were consistently underestimated more than those where it would provide "a little". For example, Q7-Q9 asked how many kilometers an electric bike (Q7), an electric car (Q8) or a petrol-driven car (Q9) could travel using 100 kWh, and in all four questionnaires the distance that could be traveled by an electric bike was underestimated the most, followed by the electric car, which in turn was followed by the petrol-driven car.
Two different types of correlations were calculated. The first was the Pearson correlation of the averages of $\log_{10}(X/C)$ for the students' answers to each question, with $\log_{10}(C)$ for the corresponding question, for each of the four questionnaires. The second was the Pearson correlation of Cohen's d (the effect size) for each of the questions, with $\log_{10}(C)$ for the corresponding question, for each of the four questionnaires.
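As an illustration of the first type of correlation, the sketch below computes a Pearson correlation between per-question values of $\log_{10}(C)$ and per-question mean signed deviations. The two lists are invented for this example and are not data from the study; `pearson` is a plain implementation of the standard formula.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-question values (NOT data from the study):
log_reference = [1.0, 2.0, 3.0, 4.0]         # log10(C) for four questions
mean_signed_error = [-0.2, -0.6, -1.1, -1.5]  # avg(log10(X/C)) per question
print(round(pearson(log_reference, mean_signed_error), 3))
```

A value close to −1 in such a calculation would indicate that the larger the reference answer, the more strongly the respondents underestimated it.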
The results in Table 3 show a very strong negative correlation for all calculations. This means not only that the respondents underestimated most results even after the intervention, but also that the level of underestimation is strongly inversely correlated with the energy use required; in other words, the energy use of appliances and activities that require little energy is overestimated the most. However, the learning effect after trying the prototype was also higher for these appliances and activities, as can be seen in the strong inverse correlation with the effect sizes.

Conclusions and Discussion
In this paper we have implemented and tested a tool for increasing energy awareness. We set up two research questions, which are discussed in turn below.

RQ1: What Is the Learning Effect of Using Such a Tool?
The results first confirmed our view that energy awareness in general was very low. However, spending 10 min with the tool increased energy awareness significantly, with an effect size after one week of 1.697 and, more importantly, a learning effect size of 0.929 after six months. Effect sizes for learning interventions were studied by Hattie [18], who in a study of more than 800 meta-analyses about learning came to the conclusion that the average learning intervention had an effect size of 0.4. Cohen classifies an effect size of at least 0.8 as large, and a learning effect size of 0.929 after six months is therefore quite substantial.

RQ2: Was the Energy Use Under- or Overestimated before and after the Intervention?
We found that the respondents generally underestimated how much 100 kWh could be used for, and that this was especially the case for equipment requiring small amounts of energy, such as running a Wi-Fi router or charging a mobile phone. The underestimation was highly inversely correlated with the energy required by the appliance or activity, but the learning effect was also greater for these low-energy appliances/activities.

General Discussion
While the focus on numbers and rational approaches to sustainable behavior has been criticized [19], increasing the capacity to understand electricity information is an important precondition for better energy decisions, and it will become more important with current trends of electric mobility and small-scale electricity production. This study confirmed that the understanding of quantitative energy information was low, even in a group of engineering students. This low energy understanding can affect energy decisions and create misguided efforts with low energy-saving potential. The tool used in this study shows how providing anchor points that the users could refer to increased their energy awareness, as reference points tied to concrete everyday actions can support a more intuitive understanding of quantitative energy information.
The learning effect was, as shown, quite substantial even after six months, but as expected it was much higher shortly after the intervention. An increased long-term learning effect could likely be achieved if a spaced repetition [20] learning strategy was used, where the prototype was used again after, for example, one month and six months. The spacing effect, where studying efforts spread out over a longer period of time lead to better learning, has been shown to be "one of the most dependable and replicable phenomena in experimental psychology" [21]. If society were interested in increasing energy awareness among the population, allocating 3 times 10 min of high school students' time to exercises such as this would, therefore, likely have a strong long-term effect.
The tool discussed in this article does not focus on changing behavior but on energy awareness, i.e., the ability to understand energy information. This could be seen as more relevant for people already interested in energy issues and their own behavior, while it may not be of interest to other users, as motivation varies [22]. The group used in this study could not be said to be either especially environmentally friendly or the opposite, and it could be expected that people with a genuine interest in environmental issues would put more effort into using the tool and thereby achieve better learning.
However, a potential risk with using a tool like this is that it could have adverse effects. Since the respondents in all four tests overestimated the energy use in most use cases, increased energy awareness could make the users realize how little energy many appliances and activities actually use, and thereby it could result in a rebound effect where the users of the system instead start using more energy than before.

Limitations and Future Research
One limitation of this study is that the respondents were not representative of society in general. They were young and many of them lived with parents and/or did not pay for energy use. They were also relatively well educated and, as engineering students, could be expected to have a better knowledge about energy and kWh than most other people in their age group.
The comparisons used in the presented tool are mostly short-term actions focused on individuals, such as using an appliance or driving a car. It could be interesting to explore the use of interactive tools for supporting the understanding of energy information on a larger scale. This could include longer time frames (such as energy used in a year), and larger spatial scales (such as energy used in a city or a country).
Future research will also include comparing this kind of learning with more conventional learning methods such as studying from a list. As we saw in our original study [23], the students in general preferred this interactive learning to a more conventional learning method, which in itself can motivate the development of tools like this, but the possible differences in learning remain to be evaluated.

Table 1. Statistical analysis of the four tests T0-T3 for each of the 14 questions.

Table 2. Results of paired t-tests for the three different questionnaires.