1. Introduction
Japan has few domestic fossil fuel reserves and depends heavily on imported crude oil, natural gas, and other energy sources to meet its energy demand. Before the 2011 earthquake and tsunami that struck the Fukushima Daiichi Nuclear Power Station, Japan also relied on nuclear power for up to one third of its electricity. Consequently, Japan has faced a decline in its energy self-sufficiency ratio, rising electric power costs, and growing greenhouse gas emissions caused by increased imports of crude oil, coal, and liquefied natural gas (LNG) (Agency for Natural Resources and Energy [1]). In particular, Japan’s energy self-sufficiency ratio fell sharply from 19.9% in 2010 to 6.0% in 2014, which is low compared to other OECD (Organisation for Economic Co-operation and Development) countries. Greenhouse gas emissions from the electric power sector increased by 83 million tons, from 374 million tons in 2010 to 457 million tons in 2014. Consequently, much attention has shifted to the promotion of renewable energy as an urgent response to Japan’s future energy needs. Renewable energy is an important source that potentially has low CO2 emissions during electricity generation and contributes to the energy self-sufficiency ratio. In 2014, renewable energy accounted for 3.2% of the electric power generated in Japan (excluding hydroelectric power), up from 0.03% in 1973 and 1.1% in 2010, but this share is still low compared to other major countries (e.g., Germany, Spain, United Kingdom, United States, and France) [1]. However, the supply of renewable energy in Japan is unstable due to low capacity. Furthermore, the constraints on nuclear power plants and the need to maximize the use of existing LNG-based thermal power plants are not easy to resolve when the electricity supply must be balanced and expanded in the short term. Therefore, it is necessary to focus on electricity demand, since lowering electric power consumption is one potential way to manage electricity needs. This can help to reduce not only the dependence on imported energy resources (e.g., fossil fuels) but also environmental degradation and greenhouse gas emissions (e.g., CO2). Demand response is expected to reduce consumers’ electric power consumption through the incentives of dynamic pricing and real-time information feedback via smart meters.
In addition, the liberalization of the electric power market introduced by the Japanese Government in April 2016 allows customers to choose among multiple supplier companies that compete to sell electricity (Agency for Natural Resources and Energy [2]). The new market system can benefit customers by lowering prices and widening their choice of electricity suppliers; however, many of these suppliers sell only locally and mainly in large cities.
Nushima Island (Figure 1) was selected as the experimental study site because it is a remote island located in the south of Hyogo Prefecture, where the demand for electric power relies heavily on the Kansai Electric Power Company in Osaka, Japan. The island has a population of about 500 people in 231 households (statistical data for 2012, Hyogo Prefecture Website [3]). Because Nushima is far from the mainland, it is difficult for supplier companies to deliver electricity to its communities. Electric power shortages might occur on Nushima as well as on more than 6500 other islands with the same characteristics. Therefore, it is important for consumers in these regions to adjust their consumption in the case of electricity shortages and to accommodate the fluctuations of renewable energy sources such as solar photovoltaic (PV) generation for future energy needs.
To the best of the authors’ knowledge, no previous study has explored the treatment effects of dynamic pricing on energy-saving behavior using propensity score analysis with panel data from field experiments. Therefore, the fundamental aim of this study was to assess consumer behavior in electricity consumption through a dynamic pricing experiment using the propensity score analysis approach. This study differs from Thoa et al. [4] in its objectives and its data analysis methodology. Regarding objectives, this study focused on the relatively short-term response to dynamic pricing, while Thoa et al. [4] emphasized the persistence of dynamic pricing effects. Regarding methodology, this study employed propensity score matching to address selection bias, while Thoa et al. [4] used conventional panel analysis with households randomly divided into control and treatment groups.
The following hypotheses were considered: the incentive of dynamic pricing has a negative effect on electric power consumption, and the tariff or specific deduction rate related to dynamic pricing is positively correlated with the energy-saving effect. Based on the results of this experiment, a smart-energy community model that is both environmentally friendly and resistant to the electric power market’s instability may be established.
3. Experimental Design
The field experiment of dynamic pricing on Nushima Island aimed to assess the possibility of the community’s self-control of its electric energy demand through dynamic pricing as well as the possibility of dynamic pricing policy according to solar PV generation potential.
The experiment was carried out on Nushima as part of a five-year project that started in 2012. Fifty households were randomly assigned in five districts: South, Central, North, East, and Tomari. In December 2012, smart meters were first installed in these fifty participating homes (Figure 3).
In May 2013, tablet PCs that could provide real-time feedback on electric power consumption were distributed to the participants. The tablet PCs displayed several types of real-time feedback: each household’s electric power consumption, its per-capita electric power consumption, and its ranking by electric power consumption among the households participating in the experiment (Figure 4).
Dynamic pricing was introduced in the summer (August and September) of 2015 and the winter (January and February) of 2016. In this stage, the control and treatment groups were randomly selected based on the official household list of each district, guided by local leaders. A control subject would receive a smart meter, a tablet PC, and a reward of 5000 Japanese Yen in summer at the end of the experiment, while a treatment subject would receive a smart meter, a tablet PC, and a reward of 5000 Japanese Yen as well as a monetary incentive for energy conservation. As the monetary incentive, each treated participant was allocated 7000 points, and points were then subtracted according to their electric power consumption. Participants could exchange their remaining points for cash at the end of the experiment (one point was equal to one Japanese Yen). There were three deduction rates of dynamic pricing: 20, 30, and 40 points. These rates changed daily based on the weather forecast and were differentiated according to the PV power generation potential. The 20-point rate (20 points per kWh per person) applied when the weather forecasts for both the preceding and current days included “sunny”. The 30-point rate (30 points per kWh per person) applied when the weather forecast for either the preceding or the current day included “sunny”. The 40-point rate (40 points per kWh per person) applied when the weather forecast for neither the preceding nor the current day included “sunny”. The weather condition on the preceding day was included in deciding the deduction rates because it would influence the remaining charge of a virtual battery charged by solar PV generation. Furthermore, data on hourly electric power consumption were compiled from the one-second interval data recorded by the installed digital smart meters.
In addition, the household characteristics of the treatment and control groups were collected through a pre-experimental questionnaire survey. Secondary data from recorded meteorological data were also included. The results of this experiment are reported in Thoa et al. [4].
In the final stage of the project, dynamic pricing was continuously conducted from 20 July to 16 August 2016. Panel data were collected for both treated and control subjects in a baseline survey (14 days before the dynamic pricing experiment was introduced), during the 14-day experimental period, and in a follow-up survey (14 days after the treatment). We ran the experiment for a relatively short period because it aimed to investigate the short-term effects of dynamic pricing under the hottest and sunniest weather conditions, when the gap between solar PV electricity supply and demand usually widens.
The experiment proceeded as follows:
- First, all confirmed participants were informed about the experiment in advance and, after receiving a detailed explanation, were asked to choose freely whether to be a control or a treatment subject. A control subject would receive a smart meter and a tablet PC at the beginning of the experiment and a reward of 2000 Japanese Yen at the end, while a treatment subject would receive a smart meter, a tablet PC, and an initial 7000 points at the beginning, from which points would be subtracted based on actual electric power consumption during the experiment. The treatment subjects could exchange their remaining points for cash at the end of the experiment (one point was equal to one Japanese Yen). This approach to dividing participants into control and treatment groups differed from the previous experiments in summer 2015 and winter 2016, in which the groups were divided randomly. This selection strategy was chosen because some households in Japan can choose between a conventional fixed electricity tariff system and a time-variant dynamic tariff system. We aimed to investigate the different effects on the two groups, taking both current choice options and selection bias into account.
- In terms of the monetary incentive, namely dynamic pricing, three deduction or tariff rates of 20, 40, and 80 points were set. These rates changed daily based on the weather forecast and were set according to the PV power generation potential (high tariff on rainy days and low tariff on sunny days). The 20-point rate (20 points per kWh per person) applied when the weather forecasts for both the preceding and current days included “sunny”. The 40-point rate (40 points per kWh per person) applied when the weather forecast for either the preceding or the current day included “sunny”. The 80-point rate (80 points per kWh per person) applied when the weather forecast for neither the preceding nor the current day included “sunny”.
- Then, experimental data on hourly electric power consumption, frequency of access to tablet PCs, and weather were recorded for 22 control participants and 28 treated participants. In addition, household characteristics were collected through a pre-experimental questionnaire survey. Figure 5 depicts the experimental procedure in July and August 2016.
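The daily deduction rule and the point balance described above can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, and the formula for the daily deduction (rate times household consumption divided by household size, with the balance floored at zero) is an assumed reading of "points per kWh per person", not a specification taken from the experiment.

```python
from typing import Dict, List

# Deduction schedules (points per kWh per person), keyed by how many of the
# two forecasts (preceding day, current day) included "sunny".
SCHEDULE_2015_16: Dict[int, int] = {2: 20, 1: 30, 0: 40}
SCHEDULE_2016: Dict[int, int] = {2: 20, 1: 40, 0: 80}


def deduction_rate(prev_sunny: bool, today_sunny: bool,
                   schedule: Dict[int, int] = SCHEDULE_2016) -> int:
    """Return the day's deduction rate from the two sunny/not-sunny flags."""
    return schedule[int(prev_sunny) + int(today_sunny)]


def remaining_points(daily_kwh: List[float], sunny: List[bool],
                     household_size: int, initial_points: float = 7000.0,
                     schedule: Dict[int, int] = SCHEDULE_2016) -> float:
    """Subtract daily deductions from the initial point allocation.

    sunny[0] is the forecast for the day before the first experimental day,
    so len(sunny) must equal len(daily_kwh) + 1.
    Assumption: daily deduction = rate * (household kWh / household size),
    and the balance cannot fall below zero.
    """
    if len(sunny) != len(daily_kwh) + 1:
        raise ValueError("need one extra forecast for the preceding day")
    points = initial_points
    for day, kwh in enumerate(daily_kwh):
        rate = deduction_rate(sunny[day], sunny[day + 1], schedule)
        points -= rate * kwh / household_size
    return max(points, 0.0)
```

Under these assumptions, a two-person household that uses 10 kWh on a day with two sunny forecasts and 10 kWh on a day with one sunny forecast loses 100 and 200 points, respectively, under the 2016 schedule.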
4. Research Method and Data Description
The previous field experiment in the summer of 2015 and winter of 2016, in which administrators or local leaders decided which households were assigned to control or treatment, was defined as a quasi-experiment because it lacked random assignment. Meanwhile, the experiment in the summer of 2016 divided the confirmed households again into control and treated groups based on their own intentions, which could be motivated by individual or economic factors. This raises the problem of selection bias or hidden bias, which may lead to biased estimates, since true experimental designs are not always possible. (“Hidden bias is essentially a problem created by the omission in statistical analysis of important variables, and omission renders nonrandom the unobserved heterogeneity reflected by an error term in regression equations” [11] (p. 357).) Therefore, this study applied the propensity score analysis with non-parametric regression developed by Heckman et al. [12,13], with the fundamental aim of mitigating selection bias when investigating changes in consumer behavior induced by a monetary incentive, namely dynamic pricing.
This method, also called kernel-based matching, allows estimation of the average treatment effect for the treated using information from all possible control subjects within a predetermined span. A three-step analytic process was employed. In the first step, the conditioning variables suspected of causing an imbalance between the treated and control groups were identified and the propensity scores P(X) were estimated. Then, an analysis of weighted mean differences using kernel or local linear matching through non-parametric regression was employed to match on P(X). Finally, sensitivity analyses and a balancing test based on the matched samples were conducted.
First, the propensity scores were estimated as the predicted probabilities for all observations derived from the fitted propensity score model, here a binary logistic regression. A binary logistic regression describes the conditional probability of receiving treatment as follows:

$$P(D_i = 1 \mid X_i) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)} = \frac{1}{1 + \exp(-X_i \beta)}$$

where $D_i$ denotes the treatment variables, including Treateffect, Dum_20_elas, Dum_40_elas, and Dum_80_elas, $X_i$ is the observable vector of control variables, and $\beta$ is the vector of regression coefficients.
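To make the first step concrete, the following minimal sketch fits a binary logistic regression by plain gradient ascent and returns the fitted propensity scores. It is an illustration rather than the estimation routine used in the study: the function names, the learning rate, and the iteration count are assumptions, and in practice a statistical package would normally be used instead.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_index(beta: Sequence[float], x: Sequence[float]) -> float:
    """Intercept plus the dot product of the coefficients and covariates."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logit(X: Sequence[Sequence[float]], d: Sequence[int],
              lr: float = 0.5, n_iter: int = 5000) -> List[float]:
    """Fit P(D = 1 | X) by gradient ascent on the mean log-likelihood.

    Returns [intercept, coef_1, ..., coef_k]. The step size assumes
    covariates on a modest scale (e.g., standardized or 0/1 dummies).
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            err = di - sigmoid(linear_index(beta, xi))
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X: Sequence[Sequence[float]],
                      beta: Sequence[float]) -> List[float]:
    """Predicted probability of treatment for every observation."""
    return [sigmoid(linear_index(beta, xi)) for xi in X]
```

With a single 0/1 covariate, the fitted scores approach the share of treated observations within each covariate group, which gives a quick sanity check of the routine.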
After the propensity scores are estimated, a matching algorithm must be defined to estimate the missing counterfactual for each treated observation. In this study, to take advantage of the panel data, the kernel-based matching algorithm (including kernel and local linear matching), which was developed from non-parametric regression methods, was used to identify the average treatment effect for the treated (ATT). Specifically, kernel matching uses a kernel estimator to construct the weighted mean for a focal point, while local linear matching uses local linear regression (or lowess) with a tricube kernel function to produce a smooth curve [12,13]. These approaches allow one-to-many matching by calculating the weighted average of the outcome variable for all control cases and then comparing that weighted average with the outcome of the treated cases. The difference between the two terms provides an estimate of the average treatment effect for the treated (ATT), which is given by:

$$\mathrm{ATT} = \frac{1}{n_1} \sum_{i \in I_1} \left[ Y_{1i} - \sum_{j \in I_0} W(i, j)\, Y_{0j} \right]$$

where $n_1$ is the number of treated cases, and $I_1$ and $I_0$ denote the sets of treated and control cases, respectively.
To estimate a treatment effect for each treated case $i$, the outcome $Y_{1i}$ (the outcome for the treatment group) was compared with a weighted average of the outcomes $Y_{0j}$ (the outcome for the control group) for the matched cases $j$ in the untreated sample. Matches were constructed using $W(i, j)$, the weight derived from the distance in the propensity score $P(X)$, estimated by the binary logistic regression on covariates $X$, between a treated case $i$ and each untreated case $j$. $W(i, j)$ was determined by non-parametric regression methods. According to Fan [14], local linear regression is expected to have more promising sampling properties and a higher minimax efficiency compared with kernel matching. Therefore, the local linear regression estimator was deployed in this study to determine the value of $W(i, j)$ and then the average treatment effect for the treated after matching on $P(X)$. In general, this method uses propensity scores derived from multiple matches to calculate a weighted mean that is used as a counterfactual. This implies that kernel-based matching using local linear regression constructs matches using all individuals in the potential control sample in such a way that it takes more information from those who are close matches and down-weights more distal observations. Furthermore, the matching procedure ensures that the treated subjects are matched to the control subjects that are most similar to them in terms of characteristics and, therefore, dissimilar subjects and outliers have no influence on the treatment effects.
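The weighting idea can be illustrated with a small sketch that computes the counterfactual for each treated case from the control cases, either as a tricube-weighted mean (kernel matching) or as the intercept of a tricube-weighted linear fit (local linear matching). The function names are hypothetical, and the bandwidth `h` is simplified here to a fixed half-width on the propensity score scale rather than the span fraction described in the text, so this is a sketch of the technique and not the study's implementation.

```python
from typing import Sequence


def tricube(u: float) -> float:
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def counterfactual(p_i: float, p0: Sequence[float], y0: Sequence[float],
                   h: float, local_linear: bool = True) -> float:
    """Estimate the untreated outcome at propensity score p_i from controls."""
    k = [tricube((pj - p_i) / h) for pj in p0]
    s0 = sum(k)
    if s0 == 0.0:
        raise ValueError("no control case within the bandwidth")
    t0 = sum(kj * yj for kj, yj in zip(k, y0))
    if not local_linear:
        return t0 / s0  # kernel matching: W(i, j) = k_j / sum of k
    x = [pj - p_i for pj in p0]  # distances centred on the treated case
    s1 = sum(kj * xj for kj, xj in zip(k, x))
    s2 = sum(kj * xj * xj for kj, xj in zip(k, x))
    t1 = sum(kj * xj * yj for kj, xj, yj in zip(k, x, y0))
    den = s0 * s2 - s1 * s1
    if abs(den) < 1e-12:
        return t0 / s0  # fall back to the kernel mean if the fit is degenerate
    return (s2 * t0 - s1 * t1) / den  # intercept of the weighted linear fit


def att(p1: Sequence[float], y1: Sequence[float],
        p0: Sequence[float], y0: Sequence[float],
        h: float, local_linear: bool = True) -> float:
    """Mean over treated cases of outcome minus estimated counterfactual."""
    diffs = [yi - counterfactual(pi, p0, y0, h, local_linear)
             for pi, yi in zip(p1, y1)]
    return sum(diffs) / len(diffs)
```

A useful property for checking the sketch is that, when the control outcomes are exactly linear in the propensity score, the local linear estimator reproduces that line at every treated score, whereas the kernel mean does so only when the weights are symmetric.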
Upon completing the matching estimation, sensitivity analyses and a balancing test for propensity score matching were employed in the final step to check the robustness and adequacy of the results. In particular, sensitivity analyses with different bandwidth specifications and different trimming schedules were used to confirm the results and test the sensitivity of the findings to these variations. For the bandwidth analysis (the bandwidth is the fraction that determines the number of observations falling into a span), three values of 0.01, 0.05, and 0.8 were used. For the trimming analysis (trimming imposes a common support by dropping treated observations whose propensity scores fall outside the lower end of the common support region and non-treated observations whose propensity scores fall outside the upper end of the common support region), three trimming schedules that discard 2%, 5%, and 10% of study observations at the two ends were used [11]. Furthermore, a balancing test was applied to check whether the propensity score is an adequate balancing score, that is, to check the overall quality of the estimation. Among the variety of balance tests, the standardized test of differences was employed in this study. The test was first described by Rosenbaum and Rubin [15] to check the balance between the treated and control groups using the following formula for the standardized differences:

$$B_{\text{before}}(X) = 100 \cdot \frac{\bar{X}_T - \bar{X}_C}{\sqrt{\left[V_T(X) + V_C(X)\right]/2}}, \qquad B_{\text{after}}(X) = 100 \cdot \frac{\bar{X}_{TM} - \bar{X}_{CM}}{\sqrt{\left[V_T(X) + V_C(X)\right]/2}}$$

where, for each covariate, $\bar{X}_T$ and $\bar{X}_C$ are the sample means for the full treated and control groups, $\bar{X}_{TM}$ and $\bar{X}_{CM}$ are the sample means for the matched treated and control groups, and $V_T(X)$ and $V_C(X)$ are the corresponding sample variances. $B_{\text{before}}(X)$ and $B_{\text{after}}(X)$ are defined as the percentage standardized difference, or bias, between the treated and control groups before and after matching, respectively. The standardized difference is the difference in means of a conditioning variable $X_i$ between the treated and control groups, divided by the square root of the average of the variances in the original samples, which allows the differences in $X$ before and after matching to be compared. Rosenbaum and Rubin also suggest that the matching quality can be evaluated by the reduction in the standardized difference. If the differences remain, then the propensity score model should be estimated using a different approach, a different matching algorithm should be used, or both.
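The balance check can be sketched as follows for a single covariate. The function names are hypothetical, and the matched-sample means are computed here as simple unweighted means, which is a simplifying assumption: with kernel-based matching the matched control mean would normally be a weighted mean.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def standardized_bias(x_treated: Sequence[float], x_control: Sequence[float],
                      v_treated: float, v_control: float) -> float:
    """Percentage standardized difference in means for one covariate."""
    pooled_sd = sqrt((v_treated + v_control) / 2.0)
    return 100.0 * (mean(x_treated) - mean(x_control)) / pooled_sd


def balance_before_after(xt_full: Sequence[float], xc_full: Sequence[float],
                         xt_matched: Sequence[float],
                         xc_matched: Sequence[float]) -> Tuple[float, float]:
    """Return (bias before matching, bias after matching).

    Both use the sample variances of the full (unmatched) groups in the
    denominator, as in the formula above.
    """
    vt, vc = variance(xt_full), variance(xc_full)
    before = standardized_bias(xt_full, xc_full, vt, vc)
    after = standardized_bias(xt_matched, xc_matched, vt, vc)
    return before, after
```

A smaller absolute value after matching than before indicates that matching has improved the balance on that covariate.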
The authors believe that using all observations in full panel data may provide the best matching since subjects with different demographic characteristics may reach the same action or behavior in hourly electric consumption in the same experimental period while subjects with the same demographic characteristics may vary in their action or behavior during the pre-experimental period and during the experimental period.
4.1. Data Description
This study used panel data on hourly household electricity consumption for the 14 days before the experiment and the 14 days during the experiment. Table 1 presents brief definitions and the measurement sources of the outcome variable, treatment variables, and control variables in the study.
Regarding the outcome variable of Lneleccon, the logarithm of hourly electric power consumption was used to estimate the percentage of change in electricity consumption between the control and treated group.
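The link between a log outcome and a percentage change can be written out briefly. As a sketch, let $y_T$ and $y_C$ denote the consumption of a treated and a comparison observation and let $\delta$ denote the difference in Lneleccon between them; these symbols are introduced here for illustration and are not taken from the study.

$$\delta = \ln y_T - \ln y_C \quad \Longrightarrow \quad \frac{y_T - y_C}{y_C} = e^{\delta} - 1 \approx \delta \quad \text{for small } |\delta|$$

For a hypothetical $\delta = -0.20$, the exact change is $e^{-0.20} - 1 \approx -0.181$, a reduction of about 18.1%, so the approximation by $\delta$ itself becomes less accurate as the effect grows.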
With respect to the treatment variables, the study used four main treatment variables, Treateffect, Dum_20_elas, Dum_40_elas, and Dum_80_elas, to estimate the treatment effects of dynamic pricing with specific deduction rates on changes in consumer behavior in electric power consumption.
In terms of control variables, demographic variables such as the number of people in the family and the numbers of air conditioners, refrigerators, and commercial refrigerators were expected to have positive impacts on electric power consumption. In addition, electric power consumption was expected to be higher for households living in wooden houses. Moreover, the variable related to real-time feedback (i.e., frequency of access to tablet PCs) was hypothesized to have a negative impact on electric power consumption; that is, the more often a household accessed the tablet PC to check the real-time feedback information and its actual consumption, the more energy it was expected to save. Other variables describing weather conditions, namely the cooling degree hour, hourly mean wind speed, and hourly mean temperature, were also expected to affect consumers’ electric power consumption.
The descriptive statistics of the hourly electric power consumption, the frequency of access, and the demographic characteristics of the control and treated groups in the study are shown in Table 2.
To estimate the differences between the control and treated participants, a difference test in means was used. The p-values of the t-statistics in Table 3 show statistically significant differences between the control and treated participants in the main variable of hourly electric power consumption and in all observable variables at either the 0.01 or the 0.05 significance level. More specifically, the average hourly consumption of the control participants is significantly higher than that of the treated participants. The numbers of air conditioners, refrigerators, and commercial refrigerators used by the control participants are substantially higher than those used by the treated participants. Additionally, the control participants are more likely to live in wooden houses than the treated participants. On the other hand, the control group has clearly fewer household members and a lower frequency of access to tablet PCs than the treated group. Furthermore, ownership of a happy-e contract (discount after 22:00) is significantly lower within the control group than within the treated group. Consequently, these differences imply that selection bias is clearly present in the experimental design. The propensity score analysis approach is therefore necessary to mitigate the problem of selection bias in estimating the treatment effects in the study.
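A difference test in means of this kind can be sketched as a two-sample t-statistic. The version below uses Welch's unequal-variance form, which is an assumption, since the text does not state which variant was applied; the function name is hypothetical, and the p-value is left out because it requires the t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means.
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

Called with the control group as `a` and the treated group as `b`, a large positive statistic would correspond to higher mean consumption among control participants.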
4.2. Difference Test in Means among Treatment Variables
In terms of the treatment variable Treateffect, Table 4 reports that the average difference in hourly electric power consumption between the treated participants during the experimental period and both themselves during the pre-experimental period and the control participants is positive and statistically significant at the 0.01 level. This means that the treated participants’ hourly electric power consumption during the experimental period is 9.5% lower than their own consumption during the pre-experimental period and the control participants’ consumption.
In terms of the effects of dynamic pricing with the three different deduction rates, Table 5 shows that the average difference in hourly electric power consumption between the treated participants on 20-point deduction days during the experimental period and both themselves during the pre-experimental period and the control participants is positive and statistically significant at the 0.01 level. This implies that the treated participants’ hourly electric power consumption on 20-point deduction days during the experimental period is approximately 4% lower than their own consumption during the pre-experimental period and the control participants’ consumption.
Similarly, the result in Table 6 indicates that the average hourly electric power consumption of the treated participants on 40-point deduction days during the experimental period is significantly lower, by 7%, than their own consumption during the pre-experimental period and that of the control participants.
The result in Table 7 shows that the average hourly electric power consumption of the treated participants on 80-point deduction days during the experimental period is significantly lower, by 8.5%, than their own consumption during the pre-experimental period and that of the control participants.
According to the results of the difference test in means, the incentive of dynamic pricing via real-time feedback with specific deduction rate demonstrates significant impacts on participating consumers’ energy-saving or reduction of their electric power consumption.
6. Conclusions and Policy Implications
The study estimated the effects of dynamic pricing via real-time information feedback on electric consumer behavior on a remote island of Japan and explored the potential utility of solar resources as a basis for energy policy.
Using panel data on 50 households in the pre-experimental and experimental periods, propensity score analysis with non-parametric regression (i.e., local linear regression) was employed to measure the treatment effects of the monetary incentive of dynamic pricing via real-time feedback on hourly electric power consumption. The results obtained from the propensity score analysis approach reveal that the incentive of dynamic pricing had statistically significant effects on the reduction of consumers’ electric power consumption. More specifically, treated participants tended to reduce their electric power consumption during the experimental period by 9.6% compared to both themselves during the pre-experimental period and the control participants.
Furthermore, the results confirmed that the higher the tariff (e.g., point deduction rates) is, the less electric power the household consumes. Particularly, in terms of the incentive of the 20-point-deduction rate, treated households are more likely to reduce their consumption, by 5.0%, during the experimental period than both themselves during the pre-experimental period and the control households. With respect to the incentive of the 40-point-deduction rate, treated households tend to reduce their consumption by 7.2% during the experimental period compared to themselves during the pre-experimental period and the control households. Regarding the 80-point-deduction rate, treated households are also more likely to reduce their consumption, by 9.8%, during the experimental period than themselves during the pre-experimental period and the control households.
In addition, the propensity score analysis approach with local linear matching appears well suited to estimating the treatment effects of dynamic pricing with specific deduction rates from non-randomized panel data, because it uses information from all control participants when matching with treated participants. These major results are also consistent with the previously mentioned literature, which shows that dynamic pricing has substantial effects on consumer behavior by reducing electric power consumption.
The results of this study suggest the policy implications of a demand management system to accommodate the solar energy output fluctuation by shifting consumption to days having more solar radiation and reducing electric power consumption at the same time. The limitations of this study include a relatively short experiment period and a small sample size. An additional larger and longer experiment is required to accurately depict the real-life behavior of the consumers.