Electricity Consumption Forecast Model Using Household Income: Case Study in Tanzania

Abstract: When considering the electrification of a particular region in a developing country, the electricity consumption in that region must be estimated. In sub-Saharan Africa, which has one of the lowest electrification rates in the world, the villages of minority groups are scattered over a vast area of land, so electrification using distributed generators is being actively studied. Specifically, constructing a microgrid or introducing a solar system to each household is being considered. In this case, the electricity consumption of each area needs to be estimated so that a system with sufficient capacity can be introduced. In this study, we propose a household income electricity consumption model to estimate the electricity consumption of a specific area. We first estimate the electricity consumption of each household based on its income; the electricity consumption of a specific area is then derived by summing these estimates over the households in that area. Through a case study in Tanzania, electricity consumption derived using this model was compared with electricity consumption published by TANESCO, and the validity of the model was verified. We forecasted the electricity consumption in each region using the household income electricity consumption model, and the average forecast accuracy was 74%. The accuracy was 87% when the electricity consumption in Tanzania mainland was forecasted by adding the predicted values.


Overview
In 2018, 860 million people, or approximately 11% of the world's population, did not have access to electricity. In comparison, 46% of Africa's population did not have access to electricity and, in particular, 55% of the people in sub-Saharan Africa did not have access to electricity. Sub-Saharan Africa is considered one of the least electrified areas in the world [1]. In this study, we focused on sub-Saharan Africa and considered how to proceed with electrification. Research into electrification methods is important, including how much electric energy is necessary in the area to be electrified, because equipment must be introduced in accordance with the electricity demand of the area.
In sub-Saharan Africa, where the population is spread over a large area, the introduction of distributed power sources into each region has often been considered. In such cases, the electricity consumption of a specific area, rather than of the whole country, must be considered. In this study, we constructed a model (the household income electricity consumption model) to estimate electricity consumption according to household income. Electricity consumption in a specific area can then be estimated by summing the per-household estimates over the households in that area.

Literature Review
Many studies have estimated the electricity demand or consumption of a country, including for the purpose of electrification in developing countries. Several methods can be used to estimate electricity demand or consumption, and Bazilian et al. [2] categorized the methods into six types: trend method, end-use method, agent-based models, time series method, econometric method, and neural network techniques. For the end-use method (or engineering-based method), the following studies have been conducted. da Silva et al. [3] estimated the electricity consumption of the pulp and paper industrial sub-sector in Brazil up to 2035 based on a hierarchical structure using the FORECAST software. Debnath et al. [4] estimated household energy consumption, not limited to electricity, in Bangladesh based on the number of typical appliances owned and their operating hours. Regarding the econometric method, Amarawickrama and Hunt [5] used six econometric models to forecast peak demand and electricity consumption in Sri Lanka up to 2025. Among models using neural networks, Azadeh et al. [6] used an artificial neural network (ANN) to forecast the annual electricity consumption of energy-intensive manufacturing industries in Iran. Ogcu et al. [7] estimated monthly electricity demand in Turkey using an ANN and support vector regression, and verified the prediction accuracy. Inglesi [8] calculated electricity consumption in South Africa up to 2030 using the Engle-Granger methodology. Morales-Acevedo [9] estimated the electricity consumption of Mexico in 2050 using a model based on the evolution of population, gross domestic product per capita, and energy intensity. Rosnes and Vennemo [10] expressed the electricity consumption of sub-Saharan Africa as the sum of market demand, suppressed demand, and new connections. McQueen et al. [11] assumed that each household's electricity demand follows the gamma distribution, and estimated the maximum electricity demand on distribution networks using Monte Carlo simulation.
Regarding the relationship between income and electricity consumption discussed in this study, McNeil and Letschert [12] and Daioglou et al. [13] constructed models that describe the relationship between household income and appliance ownership rate using the Gompertz curve. To determine the capacity of a minigrid for electrification, electricity consumption can be estimated by survey, but the accuracy of this approach has been reported to be low.
Blodgett et al. [14] compared survey-predicted electrical energy use with the actual measured consumption of customers of eight minigrids in rural Kenya. They showed that the predictions of the general survey approach were poor, with a mean absolute error of 426 Wh/day per customer on a mean consumption of 113 Wh/day per customer. They also showed that an alternative data-driven proxy village approach, which uses the average customer consumption of each minigrid to predict consumption at other minigrids, was more accurate and reduced the mean absolute error to 75 Wh/day per customer. Hartvigsson and Ahlgren [15] compared load profiles and performance metrics based on interviews and on measurements relating to a rural minigrid in Tanzania. Their study showed distinct differences between load profiles based on interviews and those based on measured data, and presented suggestions for how the interview process can be improved.
The purpose of this study is to predict the electricity consumption necessary for electrification in a specific area. Many studies estimated electricity consumption for a whole country, and these methods are not suitable for forecasting electricity consumption in a specific area of a country. Methods that forecast electricity consumption from historical consumption data cannot be applied to new areas that are not yet electrified or where no historical data exist. We used as a reference one study in which the electricity consumption of a household was estimated from the number of home appliances owned [4]. For the relationship between income and the number of home appliances owned, we referred to the models in [12,13]. The electricity consumption of each appliance was assumed to follow the gamma distribution, in the same way that the demand at each home was assumed to follow the gamma distribution in [11]. To estimate electricity consumption using the proposed model, it is sufficient to conduct a survey of income in the target area, and a survey of income and number of appliances owned in already electrified areas similar to the target area.

Modelling Electricity Consumption per Household by Income
We developed a new model for estimating the annual electricity consumption of a certain area, named the household income electricity consumption model. By summing the electricity consumption of a group of households, we estimated the electricity consumption of the group. Figure 1 depicts a flowchart of the model. Our model uses statistical data to consider the change in the number of appliances owned as income increases and derives electricity consumption as a probability distribution. To construct the household income electricity consumption model, we required data on household income, food expenditure, and the number of electric appliances owned in a target area. First, we derived the relationship between household income and the number of owned household appliances. We assumed that the annual energy consumption of each household appliance follows the gamma distribution, and that the frequency of use, the amount of electricity consumed, and preferences do not change with income. To calculate annual electricity consumption, we considered the rated output of home appliances and the amount of time they are used in a year; we assumed, in this model, that neither changes with income. Let E be the set of electric appliances to be evaluated in the model. The elements of E are refrigerator, washing machine, TV, and so on. Although a model can estimate electricity consumption more accurately if E consists of as many products as possible, there is a limit to the actual data that can be acquired. However, it is sufficient if E covers products that are generally owned by high-income groups in the target area and have a significant influence on household electricity consumption.
Appliances with high electricity consumption that are not owned by many households in the high-income group do not need to be included in E and, conversely, appliances that are owned by many households in the high-income group but have quite low electricity consumption also do not need to be included in E. Therefore, when constructing the model, it must be checked whether E contains sufficient appliances.

Appliance Ownership Model
We modeled which appliances households tend to own according to income in the target area. In [12,13], the relationship between household income (or expenditure) and the ownership rate of electric appliances was examined, and the rate was approximated using the Gompertz curve. In this study, we used the Gompertz curve based on these results. For household income I and arbitrary a ∈ E, the number of owned appliances per household S(a, I) is given by:
$$S(a, I) = S_{\max}(a)\, \exp\left(-\alpha \exp(-\beta I)\right) \qquad (1)$$
where α and β are coefficients that determine the shape of the Gompertz curve. S(a, I) increases monotonically and converges to S_max(a) as I increases. S_max(a) is the number of appliances in a household when it has sufficient income. In this study, S_max(a) is the average number of a owned by the top few percent of households by income, for example, the top 1% of the incomes in the given data. We perform a regression using Equation (1) to obtain α and β.
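As a minimal sketch of the ownership curve, the following Python function evaluates a Gompertz curve in the common form S(a, I) = S_max(a) exp(−α exp(−β I)). The exact functional form, the function name, and the coefficient values are assumptions for illustration only; they are not the fitted values of Table 1.

```python
import math


def gompertz_ownership(income, s_max, alpha, beta):
    """Expected number of units of one appliance owned at a given income.

    Assumed form: s_max * exp(-alpha * exp(-beta * income)), alpha, beta > 0.
    """
    return s_max * math.exp(-alpha * math.exp(-beta * income))


# Hypothetical coefficients, chosen only to show the shape of the curve.
S_MAX, ALPHA, BETA = 1.2, 4.0, 5e-4

for income in (250, 2_000, 5_000, 10_000, 50_000):
    units = gompertz_ownership(income, S_MAX, ALPHA, BETA)
    print(f"income={income:>6}  expected units owned={units:.3f}")
```

With positive α and β the curve starts near zero at low income, rises monotonically, and saturates at S_max(a), which is the behavior described above.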

Electricity Consumption per Household Estimation Model
We assumed that the annual electricity consumption of a household (kWh/year/hh) for each a ∈ E follows a gamma distribution. To choose this distribution, we examined which continuous probability distribution is appropriate for expressing annual energy consumption. Even for the same appliance, usage time differs between households and rated output differs between products. Therefore, annual electricity consumption varies by household, but it is never below zero, which corresponds to no use. Households with many hours of annual use and a large rated output of the products they own will have a considerably large annual electricity consumption. However, this is limited to a few households, and the greater the electricity consumption, the fewer such households there are. Many households are expected to consume a similar amount of electricity each year. Based on the above discussion, two non-negative, long-tailed continuous probability distributions were considered: the gamma distribution and the log-normal distribution. However, the log-normal distribution has a relatively heavier tail and is therefore more likely to produce unusually large values of annual energy consumption, so we adopted the gamma distribution.
To simplify the model, we assumed that the electricity consumption of each home appliance is independent. A gamma distribution Gamma(κ, θ) is a continuous probability distribution defined for values of the variable greater than 0. It is characterized by two parameters, a shape parameter κ > 0 and a scale parameter θ > 0, and its probability density function is
$$f(x) = \frac{x^{\kappa - 1} e^{-x/\theta}}{\Gamma(\kappa)\, \theta^{\kappa}}, \quad x > 0 \qquad (2)$$
where Γ(κ) is the gamma function. κ and θ are coefficients that determine the shape of the probability density function. The distribution function is expressed by
$$F(x) = \frac{\gamma(\kappa, x/\theta)}{\Gamma(\kappa)} \qquad (3)$$
where γ(κ, x/θ) is the lower incomplete gamma function. In a gamma distribution, the mean is µ = κθ and the standard deviation (SD) is σ = √κ θ, so κ and θ are derived from µ and σ as κ = µ²/σ² and θ = σ²/µ.
Once κ_a and θ_a are found for each a, the distribution Gamma(κ_a, θ_a) of electricity consumption is determined. As the appliance ownership model (Section 2.1) gives the number owned S(a, I) for each a and income I, the household electricity consumption D(I) (kWh/year/hh) at income I is
$$D(I) = \sum_{a \in E} S(a, I)\, X_a \qquad (4)$$
where X_a ∼ Gamma(κ_a, θ_a) is sampled for each a ∈ E. Although a relationship between income and electricity consumption may exist for some appliances, we assumed that there is none, to simplify the model. However, the electricity consumption sampled above is not yet adequate on its own. We must set upper bounds on electricity consumption, one defined by the owned appliances and one by the money available to pay the electricity bill. The former is simple: the upper bound is 24 × 365 hours times the rated output of all owned home appliances. For the latter, the upper bound is the amount of household income minus food expenditure. If an amount exceeding the upper bound is sampled, it is resampled from a uniform distribution from zero to the upper bound (resampling from the same gamma distribution may fail to yield a value below the upper bound even after many trials). In reality, households have to pay an electricity bill to use electricity (households that purchase a solar home system (SHS) also have to pay for its cost; in this paper, we call the cost necessary to use electricity the "electricity bill"). The electricity consumption derived only from the number of appliances owned tends to be too large, so the upper bound derived from electricity bill expenditure should also be applied. If investment in electrification equipment is planned based on electricity consumption derived without the upper bound, the result will be overinvestment in equipment.
The purpose of the upper bound based on electricity bill expenditure is to avoid such overinvestment, and from this viewpoint a smaller upper bound is preferable. On the other hand, if the upper bound is too low and investment is planned on that basis, the electricity supply will be insufficient. Therefore, when the appropriate value is uncertain, the upper bound should be set on the higher side.
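The sampling step with an upper bound can be sketched as follows. This is an illustrative Python sketch, not the authors' Java implementation: the function names, the dictionary layout, and all numeric values are hypothetical, and combining the appliance-based and budget-based bounds with `min` is an assumption, since the text does not state how the two bounds are combined.

```python
import random

HOURS_PER_YEAR = 24 * 365


def physical_upper_bound_kwh(ownership, rated_watts):
    """Cap from running every owned unit at rated output all year (kWh/year)."""
    return sum(
        units * rated_watts[a] * HOURS_PER_YEAR / 1000.0
        for a, units in ownership.items()
    )


def sample_household_consumption(ownership, gamma_params, upper_bound_kwh, rng):
    """Draw one sample of D(I) in kWh/year.

    ownership    : dict appliance -> S(a, I), units owned at income I
    gamma_params : dict appliance -> (kappa, theta) of annual kWh per unit
    """
    total = sum(
        units * rng.gammavariate(*gamma_params[a])  # shape kappa, scale theta
        for a, units in ownership.items()
    )
    if total > upper_bound_kwh:
        # Resample from Uniform(0, upper bound) instead of redrawing the gammas.
        total = rng.uniform(0.0, upper_bound_kwh)
    return total


rng = random.Random(42)
ownership = {"tv": 0.9, "refrigerator": 0.4}                      # hypothetical
gamma_params = {"tv": (4.0, 30.0), "refrigerator": (9.0, 40.0)}   # hypothetical
rated_watts = {"tv": 80.0, "refrigerator": 150.0}                 # hypothetical
budget_cap_kwh = 900.0                                            # hypothetical
# Assumption: the effective cap is the smaller of the two bounds.
cap = min(budget_cap_kwh, physical_upper_bound_kwh(ownership, rated_watts))
print([round(sample_household_consumption(ownership, gamma_params, cap, rng), 1)
       for _ in range(5)])
```

Resampling from a uniform distribution guarantees termination in a single step, at the cost of flattening the distribution below the cap for the households that hit it.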

Case Study on Mainland Tanzania
We estimated the electricity consumption in each region using actual income data from mainland Tanzania, and verified the accuracy by comparison with the actual electricity consumption data. The United Republic of Tanzania is located in East Africa, and is one of the least developed countries in the world [16], with a GDP of approximately USD 58.001 billion (2018) [17]. The population of Tanzania is 44.93 million, and the population density is 51 persons/km² (2012) [18]. The electrification rate is 37% (2018) [1]. The electricity in mainland Tanzania is supplied by TANESCO (Tanzania Electric Supply Company Limited, Dar es Salaam, Tanzania). Tanzania was a suitable country for a case study because of its low rate of access to electricity and low level of economic development. The United Republic of Tanzania consists of mainland Tanzania and Zanzibar. We conducted a simulation case study only on the mainland. A model was constructed based on household income data obtained from [19], and the electricity consumption predicted from this model was compared with the actual electricity consumption data described in [20] to verify the validity of the model.

Income Data
We examined the changes in electricity consumption corresponding to changes in household income based on the theory in Section 2. In this study, we calculated household income using the data from [19]. The data came from socioeconomic surveys covering items such as agricultural production, non-agricultural income, and consumption. The data included at least 5000 households, with income categorized by the following sources: agricultural business, small business, wages, and other income. Income in each category varies periodically, so we adjusted each category to an annual basis and used the sum over the categories as each household's income.

Upper Bound of Electricity Consumption
We derived the relationship between household income and food expenditure so that gross income minus food expenditure could be used as the upper bound of the electricity bill expenditure. To do so, we identified the ratio of food expenditure to household income. We excluded non-electrified households so that the model considers only electrified households. Data on food expenditure in the past week were obtained from [19], and these data were used to calculate the amount of food expenditure for one year. We then used this result to obtain the ratio of food expenditure to household income.
We fitted a logarithmic curve to the relationship between household income and food expenditure as follows. Let x be the annual household income and y be the ratio of food expenditure to total household expenditure; then
$$y = -0.12 \log(x) + 1.40 \qquad (5)$$
where R² is 0.31. Figure 2 shows the data plots and the obtained curve. We used the TANESCO T1 electricity unit cost of 306 TSH/kWh (USD 0.14/kWh) to convert electricity bill expenditure into electricity consumption. As described in Section 2.2, the upper bound should be set to an appropriate or larger value, and the following three assumptions were made in this case study.
(1) All income except food expenditure is allocated to paying the electricity bill.
(2) Food expenditure is calculated from Equation (5). For this equation, R² = 0.31, which is quite small. At low incomes, many actual values lie above the value obtained from the equation and, conversely, from middle incomes upward, actual values often fall below it.
(3) The electricity bill expenditure is converted to electricity consumption using the electricity cost of TANESCO.
Regarding (1), it is hard to imagine that households actually pay such a large electricity bill, but this is consistent with the policy of setting the upper bound to a larger value. Regarding (2), in the low-income group, which accounts for a large number of people in Tanzania, many actual households pay more for food than the value derived from the equation. Therefore, adopting the value derived from the equation means that food expenditure is lower than in reality, namely, that the household can pay more for the electricity bill. This leads to a higher upper bound. Regarding (3), TANESCO's electricity tariff is inexpensive relative to what the electrification of un-electrified areas would cost. In other words, for the same electricity bill expenditure, the calculated electricity consumption will be higher, which again leads to a larger upper bound. All three assumptions push the upper bound toward higher electricity consumption, so when electrification is planned based on the consumption derived from the upper bound, electricity shortages will not occur.
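The three assumptions translate into a short calculation of the budget-based upper bound. The Python sketch below is illustrative only. It assumes that the logarithm in Equation (5) is the natural logarithm, that income x is expressed in USD per year, and that y can be treated as the share of income spent on food; none of these is stated explicitly in the text. Clipping the food share to the interval [0, 1] and the function names are also additions made here.

```python
import math

TARIFF_USD_PER_KWH = 0.14  # TANESCO T1 unit cost quoted in the text


def food_share(income_usd):
    """Equation (5), assuming natural log and income in USD/year.

    The result is clipped to [0, 1]; the clipping is an assumption added
    so that the share stays meaningful at very low or very high incomes.
    """
    y = -0.12 * math.log(income_usd) + 1.40
    return min(1.0, max(0.0, y))


def budget_upper_bound_kwh(income_usd):
    """Upper bound on annual consumption if all non-food income buys electricity."""
    non_food_usd = income_usd * (1.0 - food_share(income_usd))
    return non_food_usd / TARIFF_USD_PER_KWH


for income in (500, 2_000, 10_000):
    print(income, round(food_share(income), 3), round(budget_upper_bound_kwh(income)))
```

Because every assumption errs toward a larger budget for electricity, the bound produced this way is deliberately generous.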

Relationship between Household Income and Number of Household Appliances
The difference in the number of household appliances owned according to household income is discussed in Section 2.1. It is important to consider E when constructing this model. In this simulation, E contains 14 items, of which 11 are from the results of the questionnaire in [19] asking each household what they own, and 3 are generally deemed important for estimating electricity consumption: washing machines, water heaters, and lights [21]. Here, E = {complete music system, computer, dish antenna decoder, fan/air conditioner, iron (charcoal or electric), radio and radio cassette, refrigerator or freezer, sewing machine, telephone (mobile), television, DVD player, washing machine, water heater, lights}.
Based on the data in [19], we analyzed only the households that are electrified, which numbered 1091. Household income and the ownership of the 11 surveyed items a of E were arranged in income order, and we took the average ownership among the 10 highest-income households (approximately the top 1% of the data) as S_max(a). Then, we calculated the average household income and average ownership for each group of 100 households in order of income, performed a regression of the Gompertz curve on these averages, and estimated α and β for each item. There were no data for washing machines, water heaters, and lights in [19]. Therefore, for the value of S_max(a) of these items, we adopted the high-income value described in [21]. For their α and β, we adopted the values of appliances that have data in [19] and seem to show a similar change in ownership: the values for a washing machine were assumed to be the same as those for an iron (charcoal or electric), the values for a water heater the same as those for a refrigerator or freezer, and the values for lights the same as those for a telephone (mobile). Table 1 shows the estimated parameters. R² is the coefficient of determination, and for most a, the data are well represented by the Gompertz curve model. Figure 3 shows the relationship between household income and the number of owned household appliances.
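The data preparation described above (top-10 average for S_max(a), averages per 100 households, then a Gompertz regression) can be sketched in Python as follows. The text does not say which regression procedure was used; the sketch assumes the Gompertz form S = S_max exp(−α exp(−β I)) and fits it by ordinary least squares on the linearized relation ln(−ln(S / S_max)) = ln α − β I, which is one possible choice among several. All function names are hypothetical.

```python
import math


def top_n_mean(records, n_top=10):
    """Mean units owned among the n_top highest-income households.

    records: list of (income, units_owned) tuples.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)[:n_top]
    return sum(units for _, units in ranked) / len(ranked)


def bin_means(records, bin_size=100):
    """(mean income, mean units owned) per bin of households sorted by income."""
    ranked = sorted(records, key=lambda r: r[0])
    bins = []
    for i in range(0, len(ranked), bin_size):
        chunk = ranked[i:i + bin_size]
        bins.append((sum(inc for inc, _ in chunk) / len(chunk),
                     sum(u for _, u in chunk) / len(chunk)))
    return bins


def fit_gompertz(points, s_max):
    """Least-squares fit of ln(-ln(S / s_max)) = ln(alpha) - beta * I.

    Points with S <= 0 or S >= s_max are skipped because the transform
    is undefined there. Returns (alpha, beta).
    """
    xs, ys = [], []
    for income, s in points:
        if 0.0 < s < s_max:
            xs.append(income)
            ys.append(math.log(-math.log(s / s_max)))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope
```

A typical call chain would be `s_max = top_n_mean(records)`, then `fit_gompertz(bin_means(records), s_max)`. A nonlinear least-squares fit on the untransformed curve would weight the bins differently and may give somewhat different coefficients.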

Relationship between Household Income and Electricity Consumption
We then prepared the distribution of electricity consumption (gamma distribution) for each electric appliance. To do so, we specified, independently of income, a standard rated output and a heavy-user rated output for each household electrical appliance, together with its hours of use per day. In this model, although the number of household electric appliances owned changes according to income, the type of product and its use are independent of income and depend on the users, who we assumed are stochastically distributed. Higher-income households tend to own high-grade products; however, these are not necessarily products with large outputs, and their frequency of use is low. Electricity consumption is expressed by the output of each product and its usage time. Therefore, we prepared the rated output for two patterns, standard consumption and heavy-user consumption, and the average usage time per day for each electric appliance (Table 2). We prepared this table referring to the information in [21] and the products sold in [22]; where there were no data, we made reasonable assumptions. In this case study, we compare the results with the electricity consumption of TANESCO in 2010. Therefore, we assumed values reflecting trends as of the early 2010s. We derived the parameters of each gamma distribution of annual electricity consumption using the standard "Consumed electricity (kWh/year)" in this table as the mean µ, and the difference between the standard and heavy-user "Consumed electricity (kWh/year)" as 2σ.
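The parameter derivation in the last sentence can be written out directly. For a gamma distribution with shape κ and scale θ, the mean is κθ and the variance is κθ², so κ = µ²/σ² and θ = σ²/µ. The Python sketch below applies this with µ set to the standard consumption and 2σ set to the gap between heavy-user and standard consumption; the function names and the example wattages and hours are hypothetical and are not taken from Table 2.

```python
def annual_kwh(rated_watts, hours_per_day):
    """Annual consumption of one unit from rated output and daily usage time."""
    return rated_watts * hours_per_day * 365 / 1000.0


def gamma_params_from_usage(standard_kwh, heavy_kwh):
    """Shape kappa and scale theta with mean = standard and 2*SD = heavy - standard."""
    mu = standard_kwh
    sigma = (heavy_kwh - standard_kwh) / 2.0
    kappa = (mu / sigma) ** 2
    theta = sigma ** 2 / mu
    return kappa, theta


# Hypothetical appliance: 80 W for 4 h/day (standard), 120 W for 6 h/day (heavy).
standard = annual_kwh(80, 4)
heavy = annual_kwh(120, 6)
kappa, theta = gamma_params_from_usage(standard, heavy)
print(f"mean={standard:.1f} kWh/year  kappa={kappa:.3f}  theta={theta:.3f}")
```

By construction κθ reproduces the standard consumption, and the heavy-user value sits two standard deviations above the mean.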

Prediction of Household Electricity Consumption Using the Model
Using the model, we constructed a graph and confirmed how annual household electricity consumption changes with income. We sampled 5000 electricity consumption values for each income level in increments of USD 500, ranging from USD 250 to USD 14,750. The sampled values were then plotted as box plots.
For each income level and appliance a ∈ E, the number of units owned was derived from the Gompertz curve and Table 1, as discussed in Section 2.1. Then, the electricity consumption of a was sampled from the gamma distribution based on Table 2, as discussed in Section 2.2. By summing, over all a, the product of the number of units owned and the sampled electricity consumption, we obtained a sample of the household electricity consumption at that income. If it exceeded the upper bound, it was resampled from the uniform distribution. The simulation was implemented in Java, and Apache Commons Math [23] was used as a library for random number generation. A plot of the relationship between annual household income in USD and annual electricity consumption in kWh/year is shown in Figure 4. At an annual household income of USD 10,000, the median annual consumption is 3000 kWh, after which it increases gradually.
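The sweep over income levels can be sketched as below. The authors implemented the simulation in Java with Apache Commons Math; this Python sketch only mirrors the sampling schedule (30 income levels, a fixed number of draws per level) and summarizes each level with the five numbers that a box plot displays. The `toy_sampler` is a purely hypothetical stand-in for the household model and does not reproduce Figure 4.

```python
import random
import statistics


def income_levels(start=250, stop=14_750, step=500):
    """Income grid in USD: 250, 750, ..., 14,750."""
    return list(range(start, stop + 1, step))


def five_number_summary(samples):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return min(samples), q1, median, q3, max(samples)


def sweep(sampler, n_samples=5000, seed=0):
    """sampler(income, rng) returns one draw of D(I); result maps income to summary."""
    rng = random.Random(seed)
    return {
        income: five_number_summary([sampler(income, rng) for _ in range(n_samples)])
        for income in income_levels()
    }


def toy_sampler(income, rng):
    # Hypothetical placeholder: gamma draw whose mean is 10% of income.
    return rng.gammavariate(4.0, income / 40.0)


summary = sweep(toy_sampler, n_samples=200)
print(summary[9_750])
```

Passing the household sampler in as a function keeps the sweep independent of how ownership, gamma parameters, and upper bounds are computed.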

Model Evaluation
We compared the electricity consumption derived from the model with the electricity consumption reported by TANESCO [20] to verify the validity of the model (as household electricity consumption is forecasted, we use the sum of D1 and T1 as TANESCO's electricity consumption). Using the electricity consumption of each region of Tanzania mainland in 2010 published by TANESCO, we compared the actual values with the values calculated from the model based on the number of subscribers, and verified the prediction accuracy of the model. We used the data in [19], in which household income differs by region. In [19], a survey of households throughout Tanzania was conducted, which included items related to regions and income. From this, only the income of households receiving electricity supply from TANESCO was extracted and used as income data for each region. These data have a small sample size, so, for example, using the average household income in a region would be unreliable. Therefore, we assumed that each region has its own income distribution function, and the parameters of the distribution function were obtained by maximum likelihood estimation. For the income distribution function, we used the generalized beta distribution of the second kind (GB2), which is explained in [24,25]. We randomly sampled from the distribution function using the parameters obtained for each region to determine the income of a household, and then sampled the electricity consumption at that income from the model. The sampling was repeated as many times as the number of TANESCO customers in 2010, and the sum of the sampled electricity consumption was used as the predicted electricity consumption of that region. The probability density function of GB2 is expressed as follows:
$$f(x) = \frac{a\, x^{ap - 1}}{b^{ap}\, B(p, q)\, \left[1 + (x/b)^{a}\right]^{p + q}}, \quad x > 0 \qquad (6)$$
where B(p, q) is the beta function. The parameters a > 0, b > 0, p > 0, q > 0 were estimated.
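A minimal Python sketch of the regional prediction step follows. It uses the standard GB2 density, f(x) = a x^(ap−1) / (b^(ap) B(p, q) [1 + (x/b)^a]^(p+q)), and the known property that if Z follows Beta(p, q), then b (Z / (1 − Z))^(1/a) follows GB2(a, b, p, q). The authors estimated parameters with the GB2 package of R; the function names below and the `consumption_sampler` callback are hypothetical and stand in for the household model.

```python
import math
import random


def gb2_pdf(x, a, b, p, q):
    """Density of the generalized beta distribution of the second kind at x > 0."""
    beta_pq = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return (a * x ** (a * p - 1)
            / (b ** (a * p) * beta_pq * (1.0 + (x / b) ** a) ** (p + q)))


def gb2_sample(a, b, p, q, rng):
    """One income draw via the beta transform X = b * (Z / (1 - Z)) ** (1 / a)."""
    z = rng.betavariate(p, q)
    return b * (z / (1.0 - z)) ** (1.0 / a)


def predict_region_kwh(n_customers, gb2_params, consumption_sampler, rng):
    """Sum of sampled household consumption over all customers of a region.

    consumption_sampler(income, rng) returns one draw of D(I) in kWh/year.
    """
    a, b, p, q = gb2_params
    return sum(
        consumption_sampler(gb2_sample(a, b, p, q, rng), rng)
        for _ in range(n_customers)
    )


rng = random.Random(1)
params = (2.0, 1000.0, 1.0, 1.0)  # hypothetical GB2 parameters, not from Table 3
print(round(predict_region_kwh(1000, params, lambda inc, r: 0.1 * inc, rng)))
```

Drawing one income per customer lets the regional total reflect the spread of incomes rather than only their mean, which matters because the income-consumption relationship is nonlinear.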
For the estimation, we used the GB2 package of R [26]. Table 3 lists the GB2 parameters obtained. Table 4 lists, for each region, the electricity consumption, the number of TANESCO customers, the estimated electricity consumption, and the errors used in the evaluation. Taking the sum of the regional electricity consumption as the overall consumption in Tanzania mainland, the error between the actual measured value and the predicted value was about −0.13, confirming that the prediction has sufficiently high accuracy. The average absolute error over the regions was 0.26. The average number of households in each region was about 40,000, indicating that the prediction accuracy for a region of this size was 0.74. However, we observed a large variation in accuracy between regions. It has been pointed out that the estimation of electricity consumption based on surveys is not very accurate [14,15]. Blodgett et al. [14] estimated electricity consumption for each of eight minigrids in Kenya by the general survey approach. As the error tends to decrease as the number of customers increases, we compared the accuracy of the total electricity consumption of the eight minigrids with our results. The estimated value was 72,359 Wh/day and the measured value was 17,365 Wh/day, where the total number of customers across the eight minigrids was 154, and the error was 317%. The electricity consumption estimated in this study is for each region in Tanzania, and the number of customers varies from several thousand to several hundred thousand. Although it is difficult to make a simple comparison due to the vastly different conditions, the mean prediction accuracy of 74% shown in this study is nevertheless considered to be sufficient.
As an example of applying this method, suppose that a business operator in a developing country builds a microgrid in each village. The method can be used to estimate the total capacity of solar power generation equipment needed by such an operator when considering the installation of microgrids in thousands of villages.
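The error figures quoted above can be reproduced with a few lines of Python. The sketch assumes that the error is the signed relative error (predicted − actual) / actual and that accuracy is one minus the absolute relative error; these definitions are not stated explicitly in the text but are consistent with the quoted pairs (0.26 and 0.74, −0.13 and 87%, and 317% for the minigrid totals in [14]). The function names are hypothetical.

```python
def relative_error(predicted, actual):
    """Signed relative error; positive means overestimation (assumed convention)."""
    return (predicted - actual) / actual


def accuracy(rel_err):
    """Accuracy read as one minus the absolute relative error (assumed definition)."""
    return 1.0 - abs(rel_err)


def mean_absolute_relative_error(pairs):
    """Average of |relative error| over (predicted, actual) pairs, e.g. regions."""
    return sum(abs(relative_error(p, a)) for p, a in pairs) / len(pairs)


# Total over the eight minigrids in [14]: estimated vs. measured, in Wh/day.
err = relative_error(72_359, 17_365)
print(f"{err:.1%}")

# Mainland total: an error of about -0.13 corresponds to 87% accuracy.
print(f"{accuracy(-0.13):.0%}")
```

Note that the regional figure averages the absolute errors, whereas the mainland figure lets over- and underestimates in different regions cancel, which is why the aggregate accuracy is higher.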

Conclusions and Future Work
In this study, we developed a household income electricity consumption model to estimate residential electricity consumption in a specific area based on household income. The residential electricity consumption of a specific area can be derived by adding up the electricity consumption of each household based on income in that area.
As a result of verification using the electricity consumption in Tanzania in 2010, the accuracy when predicting electricity consumption in the entire Tanzania mainland was 87% and the average accuracy of forecasting electricity consumption in each region was 74%.
Further research is required to improve the accuracy of prediction. The number of products owned, their rated output, and their usage time could differ with the number of persons per household. As there were no data for verification, we did not model the number of persons per household in this paper. However, detailed analysis and verification of the number of persons per household will help to improve the accuracy. A connection to the electric power company may be shared by multiple households [20], so it may be useful to analyze the number of people per connected customer.
As our model can be adapted to other countries beyond Tanzania, we hope that our findings will help lead to suitable electrification in many developing countries.