This section presents the activity modelling method and formalizes the activity simulation process for a single-family and a multi-family household, respectively.
4.1. Activity Modelling
Definition 2 (Activities). Activities are a finite set of individual domestic actions that may incur energy consumption within a household, i.e., $A = \{a_1, a_2, \ldots, a_n\}$, where $a$ represents an independent activity.
Definition 3 (Activity sequence). An activity sequence is time-dependent, defined as $S = \langle s_0, s_1, \ldots, s_{m-1} \rangle$, where $s_k \in A$, $k = 0, 1, \ldots, m-1$. The sequence represents a series of activities conducted at the discrete time steps $k$.
TUS data are generated from the series of activities carried out by the residents (or users) of a household. Each activity is formulated as a state, so the activities of a user can be tracked as a time series of states. In this paper, we focus on modelling activity sequences using a Markov chain; a model for estimating household energy consumption profiles from the generated activity data is left to future work. A Markov chain consists of a number of discrete states, and movement from one state to another is a stochastic process governed by a transition probability. The theory of Markov chains and some practical applications can be found in [21,22]. Markov chains are widely used for modelling sequential stochastic processes, for example, to predict wind speed and to model domestic occupancy.
Formally, a Markov chain model contains a finite set of states, $\{1, 2, \ldots, n\}$. Let $X_k$ denote the state at the time step $k$. The probability of changing from one state to another is defined as

$$p_{ij}(k) = P(X_{k+1} = j \mid X_k = i, X_{k-1}, \ldots, X_0) = P(X_{k+1} = j \mid X_k = i),$$

which represents the fact that, when the current state $i$ is given at the time step $k$, the next state $j$ is conditionally independent of the past states, noted as $X_{k-1}, \ldots, X_0$. At a discrete time step $k$, a transition probability matrix (TPM) of size $n_k \times n_k$ can be created, where $n_k$ represents the number of states at the time step $k$. Each entry in the matrix represents the probability of changing between two states. The transition probabilities from a particular state to all states (including itself) sum to 1, i.e., $\sum_{j=1}^{n_k} p_{ij}(k) = 1$. For example, the Markov chain of sleeping may have two possible states, $\{\text{sleeping}, \text{awake}\}$, and the possibility of moving from one state to another is determined by the transition probabilities, $p$ (see Figure 2). Therefore, if a person is sleeping, there is a probability of $p_{12}$ that, in the next time interval, (s)he will be awake; or there is a probability of $p_{11}$ that, in the next time interval, (s)he will still be sleeping. Likewise, there are two possible transitions from awake. The corresponding TPM is

$$P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}.$$
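As a minimal sketch of the two-state sleeping example, the snippet below checks that each row of a TPM sums to 1 and samples a short path from it. The numeric probabilities and the helper names `is_row_stochastic` and `next_state` are illustrative assumptions, not values or code from the study.

```python
import random

# Hypothetical transition probabilities for the two-state sleeping example.
STATES = ["sleeping", "awake"]
TPM = [
    [0.9, 0.1],  # from sleeping: stay asleep, wake up
    [0.3, 0.7],  # from awake: fall asleep, stay awake
]


def is_row_stochastic(tpm, tol=1e-9):
    """Return True if every row is non-negative and sums to 1."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in tpm
    )


def next_state(current, tpm, rng):
    """Sample the index of the next state given the current state index."""
    return rng.choices(range(len(tpm)), weights=tpm[current], k=1)[0]


rng = random.Random(0)
state = 0  # start in "sleeping"
path = [STATES[state]]
for _ in range(5):
    state = next_state(state, TPM, rng)
    path.append(STATES[state])
print(path)
```

The next state depends only on the row of the current state, which is the Markov property in code form.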
We now describe how to estimate the transition probabilities from a time step $k$ to its next time step $k+1$ with an empirical data set. The transition probability from a state $i$ to $j$ can be estimated by

$$\hat{p}_{ij}(k) = \frac{c_{ij}(k)}{c_i(k)},$$

where $c_{ij}(k)$ represents the count of users whose activity has changed from $i$ at the time step $k$ to $j$ at the time step $k+1$ in the TUS data, and $c_i(k)$ represents the count of users whose activity is $i$ at the time step $k$.
However, if no transition between two states is found in the empirical data at some time step, e.g., due to a lack of data points, the estimated transition probability will be zero and the transition matrix becomes sparse. A zero value rules out any transition between the two states during data generation, which is not the case in reality. We therefore use Laplace smoothing to cope with the zero-frequency problem: the count of each transition is increased by one so that no transition has zero probability, i.e.,

$$\hat{p}_{ij}(k) = \frac{c_{ij}(k) + 1}{c_i(k) + n_k},$$

where $c_{ij}(k)$ is the count of observed transitions from $i$ to $j$, $c_i(k)$ is the count of users in state $i$ at the time step $k$, and $n_k$ is the number of states.
Note that Laplace smoothing is intended only to exclude zero transition probabilities between two states; it cannot guarantee that the smoothed estimates are adequate.
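The estimation step with add-one smoothing can be sketched as follows. This is an illustration under stated assumptions: the function name `estimate_tpm`, the toy observations, and the uniform fallback for an unobserved state without smoothing are not taken from the study.

```python
def estimate_tpm(current, following, n_states, smoothing=1):
    """Estimate one TPM from paired observations.

    current[u] and following[u] are the state indices of user u at the
    time steps k and k+1. smoothing=1 is Laplace (add-one) smoothing;
    smoothing=0 gives the raw relative frequencies.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(current, following):
        counts[i][j] += 1
    tpm = []
    for row in counts:
        total = sum(row) + smoothing * n_states
        if total == 0:
            # Assumption: a state never observed (and no smoothing)
            # falls back to a uniform row.
            tpm.append([1.0 / n_states] * n_states)
        else:
            tpm.append([(c + smoothing) / total for c in row])
    return tpm


# Toy data (hypothetical): five users, three states.
at_k = [0, 0, 0, 1, 1]
at_k_plus_1 = [0, 0, 1, 1, 2]
tpm = estimate_tpm(at_k, at_k_plus_1, n_states=3)
for row in tpm:
    print([round(p, 3) for p in row])
```

With smoothing, the unobserved transition from state 0 to state 2 receives a small non-zero probability, and every row still sums to 1.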
Example 1 (Creating transition probability matrix). To train activity models using our TUS data (detailed in Section 5), we combine the activities into eight categories, each of which corresponds to a state in the Markov chain (see Table 2). As activities are time-dependent, we generate a total of 143 transition matrices based on our TUS data. The TUS data were collected every 10 min and have 144 time steps in total (see Figure 3). We assume that the potential states remain the same in all time steps, i.e., $n_k = 8$ for every time step $k$.
After the transition probability matrices are created from the TUS data, we need to create probability density functions (PDFs) for each of the states. These PDFs are used to determine the duration of an activity started at each time step. We use kernel density estimation to calculate the PDFs from the empirical data. Mathematically, a kernel is a positive function $K(x; h)$ controlled by the bandwidth parameter $h$ [23]. With this kernel function, the density estimate at a point $x$ within a group of points $x_1, \ldots, x_N$ is defined as:

$$\rho_K(x) = \sum_{i=1}^{N} K(x - x_i; h),$$

where $K$ is the selected kernel function, $x_i$ is an observed point that falls in that state, and the bandwidth $h$ is a smoothing parameter controlling the trade-off between bias and variance in the result. In this study, we use the Gaussian kernel as the density function to fit the duration distribution of a state. Note, however, that other kernels are available. For example, a binned kernel density estimate can be a better option when the number of points, $N$, is large.
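To make the duration model concrete, the sketch below builds a Gaussian kernel density estimate by hand and draws durations from it. It assumes a kernel normalised so that the estimate integrates to one; the bandwidth, the toy durations and the names `gaussian_kde` and `sample_duration` are hypothetical.

```python
import math
import random


def gaussian_kde(points, h):
    """Return a density function using a Gaussian kernel with bandwidth h."""
    n = len(points)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in points)

    return density


def sample_duration(points, h, rng, minimum=1):
    """Draw one duration (in time steps) from the Gaussian KDE.

    Sampling from a Gaussian KDE amounts to picking a data point uniformly
    and adding Gaussian noise with standard deviation h.
    """
    value = rng.choice(points) + rng.gauss(0.0, h)
    return max(minimum, round(value))


# Hypothetical sleep durations, in 10-minute time steps.
durations = [24, 27, 30, 30, 33, 36, 42, 45]
pdf = gaussian_kde(durations, h=3.0)
rng = random.Random(42)
print(round(pdf(30), 4), [sample_duration(durations, 3.0, rng) for _ in range(5)])
```

A larger bandwidth `h` gives a smoother but more biased density; a smaller one follows the data more closely at the cost of higher variance.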
Example 2 (Activity duration distribution). Figure 4 shows examples of sleeping activities that started at 9:00 p.m. and 12:00 a.m., respectively, fitted by a Gaussian distribution. The figure shows that the most probable sleeping duration is 0–60 min when sleep begins at 12:00 a.m., and 240–300 min when sleep begins at 9:00 p.m.
4.2. Generating an Individual Activity Sequence
We start generating an activity sequence once the Markov chain model parameters and the PDFs are ready. The generation is a random walk on the Markov chain. We first pick an initial state according to the probabilities of the states in the time-use data at the specified starting time; each subsequent state is then picked based on the transition probability matrix. Once a state is decided, we generate its duration by sampling the corresponding PDF. Algorithm 1 describes the generation process in more detail. The set of transition probability matrices and the probability density functions of all states at each time step are given as the input. The output is the generated activity sequence, representing the pattern that will potentially lead to energy consumption. The function argument m is the length of the sequence to be generated.
In the implementation, we optimise this algorithm by pre-sampling a large number of durations (over 10,000) from the PDF of each state at each time step and saving them in a database. When a sequence is being generated, a duration is drawn by uniform sampling from the saved data.
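Algorithm 1 Generating an individual activity sequence.
Input: $P$, the set of transition probability matrices; $F$, the set of probability density functions; $m$, the length of the sequence.
1: function GenActivitySeq($P$, $F$, $m$)
2:     $S \leftarrow \langle\rangle$ ▹ Initialize an empty activity sequence
3:     $k \leftarrow 0$
4:     while $k < m$ do
5:         if $k = 0$ then
6:             Pick the initial state $s$
7:         else
8:             Generate the next state $s$ according to the $k$-th TPM, $P_k$
9:         Sample the duration $l$ of $s$ according to the $k$-th PDF, $f_k^{s}$
10:         for $i = 1, \ldots, l$ do
11:             if $k < m$ then
12:                 $S \leftarrow S \oplus \langle s \rangle$ ▹ Append the state $s$ with the duration of $l$
13:                 $k \leftarrow k + 1$
14:     return $S$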
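The generation procedure can be sketched in Python as below. This is a hedged illustration rather than the authors' implementation: the argument layout, the function name `gen_activity_seq`, the indexing convention `tpms[k - 1]` for the move from step k−1 to step k, and the toy inputs are all assumptions. Durations are drawn uniformly from pre-sampled pools, mirroring the optimisation described in the text.

```python
import random


def gen_activity_seq(initial_probs, tpms, duration_pools, m, rng):
    """Generate an activity sequence of length m.

    initial_probs        -- probability of each state at the starting time
    tpms[k][i][j]        -- transition probability from state i to state j
    duration_pools[k][s] -- pre-sampled durations for state s started at step k
    """
    n_states = len(initial_probs)
    seq = []
    state = None
    while len(seq) < m:
        k = len(seq)
        if k == 0:
            weights = initial_probs
        else:
            # Assumption: tpms[k - 1] governs the move from step k - 1 to k.
            weights = tpms[k - 1][state]
        state = rng.choices(range(n_states), weights=weights, k=1)[0]
        # Uniform sampling over the pre-sampled pool stands in for
        # sampling the PDF directly.
        length = max(1, rng.choice(duration_pools[k][state]))
        # Truncate so the sequence never exceeds m time steps.
        seq.extend([state] * min(length, m - k))
    return seq


# Toy inputs (hypothetical): three states, twelve time steps.
N_STATES, M = 3, 12
uniform = [[1.0 / N_STATES] * N_STATES for _ in range(N_STATES)]
tpms = [uniform for _ in range(M - 1)]
pools = [[[1, 2, 3] for _ in range(N_STATES)] for _ in range(M)]
seq = gen_activity_seq([1.0, 0.0, 0.0], tpms, pools, M, random.Random(7))
print(seq)
```

With 144 time steps, this convention uses 143 matrices, which matches the number of transition matrices reported for the TUS data.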
Example 3 (Generating an activity sequence). Figure 5 shows an example of generating an activity sequence using the models. The numbers in the table are the activity codes described in Table 2. The duration of an activity is generated by sampling from the Gaussian probability density function, i.e., $l \sim f_k^{s}$, the PDF of the state $s$ at the time step $k$, whilst the change from one activity to the next is decided according to the probabilities in the TPM at the corresponding time step.
4.3. Generating Activity Sequences for Multiple Family Members
The previous section described generating an activity sequence for an individual person or a single-family household. The energy demand of a single family may well be derived from the generated sequence of activities. However, simulating the activity sequences of multiple family members in a household is more complicated, since some activities may be shared between the sequences or be exclusive to one of them. For example, if a couple living in a household are accustomed to making supper together, the cooking activity appears at the same time in both activity sequences. However, if they agree that only one person is responsible for supper, the cooking activity appears in the activity sequence of only one of them. The same applies to other activities, such as laundry or cleaning. For simplicity, this paper focuses on modelling the activities of a household with at most two family members, and leaves households with more than two members to future work.
For the data generation, we first create a so-called independent activity sequence for one person using the aforementioned approach, then generate the activities of the other person as a dependent sequence. The two sequences are denoted as $S$ and $S'$, respectively, and $S \rightarrow S'$ is used to express the generation order. We describe the generation of a dependent sequence in the following.
Definition 4 (Non-flexible Activities). Non-flexible activities are defined as a subset of $A$, i.e., $O \subseteq A$. Any individual activity $a \in O$ appears in the dependent activity sequence $S'$ constrained by the conditions on $S$.
Let $a^{l}$ represent a sequence constructed by a unique activity $a$ that lasts for a duration of $l$ time steps, and let ⊂ denote the containing relationship of a sequence. The generation function is augmented with $S$ in order to generate the dependent sequence $S'$. $S_{i:j}$ represents the sub-sequence of $S$ between the time indices $i$ and $j$; the operator ∩ does a pair-wise joining of the activities between two sequences; and the operator $|\cdot|$ denotes the length of a sequence. The generation process can be formalized as:

$$S' \leftarrow S' \oplus s',$$

where $S'$ is the target sequence initialized by an empty sequence; $s'$ is the generated sequence; and ⊕ is the operator of concatenating two sequences.
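The generation process should meet one of the following conditions:

$$a^{l} \subset S' \;\wedge\; |a^{l} \cap S_{i:j}| > 0 \qquad (7)$$

$$a^{l} \subset S' \qquad (8)$$

$$a^{l} \not\subset S' \qquad (9)$$

where $a$ is a non-flexible activity and $i$, $j$ are the time indices at which $a^{l}$ starts and ends in the dependent sequence $S'$.
For condition (7), the non-flexible activity occurs in both sequences and fully or partially overlaps. It can be used to simulate, for example, making food together. Condition (8) is used when the non-flexible activity should appear in the dependent sequence $S'$, but may not necessarily appear in $S$. It can be used to simulate the case where only the dependent family member makes food, i.e., the activity is not contained in $S$, or where two family members make food separately at different times. Condition (9) is used when only the independent family member makes the food, i.e., the activity is contained in $S$ but not in $S'$.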
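One way to read the three conditions is as predicates on a pair of finished sequences. The sketch below follows the verbal descriptions in the text; the function names, the activity code `COOK` and the toy sequences are hypothetical, and sequences are assumed to be equal-length lists of activity codes.

```python
def steps_with(seq, activity):
    """Return the set of time indices at which the activity occurs."""
    return {k for k, a in enumerate(seq) if a == activity}


def satisfies_condition_7(s, s_dep, activity):
    """The activity occurs in both sequences and overlaps in at least one step."""
    return bool(steps_with(s, activity) & steps_with(s_dep, activity))


def satisfies_condition_8(s, s_dep, activity):
    """The activity occurs in the dependent sequence; S is unconstrained."""
    return bool(steps_with(s_dep, activity))


def satisfies_condition_9(s, s_dep, activity):
    """The activity occurs only in the independent sequence."""
    return bool(steps_with(s, activity)) and not steps_with(s_dep, activity)


COOK = 3  # hypothetical activity code
s = [0, 0, COOK, COOK, 1]
together = [0, COOK, COOK, 1, 1]   # cooking partially overlaps with s
separately = [0, 0, 1, 1, COOK]    # cooking at a different time
not_at_all = [0, 0, 1, 1, 1]       # dependent member never cooks
print(
    satisfies_condition_7(s, together, COOK),
    satisfies_condition_8(s, separately, COOK),
    satisfies_condition_9(s, not_at_all, COOK),
)
```

Such predicates are useful as sanity checks on generated pairs, independently of how the dependent sequence was produced.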
With these conditions, we now describe the generation of the dependent sequence $S'$ by Algorithms 2–4 under the conditions (7), (8) and (9), respectively. The algorithms are self-explanatory and commented. We suppose that the independent activity sequence $S$ has been generated and is used as the input to the algorithms. In addition, the set of activities, $A$, the TPMs, $P$, and the probability density functions of each activity at each time step, $F$, are also supplied to generate a dependent activity sequence.
Algorithm 2 Generate a dependent activity sequence with the condition (7).
1: function GenDepActivitySeq(A, …, …, …, …)
2: … ▹ Initialize an empty activity sequence
3: while … do
4: … ▹ Generate a sub-sequence of …, with duration of l using … and …
5: …
6: if … then
7: if … then
8: …
9: else
10: …
11: return …
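Because the expressions of the listing are not recoverable here, the following is only one possible reading of generation under the overlap condition (7), not the authors' algorithm: a proposed non-flexible activity is accepted only if the same activity occurs in $S$ within the proposed time window. The `propose` callable, the retry limit and the fallback of copying $S$ for one step are all assumptions made for this sketch.

```python
import random


def gen_dep_seq_overlap(s, non_flexible, propose, rng, max_tries=100):
    """Sketch of dependent generation under an overlap condition.

    s            -- independent sequence (list of activity codes)
    non_flexible -- set of non-flexible activity codes
    propose      -- hypothetical callable (k, prev, rng) -> (activity, length)
    """
    dep = []
    while len(dep) < len(s):
        k = len(dep)
        prev = dep[-1] if dep else None
        for _ in range(max_tries):
            activity, length = propose(k, prev, rng)
            length = max(1, min(length, len(s) - k))
            window = s[k:k + length]
            # Flexible activities are always accepted; non-flexible ones
            # only when the same activity occurs in S within the window.
            if activity not in non_flexible or activity in window:
                break
        else:
            # No acceptable proposal: copy S for one step to make progress.
            activity, length = s[k], 1
        dep.extend([activity] * length)
    return dep


COOK = 3  # hypothetical activity code


def toy_propose(k, prev, rng):
    """Toy proposal: random activity with a duration of 1 to 3 steps."""
    return rng.choice([0, 1, COOK]), rng.randint(1, 3)


s = [0, 0, COOK, COOK, COOK, 1, 1, 0, 0, 0]
dep = gen_dep_seq_overlap(s, {COOK}, toy_propose, random.Random(3))
print(dep)
```

In a full model, `propose` would wrap the TPM lookup and duration sampling used for an individual sequence; the rejection step is what couples the two family members.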
Algorithm 3 Generate a dependent activity sequence with the condition (8).
1: function GenDepActivitySeq(A, …, …, …, …)
2: … ▹ Initialize an empty activity sequence
3: while … do
4: … ▹ Generate a sub-sequence of …, with duration of l using … and …
5: …
6: if … then
7: if … then
8: …
9: else
10: …
11: return …
Algorithm 4 Generate a dependent activity sequence with the condition (9).
1: function GenDepActivitySeq(A, …, …, …, …)
2: … ▹ Initialize an empty activity sequence
3: … ▹ A set of non-flexible activities
4: while … do
5: … ▹ Subtract the activity set O from A
6: … ▹ Generate a sub-sequence of …, with the length of duration, l, using … and …
7: …
8: if … then
9: …
10: else
11: for each … do
12: if … then
13: …
14: return …
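The step "Subtract the activity set O from A" can be illustrated by restricting each TPM row to the allowed activities and renormalising it. The sketch below is a simplified reading under stated assumptions: durations are omitted (one transition per time step), `tpms[k - 1]` is assumed to govern the move from step k−1 to step k, at least one activity is assumed to remain allowed, and the names `restrict_row` and `gen_without` are hypothetical.

```python
import random


def restrict_row(row, excluded):
    """Zero out excluded states in a TPM row and renormalise the rest."""
    kept = [0.0 if j in excluded else p for j, p in enumerate(row)]
    total = sum(kept)
    if total == 0.0:
        # All mass was on excluded states: spread it over the allowed ones.
        allowed = [j for j in range(len(row)) if j not in excluded]
        return [1.0 / len(allowed) if j in allowed else 0.0
                for j in range(len(row))]
    return [p / total for p in kept]


def gen_without(tpms, excluded, start, m, rng):
    """Random walk of length m that never visits an excluded state."""
    if start in excluded:
        raise ValueError("the start state must not be excluded")
    seq = [start]
    for k in range(1, m):
        row = restrict_row(tpms[k - 1][seq[-1]], excluded)
        seq.append(rng.choices(range(len(row)), weights=row, k=1)[0])
    return seq


# Toy inputs (hypothetical): three states, state 2 is the non-flexible one.
COOK, M = 2, 20
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
tpms = [uniform for _ in range(M - 1)]
dep = gen_without(tpms, {COOK}, start=0, m=M, rng=random.Random(5))
print(dep)
```

Renormalising keeps each restricted row a valid probability distribution, so the dependent member's remaining activities keep their relative likelihoods.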