2.1. Related Definitions and Properties
This section defines the key concepts used in mining status set sequence patterns with time windows.
Table 1 lists the symbols used in the status set sequence pattern mining model with time windows.
Definition 1. Status Itemset (SI): An itemset can be defined as a status itemset if and only if it meets the following constraints:
;
;
.
Definition 2. Frequent Status Itemset with Time Windows (FSITW): A status itemset is an FSITW if and only if it meets the following constraints:
;
;
.
To mine frequent status itemsets with time windows, this paper draws on the downward closure property of the Apriori algorithm, introduces the concepts of subset and superset, and derives from them three new properties that improve computational efficiency. These are stated below.
Definition 3. Subset: Among the status items within a time window, one status itemset is a subset of another if and only if every status item it contains also belongs to the other.
Definition 4. Superset: Among the status items within a time window, one status itemset is a superset of another if and only if it contains every status item of the other.
Property 1. If a status itemset is frequent in a time window, then any subset of it must also be frequent in that time window.
Proof. Since the status itemset is frequent in the time window , . Since is a subset of , . Thus, is also frequent in the time window . □
Property 2. If a status itemset is infrequent in a time window, then any superset of it must also be infrequent in that time window.
Proof. Since the status itemset is infrequent in the time window , . Since is a superset of , , . Thus, is also infrequent in the time window . □
Property 3. If status itemsetis infrequent in the time window, status itemsetis infrequent on, and, so the status itemsetmust be infrequent in the time window
Proof. Since the status itemset is infrequent in the time window and the status itemset is a superset of according to property 2, is infrequent in the time window . Similarly, is infrequent in the time window , i.e., . Assuming is greater than , then Thus, the status itemset is infrequent in the time window . □
These three properties are valuable in frequent status set mining: they substantially reduce the number of candidate status itemsets and thereby improve computational efficiency.
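The pruning effect of Properties 1 and 2 can be illustrated with a minimal Apriori-style sketch. It rests on assumptions that are not taken from the paper: a transaction is a hypothetical `(timestamp, set_of_status_items)` pair, a time window is a half-open interval `(start, end)`, and support is the fraction of transactions inside the window that contain the itemset. The function names and the sample data are illustrative only.

```python
from itertools import combinations

def support(itemset, transactions, window):
    """Fraction of transactions inside `window` that contain `itemset` (assumed definition)."""
    start, end = window
    in_window = [items for t, items in transactions if start <= t < end]
    if not in_window:
        return 0.0
    return sum(1 for items in in_window if itemset <= items) / len(in_window)

def frequent_itemsets(transactions, window, min_sup):
    """Level-wise search that discards candidates having an infrequent subset."""
    start, end = window
    items = sorted({i for t, s in transactions if start <= t < end for i in s})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions, window) >= min_sup]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Property 2: a candidate with an infrequent (k-1)-subset cannot be frequent,
        # so it is dropped before its support is ever counted.
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        level = [c for c in candidates
                 if support(c, transactions, window) >= min_sup]
        frequent.extend(level)
        k += 1
    return frequent

# Hypothetical data: four transactions fall inside the window [0, 4), one outside.
transactions = [
    (0, {"A", "B", "C"}),
    (1, {"A", "B"}),
    (2, {"A", "C"}),
    (3, {"B", "C"}),
    (9, {"D"}),
]
result = frequent_itemsets(transactions, window=(0, 4), min_sup=0.5)
print(sorted(sorted(s) for s in result))
```

In this toy run every pair is frequent, so the triple is still counted once; with sparser data most candidates are discarded by the subset check alone.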
Definition 5. Status-Set Sequence with Time Windows (SSTW): SSTW is a collection of orderedstatus items with time windows such as,, and W represents the time window in whichoccurs.
Two further properties apply when searching for candidate SSTWs and can considerably speed up the algorithm; they are stated as Properties 4 and 5 below.
Property 4. If a status set sequence is frequent in a time window, then any subset of that status set sequence must also be frequent in that time window.
Proof. Since the status set sequence is frequent in the time window , . Since is a subset of , , . Thus, is also frequent in the time window . □
Property 5. If a status set sequence is infrequent in a time window, then any of its super status set sequences must also be infrequent in that time window.
Proof. Because the status set sequence is infrequent in the time window , . Since is a superset of , , . Thus, is also infrequent in the time window . □
Definition 6. Mean Support of SSTW: The average support for SSTW of the formcan be defined as, whereis the support of, i.e.,.
Definition 7. Mean Confidence of SSTW: The average confidence for SSTW of the formcan be defined as, whereis the confidence of, i.e., .
Definition 8. Time Coverage Rate of SSPTW:The time coverage of, tcr%, indicates the coverage of the SSPTW over the entire time period, which can be expressed as follows:whereis the total length of the time window in whichresides,represents the total time interval of the time series database. In special cases, when, the SSPTW is a traditional, full-time sequence pattern; when, the SSPTW is a part-time sequence pattern. Since the width of the minimum time windowis set in this paper and the following research will also be based on the minimum time window,must satisfy the following restrictions:.
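As a rough illustration of Definition 8, the sketch below computes a time coverage rate as the total length of a pattern's time windows divided by the total time span of the database, expressed as a percentage. The representation of windows as non-overlapping `(start, end)` pairs, the function name, and the sample values are assumptions made for this example.

```python
def time_coverage_rate(windows, total_span):
    """Percentage of the database time span covered by the pattern's windows.

    Assumes `windows` are non-overlapping (start, end) pairs in the same
    time unit as `total_span`.
    """
    covered = sum(end - start for start, end in windows)
    return 100.0 * covered / total_span

# Hypothetical 24-hour database.
part_time = time_coverage_rate([(0, 6), (12, 18)], total_span=24)
full_time = time_coverage_rate([(0, 24)], total_span=24)
print(part_time, full_time)
```

Under these assumptions, a value of 100 corresponds to a traditional full-time pattern and any smaller value to a part-time pattern.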
Definition 9. Status Set Sequential Pattern with Time Windows (SSPTW): is SSPTW, if and only if it meets the following constraints:
, , , ;
, that is, and meet the user-defined minimum support ;
,;
,;
,.
It follows from the above definition that the average support of an SSPTW is never less than the minimum support, and its average confidence is never less than the minimum confidence. The converse does not hold: a sequence with large average support and average confidence is not necessarily an SSPTW.
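A small numeric example makes the one-way implication concrete. It assumes that the mean support of Definition 6 is the arithmetic mean of the element supports and that Definition 9 requires every element to reach the minimum support individually; the helper name and the values are hypothetical.

```python
from statistics import mean

def all_elements_meet(values, threshold):
    """True if every element reaches the threshold (assumed element-wise constraint)."""
    return all(v >= threshold for v in values)

min_sup = 0.5
supports = [0.9, 0.9, 0.3]   # hypothetical per-element supports of one sequence

mean_support = mean(supports)
print(mean_support >= min_sup)                # the average passes the threshold
print(all_elements_meet(supports, min_sup))   # but one element does not
```

The sequence has a high average support, yet its third element falls below the threshold, so it would not qualify as an SSPTW under the assumed element-wise reading.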
Definition 10. Coverage Rate (CR) of SSPTW: The coverage of the status set sequence can be expressed as:
Definition 11. Strong Status Set Sequential Pattern with Time Windows (Strong SSPTW): A status set sequence pattern (SSPTW) is a strong status set sequence pattern if and only if its coverage rate is not less than the user-defined minimum coverage threshold (d%), which is assigned the same value as the minimum confidence threshold (c%).
Definition 12. Factor Set of FSI (FS): The factor set of status itemsetcan be expressed as, for any status set sequence patternin; ifis still a status set sequence pattern, thenis an element of. The factor set of the frequent status itemset can be expressed as:
Theis an element in factor set, and is the subscript of each element.
When it satisfies Formula (2.4), whereis the user-defined minimum factor set ratio,is called the main factor set of frequent status set.
2.2. Periodic Analysis of SSPTW
A periodic status set sequence pattern is a collection of sequence patterns and a series of time windows, such as , where is a periodic time window containing specific events. Key concepts in the mining process of periodic status set sequential patterns are defined below.
Definition 13. Periodic Width, T: This is the user-specified width of a time window that satisfies the periodic pattern. The periodic width is an integer multiple of the minimum time window.
Definition 14. Periodic Interval, O: This is a user-defined time window interval that satisfies a periodic pattern. The periodic interval is an integer multiple of the minimum time window and can be chosen from experience, for example by day, week, month, or year.
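The integer-multiple requirement on the periodic width and the periodic interval can be checked mechanically. The sketch below assumes all three quantities are given as integers in the same time unit (hours in the example); the function name and the sample values are hypothetical.

```python
def is_valid_period(width, interval, min_window):
    """Check that T (width) and O (interval) are positive integer multiples
    of the minimum time window, all given in the same integer time unit."""
    return (width > 0 and interval > 0
            and width % min_window == 0
            and interval % min_window == 0)

MIN_WINDOW_HOURS = 6  # hypothetical minimum time window

ok = is_valid_period(width=12, interval=24, min_window=MIN_WINDOW_HOURS)
bad = is_valid_period(width=9, interval=24, min_window=MIN_WINDOW_HOURS)
print(ok, bad)
```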
Definition 15. Periodic SSP: A status set sequence pattern that satisfies both the periodic width T and the periodic interval O is a periodic status set sequence pattern.
Definition 16. Periodic Time Coverage Rate (PTCR): Assume that p is the number of cycles whose time window satisfies a periodic SSP and n-p is the number of cycles whose time window does not. The pattern is a periodic SSP with strong regularity if and only if its periodic time coverage rate is not less than the user-defined minimum periodic time coverage threshold. When a periodic pattern satisfies both the T and O constraint thresholds but its periodic time coverage rate falls below that threshold, it is a periodic SSP with weak periodic time coverage rules.
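The strong/weak distinction of Definition 16 can be sketched as follows. The sketch assumes that the periodic time coverage rate is the ratio p/n of satisfying cycles to all cycles, which is a natural reading of the definition but is not confirmed by the surviving text; the function names, the boolean-per-cycle input, and the thresholds are hypothetical.

```python
def periodic_time_coverage(satisfied):
    """Ratio p/n, where `satisfied` holds one boolean per cycle (assumed formula)."""
    n = len(satisfied)
    p = sum(satisfied)
    return p / n

def regularity(satisfied, min_ptcr):
    """Classify a periodic SSP as 'strong' or 'weak' against a minimum threshold."""
    return "strong" if periodic_time_coverage(satisfied) >= min_ptcr else "weak"

# Hypothetical pattern observed over four cycles; it holds in three of them.
cycles = [True, True, True, False]
print(periodic_time_coverage(cycles))
print(regularity(cycles, min_ptcr=0.7), regularity(cycles, min_ptcr=0.8))
```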
Definition 17. Periodic Analysis of Patterns: Periodic analysis of patterns refers to the mining of repeated status set sequence patterns with periodic regularity in a time series database.
Figure 1 shows the framework of SSPMTW. In this paper, the sequential pattern mining model is extended to mine SSPs with time windows, strong SSPs with time windows, the main factor set of a frequent status itemset with time windows, and periodic sequential patterns.