Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme

Control charts, one of the major tools in the Statistical Process Control (SPC) domain, are used to monitor a process over time and improve the final quality of a product through variation reduction and defect prevention. In a more recent development of control charts, referred to as profile monitoring, the study variable is not defined as a single quality characteristic; instead, it is a functional relationship between some explanatory and response variables, and the major aim is to check the stability of this model (profile) over time. Most previous work in the area of profile monitoring has focused on the development of different theories and assumptions, while very little attention has been paid to practical, real-life applications in this field of study. To address this knowledge gap, this paper proposes a monitoring framework based on the idea of profile monitoring as a data-driven method to monitor the internet usage of a telecom company. By defining a polynomial model between the hours of each day and the internet usage within each hour, we propose a framework with three monitoring goals: (i) detection of unnatural patterns, (ii) identification of the impact of policies such as providing discounts and (iii) investigation of general social behaviour variations in internet usage. The results show that shifts of different magnitudes can occur in each goal. With the aid of different charting statistics, such as Hotelling's T² and MEWMA, the proposed framework can be properly implemented as a monitoring scheme under different shift magnitudes. The results indicate that the MEWMA scheme performs well for small shifts and has faster detection ability than the Hotelling T² scheme.


Introduction
Statistical Process Control (SPC) is a well-established methodology aimed at high-quality process monitoring and control. It has been utilised frequently in industrial processes since the 1920s [1] to monitor numeric or attribute quality characteristics, and it includes seven major techniques, the most prominent of which is the control chart. To conduct a process monitoring procedure, two phases (namely, Phases I and II) are defined for each control chart: Phase I aims to estimate the process parameters from historical data (off-line monitoring), and Phase II aims to detect unnatural conditions, denoted Out-of-Control (OC), as distinct from stable or normal situations, denoted In-Control (IC), as soon as possible. According to Montgomery [2], an IC process has only process-inherent random variability, while a process that is affected by assignable (nonrandom) causes is referred to as being in an OC situation. So, it could be said that the major aim of Phase II monitoring is the detection of assignable causes against the background of process-inherent random ones.
To monitor a process in SPC with control charts, two approaches are usually performed. The first (conventional) one is to consider a univariate or multivariate distribution for a single or multiple quality characteristics. For the second one, the process quality is formulated and monitored by a functional relationship between a response variable and one or more explanatory variables, which is denoted by profile monitoring [3]. It could be said that the major goal of profile monitoring is to check the stability of a specified IC relationship (or profile) over time [4].
In recent decades, several studies have focused on the development of novel control charts in the area of profile monitoring, either in Phase I or II. As a few examples of the pioneering works, one can refer to Kang and Albin [5], Kim, et al. [6] and Woodall, et al. [7], in which some basic control charts were developed for linear profiles, whose shape is established by a simple straight line. Although linear profiles are the simplest and most frequent case [8], other profile shapes may be needed in more complicated processes; to name just a few, these are ordinal [9], binary logistic [10,11], Poisson [12], nonlinear [13], complex relationship [14,15] profiles and so forth. In addition to these types, polynomial profiles, which are in line with the aim of this study, have been considered in several studies.
Polynomial profiles take the form y = b_0 + b_1x + b_2x² + … + b_kx^k and were first discussed by Kazemzadeh, et al. [16]. They proposed three control charts for Phase I analysis based on the F-test, Hotelling's T² (hereafter T²) and Likelihood Ratio Test (LRT) statistics. Amiri, et al. [17] considered a quadratic (second-order polynomial) profile in monitoring the relationship between the torque produced by an engine and the engine speed in revolutions per minute as a case study in the automotive industry. In addition, they discussed the observed challenges of monitoring polynomial profiles in Phases I and II. Zhang, et al. [18] proposed two other modified methods for a quadratic form of polynomial profile when the IC model was linear. They concluded that the T² and LRT control charts yielded the same performance. In a similar problem, i.e., a linear IC form changing to a quadratic polynomial profile, Zhang, et al. [19] proposed a new Exponentially Weighted Moving Average (EWMA) charting statistic in combination with a score test approach. The simulation results indicated that their proposed approach was robust against variations of the profiles' shape. Furthermore, a cumulative sum (CUSUM) approach was proposed by Zhang, et al. [20] under similar problem settings. Unlike the three previous works, which were limited to quadratic polynomial profiles, Yao, et al. [21] considered the general form of polynomial profiles and compared four common control charts for this situation.
Considering previous research in the area of profile monitoring, it could be observed that the emphasis of most studies has been on the development of new control charts not only in the polynomial model but also in all other profile types, while little attention has been paid to the practical application of the profile monitoring idea. To the best of the authors' knowledge, there is no research about the real application of polynomial profiles, but some sporadic works could be found in other profile types. For instance, to count a few: monitoring of highway safety [22], calibration system [23] and chemical gas sensors [24] with a general linear model, plasma etch process in semiconductor manufacturing with nonlinear profiles [25], social networks and healthcare with a logistic model [26][27][28], engine procedure in the automotive industry with autocorrelated profiles [17] and customer satisfaction with a nonparametric approach [29]. Furthermore, it is noteworthy to mention that there are several practical applications of control charts for monitoring quality characteristics; see, for example, [30][31][32][33].
Similar to the above research, the aim of this study is to propose a practical application of control charts with the idea of profile monitoring in the field of internet usage monitoring. From the (telecommunication) internet companies' point of view, the trend of total internet usage in some specific platforms (e.g., social networks or applications) during different hours in each day is crucial. Identifying different trends and behaviours in internet usage can reveal unnatural conditions and situations as well as lead to new decision and policy making such as offering discounts and markups, changing service tariff and costs, and so forth. Although a few studies investigated internet usage from other practical aspects [34][35][36], there is no monitoring scheme that has been extended yet in this area.
Thus, as the major contribution of this study, a novel monitoring scheme based on the idea of profiles is proposed to monitor the internet usage of a well-known anonymous internet company. By means of the proposed method, one can monitor the internet usage on each day and identify unnatural trends and patterns. The problem definition, assumptions, formulation and adjustments are completely described in the next sections. Previous studies usually investigated two common phases, Phase I and Phase II, in practical applications. Recently, Zwetsloot, et al. [37] suggested a more comprehensive SPC framework for practical application with four phases. Based on their idea, and as the second major contribution, both phases (i.e., I and II) and some other steps are considered in this paper as a comprehensive framework for the implementation of control charts for this problem. It includes different steps such as model estimation, verification, Phase I analysis, Phase II signalling and signal interpretation. Although some other frameworks have been developed for other aims in the SPC domain [17,38-40], the latter are not compatible with the real-life data used in this paper. The proposed framework could be applied in real applications of not only polynomial profiles but also other profile types and other SPC monitoring problems.
To sum up, the contributions of this paper can be listed as follows: (i) a practical application of profile monitoring in the area of internet usage monitoring; (ii) development of a novel SPC framework for detection of unnatural trends and patterns on some specific platforms (e.g., social networks or applications); (iii) consideration of both common SPC phases and some other steps, including model estimation, verification, Phase I analysis, Phase II signalling and signal interpretation, in the proposed framework; and (iv) definition of charting statistics based on the polynomial profile model.
The remainder of the article is organized as follows. Section 2 proposes the problem definition and formulation. Section 3 presents a unified framework based on the SPC approach to solve the problem. The monitoring procedure based on the proposed framework is described in Section 4 and finally, Section 5 concludes the article and provides some direction for future research.

Problem Definition
In some companies such as online shops, hypermarkets, manufacturing plants and so forth, the trend (pattern) of purchase orders, consumption of a specific product or some other quality characteristics at different hours within a day can have several important consequences for the performance of the company. This type of information is very helpful in such companies to arrange the products and personnel, investigate the sales program, provide some hourly discounts and so forth. On the other hand, a change in the pattern of a specific day from the previous days could be caused by some important assignable causes, which may lead to unfavourable events. For example, if it is expected that the purchase orders will decrease with a nearly constant rate in all hours of holidays or weekends, but instead some occurrence of a sharp decrease or increase in a specific hour is observed, then it is sometimes necessary to discover the reason behind that unexpected event.
The above idea is also important in internet telecommunication (telecom) companies which provide data/broadband and internet services. As such, these companies can obtain some beneficial results from monitoring internet usage during different hours of the day. From the latter scenario, the following three major aims are investigated in this paper: (i) First, unnatural patterns, network failure and some other abnormal situations in which the total trend of a day is affected and varied are detected by this procedure. (ii) Identifying the impact of some policies such as providing discounts or markups for specific hours is considered as the second aim.
(iii) Some general social behaviour variations, which could also be considered as customer requirements, are detected by this idea. In this case, some specific products are subject to change and it is important to identify whether this change can affect the overall trend or not.
To reach the above aims, process monitoring is implemented, which usually refers to surveillance of an industrial process or manufacturing line over time to achieve a high level of final product quality, by defining control limits for the desired quality characteristics and producing signals in unfavourable situations at every step of the manufacturing process [2]. SPC techniques, specifically control charts, provide several beneficial tools to discover unusual variability and prevent its recurrence. Hence, it is necessary to define and utilize a proper control chart for this problem. The simple and conventional approach is to provide some univariate or multivariate control charts to monitor the process at each hour of the day. However, such a scheme becomes complex in this area, as a large number of charting statistics and control limits need to be defined [41]. To avoid these challenges, some techniques and tools are required to aggregate all the day's results and monitor the process as a whole. For this aim, the profile monitoring idea, which monitors a relationship between some explanatory and response variables over time, is suggested in this paper. More details about how profile monitoring deals with this situation can be found in Zou, et al. [42] and Yeganeh, et al. [43].
Hence, a well-known anonymous telecom company provided some real-life data to be studied according to the above aims using the idea of profile monitoring. The data include the internet usage (in GigaBytes (GB)) of the more frequently active subscribers in two major social networks, i.e., Instagram and WhatsApp, in each hour of the days between 1 January 2021 and 30 April 2022. So, the dataset of each social network has 485 rows (days) and 24 columns (hours). For better understanding, Figure 1 depicts the profiles of internet usage in the first ten days of 2021 for (a) Instagram and (b) WhatsApp. Each line (colour) represents a specific day. As shown, internet usage seems to be low between 3:00 and 8:00, while the rate is at its highest volume from 18:00 to 23:00 on all ten days. Considering Figure 1, polynomial profile models are suitable for the internet usage in each day. For this aim, we followed the approach of Wang, Li, Ma, Song and Wang [29] (where the month index served as the explanatory variable); here, the hours of the day establish the independent variables, and the responses are defined as the internet usage in a specific hour. For the jth day of the monitoring procedure, the polynomial profile model is as given in Equation (1):

y_ij = a_0j + a_1j x_i + a_2j x_i² + … + a_kj x_i^k + ε_ij, i = 1, 2, …, 24.  (1)
As the explanatory variables are hours, they are fixed (constant) in all the profiles, and it can be written that x_i = 1, 2, …, 24. In other words, it is not necessary to use a time index (here j) for the explanatory variable [44]. Furthermore, it is assumed that the error terms are completely independent, that is, there are no between- or within-profile autocorrelated error terms. At the end of the jth day, there are 24 values (n = 24) for the internet usage at each hour, so the parameters of the model in Equation (1), [â_0j, â_1j, …, â_kj], are estimated by the Ordinary Least Squares (OLS) method as

β̂_j = (X′X)⁻¹X′y_j.  (2)

In Equation (2), the fixed explanatory variable X is a 24 × (k + 1) matrix and the response variable y_j is a 24 × 1 vector. Furthermore, the error variance is estimated by the Mean Squared Error (MSE) of the residuals as follows [42]:

σ̂_j² = MSE_j = Σ_{i=1}^{n} (y_ij − ŷ_ij)² / (n − k − 1).  (3)

To solve this problem, the proper polynomial model is first fitted to the data to reach the IC model, and then the stability of this model is checked over time using the idea of profile monitoring and control chart theory. In fact, the IC regression model parameters in Equation (1), denoted by the 0 index hereafter, including a_00, a_10, …, a_k0 and σ_0², are monitored over time; triggering a signal on the jth day is equivalent to the IC model changing to OC, i.e., the occurrence of a new pattern in internet usage on the jth day. Considering this point, the response variables do not directly monitor the process: it does not matter whether the internet usage lies in a specific range in different hours, but it is necessary that the predefined IC polynomial model keeps a constant form at the end of each day over time. Therefore, when the internet usage at a specific hour changes from the previous days' pattern, but the usage in the other hours also changes in accordance with the predefined IC polynomial model such that the IC model remains constant, the process is still IC; otherwise, an OC signal is triggered by this approach.
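As an illustration of Equations (2) and (3), the per-day estimation can be sketched in a few lines of NumPy. This is a minimal sketch: the fixed design x_i = 1, …, 24 follows the formulation above, while the function name and return values are illustrative.

```python
import numpy as np

def fit_profile(y, k):
    """Fit the polynomial profile of Equation (1) to one day's data by OLS.

    y : array of n = 24 hourly usage values (the response vector)
    k : polynomial order
    Returns the estimated coefficients (Equation (2)), the residuals, and
    the MSE estimate of the error variance (Equation (3)).
    """
    x = np.arange(1, 25)                          # fixed hourly design points x_i = 1..24
    X = np.vander(x, k + 1, increasing=True)      # 24 x (k+1) design matrix [1, x, ..., x^k]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
    resid = y - X @ beta_hat
    mse = resid @ resid / (len(y) - k - 1)        # unbiased error-variance estimate
    return beta_hat, resid, mse
```

For each of the m Phase I days, calling this function on the day's 24 usage values yields one row of the m × (k + 1) coefficient matrix used in the following steps.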

The Proposed Framework
As a general SPC approach, previous historical data are used to establish a control chart in Phase I and estimate the IC model. The information of the m previous days, with the most common range being between 20 and 30, is considered as the Phase I (reference) data in this study (m is usually denoted as the number of subgroups in Montgomery [2], hence we can say there are m subgroups). The value of m is considered a hyperparameter of the model and could be obtained based on expert judgment or some other criteria. On the other hand, it is necessary to verify the polynomial model assumptions before the Phase I and II analyses. Except for the works presented by Amiri, Jensen and Kazemzadeh [17] and Zwetsloot, Jones-Farmer and Woodall [37], there is no other research on model verification in the profile monitoring area. Five steps are defined in the proposed approach to reach a suitable decision about the process condition; these are model estimation, model verification, Phase I analysis, Phase II analysis and signal interpretation. These steps could be applied not only to polynomial profiles but also to other profile types. The details of the five proposed steps are given in the following subsections.

Model Estimation
In the first step, the polynomial model is fitted on each subgroup and the coefficients as well as the residuals of m profiles are estimated considering a specific value of k. Hence, the residual and coefficient matrices have the form m × n and m × (k + 1), respectively.

Model Verification
The error terms of the regression model in Equation (1) must satisfy some principal assumptions, which are as follows: (i) following a normal distribution; (ii) equality of variances; (iii) lack of any autocorrelation.
The importance of these assumptions lies in the validity of the OLS estimators under normality and independence of the error terms. So, some requirements are necessary to ensure these principles hold for the prepared real data. The following checklist and solutions are used in this framework:

•	The residuals of the m profiles are aggregated (to form an m × n matrix) and the normality assumption is checked by the well-known Kolmogorov-Smirnov test. To modify non-normal data, which corresponds to acceptance of the alternative hypothesis, Montgomery [2] suggested applying a transformation to the raw data. Following this idea, the responses (internet usage values) are transformed as y* = y^s, where 0 < s < 1. For different values of s, the polynomial model is fitted to the data, and the normality of the residuals is investigated to reach the best value of s based on the model's accuracy.
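A possible implementation of this search, assuming SciPy's Kolmogorov-Smirnov test is available; the grid of s values and the function name are illustrative:

```python
import numpy as np
from scipy.stats import kstest

def best_power_transform(Y, k, s_grid=np.arange(0.1, 1.0, 0.1)):
    """Search the exponent s of the transform y* = y**s (0 < s < 1) so that
    the pooled residuals of the m fitted profiles pass the Kolmogorov-Smirnov
    normality test with the largest p-value.

    Y : m x n matrix of daily profiles (rows = days, columns = hours).
    """
    x = np.arange(1, Y.shape[1] + 1)
    X = np.vander(x, k + 1, increasing=True)
    best_s, best_p = None, -1.0
    for s in s_grid:
        Ys = Y ** s
        B, *_ = np.linalg.lstsq(X, Ys.T, rcond=None)  # fit all m profiles at once
        resid = (Ys.T - X @ B).ravel()                # pooled m x n residuals
        z = (resid - resid.mean()) / resid.std(ddof=1)
        p = kstest(z, 'norm').pvalue                  # H0: residuals are normal
        if p > best_p:
            best_s, best_p = s, p
    return best_s, best_p
```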

•	The equality of variances between the samples is checked by the Levene test, computed by performing an analysis of variance (ANOVA) on the squared deviations of the residual values from their sample means. Large p-values are expected for equality of variances; the transformation of the response values described above can also help in this respect.
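This check is directly available in SciPy; note that `scipy.stats.levene` performs the ANOVA on absolute (rather than squared) deviations from the sample centres, which serves the same purpose here. The wrapper function is illustrative:

```python
import numpy as np
from scipy.stats import levene

def variance_homogeneity_pvalue(resid):
    """Levene's test across the m daily residual samples (rows of the
    m x n residual matrix). A large p-value supports the equal-variance
    assumption across days."""
    stat, p = levene(*resid, center='mean')  # ANOVA on deviations from sample means
    return p
```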

•	Autocorrelation may occur between or within the profiles. These two situations were completely investigated by Noorossana, et al. [45] and Soleimani, et al. [46], in which the autocorrelation coefficients were denoted by φ and ρ, respectively. Furthermore, both of them (between and within correlation) may sometimes occur at the same time; this scenario has been discussed by Ahmadi, et al. [47]. Note that autocorrelation at different lags is possible, so a broad range of situations should be checked, which is a very time-consuming procedure. Considering the above references, first-order autocorrelation (i.e., lag = 1) is the most probable condition and thus we only consider it in this paper. On the other hand, there is no general approach or hypothesis test for checking the autocorrelation in all the profiles simultaneously in the current Phase I literature.

•	In line with this aim, the first-order autocorrelation effect is checked on the rows and columns of the m × n residual matrix to obtain the m-variate vector [φ_1, φ_2, …, φ_m] and the n-variate vector [ρ_1, ρ_2, …, ρ_n]. Then, their averages, computed as φ̄ = (1/m)Σ_{h=1}^{m} φ_h and ρ̄ = (1/n)Σ_{i=1}^{n} ρ_i, serve as indicators of the existence of autocorrelation between and within the profiles. Based on the results of Noorossana, Amiri and Soleimani [45] and Soleimani, Noorossana and Amiri [46], the effect of autocorrelation can be disregarded when the absolute values of φ̄ and ρ̄ are lower than 0.3. When there is autocorrelation in the model, Amiri, Jensen and Kazemzadeh [17] suggested using a Linear Mixed Model (LMM); it generates new coefficients for the model, and the effect of autocorrelation can thereby be reduced or removed. Another possible approach is to employ the statistics proposed by Noorossana, Amiri and Soleimani [45], Soleimani, Noorossana and Amiri [46] and Ahmadi, Yeganeh and Shadman [47] for Phase I analysis, depending on the type of autocorrelation.

The best polynomial model is obtained by checking models of different orders, i.e., determining k in Equation (1), until the above three conditions are satisfied. This is naturally an iterative procedure which starts from k = 2 and terminates when the best value of k is reached. Note that the value of k is identical for all m profiles. At the end of this step, there is a matrix of coefficients with dimension m × (k + 1).
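The two indicators can be computed from the residual matrix as follows (a minimal sketch; the function names are illustrative and, per the convention above, rows yield the φ_h values and columns the ρ_i values):

```python
import numpy as np

def lag1_autocorr(v):
    """Sample lag-1 autocorrelation of a single series."""
    v = v - v.mean()
    return (v[:-1] @ v[1:]) / (v @ v)

def autocorr_indicators(resid):
    """Average lag-1 autocorrelation over the rows (phi-bar) and the
    columns (rho-bar) of the m x n residual matrix; both averages below
    0.3 in absolute value suggest the autocorrelation effect can be
    disregarded."""
    phi_bar = np.mean([lag1_autocorr(row) for row in resid])
    rho_bar = np.mean([lag1_autocorr(col) for col in resid.T])
    return phi_bar, rho_bar
```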

Phase I Analysis
As mentioned, the principal aims of the Phase I analysis are to check the stability of the process, estimate the parameters, remove the outliers, verify the distribution of the data and check the false alarm rate. Kang and Albin [5] and Yeh, et al. [48] suggested the T² control chart for Phase I profile monitoring analysis due to the existence of large shifts. Furthermore, proper power and detection ability were reported for this approach in Kazemzadeh, Noorossana and Amiri [16]. Different extensions of the T² control chart have been proposed based on different estimations of the variance-covariance matrix, one of which was developed by Williams, et al. [49]. This approach is also applied here, defining the charting statistic for the hth profile as outlined in Equation (4):

T²_h = (β̂_h − β̄)′S⁻¹(β̂_h − β̄),  (4)

where β̄ is the average of the m estimated coefficient vectors.
Note that in this paper, the index of each profile is indicated by h in Phase I and by j in Phase II. In Equation (4), the estimate S of the variance-covariance matrix is computed from the successive differences of consecutive coefficient vectors, as shown in Equation (5):

S = (1/(2(m − 1))) Σ_{h=1}^{m−1} (β̂_{h+1} − β̂_h)(β̂_{h+1} − β̂_h)′.  (5)
To obtain the control limit, the exact distribution of T²_h is needed; since it is not known, some approximations have been proposed. For example, Williams, et al. [50] showed that χ²_{(k+1),α} can be considered a proper control limit (α is determined by the user's specifications). With this approach, all m charting statistics are compared with χ²_{(k+1),α}, the profiles with charting statistics greater than χ²_{(k+1),α}, i.e., T²_h > χ²_{(k+1),α}, are removed, and the procedure is iterated until there are no OC profiles. At the end of the Phase I analysis, the IC process parameters, including a_00, a_10, …, a_k0, φ_0, ρ_0, σ_0 and the variance-covariance matrix (S_0), are estimated by averaging over the remaining profiles. Note that the 0 index indicates the IC condition.
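The iterative Phase I trimming described above can be sketched as follows, assuming a successive-differences estimate of the variance-covariance matrix (the names and the α default are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def phase1_trim(B, alpha=0.05):
    """Iterative Phase I analysis on the m x (k+1) coefficient matrix B.

    S is estimated from differences of consecutive coefficient vectors,
    the T^2_h statistics are compared with the chi-square(k+1) limit, and
    out-of-control profiles are removed until none remains.
    Returns the indices of the retained profiles, their T^2 values and the limit.
    """
    B = np.asarray(B, dtype=float)
    p = B.shape[1]                                   # p = k + 1
    ucl = chi2.ppf(1 - alpha, df=p)
    keep = np.arange(len(B))
    while True:
        Bk = B[keep]
        bbar = Bk.mean(axis=0)
        D = np.diff(Bk, axis=0)                      # successive differences
        S = D.T @ D / (2 * (len(Bk) - 1))            # difference-based estimator
        Sinv = np.linalg.inv(S)
        t2 = np.einsum('ij,jk,ik->i', Bk - bbar, Sinv, Bk - bbar)
        oc = t2 > ucl
        if not oc.any():
            return keep, t2, ucl
        keep = keep[~oc]                             # drop OC profiles, iterate
```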

Phase II Analysis
As mentioned in the literature review, three general categories have been developed for polynomial profiles, entailing (i) T²-, (ii) LRT- and (iii) EWMA-based methods. Our simulations revealed similar performance between the LRT and EWMA approaches (the results are available to interested readers upon request), so the LRT-based methods are neglected here and the other two groups were chosen for the Phase II monitoring aim. As the shift magnitude in Phase II is usually lower than that of Phase I, the T² statistic proposed by Williams, Woodall and Birch [49] is not efficient. Yao, Li, He and Zhang [21] developed a modified version of the T² control chart for Phase II analysis of polynomial profiles. In this approach, the variance-covariance matrix (Σ) is modified based on the idea of Kang and Albin [5]. To compute the charting statistic for the jth profile in Phase II, the parameters are first estimated using the OLS approach (β̂_j) and then the charting statistic (T²_j) is computed as shown in Equation (6):

T²_j = (β̂_j − β_0)′Σ⁻¹(β̂_j − β_0), where Σ = σ_0²(X′X)⁻¹.  (6)
In Equation (6), σ_0 is the IC standard deviation of the errors, which is estimated in the Phase I analysis using Equation (3). The OC signal is triggered when T²_j is greater than UCL_T². As the distribution of T²_j is not known in Phase II and the control limit proposed by Williams, Woodall, Birch and Sullivan [50] is suitable for Phase I, UCL_T² is computed by Monte Carlo simulation. For this aim, it is common to define a criterion denoted Average Run Length (ARL), which is the average number of samples required to reach a signal. When the simulation studies are performed using IC profiles, it is denoted ARL_0, and a target ARL_0 is reached by adjustment of UCL_T² [51]. To clarify, the following Algorithm 1 is suggested to obtain UCL_T². Following previous works such as those of Kang and Albin [5] and Yeganeh, et al. [52], in this paper we also set ARL_0 at 200.

Algorithm 1. Procedure of computation of UCL_T² based on the desired ARL_0

Maxiteration = 10,000;
Set UCL_T² to an approximate value;
while ARL_0 is not equal to the desired ARL_0
    Let ARL = [];
    for t = 1:Maxiteration do
        j = 0;
        while (charting statistic < UCL_T²)
            Generate random IC profile;
            Compute charting statistic;
            j = j + 1;
        end while
        Store the signalling time (j) in ARL;
    end for; % ARL would be a 1 × 10,000 vector after this procedure
    Compute ARL_0 as the average of ARL and adjust UCL_T² accordingly;
end while

However, the T² control chart alone is not sufficient for Phase II analysis due to its weakness in detecting small shifts. Several authors discussed this issue and suggested the use of EWMA control charts [5,6]. To remedy this challenge, the Multivariate EWMA (MEWMA) approach, proposed by Zou, Tsung and Wang [42], is also applied in Phase II monitoring of polynomial profiles. The comprehensive details of this scheme were discussed in Yeganeh, Shadman and Amiri [52]. Similar to UCL_T², UCL_MEWMA is also determined based on Monte Carlo simulations.
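Algorithm 1 can be sketched in Python as follows, using the Equation (6) statistic for the inner loop. The names are illustrative, and the bisection in `find_ucl` is one simple way to realise the "adjust UCL" step, since ARL_0 increases monotonically with the control limit:

```python
import numpy as np

def simulate_arl0(ucl, beta0, sigma0, X, XtX_inv, n_rep=2000, cap=5000, rng=None):
    """Estimate ARL0 of the Phase II T^2 chart at a given UCL by Monte Carlo:
    generate IC profiles until the chart signals and average the run lengths
    (runs are truncated at `cap` samples)."""
    rng = rng or np.random.default_rng(0)
    Sigma_inv = np.linalg.inv(sigma0**2 * XtX_inv)  # IC covariance of beta_hat, inverted
    runs = []
    for _ in range(n_rep):
        j = 0
        while j < cap:
            j += 1
            y = X @ beta0 + rng.normal(0.0, sigma0, size=len(X))  # IC profile
            b = XtX_inv @ X.T @ y                                 # OLS estimate
            t2 = (b - beta0) @ Sigma_inv @ (b - beta0)
            if t2 > ucl:
                break
        runs.append(j)
    return np.mean(runs)

def find_ucl(target=200.0, lo=5.0, hi=60.0, **kw):
    """Bisection on UCL: ARL0 increases with UCL."""
    for _ in range(20):
        mid = 0.5 * (lo + hi)
        if simulate_arl0(mid, **kw) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With an IC linear profile, the T² statistic follows a chi-square distribution with k + 1 degrees of freedom, so the simulated ARL_0 can be checked against the closed-form value 1/P(χ²_{k+1} > UCL).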
To construct the MEWMA charting statistic, the coefficients of the jth polynomial profile are first scaled as follows:

Z_j(β) = (σ_0²(X′X)⁻¹)^{−1/2}(β̂_j − β_0),  Z_j(σ) = Φ⁻¹[F(nσ̂_j²/σ_0², n)],

where Φ⁻¹(.) is the inverse of the standard normal cumulative distribution function and F(., n) is the chi-square cumulative distribution function with n degrees of freedom. The (k + 2)-variate vector Z_j, which concatenates Z_j(β) and Z_j(σ) as Z_j = [Z_j(β), Z_j(σ)], is multivariate normally distributed with mean vector 0 and covariance matrix Σ_m, a (k + 2) × (k + 2) square matrix. Eventually, W_j is defined as the MEWMA of the vectors Z_1 to Z_j:

W_j = θZ_j + (1 − θ)W_{j−1},

where θ is the EWMA smoothing parameter (here assumed fixed and equal to 0.2) and W_0 is a (k + 2)-dimensional vector (usually equal to the mean value of Z_j in the IC condition). The MEWMA chart signals if

W_j′Σ_W⁻¹W_j > UCL_MEWMA, with Σ_W = (θ/(2 − θ))Σ_m.

When there is an autocorrelation effect in Phase II, the LMM approach, as well as the statistics proposed by Noorossana, Amiri and Soleimani [45], Soleimani, Noorossana and Amiri [46] and Ahmadi, Yeganeh and Shadman [47], can be utilized to remove the autocorrelation effect; note that this is similar to the Phase I analysis for the T² control chart. On the other hand, several researchers, such as Sheu and Lu [53] and Li, et al. [54], studied the robustness of EWMA control charts against the effect of autocorrelation in the Phase II analysis, so the MEWMA approach [42] can be directly employed with autocorrelated data. More discussions on the latter are presented in Appendix A.
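A sketch of this MEWMA construction follows. With the whitened Z_j(β), the IC covariance Σ_m reduces to the identity, so the signal statistic simplifies to a scaled norm of W_j; the function and variable names are illustrative, and the chi-square degrees of freedom follow the text above (some formulations use n − k − 1 instead):

```python
import numpy as np
from scipy.stats import norm, chi2

def mewma_statistics(B_hat, sigma2_hat, beta0, sigma0, XtX_inv, n, theta=0.2):
    """Phase II MEWMA statistics for a sequence of polynomial profiles.

    B_hat      : sequence of OLS coefficient vectors (one per day)
    sigma2_hat : sequence of MSE variance estimates (one per day)
    Returns the charting statistics W_j' Sigma_W^{-1} W_j for each day.
    """
    p = len(beta0)                                   # p = k + 1 coefficients
    Sigma_b = sigma0**2 * XtX_inv                    # IC covariance of beta_hat
    L = np.linalg.cholesky(np.linalg.inv(Sigma_b))   # whitening: L'(b - beta0) has identity cov
    W = np.zeros(p + 1)                              # (k + 2)-variate EWMA vector, W_0 = 0
    # With whitened Z, Sigma_m = I, so Sigma_W^{-1} is the scalar (2 - theta)/theta.
    c = (2 - theta) / theta
    stats = []
    for b, s2 in zip(B_hat, sigma2_hat):
        z_beta = L.T @ (b - beta0)
        z_sigma = norm.ppf(chi2.cdf(n * s2 / sigma0**2, n))
        Z = np.append(z_beta, z_sigma)
        W = theta * Z + (1 - theta) * W              # MEWMA recursion
        stats.append(c * (W @ W))                    # W' Sigma_W^{-1} W
    return np.array(stats)
```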

Signal Interpretation
A Phase II signal in an industrial process is the indicator of an unnatural situation and it is highly recommended to stop the process after an OC signal to find and remove the assignable causes. As mentioned in Section 2, three aims are defined for the internet usage monitoring in this paper. In line with these aims, each OC signal is investigated and discussed to reach proper justifications and conclusions.
For better understanding, the above procedures and steps are summarized as a comprehensive framework in Figure 2. In this framework, the values of m (number of subgroups in Phase I) and ARL 0 are assigned by the user.

Analysis of the Practical Implementation of the Proposed Framework
In this section, the proposed framework in Figure 2 is investigated through a telecom company's real internet usage data. After several discussions with the company's experts about the monitoring challenges, and considering the predefined monitoring aims in Section 2, three different scenarios (timestamps) of internet usage are defined to better illustrate the practical application of the proposed methods. In these scenarios, an OC situation occurred at a specific, known time. We want to demonstrate that had the proposed framework been used at that time, it would have identified this unnatural event by triggering an OC signal. To provide a realistic condition, we adjust the Phase I analysis before the change so that some of the days at the beginning of Phase II would also be IC. Using this approach, the behaviour of the proposed framework with respect to false alarms (i.e., triggering an OC signal when the process is in fact IC) is also evaluated. The following subsections provide the details of each scenario. In all the scenarios, the values of m, α and ARL_0 are set at 20, 0.95 and 200, respectively.

Scenario I: Detection of Unnatural Patterns in Internet Usage
This subsection aims to investigate the unnatural pattern detection ability of the proposed method. Due to some repair and maintenance actions in the network, there were network problems affecting internet usage from 24 September 2021 for the following 15 days. Naturally, the internet usage profile was different during this interval. For brevity, and given the similarity of the conclusions, the results of this subsection are based only on the Instagram data and the WhatsApp results are omitted.
Suppose the current time is 3 September 2021 and m is set at 20, so the Phase I data involve 15 August-3 September 2021. Figure 3a depicts the profile of Instagram internet usage in each hour of the day in Phase I, where each line depicts a specific day in this period. Considering steps 1 and 2 in the framework, i.e., model estimation and verification (Sections 3.1 and 3.2), the proper value of k is obtained as 8. From Table 1, it could be concluded that the error terms do not follow the normal distribution in any situation when the raw values are used, as the p-values of the Kolmogorov-Smirnov test (p_ks) are near 0. Having transformed the raw values by s = 0.5, we were able to reach normal error terms with k = 8 and 9. Due to its lower complexity, the value of k is set at 8. In this situation, the autocorrelation effects are neglected, since the absolute values of φ and ρ are lower than 0.3 (i.e., φ0 = ρ0 = 0 is assumed). It is also obvious that there is no problem regarding the equality of variance based on Levene's test results (the values of p_l are greater than 0.05 in all cases except k = 7).
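The model estimation and verification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the synthetic data, and the exact choice of tests are assumptions, although the checks mirror the ones reported (Kolmogorov-Smirnov on the residuals, Levene's test across the hourly subgroups, and the between-profile lag-1 autocorrelation estimate φ after a power transform with s = 0.5).

```python
import numpy as np
from numpy.polynomial import Polynomial
from scipy import stats

def verify_polynomial_model(hours, usage, k, s=0.5):
    """Hypothetical helper: fit a degree-k polynomial profile to
    power-transformed usage and test the residual assumptions."""
    y = usage ** s                               # power transform (s = 0.5 -> square root)
    residuals = np.empty_like(y, dtype=float)
    for j, day in enumerate(y):                  # fit each day's profile separately
        fit = Polynomial.fit(hours, day, deg=k)  # domain-rescaled, well-conditioned fit
        residuals[j] = day - fit(hours)
    r = residuals.ravel()
    # Kolmogorov-Smirnov test of the standardized residuals against N(0, 1)
    p_ks = stats.kstest((r - r.mean()) / r.std(ddof=1), "norm").pvalue
    # Levene's test for equal residual variance across the hourly subgroups
    p_l = stats.levene(*residuals.T).pvalue
    # between-profile (day-to-day) lag-1 autocorrelation, an estimate of phi
    phi = np.corrcoef(residuals[:-1].ravel(), residuals[1:].ravel())[0, 1]
    return p_ks, p_l, phi

# Synthetic in-control example: skewed raw usage whose square root is normal
rng = np.random.default_rng(1)
hours = np.arange(24.0)
profile = 20 + 3 * np.sin(2 * np.pi * hours / 24)
usage = (profile + rng.normal(0, 1, size=(20, 24))) ** 2
p_ks, p_l, phi = verify_polynomial_model(hours, usage, k=8, s=0.5)
```

In this sketch, a model is considered verified when p_ks and p_l are above the significance threshold and |φ| is below 0.3, matching the criteria used in the paper.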
In the next step (Phase I analysis, i.e., Section 3.3), considering Equation (4), the charting statistics (T2_h) are computed iteratively until there are no outliers in the data. For this scenario, three iterations were needed to reach a state with no outliers, as depicted in Figure 4. In the first iteration (Figure 4a), the 14th sample exceeded the control limit (χ2(9,0.95) = 16.92) and was removed from the Phase I data. Then (Figure 4b), the Phase I computations were continued with m = 19, which led to another outlier, and finally, in the third iteration, all the samples (m = 18) were IC (Figure 4c). After this procedure, the IC parameters were estimated by the remaining eighteen profiles. The IC model was obtained as y_ij = 18.34 + 3.52x_i − 6.14x_i^2 − 1.89x_i^3 − 0.27x_i^4 + 0.02x_i^5 − 0.001x_i^6 + 0.000025x_i^7 − 0.00000026x_i^8 + ε_ij, where the standard deviation of errors is estimated as 1.38. Furthermore, the variance-covariance matrix is a 9 × 9 matrix which is not reported, for brevity.
To start the Phase II analysis, UCL_T2 and UCL_MEWMA are calculated by simulations as 23.59 and 2.68, respectively. As the network problems, usually referred to as an OC shift in the SPC literature, happened on 24 September 2021, there are 20 natural (IC) days in Phase II, and no signal was expected during the period from 4 to 23 September 2021. The profiles of these 20 IC samples (days) are shown in Figure 3b. It can be inferred from Figure 3a,b that the Phase I and IC Phase II samples provide similar profiles.
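The iterative Phase I screening described above can be sketched as follows. This is an illustrative implementation under assumed inputs (the function name and synthetic data are hypothetical): each row of B stands for a day's estimated coefficient vector, the day with the largest Hotelling T2 above the chi-square limit is removed, and the parameters are re-estimated until every retained day is IC.

```python
import numpy as np
from scipy import stats

def phase1_screen(B, alpha=0.95):
    """Sketch of the iterative Phase I analysis: B is an (m, p) matrix whose
    rows are the estimated coefficient vectors of the m historical profiles."""
    B = np.asarray(B, dtype=float)
    ucl = stats.chi2.ppf(alpha, df=B.shape[1])       # e.g. chi2(9, 0.95) = 16.92
    keep = np.ones(len(B), dtype=bool)
    while True:
        mean = B[keep].mean(axis=0)                  # current IC estimates
        cov_inv = np.linalg.inv(np.cov(B[keep], rowvar=False))
        d = B - mean
        t2 = np.einsum("ij,jk,ik->i", d, cov_inv, d) # T^2 for every day
        worst = int(np.argmax(np.where(keep, t2, -np.inf)))
        if t2[worst] <= ucl:
            return mean, np.cov(B[keep], rowvar=False), keep
        keep[worst] = False                          # drop the outlier, iterate

# 20 days of 9 coefficients, one contaminated day (index 13, as in Figure 4a)
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 9))
B[13] += 6.0
mean_ic, cov_ic, keep = phase1_screen(B)
```

Removing one day per iteration and re-estimating, rather than removing all exceedances at once, matches the three-iteration behaviour reported for Figure 4.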
To detect unnatural conditions, two control charts were used to monitor the internet usage in Phase II; these are depicted in Figure 5. As an interesting result in Figure 5a, the T2 control chart had very weak performance, triggering an OC signal with a 132-day delay. As mentioned at the beginning of this subsection, the problem in the network (the equivalent OC condition) lasted about 15 days, so it could be said that the T2 control chart was not able to identify this situation. On the other hand, the MEWMA chart only needed 7 days to reach an OC signal (i.e., 1 October 2021), which makes it clearly superior to the T2 control chart. The increasing pattern in the MEWMA statistics after the occurrence of the change (i.e., 24 September 2021), which is obvious in Figure 5b, indicates that the actual shift occurred nearly seven days before the signal. The reasoning behind this inference belongs to another SPC field, denoted as change point estimation. As it is beyond the scope of this paper, interested readers are referred to Holland and Hawkins [55] and Montgomery [2].
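The MEWMA recursion used here can be sketched generically. This is not the exact statistic of Zou, Tsung and Wang [42], but a simplified version under the assumption that the Phase II charting vectors have already been standardized to unit covariance; the function name, θ value and synthetic shift are illustrative.

```python
import numpy as np

def mewma_statistics(Z, theta=0.2):
    """Generic MEWMA sketch: Z is an (N, p) array of standardized Phase II
    charting vectors; U[j] is compared against UCL_MEWMA for an OC signal."""
    Z = np.asarray(Z, dtype=float)
    W = np.zeros(Z.shape[1])             # W_0 = 0
    scale = theta / (2.0 - theta)        # asymptotic Var(W_j) for unit-variance Z
    U = np.empty(len(Z))
    for j, z in enumerate(Z):
        W = theta * z + (1.0 - theta) * W   # exponentially weighted average of vectors
        U[j] = (W @ W) / scale              # variance-normalized squared norm
    return U

# A small sustained shift (0.5 sigma per coordinate) starting at day 20
rng = np.random.default_rng(2)
Z = rng.normal(size=(40, 9))
Z[20:] += 0.5
U = mewma_statistics(Z)
```

Because W accumulates small deviations over time, U drifts upward steadily after a small sustained shift, which is exactly the increasing pattern visible in Figure 5b.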
As the T2 control chart is weak in the detection of small shifts, it could be stated that the magnitude of the shift applied on 24 September 2021, due to the network problem, was small. On the other hand, the MEWMA approach, which has superior ability for small shifts, identified the OC situation after 7 days. To better show this assertion, the Instagram internet usage profiles of these 7 days (i.e., 24 September-1 October 2021) are illustrated in Figure 3c. Some differences in the pattern of this period can be seen in comparison with the two other IC situations (Figure 3a,b), which could be an indicator of a small shift. However, making decisions about the IC and OC situations with only the information from plotting the profiles is not an easy task, and a scientific and statistical approach is needed to provide acceptable results. As the last step, the provided signal should be interpreted by an expert to find the main reasons for its occurrence. That said, we now know it was caused by some repair and maintenance actions in the network.
Generally, the discussions and conclusions of this subsection show that the proposed framework is a proper choice for the identification of the unnatural patterns in the monitoring procedure of internet usage.

Scenario II: Detecting the Effect of the Discount Policy
The second scenario investigates the effect of a discount policy on the internet usage pattern using the proposed framework. Due to holidays, the company had offered an 80% discount on internet prices on 12-26 March 2022 between 0:00 and 7:00. So, a tangible increase in usage in these hours on the given days is expected, along with some changes in the patterns of the other hours. The aim is to find how and when the proposed framework identifies this situation. Additionally, dealing with the autocorrelation effect is investigated. Similarly to the previous subsection, the WhatsApp results are omitted for brevity.
Suppose the current time is 6 March 2022 and m is set at 20, so Phase I data involves 15 February-6 March 2022. The profiles of these twenty days are shown in Figure 6a. Obviously, there are five IC days in Phase II (7-11 March 2022) which are depicted in Figure 6b. Finally, the discount times (12-26 March 2022) in which it is supposed to reach an OC signal as soon as possible are illustrated in Figure 6c. The effect of the discount is obvious in Figure 6c, especially between 3:00 and 7:00, as the internet usage has almost quadrupled.
It is again necessary to transform the data due to the non-normality of the raw values. After transforming the values, p_K and p_L were calculated as 0.22 and 0.51, respectively (the best value of k was again 8). However, the values of φ and ρ were computed as 0.38 and 0.06, respectively, which indicates between-profile autocorrelation; thus, the charting statistics in Section 3.3 are not appropriate. Figure 7 depicts a matrix of scatter plots of the residuals of the first ten Phase I profiles for each pair (the diagonal histograms are based on the residuals of the raw values of each profile, and their skewness indicates the non-normality in all subgroups). The first-lag autocorrelation and the non-normality of the raw values' residuals can be clearly observed in Figure 7.
To remedy this challenge, we first tested the LMM approach implemented by Amiri, Jensen and Kazemzadeh [17], but it was not able to remove the autocorrelation effect. Then, we utilized the approach proposed by Noorossana, Amiri and Soleimani [45]. As it was developed for Phase II analysis, we applied some modifications to implement it in Phase I; the details are provided in Appendix A. Considering the formula provided in Appendix A, the UCL was obtained as χ2(24,0.95) = 36.42, and the charting statistics were computed as shown in Figure 8. From these, it could be concluded that all the profiles were IC and the parameters could be estimated. The IC model was obtained as y_ij = 18.11 + 5x_i − 6.83x_i^2 + 2x_i^3 − 0.28x_i^4 + 0.02x_i^5 − 0.001x_i^6 + 0.000025x_i^7 − 0.00000025x_i^8 + ε_ij, and the standard deviation of errors was estimated as 1.55. Furthermore, the variance-covariance matrix is a 9 × 9 matrix which is not reported, for brevity.
For Phase II analysis, the charting statistic of Noorossana, Amiri and Soleimani [45] was utilized as the T2 control chart to account for the autocorrelation effect. Due to the robustness of EWMA control charts against the autocorrelation effect in Phase II analysis, we also used the MEWMA approach proposed by Zou, Tsung and Wang [42] in this section, with consideration of autocorrelation in the parameter estimation as discussed in Appendix A. For this aim, UCL_T2 and UCL_MEWMA are calculated using simulations as 45.56 and 1.78, respectively. Figure 9 shows the charting statistics of each control chart from the beginning of Phase II until the end of the discount time (7-26 March 2022). The yellow vertical lines mark the start of the discount. As can be seen, neither chart provided a false alarm in the first five days (7-11 March 2022). On the other hand, the T2 control chart had a better performance, as it could detect the OC situation on the first day of the discount, while the MEWMA chart needed two days for a signal. It could be inferred that a shift with a large magnitude occurred in the profile parameters after the beginning of the discount, as the T2 control chart could detect it earlier. Due to the recursive effect in the MEWMA statistic, we can see the rise in the U_j values during the discount period.
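Both scenarios rely on simulation-based control limits. A minimal sketch of such a calibration is given below, under the assumption that the IC charting statistic can be simulated; the function names are hypothetical. As a sanity check, for a plain T2 statistic with p = 9 known parameters the simulated limit should land near χ2(9, 1 − 1/200) ≈ 23.59, the value used in Scenario I.

```python
import numpy as np

def simulate_ucl(draw_ic_stats, arl0=200, n_sim=200_000, seed=0):
    """Monte Carlo UCL calibration (sketch). For a charting statistic that
    is independent over time, a per-day false-alarm probability of 1/ARL0
    yields an in-control ARL of ARL0, so the UCL is the (1 - 1/ARL0)
    quantile of the simulated IC distribution."""
    rng = np.random.default_rng(seed)
    return float(np.quantile(draw_ic_stats(rng, n_sim), 1.0 - 1.0 / arl0))

# Plain T^2 with known parameters is chi-square with p = 9 under IC
def t2_ic(rng, n):
    z = rng.normal(size=(n, 9))
    return (z ** 2).sum(axis=1)

ucl_t2 = simulate_ucl(t2_ic)   # close to chi2(9, 0.995)
```

For statistics without a closed-form IC distribution (e.g. the autocorrelation-adjusted statistics of this scenario), draw_ic_stats would instead simulate whole IC profiles and compute the statistic, but the quantile step is unchanged.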

Scenario III: Detection of Social Behavior Variations
Due to some protests in the company's country, the connectivity to some major social networks such as Telegram was lost between 24 February and 1 March 2021. On the other hand, the users were able to connect to WhatsApp without any filtering; this caused a tangible increase in the WhatsApp messages during these days. The difference between this situation and Scenario I (in Section 4.1) is that only one platform changed in Scenario III while all of them were influenced in Scenario I. The aim of this subsection is to illustrate the ability of the proposed framework to detect this social behaviour variation. Some of the similar results are not given in this subsection for the sake of brevity.
Suppose that the current time is 20 February 2021 and m is set at 20, so the Phase I data involve 1-20 February 2021. The profiles of these twenty days are shown in Figure 10a. Three days in Phase II did not have any change (21-23 February 2021); these are depicted in Figure 10b. Finally, the period with a sudden increase in WhatsApp internet usage (24 February-1 March 2021) is illustrated in Figure 10c; for example, the maximum usage between 22:00 and 23:00 was about 100 GB on 1 March, while the average usage in this hour was about 60 GB in Phase I, i.e., 1-20 February 2021. By a procedure similar to that of Table 1, using s = 0.55, it was possible to reach a verified model with k = 5, p_K = 0.06, p_L = 0.95, φ = −0.07 and ρ = −0.15. So, the autocorrelation effect was neglected in the subsequent computations. Then, the Phase I analysis was carried out in two iterations and the IC model was obtained as y_ij = 9.12 − 4.19x_i − 0.73x_i^2 − 0.05x_i^3 + 0.002x_i^4 − 0.00002x_i^5 + ε_ij, and the standard deviation of errors was estimated as 1.12. Furthermore, the variance-covariance matrix is a 9 × 9 matrix which is not reported, for brevity.
For Phase II analysis, UCL_T2 and UCL_MEWMA were calculated using simulations as 18.54 and 2.18, respectively. Then, the charting statistics were computed for both approaches from the beginning of Phase II until twenty days later. The results are reported in Table 2. Both control charts could detect the unnatural situation as soon as it happened. The T2 control chart returned from OC to IC on 5 March 2021, when the WhatsApp internet usage was normal again, while the MEWMA scheme still signalled the OC situation until 13 March 2021. As the MEWMA statistic stores the information of previous points, it needs more time to come back to the IC condition. To avoid such false alarms, one can reset the initial parameters of MEWMA schemes (i.e., W_0 in Equation (8)), which is not in line with the aim of this study [56]. Another useful solution may be the adjustment of the smoothing parameter of the MEWMA approach (θ in Equation (8)). Montgomery [2] evaluated the performance of the EWMA control chart under different smoothing parameter values; knowing the magnitude of the OC shift, we can select the proper value of θ. The very large charting statistics in the OC profiles are an indication of large shifts in the OC situation, which can also be found by comparing the different panels in Figure 10.
The results of the three scenarios show that the proposed framework can be considered a practical application for the monitoring of unnatural situations and events in telecom companies. Using this procedure, there is no need to monitor each hour of the day separately, as it incorporates all of them into a single charting statistic. Due to the extension of the profile monitoring statistic, an acceptable detection ability is achieved such that the OC situations in all the scenarios are detected very quickly. Furthermore, the consideration of non-normal and autocorrelated conditions provides a very powerful and accurate framework under different model assumptions.
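The inertia effect discussed above, where the MEWMA statistic keeps signalling after the process has returned to IC, can be illustrated with a simplified univariate EWMA on a noiseless mean path. This is a hypothetical illustration, not the paper's chart: the shift size, limits and day counts are assumptions.

```python
import numpy as np

def ewma_recovery_days(theta, shift=5.0, oc_days=6, ic_days=30):
    """Number of IC days an EWMA still exceeds its limit after a temporary
    OC excursion (noiseless mean path; hypothetical illustration)."""
    z = np.concatenate([np.full(oc_days, shift), np.zeros(ic_days)])
    limit = 3.0 * np.sqrt(theta / (2.0 - theta))  # asymptotic 3-sigma EWMA limit
    w, days_over_after_ic = 0.0, 0
    for j, x in enumerate(z):
        w = theta * x + (1.0 - theta) * w         # EWMA recursion
        if j >= oc_days and abs(w) > limit:
            days_over_after_ic += 1               # still signalling after IC resumes
    return days_over_after_ic
```

Under this simplified setting, a larger θ discards the OC history faster (e.g. ewma_recovery_days(0.4) is far smaller than ewma_recovery_days(0.05)), while a smaller θ retains it longer; this trade-off against small-shift sensitivity is why θ should be tuned to the expected shift magnitude, as noted in the text.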

Concluding Remarks
Although several novelties and ideas have been developed in the area of profile monitoring, papers on the practical application of profile monitoring are scarce. To address this knowledge gap, in this paper, on the basis of the profile monitoring idea, a novel practical monitoring framework is proposed which includes five steps: (i) model estimation, (ii) model verification, (iii) Phase I analysis, (iv) Phase II analysis, and finally (v) signal interpretation. In this study, an effort is made to consider different practical challenges and assumptions, such as the normality of data, the existence of autocorrelation, etc., and to provide useful solutions for them. The practical implementation of the proposed framework is demonstrated on the internet usage monitoring of telecom companies, which has not been investigated in the literature. These companies encounter three major monitoring challenges, namely (i) detection of unnatural patterns, network failures and other abnormal situations in which the total trend of a day is affected, (ii) identifying the impact of policies such as providing discounts or markups for specific hours, and (iii) investigation of general social behaviour variations, which could also be considered as customer requirements. For this aim, a polynomial profile model between the hours of each day and the internet usage is defined and monitored over time. The proposed framework is applied to real historical data at three different timestamps and its monitoring ability is investigated. The results show that it can detect unnatural situations very quickly under the proper adjustments and setups.
As this paper proposes a generally applicable framework, it is not limited to polynomial profiles and can be implemented in other real applications with different IC models. Future studies can apply other Phase I and II control charts under the proposed framework and compare them with this approach. The major limitation of this study relates to the violation of predefined assumptions such as normality, independence and so forth. In such situations, one useful solution is to extend non-parametric methods to this area of study. Several non-parametric control charts have been proposed, and their use in this area can be considered a promising future research direction.

Data Availability Statement:
The data of this study are available to interested researchers upon request.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. The Proposed Charting Statistic by Noorossana, Amiri and Soleimani [45]
In this approach, it is assumed that the profiles are not independent from each other over time. This dependence exists in successive profiles in such a way that the autocorrelation structure follows a first-order autoregressive model with a known coefficient φ between the profiles. Furthermore, the IC simple linear model parameters are shown by A_0, A_1 and σ_0^2, which denote the intercept, slope and error variance, respectively. For the jth generated profile in Phase II, the estimated response variables and errors are obtained based on the previous profile responses as ŷ_ij = A_0 + A_1x_i + φe_i(j−1) and e_ij = y_ij − ŷ_ij. Then, the charting statistic is defined as T_j^2 = e_j(σ^2 I)^(−1)e_j^T, where I represents the n × n identity matrix and e_j indicates the 1 × n vector of error terms. The UCL of the proposed statistic is computed by χ2(n,α). Furthermore, the estimation of the error variance is different from Equation (3); for the jth generated profile, it is computed as MSE_j = (Σ_(i=1)^n e_ij^2)/(n − 2).
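The Appendix A statistic can be sketched as follows. This reflects our reading of the reconstruction above and is an assumption rather than the authors' code: the prediction borrows the previous profile's errors through the known AR(1) coefficient φ, so the remaining residuals behave approximately as independent innovations, and T2_j is their variance-scaled squared norm with a chi-square(n) limit.

```python
import numpy as np
from scipy import stats

def t2_autocorrelated(y_j, x, A0, A1, phi, sigma2_a, e_prev):
    """Sketch of the AR(1)-adjusted Phase II statistic for the jth profile.
    sigma2_a is the innovation variance of the AR(1) error process."""
    y_hat = A0 + A1 * x + phi * e_prev    # AR(1)-adjusted expected responses
    e_j = y_j - y_hat                     # 1 x n vector of error terms
    t2 = float(e_j @ e_j) / sigma2_a      # e_j (sigma^2 I)^{-1} e_j^T
    return t2, e_j

# IC example: one profile generated with between-profile AR(1) errors
rng = np.random.default_rng(3)
n, phi, sigma_a = 24, 0.4, 1.0
x = np.linspace(0.0, 1.0, n)
e_prev = rng.normal(0.0, sigma_a, n)              # previous profile's errors
eps = phi * e_prev + rng.normal(0.0, sigma_a, n)  # current AR(1) errors
y_j = 2.0 + 3.0 * x + eps
t2, e_j = t2_autocorrelated(y_j, x, 2.0, 3.0, phi, sigma_a**2, e_prev)
ucl = stats.chi2.ppf(0.95, df=n)                  # chi2(n, alpha) control limit
```

Under IC conditions, e_j reduces to the innovation vector, so t2 follows a chi-square distribution with n degrees of freedom, consistent with the stated χ2(n,α) control limit.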