Prognostic Value of Scoring and 0-Upcrossing in Statistical Quality Control

Pestana, Dinis; Rocha, Maria Luísa

doi:10.3390/appliedmath6050077


by Dinis Pestana ^1,2,* and Maria Luísa Rocha ^1,3

¹ Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal

² Instituto de Investigação Científica Bento da Rocha Cabral, Calçada Bento da Rocha Cabral 14, 1250-012 Lisboa, Portugal

³ Centro de Estudos de Economia Aplicada do Atlântico, Faculdade de Economia e Gestão, Universidade dos Açores, Rua da Mãe de Deus, 9500-321 Ponta Delgada, Portugal

^* Author to whom correspondence should be addressed.

AppliedMath 2026, 6(5), 77; https://doi.org/10.3390/appliedmath6050077

Submission received: 2 April 2026 / Revised: 4 May 2026 / Accepted: 7 May 2026 / Published: 12 May 2026


Abstract

Rising temperatures in industrial processes are a serious alert that the system can be shifting from an In Control (InC) to an Out of Control (OutC) state, causing waste, financial losses and, eventually, disaster. Consultation in a case study analyzing the Statistical Quality Control (SQC) routines in a potato chip factory revealed that laymen dealing with data may naively spoil and misuse traditional SQC tools, downgrading the interval-scale temperature data to a simple nominal classification, true or false Negative (N) or Positive (P) symptoms that the production line is InC or OutC. Appropriate scores, negative for true N and false P, and positive for false N and true P, were designed so that their moving averages upcrossing 0 detect clusters of suspicious temperature deregulation, in order to effectively salvage the InC/OutC prognostic value of data.

Keywords:

Statistical Quality Control; change-point detection; classification evaluation; messy data; data wrangling; moving averages; upcrossings

1. Introduction

Statistical Quality Control (SQC) is essential to the economy, namely in terms of competition and leadership in industry and management. The majority of enterprises, including medium- and small-sized companies, invest in SQC, recognizing its importance for success.

The present report is the result of a consultation with a leading fried potato manufacturer. A new production line, starting operation in October 2024, had many in-built automated feedback and control features; namely, the means to retrieve frying-oil-temperature readings every 15 s.

Oil temperature is, obviously, a random variable since the introduction of a new batch of raw chips instantaneously lowers the temperature, but this triggers a response from the rheostats, increasing the Joule effect to attain the desirable frying-oil temperature.

Industry guidelines for the large-scale production of potato chips recommend a range of [175–185 °C], and this is also indicated by The Food Standards Agency (UK). Based on previous experience, the factory had a conservative target of 180° ± 4°. Very seldom, consistently low temperatures can compromise crispness, de-oiling and seasoning. On the other hand, with rheostat deregulation, overheating may occur, causing starch burning and spoiling several chip batches. Temperatures in the range [176–184 °C] were, therefore, considered negative (N) symptoms, and temperatures <172 °C (underheating) or >188 °C (overheating) were considered positive (P) indicators that the system was deregulating. Temperatures in the ranges [172–176 °C] and [184–188 °C] were given a more sophisticated “fuzzy” treatment, with 175 °C and 185 °C (from the industry guidelines) acting as possible change points. This implies that, for overheating, temperatures in the range [184–185 °C] could be classified either as N or P, and temperatures in the range [185–188 °C] either as P or N. See details in Section 2.

The factory SQC team members used this Gaussian-based nominal classification into true and false negatives (TN and FN) or false and true positives (FP and TP) to monitor the state of the system, disregarding the quantitative interval-scale data.

In fact, the resulting nominal data were no longer fit for traditional SQC analysis, for instance with the foolproof control charts described in the classic Montgomery [1] or in Aslam [2]. The main issue, the (mis)classification of InC/OutC with these messy data, was subject to confusion; namely, the fuzzy classification TN/FN of temperatures in the range $(184, 185]$ and TP/FP in the range $(185, 188]$ was ambiguous, implying that the fuzzy tools recommended by Hryniewicz [3] should not be used in view of the poor accuracy levels of temperature classification in the nominal classes TP, FP, TN, and FN. See Stehman [4], Ting [5], Tharwat [6] or Opitz [7].

Inappropriate recording and monitoring of the data can result in high losses or even in catastrophic disasters, such as what happened with the Chernobyl nuclear power plant. De Veaux and Hand [8] refer to the fact that “Anyone who has analyzed real data knows that the majority of their time on a data analysis project will be spent ‘cleaning’ the data before doing any analysis. Common wisdom puts the extent of this at 60–95% of the total project effort, and some studies […] suggest that ‘between one and ten percent of data items in critical organizational databases are estimated to be inaccurate’”. Further, they state that the “claims by software vendors that their techniques can produce valid results no matter what the quality of the incoming data” is preposterous, deserving the celebrated Sir Ronald Fisher statement that “To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he only may be able to say what the experiment died of”.

From the eighties onward much progress has been made on the analysis of messy data, cf. Milliken and Johnson [9,10,11], but faith in the possibility of individuals dealing with bad data like professionals, as claimed by Asboth [12], seems too optimistic. Moreover, aside from cleaning the data, some data wrangling (cf. Petricek et al. [13]) may be needed, and progress using AI brought in interesting new features; see also Megahed et al. [14], Munappy et al. [15], and Mohammed et al. [16]. In the present framework, the goal is to identify change points hinting alteration from InC to OutC; for information on change-point-detection methods, see Truong et al. [17] and van den Burg and Williams [18], always having in mind that SQC tools can be misused. For a critical overview see Elhabashy et al. [19].

Our multidisciplinary research team was consulted to analyze the data and procedures in order to indicate weak and strong points of the SQC benchmark, to recommend alterations that would provide better forecasting and, namely, to offer counseling on data handling. Less radical than Sir Ronald Fisher, we proposed ways of dealing with the existing messy nominal data, and highlighted that it is inappropriate to apply to such data otherwise foolproof SQC tools such as CUSUM charts, as clearly advised in Elhabashy et al. [19].

As in many nonparametric frameworks, we defined adequate scores, whose moving averages and time series trends establish that 0-upcrossings clearly forecast InC/OutC shifts. The present report complements the findings in Pestana and Brilhante [20], and it is structured as follows:

Section 2 describes the factory SQC routines and ensuing production halts and downtimes, and discusses eventual InC/OutC misclassification using the standard metrics computed from the confusion matrix.

Section 3 addresses the use of scores and their moving averages for diagnosing the clustering of suspicious sequences signaling shifts from InC to OutC. The determination coefficients of the daily score moving averages are, apart from three outliers, moderate, supporting the idea that feedback and control are effective in enforcing the independence of sequential readings.

Section 4 discusses the improvements resulting from the SQC routines update.

Section 5 states concluding remarks, advising how to improve routines for timely alert that the system is sliding to OutC.

2. Factory Quality Control Routines

The main SQC factory team’s assignment was to continuously monitor the frying-oil-temperature evolution for each 2-min frying period. The production line automatically made available 8-ary sequences of oil temperatures 15 s apart. The majority of the SQC team members were laymen in Statistics and, in fact, quite averse to numeracy. So, they asked someone from the Informatics department to transform the numerical data furnished by the equipment into nominal color-coded marks visually depicting whether the system was In Control (InC) or shifting to Out of Control (OutC) and needing to be halted for maintenance or repair. This was a routine inherited from the previous production line surveillance team, relying on the belief that $t \in [175°, 185°]$ was a negative (N) symptom that the system was InC, and $t \notin [175°, 185°]$ was a positive (P) symptom that the system was sliding towards OutC.

Since the target was $180° \pm 4°$, temperatures $t \in [176°, 184°]$ were considered True Negative (TN) and color-coded ●, and temperatures $t \leq 172°$ or $t \geq 188°$ were considered True Positive (TP) and color-coded ●. In the ranges $(172°, 176°)$ and $(184°, 188°)$ (i.e., between one and two standard deviations from the target 180°), the classification rule was more elaborate. As far as overheating is concerned, $t \in (184°, 185°]$ was classified as TN if the next value was N, and as False Negative (FN) if the ensuing value was P, in which case it was color-coded ●. On the other hand, $t \in (185°, 188°)$ was considered TP if the next temperature was P, and classified as False Positive (FP) if the next value was N, in this case being color-coded ●. Similar rules were adopted in the rare case of under-heating. In summary:

TN ● if $\begin{cases} t \in [176°, 184°], \text{ or} \\ t \in (184°, 185°] \text{ and the subsequent value is } < 185°, \text{ or} \\ t \in [175°, 176°) \text{ and the subsequent value is } > 175° \end{cases}$ ;
TP ● if $\begin{cases} t \geq 188°, \text{ or} \\ t \in (185°, 188°) \text{ and the subsequent value is } > 185°, \text{ or} \\ t \leq 172°, \text{ or} \\ t \in (172°, 175°) \text{ and the subsequent value is } < 175° \end{cases}$ ;
FN ● if $\begin{cases} t \in (184°, 185°] \text{ and the subsequent value is } \geq 185°, \text{ or} \\ t \in [175°, 176°) \text{ and the subsequent value is } \leq 175° \end{cases}$ ;
FP ● if $\begin{cases} t \in (185°, 188°) \text{ and the subsequent value is } \leq 185°, \text{ or} \\ t \in (172°, 175°) \text{ and the subsequent value is } \geq 175° \end{cases}$ .

The standard frying time of each batch is 2 min, yielding an 8-ary sequence of temperature readings 15 s apart.

For example, the readings $(179.2°, 180.3°, 184.6°, 185.3°, 182.9°, 182.3°, 181.8°, 188.4°)$ were classified (TN,TN,FN,FP,TN,TN,TN,TP), and recoded ●●●●●●●●, to appear on the supervisor’s monitor.
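The classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not the factory's implementation: the function names `classify_reading` and `classify_sequence` are hypothetical, and returning `None` for an ambiguous reading that has no subsequent value is an assumption, since the rules do not cover that case.

```python
from typing import List, Optional


def classify_reading(t: float, nxt: Optional[float]) -> Optional[str]:
    """Classify one temperature using the next reading, per the rules above."""
    if 176 <= t <= 184:
        return "TN"
    if t >= 188 or t <= 172:
        return "TP"
    if nxt is None:
        # Assumption: an ambiguous reading with no successor stays unlabeled.
        return None
    if 184 < t <= 185:            # upper band, nominally N
        return "TN" if nxt < 185 else "FN"
    if 185 < t < 188:             # upper band, nominally P
        return "TP" if nxt > 185 else "FP"
    if 175 <= t < 176:            # lower band, nominally N
        return "TN" if nxt > 175 else "FN"
    # Remaining case: 172 < t < 175, lower band, nominally P
    return "TP" if nxt < 175 else "FP"


def classify_sequence(temps: List[float]) -> List[Optional[str]]:
    """Classify every reading of a sequence, pairing each with its successor."""
    successors: List[Optional[float]] = list(temps[1:]) + [None]
    return [classify_reading(t, nxt) for t, nxt in zip(temps, successors)]


# The worked example from the text.
example = [179.2, 180.3, 184.6, 185.3, 182.9, 182.3, 181.8, 188.4]
labels = classify_sequence(example)
print(labels)
```

Applied to the example sequence, the sketch returns the same labels as those reported in the text.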

This Kanban-inspired (see Louis [21]) inspection simplicity is praiseworthy, but the downgrading of the interval-scale temperature data to the color-coded nominal scale is excessive and adverse to deeper statistical analysis.

The raw temperature readings, in the interval scale of Stevens [22], were discarded in favor of the above nominal-scale 8-ary color-dotted sequences. Further, the ordering TN < FN < FP < TP was not used; hence, there is no reason to consider that the data are in the ordinal scale.

The factory SQC team relied on a layman’s unproved conjecture that inspection of the color-coded data would indicate the state of the system: 8-ary sequences with fewer than five ●/● supported the likelihood that the system was InC. On the other hand, an accumulation of neighboring 8-ary sequences with five or more ●/● hinted that the system could enter an OutC state, needing to be halted for maintenance or repair. Another unexplained rule was the decision to consider as suspicious the 8-ary sequences terminating in the codon PPP, with the implication that no three adjacent P readings existed in the five initial tags of the 8-ary sequence.

As a result of these feedback and control rules, production had been halted for maintenance on 37 occasions in the 69 working days from October 1 to December 21, as displayed in Table A1 in Appendix B; 11 of those interruptions lasted 2 min, 18 interruptions lasted less than 8 min (possibly meaning that half of the interruptions were either FD or required very simple maintenance), and the remaining 18 lasted more than 13 min, with a severe outlier of 109 min, as depicted in the bar chart and boxplot in Figure 1. The production halt extremes and quartiles are displayed in Table 1.

Considering the total production downtime (11.1 h) relative to the total production hours (1380 h) for those 69 working days, a rough estimate of the probability that the system is OutC is $π_{1} = \frac{11.1}{1380} \approx 0.008$ (an underestimate, as discussed in Section 3), while the probability that it is InC is $π_{0} = 1 - π_{1} \approx 0.992$. This was considered satisfactory, since brand-new equipment should almost always be InC. However, the data provided neither a timely alert that a shift from InC to OutC could occur, nor a rationale to distinguish whether an alarm would be a False Discovery (FD) or the absence of an alarm a False Omission (FO); the ensuing losses and unnecessary waste run contrary to modern Lean waste-reduction guidelines.

The data masking inevitably downscaled the original interval-scale quantitative data of temperatures (see Stevens [22]) to a nominal scale represented by the categories TN, FN, FP, and TP. This, of course, precluded the possibility of using the foolproof SQC standard tools, such as CUSUM charts.

The factory SQC team used the halt durations in Table A1 in Appendix B to classify each instance as False Discovery (FD) or False Omission (FO) of OutC: sensitivity and specificity are the true alert (hit) and true failure (miss) rates to diagnose OutC. The criteria were mainly a combination of halt durations with a qualitative assessment of renewal patterns of occurrences in the same day or in adjacent days; namely, several alerts in 10 h operating periods.

Sensitivity or true positive rate (TPR) and specificity or true negative rate (TNR), positive predictive value (PPV) and false discovery rate (FDR), negative predictive value (NPV) and false omission rate (FOR), computed from the Confusion Matrix (bold part of Table 2), are cornerstone concepts to assess SQC. Other important metrics are defined in Table 3 and incorporated in the Performance Metrics Matrix, Table 2.
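As a reminder of how these rates follow from the four cells of a confusion matrix, the sketch below computes them with their standard definitions. The function name and the counts in the usage lines are purely illustrative assumptions; they are not the values of Table 2 or Table 4.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard rates derived from the four cells of a confusion matrix.

    Assumes every row and column total is non-zero.
    """
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    return {
        "TPR": tp / (tp + fn),   # sensitivity
        "TNR": tn / (tn + fp),   # specificity
        "PPV": ppv,
        "FDR": 1 - ppv,          # false discovery rate
        "NPV": npv,
        "FOR": 1 - npv,          # false omission rate
    }


# Hypothetical counts, chosen only to illustrate the computation.
metrics = confusion_metrics(tp=8, fp=2, fn=4, tn=36)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that FDR and FOR are the complements of PPV and NPV, respectively, so the six rates carry four independent pieces of information.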

A comparison of the SQC team’s classifications with the findings of the maintenance department was used to scrutinize the classification-confusion matrix in Table 2 with associated (mis)matching evaluation metrics, as defined in Table 3. For detailed discussions on confusion matrices and related evaluation metrics, refer to [4,5,6,7,23,24,25,26,27].

The factory SQC team classification-confusion matrix is displayed in Table 4.

3. Scores and Their Moving Averages

The nominal data were unfit for the usual SQC diagnosis charts. The belief that 8-ary sequences with fewer than five ●/● (FN/TP) are a symptom that the system is InC, and that 8-ary sequences with five or more ●/● (FN/TP) are a symptom that the system is shifting to an OutC state, can obviously be misleading, since sudden surges are in most cases immediately corrected.

Only the accumulation of clusters of symptoms is meaningful. It was, therefore, decided to attribute scores to each 8-ary sequence using rules that promoted the attribution of negative scores to 8-ary sequences with a predominance of ●/● (TN/FP) temperatures, and of positive scores to 8-ary sequences with a predominance of ●/● (FN/TP) temperatures.

As shifts from InC to OutC should be forecasted by clustering P observations, the weights used were chosen so that 15-period-score (corresponding to 30 min) moving averages—that should mainly be negative, diagnosing InC—upcross 0 when clusters of strong disturbances do occur, signaling a shift to OutC. This was achieved by computing the scores as

$$\text{Score} = 2 \times \#TP + \#FN - \frac{\#TN + \#FP}{2} . \qquad (1)$$

Observe that (1) can be re-expressed using simple functions of the false discovery rate (FDR) and the false omission rate (FOR) as multipliers of the numbers of temperatures exceeding 185°:

$$\text{Score} = \frac{1 + FDR}{FDR} \#TP + \frac{1}{2\, FOR} \#FP . \qquad (2)$$

The rationale for the definition of scores in (1) and their moving averages for 30-min periods is as follows:

The score of each 8-ary sequence of temperatures ranges from $-4$, when all temperatures are TN, to 16, when all temperatures are TP. In 30-min averaging periods, if 3 of the 15 8-ary sequences were (TP,TP,TP,TP,TP,TP,TP,TP), their contribution to the sum of scores would be $3 \times 16 = 48$. If all the other 12 sequences were (TN,TN,TN,TN,TN,TN,TN,TN), or more generally any combination of TN and FP, contributing $12 \times (-4) = -48$, the moving average would be 0. On the other hand, the average would be positive if, in at least one of those 12 sequences, one of the temperatures was TP or FN.
On a brand-new production line, we would expect the system to be almost always InC. This would mean that P observations would be rare, and there is a clear indication, see Table 5, that sequences with four or fewer Ps should be expected in InC, and that sequences with five or more Ps should be interpreted as symptoms that there is some risk of OutC, with the proviso, obviously, that only the accumulation of evidence from clustering of such sequences should be an effective alarm.
In Table 5, we compare the probabilities $p_{k} | H_{0}$ and $p_{k} | H_{1}$ of $k$, $k = 0, \dots, 8$, P observations in an 8-ary sequence, where $H_{0}$ is the null hypothesis that the model is Gaussian(180,4) and $H_{1}$ is the alternative hypothesis that the model has shifted at least to Gaussian(190,4). It is assumed that the inbuilt controls continuously work to maintain the system InC, with the side effect of rendering sequential values approximately independent, nearly sub-independent in the sense discussed by Hamedani [28]. The simple approximate probabilities are, therefore, the products of the probabilities of the classification classes under $H_{0}$ and under $H_{1}$, respectively. Taking into account Table 5, it is plausible to expect that the sequence of 30-min score moving averages upcrosses 0 when there is a clustering of 8-ary sequences with 5 or more P temperatures, intuitively tied to a possible shift towards OutC.
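The kind of comparison made in Table 5 can be approximated as follows. This sketch is not the authors' computation: it assumes that a reading counts as P when it falls outside [175°, 185°], that the second Gaussian parameter is the standard deviation, and that the eight readings are independent, so that the number of P readings is binomial. Under these assumptions the values may differ from those in Table 5.

```python
from math import comb, erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of a Gaussian(mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def prob_positive(mu: float, sigma: float = 4.0) -> float:
    """Probability that one reading falls outside [175, 185] (assumed P)."""
    inside = normal_cdf(185.0, mu, sigma) - normal_cdf(175.0, mu, sigma)
    return 1.0 - inside


def count_distribution(p: float, n: int = 8) -> list:
    """Binomial probabilities of k = 0..n P readings in an n-ary sequence."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


p_h0 = prob_positive(180.0)   # H0: in control, mean 180
p_h1 = prob_positive(190.0)   # H1: mean shifted to 190
dist_h0 = count_distribution(p_h0)
dist_h1 = count_distribution(p_h1)

for k in range(9):
    print(f"k={k}: H0 {dist_h0[k]:.4f}   H1 {dist_h1[k]:.4f}")
print(f"P(k >= 5 | H0) = {sum(dist_h0[5:]):.4f}")
print(f"P(k >= 5 | H1) = {sum(dist_h1[5:]):.4f}")
```

Under these simplifying assumptions, sequences with five or more P readings are rare when the mean is 180° and very likely when it has shifted to 190°, which is the qualitative point made in the text.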

It is then expected that a score moving average upcrossing 0 is a trustworthy signal that maintenance or repair should be considered.
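The scoring and alert logic described above can be sketched as follows. This is an illustration under stated assumptions rather than the spreadsheet routine used in the factory: the function names are hypothetical, the moving average is taken over the trailing 15 scores, and a 0-upcrossing is taken to mean that the moving average passes from a value at or below 0 to a value strictly above 0.

```python
from typing import List, Sequence


def score(labels: Sequence[str]) -> float:
    """Score of one 8-ary sequence: 2*#TP + #FN - (#TN + #FP)/2, as in (1)."""
    tp, fn = labels.count("TP"), labels.count("FN")
    tn, fp = labels.count("TN"), labels.count("FP")
    return 2 * tp + fn - (tn + fp) / 2


def moving_averages(scores: Sequence[float], window: int = 15) -> List[float]:
    """Trailing moving averages; the first value uses scores[0:window]."""
    return [
        sum(scores[i - window:i]) / window
        for i in range(window, len(scores) + 1)
    ]


def zero_upcrossings(averages: Sequence[float]) -> List[int]:
    """Indices where the average goes from <= 0 to > 0 (assumed definition)."""
    return [
        i for i in range(1, len(averages))
        if averages[i - 1] <= 0 < averages[i]
    ]


all_tn = ["TN"] * 8
all_tp = ["TP"] * 8
one_fn = ["TN"] * 7 + ["FN"]

# Twelve all-TN sequences followed by three all-TP ones balance exactly;
# a further sequence with a single FN then tips the average above 0.
sequences = [all_tn] * 12 + [all_tp] * 3 + [one_fn]
scores = [score(s) for s in sequences]
averages = moving_averages(scores)
print(averages, zero_upcrossings(averages))
```

The usage lines reproduce the balance discussed in the rationale: three all-TP sequences exactly offset twelve all-TN sequences, and any additional FN or TP reading in the window makes the average positive.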

The Supplementary Materials in the compressed file DataArchive.zip contain working-day .xlsx files displaying the scores of the 8-ary sequences, their 15-period moving averages and the corresponding charts. Moving averages have an interesting prognostic value for shifts from InC to OutC since, when 0-upcrossings occurred, production interruptions were needed. Observe, however, that some 2-min halts may indicate FD, or very minor problems that are easily solved with routine maintenance.

The album of daily half-hour (15-sequence period) scores’ moving averages in Appendix C plainly shows that the clustering of suspicious sequences leading to 0-upcrossings is closely tied to production halts for maintenance or repair. The exceptional situation of the blue-colored chronogram on November 08, although there is a 0-upcrossing and a 38-min production interruption, may be an instance of a false OutC forecast that was thoroughly investigated by the maintenance team without detecting any need for repair. The tiny values of the determination coefficient $R^{2}$, as shown in Figure 2, whose extremes and quartiles are displayed in Table 6, support the conviction that feedback and control effectively guarantee the approximate independence of sequential temperature readings.

The finding that 0-upcrossings of scores’ moving averages are a clear alert that the process is sliding towards OutC indicates that Table A1 should be replaced by Table A2 in Appendix B, which displays the duration from 0-upcrossing to the end of the production halt, depicted in the bar chart and boxplot in Figure 3.

The production halts’ extremes and quartiles, measured from the 0-upcrossing alert until resumption of production, are displayed in Table 7.

Therefore, instead of the estimates based on a total halt duration of 11:06, the improved estimates $π_{1}^{*} = \frac{1180}{82{,}800} \approx 0.01425$ and $π_{0}^{*} = 1 - π_{1}^{*} \approx 0.98575$ of the probabilities that the frying unit is OutC or InC, respectively, should be used.

Table A2 in Appendix B was discussed with the factory SQC and maintenance teams, and we asked them to classify each instance with regard to the true system state and the team’s diagnosis. From this, a new confusion/(mis)matching matrix (Table 8) was computed.

Comparing Table 4 and Table 8 shows a substantial improvement for all the evaluation metrics.

4. Discussion

The rationale for the definition (1) of scores and the use of 30-min windows for moving averages in Section 3 tries to balance the negative contribution of 8-ary sequences with a predominance of temperature values less than 185° with the positive redressing of sequences with a predominance of greater than 185° readings. It is obviously a heuristic approach, since it is impossible to devise an optimal decision.

With regard to the choice of the moving averages period, we experimented with 20-, 30-, 40- and 60-min windows. The 20-min window produced many FDs, and the 40-min and 60-min periods produced a small number of alerts, resulting in too many FOs. The 30-min window produced an adequate number of alerts, with a reliable balance, trading off PPV (positive predictive value) and sensitivity against NPV (negative predictive value) and specificity.

Obviously, other windows would produce reliable results using different coefficients: for the 20-min window, it would be wiser to use lower multipliers for TP and FN (or greater coefficients for TN and FP), and, for the 40-min window, to use the reverse rule of thumb. The goal should always be to attain significant upcrossings of some level, hinting that the system is deviating towards OutC. In the 60-min window, the smoothing effect of using the large number (30) of scores in the moving averages renders this heuristic useless.

Concerning the multipliers used, the rationale in Section 3 clearly indicates that they achieve an interesting balance of negative and positive scores when the system is InC, and a positive imbalance when it shifts towards OutC. Obviously, other values could be chosen; for instance, $2.5 \times \#TP + \#FN - \frac{\#TN + \#FP}{2}$, or $2 \times \#TP + 1.5 \times \#FN - \frac{\#TN + \#FP}{2}$, or $2 \times \#TP + \#FN - \frac{\#TN + \#FP}{3}$ would produce more alerts, and $\times \#TP + \#FN - \frac{\#TN + \#FP}{2}$ would produce fewer alerts. For the fun of it, recalling the famous Euler’s formula tying $π$, $e$, $i$, and $−1$, we experimented with $π \times \#TP + i^{4} \times \#FN + (-1) \times \frac{\#TN + \#FP}{e}$, to no avail, as it produced an excessive number of false alarms.

In view of this small-scale experimentation, the rule of thumb “use 15-period moving averages (30-min window) of scores”, as defined in (1), has been adopted.
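One way to compare the alternative multipliers discussed above is to ask how many all-TP sequences in a 15-sequence window are needed to offset the remaining all-TN sequences, so that the moving average reaches 0. The sketch below computes this break-even count; the function and parameter names are hypothetical, and the break-even count is an illustrative summary introduced here, not a quantity used by the authors.

```python
def break_even_count(w_tp: float, divisor: float, window: int = 15) -> float:
    """All-TP sequences per window that offset the remaining all-TN ones.

    An all-TP sequence of length L scores w_tp * L and an all-TN sequence
    scores -L / divisor, so the break-even count k solves
    k * w_tp * L = (window - k) * L / divisor; the length L cancels.
    """
    return window / (w_tp * divisor + 1)


# (TP multiplier, divisor of #TN + #FP) for three of the variants above.
candidates = {
    "2 x TP, (TN + FP)/2": (2.0, 2.0),
    "2.5 x TP, (TN + FP)/2": (2.5, 2.0),
    "2 x TP, (TN + FP)/3": (2.0, 3.0),
}
for name, (w_tp, divisor) in candidates.items():
    print(f"{name}: {break_even_count(w_tp, divisor):.2f}")
```

A smaller break-even count means that fewer strongly suspicious sequences are needed to push the moving average to 0, which is consistent with the observation that the second and third variants produce more alerts than the adopted scores.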

5. Conclusions

The recommendation to implement routines for computing scores and their moving averages, and to display alerts when 0-upcrossings occur, was incorporated into the spreadsheet, which is continuously updated on the supervisor’s monitor. From 2 January onward, the quantitative data were plainly recorded in the interval scale, enabling the use of more sophisticated statistical analysis tools. To the factory SQC team’s satisfaction, the classification algorithm based on the decision tree depicted in Appendix A provided painted displays, as exemplified in Figure 4, which are even more eye-catching than the color dots used in the factory SQC routines in 2024.

Considering that only 8-ary sequences terminating in the codon PPP were suspicious was an inadequate criterion, since, in view of Table 5, all sequences with five or more Ps are suspicious. However, to a certain extent even this criterion is irrelevant, since feedback and control can overcome momentary deregulation of temperature. Only the accumulation of suspicious 8-ary sequences in an adjacent or neighbouring cluster, indicated by moving averages’ 0-upcrossings, clearly indicates OutC situations.

When the integrity of the temperature readings is maintained, several statistics of the 8-ary sequences are readily available; namely, extreme values, ranges, and the number of values exceeding $μ + 3σ$. SQC has evolved substantially, with methodologies such as Taguchi’s Total Quality Control, Six Sigma and Beyond, or developments with Design of Experiments [29,30,31,32], and the availability of temperatures in the interval scale enables the use of those developments, and of traditional charts, as described in Montgomery [1] and Aslam [2], without misusing tools [19]. This also enabled Pestana and Brilhante [20] to treat 8-ary sequences as digital ants, using digital pheromones to forecast shifts from InC to OutC.

Despite the availability of more sophisticated SQC tools when the interval-scale data are kept, we must be aware that their usefulness resides mainly in their ability to confirm a posteriori that SQC performance is high. The main issue of deciding to halt production for maintenance or repair, thus of forecasting possible OutC, is the routine task of the supervision unit, which must decide in real time whether there is a clustering of suspicious 8-ary sequences.

Thus, as scores’ moving averages can be computed routinely and in a timely manner, we further recommended adding columns in Figure 4 to display in each line the scores, as defined in (1), the last 30-min-score moving averages, and the highest temperature in the 8-ary sequence. Further, a 0-upcrossing of the scores’ moving average should trigger a visual and sound alarm. Both recommendations have been implemented.

Supplementary Materials

The following supporting information can be downloaded at: DataArchiveSet.zip, https://doi.org/10.5281/zenodo.20100519, containing daily data of scores and their 15-period moving averages, as well as charts displaying the corresponding time series, trend-line and determination coefficient. From this, materials and methods are available, and conclusions can be independently checked.

Author Contributions

Conceptualization, D.P. and M.L.R.; methodology, D.P.; software, M.L.R.; formal analysis, D.P. and M.L.R.; investigation, D.P. and M.L.R.; data curation, M.L.R.; writing—original draft preparation, D.P. and M.L.R.; writing—review and editing, D.P. and M.L.R.; visualisation, M.L.R.; supervision, D.P.; project administration, D.P. and M.L.R.; and funding acquisition, D.P. and M.L.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by national funds through FCT—Fundação para a Ciência e a Tecnologia, I.P., under CEAUL Research Unit, UID/00006/2025, https://doi.org/10.54499/UID/00006/2025, and by the European Union—NextGenerationEU through the project UID/PRR/00006/2025, https://doi.org/10.54499/UID/PRR/00006/2025, and M. L. Rocha acknowledges the financial support from FCT—Fundação para a Ciência e Tecnologia (Portugal) through the research grant UIDB/00685/2025 of the Centre of Applied Economics Studies of the Atlantic—School of Business and Economics ∣ University of the Azores and from the Regional Directorate for Science, Innovation and Development—Azores Government through the research grant M1.1. A/FUNC.UI&D/018/2025 (PROSCIENTIA).

Data Availability Statement

The original contributions presented in this study are included in the Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors are thankful for the authorisation to use the October–December 2024 data they were hired to analyse, with the provision that the trademark should not be disclosed, owing to reputation concerns of the firm, and that a one-year delay should be observed. They also acknowledge the many suggestions and critical reading of the preprint by Maria de Fátima Brilhante (Univ. Azores), and her comments on scenarios created by alternative scorings and moving averages periods, and the Python v. 3.13 program and decision tree provided by Pedro Pestana (Univ. Aberta). The authors also thank the referees for interesting observations that contributed to improving the presentation.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

Aside from the abbreviations of the classification metrics defined in Table 3, the following abbreviations are used in this manuscript:

DInC	Diagnosed as InC
DOutC	Diagnosed as OutC
FD	False Discovery
FN	False Negative
FO	False Omission
FP	False Positive
InC	In Control
N	Negative
OutC	Out of Control
P	Positive
SQC	Statistical Quality Control
T	Total
TN	True Negative
TP	True Positive

Appendix A. Classification Code and Decision Tree

(Flowchart and code for the classification of temperatures in TN, FP, FN, TP qualitative symptoms of InC/OutC, and the associated green, yellow, blue, red coloring of cells.)

Classification code:

import numpy as np


def classify_tn_tp_fn_fp_np(t_series):
    """Label each reading TN, TP, FN or FP from its value and the next one."""
    t = np.asarray(t_series, dtype=float)
    n = len(t)

    # Subsequent values (NaN for the last reading, which has no successor)
    subsequent = np.empty_like(t)
    subsequent[:-1] = t[1:]
    subsequent[-1] = np.nan

    # Start with all None; comparisons with NaN are False, so a last
    # reading in an ambiguous band keeps the label None
    labels = np.array([None] * n, dtype=object)

    # === TN Rules ===
    tn_mask = (
        ((t >= 176) & (t <= 184)) |
        ((t > 184) & (t <= 185) & (subsequent < 185)) |
        ((t >= 175) & (t < 176) & (subsequent > 175))
    )
    labels[tn_mask] = "TN"

    # === TP Rules (thresholds 172 and 185, as in the rules of Section 2) ===
    tp_mask = (
        (t >= 188) |
        ((t > 185) & (t < 188) & (subsequent > 185)) |
        (t <= 172) |
        ((t > 172) & (t < 175) & (subsequent < 175))
    )
    labels[tp_mask] = "TP"

    # === FN Rules ===
    fn_mask = (
        ((t > 184) & (t <= 185) & (subsequent >= 185)) |
        ((t >= 175) & (t < 176) & (subsequent <= 175))
    )
    labels[fn_mask] = "FN"

    # === FP Rules ===
    fp_mask = (
        ((t > 185) & (t < 188) & (subsequent <= 185)) |
        ((t > 172) & (t < 175) & (subsequent >= 175))
    )
    labels[fp_mask] = "FP"

    return labels

Figure A1. Classification decision tree.

Appendix B. Production Halts and Downtimes, October–December 2024

The tables of production-line downtimes show that taking into account the early diagnosis that the system may be entering OutC (Table A2) increases the estimate of the probability of being OutC. From the observation of 0-upcrossings preceding production halts, the estimates of the probabilities that the system is OutC or InC were updated in Section 3, as compared with the underestimate of OutC in Section 2.

The finding that 0-upcrossings of scores’ moving averages are a clear alert that the process is sliding towards OutC indicates that Table A1 should be replaced by Table A2, which displays the duration from 0-upcrossing to the end of the production halt.

Table A1. Production halts and downtimes, in minutes, SQC factory team data, 2024 October–December.

Number	Date	Start-End Time	Duration	Number	Date	Start-End Time	Duration
1	7 October	14:08–14:37	29	20	19 November	11:18–11:20	2
2	10 October	22:26–22:48	22	21	19 November	11:28–11:30	2
3	15 October	11:24–13:37	109	22	19 November	13:16–13:42	26
4	16 October	17:04–17:10	6	23	19 November	19:14–19:16	2
5	21 October	14:20–14:44	24	24	19 November	19:32–19:36	4
6	23 October	09:16–09:18	2	25	19 November	20:18–20:24	6
7	23 October	10:04–10:35	31	26	22 November	14:22–14:51	29
8	25 October	15:08–15:54	46	27	29 November	16:18–16:20	2
9	28 October	19:26–19:28	2	28	29 November	18:18–18:58	40
10	28 October	20:16–20:39	13	29	4 December	10:12–10:19	7
11	30 October	09:54–09:56	2	30	5 December	17:14–17:28	14
12	30 October	10:14–10:16	2	31	10 December	11:02–11:35	33
13	30 October	10:20–10:22	2	32	11 December	09:14–09:20	6
14	30 October	15:12–15:14	2	34	11 December	13:12–13:42	30
16	4 November	08:24–08:29	5	35	12 December	12:06–12:12	6
17	5 November	11:20–11:45	25	36	13 December	12:04–12:30	26
18	8 November	16:02–16:39	37	37	16 December	16:14–16:33	19
19	13 November	08:44–09:28	44

Table A2. Downtime duration in minutes, from 0-upcrossings alert to production resumption, 2024 October–December.

Number	Date	0-Up-End Time	OutC Duration	Number	Date	0-Up-End Time	OutC Duration
1	7 October	13:32–14:37	65	20	19 November	11:00–11:20	20
2	10 October	22:02–22:48	46	21	19 November	11:22–11:30	8
3	15 October	11:08–13:37	125	22	19 November	13:02–13:42	40
4	16 October	16:50–17:10	20	23	19 November	18:56–19:16	20
5	21 October	14:06–14:44	38	24	19 November	19:18–19:36	18
6	23 October	09:04–09:18	14	25	19 November	20:08–20:24	16
7	23 October	09:54–10:35	41	26	22 November	14:14–14:51	35
8	25 October	14:56–15:54	58	27	29 November	16:00–16:20	20
9	28 October	19:22–19:28	6	28	29 November	18:02–18:58	56
10	28 October	20:10–20:39	17	29	4 December	09:56–10:19	23
11	30 October	09:44–09:56	12	30	5 December	16:56–17:28	32
12	30 October	09:58–10:16	18	31	10 December	10:48–11:35	47
13	30 October	10:18–10:22	4	32	11 December	08:56–09:20	24
14	30 October	12:22–12:47	25	33	11 December	10:16–10:32	16
15	30 October	14:56–15:14	18	34	11 December	12:58–13:42	44
16	4 November	08:08–08:29	21	35	12 December	11:56–12:12	16
17	5 November	11:08–11:45	37	36	13 December	11:52–12:30	38
18	8 November	15:50–16:39	49	37	16 December	15:54–16:33	37
19	13 November	08:32–09:28	56
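
As a minimal sketch of how an OutC duration in Table A2 can be recomputed from its time range, the hypothetical helper `minutes_between` below parses an `HH:MM–HH:MM` string and returns the elapsed minutes. The function name, and the assumption that every interval starts and ends on the same calendar day, are illustrative and not taken from the article.

```python
from datetime import datetime


def minutes_between(time_range: str) -> int:
    """Elapsed minutes for a same-day 'HH:MM–HH:MM' range (hypothetical helper)."""
    # Accept both the en dash used in the tables and a plain hyphen.
    start_text, end_text = time_range.replace("–", "-").split("-")
    start = datetime.strptime(start_text.strip(), "%H:%M")
    end = datetime.strptime(end_text.strip(), "%H:%M")
    return int((end - start).total_seconds() // 60)


# First two rows of Table A2.
print(minutes_between("13:32–14:37"))  # 65
print(minutes_between("22:02–22:48"))  # 46
```

The same helper applies to the Start-End Time column of Table A1, so the two tables can be compared row by row.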

Appendix C. Album of Daily-Score-Moving-Averages’ Chronograms

Appendix C is an album of daily-score-moving-averages’ chronograms, exhibiting the strict relationship of 0-upcrossings to shifts from InC to OutC. The horizontal axis is the time of the working day, and the vertical axis is the 15-period (30-min window) scores’ moving averages. Further, the trend lines, in general decreasing, show that the system has inbuilt feedback and control procedures that usually keep it InC.

In the charts that follow, blue scores’ time series are consistently below 0, and correspond to days with no overheating alerts. Orange time series exhibit 0-upcrossing(s) alerts, consistently forecasting the need to interrupt production for maintenance or repair. In the majority of cases the decreasing trend indicates that inbuilt controls are performing appropriately, either when the system is InC or after repair when 0-upcrossings triggered alerts.

There is no clear pattern of seasonality. Oil temperature is a random variable whose variability depends on the temperature of new batches of potatoes to be fried, on the intensity of the electric current and on the Joule effect, with constant feedback and controls limiting its range. Under InC, scores of 8-ary sequences are almost always negative, with episodic increases due to current intensity surges. When the system is shifting towards OutC, clusters of 8-ary sequences with positive scores create peaks. In many cases feedback and control eliminate these peaks, smoothing the moving averages, but their uncontrolled accumulation must be interpreted as an alert.
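
The chronograms rest on two operations: a 15-period moving average of the scores and the detection of its 0-upcrossings. The sketch below illustrates both under stated assumptions: the score values are invented for illustration (the actual scores follow Equation (1)), the moving average is taken as a trailing window, and an upcrossing is taken to be a step from a value at or below 0 to a value above 0. The function names are hypothetical.

```python
import numpy as np


def moving_average(scores, window=15):
    """Trailing moving average; the first window-1 entries are NaN."""
    scores = np.asarray(scores, dtype=float)
    out = np.full(scores.shape, np.nan)
    if scores.size >= window:
        kernel = np.ones(window) / window
        out[window - 1:] = np.convolve(scores, kernel, mode="valid")
    return out


def zero_upcrossings(ma):
    """Indices i where the series goes from <= 0 at i-1 to > 0 at i."""
    ma = np.asarray(ma, dtype=float)
    prev, curr = ma[:-1], ma[1:]
    # Comparisons with NaN are False, so the warm-up period never triggers.
    return np.flatnonzero((prev <= 0) & (curr > 0)) + 1


# Hypothetical scores: 20 negative readings, then a cluster of positive ones.
scores = np.array([-1.0] * 20 + [3.0] * 10)
ma = moving_average(scores)
print(zero_upcrossings(ma))  # [23]
```

In this toy series the alert is raised at the fourth positive score, once the positive cluster outweighs the negative scores still inside the window.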

Appendix C.1. October 2024

Appendix C.2. November 2024

Appendix C.3. December 2024

Figure 1. Interruption halts and downtime durations in minutes: bar chart (left) and boxplot (right).

Figure 2. Score-moving-averages’ determination coefficients: bar chart (left), histogram (center), and boxplot (right).

Figure 3. Production halts’ durations, in minutes, from the 0-upcrossing alert until resumption of production: bar chart (left) and boxplot (right).

Figure 4. Data recording from January 2025 onward, with temperature readings and color classification (green for TN, orange for FP, blue for FN and red for TP), scores as defined in Equation (1), and moving averages (30-min window, period 15).

Table 1. Production-halt extremes and quartiles, in minutes.

$x_{1 : 37}$	$Q_{0.25}$	M	$Q_{0.75}$	$x_{37 : 37}$
2	2	13	30	109

Table 2. Performance metrics’ matrix.

T (Total)	DOutC	DInC	BM	PT
OutC (P)	TP	FN	TPR	FNR
InC (N)	FP	TN	FPR	TNR
Prev	PPV	NPV	${LR}^{+}$	${LR}^{-}$
Acc	FDR	FOR	MK = $\Delta_{P}$	DOR
BAcc	$F_{1}$ score	FM index	MCC	CSI

Table 3. Classification metrics.

Metric	Abbreviation and Definition
Total	T = TP + FN + FP + TN = OutC + InC = DOutC + DInC
Sensitivity or True Positive Rate	$TPR = \frac{TP}{P}$
Specificity or True Negative Rate	$TNR = \frac{TN}{N}$
False Negative Rate	$FNR = \frac{FN}{P}$
False Positive Rate	$FPR = \frac{FP}{N}$
Prevalence	$Prev = \frac{P}{P + N}$
Positive Predictive Value (precision)	$PPV = \frac{TP}{TP + FP}$
Negative Predictive Value	$NPV = \frac{TN}{TN + FN}$
False Discovery Rate	$FDR = \frac{FP}{TP + FP}$
False Omission Rate	$FOR = \frac{FN}{TN + FN}$
Positive Likelihood Ratio	${LR}^{+} = \frac{TPR}{FPR}$
Negative Likelihood Ratio	${LR}^{-} = \frac{FNR}{TNR}$
Accuracy	$Acc = \frac{TP + TN}{P + N}$
Balanced Accuracy	$BAcc = \frac{TPR + TNR}{2}$
Bookmaker Informedness	$BM = TPR + TNR - 1$
Markedness	$MK = \Delta_{P} = PPV + NPV - 1$
Prevalence Threshold	$PT = \frac{\sqrt{TPR \times FPR} - FPR}{TPR - FPR}$
Diagnostic Odds Ratio	$DOR = \frac{{LR}^{+}}{{LR}^{-}}$
$F_{1}$ score	$F_{1} = \frac{2 PPV \times TPR}{PPV + TPR} = \frac{2 TP}{2 TP + FP + FN}$
Fowlkes-Mallows Index	$FM = \sqrt{PPV \times TPR}$
Threat score or Critical Success Index ⁽¹⁾	TS = CSI = $\frac{TP}{TP + FN + FP}$
Matthews Correlation Coefficient	$MCC = \sqrt{TPR \times TNR \times PPV \times NPV} - \sqrt{FNR \times FPR \times FOR \times FDR}$

⁽¹⁾ Or Jaccard Index (JI).
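
To make the definitions in Table 3 concrete, the following sketch computes a selection of them from the four confusion-matrix counts. The function name `classification_metrics` and the dictionary keys are illustrative choices, and the code assumes that no denominator is zero. It is applied to the counts reported in Table 4 (TP = 18, FN = 4, FP = 4, TN = 11).

```python
from math import sqrt


def classification_metrics(tp, fn, fp, tn):
    """Selected Table 3 metrics from the four confusion-matrix counts."""
    p, n = tp + fn, fp + tn          # actual OutC and actual InC totals
    tpr, tnr = tp / p, tn / n
    fnr, fpr = fn / p, fp / n
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    fdr, f_or = 1 - ppv, 1 - npv     # false discovery and false omission rates
    lr_pos, lr_neg = tpr / fpr, fnr / tnr
    return {
        "TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv,
        "Acc": (tp + tn) / (p + n),
        "BAcc": (tpr + tnr) / 2,
        "BM": tpr + tnr - 1,
        "MK": ppv + npv - 1,
        "PT": (sqrt(tpr * fpr) - fpr) / (tpr - fpr),
        "DOR": lr_pos / lr_neg,
        "F1": 2 * tp / (2 * tp + fp + fn),
        "FM": sqrt(ppv * tpr),
        "CSI": tp / (tp + fn + fp),
        "MCC": sqrt(tpr * tnr * ppv * npv) - sqrt(fnr * fpr * f_or * fdr),
    }


# Counts from the factory SQC team confusion matrix (Table 4).
m = classification_metrics(tp=18, fn=4, fp=4, tn=11)
print(round(m["MCC"], 5), round(m["DOR"], 3))  # 0.55152 12.375
```

Calling the same function with the counts of Table 8 (TP = 18, FN = 2, FP = 3, TN = 14) allows the two classifications to be compared metric by metric.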

Table 4. Factory SQC team classification-confusion matrix.

T = 37	DOutC	DInC	BM = 0.55152	PT = 0.36342
P = OutC = 22	TP = 18	FN = 4	TPR = 0.81818	FNR = 0.18182
N = InC = 15	FP = 4	TN = 11	FPR = 0.26667	TNR = 0.73333
Prev = 0.59459	PPV = 0.81818	NPV = 0.73333	${LR}^{+}$ = 3.06818	${LR}^{-}$ = 0.24793
Acc = 0.78378	FDR = 0.18182	FOR = 0.26667	MK = 0.55152	DOR = 12.375
BalAcc = 0.77576	$F_{1}$ score = 0.81818	FMI = 0.81818	MCC = 0.55152	TS score = 0.69231

Table 5. Odds that the system is InC or in OutC, given the number of P’s in the 8-ary sequence.

Number of P’s	$p \mid H_{0}$	$p \mid H_{1}$	$p \mid H_{0} / p \mid H_{1}$
0	0.40950039	1.5464 $\times 10^{- 8}$	26481524.2
1	0.38679107	1.0478 $\times 10^{- 6}$	369153.707
2	0.15983674	3.106 $\times 10^{- 5}$	5146.02023
3	0.0377432	0.00052614	71.7357668
4	0.00557033	0.00557033	1
			$p \mid H_{1} / p \mid H_{0}$
5	0.00052614	0.0377432	71.7357668
6	3.106 $\times 10^{- 5}$	0.15983674	5146.02023
7	1.0478 $\times 10^{- 6}$	0.38679107	369153.707
8	1.5464 $\times 10^{- 8}$	0.40950039	26481524.2
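
The entries of Table 5 appear consistent with binomial probabilities for the number of P’s among 8 readings, with a per-reading probability of P equal to some value under $H_{0}$ and its complement under $H_{1}$. The sketch below works under that assumption. The value 0.1056 is not stated in this section: it is back-solved from the first row of the table and should be read as an illustrative assumption, as should the names used.

```python
from math import comb

N_READINGS = 8
P_H0 = 0.1056  # assumption: back-solved from the first row of Table 5


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for k in range(N_READINGS + 1):
    p_h0 = binom_pmf(k, N_READINGS, P_H0)
    p_h1 = binom_pmf(k, N_READINGS, 1 - P_H0)
    # As in Table 5, report the ratio that is at least 1.
    ratio = p_h0 / p_h1 if k <= 4 else p_h1 / p_h0
    print(k, f"{p_h0:.6g}", f"{p_h1:.6g}", f"{ratio:.6g}")
```

Under this symmetric set-up the two probabilities coincide at 4 P’s, which is why the odds equal 1 in the middle row and grow in both directions away from it.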

Table 6. Determination coefficients’ extremes and quartiles.

$x_{1 : 69}$	$Q_{0.25}$	M	$Q_{0.75}$	$x_{69 : 69}$
2 $\times 10^{- 8}$	0.0029	0.0103	0.0269	0.1089

Table 7. Extremes and quartiles of production halts, from the 0-upcrossing alert until resumption of production, in minutes.

$x_{1 : 37}$	$Q_{0.25}$	M	$Q_{0.75}$	$x_{37 : 37}$
4	18	24	41	125

Table 8. Score-moving-averages-based confusion matrix.

T = 37	DOutC	DInC	BM = 0.72353	PT = 0.30691
P = 20	TP = 18	FN = 2	TPR = 0.9	FNR = 0.1
N = 17	FP = 3	TN = 14	FPR = 0.17647	TNR = 0.82353
Prev = 0.54054	PPV = 0.85714	NPV = 0.875	${LR}^{+}$ = 5.1	${LR}^{-}$ = 0.12143
Acc = 0.86486	FDR = 0.14286	FOR = 0.125	MK = 0.73214	DOR = 42
BalAcc = 0.86176	$F_{1}$ score = 0.87805	FMI = 0.87831	MCC = 0.72782	TS score = 0.78261

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pestana, D.; Rocha, M.L. Prognostic Value of Scoring and 0-Upcrossing in Statistical Quality Control. AppliedMath 2026, 6, 77. https://doi.org/10.3390/appliedmath6050077
