Article

Systemic Predictive and Prescriptive Maintenance

1 School of Science and Technology, Örebro University, 701 82 Örebro, Sweden
2 Department of Computing Science, Umeå University, 901 87 Umeå, Sweden
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(2), 1088; https://doi.org/10.3390/app16021088
Submission received: 21 December 2025 / Revised: 13 January 2026 / Accepted: 19 January 2026 / Published: 21 January 2026

Abstract

In this paper we introduce a systemic approach to predictive and prescriptive maintenance, framed within the larger system of systems and exemplified by a use case in mining. Developments are presented systematically, with a focus on improved availability modeling using the time usage model and a corresponding UML/SysML StateMachine representation. Data becomes connected with a more elaborate definition of time, with failure modes and analytics in reliability engineering being supported by improved underlying information structures.

1. Introduction

Industrial mining operations typically need increased sustainability, resiliency, and high efficiency while keeping costs down. High reliability and optimal maintenance of mechanical systems while pursuing efficiency and longevity in terms of, for example, remaining useful life (RUL) become important on the technical side. People and their competences, certifications, and training relate to the operational side, including efficient machine operations, machine maintenance, and the related data structuring and analysis to predict and prevent failure, increase RUL, and optimize mining process operations.
Monitoring continuous operation in a complex system of systems requires an intertwining of information and process structures, resting on a foundation of ‘system states’. Begin and end times for states, and times for state transitions, require a state-based time classification framework.
In this paper we use a time classification standardized for mining operations [1], which is aligned with a maintenance standard [2], and further use a UML state machine diagram [3] for modeling discrete event-driven behaviors within a production cycle in mining operations. UML’s state machine is a specific form of finite-state automata based on an object-oriented variant of the statechart formalism [4].
Traditional reliability theory [5] handles time intervals and durations rather than instantaneous times and usually simplifies the structure of states by assuming a system with only two states, “up” and “down”. The time recorded for the transition from “up” to “down” is the time recorded for a failure event, where a failure is related to a point in time when a system loses its ability to perform a required function.
In this paper, to support sustainable approaches to system reliability, predictive maintenance, and fault diagnosis, specifically in the mining industry but applicable in others as well, the authors argue for more detailed descriptions of time in the context of operations, terminology and standards, failure and failure modes, etc., as needed to further develop availability modeling for applications in the following:
  • Fault detection and diagnosis using AI, IoT, and data analytics;
  • Reliability modeling of complex mechanical and manufacturing systems;
  • Simulation for predictive maintenance and fault prevention;
  • Decision-making under uncertainty for maintenance planning and risk mitigation.
Thus, developments reported in this paper not only support the optimization of industrial assets but also align with broader objectives such as life cycle management and Life of Mine Planning (LoMP) common to virtually all mines. The results also relate to business model selection on the scale from traditional sales (mining drill rigs) to functional provision [6] (for example, drill meters per time unit) or function-based productivity (for example, tons of ore per time unit). In cases where customer satisfaction is ensured through function-based offers, keeping track of cost and profit, and being able to calculate them using further developed time descriptions in the context of operations, terminology and standards, and failure and failure modes, becomes particularly important.
Technology is (through levels of business partners and providers in the mine) related to business (including the obvious business parameters, cost and profit). In parallel, processes of the mine are related to information (and information carriers such as vehicles and logs of staff operations). In this context, looking at the mine and its operations as a system of systems, the question then becomes which economy and role in the mine should be calculated, and how do these cost, profit, and value calculations relate to the cost, profit, and value calculations of other roles in the mine? An example is the top corporate audit level, where the CEO and the Company Board would like overarching reporting similar to a stop light (with green, yellow, and red signaling) concerning the operation and economic development of the mine. On the (structurally) lowest level, the machine operators have their more specific machine-, logistics-, and rock-related needs, which not only represent costs but also value creation strongly related to the stop light signaling. In between, there are other layers and roles, one being the local and regional maintenance functions with their own technical and economic requirements, including, in a traditional business model, maximizing the margins of aftermarket sales. It is clear that optimizing across several interrelated levels of operations is important in the system of systems, and this paper provides an example of developing the more accurate descriptive representations needed to do so.
The main and general objective of this paper is to increase awareness of the need to be more precise and specific about information structures related to failure events, their timings, and mitigation of failure by means of more accurately defined and described maintenance processes, e.g., in terms of system modeling-based state and time usage modeling. Contributions in these regards are in this paper connected to the use case drawn from experience in research related to iron ore surface and underground mining, fleet management, including drill rigs and dump trucks, and maintenance staff competences and logistics.
In this paper we propose that to create much improved actionable intelligence for the mining use case, more precise and specific information structures related to failure events, timings, and mitigation in the context of maintenance processes are vital. This in particular concerns maintenance resources, maintenance optimization, and maintenance management to support assets (mine and vehicle fleets) and their management using digitalized solutions (including AI, digital twins, and predictive and prescriptive maintenance), in effect creating “maintainment” information for “big wheel vehicles”, typically used in mining, in parallel to “infotainment” systems used in individually owned cars.
The objective of Section 2 is to give a brief overview of our ore mining use case, as it involves both technical and financial aspects.
Section 3 contributes to the historical perspective of availability modeling, as availability relates to reliability-centered maintenance in digital twin settings.
The objective of Section 4 is to define the concept and structure of time, making distinctions between calendar time, time intervals, and time durations.
In Section 5 we define system states, mainly for unravelling the up state, as based on the GMG standard.
Section 6 provides some new insights into risk modeling based on FMEA.
Section 7 concludes this paper.
Additionally, in Appendix A we provide a brief review of common reliability metrics, as they are typically used in the mining industry. Appendix B shows brief examples of technical and financial KPIs used in the mining industry.

2. The Ore Mining Use Case

The mining value chain is complex due to its diverse range of parties, each with specific roles. Exploration companies and geologists identify and evaluate mineral deposits; mining companies develop and operate mines, managing extraction and processing; equipment manufacturers supply machinery and technology; logistics providers handle transportation of raw materials and finished products; smelters and refineries process concentrates into usable metals; traders and commodity brokers facilitate market sales and distribution; regulators and government agencies oversee compliance, permitting, and environmental standards; financial institutions, investors, and insurers provide capital and risk management; and end-users, such as steel producers and manufacturers, utilize the final mineral products. In addition, communities and environmental organizations play a role in social license and sustainability efforts throughout the value chain. Clearly, the business structure and the roles of business partners in mining are complex.
Business partners in the mine include the mining company and the machine and equipment suppliers, who supply drill rigs, dump trucks, crushers, elevators, pick-up trucks, etc., all of which need maintenance from time to time, whether preventive, condition-based, predictive, or prescriptive.
Equipment suppliers in mining are sometimes (typically depending on the business contracts in place) also in charge of equipment maintenance and/or data analytics in the context of mining intelligence systems. In mixed-fleet systems, i.e., where fleets of mixed brand vehicles interact, complexities increase regarding operations, maintenance, and data analytics in mining intelligence. Additionally, locally employed maintenance staff representing one-person firms or SMEs may also be involved, further increasing maintenance coordination needs. It is also not uncommon that such a mixed group of maintenance staff service several mines in a larger geographical area. Clearly, the ore mining use case is complex and can change over time while potentially also (of course) being affected by world market prices, which can fluctuate on relatively short time scales.
A specific use case focus in this paper is predictive maintenance for drill rigs in open-pit mining, based on experience from cooperation with industrial partners, including operation and failure data analytics.
A mining company may use its fleet in several mines at different locations and therefore faces a complex task of operation and maintenance in order to manage the fleet, possibly supplied by different providers. Maintenance is often provided by the product supplier. However, sometimes third parties are engaged for maintenance and even for operation.
While maintenance itself is challenging, e.g., from the viewpoint of accommodating the maintenance sub-process in a time usage model, maintenance in maintenance often occurs, adding complexity to the system-of-systems modeling objectives. During a scheduled maintenance, e.g., a predetermined and preventive maintenance without prior specific observations, a degradation or potential fault may be observed, which requires diagnosis and may cause the scheduled maintenance to be paused. The maintenance state for the scheduled maintenance, residing as a sub-state within a down state, will then be paused, and the system state will transfer to a suspended state within the down state and instantly create a “new failure thread” to be handled separately from but within the “parent process” involving the scheduled maintenance.
As the scheduled maintenance proceeds, maintenance procedures are checked off one by one, and sometimes unexpected observations are made. For instance, while checking the rock drill or the rotation unit, and performing tightening and adjustment checks for possible leakages, checks for wear are conducted simultaneously. If something unusual is spotted, it may be handled as part of the scheduled maintenance, but it may also be registered as a potentially more severe failure point to be properly diagnosed and maintained outside the scope of the initially scheduled maintenance.

3. Availability Modeling

A service support system, as needed to keep the mining system of systems available and as agreed on, might for a complex technical system include availability planning, material provision, maintenance and remanufacture, configuration control, decision making, support system, modifications, maintenance plan development, and education [7].
The concept of technical availability, defined as the degree to which a system is operational and accessible when required, began garnering attention in the early 20th century. In the 1920s and 1930s, researchers explored early stochastic processes and probabilistic models, laying the foundation for future work in system reliability and availability. After World War II, the importance of availability in military operations led to a greater focus on maintenance and operational readiness, resulting in the establishment of the first metrics for evaluating reliability and availability [8].
The introduction of discrete-event simulation (DES) in the 1970s offered a powerful framework for availability modeling, especially in manufacturing and service industries [9].
In the 1980s, the concept of reliability-centered maintenance (RCM) emerged, emphasizing the need for rigorous modeling in maintenance strategies to optimize availability [10]. RCM methodologies integrated failure modes and effects analysis (FMEA) into availability modeling, leading to more targeted maintenance strategies based on predicted system performance.
The 1990s marked a period of increased interdisciplinary collaboration, with ideas borrowed from operations research, information theory, and systems engineering. Researchers began using software tools for reliability and availability prediction. Iconic models such as the Weibull distribution gained popularity for their effectiveness in modeling time-to-failure data [11], allowing organizations to refine their maintenance schedules based on empirical data.
More recent developments, including digital twins (DTs), have the potential to transform how manufacturers conceive, design, manage, and optimize products, manufacturing systems, and services. However, their rapid evolution has led to a fragmented landscape in terms of conceptualization, applications, maturity levels, and enabling technological capabilities, creating a misalignment between academic research and industrial implementation [12]. This paper, among other things, proposes measures to rectify this misalignment.
However, the future trajectory of availability modeling and simulation appears promising, characterized by continuing innovation and adaptation. Artificial intelligence (AI) needs to be intelligently implemented to overcome barriers to achieving operational excellence through AI [13].
In [14], the focus is on (1) construction of a comprehensive analysis framework; (2) identification of key drivers and boundaries of environmental systems change; (3) investigation of influencing mechanisms of human–nature interactions; (4) development of mathematical simulation and modeling methods; (5) integrated utilization of data from different sources; and (6) expansion of environmental systems modeling across spatial, temporal, and organizational scales.
Nomenclature, terminology, and standards are additionally important, as terminology is often presented in natural language whereas more precise notation uses symbolic language and is adopted in order to support mathematical modeling. Terminology and notation must be well connected so that ambiguities can be avoided and interpretations of analytical results are not confused by a mismatch between terminology and notation.
The concept of a “system” is difficult to nail down in all its subtle detail. However, from a system maintenance point of view, a “system” in its most rudimentary form is a structured set of items. We may certainly choose alternative namings. The name item is favoured in CEN’s EN 13306:2017 standard on maintenance terminology [2], where an item is said to be a part, component, device, subsystem, functional unit, equipment, or system that can be individually described and considered. In [2] it is further said that a number of items may themselves be considered as an item. Implicitly, the standard then also embraces the concept of an “atomic” item, which is an item that is not a set of items. In other words, a “system” may be viewed as a hierarchical structure of items, in the standard called an indenture level of sub-division within an item hierarchy, where in this paper we add that sub-items may be sets of items or atomic items. We may have pre-defined relations between and within any subsets of items, and relations between items in different sets of items must be defined by and constructed from pre-defined relations. The CEN standard does not explicitly include terminology for relational structures. However, the standard includes typing and properties of items, which may be used to define relational structures within the overall system.
As pointed out in [15], reliability refers to the probability of a system meeting its desired performance standards in yielding output for a specific time duration when used under specific conditions [16]. The authors of [15] further point out that the theoretical definition of reliability is one minus the probability of failure, given by R(t), and that availability and maintenance are related to reliability and are defined as essential components of it [17]. Indeed, availability is a function of reliability and maintainability in that availability measures the degree to which an item can operate at some future time t or during a future time interval (t₁, t₂), which can be regarded as a reliability metric [5]. In [15] it is further summarized that there are four common maintenance approaches that can be applied to mining assets: reactive, preventive, condition-based, and prescriptive. Recent advances in prescriptive maintenance (PsM) as reviewed in [18] show that while data-driven predictive techniques are well established, the prescriptive layer, aimed at transforming predictive insights into actionable intelligence, lacks empirical validation. The authors further infer that the lack of end-to-end holistic frameworks requires dedicated work on integration. That includes horizontal integration (across functional areas) and vertical integration (across decision-making levels). Furthermore, the authors conclude that PsM efforts are increasingly aligned with product/service quality and lifecycle performance, rather than traditional maintenance metrics (e.g., availability or downtime) alone. However, it must also be added that availability (as well as system efficiency and cost) is obviously a key KPI in relation to system productivity, service quality, and lifecycle performance.
The review of methods for reliability analysis and failure predictions, and their applications and distinctions, carried out in [15], as well as their conclusions, is particularly valuable. In [15], there is also the conclusion that at present, fault diagnostic systems are mostly built as a combination of individual parts (such as data collection, feature extraction and dimensionality reduction, and fault recognition) with little consideration of the whole diagnostic system. A complete end-to-end integrated and automated diagnostic system should be paid more attention, they conclude.
In [19], a related review on AI-driven predictive maintenance, the mining industry's challenges in maintaining high production levels while minimizing unplanned failures and operational costs, and in keeping critical assets (crushers, conveyors, mills, etc.) operating at lower risk levels than at present, are pointed out. It is further concluded that despite advancements in AI-driven methods improving sensor-based data acquisition and asset management and extending equipment lifecycles while reducing failures, challenges such as data standardization, model scalability, and system interoperability persist, requiring further research.

4. Definition of Time and DATA Connected with Time

Time, often called “calendar time”, can be formally understood as a real number t ∈ ℝ. We may have time representations like “1999-12-31 23:59:59 GMT”, this being the instant time one second before the end of the year 1999 in the GMT time zone. Such time representations can be converted to real numbers, including the decision about the instant time, or time instant, representing 0 ∈ ℝ. It may be “0000-01-01 00:00:00 GMT”, or the start time of a production, or something similar. The conversion of time representations to real numbers should be “linear” in the sense that durations of time are not skewed within the overall time scale; e.g., if, given a fixed α, an interval [t, t + α] ⊂ ℝ corresponds to a time period for a particular t ∈ ℝ, the interval will correspond to a period of the same length for any t ∈ ℝ.
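As a brief illustration of such a conversion, the following Python sketch maps a calendar time representation to a real number of seconds relative to a chosen zero instant. Python is used purely for illustration; the function name and the choice of the Unix epoch as the zero instant are assumptions of the example, not part of any standard discussed in this paper.

    from datetime import datetime, timezone

    # Choose an instant to represent 0 on the real time axis; the Unix epoch is
    # used here only as an illustrative (assumed) choice of zero instant.
    ZERO_INSTANT = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def to_real_time(timestamp: str) -> float:
        """Convert a calendar time representation to seconds relative to ZERO_INSTANT."""
        t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        return (t - ZERO_INSTANT).total_seconds()

    # One second before the end of 1999 (GMT), as in the example above.
    t0 = to_real_time("1999-12-31 23:59:59")
    # The conversion is "linear": shifting an interval does not skew its duration.
    assert to_real_time("2000-01-01 00:00:09") - t0 == 10.0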
A period of time or time period is a closed interval [t₁, t₂] ⊂ ℝ.
A duration of time or time duration is the length of a time interval, where t₂ − t₁ is the length of the time interval [t₁, t₂]. Clearly, one specific time interval defines a unique time duration, but a specific time duration α is the length of an interval [t, t + α] for any t ∈ ℝ.
Two time periods [t₁, t₂] and [t₃, t₄] are connected if t₂ = t₃ or t₄ = t₁. If t₂ = t₃ we may say that [t₁, t₂] and [t₃, t₄], in that order, i.e., viewed as interval [t₁, t₂] followed by interval [t₃, t₄], are consecutive time periods. This indicates that we may define various order relations within the set of all time periods.
We may note that two intervals are connected if and only if their intersection is a one-point set, and we may say that two intervals are non-overlapping if their intersection is the empty set.
Time periods may be further “aggregated”, where we would usually “add” durations of non-overlapping or consecutive intervals.
We should note that the (set) union of intervals need not itself be an interval, and correspondingly a sum of durations need not be the duration of any single interval. On the other hand, a sum of durations of time periods will correspond to one interval if that one interval is the union of a consecutive sequence of intervals corresponding to the periods.
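The definitions above translate directly into a small data structure. The following Python sketch is a minimal illustration under the assumption that times have already been converted to real numbers; the class and function names are ours and not drawn from any standard.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TimePeriod:
        """A closed interval [t1, t2] on the real time axis, with t1 <= t2."""
        t1: float
        t2: float

        def duration(self) -> float:
            return self.t2 - self.t1

    def connected(p: TimePeriod, q: TimePeriod) -> bool:
        # Connected: the intersection is a one-point set.
        return p.t2 == q.t1 or q.t2 == p.t1

    def consecutive(p: TimePeriod, q: TimePeriod) -> bool:
        # p followed by q.
        return p.t2 == q.t1

    def non_overlapping(p: TimePeriod, q: TimePeriod) -> bool:
        # Empty intersection.
        return p.t2 < q.t1 or q.t2 < p.t1

    def aggregate_duration(periods: list[TimePeriod]) -> float:
        # A sum of durations; it corresponds to one interval only if the periods
        # form a consecutive sequence whose union is a single interval.
        return sum(p.duration() for p in periods)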
Once the terminology and notation related to “time” have been fixed, a “time usage modeling task” will find itself in a position to be precise and unambiguous, e.g., concerning notions like “hours” in “hours of operation” or similar.
More importantly, being unambiguous about informal notions like “hours” will immediately invite being equally precise and unambiguous concerning the scope of terminology and notation related to “activity, event, and state”, where notions like “operation”, “failure”, and “maintenance” need to be well-defined, in particular in their attributions involving the conceptualization of time.
Given the more elaborated definition of time, we are in a good position to define data structures for data connected with time.
Let the set ℝ of real numbers be the set of times, and let ℝ[ ] denote the set of closed intervals of real numbers; i.e., ℝ[ ] as the set of time periods is a subset of the powerset P(ℝ) of all subsets of real numbers. Further, let ℝ⁺ be the set of positive real numbers including zero; i.e., elements in ℝ⁺ can be understood as time durations or sums of time durations of consecutive or non-overlapping time periods.
Data in a set D of data values connected to time can now be indexed by elements in ℝ, ℝ[ ], and ℝ⁺. A data value v indexed as v_t, t ∈ ℝ, could then be understood as “value v (observed) at time t”. As a notation we may further understand v_t as the value of a function v : ℝ → D at time t, so that v_t = v(t). In many application contexts we would often call such a function v a signal. An indexing v_ι, ι = [t₁, t₂] ∈ ℝ[ ], could be understood as “value v connected with or observed during the time interval ι”. Similarly, the indexing v_α, α ∈ ℝ⁺, could be understood as “value v connected with or observed as related to a time duration α”. In these situations we must distinguish between functions of the form v : ℝ → D, v : ℝ[ ] → D, and v : ℝ⁺ → D; i.e., we must not confuse data connected, respectively, to times, time intervals, and time durations.
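To make the distinction concrete, the following sketch (illustrative Python; the values and variable names are invented for the example) keeps data indexed by times, time periods, and time durations in three separate mappings so that the three kinds of indexing cannot be confused.

    from typing import Dict, Tuple

    # v_t: value observed at a time t
    signal: Dict[float, float] = {}
    # v_i: value connected with a time period [t1, t2]
    per_period: Dict[Tuple[float, float], float] = {}
    # v_a: value connected with a time duration a
    per_duration: Dict[float, float] = {}

    # Illustrative entries (the numbers are invented for the example):
    signal[1000.0] = 73.2                # a sensor reading at time t = 1000 s
    per_period[(0.0, 3600.0)] = 125.0    # tons of ore produced during [0, 3600] s
    per_duration[3600.0] = 48.5          # fuel consumed per 3600 s of operation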
In the case of failures and concepts involving “time to” and “time between”, we usually define these concepts based on cumulative distribution functions f : ℝ → [0, 1].
Failures are obviously recorded in connection to a specific (calendar) time t ∈ ℝ, but when using this data in a “time to” context, the calendar time is converted to a time duration t ∈ ℝ⁺.
In practice, when analyzing a set of failures, these failures are assumed to have happened at virtually the same “zero time”, so that “time” in “time to” in fact is a time duration. This means that the cumulative distribution function for “failure times” t has the form f : ℝ⁺ → [0, 1], where the argument t ∈ ℝ⁺ in f(t) = Pr([0, t]) is a time duration and Pr([0, t]) is the probability that failure occurs “before time t”.
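As an illustration, assuming a Weibull distribution for failure times (the Weibull distribution is mentioned in Section 3; the parameter values below are invented for the example), the cumulative distribution function takes a time duration, not a calendar time, as its argument:

    import math

    def weibull_failure_cdf(duration: float, shape: float, scale: float) -> float:
        """Pr(failure before a time duration t): F(t) = 1 - exp(-(t/scale)^shape).
        The argument is a time duration in R+, not a calendar time."""
        if duration < 0.0:
            return 0.0
        return 1.0 - math.exp(-((duration / scale) ** shape))

    # Probability of failure within 500 operating hours (illustrative parameters).
    p = weibull_failure_cdf(500.0, shape=1.8, scale=1200.0)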
Accuracy related to time periods and durations is also important in billing regardless of billing models adopted within service contracts.
Being precise about the notion of time is unfortunately only of secondary importance, e.g., in standards like SysML, as we point out in Section 5.

5. The Time Usage Model (TUM) and State Machines

Expressions and formulas in Section 4 show how analytics in engineering often confuses time with time duration even in the same context. The confusion may be harmless in that particular analytical context, but when data is recorded and merged from different contexts into one expectedly coherent dataset, the analysis task must often start with paying attention to time representations with the same meaning across the dataset.
Datasets in industrial situations, and when involving calendar times, often show single data records in tables of records, where each record includes one and only one time, namely the calendar time registered for that particular data record. However, when we look at states and state transitions appearing while a subsystem is “up” or “in operation”, there are several intermediate times to be recorded, e.g., for state transitions, which will be used to calculate durations, e.g., for “in operation” or “is productive”. Hence data for one single failure may appear in several consecutive data records rather than lumped together into one single data record for one single failure.
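A minimal sketch of how such consecutive records can be merged into durations is shown below; the state names and times are invented for illustration and do not follow the GMG classification.

    from collections import defaultdict

    # Each record carries one calendar time: the time of a state transition.
    # (state entered, calendar time of entry), sorted by time.
    transitions = [
        ("operating", 0.0),
        ("down", 3600.0),        # failure recorded here
        ("operating", 5400.0),   # back up after maintenance
        ("idle", 9000.0),
    ]

    def durations_per_state(transitions, end_time):
        """Derive per-state durations from consecutive transition records."""
        totals = defaultdict(float)
        bounded = transitions + [(None, end_time)]
        for (state, t_in), (_, t_out) in zip(bounded, bounded[1:]):
            totals[state] += t_out - t_in
        return dict(totals)

    # {'operating': 7200.0, 'down': 1800.0, 'idle': 1800.0}
    print(durations_per_state(transitions, end_time=10800.0))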
Figure 1 shows a time usage model [1] as a graphical representation of activities, statuses, and events in (surface) mining operation.
A corresponding UML/SysML state machine representation is shown in Figure 2, where the DOWN activity is ‘intentionally blank’ and unspecified simply for the reason that it is unspecified in GMG’s TUM. In [1] it is stated that detailed classification of maintenance activity will be considered in future work by the GMG Asset Management Working Group. An important part of the DOWN activity is obviously maintenance preceded by diagnosis of failure.
As mentioned in Section 2, while a subsystem is being maintained, it is often required to handle a “maintenance in maintenance” (MiM). During a specific ongoing maintenance, new observations may emerge, followed by assessments that may lead to decisions about extending the ongoing maintenance with additional maintenance steps in order to handle the newly observed situation, which, if not maintained, will further increase the risk of future failures.
From a formal point of view, such a situation actually initiates a new thread within the state machine, viewed as a multi-threaded state machine. The thread is triggered within the maintenance state and initiates a new diagnosis “child state”, keeping the parent maintenance state within the down state. The child maintenance state as an MiM must begin and end within the parent maintenance state.
A parent maintenance activity can essentially be specified by a set of actions, where one of these actions can be specified to trigger the opening of a thread specifying a MiM, as indicated below, where a general view of a state definition of a maintenance task, adopting SysML syntax [20], is presented.
    state def Maintenance {
        entry assign down.mnt.Var := …;
        do action parentMaintenance : Maintenance {
            // scheduled steps carried out before the unexpected observation
            action mntStep_1 … mntStep_i-1;
            // MiM thread: child diagnosis and maintenance triggered by the observation
            then action childDiagnosis, childMaintenance;
            // remaining scheduled steps, resumed after the MiM completes
            then action mntStep_i+1 … mntStep_n;
        }
        exit assign down.mnt.Var := …;
    }
In this example, referring to the brief maintenance scenario discussed in Section 2, maintenance step i − 1 could be checking the rotation unit, where the childDiagnosis is triggered because of the observation of a degradation, in turn leading to the appropriate childMaintenance, after which the scheduled maintenance proceeds as originally planned. The SysML syntax is quite rich and enables the system specification to be developed broadly and in elaborate detail, supporting implementations within digital transformations in the mining industry, an industry known for easily purchasing independent digital solutions but often providing at most ad hoc solutions to their integration and intercommunication, especially with underlying company-wide information system platforms. Details in this respect are beyond the scope of this paper.
We should also note that the SysML syntax does not lend itself to encoding and using time concepts as defined in Section 4. In SysML, the notion of ‘occurrence’ has an extent in time, which is seen as “covering the period in time from the occurrence’s creation to its destruction”. The SysML syntax does not clearly define whether “period of time” is a time interval or an aggregation of time periods in the sense described in Section 4, also in relation to the time value “hours of”. We may also note that what we call an instant time or a time instant is in SysML a time slice with zero duration and called a snapshot.
The SysML notion of ‘time slice’ corresponds to “some duration of time”, which may be understood as a SysML time slice being a period of time and an extent in time being a duration of a period of time. However, SysML indeed does not define its time concepts with rigor like we suggest in Section 4, which in turn means we cannot use SysML’s general syntax to specify time-related values other than, e.g., viewing such values as attributes.

6. Failure and Failure Mode

Nomenclatures for failure and recovery from failure by means of maintenance are poorly connected. We can see this, e.g., by comparing [2,21]. While [2] specifies generic terms and definitions for the technical, administrative, and managerial areas of maintenance, failure mode and effect analysis (FMEA) typically has its focus on the effect analysis and indeed is based on failure mode; however, ‘failure’ in ‘failure mode’ is not all that well-defined and is usually referred to as being context-dependent.
The purpose of the European Standard on maintenance [2] is to define the generic terms used for all types of maintenance, and the ‘definitions’ are often very brief. For instance, “maintenance objectives” is simply defined as targets assigned and accepted for the maintenance activities, and “failure” is defined as loss of the ability of an item to perform a required function. Detail is mostly missing, like for “spare parts”, where the standard does not include recommendations for spare parts encoding, and “loss” appears as a binary qualification rather than assumed to involve levels of loss, potentially included in FMEA.
The FMEA Reference Manual published by the Automotive Industry Action Group (AIAG) has been published in four editions, the first one in 1993, the second in 1995 [22], the third in 2001, and the fourth in 2008 [23]. These four editions of FMEA were copyrighted by the three largest North American automotive OEMs that founded the AIAG in 1982. In 2019, the AIAG and the German Verband der Automobilindustrie (VDA) published their first edition of the FMEA Handbook [21], which combines the AIAG 4th Edition FMEA Manual [23] and the four sections of the VDA Volume 4 Manual [24].
FMEA’s struggle with definitions of “fault” and “failure” shows throughout the history and development of FMEA. The difficulty and complexity in defining “failure” show already in the 1980 revision MIL-STD-1629A, which, about “failure definition”, states the following: The contractor shall develop general statements of what constitutes a failure of the item in terms of performance parameters and allowable limits for each specified output. The contractor’s general statements shall not conflict with any failure definitions specified by the procuring activity. MIL-STD-1629A is also on a quite general level when briefly defining failure “cause”, “effect”, and “mode”.
While development of FMEA was ongoing, e.g., during NASA’s Apollo Program, “failure” became connected to “function”, as [25] presents conditions of “component” or “functional” failure.
In [21], failure becomes more clearly connected with “function”, where “failure of function” is seen as derived from “function descriptions”, so that “function” and “malfunction” become antithetical. The types of these “functions” or “malfunctions” are then seen as failure “modes”, where typical modes include “loss” and “degradation” of function. Already at this point we see how the mathematical models are different. When we say “loss” we are closer to describing a state using a logical statement involving many-valued grades of “loss”, whereas when we say “degradation” we observe “loss” over time, which often calls for stochastic modeling of “degradation”.
This in a nutshell is the struggle concerning definitions of “failure” as related to “function”, where failure is often seen as total loss of function.
The distinction is reasonably well understood in health care involving interventions (treatment), but less so in engineering involving maintenance. In health, an intervention may be seen as often targeting mitigation of the “failure”, but, in fact, many interventions aim at restoring “function”, partially or completely lost due to the disease, and/or controlling the risk of further adverse events caused by the disease for which intervention is added. Maintenance in engineering is similar, or could be understood similarly, without drawing too many parallels between “machine” and “human”.
Qualification and quantification within FMEA’s set of ranks for failure mode have both an algebraic and a logical flavor. A well-founded definition of “failure” is actually never provided, even if the underlying assumption seems to be that related concepts like “component” and “function” are classified, in one way or another, in some classification systems developed within the engineering domain of investigating failure modes.
FMEA’s ranking tables for severity, occurrence, and detection enable the development of ranked lists of potential failure modes.
In [23], ranking is said to be relative, and developing a ranked list establishes a priority system, e.g., for design improvements. The ranks are typically seen as numbers. However, viewed as elements in a ranked list, the ranks are symbols, essentially since the set of ranks is a totally ordered set.
From a logical point of view, this order of the ranks is the reverse order of what we intuitively expect to have in a ranked set of truth values, where the lowest truth value corresponds to (absolute) “false” and the highest truth value to (absolute) “true”.
The ranks for severity, occurrence, and detection indeed do not have identical semantic explanations, which essentially means we do not have one set R of ranks but in fact three, R_S, R_O, R_D, respectively, for severity, occurrence, and detection. However, when we compute with these rank values, FMEA’s Risk Priority Number (RPN) is a straightforward arithmetic multiplication of the ranking values for severity (S), occurrence (O), and detection (D). It is often informally written as RPN = S ∗ O ∗ D, formally meaning the arithmetic multiplication RPN = r_S · r_O · r_D, where r_S, r_O, r_D are rank elements, respectively, in R_S, R_O, R_D, and where all three sets are seen as equal to a “general-purpose” set R = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} of ranks, now explicitly understood as numbers.
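A small sketch of this computation is given below; the failure mode names and rank values are invented purely for illustration and are not taken from any FMEA handbook.

    # Rank values are assumed to lie in the common 1-10 scale for S, O, and D.
    failure_modes = {
        # mode: (severity S, occurrence O, detection D) -- values invented for the example
        "rotation unit seal leakage": (7, 5, 4),
        "rock drill wear": (6, 6, 3),
        "hydraulic hose rupture": (9, 3, 5),
    }

    def rpn(s: int, o: int, d: int) -> int:
        """Risk Priority Number as the arithmetic product of the three rank values."""
        return s * o * d

    # Ranked (prioritized) list of failure modes, highest RPN first.
    for mode, (s, o, d) in sorted(failure_modes.items(),
                                  key=lambda kv: rpn(*kv[1]), reverse=True):
        print(f"{mode}: RPN = {rpn(s, o, d)}")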
If we were to reverse the order within the FMEA set of ranks, i.e., having “10” logically as “excellent” or “no problem”, or similar, and “1” as “very critical” or “total problem”, or similar, then the operator ∗ could be viewed algebraically, with R possessing a suitable algebraic structure. Explaining these situations in detail requires explanation of the use of algebraic structures as applied in many-valued logic, an explanation of which is outside the scope of this paper. The algebraic structures typically used are explained in [26,27,28].

7. Conclusions and Future Work

To improve analytics in reliability engineering based on availability modeling, fault diagnostics using AI and IoT, simulation for predictive and prescriptive maintenance, and decision-making under uncertainty for maintenance and risk mitigation, it is shown to be important not only to base the models on more extensive standards, such as the GMG time usage model, but also to select nomenclature and terminology (and additional standards as necessary) to facilitate precise notation and the use of symbolic language to support mathematical modeling. Development of the included UML/SysML state machine view of GMG’s TUM for surface mining showed that metrics such as MTBF, MTTR, availability, failure rate, etc., commonly used in engineering need to be further developed in terms of “what”. That is, the more detailed the state machine developed, the more relevant the analytics that can be implemented, but this also requires increasing the amount of data. The related issues of “data model”, “data architecture”, and “data framework” need to be further explored and developed so that the data structures developed are stringent as well as contextually relevant for the use case.
Furthermore, we have shown that creating “intelligent” decision support systems (digital twins, intelligent computing algorithms, or similar) requires well-structured and nested KPIs and metrics to connect operations to decision support in terms of functionality and finance.
As a limitation, the proposed rigorous data structure and indeed the suggestion to use and comply with different standards simultaneously may potentially increase the computational burden of the approach. In addition, the mappings between different standards are not easily achieved.
As this paper provides prerequisites for enriching the notions for metrics used in reliability engineering, in future papers we will provide such enrichment, which will further make use of stochastic processes based on our time model. In general, we further suggest complying with information structures as provided in standards and adding precision to standards whenever required to enable more elaborate calculations involving such metrics.

Author Contributions

Conceptualization, M.L. and P.E.; methodology, M.L. and P.E.; validation, M.L. and P.E.; formal analysis, M.L. and P.E.; investigation, M.L.; resources, M.L.; data curation, M.L. and P.E.; writing—original draft preparation, M.L. and P.E.; writing—review and editing, M.L. and P.E.; visualization, M.L. and P.E.; supervision, M.L. and P.E.; project administration, M.L.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We are grateful to the anonymous reviewers, whose comments have helped us significantly improve the content of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Reliability Engineering Metric

For reliability engineering in general, metrics and KPIs (key performance indicators) are used to measure technical performance and monitor the financial effectiveness of mining operations, e.g., for the purpose of optimizing the production and business objectives of mining companies and their equipment suppliers. Analytics of data collected and arranged according to information structures based on variables from these metrics and KPIs is then used, e.g., for improving production processes and decision-making with respect to company business objectives.
In reliability engineering, specifically in mining, metrics typically include bit/rod and rock variables, whereas KPIs include efficient use of equipment, OPEX and CAPEX optimization, safety in production, minimizing negative environmental impact, and ensuring sustainability in long-term natural resource management.
Reliability engineering in mining focuses on ensuring that mining equipment and operations function effectively and efficiently over time. Various metrics are employed to assess reliability, improve maintenance strategies, and enhance overall performance. In the following sub-sections we briefly explain the common reliability metrics used in the mining industry.

Appendix A.1. Mean Time Between Failures (MTBF)

MTBF is a key metric used to measure the average time between equipment failures during operational periods. It is calculated by dividing the total operational time by the number of failures. Higher MTBF values indicate better reliability and performance of mining equipment.

Appendix A.2. Mean Time to Repair (MTTR)

MTTR measures the average time required to repair a failed piece of equipment and return it to operational status. A shorter MTTR is desirable, as it indicates swift response and effective repair processes, minimizing downtime.

Appendix A.3. Availability

Availability calculates the proportion of time equipment is available for use compared to total time. It is often expressed as a percentage. High availability indicates that equipment is operational and ready for use most of the time.
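A minimal sketch of how the three metrics defined in Appendices A.1, A.2, and A.3 relate, using the standard steady-state formulation and invented numbers purely for illustration, is shown below.

    def mtbf(total_operating_time: float, n_failures: int) -> float:
        """Mean Time Between Failures: operating time divided by number of failures."""
        return total_operating_time / n_failures

    def mttr(total_repair_time: float, n_repairs: int) -> float:
        """Mean Time To Repair: repair time divided by number of repairs."""
        return total_repair_time / n_repairs

    def availability(mtbf_value: float, mttr_value: float) -> float:
        """Steady-state availability: fraction of time the item is up."""
        return mtbf_value / (mtbf_value + mttr_value)

    # Illustrative numbers: 1900 operating hours, 100 repair hours, 5 failures.
    a = availability(mtbf(1900.0, 5), mttr(100.0, 5))   # 380 / (380 + 20) = 0.95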

Appendix A.4. Overall Equipment Effectiveness (OEE)

OEE encompasses three factors: availability, performance efficiency, and quality rate. It provides a holistic view of how effectively equipment is utilized in mining operations. This metric helps identify areas for improvement and boosts productivity.

Appendix A.5. Failure Rate

This failure rate metric represents the frequency with which failures occur over a specified period. It is typically expressed in failures per unit of time (e.g., failures per hour). A lower failure rate indicates better reliability and reduced frequency of issues.

Appendix A.6. Reliability Function

The reliability function quantifies the probability that a system or component will perform its intended function without failure for a specified period. Reliability functions can be derived from different probability distributions, such as exponential or Weibull distributions.
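As an illustration of reliability functions derived from the exponential and Weibull distributions mentioned above (the parameter values are invented for the example), consider the following sketch.

    import math

    def exponential_reliability(t: float, failure_rate: float) -> float:
        """Constant failure rate: R(t) = exp(-lambda * t)."""
        return math.exp(-failure_rate * t)

    def weibull_reliability(t: float, shape: float, scale: float) -> float:
        """R(t) = exp(-(t/scale)^shape): probability of surviving a duration t without failure."""
        return math.exp(-((t / scale) ** shape))

    # Probability of surviving 500 h of operation (illustrative parameters).
    r_exp = exponential_reliability(500.0, failure_rate=1.0 / 1000.0)
    r_wei = weibull_reliability(500.0, shape=1.8, scale=1200.0)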

Appendix A.7. Preventive Maintenance Metrics

Metrics such as schedule compliance and maintenance history are used for tracking the effectiveness of preventive maintenance programs. These metrics help ensure that maintenance activities are conducted as planned, thus enhancing reliability.

Appendix A.8. Maintenance Costs

Tracking costs associated with maintenance, including labor, parts, and downtime, is essential for understanding the economic aspect of reliability. High maintenance costs can indicate frequent failures or inefficiencies in maintenance practices.

Appendix A.9. Work Order Completion Rate

This metric evaluates the percentage of completed maintenance work orders against those planned. A high completion rate indicates effective work planning and execution, contributing to improved reliability outcomes.

Appendix B. Metrics and KPIs

The mining industry, as well as many others, employs various metrics to assess and enhance the reliability of equipment and processes. Metrics such as MTBF, MTTR, availability, OEE, and failure rates provide insights into operational performance. Additionally, methodologies like root cause analysis (RCA) and the analysis of maintenance costs contribute to developing comprehensive reliability management strategies. By leveraging these metrics, mining operations can optimize equipment performance, reduce downtime, and achieve more sustainable production. The challenge is to use these metrics and others in new and innovative ways to evaluate, for example, the production in a mine using financial criteria such as Total Cost of Ownership (TCO) per year. TCO might in mining break down into, for example,
  • Components and maintenance costs;
  • Operator costs;
  • Depreciation costs;
  • Fuel costs;
  • Drilling tool costs;
  • Interest costs.
However, following up the cost breakdown is only the first step and does not by itself allow much intelligent computing to be carried out. Doing so requires a stringent data structure, intertwined process and information models, and well-developed definitions of, for example, “data model”, “data architecture”, and “data framework”. This is because rough structures and unclear definitions can only produce overarching breakdowns such as the (albeit important) cost breakdowns mentioned above. However, to connect technology and its maintenance to finance and decision support, using data analytics and AI, structures and definitions that support the following are needed:
  • Data management;
  • Transforming and combining data for various users and use cases;
  • Analysis (frequency, correlation, conditionalities, rules, and reinforcement);
  • Demonstrator decision support (mapping fault, function, and fixes).
Again, creating actually “intelligent” decision support systems (digital twins, intelligent computing algorithms, analytics, or related concepts) requires well-structured and nested KPIs and metrics, preferably based on standards, to move beyond basic pie charts that do not support functional and financial decision making with a sufficient degree of detail and certainty.
The concept of a “data model” is vague and often misleading. It sometimes refers to a specific data structure, sometimes to a more global data framework.
A data structure usually refers to typed data appearing in some relational structure, i.e., a structure suitable for implementation as a database. It is important to distinguish between the data structure and the data residing within the data structure, i.e., a data structure per se does not contain data but may have been populated with data to become a data structure with data. An empty (non-populated) database has structure but no data. Data with no structure is difficult to analyze.
A data architecture is an architecture of data structures. It is not a structure of structures, but rather a conceptually and contextually organized conglomerate of structures, where mappings between structures are designed to possess structure-preserving properties. Ownership of data structures is thus often more desirable than just owning the data.
Whereas we may have a data structure without data, it makes no sense to have a data architecture without data structures. However, we may have a data architecture of data structures without any data in respective data structures. A data architecture is often an asset.
A data framework is a framework of data architectures. It is not an architecture of architectures, but rather an application domain-oriented categorization of architectures, where transformations between architectures are established to reveal inter-architectural causation within the general framework, supporting data analytics and decision-support development. A data framework cannot be envisioned without any detail about its underlying data architectures, and also, a class of architectures will not per se embrace any canonical categorization for a data framework. A simple and general example is shown in Figure A1.
Figure A1. Examples of KPIs.
KPIs obtained during an operating cycle, like penetration rate, would typically be seen as falling under bit/rod variables and are measurements derived from sensors and instrumentation within the system. Such data could also be called “raw data”. KPIs representing aggregated data based on operation cycle data are typically statistical summaries connected with calendar time periods; KPIs appearing in financial reporting are higher-level variables constructed by combining technical data from operation with expected performances concerning productivity and profit and will involve data and parametrization also outside the scope of data from operations. The list of KPI variables in Figure A1 could be expanded, e.g., by making use of the ’Basic Time Elements’ listed in Table 2 in [1].

References

  1. Global Mining Guidelines Group (GMG). A Standardized Time Classification Framework for Mobile Equipment in Surface Mining; Global Mining Guidelines Group (GMG): Châteauguay, QC, Canada, 2020. [Google Scholar]
  2. EN 13306:2017; Maintenance—Maintenance Terminology. European Committee for Standardization (CEN): Brussels, Belgium, 2017.
  3. Unified Modeling Language (UML), Version 2.5.1; Object Management Group (OMG): Boston, MA, USA, 2017.
  4. Harel, D. Statecharts: A visual formalism for complex systems. Sci. Comput. Program. 1987, 8, 231–274. [Google Scholar] [CrossRef]
  5. Rausand, M.; Barros, A.; Høyland, A. System Reliability Theory: Models, Statistical Methods, and Applications, 3rd ed.; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2021. [Google Scholar]
  6. Löfstrand, M.; Kyösti, P.; Reed, S.; Backe, B. Evaluating availability of functional products through simulation. Simul. Model. Pract. Theory 2014, 47, 196–209. [Google Scholar] [CrossRef]
  7. Nytomt, F. Service Reliability and Maintainability. Licentiate Thesis, Luleå University of Technology, Luleå, Sweden, 2004. [Google Scholar]
  8. Kuo, W.; Zuo, M. Optimal Reliability Modeling: Principles and Applications; Wiley: Hoboken, NJ, USA, 2003. [Google Scholar]
  9. Law, A.M. Simulation Modeling and Analysis, 5th ed.; McGraw-Hill Higher Education: Columbus, OH, USA, 2015. [Google Scholar]
  10. Moubray, J. Reliability-Centered Maintenance; G—Reference, Information and Interdisciplinary Subjects Series; Industrial Press: South Norwalk, CT, USA, 2001. [Google Scholar]
  11. Phadke, M. Quality Engineering Using Robust Design, Prentice-Hall International ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1989. [Google Scholar]
  12. Villegas, L.F.; Macchi, M.; Polenghi, A. Digital twins in manufacturing: A unified conceptual framework. Annu. Rev. Control 2025, 60, 101031. [Google Scholar] [CrossRef]
  13. Tariq, M.U.; Poulin, M.; Abonamah, A.A. Achieving Operational Excellence Through Artificial Intelligence: Driving Forces and Barriers. Front. Psychol. 2021, 12, 686624. [Google Scholar] [CrossRef] [PubMed]
  14. Wang, C. Grand Challenges in Environmental Systems Engineering. Front. Environ. Sci. 2021, 9, 809627. [Google Scholar] [CrossRef]
  15. Odeyar, P.; Apel, D.B.; Hall, R.; Zon, B.; Skrzypkowski, K. A Review of Reliability and Fault Analysis Methods for Heavy Equipment and Their Components Used in Mining. Energies 2022, 15, 6263. [Google Scholar] [CrossRef]
  16. Dhillon, B. Mining Equipment Reliability, Maintainability, and Safety; Springer Series in Reliability Engineering; Springer: London, UK, 2008. [Google Scholar]
  17. Menčík, J. Reliability of Systems. In Concise Reliability for Engineers; Mencik, J., Ed.; IntechOpen: London, UK, 2016; Chapter 5. [Google Scholar]
  18. Orošnjak, M.; Saretzky, F.; Kedziora, S. Prescriptive Maintenance: A Systematic Literature Review and Exploratory Meta-Synthesis. Appl. Sci. 2025, 15, 8507. [Google Scholar] [CrossRef]
  19. Rojas, L.; Peña, Á.; Garcia, J. AI-Driven Predictive Maintenance in Mining: A Systematic Literature Review on Fault Detection, Digital Twins, and Intelligent Asset Management. Appl. Sci. 2025, 15, 3337. [Google Scholar] [CrossRef]
  20. Systems Modeling Language (SysML), Version 2.0; Object Management Group (OMG): Boston, MA, USA, 2025.
  21. FMEA Handbook, 1st ed.; AIAG: Southfield, MI, USA; VDA: Berlin, Germany, 2019.
  22. Potential Failure Mode and Effect Analysis (FMEA), Reference Manual, 2nd ed.; Chrysler Corporation, Ford Motor Company, General Motors Corporation: Detroit, MI, USA, 1995.
  23. Potential Failure Mode and Effect Analysis (FMEA), Reference Manual, 4th ed.; Chrysler LLC, Ford Motor Company, General Motors Corporation: Detroit, MI, USA, 2008.
  24. Quality Assurance in the Process Landscape—Sections 1–4; VDA: Berlin, Germany, 2021.
  25. Procedure for Failure Mode, Effects, and Criticality Analysis (FMECA); National Aeronautics and Space Administration (NASA): Washington, DC, USA, 1966.
  26. Eklund, P.; García, J.G.; Höhle, U.; Kortelainen, J. Semigroups in Complete Lattices: Quantales, Modules and Related Topics; Developments in Mathematics; Springer International Publishing: Cham, Switzerland, 2018. [Google Scholar]
  27. Eklund, P.; Galán, M.A.; Helgesson, R.; Kortelainen, J. Fuzzy terms. Fuzzy Sets Syst. 2014, 256, 211–235. [Google Scholar] [CrossRef]
  28. Eklund, P.; Kortelainen, J.; Löfstrand, M. Quantales for Fuzzy Sets and Relations of Higher Types. Mathematics 2025, 13, 2159. [Google Scholar] [CrossRef]
Figure 1. The time usage model presented and explained in [1].
Figure 2. UML/SysML state machine view of GMG’s TUM for surface mining.