1. Introduction
The rapid growth of Internet of Things (IoT) devices in industrial and infrastructure applications has made system reliability and predictive maintenance (PdM) a critical research area. Modern IoT deployments generate large volumes of heterogeneous operational data while operating under harsh environmental conditions, communication faults, and software errors, leading to non-negligible failure rates and complex degradation patterns [1,2]. Classical reliability techniques, originally developed for relatively small and homogeneous systems, are often inadequate for such large-scale, data-rich IoT settings.
In practice, PdM for IoT systems is still dominated by a combination of rule-based maintenance, conventional lifetime modelling, and machine-learning pipelines. Parametric reliability models—such as exponential, Weibull, or Gamma regression—continue to be widely used to describe skewed time-to-failure distributions and to derive maintenance policies [3,4,5]. More recently, supervised learning and deep neural networks have become popular for failure prediction and remaining useful life estimation, often within Industry 4.0 and IoT-enabled monitoring architectures [1,2,6]. These approaches can exploit high-dimensional sensor data but typically provide only point predictions or ad hoc confidence scores, making probabilistic calibration and formal uncertainty quantification difficult, especially in the presence of strongly skewed, heavy-tailed measurements and rare catastrophic failures. As a consequence, existing approaches often struggle to identify latent production-level heterogeneity and to coherently link discrete failure events with continuous lifetime information.
From a statistical modelling perspective, symmetric distributions and correlation structures are widely used in reliability analysis due to their analytical simplicity, interpretability, and computational convenience. Gaussian assumptions and symmetric error models therefore remain common default choices in engineering applications. However, empirical IoT reliability data rarely conform to symmetry: reset counts and failure times typically exhibit strong right-skewness, heavy tails, and asymmetric dependence driven by heterogeneous production batches. The present work explicitly addresses this mismatch by moving beyond symmetric assumptions and modelling asymmetric reliability patterns within a unified Bayesian framework based on skewed likelihoods and shared latent effects.
Bayesian methods offer a principled alternative by combining prior engineering knowledge with heterogeneous data in a unified probabilistic framework. In reliability engineering, hierarchical Bayesian models have been used to integrate expert judgement, to pool information across related components and to update maintenance decisions dynamically as new data arrive [7,8,9]. Such models are particularly attractive for IoT settings, where data are naturally organised in groups or “batches” (devices, firmware versions, production lots) and where both discrete event counts and continuous degradation signals are observed. However, full Bayesian inference for realistic hierarchical models with non-Gaussian, skewed likelihoods is computationally demanding: Markov chain Monte Carlo (MCMC) methods can require long chains, careful tuning, and elaborate convergence diagnostics, which complicates deployment in industrial environments.
The Integrated Nested Laplace Approximation (INLA) provides an efficient deterministic alternative to MCMC for a broad class of Latent Gaussian Models (LGMs) [10,11,12]. In the LGM framework, the high-dimensional latent field is modelled as a Gaussian Markov random field (GMRF), while the observations are linked through potentially skewed exponential-family likelihoods (e.g., Poisson, Negative Binomial, Gamma, Inverse Gaussian). INLA exploits the sparsity of the GMRF precision matrix together with nested Laplace approximations to deliver accurate posterior marginals and posterior predictive distributions at a fraction of the computational cost of simulation-based methods. Although INLA has been successfully applied in spatial statistics, disease mapping, and survival analysis, its application to IoT reliability and predictive maintenance—where heterogeneous count and lifetime data are jointly observed—remains relatively limited compared to data-driven approaches. Recent IoT reliability studies primarily focus on deep learning or hybrid architectures [13,14], often without explicit probabilistic calibration or modelling of production-level heterogeneity.
In this work, we focus on the joint modelling of two canonical IoT reliability signals: discrete reset counts of embedded devices and continuous failure times. Both are inherently right-skewed and heterogeneous across production batches; devices from a faulty batch exhibit many resets and short lifetimes, while devices from healthy batches reset infrequently and operate much longer. Following the general idea of combining Poisson and Gamma components in a joint reliability model, as previously explored in an IoT case study by [15], we formulate a family of INLA-based LGMs that explicitly target such skewed behaviour. The proposed framework uses a Poisson or Negative Binomial likelihood for reset counts and a Gamma likelihood for failure times, coupled through shared latent batch effects. This construction allows us to represent the underlying degradation process in a transparent way, to quantify uncertainty via posterior predictive distributions and credible intervals, and to diagnose model fit through distribution-level posterior predictive checks.
In contrast to existing predictive maintenance approaches, this paper makes three key contributions. First, it jointly models discrete reset counts and continuous failure times within a unified Bayesian framework using skewed exponential-family likelihoods. Second, it introduces shared latent batch effects to explicitly capture production-level heterogeneity and latent quality differences across IoT device populations. Third, it demonstrates that INLA enables efficient, stable, and fully Bayesian inference for such joint non-Gaussian models, providing a practical alternative to computationally intensive MCMC-based solutions.
The aim of this work is to investigate how well INLA-based LGMs can capture heterogeneous, right-skewed reliability signals across IoT device populations. Building on the methodological foundations above, these contributions translate into the following concrete steps:
We construct a controlled synthetic dataset that reflects common patterns observed in industrial IoT networks: skewed reset counts, heavy-tailed lifetimes, and production-level heterogeneity.
We formulate three Bayesian reliability models (A–C) within the INLA framework: independent models for resets and lifetimes, a shared-parameter model, and a fully joint Poisson–Gamma model with batch-specific latent effects.
We evaluate each model using posterior predictive checks, population-level calibration, and per-batch inference to identify the roles of latent degradation effects.
We demonstrate that Model C—featuring shared latent batch effects—provides substantially improved predictive fit and captures degradation patterns that simpler models cannot represent.
We show how INLA enables complete Bayesian inference on the full dataset with negligible computational cost while maintaining numerical stability and accurate uncertainty quantification.
The results show that the hierarchical INLA model provides a well-calibrated, interpretable, and computationally efficient approach to Bayesian predictive maintenance for IoT systems with skewed distributions. Beyond the specific Poisson–Gamma setting considered here, the proposed framework illustrates how INLA can be used more broadly to model skewed reliability data in industrial applications, providing a bridge between modern IoT sensing infrastructure and rigorous Bayesian uncertainty quantification.
2. Materials and Methods
2.1. Bayesian Inference
Bayesian inference provides a coherent probabilistic framework for learning from data by combining prior beliefs about unknown quantities with the information contained in the likelihood. Let $\theta$ denote the model parameters and $y$ the observed data. Bayes’ theorem states that
$$\pi(\theta \mid y) = \frac{\pi(y \mid \theta)\,\pi(\theta)}{\pi(y)},$$
where $\pi(\theta)$ is the prior distribution and $\pi(y \mid \theta)$ is the likelihood. The posterior distribution $\pi(\theta \mid y)$ summarises all information about $\theta$ after observing $y$ [16,17].
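As a concrete illustration of the prior-to-posterior update (a toy example of ours, not part of the study), a Gamma prior on a Poisson reset rate is conjugate, so the posterior is available in closed form:

```python
from scipy import stats

# Prior belief about a device's reset rate (events per month):
# Gamma(shape=2, rate=1) has prior mean 2.0.
prior_shape, prior_rate = 2.0, 1.0

# Hypothetical reset counts observed for one device over six months.
y = [0, 1, 0, 3, 2, 1]

# The Gamma prior is conjugate to the Poisson likelihood:
# posterior is Gamma(shape + sum(y), rate + n).
post_shape = prior_shape + sum(y)   # 2 + 7 = 9
post_rate = prior_rate + len(y)     # 1 + 6 = 7

posterior = stats.gamma(a=post_shape, scale=1.0 / post_rate)
print(round(posterior.mean(), 3))                            # posterior mean 9/7 ~ 1.286
print([round(q, 3) for q in posterior.ppf([0.025, 0.975])])  # 95% credible interval
```

The posterior mean (9/7) sits between the prior mean (2.0) and the sample mean (7/6), illustrating how the prior and the likelihood are combined.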
Prior distributions encode domain knowledge or regularisation. Weakly informative priors help stabilise inference in hierarchical and skewed-data settings, reduce overfitting, and avoid unrealistic parameter values [18,19]. Priors may be proper or improper, but posterior propriety must hold [20].
Posterior inference relies on computing expectations, credible intervals, or functionals such as
$$\mathbb{E}\left[g(\theta) \mid y\right] = \int g(\theta)\,\pi(\theta \mid y)\,d\theta.$$
For most hierarchical or non-linear models these integrals have no closed form. The normalising constant $\pi(y) = \int \pi(y \mid \theta)\,\pi(\theta)\,d\theta$ is also intractable, motivating computational methods.
Posterior predictive distributions quantify uncertainty for new observations $\tilde{y}$:
$$\pi(\tilde{y} \mid y) = \int \pi(\tilde{y} \mid \theta)\,\pi(\theta \mid y)\,d\theta,$$
and play a central role in model checking and predictive maintenance, where forecasting future failures is essential [17,21,22].
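This predictive integral can be approximated by simulation: draw parameters from the posterior, then draw new data given each parameter value. The sketch below (our own illustration, continuing the toy Gamma–Poisson posterior with shape 9 and rate 7) checks the Monte Carlo approximation against the known exact predictive, which for this conjugate pair is Negative Binomial:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Posterior for a Poisson rate (illustrative values from a conjugate update).
post_shape, post_rate = 9.0, 7.0

# Posterior predictive by simulation: draw lambda from the posterior,
# then draw a new count given lambda.
lam = rng.gamma(shape=post_shape, scale=1.0 / post_rate, size=100_000)
y_new = rng.poisson(lam)

# The exact Gamma-Poisson predictive is NegativeBinomial(r=shape, p=rate/(rate+1)).
exact = stats.nbinom(post_shape, post_rate / (post_rate + 1.0))
print(y_new.mean())   # close to the exact predictive mean 9/7
print(exact.mean())
```

The simulated predictive mean matches the closed-form Negative Binomial mean, confirming that the mixture over the posterior, not a single point estimate, is what the predictive distribution represents.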
MCMC methods approximate the posterior by generating samples from $\pi(\theta \mid y)$ [23]. Although asymptotically exact, MCMC may be computationally expensive, sensitive to autocorrelation, and slow to converge in high-dimensional or highly skewed models. This motivates deterministic alternatives such as INLA, which directly approximates the posterior marginals using analytic approximations rather than sampling [10].
Bayesian inference therefore provides a principled foundation for uncertainty quantification, while INLA represents a specialised computational strategy that enables fast and accurate posterior evaluation in complex reliability and predictive-maintenance models.
2.2. Integrated Nested Laplace Approximation (INLA)
In this study, the central computational tool is INLA, a methodology proposed by [10] as a fast and accurate alternative to MCMC for Bayesian inference in hierarchical models. INLA is designed for a broad class of models known as LGMs, which combine a structured latent field with likelihoods from the exponential family. This construction makes INLA especially suitable for modelling skewed phenomena such as failure times, degradation measures, or event-count processes, which naturally arise in reliability engineering and predictive maintenance. In this work we used version 25.06.13 of R-INLA.
In the INLA framework, the observed data $y = (y_1, \dots, y_n)$ follow an exponential-family likelihood,
$$\pi(y \mid x, \theta) = \prod_{i=1}^{n} \pi(y_i \mid \eta_i, \theta),$$
with $\eta_i$ denoting the linear predictor and $\theta$ representing hyperparameters (e.g., dispersion, shape, precision). The linear predictor is embedded in a latent Gaussian field
$$x \mid \theta \sim \mathcal{N}\!\left(0,\, Q(\theta)^{-1}\right),$$
where $Q(\theta)$ is a sparse precision matrix. Sparsity, which encodes conditional independence between components of $x$, plays a key role in the computational efficiency of INLA [24].
The joint posterior distribution can be written as
$$\pi(x, \theta \mid y) \propto \pi(\theta)\,\pi(x \mid \theta) \prod_{i} \pi(y_i \mid \eta_i, \theta),$$
but its exact evaluation is infeasible in most hierarchical and non-linear models. Instead of sampling from this posterior (as in MCMC), INLA constructs accurate deterministic approximations of two sets of marginal posteriors: the latent marginals $\pi(x_i \mid y)$ and the hyperparameter marginals $\pi(\theta_j \mid y)$.
The computational strategy relies on three nested steps:
Gaussian approximation of the latent field. For fixed $\theta$, the conditional posterior $\pi(x \mid \theta, y)$ is approximated by a multivariate Gaussian $\tilde{\pi}_G(x \mid \theta, y)$, centred at its mode $x^{*}(\theta)$, with precision given by the negative Hessian of $\log \pi(x \mid \theta, y)$.
Laplace approximation of the hyperparameter posterior. The marginal posterior of $\theta$ is approximated as
$$\tilde{\pi}(\theta \mid y) \propto \left. \frac{\pi(x, \theta, y)}{\tilde{\pi}_G(x \mid \theta, y)} \right|_{x = x^{*}(\theta)}.$$
Because most LGMs contain only a few hyperparameters, numerical integration over $\theta$ is tractable.
Posterior marginals of latent components. Each latent marginal is computed by integrating over the approximated $\tilde{\pi}(\theta \mid y)$:
$$\tilde{\pi}(x_i \mid y) = \int \tilde{\pi}(x_i \mid \theta, y)\,\tilde{\pi}(\theta \mid y)\,d\theta.$$
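The core Laplace idea above can be illustrated in one dimension (a toy posterior of ours, not the study's model): locate the mode of the log-posterior, measure its curvature there, and use the resulting Gaussian as a deterministic approximation.

```python
import numpy as np
from scipy import stats, optimize

# Toy target: an unnormalised Gamma(9, rate=7)-shaped posterior for a positive rate.
a, b = 9.0, 7.0
def neg_log_post(lam):
    return -((a - 1.0) * np.log(lam) - b * lam)

# Find the mode numerically (analytic mode is (a-1)/b = 8/7).
res = optimize.minimize_scalar(neg_log_post, bounds=(1e-6, 20.0), method="bounded")
mode = res.x

# Curvature (negative Hessian of the log-posterior) via central differences;
# the analytic value here is (a-1)/mode^2 = 49/8.
h = 1e-5
curv = (neg_log_post(mode + h) - 2 * neg_log_post(mode) + neg_log_post(mode - h)) / h**2

# The Laplace approximation: a Gaussian centred at the mode with sd = curv^{-1/2}.
laplace = stats.norm(loc=mode, scale=curv**-0.5)
exact = stats.gamma(a=a, scale=1.0 / b)
print(round(mode, 4), round(exact.mean(), 4))  # Gaussian is centred at the mode, not the mean
```

The small gap between the Gaussian's centre (the mode) and the exact posterior mean is precisely the kind of skewness-induced error that INLA's nested corrections are designed to reduce.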
This “integrated and nested” sequence of Laplace approximations leads to precise and fully deterministic posterior summaries. A key computational advantage emerges from the Gaussian Markov random field (GMRF) representation of $x$: the precision matrix $Q$ is sparse, allowing efficient Cholesky factorisation and leading to complexity on the order of $\mathcal{O}(n^{3/2})$ for typical (e.g., spatial) GMRF structures instead of the $\mathcal{O}(n^{3})$ scaling required for dense covariance structures [25].
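The computational gain from sparsity can be seen with a banded precision matrix, the simplest GMRF structure. The sketch below (a standard AR(1)-type precision, chosen by us for illustration) solves a linear system with SciPy's banded Cholesky routine, which touches only the non-zero band, and verifies the result against the dense solver:

```python
import numpy as np
from scipy.linalg import solveh_banded

# Tridiagonal precision Q of an AR(1)-type GMRF: only the diagonal and the
# first off-diagonal are non-zero, so Cholesky-based solves cost O(n).
n, rho = 1000, 0.8
diag = np.full(n, 1.0 + rho**2)
diag[0] = diag[-1] = 1.0
off = np.full(n - 1, -rho)

# Banded (upper) storage for solveh_banded: row 0 = superdiagonal, row 1 = diagonal.
ab = np.zeros((2, n))
ab[0, 1:] = off
ab[1, :] = diag

b = np.ones(n)
x_banded = solveh_banded(ab, b)   # exploits the band structure

# Cross-check against the O(n^3) dense solve of the same system.
Q = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
x_dense = np.linalg.solve(Q, b)
print(np.allclose(x_banded, x_dense))   # True
```

The two solutions agree, but the banded solve never materialises the dense matrix; this is the same principle that lets INLA handle large latent fields cheaply.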
In reliability and predictive-maintenance applications, INLA is particularly appealing because it naturally accommodates skewed likelihoods such as Poisson or Negative Binomial (for event counts) and Gamma or Inverse Gaussian (for time-to-failure or degradation signals). These distributions integrate seamlessly with the LGM structure, and their hyperparameters (e.g., dispersion, shape) are estimated jointly with latent effects. Moreover, INLA supports multi-likelihood systems through the copy mechanism, which enables different responses to share the same latent degradation process. This is crucial for joint modelling of resets and failure times, where both responses describe different facets of the same physical degradation mechanism.
Finally, the R-INLA implementation automates the full computational procedure: sparse-matrix construction, Laplace approximations, numerical integration, and derivation of posterior predictive distributions. Since the method avoids stochastic simulation, results are fast, reproducible, and stable, making INLA a powerful alternative to MCMC in industrial predictive-maintenance settings.
In the proposed models, the number of hyperparameters remains small. Specifically, Model A estimates two intercepts and the Gamma shape parameter, Model B additionally includes a dispersion parameter for the Negative Binomial likelihood, and Model C estimates a single precision parameter for the batch-level latent effects. This low-dimensional hyperparameter space, combined with sparse GMRF structure, explains the computational efficiency of INLA in this setting.
2.3. Skewed Distributions in Reliability and IoT Data
Data arising in reliability engineering, predictive maintenance, and large-scale IoT systems rarely follow symmetric or Gaussian patterns. Already classical monographs on reliability analysis emphasise that most failure-time data are best modelled using distributions for positive random variables such as the exponential, Weibull, gamma, or lognormal, whereas the normal distribution is seldom appropriate for product lifetimes [3]. Empirical examples, such as ball bearing fatigue tests or integrated circuit failure times, typically show strongly right-skewed histograms due to the lower bound at zero and the presence of a long upper tail [3].
Failure-related processes thus frequently produce observations that are right-skewed, heavy-tailed, and heterogeneous across components and batches. Such skewness emerges both from the underlying physics of degradation (e.g., wear, accumulation of damage, first-passage phenomena) and from the discrete, non-negative nature of many event-count outcomes.
In IoT applications, two forms of skewed behaviour commonly occur:
Event counts (e.g., resets, fault occurrences) are characterised by a high concentration of zeros and small integers, with occasional bursts of numerous failures. This leads to asymmetric, often overdispersed count distributions, sometimes with excessive zeros. The modern count-data literature explicitly discusses highly skewed and overdispersed distributions, for example, through discrete Weibull regression for inhomogeneous and highly skewed data with excess zeros [26], Poisson–Tweedie models for ultra-overdispersed count data with excessive zeros [27], or Bayesian discrete Weibull models for highly skewed counts [28].
Time-to-failure or degradation measures display long right tails, where many components survive for long periods, while a minority fail very early. Standard reliability texts stress that lifetime data are typically modelled by positive, right-skewed distributions such as Weibull, gamma, lognormal, or related families [3]. Practical guidance documents describe, for instance, the gamma distribution as a natural model for positive data that are skewed to the right and commonly used in survival and reliability studies [29], while the Weibull distribution is widely used for time-to-failure data and for modelling skewed process data in capability analysis [29]. For degradation-based modelling, the inverse Gaussian process has been advocated as an effective first-passage-time model with right-skewed lifetime behaviour [5].
Skewness has important methodological implications. Models assuming symmetry (e.g., classical Gaussian regression or linear mixed models) may underperform when applied to highly skewed reliability or count data, leading to biased or imprecise estimates, especially in small samples or in the presence of extreme skew in predictors [30]. Moreover, using normal approximations for strongly overdispersed count data—such as negative binomial data with heavy right tails—can yield inadequate confidence intervals and poor tail inference [31]. In practice, this motivates the use of models whose likelihood explicitly reflects asymmetry, non-negativity, and mean–variance relationships.
IoT and predictive-maintenance case studies additionally highlight that sensor features and derived indicators are often non-Gaussian and skewed; industry guidelines explicitly note that many machine-learning algorithms implicitly assume Gaussian inputs and can be sensitive to skewed, non-Gaussian data distributions [32]. This is particularly relevant for PdM workflows that rely on sparse failure labels and small samples, where skewness is exacerbated.
In this context, skewed data are most naturally handled through likelihoods from the exponential family—Poisson, negative binomial, gamma, inverse Gaussian, and related distributions—which encode asymmetry and non-negativity at the likelihood level rather than through ad hoc transformations.
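The two forms of skewness discussed above are easy to reproduce in simulation. The sketch below (parameter values are our own, chosen only to mimic the qualitative patterns described here) generates IoT-like reset counts and lifetimes and confirms their positive sample skewness:

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(x):
    # Sample skewness: third standardised central moment.
    x = np.asarray(x, dtype=float)
    return ((x - x.mean())**3).mean() / x.std()**3

# Illustrative IoT-like signals (assumed parameters, not from any real deployment):
resets = rng.poisson(lam=0.8, size=5000)                   # many zeros, occasional bursts
lifetimes = rng.gamma(shape=1.5, scale=400.0, size=5000)   # long right tail

print(round(skewness(resets), 2))     # positive: right-skewed counts
print(round(skewness(lifetimes), 2))  # positive: right-skewed lifetimes
```

Both signals are strictly non-negative with positive skewness, which is exactly the regime where Gaussian error models break down and exponential-family likelihoods are appropriate.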
The author of [15] implicitly exploits this structure by modelling IoT reset counts and failure times using Poisson and gamma likelihoods, both inherently right-skewed. Although the original work does not explicitly discuss “skewness,” the choice of likelihoods directly reflects the empirical distributional characteristics of the IoT processes under study and aligns with standard reliability practice [3,29].
In this study, skewness plays a central role: we aim to demonstrate that INLA provides an efficient and accurate Bayesian framework for modelling highly skewed and heterogeneous failure-related signals through latent Gaussian structures and joint models while preserving a physically interpretable description of event counts and lifetimes.
2.4. Exponential Family Models for Skewed Data
Skewed data encountered in reliability and IoT applications are well modelled by likelihood functions belonging to the exponential family. The theory of generalised linear models (GLMs) explicitly builds on the exponential family and provides a unified framework for handling normal, binomial, Poisson, gamma, and related likelihoods with appropriate link functions [33,34,35]. In this framework, the distributional form directly controls non-negativity, skewness, and mean–variance relationships.
The general form of an exponential-family likelihood can be written as
$$f(y \mid \vartheta, \phi) = \exp\!\left\{ \frac{y\vartheta - b(\vartheta)}{a(\phi)} + c(y, \phi) \right\},$$
where $\vartheta$ is the canonical parameter, $\phi$ a dispersion or scale parameter, and $a(\cdot)$, $b(\cdot)$, $c(\cdot)$ are known functions. The link function $g$ relates the linear predictor $\eta$ to the mean $\mu = \mathbb{E}[y]$ through $g(\mu) = \eta$ and hence determines how the mean and variance depend on covariates.
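To make the link-function mechanics concrete, a Poisson GLM with log link can be fitted with a few lines of iteratively reweighted least squares (IRLS); the covariate, coefficients, and data below are simulated by us purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated covariate and Poisson response with a log link:
# log E[y] = beta0 + beta1 * x (true values are illustrative).
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 1.2])
y = rng.poisson(np.exp(X @ beta_true))

# IRLS for the canonical log link: weights W = mu, working response
# z = eta + (y - mu)/mu; each step solves a weighted least-squares system.
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu
    W = mu
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

print(np.round(beta, 2))   # close to the true values [0.5, 1.2]
```

The skewness of the response never enters as a residual term: it is implied by the Poisson likelihood itself, while the link function keeps the fitted mean strictly positive.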
Several exponential-family models are particularly suitable for skewed reliability data:
Poisson distribution models right-skewed count data with variance proportional to the mean, commonly used for event arrivals or reset counts. For overdispersed or ultra-overdispersed situations, extensions such as Poisson–Tweedie models have been proposed specifically to handle highly skewed count data with excessive zeros [27].
Negative binomial distribution generalises the Poisson by allowing overdispersion, thereby capturing heavy right tails and bursty failure processes; the resulting distributions can be highly skewed, and standard normal approximations may fail to provide adequate inferences for the mean in such settings [31].
Gamma distribution is strictly positive and right-skewed, making it a canonical model for time-to-failure, degradation rates, or continuous lifetime measurements. Applied guidance emphasises its use for positive data values that are skewed to the right and commonly arise in reliability and survival studies [3,29].
Inverse Gaussian distribution captures even heavier right tails and arises naturally as a first-passage-time distribution in degradation processes; inverse Gaussian processes have been advocated as flexible stochastic models for degradation and remaining useful life in reliability analysis [5].
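For reference, the population skewness of several of these families is available in closed form (standard results, added here for convenience), which makes the built-in asymmetry explicit:

```latex
\mathrm{skew}\{\mathrm{Poisson}(\lambda)\} = \lambda^{-1/2}, \qquad
\mathrm{skew}\{\mathrm{Gamma}(\kappa,\theta)\} = \frac{2}{\sqrt{\kappa}}, \qquad
\mathrm{skew}\{\mathrm{IG}(\mu,\lambda)\} = 3\sqrt{\mu/\lambda}.
```

In each case the skewness is strictly positive and decays only as the shape or rate parameter grows, so asymmetry is intrinsic to the family rather than an artefact of small samples.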
These models are particularly attractive in the Bayesian setting because they admit tractable likelihoods and combine well with hierarchical structures. Their skewness properties arise directly from their parametric form: Poisson and negative binomial display integer-valued right skew, while gamma and inverse Gaussian have continuous heavy right tails. From the perspective of predictive maintenance, such distributions describe both frequent small events and rare extreme failures in a physically interpretable way.
A key observation, supported by both theoretical work and applied practice, is that using exponential-family likelihoods embeds skewness directly into the modelling framework rather than treating it as a residual error term. For example, GLM theory shows how Poisson and gamma models replace the Gaussian error structure with exponential-family likelihoods while retaining a linear predictor and link function [33,34,35]. This avoids the need for purely parametric skew distributions (e.g., skew-normal or skew-t families) in many reliability applications while maintaining a clear interpretation in terms of failure processes.
Moreover, exponential-family likelihoods integrate seamlessly with the LGM framework introduced in Section 2.2. The conditional Gaussianity of the latent layer, together with skewed exponential-family likelihoods, creates a powerful and flexible structure for joint modelling of heterogeneous IoT signals. In the context of the present work, this includes joint Poisson–Gamma models for reset counts and failure times as in [15], but extended to a full Bayesian LGM fitted via INLA.
These likelihoods are particularly effective in predictive maintenance, where both event counts and degradation measures follow highly asymmetric, non-Gaussian patterns. Industrial reports on PdM adoption explicitly warn that many input features are skewed and non-Gaussian, which can degrade the performance of algorithms that implicitly assume Gaussianity [32]. By embedding skewness directly into the likelihood, exponential-family models ensure that predictive uncertainty, early failures, and long-tail behaviour are properly captured. This forms the distributional foundation upon which the INLA-based LGM framework builds in subsequent sections.
2.5. Predictive Assessment and Credible Intervals
An essential component of Bayesian modelling is evaluating how well the fitted model predicts data consistent with the observed sample. INLA enables this through direct access to posterior predictive distributions and deterministic approximations of predictive uncertainty [17,36].
Posterior predictive inference is based on the distribution
$$\pi(\tilde{y} \mid y) = \int \pi(\tilde{y} \mid x, \theta)\,\pi(x, \theta \mid y)\,dx\,d\theta,$$
which INLA evaluates using samples drawn from the approximated joint posterior via inla.posterior.sample(). These samples allow the computation of predictive summaries such as medians, credible intervals, and expected frequencies for any subset of the sample space.
Model adequacy is assessed by comparing replicated draws with the observed data. Typical checks include:
overlays of observed and predictive histograms,
comparison of observed statistics with their posterior predictive credible intervals,
graphical inspection of uncertainty bands for expected counts or failure-time densities.
If observed features lie within the corresponding predictive intervals, the model is considered adequately calibrated for those aspects.
Predictive credible intervals naturally reflect uncertainty in highly skewed or heterogeneous failure processes, making them particularly informative in joint count–duration models used in predictive maintenance. They reveal whether the fitted model captures not only central tendencies but also the tail behaviour and dispersion characteristic of reliability data.
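The second check listed above — comparing an observed statistic with its posterior predictive interval — can be sketched generically as follows. The data, the posterior draws, and the choice of the sample maximum as test statistic are all illustrative assumptions of ours, not outputs of the fitted models:

```python
import numpy as np

rng = np.random.default_rng(7)

# "Observed" counts (illustrative) and posterior draws of the Poisson rate;
# Gamma draws from a conjugate update stand in for any fitted model's posterior.
y_obs = rng.poisson(1.3, size=200)
lam_draws = rng.gamma(shape=1.0 + y_obs.sum(),
                      scale=1.0 / (1.0 + y_obs.size), size=2000)

# Posterior predictive check: for each posterior draw, replicate a dataset
# of the same size and record a test statistic (here, the sample maximum).
stat_rep = np.array([rng.poisson(lam, size=y_obs.size).max() for lam in lam_draws])
lo, hi = np.quantile(stat_rep, [0.025, 0.975])

stat_obs = y_obs.max()
print(lo, stat_obs, hi)   # the model is adequate for this aspect if lo <= stat_obs <= hi
```

Any statistic can be substituted for the maximum (zero counts, upper quantiles, batch means), which is how distribution-level checks of tail behaviour and dispersion are constructed.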
2.6. Shared Latent Effects in Joint Models
Many reliability and predictive–maintenance systems produce multiple data streams that reflect different manifestations of the same underlying degradation mechanism. Discrete event counts, such as resets or minor failures, and continuous characteristics, such as failure times or degradation measurements, often arise from a single latent “health state” governing the behaviour of the component or batch [17]. Joint models with shared latent effects provide a principled way to couple these heterogeneous responses by allowing them to depend on a common unobserved quantity.
In a general Bayesian formulation, two responses $y_1$ and $y_2$ are linked to a shared latent effect $u$ through
$$g_1\!\left(\mathbb{E}[y_1]\right) = \alpha_1 + u, \qquad g_2\!\left(\mathbb{E}[y_2]\right) = \alpha_2 + \beta u, \qquad u \sim \mathcal{N}(0, \tau^{-1}),$$
where $g_1$ and $g_2$ are link functions and $\beta$ is a scaling coefficient, so that information flows between both likelihoods through the posterior of $u$ rather than directly through the observations. This structure is particularly appealing in reliability engineering, where observable signals often contain only partial information about the underlying degradation. Shared latent effects provide partial pooling, stabilise estimates across batches or components, and ensure coherent inference even when one data source is sparse or noisy [20].
Within the INLA framework, shared latent effects arise naturally because all likelihood components are embedded in a single latent Gaussian hierarchy [10]. In practical implementations, multiple responses are represented within one unified model using data-stacking strategies—commonly referred to as the “NA-trick”—whereby observations that are not relevant for a given likelihood component are explicitly set to missing values. This construction allows different likelihoods to depend on the same latent Gaussian process without duplicating parameters or latent structures, so that information from one data modality can inform the latent structure in another and coherent joint inference across heterogeneous response types becomes possible.
In the present predictive-maintenance application, discrete reset counts and continuous failure times share a batch-specific latent effect representing device quality. This effect increases expected failure counts while decreasing expected lifetime for defective batches, offering a unified probabilistic representation of the degradation mechanism.
Overall, joint models with shared latent effects constitute a flexible and interpretable framework for integrating heterogeneous reliability signals. Combined with INLA’s deterministic posterior approximations, they provide an efficient tool for modelling complex degradation structures and improving predictive performance in industrial maintenance applications.
3. Results
The goal of this study is to evaluate how INLA performs when modelling skewed and heterogeneous reliability data typical of large-scale IoT ecosystems. Following the motivation outlined in the Methods section, we focus on two naturally occurring diagnostic signals:
the number of reset events, representing soft failures accumulated during device operation;
the time-to-failure, a continuous and strongly right-skewed degradation indicator.
These signals are routinely collected without the need for additional instrumentation and therefore provide a realistic basis for diagnostic data fusion in predictive maintenance systems.
In practical IoT deployments, individual nodes rarely offer sufficiently rich diagnostics to support device-level health modelling. Instead, devices are manufactured in batches and operate under similar environmental constraints, which makes it appropriate to treat them as a statistical population. This leads naturally to hierarchical Bayesian modelling, where device- or batch-level effects can be used to represent latent differences in quality, degradation rate, or material properties. The present study follows this population-based perspective and aims to determine how well different INLA-based latent Gaussian structures capture these heterogeneous effects.
Building on the theoretical foundations established in the Materials and Models section, this case study demonstrates how Bayesian latent–Gaussian modelling can be applied to highly skewed reliability signals arising in IoT devices. The preceding methodological discussion introduced four core ingredients that are essential for constructing effective predictive-maintenance models:
the Bayesian paradigm and posterior predictive reasoning,
the INLA framework for fast and deterministic inference in LGMs,
the use of skewed exponential-family likelihoods to model non-negative and heavy-tailed reliability data,
and the use of shared latent effects to couple heterogeneous responses.
These components jointly form a coherent probabilistic toolbox for analysing IoT maintenance signals, where event counts (resets) and time-to-failure measurements often exhibit strong right-skewness, overdispersion, and batch-level heterogeneity. As discussed, such data cannot be meaningfully represented by Gaussian error models. Instead, reliability modelling requires likelihoods that enforce non-negativity and accommodate heavy-tailed behaviour—typically Poisson or Negative Binomial for counts and Gamma or Inverse Gaussian for failure times. Moreover, physical degradation processes frequently manifest simultaneously in discrete and continuous signals, motivating the shared-latent-effect construction.
To provide a controlled benchmark for analysing model performance, we construct a synthetic dataset reflecting common patterns observed in industrial IoT populations. Four production batches are simulated, each comprising 50 devices. Three batches represent nominal manufacturing conditions, producing devices with low reset intensities and long average lifetimes. One batch is intentionally specified as defective, characterised by substantially increased reset rates and drastically reduced failure times. This setting induces strong right-skewness in both diagnostic signals as well as substantial between-batch heterogeneity—features known to challenge classical Gaussian or non-hierarchical models.
The simulated data follow the generative structure described in Section 2: reset counts are drawn from a Poisson (or Negative Binomial) model with log intensity influenced by a batch-specific latent effect, while failure times follow a Gamma distribution whose mean is inversely related to the same latent effect. This design ensures a coherent physical interpretation: devices with poor underlying “health” produce more resets and fail more rapidly.
3.1. Data Description and Experimental Setup
The dataset consists of two complementary reliability signals: reset counts $R_i$ and failure times $T_i$, jointly observed for $n = 200$ devices. To emulate the forms of skewness and heterogeneity characteristic of IoT systems (as discussed in Section 2.1 and Section 2.3), the data were synthetically generated from parametric degradation mechanisms that enforce right-skewed behaviour and strong between-batch variability.
Four production batches were simulated, each representing a distinct latent “quality level”. Batches 1, 2, and 4 follow moderately reliable profiles, while batch 3 is intentionally defective, exhibiting both a very high frequency of resets and substantially reduced lifetimes. This design reflects scenarios commonly encountered in predictive maintenance applications, in which an underlying latent degradation process (Section 2.6) drives heterogeneous behaviour across groups.
The synthetic dataset was generated from an explicit hierarchical generative process designed to reproduce the heterogeneous and right-skewed reliability behaviour observed in industrial IoT systems. For each device i belonging to production batch g(i), reset counts and failure times were generated as

R_i ~ Poisson(λ_i), log λ_i = β_0 + u_g(i),
T_i ~ Gamma(k, k/μ_i), log μ_i = γ_0 − u_g(i),

where the Gamma distribution is parameterised by shape k and rate k/μ_i so that E[T_i] = μ_i, and u_g denotes a batch-specific latent effect controlling the underlying degradation level. Positive values of u_g correspond to increased reset intensity and reduced expected lifetime, representing defective production batches, while negative values indicate healthy batches.
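This generative process can be reproduced with a short numpy sketch. The parameter values below are illustrative placeholders, not the true values reported in Table 2:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative values only; the true values used in the paper are listed in Table 2.
beta0, gamma0, k = 1.0, 4.0, 3.0              # global intercepts (log scale) and Gamma shape
u = np.array([-0.5, -0.3, 2.0, -0.4])          # latent batch effects; batch 3 (index 2) is defective

batch = np.repeat(np.arange(4), 50)            # 4 production batches x 50 devices
lam = np.exp(beta0 + u[batch])                 # reset intensity increases with u_g
mu = np.exp(gamma0 - u[batch])                 # expected lifetime decreases with u_g

resets = rng.poisson(lam)                      # discrete diagnostic signal R_i
lifetimes = rng.gamma(shape=k, scale=mu / k)   # continuous signal T_i with E[T_i] = mu

# The defective batch produces far more resets and far shorter lives.
print(resets[batch == 2].mean() > resets[batch == 0].mean())        # True
print(lifetimes[batch == 2].mean() < lifetimes[batch == 0].mean())  # True
```

Because the same latent effect enters both signals with opposite signs, the simulated defective batch simultaneously exhibits elevated reset counts and shortened lifetimes, matching the joint pattern visible in Figure 1.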
Figure 1 displays the resulting joint distribution of resets and failure times, coloured by batch. The plot highlights two key properties motivating the use of INLA-based LGM models: the continuous variable T is heavily right-skewed, with defective units failing exceptionally early, and the discrete count variable R spans orders of magnitude, with batch 3 producing resets an order of magnitude more frequently than the remaining batches.
This construction provides full ground truth for evaluating model behaviour and enables controlled assessment of posterior accuracy, uncertainty quantification, and batch-level reconstruction.
Table 1 summarises the empirical means of reset counts and failure times. The intentionally degraded batch 3 exhibits over an order of magnitude more resets and a dramatically shortened lifetime, while batches 1, 2, and 4 show moderate and comparable behaviour.
Table 2 summarises the true batch-specific and global parameter values used in the synthetic data generation process, corresponding directly to the coefficients appearing in the log-intensity and log-mean model equations.
The parameters β_0 and γ_0 denote the global intercepts of the log-intensity and log-mean equations, respectively, while u_g represents the batch-specific latent effect and k is the global Gamma shape parameter.
The strongly asymmetric distributions (especially the extreme behaviour of batch 3) provide a challenging testbed for assessing the capacity of INLA-based latent Gaussian models to capture skewness via Poisson, Negative Binomial, and Gamma likelihoods.
3.2. Model Specification
We consider three Bayesian reliability models capturing different aspects of the reset–lifetime mechanism observed in the simulated IoT device population. The models form a natural progression from simple pooled structures to a hierarchical joint model with shared latent batch effects. For all specifications, R_i denotes the reset count of device i and T_i its failure time, while g(i) indexes the production batch.
3.2.1. Model A: Pooled Poisson–Gamma
The first model assumes homogeneous failure behaviour across all devices. Reset counts follow a Poisson likelihood,

R_i ~ Poisson(λ), log λ = β_0,

and lifetimes follow a Gamma distribution,

T_i ~ Gamma(k, k/μ), log μ = γ_0,

with k treated as a free shape parameter. Both processes share only intercept terms, implying that all variability is absorbed by the likelihoods themselves. This specification provides a baseline representation of skewed count and lifetime data without overdispersion or batch heterogeneity.
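Because Model A is fully pooled, its location parameters have simple closed-form estimators, which is useful for sanity-checking the INLA fit. A frequentist sketch on hypothetical healthy-batch data (the Poisson MLE of the intensity is the sample mean; the Gamma shape follows from the method of moments):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled data standing in for the three healthy batches.
resets = rng.poisson(2.0, size=150)
lifetimes = rng.gamma(shape=3.0, scale=60.0 / 3.0, size=150)  # mean lifetime 60

# Poisson branch: the MLE of lambda is the sample mean, so beta0_hat = log(mean R).
beta0_hat = np.log(resets.mean())

# Gamma branch: method-of-moments estimates of (mean mu, shape k).
mu_hat = lifetimes.mean()
k_hat = mu_hat**2 / lifetimes.var()   # k = mean^2 / variance
gamma0_hat = np.log(mu_hat)
```

These moment estimates typically agree closely with the posterior means of the corresponding INLA fixed effects under the flat-to-weak priors used here.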
3.2.2. Model B: Pooled Negative-Binomial–Gamma
The second model extends the reset component to account for overdispersion, which is a common feature of bursty failure processes. Reset counts are modelled with a Negative Binomial likelihood,

R_i ~ NegBin(λ, θ), log λ = β_0,

where θ controls dispersion via Var(R_i) = λ + λ²/θ. The lifetime component remains the same Gamma structure as in Model A. This specification allows the model to adapt to empirical reset-count variation that exceeds the Poisson expectation.
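The overdispersion that motivates Model B can be diagnosed directly from the data: under a Poisson model the variance equals the mean, while mixing the Poisson intensity over a Gamma distribution yields Negative Binomial counts with inflated variance. A sketch on hypothetical counts, including a moment-based estimate of θ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical overdispersed counts: a Gamma-mixed Poisson is Negative Binomial.
theta_true, mean_true = 5.0, 4.0
lam = rng.gamma(shape=theta_true, scale=mean_true / theta_true, size=2000)
counts = rng.poisson(lam)

m, v = counts.mean(), counts.var()
print(v > m)  # True: variance exceeds the Poisson prediction Var = mean

# Moment estimate of the dispersion parameter from Var(R) = lambda + lambda^2 / theta.
theta_hat = m**2 / (v - m)
```

A variance-to-mean ratio well above one on the full dataset is exactly the symptom that the Poisson branch of Model A cannot absorb.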
3.2.3. Model C: Hierarchical Joint Model with Shared Batch Effects
The third model introduces a latent Gaussian batch effect to capture systematic quality differences among production batches. Let u_g denote the batch-specific effect for batch g. The reset and lifetime predictors become

log λ_i = β_0 + u_g(i), log μ_i = γ_0 − u_g(i),

with the sign reversal encoding the assumption that devices with higher reset intensity exhibit shorter expected lifetimes. The batch effects follow i.i.d. Gaussian priors,

u_g ~ N(0, 1/τ_u),

with the precision τ_u estimated during inference. The batch-level precision parameter τ_u is treated as a hyperparameter within the INLA framework and is inferred jointly with the remaining model parameters using INLA’s marginal likelihood approximation. A weakly informative prior is assigned to τ_u in order to regularise extreme batch-level variability while allowing the data to dominate posterior inference. The posterior marginal distribution of τ_u is obtained directly as part of the INLA output.
This model jointly represents both responses through the same underlying latent degradation process and is therefore the only specification capable of inferring batch-level quality differences.
3.3. Model Comparison
To assess the goodness-of-fit of the proposed models (A–C) and to compare their predictive performance, two standard Bayesian criteria were employed: the Deviance Information Criterion (DIC) and the Watanabe–Akaike Information Criterion (WAIC). Both metrics evaluate a model by balancing goodness-of-fit against a penalty for model complexity (the effective number of parameters), thereby discouraging overfitting. WAIC is a fully Bayesian criterion that utilises the posterior predictive distribution and is often preferred for hierarchical models. For both criteria, lower values indicate a model with better predictive properties.
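For models fitted by simulation, WAIC is computed from the matrix of pointwise log-likelihoods across posterior draws. INLA returns both criteria directly, so the following numpy illustration on hypothetical Poisson data is purely expository:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(2)
y = rng.poisson(3.0, size=100)                        # hypothetical observed counts
lam_draws = rng.normal(3.0, 0.1, size=500).clip(0.1)  # hypothetical posterior draws of the intensity

# Pointwise log-likelihood matrix: S posterior draws x n observations.
logfact = np.array([lgamma(int(k) + 1) for k in y])
loglik = y * np.log(lam_draws[:, None]) - lam_draws[:, None] - logfact

# WAIC = -2 * (lppd - p_waic); lower is better, as for DIC.
lppd = np.log(np.exp(loglik).mean(axis=0)).sum()      # log pointwise predictive density
p_waic = loglik.var(axis=0, ddof=1).sum()             # effective number of parameters
waic = -2.0 * (lppd - p_waic)
```

The complexity penalty p_waic is the summed posterior variance of the pointwise log-likelihood, which is why heavily parameterised hierarchical models are penalised unless the extra structure genuinely improves the predictive density.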
3.3.1. Model A: Pooled Poisson–Gamma
To establish a baseline and to characterise the behaviour of “healthy” components, Model A is fitted exclusively to batches 1, 2, and 4, excluding batch 3, which exhibits extreme degradation patterns. This mirrors the laboratory-style calibration stage typically used in predictive maintenance, where normal operating conditions are modelled first, prior to analysing heterogeneous or defective units.
Figure 2 shows the resulting subset of the simulated dataset.
Model A assumes a single Poisson likelihood for the reset counts and a single Gamma likelihood for the failure times, with no batch-level random effects and no covariates beyond the intercepts. The fitted fixed effects indicate well-identified location parameters, and the estimated Gamma precision parameter corresponds to a moderately concentrated failure-time distribution for healthy batches.
Posterior predictive distributions for the restricted dataset are displayed in Figure 3. For batches 1, 2, and 4 the pooled Poisson–Gamma specification captures the central mass of both failure-time and reset-count distributions reasonably well. The predictive medians coincide closely with the empirical histograms, and the 95% credible intervals (CrI) demonstrate appropriate uncertainty width.
In information-theoretic terms, Model A attains DIC and WAIC values indicating a satisfactory fit for homogeneous, non-defective batches. As will be shown in subsequent sections, this pooled structure becomes inadequate once the defective batch 3 is reintroduced: the same parametrisation then systematically underestimates the probability of large reset bursts and overestimates long failure times, demonstrating that a simple Poisson–Gamma model cannot accommodate latent heterogeneity.
3.3.2. Model B: Pooled Negative Binomial–Gamma
Model B extends the pooled formulation of Model A by replacing the Poisson likelihood with a Negative Binomial distribution in order to accommodate potential overdispersion in reset counts. As in Model A, the analysis is restricted to batches 1, 2, and 4, with the abnormally defective batch 3 removed from the pooled dataset. The model retains the same Gamma likelihood for failure times. Posterior inference yields well-identified fixed effects. The estimated dispersion hyperparameter of the Negative Binomial likelihood indicates only mild residual overdispersion in the truncated dataset, and the precision parameter of the Gamma likelihood is also stably recovered. Both information criteria improve slightly relative to Model A, but the magnitude of the improvement is modest, reflecting the fact that after removing batch 3, the remaining reset counts exhibit relatively weak overdispersion.
Figure 4 displays the posterior predictive distributions for failure times (left panel) and reset counts (right panel). Compared with Model A, predictive intervals for resets shrink marginally and better match the empirical distribution, consistent with the increased flexibility of the Negative Binomial likelihood.
Despite its additional dispersion parameter, Model B remains a fully pooled specification, with no mechanism to represent systematic differences between production batches. Consequently, while it improves flexibility over the Poisson-based Model A, it cannot capture latent quality variation and therefore motivates the hierarchical construction introduced in Model C.
3.3.3. Model C: Hierarchical Joint Model with Shared Batch Effects
Before introducing the hierarchical joint model, it is instructive to analyse why the pooled models (A and B) fail when fitted to the full dataset. Both pooled approaches systematically misrepresent the joint degradation mechanism once all four batches, including the severely degraded batch 3, are included.
Figure 5 and Figure 6 summarise the posterior predictive behaviour of Models A and B fitted to the entire dataset. Despite the use of different likelihoods for the reset counts (Poisson vs. Negative Binomial), both models exhibit clear structural misspecification:
- failure-time predictions are overly diffuse and fail to reproduce the sharp degradation visible in batch 3,
- reset-count predictions underestimate the extreme right tail,
- neither model adapts to between-batch heterogeneity, because all devices are assumed to originate from a single homogeneous population.
These deficiencies motivate the introduction of a hierarchical model that allows batches to differ in their latent degradation state while still borrowing strength across data sources. This is precisely the purpose of Model C.
The results of Models A and B (when fitted to the full dataset including the defective batch 3) demonstrate that pooled likelihoods are unable to capture the heterogeneous behaviour of the production batches. For reference, when fitted to the entire dataset, Model A produces a DIC of 6565, whereas the Negative Binomial extension in Model B yields a substantially lower value that nevertheless remains well above that of the hierarchical model introduced below.
Although Model B better accommodates overdispersion, both approaches treat the resets and failure times as arising from a single homogeneous population. This assumption is fundamentally violated in the full dataset, where batch 3 exhibits extreme degradation (high reset counts and very short lifetimes), while batches 1, 2, and 4 display normal operating behaviour. This motivates the construction of a hierarchical model in which batch-level latent variables explicitly capture underlying material or production differences.
Following the strategy employed in [15], Model C introduces a latent batch-level effect u_g shared between the Poisson/Negative Binomial and Gamma likelihoods. This shared structure directly encodes the assumption that both resets and lifetimes are manifestations of the same latent physical degradation mechanism. Rather than fitting two independent likelihoods, the joint model learns a common degradation profile for each batch, allowing the model to simultaneously:
- link high reset intensity with short lifetimes,
- identify clusters of devices with similar behaviour,
- separate normal batches from severely degraded ones,
- propagate latent uncertainty jointly across both likelihood components.
This construction restores model identifiability on the full dataset and provides the hierarchical structure required to infer degradation differences across batches.
Model Specification
Let R_i and T_i denote the reset count and failure time for device i. The joint model assumes

R_i ~ Poisson(λ_i), log λ_i = β_0 + u_g(i),
T_i ~ Gamma(k, k/μ_i), log μ_i = γ_0 + c · u_g(i),

so that a positive batch effect simultaneously increases resets and decreases failure time (effectively setting c = −1). The latent effects u_g follow an i.i.d. Gaussian prior with precision τ_u. In the INLA implementation, the coupling is achieved through the copy mechanism, which forces the latent field in the Gamma branch to be an exact scaled copy of the field used in the count branch.
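The effect of the shared (copied) latent field can be made concrete by profiling the joint log-likelihood of one batch over a candidate effect u_g, with the count and lifetime branches coupled through c = −1. The parameter values below are hypothetical, not the INLA fit itself:

```python
import numpy as np
from math import lgamma

def joint_loglik(u_g, resets, lifetimes, beta0=1.0, gamma0=4.0, k=3.0, c=-1.0):
    """Joint log-likelihood of one batch under the shared-effect model.

    A single u_g enters both branches: log lambda = beta0 + u_g for the
    Poisson resets and log mu = gamma0 + c*u_g (with c = -1) for the lifetimes.
    """
    lam = np.exp(beta0 + u_g)
    mu = np.exp(gamma0 + c * u_g)
    ll_pois = np.sum(resets * np.log(lam) - lam
                     - np.array([lgamma(int(r) + 1) for r in resets]))
    # Gamma(shape k, mean mu) density, parameterised via rate k/mu.
    ll_gamma = np.sum((k - 1) * np.log(lifetimes) - k * lifetimes / mu
                      + k * np.log(k / mu) - lgamma(k))
    return ll_pois + ll_gamma

# A defective batch (many resets, short lives) prefers a large positive u_g.
rng = np.random.default_rng(3)
resets = rng.poisson(np.exp(1.0 + 2.0), size=50)
lifetimes = rng.gamma(3.0, np.exp(4.0 - 2.0) / 3.0, size=50)
grid = np.linspace(-1.0, 3.0, 81)
u_hat = grid[np.argmax([joint_loglik(u, resets, lifetimes) for u in grid])]
```

Because both data sources pull the same u_g, evidence from frequent resets sharpens the inference about shortened lifetimes even before many failures are observed, which is precisely the diagnostic benefit of the joint construction.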
Fitted to the full dataset, Model C recovers the underlying batch structure with high accuracy. The posterior batch effects, shown in Table 3, reveal a striking pattern: batch 3, the intentionally degraded group, exhibits a large positive effect, while the remaining batches have negative effects, corresponding to low reset counts and long lifetimes.
The hyperparameters are also sharply estimated, and the model achieves the best information criteria among all three models, with a DIC of 2118, representing an improvement of more than 4400 relative to Model A and more than 800 relative to Model B (full-data fits). This underscores the necessity of explicitly modelling latent heterogeneity. The magnitude of the observed DIC and WAIC reductions should be interpreted in the context of the synthetic experimental design. The dataset was generated with strong batch-level heterogeneity, including a severely defective batch, which pooled models are structurally unable to represent. As a result, hierarchical models with shared batch effects naturally yield large improvements in information criteria.
Figure 7 displays the posterior predictive distributions for the full dataset using the hierarchical Model C. The predictive bands closely follow the empirical distributions, widen naturally in the tails, and successfully reproduce the multimodal shape induced by mixed batch quality. This is in sharp contrast to Models A and B, which oversmooth the distribution and fail to explain heavy-tailed behaviour.
3.4. Per-Batch Posterior Predictive Analysis
A key motivation for introducing Model C was to recover latent heterogeneity between production batches, which is not identifiable under pooled specifications (Models A and B). While global posterior predictive checks summarise average calibration, they do not reveal whether individual batches representing distinct manufacturing conditions or component-quality regimes are consistently reproduced by the fitted hierarchical model.
We therefore examine posterior predictive distributions separately for each batch. This allows us to assess:
- (i) whether Model C captures the local scale and shape of both failure-time and reset distributions,
- (ii) how uncertainty varies between batches, and
- (iii) whether the latent batch effects inferred by the model translate into visibly different predictive behaviour.
Figure 8 displays, for all four batches, posterior predictive histograms (median and 95% CrI) for failure times (left column) and reset counts (right column). These distributions are generated conditional on the shared batch-level latent effect u_g, which propagates jointly into the Poisson and Gamma likelihoods via the copy mechanism.
For batches 1, 2, and 4 (top three rows), the model produces narrow and well-aligned posterior intervals. Resets remain low and tightly concentrated, while the corresponding failure-time distributions exhibit long lifetimes with limited spread. This is consistent with the simulated ground truth, in which these batches represent normal-quality production with modest degradation. The hierarchical priors shrink their latent effects toward one another, yielding stable, low-variance predictive distributions.
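Per-batch posterior predictive intervals of this kind can be simulated by pushing posterior draws of the batch effect through both likelihoods. A hypothetical numpy sketch for a healthy batch:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical posterior draws of a healthy batch effect; fixed effects assumed known.
u_draws = rng.normal(-0.4, 0.1, size=1000)
beta0, gamma0, k = 1.0, 4.0, 3.0

# For each posterior draw, simulate one replicate device from the batch,
# so that latent uncertainty propagates into both predictive distributions.
lam = np.exp(beta0 + u_draws)
mu = np.exp(gamma0 - u_draws)
resets_rep = rng.poisson(lam)
lifetimes_rep = rng.gamma(k, mu / k)

# 95% posterior predictive intervals, batch-specific by construction.
lo_r, hi_r = np.percentile(resets_rep, [2.5, 97.5])
lo_t, hi_t = np.percentile(lifetimes_rep, [2.5, 97.5])
```

Repeating this for a defective batch (a large positive u_g) shifts and widens both intervals, reproducing the qualitative contrast between the top three rows and the bottom row of Figure 8.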
Batch 3 presents a substantially different picture. Here, the posterior predictive distributions exhibit:
- markedly higher reset counts, with heavy right tails driven by the large positive latent batch effect, and
- extremely short failure times, correctly reflecting the severe underlying degradation coded in the simulation.
The predictive spread is wider in both likelihoods, illustrating how the model extrapolates additional uncertainty for defective or unstable production groups. Importantly, the location, scale, and tail behaviour of both distributions match the empirical patterns almost exactly, demonstrating that the hierarchical structure successfully isolates and represents non-standard batch-level dynamics.
Taken together, the per-batch diagnostics confirm that Model C captures the joint behaviour of resets and lifetimes not only in a global sense but within each production regime. This decomposition is essential for predictive maintenance applications: operational decisions (e.g., early replacement, warranty allocation, or targeted quality inspection) depend on identifying which specific batches exhibit abnormal behaviour, rather than simply modelling average tendencies. Model C delivers this capability, whereas Models A and B, lacking batch-specific structure, smooth over these critical differences.
4. Discussion
The results presented in this study demonstrate that the choice of statistical model is decisive when analysing reliability data from large-scale IoT systems. Our findings confirm that standard pooled approaches—even those utilising appropriate exponential-family likelihoods—are insufficient for capturing the complex degradation patterns inherent in heterogeneous device populations. Unlike typical machine learning approaches used in predictive maintenance, the proposed Bayesian framework focuses on probabilistic calibration, uncertainty quantification, and interpretability rather than purely predictive accuracy. While data-driven models may achieve strong performance on large labelled datasets, they often lack transparent uncertainty estimates, which are critical for reliability engineering and maintenance decision-making.
The comparison between the pooled models (A and B) and the hierarchical joint model (C) highlights the critical role of latent structure in predictive maintenance. While Model A (Poisson–Gamma) and Model B (Negative Binomial–Gamma) provided adequate fits for the subset of “healthy” batches, they failed when applied to the full dataset containing defective units. The high DIC and WAIC values for these models on the full dataset indicate that treating IoT devices as a homogeneous population leads to severe model misspecification. Specifically, the pooled models were forced to average the behaviour of healthy and defective units, resulting in predictive distributions that oversmoothed the data—underestimating the risk of immediate failure in defective batches while overestimating the failure intensity of healthy ones. While information criteria such as DIC and WAIC were used to assess relative model fit, alternative validation strategies such as leave-one-out or time-series cross-validation were not considered in this study. Given the synthetic and non-temporal nature of the data, posterior predictive checks were deemed the most appropriate validation tool. Future work will investigate cross-validation strategies in dynamic IoT settings with temporal structure.
In contrast, Model C successfully recovered the ground truth of the simulated degradation process. By incorporating a shared latent batch effect, the model achieved a dramatic reduction in information criteria (DIC decreasing from 6565 in Model A to 2118 in Model C). More importantly, the posterior predictive checks confirmed that the hierarchical structure allows for locally calibrated predictions (see Figure 8). The model correctly identified Batch 3 as defective (see Table 3), effectively linking high reset counts with short time-to-failure. This confirms that the “copy” mechanism in INLA provides a flexible and powerful way to fuse heterogeneous data sources (discrete counts and continuous lifetimes) into a unified health indicator, as suggested in previous works [15].
Although this study focuses on two reliability signals, the proposed framework naturally extends to multiple signals by introducing additional likelihood components linked through shared or partially shared latent effects. In practice, high-dimensional raw sensor streams would be summarised into reliability-relevant indicators, allowing the latent Gaussian structure to remain sparse. Under such conditions, the computational efficiency of INLA is preserved even as the number of observed signals increases. The prior for the batch-level precision parameter was chosen to be weakly informative, following standard recommendations for hierarchical variance components, in order to regularise extreme batch effects without dominating the likelihood. Preliminary experiments indicated that posterior inference was primarily data-driven. A formal prior sensitivity analysis is left for future work, particularly in applications involving real-world datasets. Although alternative reliability distributions such as Weibull or Lognormal are commonly used for lifetime modelling, the Gamma distribution was selected due to its analytical convenience, compatibility with the exponential-family framework, and seamless integration with INLA. A systematic comparison of alternative lifetime likelihoods within the joint modelling framework is an important topic for future work.
A key motivation for this work was the prevalence of right-skewed distributions in reliability engineering [3,29]. Our analysis reinforces the view that Gaussian approximations are unsuitable for such data. The success of the Gamma and Negative Binomial likelihoods in capturing the heavy tails of the failure-time and reset-count distributions illustrates the necessity of using exponential-family models. Model B’s improvement over Model A (on the calibrated subset) specifically highlights the importance of accounting for overdispersion in event counts, a phenomenon frequently observed in bursty failure processes [31].
From a computational perspective, the use of INLA proved to be highly effective. The estimation of complex hierarchical joint models with non-Gaussian likelihoods is traditionally computationally expensive with MCMC methods, often requiring long burn-in periods and careful tuning [10,23]. In our study, INLA provided deterministic, stable, and fast posterior approximations. This efficiency is paramount for Industry 4.0 applications [1,2], where predictive models may need to be retrained frequently as new batches of devices are deployed and new data streams become available. Moreover, INLA enables full Bayesian inference within seconds on standard hardware, whereas equivalent MCMC implementations typically require substantially longer runtimes due to chain convergence, burn-in, and diagnostic procedures. This speed difference makes INLA suitable for frequent re-estimation or near-real-time monitoring in industrial IoT applications, even though explicit timing benchmarks were not the focus of this study.
The present study has several limitations that should be acknowledged. First, the analysis is based on fully synthetic data, which may limit direct generalisation to real-world industrial IoT deployments. The synthetic setup was intentionally chosen to provide full control over the ground truth and to enable rigorous validation of model identifiability and posterior calibration; nevertheless, empirical validation on real industrial datasets remains necessary.
Second, the batch-level latent effects considered in this work are static and do not evolve over time. In real operational settings, degradation processes may exhibit temporal dynamics or progressive deterioration, which are not captured by the current formulation.
Third, the proposed models focus on batch-level heterogeneity and do not explicitly incorporate additional covariates such as environmental conditions, usage intensity, or firmware updates. Incorporating such covariates, as well as extending the framework to dynamic or time-dependent latent structures, constitutes an important direction for future research. In addition, the proposed models do not account for temporal autocorrelation or dynamic degradation effects, which are common in time-series IoT data and will be addressed in future extensions using dynamic latent processes.
The practical implication of our findings for PdM is twofold. First, reliability models must account for production-level heterogeneity. In reality, IoT devices are rarely identical; they suffer from batch-dependent variations in manufacturing quality. A model that ignores this hierarchy risks generating “average” predictions that are useless for detecting specific anomalies. Second, the joint modelling of secondary symptoms (resets) and primary failure metrics (lifetime) significantly enhances diagnostic power. By observing high reset counts early in the device’s life, the shared-effect model can infer a high probability of latent degradation and predict a shortened lifespan before a catastrophic failure occurs. The proposed hierarchical structure naturally generalises to settings with a larger number of batches or deeper hierarchies, such as devices nested within batches and production sites. Due to the sparse GMRF representation underlying INLA, the computational cost scales favourably with the number of latent effects, making the framework suitable for large-scale industrial IoT deployments.
While this study utilises controlled synthetic data to rigorously validate model identifiability and calibration, real-world industrial data may exhibit more complex dependencies. For instance, degradation processes might be time-varying or spatially correlated, scenarios which were not considered here but for which INLA is well suited [12]. Future research will focus on applying the proposed Poisson–Gamma–INLA framework to empirical datasets from deployed sensor networks. Additionally, we aim to explore the integration of temporal autoregressive terms to model the evolution of reset intensity over time, moving from static batch effects to dynamic health monitoring.
5. Conclusions
This paper presented a Bayesian hierarchical framework for modelling skewed reliability data in IoT systems using INLA. We showed that a joint model combining Poisson/Negative Binomial and Gamma likelihoods via shared latent effects effectively captures the correlation between operational anomalies (resets) and device lifetime. In particular, the hierarchical model with shared batch-level effects achieved a substantial improvement in predictive performance, reducing the Deviance Information Criterion (DIC) by more than 67% compared to pooled baseline models and accurately identifying severely defective production batches. The superiority of the hierarchical approach over pooled baselines emphasises the need for batch-aware modelling in industrial predictive maintenance. The results demonstrate that early indications such as elevated reset counts can be probabilistically linked to shortened device lifetimes, enabling timely maintenance actions, targeted quality inspection, and informed replacement or warranty decisions. By leveraging INLA, we demonstrated that rigorous uncertainty quantification and complex non-Gaussian modelling can be achieved with the computational speed required for modern industrial applications. Overall, the proposed framework provides a practically applicable and interpretable tool for data-driven predictive maintenance in heterogeneous IoT systems, bridging the gap between probabilistic reliability modelling and real-world industrial decision-making.