1. Introduction
Predictive maintenance is a critical pillar of operational efficiency in the airline industry, as unexpected aircraft groundings can lead to substantial revenue losses and operational setbacks. Yet high uncertainty in forecasting component failures hampers effective inventory planning, increasing the risk of delays and compounding financial losses. For mixed fleet airlines managing tens of thousands of components, relying on a single predictive model—especially those rooted in conventional methodologies—falls short of capturing the diverse and complex behavior of all parts. Despite advancements, this challenge remains unresolved, highlighting the need for more sophisticated, data-driven approaches to enhance predictive maintenance strategies.
The comprehensive exploration of the advancements and potential of predictive maintenance as a superior alternative to traditional strategies—especially in optimizing the utilization of the remaining useful life (RUL) of components through data-driven methods and AI technologies—is detailed in [
1]. Yan et al. proposed a predictive maintenance framework leveraging Prognostics and Health Management (PHM) data to optimize maintenance costs and improve reliability for aircraft air-conditioning systems [
2]. However, the framework’s focus on specific subsystems limits its scalability to more complex, interconnected systems. Similarly, ref. [
3] explored the use of digital twin technology for predictive maintenance, providing a foundation for more dynamic and precise maintenance routines. Despite its innovation, however, the approach is data-intensive, requiring extensive high-quality operational datasets to realize its full potential. Moreover, ref. [
4] introduced a scheduling framework for predictive maintenance based on RUL prediction using a hybrid CNN-BiLSTM (Convolutional Neural Network and bidirectional long short-term memory) deep learning model, demonstrating accurate RUL estimations for aircraft engines. Nevertheless, the method’s reliance on high-quality labeled datasets and its lack of consideration for data imbalance and real-time processing constraints present notable challenges for broader adoption. Similarly, summarization strategies have been suggested to improve predictive maintenance workflows by reducing data volume and computational demands. However, these strategies often depend on a high level of preprocessing expertise and have a limited ability to generalize to diverse aviation data types, such as sparsely recorded or inconsistent logs and components [
5].
Long short-term memory (LSTM) networks have been employed extensively in predictive maintenance to improve prediction accuracy by utilizing time series maintenance data. Ref. [
6] presents a hybrid framework where an autoencoder-based anomaly detector flags rare aircraft faults, and a bidirectional gated recurrent unit (GRU) network predicts their subsequent occurrence; the model ingests error codes, flight-deck effect reports, and maintenance logs, and its performance is assessed via accuracy, precision, recall, and confusion-matrix metrics. Similarly, ref. [
7] applies a Random Forest classifier to a real aircraft central maintenance system log-based dataset, achieving above seventy percent precision while correctly forecasting over half of the unscheduled aircraft component replacements. Ref. [
8] designed an effective multi-modal LSTM network to preprocess time series data and to predict anomalies in piston engine aircraft. Ref. [
4] proposed an ensemble model combining CNN and bidirectional LSTM with Bayesian optimization for RUL prediction. Ref. [
9] developed a hybrid CNN-LSTM framework for jointly optimizing production and predictive maintenance scheduling that effectively reduced total costs while improving efficiency. Ref. [
10] couples an LSTM model with swarm-based optimization and a Support Vector Machine (SVM) to improve hazard prediction from aircraft communications data. Despite their strong predictive capacity, LSTM-based architectures generally require large volumes of finely sampled, continuous time series data to capture degradation dynamics and mitigate overfitting. As a result, their performance can drop significantly when applied to sparse, irregularly sampled, or purely event-driven maintenance records.
These limitations become even more pronounced in maintenance datasets that contain only event logs—often hundreds of flight hours apart—where the long stretches of “no-event” data give an LSTM little temporal structure to learn and can cause vanishing-gradient problems or over-regularization. As a result, recent research has shifted toward representation-learning and data-augmentation techniques that can extract robust health indicators from sparse streams and supplement scarce failure cases before any sequence model is applied. Ref. [
11] leveraged a deep autoencoder to derive a latent space health indicator for accurate RUL prediction. Ref. [
12] presents a transfer-learning framework that employs consensus self-organizing models to improve predictions of equipment remaining useful life. Ref. [
13] provides a domain-agnostic survey of data-scarcity remedies and explicitly singles out Generative Adversarial Networks (GANs) as a principal strategy for synthesizing additional training samples, arguing that GAN-based augmentation can mitigate the limitations of small, imbalanced, or non-generalizable datasets. To estimate the RUL of aircraft auxiliary power units, ref. [
14] uses routine auxiliary power unit (APU) monitoring data to train Random Forest models of normal behavior, rates each unit’s health by how far it drifts from the best model, and then applies a Bayesian time series method to predict the RUL. Ref. [
15] proposes a hybrid method that generates cautious point estimates of remaining useful life and translates them into a precise schedule for predictive maintenance. Even though these approaches are applied to predictive maintenance and RUL estimation, they often do not account for the challenges that arise when maintenance event logs are used and there is data imbalance across multiple components.
When maintenance datasets primarily consist of event logs, survival analysis provides a principled framework to address the associated data sparsity and imbalance across components. This approach enables the estimation of failure probabilities and RUL without requiring dense temporal measurements. Survival analysis has gained traction in predictive maintenance for aviation, providing crucial insights into the RUL of aircraft components and enabling proactive maintenance scheduling. Ref. [
16] introduced DeepHit, a deep learning-based survival model that directly learns the distribution of survival times and competing risks, reducing the need for the strong parametric assumptions typical of traditional approaches. Ref. [
17] adopts a survival analysis approach, fitting a Cox proportional hazards model to estimate the survival curve while treating occurrences of other faults or events as explanatory features. Ref. [
18] demonstrates that, for log data that fit a Weibull pattern, event-driven predictive maintenance can be clearly interpreted. Ref. [
3] combined probability-based survival models with digital twin technology to enhance maintenance strategies, enabling crews to forecast support needs, record part changes in real time, and keep each plane’s configuration up to date. Collectively, these studies underscore the utility of survival analysis techniques for aviation predictive maintenance, while also highlighting unresolved challenges such as data sparsity and the need for more adaptable models.
In this study, we propose a latent space classification framework that utilizes a shared encoder backbone trained on the dataset to address the challenge of data imbalance across part numbers (PNs). By learning global feature representation through an autoencoder, the model effectively transfers knowledge from high-data components to those with sparse maintenance histories, enabling reliable classification performance even in low-sample regimes. The study is based on datasets sourced from an airline MRO company operating with a fleet of over 500 aircraft, offering a realistic and scalable setting for fleet-wide maintenance analysis. This approach is compared against DeepHit, a survival analysis model designed for time-to-event prediction, allowing for a systematic evaluation of probabilistic versus discriminative modeling strategies in the context of predictive maintenance. Through extensive experiments across varying prediction horizons, we analyze the sensitivity, robustness, and practical utility of each method for fleet-wide maintenance planning and inventory decision support under sparse and irregular observational data.
The structure of this paper is as follows. In
Section 2, the data preparation process is described, covering data sources, preprocessing techniques, and feature selection methods used to improve model performance. In
Section 3, we focus on the maintenance event prediction model, describing the methodology, machine learning techniques used, and the model architecture. Finally, in
Section 4, the results and validation are provided, where the predictive accuracy of the model is evaluated and its implications for the management of aircraft maintenance inventory are discussed.
2. Data Preparation
The dataset in this study includes aircraft maintenance data collected over ten years, from 2014 to 2024. The data, sourced from two comprehensive datasets, provides valuable insights into the lifecycle of aircraft components, covering both their installation and subsequent maintenance activities. These databases are sourced from an airline MRO company that operates under a large airline with a fleet of over 500 aircraft.
The first dataset focuses on installed components and contains detailed records of parts installed on aircraft during the operation period. This dataset includes information on part numbers, serial numbers, the specific aircraft on which components were installed, and the dates of installation. It also captures details such as the flight hours and cycles logged by each component since its installation, the initial condition of the parts, and the age of both the aircraft and the components at the time of installation. Specifically, the fault-free operation interval for each component is directly quantified by the SN_FH and SN_FC metrics, which record the continuously accumulated usage from installation to the maintenance event. In addition, projected flight-count and flight-hour values three months ahead are provided for use in the estimation study, in line with the planning strategies. The mapping of calendar-based prediction horizons to operational metrics was established using the projected monthly utilization rates derived from the airline’s flight schedules. In
Figure 1, the real installation data is shown.
The second dataset includes maintenance logs, documenting activities carried out on the aircraft over the period. This dataset provides a rich account of maintenance transactions, including the types of maintenance performed, the components involved, and the brief reasons behind the interventions. Key information includes part and serial numbers, sub-categories of the components, aircraft identifiers, and the dates of maintenance actions. Additionally, it records whether the maintenance was scheduled or unscheduled, the condition of the components at the time of maintenance, and flight data such as hours and cycles linked to specific components. Age metrics for both aircraft and components are also provided, along with the total flight counts and flight hours for each part number. The real maintenance data is shown in
Figure 2.
To construct a unified event-level dataset suitable for predictive modeling, two complementary sources were merged: one containing records of currently installed components and another comprising historical maintenance logs. Each entry was labeled as either INSTALLED or MAINTENANCE to distinguish between components that are in service and those removed or intervened upon. Operational attributes were extracted to capture distinct dimensions of component life: flight hours (SN_FH) and flight cycles (SN_FC) were selected to quantify operational exposure, while component age (SN_AGE) and aircraft age (AC_AGE) were included to account for calendar-based aging effects that progress independently of flight operations.
Since maintenance records are temporally sparse and irregular, we avoid using sequential models such as LSTMs or recurrent neural networks (RNNs), which typically require dense and consistent time series data. Instead, we adopt a first-order Markovian abstraction, assuming that the current feature vector $\mathbf{x}_t$ acts as a sufficient statistic for the system’s degradation history. This assumption allows the probability of a maintenance event $y_t$ to be approximated solely from the instantaneous state, rather than conditioning on the entire historical trajectory:

$$P\big(y_t = 1 \mid \mathbf{x}_t, \mathbf{x}_{t-1}, \ldots, \mathbf{x}_0\big) \approx P\big(y_t = 1 \mid \mathbf{x}_t\big).$$

This memoryless formulation is valid because $\mathbf{x}_t$ is explicitly engineered to include cumulative integrals of operational stress—such as SN_FH and SN_FC. By embedding these cumulative metrics into the input vector, the historical usage trajectory is effectively encoded into the current state representation, enabling the use of static classifiers without sacrificing temporal context.
By structuring the input data under this assumption, we aim to preserve temporal relevance while maintaining compatibility with static models. This abstraction facilitates learning from maintenance histories without requiring fully sequential modeling, which is often infeasible in aviation datasets due to missing records, asynchronous updates, and inconsistent sampling frequencies. To capture component-level usage history, cumulative metrics—TOTAL_SN_FH and TOTAL_SN_FC—were computed by aggregating previous flight hours and cycles associated with the same serial number. These cumulative metrics quantify the component’s total operational usage over its lifespan. Additionally, the number of prior maintenance events was encoded using the NO_OF_PREV_MAINTENANCE feature, offering insight into individual part reliability over time by capturing the recurrence frequency of failures.
Following integration, variables representing the operational context were processed to capture distinct risk factors. AC_TYPE accounts for utilization differences across fleets, while CONDITION (e.g., new vs. repaired) represents the component’s baseline reliability profile upon installation. Subsequently, these categorical variables were transformed into binary representations using one-hot encoding. Continuous features were scaled to the range [0, 1] via min-max normalization to ensure numerical stability across models. The resulting dataset comprises both current operational states and enriched maintenance history, forming a structured basis for training latent space classifiers and survival analysis models under sparse and imbalanced conditions.
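As an illustration of this preparation pipeline, the sketch below applies the same steps—cumulative usage aggregation per serial number, one-hot encoding, and min-max scaling—to a toy set of records. The column names follow the paper’s schema, but the values and the exact implementation details are illustrative assumptions, not the authors’ code.

```python
import pandas as pd

# Toy event-level records; column names follow the paper's schema,
# values are illustrative only.
df = pd.DataFrame({
    "SN":        ["A1", "A1", "B2"],
    "SN_FH":     [1200.0, 800.0, 300.0],
    "SN_FC":     [400, 250, 90],
    "AC_TYPE":   ["A320", "A320", "B737"],
    "CONDITION": ["NEW", "REPAIRED", "NEW"],
})

# Cumulative usage accumulated by the same serial number *before*
# the current event, and the count of prior maintenance events.
df["TOTAL_SN_FH"] = df.groupby("SN")["SN_FH"].cumsum() - df["SN_FH"]
df["TOTAL_SN_FC"] = df.groupby("SN")["SN_FC"].cumsum() - df["SN_FC"]
df["NO_OF_PREV_MAINT"] = df.groupby("SN").cumcount()

# One-hot encode categorical context variables.
df = pd.get_dummies(df, columns=["AC_TYPE", "CONDITION"])

# Min-max scale continuous features to [0, 1].
for col in ["SN_FH", "SN_FC", "TOTAL_SN_FH", "TOTAL_SN_FC"]:
    lo, hi = df[col].min(), df[col].max()
    df[col] = (df[col] - lo) / (hi - lo) if hi > lo else 0.0
```

In a real pipeline the scaling parameters would be fitted on the training split only and reused for the test split.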
Table 1 provides a detailed breakdown of the data fields available in the unified dataset, categorizing them by their origin and utility in the modeling process. The table distinguishes between raw operational metrics and those derived through feature engineering, such as cumulative usage statistics (
TOTAL_SN_FH,
TOTAL_SN_FC) and previous maintenance numbers (
NO_OF_PREV_MAINT). The final column explicitly identifies the subset of features selected as inputs for the predictive models. While unique identifiers (
PN,
SN) remain in the dataset for tracking and validation, they are excluded from the training feature vector to prevent overfitting, ensuring the model generalizes based on operational behavior rather than specific identity tags. Similarly, fields such as
ATA_CHAPTER,
AC,
LONGITUDE/LATITUDE,
DATE, and
REASON_CATEGORY are listed to reflect the original data structure but are excluded from training as they do not contribute to the learning process. Collectively, these selected features construct a robust operational profile, enabling the model to correlate specific usage patterns and installation attributes with the likelihood of maintenance events.
Figure 3 illustrates the skewed distribution of available maintenance data across various part numbers. While some PNs exhibit a large number of maintenance records, many have relatively few observations. Such data scarcity can impede the effectiveness of predictive maintenance algorithms, as these models rely on sufficiently large and diverse training examples to learn robust patterns. In extreme cases where a PN has fewer than 100 maintenance records, the lack of data hinders the model’s ability to capture important failure modes or maintenance needs, ultimately degrading its overall performance and generalizability.
The final dataset comprises a total of 35,005 records collected from a fleet of over 500 aircraft between 2014 and 2024, covering 80 distinct PNs. While the aggregate dataset shows a balanced distribution with maintenance events constituting approximately 50.92% of the total observations, a significant data imbalance exists across varying part numbers, as illustrated in
Figure 3. Regarding data quality, missing values in the
SN_AGE field (observed in 57.84% of records) were handled via mean imputation, while missing entries in the
CONDITION field (4.63%) were encoded as a distinct “No_Condition” category to preserve information. No missing values were observed in the remaining fields. As listed in
Table 1, continuous features (e.g.,
SN_FH,
SN_FC) represent cumulative usage metrics and were normalized to the range [0, 1] for model stability.
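The missing-value treatment described above can be sketched as follows; the field names follow the paper’s schema, while the records and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

records = pd.DataFrame({
    "SN_AGE":    [5.0, np.nan, 9.0, np.nan],    # years; gaps to impute
    "CONDITION": ["NEW", None, "REPAIRED", None],
})

# Mean imputation for the numeric SN_AGE field.
records["SN_AGE"] = records["SN_AGE"].fillna(records["SN_AGE"].mean())

# Missing CONDITION entries become an explicit category,
# preserving "missingness" as information for the model.
records["CONDITION"] = records["CONDITION"].fillna("No_Condition")
```

Encoding missingness as its own category (rather than dropping or imputing it) lets the classifier learn whether the absence of a condition record is itself predictive.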
3. Maintenance Event Prediction
This section details the methodological frameworks employed for the prediction of aircraft maintenance events, a critical component in optimizing fleet availability and supply chain management. We investigate two distinct predictive paradigms: a probabilistic survival analysis approach and a hybrid classification-based approach. First, we introduce the application of DeepHit, a deep neural network tailored for time-to-event prediction, which estimates the probability of component survival over continuous flight hours to derive actionable risk thresholds. Subsequently, we present a proposed hybrid architecture that leverages an autoencoder for dimensionality reduction coupled with traditional machine learning classifiers—specifically Random Forest, K-Nearest Neighbors, and Decision Trees—to categorize maintenance needs within a compact latent feature space.
To provide a comprehensive operational overview, the proposed framework proceeds in a structured workflow. Initially, raw data from maintenance logs and installation records are merged and preprocessed, involving the engineering of cumulative usage metrics—such as total flight hours and cycles—and feature normalization. Following data preparation, the methodology applies two parallel modeling strategies. The first utilizes the DeepHit network to learn probability distributions for survival analysis directly from the processed data. The second path implements a hybrid latent space classification strategy, where an autoencoder first compresses high-dimensional inputs into a lower-dimensional latent representation. These learned latent features are then utilized to train Random Forest, K-Nearest Neighbors, and Decision Tree classifiers to predict maintenance events. Finally, the models are evaluated across varying prediction horizons (3-month, 6-month, and 1-year) to assess their utility for fleet-wide inventory planning.
3.1. Maintenance Prediction with Survival Analysis: DeepHit
DeepHit is a deep learning model designed to address challenges in time-to-event prediction, particularly in multi-risk scenarios with data imbalance [
16]. Unlike traditional survival analysis methods, it learns the joint probability distribution of event times and outcomes directly from data, making it robust and flexible for predictive maintenance tasks in aviation. Its architecture combines a shared sub-network that captures common latent features with risk-specific sub-networks that model each competing risk separately. This structure enables DeepHit to predict the likelihood of various risks occurring at specific time points, with a softmax layer ensuring normalized and interpretable outputs.
DeepHit’s training process is centered on a composite loss function that effectively handles partially observed data. The primary component is the log-likelihood loss, $\mathcal{L}_1$, which ensures that the model accurately predicts observed event times and types while accommodating instances with incomplete information. This loss is defined in [16] as

$$\mathcal{L}_1 = -\sum_{i=1}^{N} \Big[ \mathbb{1}\{k^{(i)} \neq \varnothing\} \log y^{(i)}_{k^{(i)},\, s^{(i)}} + \mathbb{1}\{k^{(i)} = \varnothing\} \log \Big( 1 - \sum_{k=1}^{K} \hat{F}_k\big(s^{(i)} \mid \mathbf{x}^{(i)}\big) \Big) \Big],$$

where $k^{(i)}$ denotes the specific event type observed for the $i$-th data point, while $k^{(i)} = \varnothing$ indicates that the observation is censored. $\mathbb{1}\{\cdot\}$ is the indicator function, $y^{(i)}_{k,t}$ is the predicted probability of event $k$ occurring at time $s^{(i)}$, and $\hat{F}_k$ represents the cumulative incidence function for event $k$.

To refine the discrimination capability, a ranking loss $\mathcal{L}_2$ is incorporated to penalize the incorrect ordering of risk predictions between pairs of aircraft components. This is expressed as

$$\mathcal{L}_2 = \sum_{k=1}^{K} \alpha_k \sum_{i \neq j} A_{k,i,j}\, \eta\Big(\hat{F}_k\big(s^{(i)} \mid \mathbf{x}^{(i)}\big),\, \hat{F}_k\big(s^{(i)} \mid \mathbf{x}^{(j)}\big)\Big),$$

where $A_{k,i,j}$ is defined as $\mathbb{1}\{k^{(i)} = k,\; s^{(i)} < s^{(j)}\}$, an indicator function that selects valid pairs for comparison where subject $i$ experiences an event earlier than subject $j$, and $\eta(x, y) = \exp\big(-(x - y)/\sigma\big)$ is a convex loss function quantifying the concordance error with a scaling parameter $\sigma$. The parameter $\sigma$ acts as a scaling hyperparameter for the ranking loss function, controlling the steepness of the penalty for incorrect risk orderings. The total loss combines these objectives, balanced by a hyperparameter $\beta$:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_1 + \beta\, \mathcal{L}_2.$$
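The composite objective can be illustrated with a small discrete-time sketch. The function below is a simplified, assumed implementation (the names `pmf`, `beta`, and `sigma` are ours, and the exponential form of $\eta$ follows the original DeepHit paper); it is not the authors’ code, and it omits the per-risk weights $\alpha_k$ for brevity.

```python
import numpy as np

def deephit_loss(pmf, event, time, beta=0.5, sigma=0.1):
    """Composite DeepHit-style loss on a discrete time grid.

    pmf   : (N, K, T) predicted joint probabilities over K risks, T time bins
    event : (N,) observed event type in {0..K-1}, or -1 if censored
    time  : (N,) index of the observed event / censoring time bin
    """
    N, K, T = pmf.shape
    cif = np.cumsum(pmf, axis=2)           # cumulative incidence per risk

    # L1: log-likelihood for observed events and censored records.
    l1 = 0.0
    for i in range(N):
        if event[i] >= 0:                  # event observed at time[i]
            l1 -= np.log(pmf[i, event[i], time[i]] + 1e-8)
        else:                              # censored: survives past time[i]
            l1 -= np.log(1.0 - cif[i, :, time[i]].sum() + 1e-8)

    # L2: ranking loss over acceptable pairs (i fails before j, for risk k).
    l2 = 0.0
    for k in range(K):
        for i in range(N):
            if event[i] != k:
                continue
            for j in range(N):
                if time[i] < time[j]:      # valid comparison pair
                    diff = cif[i, k, time[i]] - cif[j, k, time[i]]
                    l2 += np.exp(-diff / sigma)

    return l1 + beta * l2
```

A well-calibrated model concentrates probability mass on the true event bin for observed failures and keeps the cumulative incidence low for censored units, which drives both terms down.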
The architecture employed in this study, illustrated in
Figure 4, is a deep neural network specifically adapted for survival analysis with competing risks. The model features a multi-task architecture comprising a shared sub-network and $K$ parallel cause-specific sub-networks (corresponding to the $K$ maintenance reasons). The input covariates $\mathbf{x}$ are first processed by the shared sub-network to capture latent representations $\mathbf{z}$ common to all event types. To preserve original feature information while leveraging these learned representations, a residual connection concatenates the original covariates with the shared output to form the vector $(\mathbf{x}, \mathbf{z})$. This vector serves as the input for the subsequent cause-specific sub-networks.
DeepHit produces a set of survival functions for each aircraft part, predicting the probability of the part remaining installed (i.e., not undergoing maintenance) at each specific flight hour. For each part $j$, the model outputs a survival function $S_j(t)$ over discrete flight hours $t \in \{0, 1, \ldots, T_{\max}\}$, where $T_{\max}$ is the maximum considered flight hour. The survival function $S_j(t)$ represents the probability that part $j$ has not undergone maintenance up to flight hour $t$. The maintenance probability (cumulative failure probability) at time $t$, denoted here as $F_j(t)$, is the complement of the survival function:

$$F_j(t) = 1 - S_j(t).$$

This relationship ensures that $F_j(t)$ represents the probability that part $j$ has undergone maintenance by or at flight hour $t$.
To predict the overall maintenance need, a threshold-based approach is applied to determine specific maintenance requirements for individual parts. If there are $N$ parts under consideration, the maintenance likelihood for each part $j$ at flight hour $t$ is $F_j(t)$. A maintenance event is flagged only when this probability exceeds a predefined threshold $\tau$. The binary decision indicator, $\delta_j(t)$, for each part $j$ at time $t$ is defined as

$$\delta_j(t) = \begin{cases} 1, & \text{if } F_j(t) \geq \tau, \\ 0, & \text{otherwise}, \end{cases}$$

where $\tau$ is a tunable probability threshold (e.g., $\tau = 0.5$). Consequently, the total count of predicted maintenance events across all parts at flight hour $t$, denoted as $M(t)$, is computed by summing the active decision indicators:

$$M(t) = \sum_{j=1}^{N} \delta_j(t).$$

This formulation ensures that only components with a sufficiently high likelihood of failure are included in the forecast, effectively filtering out false positives arising from low-confidence predictions. By varying the threshold $\tau$, different maintenance strategies can be simulated, allowing operators to balance between early interventions (low $\tau$) and conservative maintenance policies (high $\tau$).
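This thresholding rule reduces to a few lines of array arithmetic. The sketch below (with illustrative survival values and our own function name) converts survival curves into per-flight-hour maintenance counts:

```python
import numpy as np

def maintenance_forecast(survival, tau=0.5):
    """Convert survival curves into maintenance counts per flight hour.

    survival : (N, T) array, survival[j, t] = S_j(t) for part j
    tau      : probability threshold applied to F_j(t) = 1 - S_j(t)
    Returns (delta, M): binary decision indicators and per-hour event counts.
    """
    failure = 1.0 - survival            # F_j(t) = 1 - S_j(t)
    delta = (failure >= tau).astype(int)
    return delta, delta.sum(axis=0)     # M(t) = sum over parts j of delta_j(t)

# Three parts observed over four flight-hour bins (illustrative values).
S = np.array([
    [1.0, 0.90, 0.6, 0.30],
    [1.0, 0.80, 0.4, 0.20],
    [1.0, 0.95, 0.9, 0.85],
])
delta, M = maintenance_forecast(S, tau=0.5)   # M -> [0, 0, 1, 2]
```

Sweeping `tau` over a grid of values reproduces the trade-off described above between early intervention and conservative policies.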
3.2. Proposed Method: Maintenance Prediction with Autoencoder and Latent Space Classifier
A unified model capable of handling diverse part numbers with varying data availability is essential for ensuring robust predictive maintenance. However, maintenance events are rare, and for certain part numbers, the number of recorded observations is limited. We adopt a hybrid approach, where a deep learning-based autoencoder is employed as a backbone feature extractor, and machine learning classifiers are used for final classification within the latent space. This strategy allows us to leverage the representational power of neural networks while ensuring effective learning with limited data using traditional classifiers.
The autoencoder first transforms high-dimensional input data into a lower-dimensional latent representation, capturing essential feature structures while eliminating redundant information. The learned latent space serves as an effective feature space for classification, where three machine learning algorithms are evaluated: (i) K-Nearest Neighbors, (ii) Decision Trees, and (iii) Random Forest. These algorithms were selected for their ability to work efficiently with moderate-sized datasets, unlike deep learning models that require extensive training data. This approach enables the model to adapt to different part numbers, ensuring robust classification even when maintenance records are sparse.
3.2.1. Autoencoder Architecture
In this section, we describe the autoencoder architecture, which transforms high-dimensional input data into a compact latent space while preserving essential features.
The autoencoder model used in this study consists of three main components, an encoder, a latent space, and a decoder, as illustrated in
Figure 5, enabling dimensionality reduction and reconstruction of the input data while preserving essential information and eliminating redundant features. The encoder transforms the high-dimensional input into a compact latent representation, capturing the core attribute features of the data and effectively reducing its dimensionality. The latent space serves as the compressed representation of the input data, retaining the critical features necessary for reconstruction and acting as an effective feature space for downstream tasks such as classification. The decoder reconstructs the original input from the latent representation, aiming to closely replicate the input while minimizing the reconstruction error. Training of the autoencoder focuses on reducing this reconstruction error, ensuring that the model captures the intrinsic structure of the input data in the latent space. These latent features are then utilized to train machine learning classifiers to effectively predict component maintenance schedules.
The autoencoder is trained using a mean squared error (MSE) loss function, which quantifies the reconstruction error between input data $\mathbf{x}_i$ and its reconstructed output $\hat{\mathbf{x}}_i$:

$$\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \big\| \mathbf{x}_i - \hat{\mathbf{x}}_i \big\|^2.$$
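As a minimal numeric sketch, the reconstruction loss can be computed as below; averaging the squared norm per sample (rather than per element) is our assumed convention, and the inputs are illustrative:

```python
import numpy as np

def mse_reconstruction_loss(x, x_hat):
    """Mean over N samples of the squared reconstruction error ||x_i - x_hat_i||^2."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

x     = np.array([[0.2, 0.8], [0.5, 0.5]])
x_hat = np.array([[0.1, 0.9], [0.5, 0.5]])
loss = mse_reconstruction_loss(x, x_hat)   # ((0.1**2 + 0.1**2) + 0) / 2 = 0.01
```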
3.2.2. Latent Space Classifiers
In this section, we introduce the machine learning classifiers applied to this latent space and justify their selection for handling limited data (
Figure 6). Three different machine learning algorithms—K-Nearest Neighbors (KNNs), Decision Trees, and Random Forest—are employed for classification in the latent space. These models are selected based on their ability to learn effectively from limited data, making them suitable for scenarios where the number of available samples for certain part numbers is relatively small. Unlike deep learning models, which require large-scale datasets to generalize well and avoid overfitting, traditional machine learning algorithms can efficiently extract patterns from moderate-sized datasets without extensive hyperparameter tuning or computational demands.
KNN is chosen due to its instance-based learning approach, which relies on local neighborhood relationships to make predictions. This property allows it to adapt well to variations in feature distributions within the latent space. Decision Trees, on the other hand, provide an interpretable structure that recursively partitions the feature space to maximize information gain at each step. This method is particularly useful for identifying key decision boundaries within the reduced latent representation. Finally, Random Forest, an ensemble-based technique composed of multiple Decision Trees, enhances classification robustness by aggregating predictions across multiple trees to reduce variance and improve generalization. These models collectively offer a balance between interpretability, computational efficiency, and adaptability to limited data, making them an appropriate choice for predictive maintenance tasks in aviation [
19].
The Random Forest classifier, an ensemble of $T$ Decision Trees, is trained on bootstrap samples of the data, with each tree $t$ producing a prediction $\hat{y}_t(\mathbf{z})$. The final prediction is obtained through majority voting [20]:

$$\hat{y} = \operatorname{mode}\big\{ \hat{y}_1(\mathbf{z}),\, \hat{y}_2(\mathbf{z}),\, \ldots,\, \hat{y}_T(\mathbf{z}) \big\}.$$

The model also calculates feature importance $I_k$ for each feature $k$, based on the reduction in impurity $\Delta i(s)$ across all splits involving $k$:

$$I_k = \frac{1}{T} \sum_{t=1}^{T} \sum_{s \in S_{t,k}} \Delta i(s),$$

where $S_{t,k}$ is the set of splits on feature $k$ in tree $t$.
The K-Nearest Neighbors classifier assigns a label to each instance based on the majority class among its $k$ Nearest Neighbors in the feature space. Given an input $\mathbf{z}$, the predicted class $\hat{y}$ is [21]

$$\hat{y} = \arg\max_{c} \sum_{\mathbf{z}_i \in \mathcal{N}_k(\mathbf{z})} \mathbb{1}\{ y_i = c \},$$

where $\mathcal{N}_k(\mathbf{z})$ represents the set of $k$ Nearest Neighbors of $\mathbf{z}$.
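This majority-vote rule can be sketched in a few lines; the toy 2-D latent vectors, labels, and the Euclidean distance metric below are illustrative assumptions:

```python
import numpy as np

def knn_predict(z, Z_train, y_train, k=3):
    """Majority vote among the k nearest training points in latent space."""
    dists = np.linalg.norm(Z_train - z, axis=1)      # Euclidean distances
    neighbors = y_train[np.argsort(dists)[:k]]       # labels of k nearest
    classes, counts = np.unique(neighbors, return_counts=True)
    return classes[np.argmax(counts)]                # argmax over class votes

# Toy 2-D latent vectors labeled 0 = no maintenance, 1 = maintenance.
Z = np.array([[0.1, 0.1], [0.2, 0.0], [0.9, 0.8], [1.0, 1.0], [0.85, 0.9]])
y = np.array([0, 0, 1, 1, 1])
pred = knn_predict(np.array([0.9, 0.9]), Z, y, k=3)   # all 3 neighbors are class 1
```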
Decision Trees partition the latent space into a hierarchy of decision nodes. Each split is determined by finding the feature $k^{\ast}$ that maximizes the reduction in impurity $\Delta I$ [22]:

$$\Delta I = I(D) - \frac{|D_L|}{|D|}\, I(D_L) - \frac{|D_R|}{|D|}\, I(D_R),$$

where $D$ is the dataset at the current node, and $D_L$ and $D_R$ are the left and right child nodes, respectively.
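The impurity-reduction criterion can be illustrated with Gini impurity, one common choice (the text does not specify which impurity measure is used); the labels and split masks below are illustrative:

```python
import numpy as np

def gini(labels):
    """Gini impurity I(D) of a label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_reduction(labels, left_mask):
    """Delta I for splitting D into D_L (mask True) and D_R (mask False)."""
    D_L, D_R = labels[left_mask], labels[~left_mask]
    n = len(labels)
    return gini(labels) - len(D_L) / n * gini(D_L) - len(D_R) / n * gini(D_R)

y = np.array([0, 0, 1, 1])
# A split that separates the classes perfectly vs. one that does not.
perfect = impurity_reduction(y, np.array([True, True, False, False]))
useless = impurity_reduction(y, np.array([True, False, True, False]))
```

A tree builder scans candidate features and thresholds and keeps the split with the largest reduction, here `perfect` (0.5) over `useless` (0.0).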
Model evaluation is performed across different temporal horizons, including one-month and three-month prediction periods. For each test instance, the encoder maps the input features into the latent space, which is then passed to the classifiers for prediction.
The proposed latent space representation, combined with the aforementioned classifiers, provides a robust framework for predictive maintenance. The backbone model effectively captures and encodes critical features across the entire dataset, mapping high-dimensional input data into a compact latent space. By leveraging this learned representation, the machine learning classifiers operate on a feature space that preserves essential information while reducing noise and redundancy. This approach enhances classification performance, even in scenarios with limited data availability, as the latent space enables more efficient learning of underlying patterns. Consequently, the integration of deep feature extraction with traditional machine learning classifiers improves predictive maintenance outcomes by ensuring more reliable and generalizable decision-making.
4. Data-Driven Validations and Results
To evaluate the proposed framework against a rigorous baseline, we selected DeepHit, a prominent deep learning approach in survival analysis. While traditional methods such as the Cox proportional hazards (CPH) model and Random Survival Forests (RSF) are common baselines, DeepHit has demonstrated superior performance in capturing non-linear relationships and handling complex, time-varying covariates in multi-risk scenarios. Its architecture, combining a shared sub-network with risk-specific sub-networks, allows it to learn joint probability distributions directly from data, making it robust for predictive maintenance.
4.1. Experimental Setup
The DeepHit model utilizes a multi-task neural network architecture configured with a shared sub-network to extract common latent features, feeding into distinct cause-specific sub-networks. Each sub-network comprises two fully connected layers of 64 neurons. Batch normalization was applied to the outputs of these dense layers to mitigate internal covariate shift, immediately preceding the non-linear tanh activations. Training was optimized using the Adam optimizer with a learning rate of for 200 epochs, utilizing a batch size of 128, minimizing a composite loss of log-likelihood and ranking constraints. To quantify maintenance predictions, we used multiple probability thresholds to convert the predicted cumulative incidence functions (CIFs) into discrete maintenance events.
The proposed autoencoder-based latent space classifiers were trained using the configuration detailed in Section 3.2. In the proposed method, the encoder consists of three fully connected layers with progressively decreasing dimensions: 64, 32, and 16 neurons, respectively. We conducted an ablation study on the latent dimension size (testing sizes of 8, 16, 32, and 64); the selected dimension offered the optimal trade-off between reconstruction fidelity (MSE loss) and classification separability. To prevent overfitting and stabilize learning, batch normalization is applied after the first two layers. Additionally, a weight regularization term with a gain factor is introduced to constrain weight magnitudes and improve generalization. The decoder mirrors the encoder's structure, reconstructing the original input by successively expanding the feature space through layers of 16, 32, and 64 neurons before the final output layer, which employs a sigmoid activation function. This final activation ensures that reconstructed values remain within a normalized range, making the model particularly effective for datasets with bounded feature distributions. The model was trained for 200 epochs using the Adam optimizer and a batch size of 256. The hyperparameters for the latent space classifiers were selected based on extensive empirical analysis to maximize performance, and the results presented correspond to the best-performing configurations. Specifically, for the K-Nearest Neighbors classifier, the number of neighbors was tuned as part of this analysis. For the Decision Tree classifier, the maximum depth was set to 100. Additionally, for the Random Forest classifier, the number of estimators was set to 200 to ensure robust ensemble learning.
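As a rough, library-level analogue of the configuration above, the sketch below trains scikit-learn's `MLPRegressor` to reconstruct its own input with a 64-32-16-32-64 hidden stack, where `alpha` plays the role of the weight regularization gain. Batch normalization and a sigmoid output layer are not expressible in this API, and the data is synthetic, so this approximates rather than reproduces the described model.

```python
# Autoencoder analogue: an MLP trained to reconstruct its input through a
# 16-dimensional bottleneck. The architecture mirrors the 64-32-16-32-64
# stack from the text; batch norm and the sigmoid output are not available
# in this API, so this is an approximation on synthetic data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# synthetic bounded features with low-rank structure, so compression is possible
Z = rng.normal(size=(500, 8))
W = rng.normal(size=(8, 64)) * 0.5
X = 1.0 / (1.0 + np.exp(-(Z @ W)))            # values in (0, 1)

ae = MLPRegressor(
    hidden_layer_sizes=(64, 32, 16, 32, 64),  # encoder -> bottleneck -> decoder
    alpha=1e-4,                # L2 penalty: the analogue of the gain factor
    batch_size=256,
    max_iter=200,              # 200 epochs, matching the text
    random_state=0,
)
ae.fit(X, X)                   # autoencoding objective: reconstruct the input
mse = float(np.mean((ae.predict(X) - X) ** 2))
print(f"reconstruction MSE: {mse:.4f}")
```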
For all methods, an 80–20% train–test split was used. The data splitting process employed a stratified sampling strategy based on part numbers, ensuring that every component type is proportionally represented in both the training and testing sets and preventing scenarios where data-sparse components are isolated solely in the test set.
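The part-number-stratified split can be sketched as follows; the part numbers, class proportions, and feature array are illustrative stand-ins.

```python
# Stratified 80-20 split keyed on part number, so that even rare part types
# appear in both the training and testing sets. All data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
part_number = rng.choice(["PN-A", "PN-B", "PN-C"], size=200, p=[0.6, 0.3, 0.1])
features = rng.normal(size=(200, 5))

X_tr, X_te, pn_tr, pn_te = train_test_split(
    features, part_number, test_size=0.2, stratify=part_number, random_state=0
)
# every part number is represented in both splits
print(sorted(set(pn_tr)), sorted(set(pn_te)))
```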
4.2. Classifier Performance on the Test Set
Figure 7 provides a side-by-side comparison of different classification strategies on the test set. In these boxplots, the interquartile ranges represent the spread of average results for each part number, illustrating how performance varies across different PNs. Since the primary utility of the proposed framework is maintenance prediction to support binary inventory decisions (stock or do not stock) at fixed horizons, we prioritized the classification metrics F1, accuracy, precision, and recall over rank-based survival metrics (e.g., the C-index). Converting standard classifier outputs to continuous risk scores for such metrics was avoided to prevent misleading comparisons arising from uncalibrated probability estimates.
The F1 scores and accuracies highlight the overall effectiveness of each model in correctly identifying the positive class, while precision and recall give deeper insights into trade-offs between false alarms and missed detections. Here, KNN, Decision Tree, and Random Forest show relatively high median values with tight interquartile ranges for most metrics, reflecting more consistent performance. In contrast, the DeepHit classifiers exhibit varying results depending on the chosen probability threshold. Even though a threshold of 0.10 is optimal when maximizing the F1 score, we present results across a range of thresholds to explicitly demonstrate the behavioral shifts in performance: lower thresholds (e.g., 0.10) typically increase sensitivity but can reduce precision, whereas higher thresholds (e.g., 0.50) lower the false alarm rate at the cost of missing more positive cases.
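The per-part-number metric distributions summarized in the boxplots can be computed along these lines; the labels and predictions below are synthetic stand-ins, and each per-PN score dictionary would correspond to one point in a box.

```python
# Per-part-number evaluation: each metric is computed separately for every
# PN, and the resulting per-PN values form the distribution shown in a
# boxplot. Labels and predictions are synthetic (80% agreement by design).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

rng = np.random.default_rng(0)
pns = rng.choice(["PN-A", "PN-B", "PN-C"], size=300)
y_true = rng.integers(0, 2, size=300)
y_pred = np.where(rng.uniform(size=300) < 0.8, y_true, 1 - y_true)

per_pn = {}
for pn in np.unique(pns):
    m = pns == pn
    per_pn[pn] = {
        "f1": f1_score(y_true[m], y_pred[m]),
        "accuracy": accuracy_score(y_true[m], y_pred[m]),
        "precision": precision_score(y_true[m], y_pred[m]),
        "recall": recall_score(y_true[m], y_pred[m]),
    }
for pn, scores in per_pn.items():
    print(pn, {k: round(v, 2) for k, v in scores.items()})
```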
Figure 8, Figure 9 and Figure 10 illustrate the predicted versus actual maintenance decisions for 80 parts using three latent space classifiers: K-Nearest Neighbors, Decision Tree, and Random Forest. In each subplot, the classifiers aim to distinguish components that require maintenance in the test set. Overall, all three models demonstrate a strong alignment between predicted and actual outcomes, with Random Forest outperforming the others in terms of consistency and accuracy. This visual comparison reinforces the suitability of traditional classifiers in leveraging learned latent representations for effective binary classification, especially when part-level features are compactly encoded and data is limited.
Figure 11, Figure 12 and Figure 13 show the predictive results of the DeepHit model applied with different probability thresholds (0.10, 0.25, and 0.50). The aggregate predicted maintenance counts at a probability threshold of 0.25 align most closely with the ground truth values, avoiding the systematic overestimation observed at the 0.10 threshold. However, analysis of the classification metrics in Figure 7 reveals that this aggregate alignment is misleading regarding predictive utility. While the 0.25 threshold brings the total number of predicted events closer to reality, the boxplots demonstrate a significant reduction in recall compared to the 0.10 threshold. At lower thresholds, the model predicts more maintenance events, capturing more true positives but also increasing the risk of false alarms. As the threshold increases, predictions become more conservative, reducing false positives but also missing potential failures.
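The threshold effect described here can be illustrated directly: a predicted CIF is compared against each probability threshold at the decision horizon, so a lower threshold flags maintenance more readily (higher recall, more false alarms) while a higher one stays conservative. The CIF values below are made up for illustration.

```python
# Converting a predicted cumulative incidence function (CIF) into discrete
# maintain / do-not-maintain decisions at several probability thresholds.
# The CIF values are illustrative, not model output.
import numpy as np

cif = np.array([0.05, 0.12, 0.22, 0.31, 0.38, 0.44])  # cumulative failure prob. per month
horizon = 2                      # e.g. a 3-month decision window (bins 0-2)

decisions = {t: bool(cif[horizon] >= t) for t in (0.10, 0.25, 0.50)}
for t, flag in decisions.items():
    print(f"threshold {t:.2f}: predict maintenance = {flag}")
```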
4.3. Classifier Performance Across Prediction Horizons
Figure 14 compares the performance of various latent space classifiers—KNN, Decision Tree, Random Forest, and DeepHit (at thresholds 0.10, 0.25, and 0.50)—across three different prediction horizons: 3 months, 6 months, and 1 year. Each row in the figure corresponds to a specific time period, and each column shows one of the evaluation metrics: F1 score, accuracy, precision, and recall. Across all periods, traditional classifiers (KNN, Decision Tree, and Random Forest) generally outperform DeepHit-based methods in terms of F1 score and accuracy. Random Forest consistently shows strong performance with high median scores and low variability, indicating robust generalization across different parts.
As the prediction window lengthens from 3 months to 1 year, a general decline is observed in F1 score, accuracy, and recall across most classifiers, reflecting the increasing difficulty of making accurate predictions over extended horizons. Notably, the interquartile range for F1 scores tightens at the 1-year mark, indicating more consistent performance across different parts despite the lower average metrics. One possible explanation is that by one year, most components either clearly require maintenance or clearly do not, reducing variability in predicted outcomes across parts. Precision remains relatively stable for many models and even shows slight improvement in some cases, especially for the more conservative classifiers like Random Forest and DeepHit with higher thresholds (e.g., 0.50). This behavior suggests that while models become less sensitive to detecting true maintenance events over time (lower recall), their specificity in correctly identifying necessary interventions does not degrade as markedly. DeepHit models exemplify this trade-off clearly: higher thresholds lead to higher precision but lower recall, whereas lower thresholds capture more true positives at the expense of false positives.
Figure 15, Figure 16 and Figure 17 show the predictions of the latent space classifiers and DeepHit (with thresholds 0.10, 0.25, and 0.50) for 3-month, 6-month, and 1-year prediction horizons, respectively. The results indicate that DeepHit significantly overestimates maintenance counts at lower thresholds (0.10 and 0.25), demonstrating a tendency to produce false positives when adopting less restrictive criteria.
As the prediction horizon expands from 3 months (Figure 15) to 6 months (Figure 16) and ultimately to 1 year (Figure 17), the variability in classifier predictions tends to decrease, leading to more consistent and stable results. This phenomenon could be attributed to reduced data noise over longer horizons; for instance, certain components predicted to require immediate maintenance within 3 months may continue operating beyond that period, thus introducing uncertainty into short-term predictions. Longer-term predictions allow classifiers to better generalize and reduce false positives and negatives caused by transient anomalies or short-term operational variations. These observations emphasize the necessity of carefully selecting appropriate thresholds and classifiers tailored to specific maintenance objectives and prediction horizons.
The predicted maintenance outcomes produced by the proposed framework can be directly utilized to support seamless airline operations by enabling proactive planning and timely interventions. By anticipating which components are likely to require maintenance within a specified horizon, airlines can align their maintenance schedules and logistics operations to reduce unplanned downtimes. These predictions also serve as a basis for dynamic inventory management—allowing planners to better estimate required quantities of spare parts to stock, lease, or redistribute across maintenance bases. This predictive capability ensures high fleet availability, minimizes last-minute part sourcing, and enhances overall operational resilience, particularly for high-rotation or mission-critical components.
Effective inventory planning in aviation depends not only on accurate maintenance forecasts but also on aligning predictions with operational planning horizons. Short-term predictions (e.g., 3-month) are essential for immediate procurement and reactive inventory management but are prone to higher variability due to transient operational anomalies and noise in maintenance records. In contrast, longer-term forecasts (6-month to 1-year) enable strategic planning by smoothing out short-term fluctuations and providing more stable demand estimates. The proposed latent space classification framework, supported by a shared encoder trained on global data, demonstrates superior robustness across all horizons, especially in data-scarce settings. By maintaining consistent prediction accuracy and minimizing false alarms, it enables both responsive short-term actions and reliable long-term stocking strategies. This dual capability provides airlines with a practical advantage in balancing just-in-time logistics and cost-efficient spare part provisioning across varied planning intervals. Furthermore, unlike complex sequential models that require processing long historical dependencies, the proposed framework operates on aggregated snapshot features. This architectural simplicity ensures low computational overhead, enabling rapid retraining and near real-time inference, making it highly suitable for daily operational updates in an airline environment.
5. Conclusions
This study presents a predictive maintenance framework tailored for aviation, addressing challenges posed by sparse, imbalanced, and irregular maintenance records across diverse aircraft components. Designed for a large-scale airline operation managing a fleet of over 500 aircraft, the proposed latent space classification approach leverages a shared encoder backbone trained on all available part data, mitigating data imbalance and enhancing generalization for part numbers with limited historical observations. This encoder-based method is systematically compared with DeepHit, a state-of-the-art survival analysis model, enabling a thorough evaluation of discriminative versus probabilistic strategies for forecasting component-level maintenance needs.
The comparative results show that while DeepHit provides valuable insights into survival probabilities and maintenance timing, its sensitivity to threshold selection and its performance degradation in low-data regimes limit its robustness across heterogeneous fleets. In contrast, the latent space classifiers—particularly the Random Forest model—maintain consistent accuracy across part numbers and prediction horizons, demonstrating superior adaptability in sparse data environments. This stability becomes increasingly important in longer-term forecasts, where operational noise is reduced and demand trends become clearer.
Accurate predictions of maintenance events have a direct and measurable impact on inventory control and logistics coordination in airline operations. By aligning forecast horizons—such as 3-month, 6-month, and 1-year intervals—with planning cycles, the proposed framework enables both short-term tactical decision-making and long-term strategic inventory management. The encoder-based classification approach supports seamless scheduling by producing reliable estimates across all horizons, thereby reducing unexpected component shortages and improving the spare part provisioning process. This capability is particularly critical in high-scale operations, where minimizing disruption and maximizing fleet availability hinge on anticipating part demand with high fidelity.
In conclusion, the proposed framework offers a practical and scalable solution for predictive maintenance and inventory planning in aviation. By combining global feature learning with part-specific classifiers, it bridges the gap between data-rich and data-poor components, delivering actionable insights that enhance reliability, reduce downtime, and support cost-effective operations across complex fleets.
Beyond its predictive accuracy, the real-world implication of this framework lies in its potential to transform reactive maintenance into a proactive strategy. By integrating these forecasts into inventory planning, airlines can optimize spare part logistics and significantly reduce costs associated with unexpected Aircraft-on-Ground (AOG) events. However, a limitation of the current study is its reliance on historical maintenance logs and cumulative usage metrics alone. While effective for capturing general degradation trends, this approach does not account for external operational variances—such as harsh environmental conditions or specific route characteristics—that may accelerate wear for individual components independently of flight hours. Future work will focus on improving the interpretability of model decisions, particularly in high-stakes maintenance scenarios. In addition, incorporating additional data sources, such as sensor streams, environmental records, or operational context logs, can further enhance the predictive power and robustness of the models. Such integration will help contextualize maintenance needs more accurately and enable deeper insight into component behavior beyond what is captured in log-based datasets alone.