1. Introduction
The recent surge in Foundation Models as a Service (FMaaS) is reshaping the paradigm of intelligent computation [1], especially in scenarios involving mobile, embedded, and resource-constrained devices. Instead of running large-scale models locally, which would require substantial memory, computational capacity, and energy, users can now leverage on-demand access to powerful Foundation Models (FMs) deployed across distributed infrastructures. These FMs include Large Language Models (LLMs), vision transformers, and multimodal architectures capable of performing a wide range of complex tasks such as text generation [2], image captioning [3], translation, summarization [4], and question answering [5].
In this context, inference and model adaptation (e.g., fine-tuning or prompt engineering) are offloaded to edge servers or cloud-based platforms, enabling end-user devices to interact with state-of-the-art intelligence through lightweight queries. This decoupling between the model and the device fosters new capabilities in fields such as autonomous systems, smart sensing, and ubiquitous computing while also introducing new challenges related to latency, reliability, and the efficient use of network and computational resources.
The flexibility and effectiveness of FMs make them attractive for real-time decision-making and adaptive service provisioning; yet, their integration into edge networks requires careful orchestration. This includes task placement, resource allocation, and lifetime-aware scheduling, particularly when considering the fragility and aging of edge infrastructure components. As such, FMaaS represents not only a shift in the software delivery model but also a new optimization challenge across the communication-computation continuum. Moreover, the computation nodes hosting FMs typically run containerized or virtualized instances of these models within software-defined environments deployed on general-purpose hardware. While such flexibility allows for scalability and dynamic allocation, it also exposes the system to challenges stemming from the software lifecycle [6]. In particular, software aging (SA), i.e., the gradual degradation of software quality due to long-running execution, memory leaks, and resource exhaustion, can significantly impact the reliability and responsiveness of deployed services [7]. Controlled experiments designed to reveal anomalous software behavior or failures are described in [8]; a one-dimensional metric for evaluating SA and its temporal evolution is proposed in [9]; and [10] outlines a model for analyzing software rejuvenation in continuously running applications, quantifying both downtime and the associated costs incurred during rejuvenation.
Although conventional offloading strategies consider metrics such as queue length, processing time, and radio link quality, they often disregard the internal health of the software stack. Yet, the operational state of the software plays a central role in determining whether an edge node can effectively support a prolonged session of inference or on-the-fly fine-tuning. In this context, forecasting the evolution of software aging is key to anticipating service degradation. To this end, we propose a lightweight time-series prediction module at each edge node based on Echo State Networks (ESNs), a class of Recurrent Neural Networks (RNNs) particularly suitable for modeling complex nonlinear dynamics with minimal training overhead. In our architecture, each node runs an ESN that predicts the short-term trajectory of its software age. Based on these forecasts, incoming user requests can be directed to nodes that are not only close in terms of latency but also expected to remain stable in the near future.
This paper introduces a reliability-aware FMaaS offloading framework that combines ESN-based software aging prediction with task assignment logic across a network of edge nodes. The proposed system aims to extend the duration and quality of intelligent sessions by proactively avoiding nodes with deteriorating software conditions.
The main contributions of this paper are:
We propose a distributed FMaaS architecture that enables users to offload inference and fine-tuning requests to a pool of heterogeneous edge nodes.
We integrate an ESN at each service node that is trained to forecast the evolution of its software age, thereby enabling predictive reliability estimation. A lightweight offloading policy is designed to account for communication cost, inference cost, and predicted software age, allowing us to optimize user-to-node mapping.
We evaluate the proposed framework through simulations, demonstrating significant improvements in session continuity and failure prevention compared to a baseline strategy that ignores software aging forecasts.
To the best of the authors’ knowledge, this is the first study to incorporate software age forecasting into FMaaS offloading strategies with the aim of proactively mitigating degradation risks and enhancing the resilience of foundation model services in dynamic software-driven environments.
The rest of the paper is organized as follows: Section 2 presents an in-depth review of the related literature; Section 3 details the problem statement; Section 4 presents the ESN-based software aging forecasting approach; Section 5 describes the proposed offloading algorithm, which integrates communication, inference, and software age; Section 6 discusses the system architecture according to the 1+5 Architectural Views Model; Section 7 addresses the performance evaluation; finally, Section 8 discusses the strengths and limitations of the proposed approach, and Section 9 presents our conclusions.
2. Related Works
Software Aging (SA) has been widely studied as a phenomenon that affects long-running software systems, leading to performance degradation [11], increased failure rates, and eventual system crashes. Recent studies have extensively investigated SA in containerized edge architectures [12] and heterogeneous virtual networks [6]. The main goal has been to identify aging-related symptoms such as memory leaks, resource exhaustion, and numerical error accumulation, often by relying on analytical models, threshold-based monitoring, and rejuvenation strategies scheduled at fixed or adaptive intervals. In particular, the problem of online failure prediction has been investigated in numerous papers. In [13], the authors proposed a lightweight failure prediction approach that reduces the overhead of collecting the data on which runtime failure prediction is performed. A feature-extraction-based approach to predicting disk device failures was developed in [14], where Long Short-Term Memory (LSTM) networks were applied. In [15], an entropy-based aging metric was formulated to develop a three-failure prediction approach; experimental evaluations on a system supporting on-demand video streaming confirmed the validity of the approach. In [16], the authors integrated data analytics within a fifth-generation (5G) network to perform runtime time-series analysis, allowing them to forecast threats that could lead to system failure. A proactive fault tolerance framework for modern large-scale storage systems was developed in [17], where disk failures were predicted by a unified framework able to extract features in real time, label samples, and train a prediction model; the framework also incorporated an online transfer learning procedure. In another approach, a structured Hierarchical Temporal Memory (HTM)-based failure forecasting framework was proposed in [18].
In recent years, data-driven approaches have gained prominence in addressing SA. Classical machine learning techniques such as regression models, support vector machines, decision trees, and clustering algorithms have been employed to predict aging indicators and estimate time to failure. These methods typically rely on handcrafted features extracted from system-level metrics such as CPU usage, memory consumption, Input/Output (I/O) statistics, and response times. While effective in controlled environments, their applicability is often limited by feature engineering costs, poor generalization across heterogeneous systems, and sensitivity to workload variations.
Deep learning models such as RNNs, LSTMs, and autoencoders have been proposed to capture temporal dependencies and nonlinear aging patterns in time-series monitoring data. For example, hybrid LSTM models have demonstrated improved forecasting accuracy for SA prediction tasks compared to classical approaches [19]. In broader time-series contexts, deep architectures such as LSTM and RNN variants achieve higher anomaly detection accuracy and richer temporal pattern extraction than traditional Machine Learning (ML) methods [20]. Similarly, LSTM-based autoencoder frameworks have been effectively used for unsupervised anomaly detection by reconstructing normal behavior and highlighting deviations [21]. However, these models usually require significant amounts of labeled data, system-specific training, and dedicated computational resources, which can hinder their adoption in large-scale or multi-tenant environments.
More recent studies have explored Artificial Intelligence (AI)-driven and self-adaptive frameworks for SA management, integrating monitoring, prediction, and rejuvenation decision-making into closed control loops. In particular, Reinforcement Learning (RL) has been investigated to dynamically schedule rejuvenation actions by balancing availability, performance, and operational costs [22]. Systematic reviews have shown that RL methods can enable adaptation policies in software systems by learning from interactions with the runtime environment [23], and online RL has been proposed to continuously update adaptation strategies in the presence of environmental uncertainty [24]. Although promising, these approaches often assume a static system model and struggle to adapt to evolving software architectures such as microservices, cloud-native platforms, and serverless environments.
The emergence of FMs has recently introduced a paradigm shift in artificial intelligence, moving from task-specific models toward large-scale [25] pretrained architectures capable of general-purpose reasoning [26] and transfer learning across domains [27]. The effectiveness of this paradigm has been further demonstrated by LLMs, which exhibit strong few-shot and zero-shot learning capabilities [28] along with emergent reasoning behaviors across complex analytical tasks [29]. In the context of software systems, FMs have attracted increasing attention within the software engineering community. Recent surveys have reported their successful application to activities such as code understanding, configuration analysis, log interpretation, anomaly detection, and root cause analysis [30]. In particular, LLM-based approaches have shown promising results in processing unstructured and semi-structured operational data [31], including system logs and execution traces, often outperforming traditional rule-based or ML techniques [32]. These capabilities are especially relevant for SA, where degradation symptoms are often subtle, temporally distributed, and spread across heterogeneous data sources.
Alongside the evolution of FMs, their deployment paradigm has also shifted toward cloud-based delivery models, commonly referred to as FMaaS. In [33], the authors describe FMaaS as a key enabler for scalable and cost-effective AI adoption that allows organizations to access continuously updated models without the need for local training or infrastructure management. From an operational perspective, FMaaS aligns naturally with modern cloud-native and microservice-based systems, where observability data are already centralized and AI-driven analysis can be seamlessly integrated into monitoring pipelines.
Despite these advances, the application of FMs and FMaaS to SA remains largely unexplored. While traditional SA studies focus on analytical modeling and system-specific predictors [30], FM-based approaches offer the potential to reason holistically over performance metrics, logs, configuration changes, and workload evolution. Moreover, FMs can act as cognitive components within self-healing and autonomic computing frameworks, supporting aging-aware diagnosis and decision-making. However, open challenges persist regarding explainability, reliability of predictions, data privacy, and the validation of FM-driven actions in production environments. Overall, the existing literature suggests that FMs and FMaaS represent a promising yet under-investigated direction for SA analysis and mitigation, motivating further research into unified, AI-driven, and service-oriented aging management frameworks. In particular, existing solutions typically impose significant computational overhead, rendering them unsuitable for continuous FM inference at the edge.
To address this gap, this work introduces a distributed FMaaS-oriented perspective for SA-aware service management, unlike existing approaches, which mainly tackle SA through system-specific predictors, centralized analytics, or reactive rejuvenation mechanisms. We propose a framework in which FM inference and fine-tuning are dynamically offloaded across heterogeneous edge nodes (ENs), explicitly accounting for the SA state of each service instance. Unlike prior data-driven SA solutions that focus on failure prediction in isolation, our approach directly integrates predictive reliability estimation, obtained via lightweight ESNs trained locally at each EN, into the offloading decision process. This enables proactive SA-aware user-to-node mapping that jointly considers communication overhead, inference cost, and predicted software degradation.
4. Software Aging Forecasting
ESNs are a specific class of RNNs developed within the Reservoir Computing paradigm, which enables temporal pattern learning with minimal training complexity. Unlike conventional RNNs, ESNs do not require backpropagation through time and therefore avoid issues such as vanishing gradients, because only the output layer undergoes supervised training [36]. The ESN architecture consists of three main components [36]:
The input weight matrix that maps the external input signal into the reservoir. This matrix is not trained; it is randomly initialized and kept fixed during learning;
The recurrent reservoir matrix that defines the internal recurrent connections among the reservoir neurons;
The output weight matrix that maps the reservoir state to the network output and represents the only trainable component of the model.
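The reservoir itself comprises a large number of sparsely connected nonlinear units whose internal connections remain fixed after initialization. Combined with random initialization and a properly scaled spectral radius, this sparsity enables the reservoir to encode temporal dependencies efficiently. Formally, let $\mathbf{u}(q)$ be the input vector at time $q$, and let $\mathbf{W}_{\mathrm{in}}$, $\mathbf{W}$, and $\mathbf{W}_{\mathrm{out}}$ denote the input, reservoir, and output weight matrices, respectively. The reservoir state vector $\mathbf{x}(q)$ evolves according to

$$\mathbf{x}(q) = \tanh\left(\mathbf{W}_{\mathrm{in}}\,\mathbf{u}(q) + \mathbf{W}\,\mathbf{x}(q-1)\right),$$

where the tanh activation captures the nonlinear dynamics of the internal state update. The output is then computed as

$$\mathbf{y}(q) = \mathbf{W}_{\mathrm{out}}\,\mathbf{x}(q),$$

where $\mathbf{W}_{\mathrm{out}}$ is the only trainable parameter set, typically learned via ridge regression over a set of collected internal states and desired outputs.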
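To make the state update and readout concrete, the following is a minimal, dependency-free Python sketch of an ESN with a fixed random input matrix, a sparse fixed reservoir rescaled to a target spectral radius, and a scalar readout fitted by ridge regression. It is an illustration, not the authors' implementation: the names `spectral_radius`, `solve`, and `EchoStateNetwork`, the input scaling, the reservoir density of 0.1, the ridge coefficient, the washout length, and the power-iteration estimate of the spectral radius are all assumptions, and a sine wave stands in for a software age trace.

```python
import math
import random


def spectral_radius(W, iters=300, seed=0):
    """Estimate rho(W) as the mean log-growth of ||W v|| under normalized power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in W]
    log_growth = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        v = [sum(w * x for w, x in zip(row, v)) for row in W]
        log_growth += math.log(math.sqrt(sum(x * x for x in v)))
    return math.exp(log_growth / iters)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


class EchoStateNetwork:
    def __init__(self, n_in, R=50, rho=0.9, density=0.1, ridge=1e-6, seed=1):
        rng = random.Random(seed)
        self.R, self.ridge = R, ridge
        # W_in: random input weights, fixed after initialization (never trained)
        self.W_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(R)]
        # W: sparse random reservoir, rescaled so its spectral radius is (approximately) rho
        W = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0 for _ in range(R)]
             for _ in range(R)]
        scale = rho / spectral_radius(W)
        self.W = [[w * scale for w in row] for row in W]
        self.W_out = None  # readout weights: the only trainable parameters

    def states(self, inputs):
        """Run x(q) = tanh(W_in u(q) + W x(q-1)) from x(0) = 0 and return all states."""
        x, collected = [0.0] * self.R, []
        for u in inputs:
            x = [math.tanh(sum(a * b for a, b in zip(self.W_in[r], u))
                           + sum(a * b for a, b in zip(self.W[r], x)))
                 for r in range(self.R)]
            collected.append(x)
        return collected

    def fit(self, inputs, targets, washout=50):
        """Ridge readout: solve (X^T X + ridge * I) w = X^T y after discarding the washout."""
        X, y = self.states(inputs)[washout:], targets[washout:]
        A = [[sum(s[i] * s[k] for s in X) + (self.ridge if i == k else 0.0)
              for k in range(self.R)] for i in range(self.R)]
        b = [sum(s[i] * t for s, t in zip(X, y)) for i in range(self.R)]
        self.W_out = solve(A, b)

    def predict(self, inputs):
        """Readout y(q) = W_out x(q) at every step."""
        return [sum(w * v for w, v in zip(self.W_out, s)) for s in self.states(inputs)]


# Illustrative usage: one-step-ahead prediction of a sine wave standing in for an aging trace
series = [math.sin(0.3 * q) for q in range(301)]
inputs, targets = [[a] for a in series[:-1]], series[1:]  # u(q) = a(q), target a(q+1)
esn = EchoStateNetwork(n_in=1)
esn.fit(inputs, targets)
pred = esn.predict(inputs)
train_mse = sum((p - t) ** 2 for p, t in zip(pred[50:], targets[50:])) / len(targets[50:])
```

Only the readout is fitted, which is why training reduces to a single linear solve; in practice a library eigenvalue routine would normally replace the power-iteration estimate of the spectral radius.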
The reservoir plays a central role in retaining memory of past inputs, making ESNs suitable for time-series prediction problems such as estimating the residual life or failure likelihood of ENs from SA indicators. In our case, each EN $j$ is associated with a time series $\{a_j(q)\}$ representing its software age evolution, i.e., the degradation trend due to factors such as memory leaks or resource saturation. The ESN is trained to forecast the software age $\hat{a}_j(q+h)$ over time horizons $h$ ranging from 1 up to a value $H$ that typically matches the total service time experienced by the client [37], i.e., $h \in \{1, \dots, H\}$.
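One possible way to build the training data for horizons $h = 1, \dots, H$ is the direct strategy sketched below, in which each input $a(q)$ is paired with the vector of the next $H$ software age values, so that one readout output can be fitted per horizon. The text does not state whether forecasts are produced directly or recursively; the direct strategy, the helper name `horizon_dataset`, and the sample values are assumptions.

```python
from typing import List, Tuple


def horizon_dataset(age: List[float], H: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Direct multi-horizon pairs: input [a(q)] -> targets [a(q+1), ..., a(q+H)]."""
    inputs, targets = [], []
    for q in range(len(age) - H):
        inputs.append([age[q]])                # fed to the reservoir in temporal order
        targets.append(age[q + 1:q + H + 1])   # entry h-1 is the target for horizon h
    return inputs, targets


ages = [0.10, 0.15, 0.22, 0.31, 0.45, 0.60]  # hypothetical software age samples
X, Y = horizon_dataset(ages, H=3)
# X == [[0.1], [0.15], [0.22]]
# Y == [[0.15, 0.22, 0.31], [0.22, 0.31, 0.45], [0.31, 0.45, 0.6]]
```

Because the reservoir state carries the history, the inputs must be presented in temporal order even though each one holds a single sample.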
To improve prediction accuracy, we adopt a Genetic Algorithm (GA) to optimize the ESN hyperparameters, specifically the number of reservoir neurons $R$ and the spectral radius $\rho$. Unlike exhaustive grid searches or manual tuning, the GA performs an adaptive search over the hyperparameter space, favoring configurations that minimize prediction error.
The GA operates iteratively on a population $\mathcal{P}$ of candidate solutions $(R, \rho)$, evaluating each individual via a fitness function defined as the ESN’s prediction error on the training set. At each generation, the top-performing individuals (the “elite”) are retained, while new individuals are generated through crossover and mutation. This process continues until a maximum number of generations $G_{\max}$ is reached [38] or convergence criteria are met [39].
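A compact sketch of such a GA over $(R, \rho)$ is shown below. The defaults mirror the population of 30 individuals and the budget of 50 generations reported in the performance analysis, but everything else is an assumption: truncation selection, uniform crossover, random-reset mutation, the parameter ranges, the elite size, the patience-based convergence rule, and `toy_fitness`, a hypothetical stand-in for the ESN prediction error.

```python
import random


def genetic_search(fitness, r_range=(20, 200), rho_range=(0.1, 1.2), pop_size=30,
                   max_generations=50, n_elite=3, mut_prob=0.2, patience=10, seed=0):
    """GA minimizing fitness(R, rho); returns the best (R, rho) individual found."""
    rng = random.Random(seed)
    cache = {}

    def score(ind):
        # Cache evaluations: in practice each one trains and validates an ESN
        if ind not in cache:
            cache[ind] = fitness(*ind)
        return cache[ind]

    population = [(rng.randint(*r_range), rng.uniform(*rho_range)) for _ in range(pop_size)]
    best, stall = min(population, key=score), 0
    for _ in range(max_generations):
        ranked = sorted(population, key=score)
        elite, parents = ranked[:n_elite], ranked[:pop_size // 2]  # elitism + truncation selection
        children = []
        while n_elite + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover: each gene is inherited from a randomly chosen parent
            R, rho = rng.choice((p1[0], p2[0])), rng.choice((p1[1], p2[1]))
            # Mutation: each gene is reset to a random value with probability mut_prob
            if rng.random() < mut_prob:
                R = rng.randint(*r_range)
            if rng.random() < mut_prob:
                rho = rng.uniform(*rho_range)
            children.append((R, rho))
        population = elite + children
        generation_best = min(population, key=score)
        if score(generation_best) < score(best):
            best, stall = generation_best, 0
        else:
            stall += 1
            if stall >= patience:  # convergence: no improvement for `patience` generations
                break
    return best


def toy_fitness(R, rho):
    # Hypothetical stand-in for the ESN prediction error, minimal at R = 120, rho = 0.9
    return ((R - 120) / 100) ** 2 + (rho - 0.9) ** 2


best_R, best_rho = genetic_search(toy_fitness)
```

Caching matters because, with a real ESN, each fitness call means training a reservoir; the elite individuals are re-ranked every generation without being re-trained.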
5. Proposed Algorithm
In this section, we detail the offloading heuristic designed to minimize the SA risk during task assignment in FMaaS systems (see Algorithm 1). The goal is to assign each user request to an EN that offers the best trade-off between latency and reliability, without violating capacity constraints and while accounting for potential software degradation over the task execution window.
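The algorithm proceeds iteratively: each user selects the node that minimizes the latency-risk metric

$$J_{ij} = T^{\mathrm{comm}}_{ij} + T^{\mathrm{inf}}_{ij} + \lambda \max_{h \in \{1,\dots,H_i\}} \hat{a}_j(q+h), \qquad (16)$$

where $T^{\mathrm{comm}}_{ij}$ and $T^{\mathrm{inf}}_{ij}$ are the communication and inference latencies of serving client $i$ at EN $j$, $\lambda$ is a weighting factor, and the last term represents the maximum forecast software age of EN $j$ during the expected task execution interval, whose length is $H_i$. This pessimistic evaluation captures the worst-case aging condition that may impact session continuity.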
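As a purely illustrative calculation, with hypothetical numbers and assuming that the metric takes the additive form $J_{ij} = T^{\mathrm{comm}}_{ij} + T^{\mathrm{inf}}_{ij} + \lambda \max_{h \le H_i} \hat{a}_j(q+h)$ with latencies in milliseconds, a normalized software age, and $\lambda = 50$ ms, consider a client with $H_i = 3$ that can be served either by a nearby but aging node A or by a slightly farther, healthier node B:

$$
\begin{aligned}
J_{iA} &= 5 + 20 + 50 \cdot \max\{0.50,\ 0.62,\ 0.78\} = 25 + 39 = 64,\\
J_{iB} &= 8 + 22 + 50 \cdot \max\{0.20,\ 0.22,\ 0.25\} = 30 + 12.5 = 42.5.
\end{aligned}
$$

A latency-only policy would pick A (25 ms versus 30 ms), whereas the risk-aware metric selects B, because the worst-case forecast age of A within the execution window outweighs its 5 ms latency advantage.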
Each client $i$ then submits its task proposal to the EN that minimizes the latency-risk metric $J_{ij}$ in (16). Upon receiving a set of proposals, each EN $j$ evaluates the candidates according to their SA risk, i.e., the maximum forecast software age $\hat{a}_j(q+h)$ over the execution window of each task. The node accepts the proposal with the lowest risk, provided that its available computational budget is sufficient to accommodate the task; all other proposals are rejected.
Rejected clients may reattempt assignment in subsequent rounds. Importantly, ENs are not excluded from future proposals by clients that were previously rejected; a client may retry the same EN if it still considers that node the best choice according to (16). This feedback-driven mechanism continues until all tasks are successfully assigned or all nodes are saturated.
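Algorithm 1 Risk-Aware Offloading Heuristic
Input: clients $\mathcal{U}$, edge nodes $\mathcal{N}$, forecasts $\hat{a}_j(q+h)$, task memory $m_i$, capacities $C_j$, weight $\lambda$
Initialize all clients as unassigned
while there exist unassigned clients and some node has residual capacity do
    for all unassigned clients $i$ do
        for all nodes $j \in \mathcal{N}$ do
            Compute $J_{ij} = T^{\mathrm{comm}}_{ij} + T^{\mathrm{inf}}_{ij} + \lambda \max_{h \in \{1,\dots,H_i\}} \hat{a}_j(q+h)$
        end for
        Propose $i$ to $j^\star = \arg\min_{j} J_{ij}$
    end for
    for all nodes $j$ with proposals do
        Let $\mathcal{R}_j$ be the set of proposals received
        Select $i^\star = \arg\min_{i \in \mathcal{R}_j} \max_{h \in \{1,\dots,H_i\}} \hat{a}_j(q+h)$
        if $m_{i^\star} \le$ available capacity $C_j$ of $j$ then
            Accept $i^\star$, update $C_j \leftarrow C_j - m_{i^\star}$
        end if
        Reject all $i \in \mathcal{R}_j \setminus \{i^\star\}$
    end for
end while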
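The following Python sketch mirrors the proposal and acceptance rounds of Algorithm 1 under stated assumptions. The data layout (the hypothetical `Client` record, per-node latency dictionaries, and forecast lists indexed by horizon), the function names `risk` and `offload`, and all numbers in the usage example are illustrative. One deliberate deviation: a client only proposes to nodes whose residual capacity can still host its task. Without such a filter, a client could keep proposing to a node that can never fit its task, so this assumption is what guarantees termination here; clients with no feasible node are left unserved (`None`).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Client:
    name: str
    memory: float             # m_i: memory required by the task
    horizon: int              # H_i: expected execution window, in forecast steps
    t_comm: Dict[str, float]  # communication latency to each node j
    t_inf: Dict[str, float]   # inference latency on each node j


def risk(forecast: List[float], horizon: int) -> float:
    """Worst-case forecast software age over the window; forecast[h-1] is the age at q+h."""
    return max(forecast[:horizon])


def offload(clients: List[Client], forecasts: Dict[str, List[float]],
            capacity: Dict[str, float], lam: float) -> Dict[str, Optional[str]]:
    residual = dict(capacity)
    assignment: Dict[str, Optional[str]] = {}
    unassigned = list(clients)
    while unassigned:
        proposals: Dict[str, List[Client]] = {}
        for c in unassigned:
            # Assumption: only nodes that can still host the task are candidates
            feasible = [j for j in forecasts if residual[j] >= c.memory]
            if not feasible:
                assignment[c.name] = None  # no node can host the task: left unserved
                continue
            # Latency-risk metric: communication + inference + lam * worst-case forecast age
            best = min(feasible, key=lambda j: c.t_comm[j] + c.t_inf[j]
                       + lam * risk(forecasts[j], c.horizon))
            proposals.setdefault(best, []).append(c)
        for j, candidates in proposals.items():
            # Node j keeps the proposal with the lowest SA risk over its execution window
            winner = min(candidates, key=lambda c: risk(forecasts[j], c.horizon))
            if residual[j] >= winner.memory:
                residual[j] -= winner.memory
                assignment[winner.name] = j
            # Every other proposal to j is rejected and retried in the next round
        unassigned = [c for c in clients if c.name not in assignment]
    return assignment


forecasts = {"A": [0.50, 0.62, 0.78, 0.91], "B": [0.20, 0.22, 0.25, 0.30]}
capacity = {"A": 8.0, "B": 4.0}
clients = [
    Client("c1", 4.0, 3, {"A": 5, "B": 8}, {"A": 20, "B": 22}),
    Client("c2", 4.0, 1, {"A": 5, "B": 8}, {"A": 20, "B": 22}),
    Client("c3", 4.0, 2, {"A": 4, "B": 9}, {"A": 20, "B": 22}),
    Client("c4", 4.0, 1, {"A": 6, "B": 6}, {"A": 20, "B": 20}),
]
result = offload(clients, forecasts, capacity, lam=50.0)
# {'c2': 'B', 'c4': 'A', 'c3': 'A', 'c1': None}
```

In this toy run every client first targets the healthier node B, which can host only one task; the long-running client c1 has the highest worst-case risk at every node and is therefore the one left unserved once capacity runs out.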
6. System Architecture
The proposed system architecture is illustrated in
Figure 2 and is described according to the 1+5 Architectural Views Model, with the use-case view acting as the integrating perspective. The 1+5 Architectural Views Model [
40,
41] is an architecture description approach that represents a system through six complementary views, all driven by a single set of architectural requirements and scenarios. The model separates concerns to address different stakeholder needs while maintaining architectural consistency. The views include Integrated Processes, Use Cases, Logical, Contracts, Integrated Services, and Deployment Perspectives, enabling comprehensive understanding, communication, and validation of the system architecture. From a logical view, the system consists of clients connected through a wireless access network to a set of ENs, which jointly provide FMaaS capabilities. Each EN embeds a modular execution stack comprising a foundation model inference engine, a software aging monitoring component, and an ESN-based aging predictor, following architectural principles similar to those adopted in reliability-aware edge systems such as those discussed in [
41]. The process view captures the dynamic interaction among components, including task offloading, parallel inference execution, and a closed-loop monitoring–forecasting–decision process that proactively incorporates predicted software aging into resource selection. From a development view, the architecture is organized into loosely coupled and reusable components, enabling extensibility and facilitating the integration of predictive reliability mechanisms, as advocated by recent edge intelligence frameworks. The Deployment view, shown in
Figure 1, depicts the deployment of ENs in proximity to Small Base Stations within a distributed wireless edge infrastructure, where nodes operate under constrained computational and memory resources. Finally, the use-case view consolidates all architectural perspectives by modeling the main operational scenario in which a client request is dynamically assigned to the most suitable EN by jointly considering latency, inference performance, and predicted software aging, thereby enhancing service continuity and system resilience.
7. Performance Analysis
A performance analysis is provided to validate the effectiveness of the proposed prediction scheme and to demonstrate the overall framework’s ability to ensure service continuity in the presence of EN failures. The analysis aims to answer the following two research questions:
Q1. How does the prediction accuracy vary as the forecasting horizon increases, and how does it compare with that of a baseline prediction method?
Q2. How does the proposed FMaaS framework perform compared with the same scheme without SA forecasting?
In particular, three error metrics are used to assess the accuracy of the proposed ESN. The first is the Mean Squared Error (MSE), defined as

$$\mathrm{MSE} = \frac{1}{N}\sum_{q=1}^{N}\left(y_q - \hat{y}_q\right)^2,$$

where $N$ is the number of samples in the test dataset, while $y_q$ and $\hat{y}_q$ are the actual and forecasted values at time $q$, respectively. In addition, we evaluate the Mean Absolute Percentage Error (MAPE), defined as

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{q=1}^{N}\left|\frac{y_q - \hat{y}_q}{y_q}\right|,$$

and the Mean Absolute Deviation (MAD), defined as

$$\mathrm{MAD} = \frac{1}{N}\sum_{q=1}^{N}\left|y_q - \hat{y}_q\right|.$$
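A direct transcription of the three metrics follows. It assumes, as is common in forecasting, that MAD denotes the mean absolute deviation of the forecast errors (numerically equal to the mean absolute error), and that MAPE is reported in percent and is undefined when an actual value is zero; the sample values are hypothetical.

```python
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean Squared Error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent; undefined if some actual value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def mad(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean Absolute Deviation of the forecast errors (same value as the mean absolute error)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


actual, forecast = [1.0, 2.0, 4.0], [1.0, 4.0, 3.0]
# MSE = 5/3, MAPE = 125/3 % (about 41.67 %), MAD = 1.0
```

MSE penalizes occasional large misses more heavily than MAD, while MAPE normalizes each error by the actual value, so the three metrics together separate error magnitude from error scale.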
To generate synthetic traces of SA, we simulate each EN’s lifetime using log-normal distributions to capture the variability in software degradation patterns. For each configuration, 400 samples are generated, representing different time evolutions of software age. The system consists of three ENs serving a spatially distributed population of clients with positions that follow a Poisson point process. To assess scalability, the number of clients is varied from 10 to 40. Software age predictions are learned using ESNs, trained to minimize the MSE between predicted and actual aging trajectories. The ESN hyperparameters are tuned through a genetic algorithm with a fixed budget of generations and population size, i.e., a population of 30 individuals and a budget of 50 generations. The critical software age threshold is set to
, reflecting the point beyond which the risk of failure significantly increases in the simulated aging traces. We evaluate the predictive accuracy of the proposed ESN-based forecasting model against a baseline Moving Average (MA) predictor, varying the prediction horizon from short-term to long-term windows. As shown in Figure 3, the ESN consistently achieves lower MSE values across all horizons, demonstrating a superior ability to capture the nonlinear aging dynamics. The performance gap becomes more pronounced as the prediction horizon increases, confirming the advantage of temporal memory and internal state evolution in ESNs when modeling complex software degradation patterns. This behavior is further confirmed in Figure 4 and Figure 5, where the ESN also outperforms the baseline in terms of MAPE and MAD, respectively. These results underline the robustness of ESN-based forecasting for aging-aware edge management.
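The MA baseline is not fully specified in the text (window length, and how multi-step forecasts are formed). The sketch below assumes a flat forecast, in which every horizon receives the mean of the last `window` observations, and measures its mean absolute error per horizon on a hypothetical, linearly increasing trace. On such a trace the flat forecast lags the true value by an amount that grows with $h$, which offers one plausible intuition for why the gap to the ESN widens at longer horizons; it is an illustration, not a reproduction of the reported experiments.

```python
from typing import List


def moving_average_forecast(history: List[float], H: int, window: int = 5) -> List[float]:
    """Flat MA forecast: horizons h = 1..H all receive the mean of the last `window` values."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * H


def mae_by_horizon(trace: List[float], H: int, window: int = 5) -> List[float]:
    """Mean absolute error of the MA forecast for each horizon, over all valid origins q."""
    totals, n = [0.0] * H, 0
    for q in range(window, len(trace) - H + 1):
        forecast = moving_average_forecast(trace[:q], H, window)  # uses a(0), ..., a(q-1)
        for h in range(H):
            totals[h] += abs(trace[q + h] - forecast[h])  # list index h is horizon h + 1
        n += 1
    return [t / n for t in totals]


# Hypothetical linearly aging trace a(q) = 0.01 q
trace = [0.01 * q for q in range(100)]
errors = mae_by_horizon(trace, H=3)
# errors ≈ [0.03, 0.04, 0.05]
```

With a window of 5, the flat forecast is centered 3 steps in the past, so on this trace the error at horizon $h$ is $0.01\,(h+2)$ and grows linearly with the horizon.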
Figure 6 presents the failure rate as a function of the number of clients, comparing the full proposed strategy with an ablated version that omits the forecasting module. In the latter, task assignment decisions are based solely on communication and inference latency, without considering SA trends. As the figure shows, removing software age forecasting leads to a significant increase in failure events, especially under higher load conditions. This confirms the central insight of our work: integrating predictive information on software degradation into the decision process is essential to improving reliability and service continuity in FMaaS systems.
Figure 7 shows the failure rate as a function of the critical software age threshold. As the threshold increases, the number of potential failure events progressively decreases, because higher threshold values implicitly lower the probability of software-induced failures. This behavior moves the system closer to an ideal (but not practically attainable) operating regime in which software components do not fail and the system achieves full reliability.
Figure 8 reports the system failure rate as a function of the number of edge nodes. As expected, increasing the number of nodes leads to a higher overall failure rate due to the larger set of software instances and execution points involved in task processing. However, the proposed forecast-aware strategy consistently achieves lower failure rates than the baseline approach across all configurations. This indicates that predictive information remains effective in mitigating software-induced failures even as the system scales. The narrow confidence intervals further suggest that the observed performance gap is statistically stable and not attributable to random fluctuations.
Table 1, Table 2, and Table 3 jointly provide an ablation and sensitivity analysis of the proposed GA-based hyperparameter optimization. Table 1 isolates the impact of reservoir size on prediction accuracy, highlighting the systematic improvement achieved by the GA-optimized ESN with respect to the standard configuration. Table 2 and Table 3 further analyze the sensitivity of the optimization process to the population size, reporting the corresponding gains in accuracy and the associated computational cost. Based on these values, we selected a configuration that offers a balanced tradeoff between prediction accuracy and computational effort.
8. Discussion and Limitations
The proposed framework presents several strengths. First, it introduces a predictive reliability dimension into FMaaS offloading decisions, enabling proactive mitigation of service degradation rather than reactive fault handling. Second, the use of lightweight ESNs ensures low training overhead and scalability across distributed ENs, making the approach suitable for resource-constrained environments. Third, the integration of communication, computation, and software health indicators within a unified decision metric provides a holistic orchestration strategy that better reflects real operational conditions compared to latency-only policies. Nevertheless, some limitations must be acknowledged. The proposed model relies on synthetic SA traces and assumes that software degradation follows a known statistical distribution, which may not fully capture the complexity and nonstationarity of real-world systems. In addition, the effectiveness of the prediction module depends on the quality and representativeness of monitoring data, which in practice can be noisy, incomplete, or delayed. Another potential limitation concerns scalability in highly dynamic scenarios where frequent workload fluctuations or software updates may alter aging patterns faster than the predictor can adapt. Furthermore, while ESNs offer efficiency advantages, they may exhibit reduced interpretability compared to more transparent analytical models, potentially complicating root-cause analysis and system diagnostics. Finally, the proposed heuristic focuses on minimizing failure risk rather than jointly optimizing multiple long-term system objectives such as fairness, energy efficiency, or economic cost. Addressing these aspects requires extending the framework toward multi-objective and adaptive formulations, which constitutes a relevant direction for future work.