Proactive Service Caching in a MEC System by Using Spatio-Temporal Correlation among MEC Servers

Abstract: Optimizing the cache hit rate in a multi-access edge computing (MEC) system is essential in increasing the utility of a system. A pivotal challenge within this context lies in predicting the popularity of a service. However, accurately predicting popular services for each MEC server (MECS) is hindered by the dynamic nature of user preferences in both time and space, coupled with the necessity for real-time adaptability. In this paper, we address this challenge by employing the Convolutional Long Short-Term Memory (ConvLSTM) model, which can capture both temporal and spatial correlations inherent in service request patterns. Our proposed methodology leverages ConvLSTM for service popularity prediction by modeling the distribution of service popularity in a MEC system as a heatmap image. Additionally, we propose a procedure that predicts service popularity in each MECS through a sequence of heatmap images. Through simulation studies using real-world datasets, we compare the performance of our method with that of the LSTM-based method. In the LSTM-based method, each MECS predicts the service popularity independently. In contrast, our method takes a holistic approach by considering spatio-temporal correlations among MECSs during prediction. As a result, our method increases the average cache hit rate by more than 6.97% compared to the LSTM-based method. From an implementation standpoint, our method requires only one ConvLSTM model while the LSTM-based method requires at least one LSTM model for each MECS. Thus, compared to the LSTM-based method, our method reduces the number of deep learning model parameters by 32.15%.


Introduction
Online mobile services are evolving from traditional information searching and retrieval towards innovative artificial intelligence (AI)-based services. This transformative shift encompasses a spectrum of functionalities, including language translation, image recognition, autonomous driving, artificial intelligence of things (AIoT), and augmented reality/virtual reality [1,2]. The advent of these advanced services not only necessitates substantial computing power and data but also demands the swift delivery of results to end users. The prevalent cloud computing service paradigm has been instrumental in supporting computationally intensive services [3,4]. However, a common challenge arises due to the considerable distance between end users and cloud servers. Furthermore, the escalating volume of service requests contributes to heightened congestion levels in the backhaul network. Consequently, while the cloud computing service paradigm provides significant computing power and extensive storage to end users, it encounters challenges in meeting the service delay requirements of AI-based services [5,6]. In response to this challenge, the Multi-Access Edge Computing (MEC) system has emerged [7,8]. In the MEC system, a cluster of MEC servers (MECSs) is placed in proximity to end users. This placement enables users to offload computing-intensive and delay-sensitive tasks to the nearest MECS. Subsequently, the MECS processes these tasks and transmits the results back to the users. By bringing servers into close proximity to users, the MEC system effectively mitigates service delays and alleviates the strain on the backhaul network.
In the realm of MEC system design, a pivotal challenge lies in the task offloading decision problem. Various task offloading methods have been proposed [9,10], generally falling into two primary categories. The first group involves users making intelligent offloading decisions, aiming to optimize service delay and the energy required for service completion [11,12]. Conversely, another research strand focuses on MEC system operators striving to optimize the utilization of computing and communication resources while ensuring users experience reasonable service delays [13,14]. However, these approaches often overlook the crucial aspect of service availability for the offloaded tasks. Essentially, they assume that services requested by tasks offloaded from users to the MECS are always available at the MECS. However, because a MECS has less computing power and storage than a cloud server, it can accommodate only a subset of the services available on cloud servers. Consequently, if a MECS lacks the service requested by the offloaded task, it must fetch the service from the cloud server. Given that the installation and loading of services on a MECS take time, the effective management of service caching emerges as a critical factor in mitigating service delays within the MEC system.
Ensuring the inclusion of the most popular services in each MECS cache is important in achieving a higher cache hit rate. The popularity of services within a MECS is closely linked to the service preferences of users in the respective MECS service area. User mobility causes the set of users within the MECS service area to change over time. Moreover, user service preferences evolve over time as well. Consequently, determining the optimal service set to be cached in a specific MECS within the MEC system presents a significant challenge. To address this challenge, popularity prediction methods have been proposed [15,16]. These methods examine diverse metrics for each service, analyze their changing patterns, and use the identified patterns to estimate the future popularity of each service. However, these methods primarily rely on the temporal relationships among services within an individual MECS to forecast service popularity. Since users change locations over time, not only the users currently residing in the service area of a MECS but also the users served by the neighboring MECSs affect the future service popularity in the MECS. Consequently, integrating the spatio-temporal correlations among MECSs enhances the accuracy of service popularity predictions. From an implementation standpoint, the number of predictors required in these popularity prediction methods grows with the number of services and MECSs. Thus, concerns about scalability may surface as the MEC system grows.
In response to the identified challenges, we propose a proactive service caching methodology inspired by the Convolutional Long Short-Term Memory (ConvLSTM) model, recognized for its efficacy in video frame prediction [17,18]. Our approach seeks to exploit the inherent spatio-temporal relationships among service popularity within a MEC system. To operationalize the ConvLSTM model for this purpose, we discretize time into time slots and construct a heatmap that collectively depicts the distribution of service popularity in each MECS during a given time slot. Treating each heatmap in a time slot as a frame, we conceptualize the sequence of heatmaps as a video. Subsequently, we train a ConvLSTM model to predict the upcoming heatmap based on a few recent heatmaps. At the beginning of each time slot, we leverage the trained model to identify the most popular services from the predicted heatmap. These services are then selected to be cached on each MECS for the ensuing time slot. Utilizing a heatmap containing the popularity of each service across MECSs as input enables us to comprehensively predict the popularity of all services in all MECSs simultaneously. This holistic approach contributes to an enhanced service cache hit rate in each MECS. In addition, our methodology stands out by requiring only a single predictor model, as opposed to conventional methods necessitating separate prediction models for each service and MECS. This consolidation not only streamlines the predictive process but also significantly improves the scalability of a MEC system. Our main contributions can be summarized as follows.

• We propose a framework for proactive service caching by taking a deep learning approach. Inspired by the ConvLSTM model that successfully predicts video frames by exploiting the hidden spatio-temporal relationships in the frames, we incorporate the ConvLSTM model as a fundamental element in our proactive service caching methodology.

• We propose a procedure that utilizes the ConvLSTM model for accurately predicting service popularity for each MECS over time. We construct heatmaps collectively representing the distribution of service popularity in a MEC system during each time slot. Treating each heatmap in a time slot as a frame, we conceptualize the sequence of heatmaps as a video. We predict the next heatmap by using the ConvLSTM model and identify the most popular services for the upcoming time slot from the predicted heatmap.

• Through simulation studies using real-world datasets, we verify that our method outperforms the conventional LSTM-based method in terms of the cache hit rate and the number of model parameters required to predict service popularity in a MEC system.
The rest of the paper is organized as follows. In Section 2, we discuss the related works. We explain the system model and formalize the problem in Section 3. In Section 4, we present our proactive caching strategy. In Section 5, we verify the proposed method by comparing its performance with that of the conventional method through simulation studies using real-world datasets. We conclude the paper with future research directions in Section 6. Before we proceed, in Table 1, we present the notations used in this paper.
Table 1. Notations used in this paper (fragment): a heatmap image of a MEC system at the end of a time slot t; Ω(t); predicted heatmap image of a MEC system at the start of a time slot t; δ m,k (t).

Related Works
Various cache management methods have been proposed for a MEC system. In [19,20], a game-theoretic approach is taken for making a caching decision. After modeling the interaction between a MECS and wireless devices as a two-stage Stackelberg game under incomplete information, the authors in [19] devise two strategies for the game. The first strategy is used for a MECS to make the caching and the price decision. The second strategy is used for a wireless device to make a task offloading decision. A stochastic differential game is used in [20] to formulate a dynamic cache control problem. Since the computational complexity of the problem is huge, both mean-field game theory and stochastic geometry are used to transform the original problem into a more tractable form. Then, the authors propose an iterative algorithm for optimal caching control. However, these game-theoretic methods require information exchange among the players. Thus, the control overhead increases in proportion to the number of devices and the number of MECSs. In addition, they require an iterative process to reach an equilibrium state. Therefore, they can neither make a decision until the iterative process converges nor adapt quickly to topology changes.
In [21], a service cache placement problem and a task offloading problem are jointly investigated. The joint problem is formulated as an average service response time minimization problem with a long-term energy consumption constraint. To solve the problem without knowing future information, an online algorithm is devised by using the Lyapunov optimization framework and deep reinforcement learning (DRL). DRL-based methods are also proposed in [22,23]. The authors in [22] propose a cooperative video caching method for a MEC system by combining the knowledge graph (KG) and DRL. The KG is used to determine a set of candidate videos by inferring [...]. In [23], a model-free reinforcement learning algorithm called RL-Cache is proposed to decide whether or not to accept a requested object into the content distribution network's cache. RL-Cache uses the size, frequency, and request recency of an object as input features for a feed-forward neural network that determines the admission probability of an object. However, these DRL-based methods do not exploit the spatio-temporal relation among the cached services in each MECS. In addition, they are reactive in that they make a caching decision in an on-demand manner. In [24], the authors analyze the road side unit (RSU) historical content request data and calculate the spatial-temporal correlation among the RSUs in terms of the request number, which is used to predict the service popularity in each RSU. They also employ multi-agent RL (MARL) to cope with the diversity of the popularity. However, unlike the method in [24] that uses a traditional statistical tool to directly calculate the correlation, we adopt a deep learning model to use the hidden features in the spatio-temporal relationships.
To make a caching decision in a proactive manner, the dynamics of service popularity are often considered as a time series, and a deep learning model that can handle sequential data is adopted to predict the future service popularity. An LSTM encoder and decoder model is used in [25] to map a sequence of objects requested so far to the top-k objects most likely to be requested in the future. In [26], a two-step model is proposed. In the first step, the popularity of a video genre in a future time slot is predicted. Then, the popularity of a video belonging to the prevalent genre in the same time slot is further predicted in the second step. An LSTM model is also used in [27]. The authors predict the number of content requests by the seq2seq LSTM model. Using the priority of the content and predicted popularity, they solve the cache placement problem with the binary particle swarm optimization (BPSO) technique. However, since BPSO is an iterative process, it takes time to find a solution.
Attention-based methods have been proposed for a caching problem in a MEC system. In [15], the authors propose a multi-head attention-based popularity prediction model (MAPP). After proposing an architecture that integrates MEC into a social content-centric network, they use MAPP to predict content popularity by considering the history of content popularity, social relationships, and geographic location. However, they mainly use temporal correlations in the data when predicting the popularity. A transformer-based edge (TEDGE) caching scheme based on the attention-based vision transformer (ViT) is proposed in [28]. At the end of each time window, they aim to determine the top-k popular contents in the future by directly mapping the request patterns of all contents during the current time window to their future request patterns. In [29], the authors improve TEDGE by proposing a self-supervised caching scheme called CoPo. CoPo distinguishes the input samples by utilizing the contrastive learning paradigm. Since CoPo does not need the input request patterns of all the contents at the same time, it can reduce the algorithm complexity even when the number of contents increases. In addition, CoPo does not require manual labeling for model training. In [30], a parallel ViT with cross attention (ViT-CAT) fusion is proposed, which is composed of two ViT networks. The first ViT network collects the temporal correlation of a content, while the second ViT network captures the spatial correlation between different contents.
In [31], the authors focus on the popularity prediction problem while preserving the privacy of users. They propose an efficient content popularity prediction of privacy-preserving (CPPPP) scheme based on federated learning and the Wasserstein generative adversarial network (WGAN). The goal of CPPPP is to predict future content popularity by generating fake samples that can represent the overall trend of content popularity. However, since they use federated learning, model parameters must be exchanged between users and MECSs multiple times. Thus, there is a high possibility that learning will be delayed due to the increased load on the access network.
Our method differs from these methods mainly in two ways. The popularity of a service within a MECS is determined by the set of users within its service region and the service preference of the users, both of which change over time due to user mobility and changes in their service request patterns. Therefore, during a time interval, the popularity of a service within a single MECS is influenced not only by the users that it serves but also by the users moving from its neighboring MECSs. However, previous methods consider only the temporal correlation among the service popularity within a single MECS when they predict the popularity of services. In contrast, we consider not only the temporal correlation in a single MECS but also the spatial correlation among MECSs to predict the service popularity distribution in a MEC system. In other words, we comprehensively consider the service popularity distribution over a MEC system and predict the popularity of each service in each MECS at the same time. In terms of implementation, our method reduces the amount of storage required to predict the service popularity. Since previous methods independently predict the service popularity in each MECS, the number of predictors required in a MEC system increases as the number of services and the number of MECSs increase. However, since our method simultaneously predicts the future popularity of all services in all MECSs by utilizing the spatio-temporal correlation structure, it needs only one predictor. We summarize the related works in Table 2.

Table 2. Summary of related works.
Ref. | Control | Objective | Algorithm
[19] | pricing | maximize MECS profit | Stackelberg game, knapsack
[20] | cost function | optimize distributed caching | Mean-field game
[21] | virtual queue | minimize completed task delay | Lyapunov optimization with DRL
[22] | graph among contents | minimize service delay | Knowledge graph with DRL
[23] | admission probability | maximize cache hit rate | Model-free RL
[24] | spatio-temporal correlation | maximize cache hit rate | Multi-agent RL
[25] | temporal correlation | boost cache hit rate | LSTM model
[26] | temporal correlation | improve cache hit rate | LSTM model
[27] | temporal correlation | maximize cache hit rate | LSTM model with BPSO
[28] | temporal attention | predict top-K popular contents | ViT model
[29] | temporal attention | predict content popularity | LSTM with CL
[30] | spatio-temporal correlation | predict content popularity | ViT with cross attention
[31] | fake data | high cache hit rate | FL with GAN
proposed | spatio-temporal correlation | predict top-K popular services | ConvLSTM

Time Series Prediction Methods
Time series data analysis and prediction have been extensively researched for a long time and have found applications in various industries, including finance, climate, health-care, transportation, and more. Research endeavors to analyze or predict time series data through traditional statistical models have been actively conducted. The drawback of the traditional statistical models lies in their application of regression to certain fixed factors using the most recent historical data. Therefore, the prediction performance tends to decrease when they are applied to volatile time series.
Time series analysis techniques employing deep learning technologies demonstrate superior performance by overcoming these challenges. These methods are categorized into RNN-based, Transformer-based, and MLP-based techniques. The RNN-based approaches [32][33][34] enable effective representation of the dynamic characteristics inherent in time series data. However, the RNN-based approach requires the computation and management of hidden states at every time step, leading to an increased computational cost and prolonged training times, particularly when dealing with extensive datasets.
The Transformer-based method improves the performance by addressing the structural limitations of the RNN-based method. Unlike RNN-based techniques that utilize recurrent networks, the Transformer-based technique employs positional encoding to indicate the temporal sequence of the data [35,36]. By selectively assigning weights to crucial information from the past, the Transformer-based method learns to focus on significant features and past trends. However, since the Transformer-based model has an intricate architecture, it is complex to configure and its results are difficult to interpret.
MLP-based models, such as N-BEATS [37] and NHITS [38], address the challenges faced by the Transformer-based method. Since they use a simple MLP architecture, they are simple and flexible, and the learned time series features are easy to interpret. These methods exhibit excellent performance in both univariate and multivariate time series prediction. However, they predict solely based on the temporal correlations within the given time series data. In our service popularity prediction problem, the time series data in each MECS have a relationship not only in the time domain but also in the space domain. If these spatio-temporal relationships can be utilized, the service popularity prediction accuracy will increase. Therefore, we adopt the ConvLSTM model to utilize the spatio-temporal correlation inherent within the service popularity time series data among the MECSs.

ConvLSTM Model
The ConvLSTM model is proposed to solve a spatio-temporal sequence forecasting problem by combining the fundamental principles of the LSTM model and the CNN model [17]. The LSTM model has a memory cell storing state information. The information flow through a memory cell is controlled by three gates: an input gate, a forget gate, and an output gate. By efficiently handling temporal correlations in the sequence data, the LSTM model has effectively solved real-life sequence modeling problems. However, since the LSTM model uses 1D tensors, it is not adequate for spatial data.
The ConvLSTM model resolves the issue by encoding the spatial information and incorporating a convolutional structure into the LSTM model. In Figure 1, we show the operation of a ConvLSTM cell at time step t. The notations used in this figure are summarized in Table 3.
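In the ConvLSTM model, all inputs $X_t$, cell outputs $c_t$, hidden states $h_t$, and the outputs of all gates ($i_t$, $f_t$, $o_t$) are 3D tensors. The main process in a ConvLSTM cell is described as

\[
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * h_{t-1} + W_{ci} \circ c_{t-1} + b_i\right), \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * h_{t-1} + W_{cf} \circ c_{t-1} + b_f\right), \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * h_{t-1} + b_c\right), \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * h_{t-1} + W_{co} \circ c_t + b_o\right), \\
h_t &= o_t \circ \tanh(c_t),
\end{aligned}
\]

where $\sigma$ is the sigmoid function, the operator $*$ represents the convolutional operation, and $\circ$ denotes the Hadamard product. The ConvLSTM model has been used to predict the next frame in a video [18,39].

Table 3. Notations used in Figure 1.
Notation | Description
$b_i$ | Bias in an input gate.
$W_{ci}$ | Model parameters used for element-wise product with $c_{t-1}$ in an input gate.
$W_{xf}$ | Model parameters used for convolution of $X_t$ in a forget gate.
$W_{hf}$ | Model parameters used for convolution of $h_{t-1}$ in a forget gate.
$b_f$ | Bias in a forget gate.
$W_{cf}$ | Model parameters used for element-wise product with $c_{t-1}$ in a forget gate.
$W_{xc}$ | Model parameters used for convolution of $X_t$ when making $c_t$.
$W_{hc}$ | Model parameters used for convolution of $h_{t-1}$ when making $c_t$.
$b_c$ | Bias used when making $c_t$.
$W_{xo}$ | Model parameters used for convolution of $X_t$ in an output gate.
$W_{ho}$ | Model parameters used for convolution of $h_{t-1}$ in an output gate.
$b_o$ | Bias in an output gate.
$W_{co}$ | Model parameters used for element-wise product with $c_t$ in an output gate.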
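To make the gate structure of a ConvLSTM cell concrete, the following is a minimal sketch of one cell update in plain Python. It is an illustration under simplifying assumptions, not the model used in this paper: the state has a single channel (2D grids instead of 3D tensors), the biases are scalars, and the function names (`conv2d_same`, `combine`, `convlstm_step`) and the parameter dictionary `p` are hypothetical. As is usual in deep learning code, the "convolution" is implemented as a zero-padded cross-correlation.

```python
import math

def conv2d_same(x, w):
    """Zero-padded 2D cross-correlation of grid x with an odd-sized kernel w."""
    rows, cols = len(x), len(x[0])
    kr, kc = len(w) // 2, len(w[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i, w_row in enumerate(w):
                for j, w_ij in enumerate(w_row):
                    rr, cc = r + i - kr, c + j - kc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += w_ij * x[rr][cc]
            out[r][c] = acc
    return out

def combine(fn, *grids):
    """Apply fn element-wise across grids of equal shape."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*grids)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def convlstm_step(x_t, h_prev, c_prev, p):
    """One single-channel ConvLSTM update.

    p maps names such as 'W_xi' (kernel), 'W_ci' (grid shaped like the state)
    and 'b_i' (scalar bias, an assumption of this sketch) to parameters.
    """
    def pre(gate, c_name=None, c_grid=None):
        # Pre-activation: W_x* * X_t + W_h* * h_{t-1} (+ W_c* o c) + b_*
        a = conv2d_same(x_t, p["W_x" + gate])
        b = conv2d_same(h_prev, p["W_h" + gate])
        s = combine(lambda u, v: u + v + p["b_" + gate], a, b)
        if c_name is not None:
            s = combine(lambda u, wc, cv: u + wc * cv, s, p[c_name], c_grid)
        return s

    i_t = combine(sigmoid, pre("i", "W_ci", c_prev))   # input gate
    f_t = combine(sigmoid, pre("f", "W_cf", c_prev))   # forget gate
    g_t = combine(math.tanh, pre("c"))                 # candidate cell content
    c_t = combine(lambda f, c, i, g: f * c + i * g, f_t, c_prev, i_t, g_t)
    o_t = combine(sigmoid, pre("o", "W_co", c_t))      # output gate uses the new c_t
    h_t = combine(lambda o, c: o * math.tanh(c), o_t, c_t)
    return h_t, c_t
```

With all parameters set to zero, every gate evaluates to 0.5 and the candidate content to 0, so the cell state is simply halved; this gives a quick sanity check of the wiring.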

System Model
We consider a MEC system composed of $M$ MECSs and a remote server hosting a controller. We denote the set of MECSs in the system as $\mathcal{M} = \{1, \ldots, M\}$. We depict the system model in Figure 2, which also shows the overall process flow. We assume that each MECS $m \in \mathcal{M}$ is colocated with a base station, and the server is connected to all the MECSs through a backhaul network. Periodically, the controller collects the service popularity information from each MECS and makes a caching decision for all $m \in \mathcal{M}$. We denote the set of services that a MEC system provides as $\mathcal{K}$ and $K = |\mathcal{K}|$, where $|\mathcal{K}|$ is the cardinality of the set $\mathcal{K}$. Each user asks for a service by offloading its tasks to the nearest MECS by using a wireless link between the user and the base station colocated with the MECS. If a user $u$ offloads its task to a MECS $m$ and the task is processed by $m$, the MECS $m$ is called the serving MECS of the user $u$. We consider a discrete-time controller by dividing time into slots of length $\tau$. The server can provide all the services in $\mathcal{K}$, while a MECS $m$ can serve a subset of $\mathcal{K}$ depending on its service cache size $\eta_m$ and the caching decision made by the controller. We introduce a variable $a_{m,k}(t)$ to represent the service cache state of a MECS $m$ during a time slot $t$. Specifically, $a_{m,k}(t) = 1$ indicates that a MECS $m$ contains the service $k$ in its service cache during a time slot $t$. Otherwise, $a_{m,k}(t) = 0$. Then, the service cache state of a MECS $m$ during a time slot $t$ is represented by the following service cache vector.

\[ a_m(t) = \{a_{m,k}(t) \mid k \in \mathcal{K}\}. \]
Accordingly, the service cache state of a MEC system during a time slot $t$ is represented by an $M \times K$ matrix

\[ A_{M \times K}(t) = \{a_m(t) \mid m \in \mathcal{M}\}^T. \]

We denote the set of services in the service cache of a MECS $m$ during a time slot $t$ as

\[ S_m(t) = \{k \in \mathcal{K} \mid a_{m,k}(t) = 1\}. \]

Our goal is to devise a controller that determines an optimal $S_m(t)$, $\forall m \in \mathcal{M}$ (i.e., an optimal $A_{M \times K}(t)$) at the beginning of a time slot $t$, so that the cache hit rate in a MEC system during the time slot is maximized. We denote the set of users in the service area of a MECS $m$ during a time slot $t$ as $U_m(t)$. We introduce an indicator function $b_{m,u,k}(t)$. When $b_{m,u,k}(t) = 1$, a task offloaded from a user $u$ to a MECS $m$ during a time slot $t$ requests a service $k$. Otherwise, $b_{m,u,k}(t) = 0$. Then, our problem is to determine the $A_{M \times K}(t)$ that maximizes the cache hit rate of the MEC system while each MECS $m$ caches at most $\eta_m$ services; we refer to this problem as Equation (5).
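The displayed form of the caching problem is not reproduced in this text. One formulation that is consistent with the definitions of $a_{m,k}(t)$, $b_{m,u,k}(t)$, $U_m(t)$, and $\eta_m$ above is sketched below; it is an assumption for illustration and not necessarily the authors' exact Equation (5). The objective is the fraction of requests that find their service cached, and the constraints bound the cache size of each MECS.

```latex
\begin{aligned}
\max_{A_{M\times K}(t)} \quad
  & \frac{\sum_{m\in\mathcal{M}}\sum_{u\in U_m(t)}\sum_{k\in\mathcal{K}} a_{m,k}(t)\, b_{m,u,k}(t)}
         {\sum_{m\in\mathcal{M}}\sum_{u\in U_m(t)}\sum_{k\in\mathcal{K}} b_{m,u,k}(t)} \\
\text{s.t.} \quad
  & \sum_{k\in\mathcal{K}} a_{m,k}(t) \le \eta_m, \quad \forall m\in\mathcal{M}, \\
  & a_{m,k}(t)\in\{0,1\}, \quad \forall m\in\mathcal{M},\ k\in\mathcal{K}.
\end{aligned}
```

Because the denominator does not depend on the caching decision, maximizing this ratio is equivalent to maximizing the number of cache hits in each MECS separately.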
An optimal $A_{M \times K}(t)$ is determined by the popularity of each service in each MECS during a time slot $t$. In other words, if we know the popularity of each service within the service area of each MECS during a time slot, the optimal $S_m(t)$, which we denote as $S^*_m(t)$, can be determined by selecting the top $\eta_m$ services with the highest popularity in each MECS $m$. Then, the element $a^*_{m,k}(t)$ in the optimal $a_m(t)$, denoted by $a^*_m(t)$, is determined as

\[ a^*_{m,k}(t) = \begin{cases} 1, & \text{if } k \in S^*_m(t), \\ 0, & \text{otherwise,} \end{cases} \]

and the optimal $A_{M \times K}(t)$ becomes $A^*_{M \times K}(t) = \{a^*_m(t) : m \in \mathcal{M}\}^T$. If we denote the popularity of a service $k$ in a MECS $m$ during a time slot $t$ as $f_{m,k}(t)$, it is determined by the service preference of the users in $U_m(t)$, which is quantified as $\sum_{u \in U_m(t)} b_{m,u,k}(t)$. However, in terms of implementation, a controller cannot know $b_{m,u,k}(t)$ at the beginning of a time slot $t$, which makes it difficult to find an optimal $A_{M \times K}(t)$ at the beginning of each time slot.
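As a small illustration of the quantity being maximized, the cache hit rate of a slot can be computed from the cache sets and the requests observed during that slot. The sketch below is hypothetical: the function name `cache_hit_rate` and the representation of requests as `(mecs_id, service_id)` pairs are assumptions made for the example.

```python
def cache_hit_rate(cache_sets, requests):
    """Fraction of requests whose service is cached at the serving MECS.

    cache_sets: dict mapping a MECS id to the set of cached service ids (S_m(t)).
    requests:   iterable of (mecs_id, service_id) pairs observed in the slot.
    """
    hits = 0
    total = 0
    for m, k in requests:
        total += 1
        if k in cache_sets.get(m, set()):
            hits += 1
    return hits / total if total else 0.0
```

For instance, with `cache_sets = {0: {1, 2}, 1: {3}}` and requests `[(0, 1), (0, 4), (1, 3), (1, 3)]`, three of the four requests are hits.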
To resolve these issues, we take a measurement-based approach inspired by a deep learning model for a sequence forecasting problem. The popularity of a service $k$ in a MECS $m$ is affected by $\sum_{u \in U_m(t)} b_{m,u,k}(t)$. User service preferences ($b_{m,u,k}(t)$) vary in space and time because both $U_m(t)$ and the service preferences in $U_m(t)$ change in time and space. For example, the services used by users during daytime at the workplace differ from those utilized in the evening at home. Specifically, since a user can move from a MECS $n$ to its neighboring MECS $m$ during a time slot and vice versa, $U_m(t)$ is influenced not only by $U_m(t-1), U_m(t-2), \ldots$, but also by $U_n(t-1), U_n(t-2), \ldots$, where $n \in \mathcal{M} \setminus \{m\}$. Therefore, the service popularity distribution in a MECS $m$ and that in a MECS $n$ during a time slot exert mutual influence on each other. In other words, $f_{m,k}(t)$ is affected by the spatio-temporal correlation among the MECSs in terms of the set of users ($U_m(t)$ and $U_n(t)$) and their service preferences ($b_{m,u,k}(t)$ and $b_{n,u,k}(t)$). Therefore, we transform the problem in Equation (5) into a spatio-temporal sequence forecasting problem as follows. We denote the service popularity matrix of a MEC system during a time slot $t$ as the $M \times K$ matrix $F_{M \times K}(t) = \{f_m(t) \mid m \in \mathcal{M}\}^T$, where $f_m(t) = \{f_{m,k}(t) \mid k \in \mathcal{K}\}$. Then, we convert the problem in Equation (5) into finding an optimal $F_{M \times K}(t)$ at the end of each time slot as follows.

\[ F^*_{M \times K}(t) = \arg\max_{F_{M \times K}(t)} \Pr\left(F_{M \times K}(t) \mid F_{M \times K}(t - w_a), \ldots, F_{M \times K}(t - 1)\right), \tag{7} \]

where $w_a$ is the number of the most recent service popularity matrices used for predicting $F^*_{M \times K}(t)$, and the notation $\Pr(X)$ represents the probability of $X$. Hereafter, we call $w_a$ the window size. Once $F^*_{M \times K}(t) = \{f^*_1(t), \ldots, f^*_M(t)\}$ is determined, the optimal service cache in each MECS $m$ is constructed by selecting the top $\eta_m$ services with the highest popularity in $f^*_m(t)$. To quickly solve the popularity prediction problem in Equation (7) at the beginning of each time slot, we propose a deep learning approach, which will be detailed in Section 4.
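The role of the window size can be illustrated by how a sequence of popularity matrices is cut into training samples: each sample pairs $w_a$ consecutive matrices with the matrix that follows them. The helper below is a hypothetical sketch (the name `make_windows` is not from the paper), and it works on any list, so plain integers can stand in for matrices when checking the indexing.

```python
def make_windows(heatmaps, w_a):
    """Pair every run of w_a consecutive items with the item that follows it.

    heatmaps: list of popularity matrices ordered by time slot.
    w_a:      window size, i.e. how many recent matrices the predictor sees.
    Returns a list of (inputs, target) tuples.
    """
    if w_a <= 0:
        raise ValueError("window size must be positive")
    return [(heatmaps[t - w_a:t], heatmaps[t]) for t in range(w_a, len(heatmaps))]
```

With five time slots and `w_a = 3`, this yields two samples: slots 0 to 2 predicting slot 3, and slots 1 to 3 predicting slot 4.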

Proactive Service Caching Method
In Figure 3, we show the overall procedure of our proactive service caching scheme. Our method is composed of two main modules. The first module is responsible for collectively representing the popularity distribution of each service within the MEC system. A controller periodically collects $\sum_{u \in U_m(t)} b_{m,u,k}(t)$ from each MECS $m \in \mathcal{M}$ and builds an $M \times K$ heatmap image to collectively represent the service popularity of each service in each MECS. The second module collectively determines the services to be cached by each MECS for the next time slot with the ConvLSTM model. In the second module, a controller takes the recent $w_a$ consecutive heatmap images and predicts the next heatmap image. By using the estimated service popularity of each service in each MECS contained in the predicted heatmap image, a controller simultaneously determines the services to be cached by each MECS for the next time slot. We will detail the operation of each module in Sections 4.1 and 4.2.
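The first module can be sketched as a simple counting step: every request observed during a slot increments the pixel of its serving MECS and requested service. The function name `build_heatmap` and the `(mecs_id, service_id)` request format are assumptions of this example, and the sketch stops at raw counts; the normalization applied before prediction is not shown.

```python
def build_heatmap(requests, num_mecs, num_services):
    """Count requests per (MECS, service) pair for one time slot.

    requests: iterable of (mecs_id, service_id) pairs, both zero-based.
    Returns an M x K list of lists whose (m, k) entry is the number of
    requests for service k received by MECS m.
    """
    heatmap = [[0] * num_services for _ in range(num_mecs)]
    for m, k in requests:
        heatmap[m][k] += 1
    return heatmap
```

For example, three requests `[(0, 1), (0, 1), (1, 2)]` in a system with two MECSs and three services produce the rows `[0, 2, 0]` and `[0, 0, 1]`.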
By considering each $f_{m,k}(t)$ as the $(m, k)$-th pixel in the heatmap image, a controller constructs an $M \times K$ grayscale heatmap image with the set of $f_{m,k}(t)$ values.
Since each $f_{m,k}(t) \ll$ ... Then, a controller obtains a normalized service popularity vector for a MECS $m$ during a time slot $t$ as follows.
Since the cache size of a MECS $m$ is $\eta_m$, a controller determines the services for $S_m(t+1)$ by selecting the most popular $\eta_m$ services according to $\nu_m(t+1)$. In other words, $S_m(t+1)$ is composed of the services whose predicted popularity is larger than or equal to $\nu_{m,\eta_m}(t+1)$. We summarize the service cache decision algorithm in Algorithm 1.
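The cache decision step can be sketched as a top-$\eta_m$ selection on each row of the predicted heatmap. This is an illustrative reading of the description above, not a reproduction of Algorithm 1: the function names are hypothetical, and ties are broken in favor of the lower service id so that exactly $\eta_m$ services are returned (a pure threshold rule could return more when popularities tie).

```python
def select_services(predicted_row, eta):
    """Return the ids of the eta services with the highest predicted popularity.

    Ties are broken by the lower service id (an assumption of this sketch).
    """
    ranked = sorted(range(len(predicted_row)), key=lambda k: (-predicted_row[k], k))
    return set(ranked[:eta])

def cache_decision(predicted_heatmap, cache_sizes):
    """Apply select_services to every MECS row of a predicted M x K heatmap."""
    return [select_services(row, eta) for row, eta in zip(predicted_heatmap, cache_sizes)]
```

Given the predicted rows `[0.1, 0.5, 0.4]` and `[0.3, 0.3, 0.2]` with cache sizes 2 and 1, the first MECS caches services 1 and 2, and the second caches service 0.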

Performance Evaluation
In this section, we evaluate the performance of our method through simulation studies using real-world datasets. We compare our method with an LSTM method in terms of the service popularity prediction behavior and the cache hit rate. In the LSTM method, each MECS has an LSTM predictor that estimates the popularity of each service during the next time slot. After predicting the popularity of all services, the LSTM method determines $S_m(t)$ by choosing the top $\eta_m$ services with the highest predicted popularity. We use the publicly known default values of the ConvLSTM model and the LSTM model for configuring their hyperparameters. Specifically, we use the hyperparameters in [40] to configure the ConvLSTM model and the hyperparameters in [41] to configure the LSTM model. We summarize the parameters used for each model in Table 4. For our simulation study, we use a computer equipped with an Intel i9-10980XE CPU and four Nvidia GeForce RTX 3080 GPUs. The size of the random access memory is 128 GB, and its operating system is Windows 10. When we run the ConvLSTM model, we use Python

Simulation Setup
In Figure 4, we show the topology of the MEC system we configured for the simulation studies. We evenly deploy $M = 9$ MECSs in a 3 × 3 grid, which ranges from (0, 0) to (2 km, 2 km), and locate one MECS at each grid point. We set the number of services that the MEC system provides to $K = 64$ and the cache size of each MECS to $\eta$. To configure the popularity of each service in each MECS, we use the MovieLens 25M dataset $D$ collected from 1 January 2019 to 21 November 2019 by GroupLens Research [42]. The dataset contains 1,202,602 ratings for 41,440 movies by 10,619 users. Assuming that the pattern in which users rate movies is similar to the pattern in which they request MEC services, we configure the popularity of each service according to the popularity of the movies in the dataset. Specifically, we define the popularity of a movie $i$ in $D$ as $g_i = n_i / \sum_{j \in D} n_j$, where $n_i$ is the number of times that movie $i$ is rated in the dataset. We sort the $g_i$ in descending order and investigate their distribution. We find that the popularity distribution follows the Pareto principle in that the top 20% most popular movies account for 84.36% of all ratings. Assuming that the popularity of a service in a MEC system also follows the Pareto principle, we configure the popularity of a service $k \in K$ in a MECS $m$ by setting it to the aggregate popularity of a set of movies as follows. We denote the set of movies belonging to the top 20% most popular movies as $P$. Without loss of generality, we assume that both the elements in $P$ and those in $D - P$ are sorted in descending order of $g_i$. We set $p_m = 0.2K$ and divide the elements in $P$ evenly into $p_m$ subsets $P_0, \ldots, P_{p_m - 1}$. We also divide the elements in $D - P$ evenly into $K - p_m$ subsets $P_{p_m}, \ldots, P_{K-1}$. To determine the popularity of each service in each MECS, we randomly assign a service identification number $k \in [0, K-1]$ to each subset $P_0, \ldots, P_{K-1}$ without duplication. Therefore, if a number $k$ is assigned to $P_j$, the popularity of the service $k$ in a MECS $m$ becomes $\alpha_{m,k} = \sum_{i \in P_j} g_i$.
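The popularity configuration described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy rating counts, and the rounding of $0.2K$ to an integer are assumptions made here so that the example runs.

```python
import random


def movie_popularity(rating_counts):
    """g_i = n_i / sum_j n_j for every movie i."""
    total = sum(rating_counts.values())
    return {movie: n / total for movie, n in rating_counts.items()}


def split_evenly(items, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for j in range(parts):
        size = base + (1 if j < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def service_popularity(g, num_services, rng, top_fraction=0.2):
    """Return {k: alpha_k} for one MECS by mapping movie subsets to service IDs."""
    ranked = sorted(g, key=g.get, reverse=True)       # descending g_i
    num_top = round(top_fraction * len(ranked))       # size of P (top 20% of movies)
    p_m = round(top_fraction * num_services)          # assumption: 0.2K rounded to an integer
    subsets = (split_evenly(ranked[:num_top], p_m)
               + split_evenly(ranked[num_top:], num_services - p_m))
    service_ids = list(range(num_services))
    rng.shuffle(service_ids)                          # random assignment without duplication
    return {k: sum(g[i] for i in subset)
            for k, subset in zip(service_ids, subsets)}


# Hypothetical toy data: 100 movies with a skewed rating count.
rng = random.Random(0)
counts = {f"movie{i}": 1000 // (i + 1) for i in range(100)}
g = movie_popularity(counts)
alpha = service_popularity(g, num_services=10, rng=rng)
print(sorted(alpha.items()))
```

Calling `service_popularity` once per MECS with the same `g` yields a different random assignment for each server, which mirrors how the initial distributions are drawn independently.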
We denote the service popularity distribution in a MECS $m$ during a time slot $t$ as $\Psi_m(t) = \{\alpha_{m,k}(t) \mid k \in K\}$. For each MECS $m \in M$, we set $\Psi_m(0)$ by randomly assigning a number $k \in [0, K-1]$ to each subset $P_0, \ldots, P_{K-1}$ without duplication. In other words, we set $\Psi_m(0) \neq \Psi_n(0)$ if $m \neq n \in M$. At the beginning of the simulation, we randomly deploy $U = 1000$ users in the grid according to the Poisson point process and number them from 0 to 999. As time elapses, $\Psi_m(t)$ changes because of the mobility of users in the MEC system. We set the mobility pattern of each user by using a real-world dataset. Specifically, we use the Divvy historical trip dataset, which records the trip start day and time, trip end day and time, trip start station, and trip end station [43]. Among the historical trip data, we use the dataset $T$ containing 767,650 trip records in Chicago during July 2023. We set the $x$-th record in the dataset as the mobility pattern of the user $x \bmod U$ in the MEC system. Therefore, the service popularity distribution in the MEC system changes in time and space.
To configure the service requests of each user in the MEC system, we use the records in $D$. For instance, let us consider the $a$-th record in $D$, which says that a movie $b \in P_j$ is rated at time $c$. If a MECS $m$ is the nearest MECS of a user $u = a \bmod U$ and $k$ is assigned to $P_j$ in the MECS $m$, we let the user $u$ request the service $k$ from the MECS $m$ at time $c$. Since $D$ contains $U$ ratings per six hours on average, we set the duration of a time slot $\tau$ to 6 h so that each user in the MEC system issues one service request per time slot on average. We set the window size $w_a = 12$ for both our method and the LSTM method. By using the datasets $D$ and $T$, we measure $\Omega(t) = \{\nu_1(t), \ldots, \nu_M(t)\}$ at the end of each time slot. We shuffle all $\Omega(t)$s and randomly select 90% of them as a training dataset and use the rest as a validation dataset. We use the last 10% of the total $\Omega(t)$s as a test dataset. We summarize the parameters used for the simulation study in Table 5.
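The mapping from a rating record to a service request can be illustrated as follows. This is a simplified sketch under stated assumptions: the grid points are taken to be 1 km apart, user positions are a static snapshot (the mobility trace is ignored), and `assignment[m][j]`, `to_request`, and the sample values are hypothetical names and data introduced only for this example.

```python
import math

TAU_S = 6 * 3600  # time slot duration tau = 6 h, in seconds

# Assumed layout: 3 x 3 grid points spaced 1 km apart over (0, 0) to (2 km, 2 km).
MECS_POSITIONS = [(x, y) for y in (0.0, 1.0, 2.0) for x in (0.0, 1.0, 2.0)]


def nearest_mecs(position):
    """Index of the MECS closest to a user position (Euclidean distance)."""
    return min(range(len(MECS_POSITIONS)),
               key=lambda m: math.dist(position, MECS_POSITIONS[m]))


def time_slot(timestamp_s, start_s):
    """Index of the time slot that contains a rating timestamp."""
    return int((timestamp_s - start_s) // TAU_S)


def to_request(a, subset_j, timestamp_s, user_positions, assignment, start_s):
    """Turn the a-th rating record (movie in P_j, rated at time c) into a request.

    assignment[m][j] is the service ID assigned to subset P_j at MECS m.
    Returns (user u, MECS m, service k, time slot index).
    """
    u = a % len(user_positions)            # u = a mod U
    m = nearest_mecs(user_positions[u])    # nearest MECS of user u
    k = assignment[m][subset_j]            # service mapped to P_j at MECS m
    return u, m, k, time_slot(timestamp_s, start_s)


# Hypothetical example with U = 2 users and 4 movie subsets.
users = [(0.9, 1.2), (1.8, 0.1)]
assignment = [[3, 2, 1, 0] for _ in MECS_POSITIONS]
print(to_request(5, 1, 7 * 3600, users, assignment, start_s=0))
```

Counting the resulting `(m, k)` pairs per time slot gives the per-MECS request counts from which the measured popularity is derived.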

Prediction Accuracy
We denote the error involved in predicting the popularity of each service $k$ in a MECS $m$ during a time slot $t$ as $\delta_{m,k}(t) = \nu_{m,k}(t) - \hat{\nu}_{m,k}(t)$, where $\hat{\nu}_{m,k}(t)$ is the predicted popularity, and compare the distribution of $\delta_{m,k}(t)$ for all $k \in K$ in different MECSs in Figure 5. In this figure, we illustrate the results for four different cases based on the applied methods and MECSs. In each subfigure, the x-axis is the service index $k \in [0, K-1]$ and the y-axis is the prediction error. Thus, for each service $k$, each subfigure shows the distribution of $\delta_{m,k}(t)$ in a MECS $m$ as a box plot. We observe in the figure that, regardless of the true popularity of a service, the 75th percentile of $\delta_{m,k}$ obtained by our method is smaller than that resulting from the LSTM method. For example, in MECS 4, the 75th percentile of $\delta_{m,k}$ is 0.054 when our method is used, while it is 0.11 when the LSTM method is used. In addition, when the LSTM method is used, there are services whose median $\delta_{m,k}$ is much higher than that obtained by our method. For example, the median $\delta_{m,k}$ of the service $k = 33$ in MECS 4 is 0.080 with our method, while it is 0.90 with the LSTM method. We also observe that the interquartile range (IQR), which is the difference between the 75th percentile and the 25th percentile, is smaller when our method is applied than when the LSTM method is used. For example, in MECS 4, the average IQR acquired by our method is 0.034, while it is 0.046 when the LSTM method is applied. These results are attributed to the manner in which each method predicts the service popularity. When the LSTM method is used, each MECS predicts the popularity of each service within its service range, regardless of the service popularity distributions in the other MECSs. On the contrary, our method predicts the popularity of each service in every MECS at the same time by comprehensively considering the service popularity distributions across all the MECSs. Therefore, our method reduces the error in predicting the popularity of each service in each MECS. We obtain the same results in all the other MECSs.
To investigate the influence of the service popularity prediction error on the service caching decision, we inspect the difference between the true $S_m(t)$ and the predicted $S_m(t)$ in each MECS $m$. We note that the true $S_m(t)$ is composed of the $\eta$ most popular services during a time slot $t$, which can be known only at the end of the time slot $t$. On the contrary, the predicted $S_m(t)$ comprises the $\eta$ most popular services according to the popularities estimated by a predictor at the beginning of the time slot $t$. We denote the true $S_m(t)$ as $S^T_m(t)$ and the predicted $S_m(t)$ as $S^P_m(t)$. To quantify the similarity between $S^T_m(t)$ and $S^P_m(t)$, we calculate $a_m(t) = |S^T_m(t) \cap S^P_m(t)| / \eta$ and show the results in Figure 6 for different values of $\eta$. In the figure, we deliberately use the same y-axis range in all subfigures to make the comparison easier. This figure has four subfigures. In each subfigure, the x-axis is the MECS index $m$ and the y-axis is $a_m$. Each subfigure shows the distribution of $a_m(t)$ in each MECS as a box plot. In Figure 6, we observe that our method improves the similarity of the service cache regardless of the MECS position in the grid and the cache size. When $\eta = 8$, compared with the LSTM method, our method achieves an 11.26% relative increase in the cache similarity. Specifically, the average $a_m(t)$ is 54.71% when the LSTM method is used and increases to 60.87% when our method replaces the LSTM method. The standard deviation of $a_m(t)$ is 5.99% when our method is used, while it is 5.74% when the LSTM method is used. When $\eta = 16$, the proposed method improves the average service cache similarity by 13.52% in relative terms. When the LSTM method is used, the average $a_m(t)$ is 54.45%. Our method increases the average $a_m(t)$ to 61.81%. The standard deviation of $a_m(t)$ is 6.56% when our method is used, and it is 5.09% when the LSTM method is used.
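As a small illustration of the cache similarity metric, the following Python sketch selects the $\eta$ most popular services from a true and a predicted popularity vector and computes $a_m(t)$. The function names and the toy popularity values are assumptions made for this example only.

```python
def top_services(popularity, eta):
    """Set of the eta services with the highest popularity."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return set(ranked[:eta])


def cache_similarity(true_popularity, predicted_popularity, eta):
    """a_m(t) = |S^T_m(t) ∩ S^P_m(t)| / eta."""
    s_true = top_services(true_popularity, eta)
    s_pred = top_services(predicted_popularity, eta)
    return len(s_true & s_pred) / eta


# Hypothetical popularities of four services at one MECS.
true_pop = {0: 0.40, 1: 0.30, 2: 0.20, 3: 0.10}
pred_pop = {0: 0.35, 1: 0.10, 2: 0.30, 3: 0.25}
print(cache_similarity(true_pop, pred_pop, eta=2))  # 0.5
```

In this toy case the true cache is {0, 1} and the predicted cache is {0, 2}, so one of the two cached services matches.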

Hit Rate Comparison
In Figure 7, we show the variation of the cache hit rate over time in different MECSs with different cache sizes. To facilitate the comparison, we use the same y-axis range in all subfigures. We observe that the hit rates obtained by our method are higher than those acquired by the LSTM method, regardless of the MECS location and $\eta$. To further verify our method, we compare the distribution of the cache hit rate in each MECS. Specifically, for each MECS, we investigate the difference between the maximum cache hit rate that can be obtained when $S^T_m(t)$ is used and that acquired when $S^P_m(t)$ is used. We denote the cache hit rate when $S^T_m(t)$ is used as $h^T_m(t)$. We also denote the cache hit rate obtained by a popularity prediction method as $h^P_m(t)$. For each time slot, we calculate the relative hit rate $\rho_m(t) = h^P_m(t) / h^T_m(t)$ and show its distribution in Figure 8. This figure has four subfigures. In each subfigure, the x-axis is the MECS index $m$ and the y-axis is $\rho_m(t)$. Each subfigure shows the distribution of $\rho_m(t)$ in each MECS as a box plot.
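The relative hit rate can be made concrete with a short Python sketch. It assumes that a request is a hit when the requested service is in the cached set; the function names and the toy request sequence are hypothetical.

```python
def hit_rate(requests, cached):
    """Fraction of requests in one time slot that are served from the cache."""
    if not requests:
        return 0.0
    return sum(1 for k in requests if k in cached) / len(requests)


def relative_hit_rate(requests, s_true, s_pred):
    """rho_m(t) = h^P_m(t) / h^T_m(t); defined as 0 when h^T_m(t) is 0."""
    h_true = hit_rate(requests, s_true)
    h_pred = hit_rate(requests, s_pred)
    return h_pred / h_true if h_true > 0 else 0.0


# Hypothetical requests received by one MECS during one time slot.
requests = [0, 0, 1, 2, 3, 0, 1]
rho = relative_hit_rate(requests, s_true={0, 1}, s_pred={0, 2})
print(round(rho, 2))  # 0.8
```

Here the ideal cache {0, 1} serves five of the seven requests and the predicted cache {0, 2} serves four, giving a relative hit rate of 4/5.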
In Figure 8, we observe that the proposed method outperforms the LSTM method in terms of $\rho_m$ in all MECSs and for all values of $\eta$. When $\eta = 8$, the average $\rho_m(t)$ is 82.13% when our method is used, while it is 76.78% when the LSTM method is used. In other words, our method improves the average cache hit rate by 6.97% in relative terms. In the case of $\eta = 16$, the proposed method enhances the average cache hit rate by 8.48%. The average $\rho_m(t)$ is 81.08% when our method is used, while the average $\rho_m(t)$ obtained by the LSTM method is 74.74%. We also observe that our method decreases the variability of the cache hit rate. When $\eta = 8$, the average IQR produced by the proposed method is 14.19%, while the LSTM method produces an average IQR of 15.88%. In the case of $\eta = 16$, the proposed method achieves an average IQR of 11.76%, while it is 13.55% when the LSTM method is used. By decreasing the average IQR by more than 10%, our method makes the cache hit rate in a MEC system more stable than the LSTM method does.
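The improvement figures quoted above are relative changes with respect to the LSTM method, which can be checked with the following short calculation. The helper names are introduced here for illustration; the input values are the averages reported in the text.

```python
def relative_gain(ours, baseline):
    """Relative increase of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100


def relative_reduction(ours, baseline):
    """Relative decrease of `ours` below `baseline`, in percent."""
    return (baseline - ours) / baseline * 100


# Average relative hit rate (percent): proposed method vs. LSTM method.
print(round(relative_gain(82.13, 76.78), 2))       # eta = 8  -> 6.97
print(round(relative_gain(81.08, 74.74), 2))       # eta = 16 -> 8.48

# Average IQR of the relative hit rate (percent).
print(round(relative_reduction(14.19, 15.88), 2))  # eta = 8  -> 10.64
print(round(relative_reduction(11.76, 13.55), 2))  # eta = 16 -> 13.21
```

Both IQR reductions exceed 10%, which is consistent with the statement about the stability of the cache hit rate.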
We also note that the proposed method needs only one ConvLSTM model to predict the popularity of each service in each MECS. On the contrary, since each MECS needs at least one LSTM model to predict the popularity of each service within its own service area, the LSTM method needs $M$ LSTM models to predict the service popularity in a MEC system. As we show in Table 4, we use a total of 746,689 parameters for the ConvLSTM model and 121,301 parameters for a single LSTM model. Since the LSTM method uses a total of $M \times 121{,}301$ model parameters, the proposed method reduces the total number of model parameters by 32.15%. Therefore, when we consider these facts together with the results observed in Figure 8, our method achieves a higher cache hit rate with fewer model parameters than the LSTM method.

Caching Behavior
We also scrutinize the caching behavior of each method. Instead of predicting the exact order of the service popularity in each MECS, we aim to predict a set of $\eta$ services whose popularity is higher than that of the rest of the services. To quantify the service caching behavior, for each time slot, we calculate $b_m(t) = |S^T_m(t) - S^T_m(t+1)| / \eta$ and $\hat{b}_m(t) = |S^P_m(t) - S^P_m(t+1)| / \eta$ and plot them in Figure 9 for different $m$ and $\eta$. We observe that both our method and the LSTM method are conservative in that they do not change the elements in the service cache abruptly. In addition, we observe that the changing pattern of $\hat{b}_m(t)$ is more similar to that of $b_m(t)$ when the LSTM method is used than when our method is used. This difference in the caching behavior comes from the manner in which each method predicts the service popularity. The LSTM method uses only the history of $\nu_{m,k}(t)$ when it predicts $\hat{\nu}_{m,k}(t+1)$, without considering the history of $\nu_{n,j}(t)$ ($n \in M - \{m\}$, $j \in K - \{k\}$). On the contrary, when our method predicts the service popularity, it collectively considers not only the temporal correlation but also the spatial correlation. Consequently, compared with the LSTM method, our method responds less sensitively to instantaneous changes in the service popularity at a single MECS.
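The cache change ratio used above is a normalized set difference between the caches of two consecutive time slots. A minimal Python sketch, with hypothetical function names and a toy cache history, is:

```python
def cache_change(cache_now, cache_next, eta):
    """b_m(t) = |S_m(t) - S_m(t+1)| / eta: fraction of cached services replaced."""
    return len(cache_now - cache_next) / eta


def change_series(cache_history, eta):
    """Cache change ratio for every pair of consecutive time slots."""
    return [cache_change(now, nxt, eta)
            for now, nxt in zip(cache_history, cache_history[1:])]


# Hypothetical cache contents of one MECS over four time slots (eta = 4).
history = [{0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 2, 4}, {5, 1, 2, 4}]
print(change_series(history, eta=4))  # [0.25, 0.0, 0.25]
```

Applying `change_series` to the sequence of true caches gives $b_m(t)$, and applying it to the sequence of predicted caches gives the corresponding quantity for a prediction method.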

Effect of User Mobility
To inspect the effect of user mobility on the service cache performance, we conduct the same simulation with the user mobility model replaced by a random mobility model, while maintaining the same experimental environment. In the random mobility model, at each time slot a user draws its moving direction from $(0^\circ, 360^\circ)$ and its speed from $[0, \varsigma]$ according to the uniform distribution. Considering the size of the simulation topology, we conduct simulations for $\varsigma = 0.2$ km and $\varsigma = 0.5$ km.
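One step of such a random mobility model can be sketched as follows. The text does not state how users are handled at the boundary of the topology, so clamping positions to the simulation area is an assumption of this example, as are the function name and the default area size.

```python
import math
import random


def random_step(position, max_speed_km, rng, area_km=2.0):
    """Move a user once: uniform heading in [0, 360) degrees, uniform distance in [0, max_speed_km].

    Assumption: positions are clamped to the square [0, area_km] x [0, area_km].
    """
    heading = math.radians(rng.uniform(0.0, 360.0))
    distance = rng.uniform(0.0, max_speed_km)
    x = position[0] + distance * math.cos(heading)
    y = position[1] + distance * math.sin(heading)
    return (min(max(x, 0.0), area_km), min(max(y, 0.0), area_km))


# Hypothetical user starting at the center of the 2 km x 2 km area.
rng = random.Random(1)
position = (1.0, 1.0)
trajectory = [position]
for _ in range(5):
    position = random_step(position, max_speed_km=0.2, rng=rng)
    trajectory.append(position)
print(trajectory)
```

Because the step length is bounded by the maximum speed, a larger bound lets users cross between service areas more often from one time slot to the next.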
In Figure 10, we show the distribution of the relative cache hit rate $\rho_m(t)$ with different $\varsigma$ when the cache size is $\eta = 8$. In this figure, we observe that the cache hit rate decreases as $\varsigma$ increases. This is attributed to the fact that a higher mobility speed leads to a larger change in the service popularity distribution at each MECS. As a result, the popularity prediction accuracy decreases. In this figure, we also observe that, for all the moving speeds, the median $\rho_m(t)$ at every MECS is higher with the proposed method than with the LSTM method. When $\varsigma = 0.2$ km, the average cache hit rate across all MECSs, $\rho(t) = \frac{1}{M} \sum_{m \in M} \rho_m(t)$, is 85.32% when using the proposed method, while it is 80.88% when using the LSTM method. When $\varsigma = 0.5$ km, the average $\rho(t)$ obtained through the proposed method is 80.92%, while the average $\rho(t)$ is 74.79% when the LSTM method is used.

Effect of Service Popularity Variation
To assess the impact of the degree of service popularity change at each MECS on the caching performance, we conduct the following simulations. We extend the topology by deploying $M = 16$ MECSs in a 4 × 4 grid, which ranges from (0, 0) to (3 km, 3 km), and locate one MECS at each grid point. We also increase the number of services from 64 to 128 and set $\eta$ to 16. For each service $k$ and each MECS $m$, we randomly change the popularity of the service $k$ at the MECS $m$ at each time slot. To control the degree of service popularity change at each MECS, at the beginning of each time slot, we set $\nu_{m,k}(t+1) = \nu_{m,k}(t) + x(t)y(t)$ ($\forall m \in M, k \in K$), where $x(t)$ is either 1 or −1, each with a probability of 1/2, and $y(t)$ is drawn from $[0, \xi]$ according to the uniform distribution.
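The popularity perturbation can be sketched in Python as below. This is an illustrative reading of the update rule rather than the authors' code: it assumes that the sign $x(t)$ and the magnitude $y(t)$ are drawn independently for every pair $(m, k)$, and it applies no clipping or renormalization because none is stated in the text. The function name and toy matrix are hypothetical.

```python
import random


def perturb_popularity(popularity, xi, rng):
    """One step of nu_{m,k}(t+1) = nu_{m,k}(t) + x(t) * y(t).

    popularity: list of rows, one row of K popularity values per MECS.
    Assumption: x(t) in {1, -1} and y(t) ~ U[0, xi] are redrawn for each (m, k).
    """
    return [[value + rng.choice((1, -1)) * rng.uniform(0.0, xi) for value in row]
            for row in popularity]


# Hypothetical example: 2 MECSs, 4 services, xi = 0.05.
rng = random.Random(7)
nu_t = [[0.40, 0.30, 0.20, 0.10],
        [0.10, 0.20, 0.30, 0.40]]
nu_next = perturb_popularity(nu_t, xi=0.05, rng=rng)
print(nu_next)
```

A larger bound on the magnitude makes it more likely that services near the cache boundary swap ranks between consecutive time slots.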
In Figure 11, we show the distribution of the cache similarity degree $a_m(t)$ with different $\xi$. We observe that as the degree of service popularity change at each MECS (i.e., $\xi$) increases, the cache similarity degree decreases. In the case of the proposed method, the average $a_m(t)$ is 98.47% when $\xi = 0.05$, while it is 96.19% when $\xi = 0.15$. We also observe in this figure that the proposed method outperforms the LSTM method. For all MECSs and all values of $\xi$, the cache similarity degree is higher when using the proposed method than when using the LSTM method.

Conclusions and Future Works
In this paper, we address the service caching problem in a multi-access edge computing system. To increase the cache hit rate, we comprehensively exploit the spatio-temporal correlation structure in the service popularity distribution among the MECSs by using the ConvLSTM model. To achieve the goal, we construct a heatmap to collectively represent the service popularity distribution in a MEC system during a time slot. Using a sequence of heatmaps as an input to the ConvLSTM model, we simultaneously predict the popularity of each service in each MECS. We evaluate the performance of the proposed method through trace-driven simulations. The results verify that compared with a conventional method based on the LSTM model, the proposed method increases the cache hit rate by more than 6.97%. In addition, our method reduces the amount of storage required for predicting the popularity of each service in each MECS by 32.15%.
Our future work is as follows. First, we will continue our research to further enhance the prediction accuracy. Second, since we verified our method only with the MovieLens dataset and the Divvy mobility dataset, we will use other datasets on user mobility and service request patterns to show the generality of the proposed method. We will also investigate the explainability of our cache decision algorithm.

Figure 1. The inner structure of a ConvLSTM cell at time step t. The operator * represents the convolution operation, and • denotes the Hadamard product. In addition, σ represents the sigmoid function, and tanh represents the hyperbolic tangent function.

4.1. Collective Service Popularity Representation
When a sequence of video frames is given, the ConvLSTM model can predict the next frame. Specifically, the ConvLSTM model predicts the value of each pixel in the next frame by considering the spatio-temporal correlation in a set of past video frames. To take advantage of this feature of the ConvLSTM model, we construct a heatmap representing the service popularity distribution over a MEC system during a time slot and use it as an input to the ConvLSTM model. Specifically, each MECS $m$ maintains a set $Z_m(t) = \{b_{m,1}(t), \ldots, b_{m,K}(t)\}$ during a time slot, where $b_{m,k}(t) = \sum_{u \in U_m(t)} b_{m,u,k}(t)$. At the end of each time slot, a controller collects $Z_m(t)$ from each MECS $m \in M$. Then, the controller calculates $f_{m,k}(t)$ for all $m \in M$ and $k \in K$.

Figure 4. Topology of a MEC system.

Figure 6. Service cache similarity at each MECS with different ηs. Each subfigure shows the distribution of the service cache similarity in each MECS as a box plot.

Figure 7. Comparison of the cache hit rate over time in different MECSs.

Figure 8. Comparison of the distribution of the relative cache hit rate in each MECS. Each subfigure shows the distribution of ρ_m(t) in each MECS as a box plot.

Figure 10. Distribution of the relative cache hit rate at each MECS when the cache size is η = 8.
1 if a service k is cached at MECS m during a time slot t; 0 otherwise
A_{M×K}(t): service cache state of a MEC system during a time slot t
S_m(t): set of services cached at MECS m during a time slot t
U_m(t): set of users in the service area of MECS m during a time slot t
b_{m,u,k}(t): 1 if a service k requested by user u is at MECS m during a time slot t; 0 otherwise
f_{m,k}(t): popularity of a service k in MECS m during a time slot t
F_{M×K}(t): service popularity matrix of a MEC system during a time slot t
w_a: number of recent past heatmap images used for prediction
b_{m,k}(t): number of times MECS m receives a request for service k during a time slot t
Z_m

Table 3. Notations used in the inner structure of a ConvLSTM cell at time step t.
X_t: input at time step t
i_t: output of an input gate at time step t
f_t: output of a forget gate at time step t
o_t: output of an output gate at time step t
c_t: cell state at time step t
h_t: hidden state of a cell at time step t
W_xi: model parameters used for convolution of X_t in an input gate
W_hi: model parameters used for convolution of h_{t−1} in an input gate
b_i

1 in general, it is not suitable to directly use f_{m,k}(t)s for training a ConvLSTM model. To enhance the model training performance, we normalize f_{m,k}(t)s in each MECS m as follows. At the end of each time slot, a controller derives the popularity of the most popular service in each MECS m, f M m

Table 4. Parameters used to train each model.

Table 5. Parameters used for the simulation study.