Path Segmentation-Based Hybrid Caching in Information-Centric Networks

Abstract: Information-centric networks (ICNs) have received wide interest from researchers, and in-network caching is an important characteristic of ICN. The management and placement of content are essential because cache nodes have limited cache space and Internet traffic is enormous. This paper focuses on coordinating two cache metrics, namely user access latency and network resource utilization, and proposes a hybrid caching scheme called the path segmentation-based hybrid caching scheme (PSBC). We temporarily divide each data transmission path into a user-edge area and a non-edge area. The user-edge area adopts a heuristic caching scheme to reduce user access latency, while the non-edge area performs migration and optimization of cached content to improve network resource utilization. The simulation results show that the proposed method improves both the cache hit ratio and access latency.


Introduction
Information-centric networks (ICNs) are a main direction of future network research. The current Internet architecture depends on end-to-end correspondence between hosts and is therefore referred to as host-centric networking [1]. The number of Internet users and the volume of traffic from multimedia applications are growing rapidly, and users' primary demand on the web has shifted to accessing massive amounts of content. Host-centric networks do not adequately address current needs such as content distribution and mobility. The features of ICN are well suited to this transition: ICN architectures use content naming, in-network caching, and a request-response model to achieve effective and dependable content distribution [2].
Several ICN architectures have been proposed, such as DONA [3], NDN [4], PURSUIT [5], and NetInf [6]. In-network caching is an essential part of every ICN. An ICN node caches content by configuring additional storage. User requests can then be answered by any network node that caches the requested content, significantly reducing bandwidth utilization and content access latency. To improve an ICN caching system's overall efficiency, from the user's perspective, popular content should be copied to the edge of the network; from the overall network perspective, the diversity of cached content should be increased as much as possible [7]. An excellent caching scheme requires effective coordination between these two targets [8].
Caching schemes in ICN can be divided into several categories, including cooperative and non-cooperative caching, and off-path and on-path caching [9]. In cooperative caching, cache nodes make coordinated caching decisions based on network topology, as well as partial or global cache status [8], while in non-cooperative caching, each cache node makes caching decisions individually. Compared to non-cooperative caching, cooperative caching requires additional communication and computation overhead but generally achieves better caching performance. In on-path caching, content is cached on nodes along the data return path [10], while off-path caching [11] provides additional caching at strategic locations. On-path caching is more flexible and scalable than off-path caching.
This paper focuses on combining the advantages of on-path caching and cooperative caching and proposes a caching scheme based on path segmentation referred to as the path segmentation-based hybrid caching scheme (PSBC). We choose content migration between nodes to implement cooperative caching. Our goal is to coordinate the two cache targets described above with good performance in user access latency and network resource utilization. The main contributions of this paper are as follows:

• We propose a path-segmentation caching idea. We divide the cache nodes on the data return path into a user-edge area and a non-edge area according to the data transmission distance, and adopt different caching schemes for the nodes of each area.

• We adopt a heuristic mechanism to implement on-path caching in the user-edge area, and use a lightweight mechanism based on path competition to select one node in the non-edge area as the migration node that receives content migrated from the user-edge area.

• We use real-world network topologies to evaluate the performance and scalability of the proposed caching scheme. The results show that our scheme performs well across different network topologies.
The rest of this paper is organized as follows. Section 2 reviews related research on ICN caching schemes. Section 3 introduces our proposed scheme, PSBC, including the cache placement strategy and the content migration mechanism. In Section 4, we carry out simulation experiments and analyze the results and performance of PSBC. Finally, we conclude and discuss future research.

Related Work
Cache placement strategy is the subject of much research. Chai et al. [12] proposed a selection algorithm based on node centrality, in which content is cached on the node with the highest betweenness centrality. In [13], the authors proposed auction-based caching, which allows the nodes on the data return path to coordinate caching decisions with minimal overhead by auctioning their "right to cache". Ming et al. [14] proposed age-based cooperative caching, which aims to spread popular content to the network edge using lightweight collaboration mechanisms. In [15], the authors proposed a caching strategy that considers nodes' social attributes and selects nodes with high social attribute scores to cache content.
The above approaches may lead to a situation where content is concentrated in a few nodes, which is not conducive to full utilization of the network's overall storage resources. One solution to this problem is to select content for migration. Luo et al. [16] proposed that a cache node can coordinate with neighbor nodes and migrate part of its content to an appropriate node when cache space is insufficient. In [17], a cooperative caching policy is proposed in which the on-path nodes optimally cache the requested content and are supported by a central off-path cache node that provides an additional level of caching. However, choosing the migration node incurs additional communication overhead, and for Ref. [17], allocating the off-path cache node in a larger network is also a problem.
Going further, Ref. [18] introduces a cache manager that obtains a global view of all network states for ideal cache management. However, in a highly dynamic ICN, it is not feasible to obtain this status accurately and in a timely manner. Li et al. [19] proposed a cooperative caching strategy called cooperative in-network caching (CIC). In CIC, content is divided into chunks cached at more than one node, and CIC adds a cooperative router table and a cooperative content store to track the location and identity of all cached chunks. However, because ICN cache networks are highly dynamic and cached content is continually updated, this type of collaboration may incur additional communication overhead. A summary of the presented caching strategies in terms of their contributions is given in Table 1.

Caching Strategy | Contributions
CL4M [12] | Improving the caching gain by using the concept of betweenness centrality
BidCache [13] | Reducing caching redundancy by allowing nodes on the path to participate in an auction to win the right to cache the content
ABC [14] | Pushing popular content to the user edge adaptively by using "age" to control a content replica's lifetime
SNCS [15] | Reducing caching redundancy by selecting influential nodes as cache nodes
NCBIC [16] | Improving the utilization of cache resources by content migration
Cooperative on-path and off-path caching [17] | Reducing user access latency by employing heuristic algorithms on the edge routers and the additional central router, respectively
Distributed cache management [18] | Achieving efficient caching by adopting a distributed autonomic management architecture
CIC [19] | Improving the utilization of cache resources by storing different kinds of chunks

Path Segmentation-Based Hybrid Caching
In this section, we first introduce the network architecture, then we describe path segmentation and the caching schemes in different areas.

IP-Compatible ICN Network Architecture
The existing Internet architecture is deeply entrenched and difficult to change radically, so a new architecture should be deployed as an overlay network that accommodates existing IP networks [20]. In this paper, we consider the IP-compatible ICN network architecture shown in Figure 1. IP network compatibility is achieved by extending the existing network layer and transport layer to support ICN functions. In this architecture, the name resolution system (NRS) is an important network component alongside the caching-capable routers. The NRS is often used for overlay network design, the name resolution process, and the decoupled content routing process [21]. In the network, content publishers and cache nodes register the mappings between IP addresses and content names with the NRS. ICN routers can process both IP streams (IP-based routing) and ICN streams (caching decisions).
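As a concrete illustration of the name-registration role described above, the sketch below models a toy NRS as a name-to-locator table. The class and method names are our own illustrative choices, not part of any cited architecture:

```python
# Toy model of the NRS role described above (hypothetical API): publishers
# and cache nodes register (content name -> IP address) bindings, and
# requesters resolve names to locators before IP-based forwarding.

class NameResolutionSystem:
    def __init__(self):
        self._bindings = {}  # content name -> set of IP addresses

    def register(self, name, address):
        # Called by a content publisher or a cache node holding a replica.
        self._bindings.setdefault(name, set()).add(address)

    def unregister(self, name, address):
        # Called when a cached replica is evicted.
        self._bindings.get(name, set()).discard(address)

    def resolve(self, name):
        # Return all known locators for the content name (empty if unknown).
        return sorted(self._bindings.get(name, set()))

nrs = NameResolutionSystem()
nrs.register("/video/clip1", "10.0.0.5")   # content publisher
nrs.register("/video/clip1", "10.0.1.9")   # cache node holding a replica
print(nrs.resolve("/video/clip1"))         # ['10.0.0.5', '10.0.1.9']
```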

Caching Idea Based on Path Segmentation
The user may be multiple hops away from the content source or cache node, and executing the same caching scheme on every node along the path does not yield good caching performance. We therefore divide each data return path into two segments: one segment is near the requesting user, called the user-edge area, and the other is relatively far from the user edge, called the non-edge area. Unlike node area division based on network topology information [8], this division of each return path is dynamic and temporary. We divide the nodes on the data return path according to the data transmission distance: the longer the data transmission distance, the closer the node is to the user's edge.
Our research goal is to coordinate the two cache targets of user access latency and network resource utilization. User-edge area and non-edge area are biased towards different cache targets. To reduce user access latency, nodes in the user-edge area should cache the content with high popularity. We propose using the non-edge area as the secondary cache of the user-edge area. The content replaced in the user-edge area is migrated to one migration node chosen in the non-edge area. In order to avoid the communication overhead of selecting the migration node, when data are transmitted in the non-edge area, the nodes on the path collaborate in a competitive way to choose the most suitable node to be added to the data packet as the migration node of the user-edge area.
According to our mechanism, there are two types of network data packets: a normal data packet and a migrated data packet. We need to add several fields to carry some information in the header of the normal data packet and one field in the header of the migrated packet to implement this hybrid caching based on path segmentation. In order to reduce the transmitted data and network traffic overhead, related compression techniques can be adopted to compress the length of the packet header overhead, such as the small set of allowed coefficients (SSAC) algorithm [22].
The hop_count field: Since the network architecture we consider is IP-compatible, the request and data transmission distances (in hops) can be calculated from the Time To Live (TTL) field. The hop_count field is initialized to the distance between the hit node and the requesting user. The proportion of path segmentation is hop_count/k : (hop_count − hop_count/k); we set k = 2 for convenience of calculation in this paper, and optimizing this proportion is left for follow-up study. As shown in Figure 2, when the data transmission distance is less than hop_count/k, the node belongs to the non-edge area; otherwise, the node belongs to the user-edge area.
The cache_pressure and migrate_node_address fields: These two fields record, respectively, the minimum cache pressure value and the corresponding node address in the non-edge area on each data return path. The calculation of the cache pressure value and the selection of the migration node in the non-edge area are discussed later.
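A minimal sketch of the segmentation rule, assuming hop counts are already recovered from the TTL field and using integer division for hop_count/k (the function names are illustrative):

```python
# Sketch of the path-segmentation rule (k = 2 as in the paper). hop_count
# is the hit-node-to-user distance derived from TTL; distance_travelled is
# how far the data packet has come from the hit node. Nodes whose distance
# is below hop_count // k fall in the non-edge area, the rest in the
# user-edge area.

def node_area(distance_travelled, hop_count, k=2):
    threshold = hop_count // k
    return "non-edge" if distance_travelled < threshold else "user-edge"

# A 6-hop return path: the first hops compete to be the migration node,
# the remaining hops cache heuristically.
hop_count = 6
areas = [node_area(d, hop_count) for d in range(1, hop_count + 1)]
print(areas)
```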
The popularity field: This field is used by both migrated and normal data packets. In this paper, we present the concept of cache value. Each cache node calculates the cache value of incoming content and updates the cache value of hit content; caching decisions on incoming content, as well as the replacement and migration of cached content, are determined by cache value. The popularity of the content is a major factor in determining the cache value. We use the number of times the content has been requested at the cache node as the content's popularity. The popularity field records the popularity of the content delivered by the packet at the packet's source node.

Caching in User-Edge Area
Each cache node in the user-edge area of the data return path makes its caching decision individually. To achieve better caching, each node i in the user-edge area can consider the following optimization problem when deciding what to cache: maximize E[h_i], the average hit probability of cache node i while serving requests, and minimize E[d_i], the average end-to-end latency. To improve cache performance, we consider two common performance metrics: user access latency and cache hit ratio. For optimal cache placement, each cache node i should cache content that leads to low user access latency and high cache hit probability.
In our proposal, to achieve a more practical solution, we focus on optimizing one goal while relaxing the optimization of the other.
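The trade-off above can be written as a small scalarized program; the weight λ and the knapsack-style capacity constraint below are an illustrative formalization we introduce, not the paper's exact statement:

```latex
% For each node i, choose the cached set C_i to balance hit probability
% against latency, subject to the node's cache capacity CS(i).
\begin{aligned}
\max_{C_i}\quad & \mathbb{E}[h_i] \;-\; \lambda\, \mathbb{E}[d_i] \\
\text{s.t.}\quad & \sum_{k \in C_i} c(k) \;\le\; CS(i)
\end{aligned}
```

Here λ ≥ 0 trades off the two objectives; PSBC instead emphasizes one objective per area, which avoids solving this problem exactly at every node.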
To implement the above optimization problem at each cache node i, we calculate the cache value CV_i(k) for each incoming content k. The higher the popularity and the closer the content is to the user, the more worthwhile the content is to cache. We maintain a cache value for each content cached by the node; the cache value of the hit content is updated when a user request hits on this node. In addition, cache replacement and migration are carried out according to the cache values of the cached content. To prevent the cache space from being occupied by high-cache-value content that has not been accessed for a long time, we set a time period for each cached content; expired content that is not accessed is expelled. According to this mechanism, the following formula is used to calculate the cache value CV_i(k) of each content k:

CV_i(k) = f_1 · D_i(k) + f_2 · P_i(k),

where f_1 and f_2 are weighting constants, D_i(k) is the data transmission distance normalized by the distance between the requesting user and the hit node, and P_i(k) is the content popularity normalized by the maximum popularity p_max of the cached content. If the current node is a hit node, the cache value of the hit content is updated to f_2 · (P_i(k) + 1/p_max). The process of cache management in the user-edge area is described in Algorithm 1. The cache value is calculated for each incoming content (Step 2). We then check the remaining cache space; when it is insufficient, we select the content with the lowest cache value to replace and migrate the replaced content to the migration node carried by the packet (Steps 3-9).

Algorithm 1: Caching in User-Edge Area
current node: i; remaining cache space: B_i
1: for each incoming content k do
2:   calculate the cache value CV_i(k)
3:   if B_i is insufficient for k then
4:     k' ← the cached content with the lowest cache value
5:     if CV_i(k) > CV_i(k') then
6:       replace k' with k
7:       migrate k' to the migration node carried by the packet
8:     end if
9:   else cache k
10: end for
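The user-edge caching logic can be sketched as follows, assuming unit-sized contents, the weighted-sum cache value f1·D_i(k) + f2·P_i(k) described in the text, and a placeholder migrate() callback standing in for sending a migrated packet. We also assume replacement happens only when the incoming value exceeds the victim's, which the paper does not state explicitly:

```python
# Illustrative sketch of user-edge caching; f1, f2 and migrate() are
# placeholders, and contents are unit-sized for simplicity.

def cache_value(distance_norm, popularity_norm, f1=0.5, f2=0.5):
    # CV_i(k) = f1 * D_i(k) + f2 * P_i(k), both inputs already normalized.
    return f1 * distance_norm + f2 * popularity_norm

def handle_incoming(cache, capacity, content, cv, migrate):
    """cache: dict mapping content name -> cache value."""
    if len(cache) < capacity:
        cache[content] = cv
        return
    victim = min(cache, key=cache.get)  # lowest cache value
    if cv > cache[victim]:
        del cache[victim]
        migrate(victim)       # push the replaced content to the migration node
        cache[content] = cv

migrated = []
cache = {}
for name, d, p in [("a", 1.0, 0.9), ("b", 0.5, 0.2), ("c", 0.8, 0.7)]:
    handle_incoming(cache, 2, name, cache_value(d, p), migrated.append)
print(sorted(cache), migrated)  # ['a', 'c'] ['b']
```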

Caching in Non-Edge Area
Nodes in the non-edge area receive two types of data packets: a normal data packet and a migrated packet. When a normal data packet is transmitted in a non-edge area, the nodes in the non-edge area collaboratively select a node as the user-edge area's migration node and add the node address to the packet. When the migration node receives the migrated packet, due to the node's limited cache space, the migration node should use an appropriate caching mechanism to decide whether to cache the migrated content.

Migration Node Selection
We select the migration node by calculating and comparing the cache pressure value CPV_i of each cache node i in the non-edge area. When calculating the cache pressure value of a cache node, we consider three parameters: cache space utilization, cache utilization, and cache replacement rate. Cache space utilization is the main indicator of the node cache status. However, the cache spaces of most cache nodes are close to saturation once the network is stable; therefore, we combine the cache space utilization with the cache replacement rate (the ratio of the content replaced to the total cache in a period) to measure the node cache status, and we additionally use the cache utilization (the ratio of the hit content to the total cache in a period) to calculate the cache pressure value CPV_i. The cache pressure value CPV_i of node i is calculated as follows:

CPV_i = US(i) + UR(i) − UA(i),

where UA(i) = Σ_{j∈hit} c(j)/CS(i), US(i) = Σ_{j∈stored} c(j)/CS(i), and UR(i) = Σ_{j∈replaced} c(j)/CS(i) represent cache utilization, cache space utilization, and cache replacement rate, respectively. CS(i) is the total cache size, and c(j) is the size of each content j.
When the data packet passes from the hit node to the first node in the non-edge area, that node's address and cache pressure value are added to the data packet. As described in Algorithm 2, each subsequent cache node in the non-edge area calculates its own cache pressure value and compares it with the value recorded in the data packet, updating the packet's record if its own value is smaller. In this way, by the time the data packet is delivered to the user-edge area, it records the node address most suitable for migration. This path-competition-based migration node selection avoids the additional communication overhead of other migration node selection mechanisms [16-18].

Algorithm 2: Migration Node Selection in Non-Edge Area
current node: i; passing data packet: k
1: calculate the cache pressure value CPV_i
2: if k carries no cache_pressure record then set k.cache_pressure = ∞
3: if CPV_i < k.cache_pressure then
4:   k.cache_pressure = CPV_i
5:   k.migrate_node_address = i
6: end if
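The path competition can be sketched with the packet modelled as a plain dict. The exact combination of the three ratios into a single cache pressure value is our assumption (a sum of space utilization and replacement rate minus cache utilization), since only the three ingredients are specified:

```python
# Sketch of path-competition migration node selection. The CPV combination
# below is an assumed weighting of the three ratios described in the text.

def cache_pressure(hit_bytes, stored_bytes, replaced_bytes, cache_size):
    ua = hit_bytes / cache_size       # cache utilization
    us = stored_bytes / cache_size    # cache space utilization
    ur = replaced_bytes / cache_size  # cache replacement rate
    return us + ur - ua

def on_non_edge_hop(packet, node_id, cpv):
    # The first non-edge node seeds the fields; later nodes win only if
    # their pressure is strictly smaller.
    if "cache_pressure" not in packet or cpv < packet["cache_pressure"]:
        packet["cache_pressure"] = cpv
        packet["migrate_node_address"] = node_id

packet = {}
for node_id, cpv in [("r1", 0.8), ("r2", 0.3), ("r3", 0.5)]:
    on_non_edge_hop(packet, node_id, cpv)
print(packet["migrate_node_address"])  # r2, the least-loaded node on the path
```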

Caching in the Migration Node
In Section 3.3, we mentioned that we maintain a cache value for each cached content, and that the placement of incoming content and the replacement and migration of cached content are all decided based on cache value. Therefore, the caching decision made by migration node j on migrated content m is also based on this mechanism. However, the cache value CV_j(m) of incoming migrated content is calculated differently: we mainly consider the popularity of the content, and the calculation formula is as follows:

CV_j(m) = f_2 · P_j(m),

where f_2 is the weighting constant and P_j(m) is the content popularity normalized by the maximum popularity p_max of the cached content. As described in Algorithm 3, migration node j calculates the cache value of each incoming migrated content (Step 3) and makes caching decisions according to the cache value. When replacement occurs, the replaced content is directly expelled (Steps 4-9).

Algorithm 3: Caching in the Migration Node
current node: j; remaining cache space: B_j
1: for each incoming migrated content m do
2:   read the popularity field of the migrated packet
3:   calculate the cache value CV_j(m)
4:   if B_j == 0 then
5:     if CV_j(m) < min_{∀n} CV_j(n) then
6:       do not cache m
7:     else
8:       expel the cached content with the lowest cache value and cache m
9:     end if
10:  else cache m
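The migration node's decision can be sketched as below, with a placeholder weight f2; unlike the user-edge area, evicted content here is simply dropped rather than migrated again:

```python
# Sketch of caching at the migration node: migrated content is valued by
# popularity alone, CV_j(m) = f2 * P_j(m); f2 is a placeholder weight.

def migrated_cache_value(popularity, p_max, f2=1.0):
    return f2 * (popularity / p_max)

def handle_migrated(cache, capacity, content, cv):
    """cache: dict mapping content name -> cache value (unit-sized items)."""
    if len(cache) < capacity:
        cache[content] = cv
        return
    victim = min(cache, key=cache.get)
    if cv < cache[victim]:
        return                # do not cache m
    del cache[victim]         # replaced content is expelled, not migrated
    cache[content] = cv

cache = {"x": 0.2, "y": 0.9}
handle_migrated(cache, 2, "m", migrated_cache_value(6, 10))
print(sorted(cache))  # ['m', 'y'] -- 'x' was the lowest-valued content
```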

Performances Evaluations
We carried out a preliminary simulation of PSBC, and the results show the characteristics of our mechanism and its performance under different network topologies. We will introduce the simulation environment, experimental setup, and simulation results in detail in the following parts.

The Simulation Environment
We used the Icarus simulator [23] to run the simulation. Icarus is a Python-based ICN cache simulator for simulating and evaluating the caching performance of various caching strategies in an ICN environment. It can efficiently complete simulations consisting of millions of requests in a few minutes, and it allows users to implement and simulate their own caching schemes.

The Experimental Setup
We chose several representative cache placement strategies to compare with our scheme: LCE, LCD [24], ProbCache [25], and CL4M [12]. For these baseline strategies, we chose LRU as the cache replacement algorithm. To verify the scalability of PSBC in different network topologies, we chose the following real-world topologies: GEANT (European academic network), WIDE (Japanese academic network), and GARR (Italian academic network). The topologies are shown in Figure 3; in all three, nodes of degree 1 were set as receivers. In line with our optimization objectives, we used two performance metrics, cache hit ratio and average access latency, to measure the performance of the strategies. The cache hit ratio measures the portion of content requests served by a cache, and the average access latency measures the delay of delivering content.
The experimental parameters are shown in Table 2. In our simulation, we set the amount of content in the network to 3 × 10^5, which is a reasonable content population for these topologies and can be used to approximate realistic scenarios. We set the number of warm-up requests to 3 × 10^5 and the number of measured requests to 6 × 10^5. We assumed that user requests follow a Poisson process at a rate of 10 requests per second. The cache size of each node is the same. The cache size ratio reflects the total network cache size as a percentage of the total content; we simulated the values {0.001, 0.002, 0.004, 0.01}. We assume content requests follow a Zipf popularity distribution, with the skewness parameter (α) set to values in {0.2, 0.4, 0.6, 0.8, 1.0}.
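For illustration, a workload of this shape can be generated with the Python standard library alone, as sketched below; Icarus ships its own workload generators, so this is not the simulator's code (parameters follow Table 2):

```python
# Sketch of the simulated workload: Zipf-distributed content popularity
# with skewness alpha, and Poisson request arrivals at 10 requests/second.

import math
import random

def zipf_sample(n_contents, alpha, rng):
    # Rank r is requested with probability proportional to 1 / r^alpha.
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_contents + 1)]
    return rng.choices(range(1, n_contents + 1), weights=weights)[0]

def poisson_interval(rate, rng):
    # Exponential inter-arrival times give a Poisson request process.
    return -math.log(1.0 - rng.random()) / rate

rng = random.Random(42)
t, requests = 0.0, []
for _ in range(5):
    t += poisson_interval(10.0, rng)          # 10 requests per second
    requests.append((round(t, 3), zipf_sample(3 * 10**5, 0.8, rng)))
print(len(requests))
```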

Simulation Result
In this part, we present the results of the simulation experiments. Each experiment was performed five times, and each reported result is the average of the five runs. Based on the simulation results, a summary of the compared caching strategies in terms of their limitations is given in Table 3.

Impact of Skewness (α)
We varied the skewness (α) to observe its impact on the cache hit ratio and average access latency in the three network topologies. In this part of the experiment, the cache size ratio was set to 0.004. The results are shown in Figure 4. A larger skewness (α) means more user requests for popular content. As shown in Figure 4a-c, the cache hit ratio increases with skewness (α). The cache hit ratio is related to network resource utilization [8]. In all three network topologies, LCE causes cache redundancy, and its hit ratio is the lowest. CL4M caches content on the node with the greatest betweenness centrality on the data return path, which results in an uneven distribution of content, so its cache hit ratio is also relatively low. LCD caches content in the next-hop node of the hit node, and ProbCache caches content probabilistically according to the data transmission distance and the caching space available on the return path; the hit ratios of these two strategies are relatively close. PSBC has the highest cache hit ratio because popularity is the main factor in PSBC's caching decisions, and the content migration mechanism improves the utilization of network resources. For example, when skewness (α) is 0.8, PSBC's cache hit ratio in the GEANT topology is 48%, 40%, 18%, and 22% higher than that of LCE, CL4M, ProbCache, and LCD, respectively. As shown in Figure 4d-f, the average access latency decreases as skewness (α) increases. In PSBC, nodes close to users focus on caching popular content, and we make a temporary area division for each data return path, which takes different users' access locations into account.
The results show that the average access latency of PSBC is lower than the other four caching strategies under different skewness values (α).

Impact of Cache Size Ratio
We changed the cache size ratio to observe its impact on the cache hit ratio and average access latency in the three network topologies. In this part of the experiment, skewness (α) was set to 0.6. The results, shown in Figure 5, indicate that PSBC performs better than the other four caching strategies in both cache hit ratio and average access latency. An increased cache size means that more content can be cached. As shown in Figure 5a-c, the cache hit ratio increases with the cache size ratio in the three network topologies. LCE has the lowest cache hit ratio due to cache redundancy. LCD reduces cache redundancy relative to LCE and implicitly considers content popularity, and its cache hit ratio is second only to PSBC's in all three topologies. ProbCache makes caching decisions based on the data transmission distance and the cache space available on the return path but ignores differences in content popularity. CL4M cannot effectively utilize network cache resources as the cache size increases. The cache hit ratios of these two strategies are lower than PSBC's. PSBC takes both network resource utilization and content popularity into account, and its advantage in cache hit ratio becomes more obvious as the cache size ratio increases. In Figure 5d-f, we observe the cache size ratio's impact on the average access latency. To reduce user access latency, nodes near users should cache as much popular content as possible. In PSBC, nodes in the user-edge area mainly consider content popularity when making caching decisions, while the non-edge area serves as a secondary cache for the user-edge area to improve cache resource utilization. This mechanism gives PSBC an advantage: nodes near the user's edge can cache more popular content as the cache size increases, which greatly reduces the average access latency.

Conclusions
In this paper, we proposed PSBC, a hybrid caching scheme based on path segmentation. The proposed scheme divides each data return path into a user-edge area and a non-edge area according to the data transmission distance, and the two areas adopt different caching schemes to coordinate user access latency and network resource utilization. The user-edge area makes heuristic, near-optimal caching decisions to provide faster responses to users, while the non-edge area receives content migrated from the user-edge area to improve network resource utilization. The simulation results show that the proposed scheme outperforms other popular caching strategies in terms of cache hit ratio and access latency. In future work, we plan to further optimize the path segmentation method, propose a more flexible path segmentation algorithm, and test the caching scheme in other application scenarios.