TA-CLOCK: Tendency-Aware Page Replacement Policy for Hybrid Main Memory in High-Performance Embedded Systems

: Recently, high-performance embedded systems have adopted phase change memory (PCM) as their main memory because PCMs have attractive advantages, such as non-volatility, byte-addressability, high density, and low power consumption. However, PCMs have disadvantages, such as limited write endurance in each cell and high write latency compared to DRAMs. Therefore, researchers have investigated methods for enhancing the limitations of PCMs. In this paper, we propose a page replacement policy called tendency-aware CLOCK (TA-CLOCK) for the hybrid main memory of embedded systems. To improve the limited write endurance of PCMs, TA-CLOCK classiﬁes the page access tendency of the victim page through access pattern analysis and determines the migration location of the victim page. Through the classiﬁcation of the page access tendency, TA-CLOCK reduces unnecessary page migrations from DRAMs to PCMs. Unnecessary migrations cause an increase in write operations in PCMs and the energy consumption of the hybrid main memory in embedded systems. Thus, our proposed policy improves the limited write endurance of PCMs and enhances the access latency of the hybrid main memory of embedded systems by classifying the page access tendency. We compared the TA-CLOCK with existing page replacement policies to evaluate its performance. In our experiments, TA-CLOCK reduced the number of write operations in PCMs by 71.5% on average, and it enhanced the energy delay product by 38.3% on average compared with other page replacement policies.


Introduction
Recently, demands for high-performance embedded systems have emerged because of the development of various smartphones, wearable devices, and Internet of Things (IoT) devices [1][2][3][4][5]. However, existing embedded systems have limitations because of their low performance, low battery capacity, and low storage capacity. Therefore, the demand for high-performance embedded devices has significantly increased in various embedded systems. For improved performance, embedded systems require high computing power, low power consumption, and high memory capacity [6]. Particularly, among the embedded system components, the power consumption of the main memory is 30-50% of the overall power consumption of the system [7]. Nevertheless, the battery capacity of embedded systems is limited; therefore, several studies on embedded systems have focused on reducing their power consumption to improve the limited battery lifespan.
Dynamic random access memories (DRAMs) are widely used in the main memories of embedded systems because of their unlimited write endurance and low read/write latency. However, it is difficult to use DRAMs directly in the main memory of an embedded system with limited battery capacity because they have high power consumption, thereby requiring leakage power. Therefore, researchers have proposed a hybrid main memory architecture for resolving the limitations of DRAM and improving the performance of DRAM alone main memory [8][9][10][11]. Phase change memory (PCM) is an attractive memory for embedded systems owing to its advantages, such as non-volatility, high density, and low power consumption [12][13][14][15][16]. However, PCMs have disadvantages, such as limited write endurance and high write latency compared to DRAMs. Therefore, hybrid main memories have been proposed to enhance the limitations of DRAMs and PCMs. Figure 1 shows the organization methods of the hybrid main memory. The organization methods of hybrid main memories can be classified into hierarchical organization and parallel organization. Hierarchical organization comprises an upper DRAM cache layer and a lower PCM memory layer, where the DRAM cache layer incurs an additional hardware cost. Meanwhile, parallel organization comprises DRAMs and PCMs as the same layer. Parallel organization can reduce the additional hardware cost and use a wide range of the main memory space. Therefore, several researchers have studied parallel organization [8][9][10]. In this paper, we propose a hybrid main memory architecture and its page replacement policy called tendency-aware CLOCK (TA-CLOCK). TA-CLOCK consists of DRAM CLOCK and PCM CLOCK, and it determines the optimized location of requested pages using tendency classification. When the host system requires new read/write pages, TA-CLOCK first allocates pages in the DRAM CLOCK. To select the victim page in the DRAM, the DRAM CLOCK classifies its page access tendency. Afterward, TA-CLOCK determines the location of the selected victim page using write/read thresholds, which are determined by the write and read counts of the victim page. Based on the classification of the page access tendency, TA-CLOCK reduces unnecessary migrations between the DRAM CLOCK and PCM CLOCK. Additionally, it migrates dirty pages from the PCM CLOCK to the DRAM CLOCK to reduce the number of write operations in the hybrid main memory. As a result, TA-CLOCK improves the limited write endurance of the PCM and enhances the energy delay product (EDP) of the hybrid main memory.
The remainder of this paper is organized as follows. In Section 2, the background of PCMs is described and existing hybrid main memory policies are reviewed. The proposed TA-CLOCK hybrid main memory policy with its structure, algorithm, and operations is presented in Section 3. In Section 4, the performance of the TA-CLOCK is evaluated by comparison with existing main memory policies. Finally, conclusions are presented in Section 5.

Background and Related Works
In this section, we describe the background of PCMs and discuss existing hybrid main memory management policies.

Phase Change Memory
Recently, PCMs have been studied for use in the hybrid main memory of highperformance embedded systems because they have characteristics, such as low leakage power, high density, and non-volatility, compared to DRAMs [12][13][14][15][16]. Table 1 presents the details of the comparison of DRAMs and PCMs. PCMs are non-volatile, and they comprise phase change materials, such as chalcogenide glass [14]. Furthermore, they store data using the state transition of phase change materials. Phase change materials trasition into crystalline and amorphous states by heat to use electrical pulses. The crystalline state represents SET (value 1), and it has high optical reflexivity and low resistivity. Meanwhile, the amorphous state represents RESET (value 0), and it has low optical reflexivity and high resistivity. Although PCMs have attractive advantages, such as low leakage power, high density, byte-addressability, and non-volatility, to replace DRAMs in embedded systems [12][13][14], they have disadvantages, such as high read/write latency with even asymmetric operation cycle requirement and limited write endurance. Thus, it is difficult to adopt PCMs in write-intensive systems because the write endurance of each cell in the PCM is approximately 10 8 -10 9 . Therefore, existing research has been focused on controlling the write operations of PCMs and providing improved access latency and write endurance by altering their hardware design. Nevertheless, PCMs cannot completely replace DRAMs in the main memory architecture owing to the advantages of DRAMs, such as unlimited write endurance and low read/write latency. Hence, recent research has proposed a hybrid main memory architecture to improve the limitations of PCMs and DRAMs by selectively exploiting their advantages.

Hybrid Main Memory
The hybrid main memory adopts DRAMs and PCMs to improve the performance of embedded systems by selectively exploiting their advantages [8][9][10][11]. However, existing page replacement policies for main memories do not consider the characteristics of the hybrid main memory, especially PCMs, which are non-volatile memories. Therefore, when the hybrid main memory system adopts conventional page replacement policies, the performance of the embedded system is limited, and the policies cannot fully utilize the performance potential of non-volatile memories.
Zhang et al. proposed a hybrid main memory management policy called triple-based hybrid main memory (TriBHMM) [11], which comprises three components: L1 DRAM, L2 DRAM, and L2 PCM. It manages pages in the main memory using priority and the least-recently-used (LRU) algorithm to reduce energy consumption, memory access time, and PCM write counts. Lin et al. proposed history-aware LRU (HA-LRU), which is a page replacement policy for NAND flash-based storages [17]. HA-LRU manages pages in the main memory through the access history information using four LRU lists. However, these policies are based on the LRU algorithm and have a disadvantage in that they require a computation overhead when used in the main memory. This requirement is especially critical in embedded systems that utilize limited computing resources. Therefore, to reduce the computation overhead, recent studies have utilized the CLOCK algorithm.
The CLOCK algorithm was proposed for overcoming the disadvantages of LRU [10,18]. CLOCK consists of the circular list and the clock hand for managing pages in the main memory. The circular list of CLOCK manages pages, and each page include a reference bit. When the main memory requires the page replacement, the clock hand of CLOCK selects a victim page through the reference bit of the pointed page. If the reference bit of the pointed page is 0 (false), the pointed page is determined as the victim page and it is evicted to storages. In contrast, when the reference bit of the pointed page is 1 (true), it is changed to 0 and the clock hand moves to the next page in the circular list. In some cases, CLOCK maintains a dirty bit to manage write operations effectively.
Migration-optimized CLOCK (M-CLOCK) manages each page in the hybrid main memory by operation type [9]. It also manages frequently accessed pages to the DRAM using the write operation, and migrates frequently referenced pages to the PCM using the read operation. Moreover, it secures free pages in the DRAM by evicting unnecessary pages, which are unlikely to be referenced. However, in the early stage, M-CLOCK evicts DRAM pages to the storage because of the difficulty of accessing pattern identification. These problems arise because the characteristics of the dynamic access pattern in applications are not considered. As a result, M-CLOCK re-allocates the evicted pages from the storage to the DRAM, thereby reducing the performance of M-CLOCK in applications that frequently change the read/write access pattern.
Meanwhile, CLOCK with dirty bits and write frequency (CLOCK-DWF) is a page replacement policy for the hybrid main memory [10]. Unlike M-CLOCK, CLOCK-DWF determines page locations by the access type of the newly inserted pages. It allocates write pages to DRAM and read pages to PCM. Additionally, when a write access occurs in a page allocated to PCM, CLOCK-DWF migrates the accessed page to DRAM to reduce additional write operations in the PCM. However, it is difficult to utilize the limited write endurance of PCMs, especially in applications that include frequent random read accesses, because it allocates the read accesses only to PCM.

Design of TA-CLOCK
In this section, we propose a novel page replacement policy for the hybrid main memory, which improves the limited write endurance of PCMs and reduces the energy consumption of hybrid main memories. We refer to the proposed page replacement policy as TA-CLOCK. TA-CLOCK analyzes the access tendency of pages through the write/read threshold, and determines suitable page locations to overcome the limitations of PCMs in high-performance embedded systems. It has some specific characteristics which are summarized below: 1.
TA-CLOCK improves the limited write endurance of PCMs by maintaining write pages in the DRAM CLOCK through the classification of the page access tendency.

2.
TA-CLOCK reduces the energy consumption of the hybrid main memory in an embedded system by enhancing the write hit ratio of the DRAM CLOCK, thus reducing unnecessary page migrations between the DRAM CLOCK and PCM CLOCK.

Structure of TA-CLOCK
TA-CLOCK comprises DRAM CLOCK and PCM CLOCK for managing the read and write pages in the hybrid main memory. The overall structure of TA-CLOCK is presented in Figure 2. To manage pages in the hybrid main memory, TA-CLOCK utilizes the CLOCK algorithm, and the DRAM CLOCK maintains the read/write counts of each page to analyze the page access tendency. In our proposal, the page access tendency is classified by the write/read threshold, and it is used to determine the victim page to secure a free page in the DRAM CLOCK. Through this process, it maintains write-intensive pages in the DRAM CLOCK and reduces unnecessary page migrations between the DRAM CLOCK and PCM CLOCK. The DRAM CLOCK of the TA-CLOCK is the location where the requested read/write pages are allocated from the host system. It maintains a reference bit (r), dirty bit (d), read count (rc), and write count (wc) for each page. When read/write requests occur from the host system, the requested pages are inserted into the DRAM CLOCK. Additionally, when read/write hits occurs on the inserted page in the DRAM CLOCK, the TA-CLOCK increases the read/write counts of the accessed page. In the TA-CLOCK, the read/write counts of each page are used to analyze the page access pattern and classify the page access tendency. If the DRAM CLOCK does not have a free page, the TA-CLOCK selects a victim page and the migration location using the page access tendency.
Meanwhile, the PCM CLOCK of the TA-CLOCK is the location that manages the read-intensive pages migrated from the DRAM CLOCK using reference bit (r) for each page. When a write operation occurs in the PCM CLOCK, the TA-CLOCK migrates the accessed page to the DRAM CLOCK.

TA-CLOCK Algorithm
In this section, we describe the main algorithm of our proposed page replacement policy. Table 2 describes the parameters used in this study. The DRAM CLOCK of the TA-CLOCK uses the read/write counts of each page to analyze and classify the page access tendency of a selected victim page for page replacement. The page access tendency is classified into four states: strong write (SW), weak write (WW), weak read (WR), and strong read (SR). The details of the classification of the page access tendency with their threshold values are described in Table 3. The calculation of the threshold value and its operation are as follows.
(2) Table 2. Description of parameters for the analysis of the page access tendency.

Parameter Description
i The index of DRAM CLOCK page.

Read_Threshold i
The read threshold of page i for the classification of the page access tendency.

Write_Threshold DRAM
The write threshold of DRAM CLOCK for the classificati-on of the page access tendency. page_count The number of pages in DRAM CLOCK. read_count i The read counts of page i. write_count i The write counts of page i. weight write The weight of write operations (constant). weight read The weight of read operations (constant).

Tendency Classification Condition
If the DRAM CLOCK does not have a free page, the TA-CLOCK selects a victim page in the DRAM CLOCK. Write_Threshold DRAM in Equation (1) is used to determine the tendency of the write operation in a specific DRAM CLOCK page. TA-CLOCK analyzes and classifies the page access tendency of the victim page through Write_Threshold DRAM which is calculated by the write counts of all the DRAM CLOCK pages (write_count i ) and the page counts in the DRAM CLOCK (page_count) by adjusting write weight (weight write ). The page access tendency of the selected page is classified by comparing the number of write counts of the selected page and the write threshold of the DRAM CLOCK. If the page access tendency of the selected page is SW, the TA-CLOCK maintains the selected page in the DRAM CLOCK to improve its write hit ratio and reduce the number of write counts in the PCM CLOCK.
Meanwhile, if the page access tendency of the selected page is not classified as SW, the TA-CLOCK analyzes the page access tendency of the selected page to determine the migration location of the selected page. Read_Threshold i in Equation (2) is used to determine the migration location of the selected page by both read counts of page i (read_count i ) and write counts of page i (write_count i ) by adjusting read weight (weight read ). The page access tendency of the selected page is classified as WW, WR, and SR by the read threshold. To reduce the write counts in the PCM CLOCK, the TA-CLOCK considers the ratio of read/write counts in the selected page. If the page access tendency of the selected page is WW, the TA-CLOCK maintains the selected page in the DRAM CLOCK and reclassifies the page access tendency. However, if the selected page is classified as WR, it is evicted to the storage because the page access tendency of the selected page is read operation. As a result, the TA-CLOCK reduces the number of unnecessary migrations from the DRAM to the PCM, and it secures a free page in the DRAM CLOCK. Meanwhile, SR is a state that is expected to occur in additional read operations on the selected page. Therefore, the TA-CLOCK migrates the selected page to the PCM CLOCK to secure a new free page in the DRAM CLOCK. When the SR pages are migrated, the TA-CLOCK reduces the power consumption of the embedded system and improves the limited write endurance of the PCM.
Thus, TA-CLOCK determines the location of the victim page by classifying its page access tendency. Particularly, to reduce unnecessary migrations to the PCM CLOCK, the TA-CLOCK classifies pages with weak tendencies as WW and WR. Therefore, the TA-CLOCK reduces the number of write operations in the PCM CLOCK and the number of unnecessary migrations, and enhances the write hit ratio in the DRAM CLOCK, which positively affects the EDP of the main memory system. Algorithm 1 describes the TA-CLOCK in detail, and Table 4 presents the parameter definition. First, when page p requested from the host system hits the DRAM CLOCK, the TA-CLOCK updates the parameters according to the operation type (lines 1-8). If write operation occurs in page p, the TA-CLOCK sets its dirty bit as 1 and increases its write count by 1 (lines 2-4). Conversely, if read operation occurs on page p, the TA-CLOCK sets its reference bit as 1 and increases its read count by 1 (lines 5-7). The reason for increasing the read/write counts is to calculate the write and read thresholds when the DRAM CLOCK becomes full at a later time. Meanwhile, if page p hits the PCM CLOCK with a write operation, it is migrated to the DRAM CLOCK (lines [10][11][12]. Conversely, if read operation occurs on page p in the PCM CLOCK, the TA-CLOCK sets its reference bit as 1 (lines [13][14][15]. If page faults occur in the TA-CLOCK, and the DRAM CLOCK becomes full, the TA-CLOCK migrates a page in the DRAM CLOCK to the PCM CLOCK based on its page replacement policy (lines 19-21) and inserts page p into a free page in the DRAM CLOCK. If the inserted page p is referenced by the write operation, the TA-CLOCK sets its dirty bit as 1 and increases its write count by 1 (lines 24-26). However, if it is referenced by a read operation, its reference bit is set to 1, and its read count is increased by 1 (lines 27-29

Parameter
Description The reference bit of page p. p dirty The dirty bit of page p. p read_count The read count of page p. p write_count The write count of page p. p tendency The page access tendency of page p.
To secure a free page in the DRAM CLOCK, the TA-CLOCK uses the DRAMPageReplacement() algorithm, whose details are described in Algorithm 2. The DRAM CLOCK hand points to page p in the DRAM CLOCK (line 1). When the reference bit of page p is set, the TA-CLOCK resets its reference as 0, and the clock hand points to the next page (lines [3][4][5]. Conversely, if the reference bit of page p is 0, the TA-CLOCK classifies its page access tendency to secure a free page in the DRAM CLOCK (lines 6-25). If the dirty bit of page p is set as 1, the TA-CLOCK calculates the write threshold of the DRAM CLOCK using Equation (1), and classifies the page access tendency of page p (lines 7-9). Subsequently, if the page access tendency of page p is SW, the TA-CLOCK retains it in the DRAM CLOCK and analyzes the page access tendency of the next page to secure a free page in the DRAM CLOCK (line 21). Meanwhile, if the page access tendency of page p is not SW, the TA-CLOCK calculates its read threshold using Equation (2) (lines [10][11]. Moreover, if its page access tendency is WW, the clock hand of TA-CLOCK points to the next page in the DRAM CLOCK and repeats the above processes (lines [13][14]. Furthermore, if the page access tendency of page p is WR, the TA-CLOCK evicts it to the storage and secures a free page in the DRAM CLOCK (lines [15][16]. In contrast, if the page access tendency of page p is SR, the TA-CLOCK migrates it to the PCM CLOCK (lines [17][18]. Finally, if page p is a clean page, it is discarded from the DRAM CLOCK (lines 22-23).
Algorithm 2 DRAMPageRelpacement (void). 1: p ← GetCurrentClockPointer(); 2: while page p is not empty do 3: if p re f is set then 4: p re f ← reset; 5: p ← GetNextClockPointer(); 6: else 7: if p dirty is set then // page p is dirty page. 8: Calculate Write_Threshold DRAM by Equation (1); 9: Classify the page access tendency of page p; 10: if p tendency is not SW then 11: Calculate Read_Threshold p of page p by Equation (2)

Scenarios of TA-CLOCK
In this section, we provide representative scenarios of the TA-CLOCK proposed in this paper. In these scenarios, pages that are allocated to the DRAM CLOCK are marked as P(x, y), where P represents the logical page number of each page. In the DRAM CLOCK pages, x and y represent the read and write counts, respectively. However, in the PCM CLOCK pages, x denotes the reference bits. In the presented scenarios, we assumed that the victim page of the DRAM CLOCK is already selected by the DRAM page replacement algorithm of the TA-CLOCK. Figure 3 shows an example of page allocation in the DRAM CLOCK and page migration from the DRAM CLOCK to the PCM CLOCK. In this example, the TA-CLOCK selects page A as the victim page. The Write_Threshold DRAM is 2.5, and Read_Threshold A is 0.2, using Equations (1) and (2), respectively. When a write operation occurs in page K, page A is selected as the victim page by the DRAM page replacement policy of TA-CLOCK (Step 1). Because the page access tendency of page A is SR, the TA-CLOCK migrates page A to the PCM CLOCK to secure a free page in the DRAM CLOCK for page K. To insert page A into the PCM CLOCK, the TA-CLOCK discards page O from the PCM CLOCK using the CLOCK algorithm (Step 2). Subsequently, the TA-CLOCK migrates page A to the location of the evicted page O in the PCM CLOCK (Step 3). Finally, it inserts page K into the location of the migrated page A in the DRAM CLOCK (Step 4).  When write operations occur again in the dirty page of the PCM CLOCK, the TA-CLOCK migrates or discards the selected victim page using its page access tendency. Figure 5 presents an example of page exchange between the DRAM CLOCK and PCM CLOCK. When a write operation occurs in dirty page A in the PCM CLOCK, the TA-CLOCK migrates page A to the DRAM CLOCK (Step 1). Subsequently, the TA-CLOCK selects the victim page to secure a free page page A in the DRAM CLOCK. In this example, the TA-CLOCK selects page N in the DRAM CLOCK as the victim page because Read_Threshold N and its page access tendency are 0 and SR, respectively (Step 2). Finally, page N is inserted into the location of page A in the PCM CLOCK (Step 3).

Performance Evaluation
In this section, we describe the evaluation environment and discuss the performance enhancement of our proposed approach in comparison with existing page replacement policies.

Experimental Setup
We used the gem5 simulator to evaluate the performance of the TA-CLOCK and existing main memory page replacement policies [19]. To guarantee the reliability of the experiments, we adopted the full system mode of the gem5 simulator. In our experiments, we measured the performance of our proposed policy by adjusting the ratio of DRAM to PCM in the hybrid main memory for comparison with existing hybrid main memory policies. In addition, we used ten billion simulated instructions in our experiments for evaluating the performance of the TA-CLOCK, compared with CLOCK, CLOCK-PRO, M-CLOCK, and CLOCK-DWF by the following indicators: the number of write operations in the PCM, the number of migrations between DRAM and PCM, the hit ratio of write operations in DRAM and the EDP [9,10,20]. We used weight write and weight read as 25 and 100, respectively. The remaining parameters of our experiments are summarized in Table 5.
We used the MiBench embedded benchmark program to compare and analyze the performance of our proposed policy with other policies [21]. MiBench is frequently used to evaluate the performance of various embedded systems [22][23][24]. In the experiments, we used benchmarks for performance evaluation with different memory footprint sizes and read/write ratios. In addition, we performed experiments by adjusting the size of the hybrid main memory to consider the corresponding memory footprint size of workloads, as described in Table 6.

Experimental Results
In this subsection, we present the number of write operations in the PCM, number of migrations between the DRAM and PCM, DRAM write hit ratio, and EDP of our proposed policy. The PCM write count is an important indicator of the performance of hybrid main memories. PCMs have a high write latency and limited write endurance compared to DRAMs. As a result, frequent write operations to the PCM reduces the performance of hybrid main memories. To solve this problem, the TA-CLOCK reduces the number of write operations in the PCM CLOCK by reducing page allocations to the PCM and uncessary migrations between the DRAM and PCMs. Figure 6 shows the number of write operations in the PCM of the TA-CLOCK compared with those of CLOCK, CLOCK-Pro, M-CLOCK, and CLOCK-DWF. The TA-CLOCK reduced the number of write operations in the PCM by averages of 93.6%, 92.1%, 41.6%, and 58.7% compared with those of CLOCK, CLOCK-Pro, M-CLOCK, and CLOCK-DWF, respectively. This shows that TA-CLOCK effectively performs state classification using Equations (1) and (2). Hence, it can utilize the limited endurance of PCM in the hybrid main memory of embedded systems.
When write operations frequently occur to a specific area of hybrid main memories, the write endurance of the system is reduced. TA-CLOCK alleviates the non-uniformity of write operations on a specific page by reducing the number of write operations in PCMs. Figure 7 shows the number of PCM writes, grouped by block units. TA-CLOCK shows the least number of write operations to whole areas of PCM, and write operations are evenly distributed. Table 7 described the average write counts and standard deviation on the PCM for each page replacement policy. Figure 8 shows     Figure 9 shows the write hit ratio of DRAM in the hybrid main memory. When frequent write operations occur in the PCMs of hybrid main memories, the performance of embedded systems is reduced, and it is difficult to utilize the limited lifespan of PCMs. Therefore, in embedded systems with hybrid main memories, research has been conducted toward designing write operations to occur in DRAM. In our experiments, TA-CLOCK had a higher DRAM write hit rate than other techniques, with averages of 21.1%, 19.6%, 0.3%, and 0.4% compared with those of CLOCK, CLOCK-Pro, M-CLOCK, and CLOCK-DWF, respectively, as shown in Figure 9. From this result, we can infer that TA-CLOCK effectively maintains write pages in the DRAM CLOCK by classifying the page access tendency of the victim page. Consequently, it reduces the PCM write count and the number of migrations between the DRAM and PCM, as observed in Figures 6 and 8.
The reduction of the energy consumption of hybrid main memories is important because embedded systems have limited battery capacity. Figure 10 shows the EDP of the TA-CLOCK normalized to CLOCK-DWF. EDP is an important metric for performance and energy consumption. As shown by Figure 10, TA-CLOCK reduced the EDP by averages of 51.6%, 48.1%, 3.8%, and 49.6% compared with those of CLOCK, CLOCK-Pro, M-CLOCK, and CLOCK-DWF, respectively, by reducing the number of write operations in the PCM. This result is supported by previous experimental results, in which write counts in the PCM and number of migrations between the DRAM and PCM were reduced, and the write hit ratio in the hybrid main memory was improved. Consequently, TA-CLOCK enhances the EDP of the hybrid main memory. Finally, the analysis of the time and memory complexity required by TA-CLOCK is as follows. At first, the results of time overhead are similar to those of EDP, which is the result of Figure 10. In fact, EDP considers the time overhead for each policy, as well as energy requirement, and the result of it shows final requirements in terms of time and energy. The memory space overhead is caused by the additional bits required for managing TA-CLOCK in hybrid main memories. Table 8 describes the average memory space overhead by adjusting the ratio of DRAM to PCM in the hybrid main memory for each page replacement policy. TA-CLOCK requires 8 bits as 1 reference bit, 1 dirty bit, 3 bits for read count and 3 bits for write count, and 1 bit for each PCM page as the reference bit. Therefore, the average memory space overhead of TA-CLOCK is 0.0115% for the whole memory capacity, and memory space overhead for all CLOCK based policies is negligible considering the whole memory capacity. The contributions of TA-CLOCK include reduction in the number of write operations in the PCM, which positively affects the write endurance and write latency, and reduction of the energy consumption of embedded systems. Figure 11 shows the average results of all workloads in the hybrid main memory. In the experiment results, TA-CLOCK efficiently reduced the number of write operations in the PCM by reducing the number of unnecessary migrations between the DRAM and PCMs. Additionally, it improved the hit ratio of write operations in the DRAM and enhanced the EDP. As shown by the average results in Figure 11a, the TA-CLOCK reduced the number of write operations in the PCM by 71.5% on average compared with other policies. Figure 11b shows the normalized migration counts between DRAM and PCM. The TA-CLOCK reduced the number of migrations by 57.3% on average compared with CLOCK-DWF and M-CLOCK. Figure 11c shows the hit ratio of the write operations in the DRAM. The TA-CLOCK improved the write hit ratio in the DRAM by 10.4% on average, compared with other policies. Finally, Figure 11d shows the normalized EDP in the hybrid main memory. In this experiment, TA-CLOCK reduced the EDP by 38.3% on average. Based on these results, TA-CLOCK improves the limited lifespan of PCMs with enhanced EDP compared with other policies.

Conclusions
In recent times, high-performance embedded systems have used PCM as their main memory. However, it is difficult to directly apply PCMs to the main memory of embedded systems owing to their disadvantages, including limited write endurance and high write latency. In this paper, we proposed TA-CLOCK, a novel page replacement policy for the hybrid main memory of high-performance embedded systems. TA-CLOCK classifies page access tendency in the hybrid main memory through access pattern analysis. To analyze the page access tendency, it uses read and write thresholds based on the page access counts of each page. By classifying the page access tendency, TA-CLOCK determines the location of the victim page, reduces unnecessary page migrations, and improves the hit ratio of write operations in the hybrid main memory. Additionally, it reduces the EDP in the hybrid main memory because it reduces the number of read and write operations in the storage. In the experiment, TA-CLOCK reduced the number of write operations in the PCM by 71.5% compared to CLOCK, CLOCK-Pro, M-CLOCK, and CLOCK-DWF, and it reduced the number of page migrations between the DRAM and PCM in the hybrid main memory by an average of 57.3%. Therefore, TA-CLOCK improves the write hit rate of DRAMs and enhances the EDP of hybrid main memories by averages of 10.4% and 38.3%, respectively, compared with existing techniques.