Article

Adaptive Versioning in Transactional Memory Systems †

Department of Computer Science, Kent State University, Kent, OH 44240, USA
* Author to whom correspondence should be addressed.
This paper combines preliminary results that appeared in SSS’18 and SSS’19.
Algorithms 2021, 14(6), 171; https://doi.org/10.3390/a14060171
Submission received: 4 February 2021 / Revised: 27 May 2021 / Accepted: 28 May 2021 / Published: 31 May 2021

Abstract

Transactional memory has been receiving much attention from both academia and industry. In transactional memory, program code is split into transactions, blocks of code that appear to execute atomically. Transactions are executed speculatively, and this speculative execution is supported through a data versioning mechanism. Lazy versioning makes aborts fast but penalizes commits, whereas eager versioning makes commits fast but penalizes aborts; whether to use eager or lazy versioning to execute transactions is still a hotly debated topic. Lazy versioning seems appropriate for write-dominated workloads and high-contention scenarios, whereas eager versioning seems appropriate for read-dominated workloads and low-contention scenarios. Selecting an appropriate versioning method for better performance therefore requires a priori knowledge of the workload and contention scenario. In this article, we present an adaptive versioning approach, called Adaptive, that dynamically switches between eager and lazy versioning at runtime based on appropriate system parameters, without requiring a priori knowledge of the workload and contention scenario, so that the performance of a transactional memory system is always better than that obtained using either eager or lazy versioning individually. We provide Adaptive for both persistent and non-persistent transactional memory systems, using performance parameters appropriate for each. We implemented our adaptive versioning approach in the latest software transactional memory distribution, TinySTM, and evaluated it extensively through 5 micro-benchmarks and 8 complex benchmarks from the STAMP and STAMPEDE suites. The results show significant benefits of our approach.
Specifically, in persistent TM systems, our approach achieved performance improvements of as much as 1.5× for execution time and as much as 240× for number of aborts, whereas in non-persistent transactional memory systems it achieved performance improvements of as much as 6.3× for execution time and as much as 170× for number of aborts.

1. Introduction

Concurrent processes (threads) need to synchronize to avoid introducing inconsistencies while accessing shared data objects. Traditional synchronization mechanisms such as locks and barriers have well-known limitations and pitfalls, including deadlock, priority inversion, reliance on programmer conventions, and vulnerability to failure or delay. Transactional memory (TM) [1,2] has emerged as an attractive alternative. Several commercial processors provide direct hardware support for TM, including Intel’s Haswell [3] and IBM’s Blue Gene/Q [4], zEnterprise EC12 [5], and Power8 [6]. There are proposals for adapting TM to clusters of GPUs [7,8,9].
Using TM, program code is split into transactions, blocks of code that appear to execute atomically. Transactions are executed speculatively: synchronization conflicts or failures may cause an executing transaction to abort, meaning its effects are rolled back and the transaction is restarted. In the absence of conflicts or failures, a transaction typically commits, causing its effects to become visible. Supporting this speculative execution requires a data version management mechanism.
Many TM systems have been designed using the transactional concept in both persistent and non-persistent (volatile) memories. These TM systems can be distinguished by how they implement data version management. This distinction holds for TM systems implemented in hardware, called hardware TMs (HTMs) [10,11,12,13], as well as for those implemented in software, called software TMs (STMs) [14,15,16].
In this paper, we present a technique that improves on the existing data version management mechanisms used in both persistent and non-persistent TM systems. Essentially, a versioning mechanism handles data versions, i.e., the simultaneous storage of both new data (to be visible if the transaction commits) and old data (retained if the transaction aborts). At most one of these values can be stored "in place" (the original memory location), while the other value must be stored "on the side" (e.g., in cache or persistent/non-persistent main memory). On a store, a TM system can either use eager versioning and put the new value in place, or use lazy versioning to (temporarily) leave the old value in place. Figure 1 and Figure 2 depict how a transaction T_x is executed using eager and lazy versioning in persistent and non-persistent (volatile) TM systems, respectively. By design, in both systems, lazy versioning makes aborts fast but penalizes commits, whereas eager versioning makes commits fast but penalizes aborts.
Although both eager and lazy versioning have been studied heavily in the literature (details in Section 2) for both persistent [17,18,19,20] and non-persistent TM systems [10,11,12,13,14,15,16,21], whether to use eager or lazy versioning is still hotly debated for both kinds of systems. In fact, no study of persistent or non-persistent TM systems elaborates the performance gap between eager and lazy versioning with comprehensive practical evaluations, with one notable exception [22], which elaborates on the performance gap only partially and only for persistent TM systems. The conclusion from [22] is that lazy versioning is appropriate for write-dominated workloads and high-contention scenarios, whereas eager versioning is appropriate for read-dominated workloads and low-contention scenarios. However, to improve performance using lazy or eager versioning, a priori knowledge of both the workload and the contention scenario is needed.
We conducted a study to validate whether the conclusion from [22] also applies to non-persistent TM systems. Specifically, we executed the genome and kmeans benchmarks from the STAMP benchmark suite [23] using lazy and eager versioning and measured performance through execution time and number of aborts under a varying number of threads (refer to Section 6 for details on the experimental setup and implementation). The results are shown in Figure 3.
The results show that lazy versioning performs well for genome whereas the opposite is true for kmeans, which is in line with the conclusion drawn in [22]. This discrepancy in performance arises mainly because the versioning method used is not appropriate for the workload, causing more aborts and subsequently increasing the execution time. This raises the question of how to choose the versioning method that is appropriate for an application without a priori knowledge of the workload and contention scenario. This is a challenging problem because (i) selecting an appropriate versioning method requires a priori knowledge of the workload (write-dominated or read-dominated) and the contention scenario (low or high), and (ii) such knowledge is difficult to obtain before runtime.

1.1. Contributions

In this article, we demonstrate that we can obtain the best of both worlds without any a priori knowledge of the workload and contention scenario. Specifically, we present an adaptive versioning approach for TM systems, which we call Adaptive, that dynamically switches execution between lazy and eager versioning at runtime, always achieving performance on any workload and contention scenario better than that obtained using either lazy or eager versioning individually. We provide two different models of Adaptive, one for persistent TM systems and another for non-persistent TM systems. We reported preliminary versions of Adaptive for persistent and non-persistent TM systems in [24] and [25], respectively. This article combines and extends those preliminary versions with a full set of experimental results. For the experimental evaluation in both systems, we incorporated Adaptive into the latest TinySTM implementation [26,27] and ran experiments against a diverse set of TM benchmarks [26,27,28]. Specifically, we used 5 micro-benchmarks (bank, red black tree, hash set, linked list, and skip list) and 8 complex benchmarks (yada, vacation, ssca2, labyrinth, kmeans, intruder, genome, and bayes) from the STAMP and STAMPEDE suites [23,29]. We measured the performance of Adaptive w.r.t. four crucial performance metrics.
  • execution time: the total time to execute a set of transactions to completion. This is the interval from the beginning of the first transaction until the last transaction finishes and commits. In a dynamic setting, execution time translates to throughput, the number of committed transactions per time step.
  • number of aborts: the total number of transaction aborts until the current time. Compared against the total number of transaction commits until the current time, it yields the abort-to-commit ratio (ACR), a useful metric. The number of aborts directly affects execution time, since execution time is likely to increase with the number of aborts as more transaction restarts are required.
  • total number of data movements (for persistent TM systems only): the total number of movements of data to and from the original memory addresses. The execution time of transactions in persistent memory (PM) is directly affected by the number of reads and writes to the PM, which together constitute the total number of data movements to and from the memory addresses. Thus, minimizing the total number of data movements decreases the total execution time of the transactions in PM.
  • total number of stores to persistent memory (for persistent TM systems only): the total number of writes to the persistent memory addresses. The motivation behind this metric is as follows. It has been heavily advocated that persistent memories significantly outperform traditional dynamic random access memories (DRAMs) due to low standby power, higher memory density, and much lower cost per bit [30,31]. However, persistent memories suffer from the write endurance problem, i.e., every PM unit can sustain only a very limited number of writes (i.e., stores) before it wears out. Minimizing the total number of stores to the PM helps mitigate this problem.
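As a minimal illustration of how the first two metrics combine, the following sketch (with hypothetical counter names, not TinySTM code) computes throughput and the abort-to-commit ratio (ACR) described above:

```python
# Illustrative helpers only; counter names are assumptions, not TinySTM APIs.
def abort_to_commit_ratio(n_abort: int, n_commit: int) -> float:
    """ACR = aborts / commits; defined as 0 when nothing has committed yet."""
    return n_abort / n_commit if n_commit else 0.0

def throughput(n_commit: int, elapsed_seconds: float) -> float:
    """Committed transactions per unit time (the dynamic-setting view)."""
    return n_commit / elapsed_seconds

print(abort_to_commit_ratio(n_abort=50, n_commit=100))  # 0.5
```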
In persistent TM systems, the results suggest that, when using lazy versioning with encounter-time locking (the two variants of lazy versioning, encounter-time locking and commit-time locking, are described in Section 6.1), Adaptive achieves up to 1.21× better performance (i.e., 17% less execution time) than eager versioning and up to 1.27× better performance (i.e., 21% less execution time) than lazy versioning. When using lazy versioning with commit-time locking, Adaptive achieves up to 1.39× better performance (i.e., 28% less execution time) than eager versioning and up to 1.5× better performance (i.e., 33% less execution time) than lazy versioning. Moreover, Adaptive has up to 240× and 17× fewer aborts than eager and lazy versioning, respectively.
In non-persistent TM systems, the results show that, when using lazy versioning with encounter-time locking, Adaptive achieves up to 6.3× better performance than lazy versioning and up to 5.5× better performance than eager versioning. When using lazy versioning with commit-time locking, Adaptive achieves up to 3.7× better performance than lazy versioning and up to 5× better performance than eager versioning. The minimum performance gain for Adaptive is 1.12×.
These results suggest that switching between eager and lazy versioning dynamically at runtime provides a way to exploit the positive aspects of both versioning methods in both persistent and non-persistent TM systems. In summary, we have the following three contributions.
  • (Section 4) We introduce a novel versioning approach, Adaptive, that switches between eager and lazy versioning dynamically at runtime, and provide two models of Adaptive suitable for persistent and non-persistent TM systems, respectively.
  • (Section 5) We discuss the limitations of the basic design of Adaptive for non-persistent TM systems and present two optimizations.
  • (Section 6) We experimentally evaluate the performance of Adaptive in both persistent and non-persistent TM systems using 5 micro-benchmarks and 8 complex benchmarks from the STAMP and STAMPEDE suites, report the results, and provide observations.

1.2. Organization

We discuss related work in Section 2. We discuss the memory model and some preliminaries in Section 3. We present our basic adaptive versioning approach in Section 4 and provide two models suitable for persistent and non-persistent TM systems, respectively. We discuss the limitation of the design of basic Adaptive in a non-persistent TM system in Section 5 and present some optimizations. We evaluate the performance of Adaptive in both the systems in Section 6. Finally, we provide concluding remarks in Section 7 with a short discussion on possible future work.

2. Related Work

We discuss here the persistent and non-persistent TM systems proposed in the literature, the use of eager and lazy versioning in those systems, and the conflict detection and resolution mechanisms. Table 1 summarizes the advantages and disadvantages of eager and lazy versioning in persistent and non-persistent TM systems.

2.1. Non-Persistent (Volatile) TM Systems

There are several previous studies of non-persistent TM systems, e.g., [10,11,12,14,15,16]. Table 2 shows the versioning mechanisms used in some widely popular non-persistent TM systems. These previous studies used either eager or lazy versioning individually; no work elaborates on the impact of eager versus lazy versioning on the performance of non-persistent TM systems. In fact, the majority of well-known non-persistent TM systems draw contradictory conclusions on whether to use eager or lazy versioning. For example, consider two widely popular HTM implementations, LogTM [12] and UTM [10]. They advocate that TM should ideally use eager versioning and eager conflict detection (discussed in Section 3.5), since with eager versioning transaction commits are faster than transaction aborts, and commits are much more common than aborts in practical applications. In addition, eager conflict detection finds conflicts early and reduces the work wasted by conflicting transactions. On the other hand, another widely popular HTM implementation, TCC [11], uses lazy versioning and lazy conflict detection. Other HTMs such as VTM [13] and LTM [10] advocate lazy versioning with eager conflict detection. The same is true of STMs: some use eager methods, some use lazy methods, and some combine eager and lazy approaches to versioning and conflict detection, e.g., [14,15,16].
The other line of work is Hybrid TM systems (HyTMs) [32,33,34,35,36,37], where transactions are dynamically executed in either an HTM or an STM implementation. However, it is challenging and complicated to manage the concurrent execution of both hardware and software transactions in HyTM [33]. To address this, in 2007, Lev et al. [38] proposed the Phased Transactional Memory (PhTM) system, which executes transactions in phases such that each phase runs in the same mode (HTM or STM), with seamless switching between them. PhTM benefits from not requiring coordination between transactions running in different modes. Recently, Carvalho et al. presented an improved version, PhTM* [39], and demonstrated its effectiveness in [40] by avoiding unnecessary switches to software mode. Both approaches, HyTM and PhTM, aim for better performance by dynamically switching between HTM and STM implementations. This is different from the approach we present in this article: we dynamically switch between eager and lazy versioning methods at runtime to improve the performance of a single TM implementation, whereas the HyTM and PhTM approaches switch between HTM and STM implementations.

2.2. Persistent TM Systems

The performance gap between eager and lazy versioning is relatively well studied in persistent TM systems. The most closely related work is due to Wan et al. [22], who empirically evaluated eager and lazy versioning on the open-source non-volatile memory library (NVML, now PMDK) [41] for some constrained workloads and suggested that "one versioning method does not fit all workloads". In particular, they reported that (i) lazy versioning significantly outperforms eager versioning for workloads in which a transaction updates a large number of different objects, while it underperforms eager versioning for read-dominated workloads, and (ii) eager versioning is more sensitive to read-to-write ratios, whereas lazy versioning is less sensitive to those ratios [22]. Other works mostly proposed methods using either eager or lazy versioning, and no work elaborates the performance gap between the two. For example, Coburn et al. [18] suggested an STM implementation for persistent memory, called NV-Heaps, using eager versioning. Volos et al. [19] suggested a TinySTM [26,27] variation, called Mnemosyne, for persistent memory using lazy versioning. NV-Heaps [18] and Mnemosyne [19] drew opposite conclusions on whether eager or lazy versioning is better for persistent memories: the former prefers eager versioning, while the latter opts for lazy versioning. Furthermore, many other persistent TM systems, such as [20,42], suggested using eager versioning.
Avni et al. [43] studied HTM-based transactions using lazy versioning. DudeTM [21] incorporates lazy versioning in its design: a transaction first runs in volatile memory using any HTM or STM implementation and produces a redo log, which is then flushed to persistent memory to satisfy atomicity; the original data in persistent memory is subsequently modified according to the persistent redo log. Notice that this approach is different from ours and needs a shared shadow memory in addition to the persistent memory where the data resides. Genc et al. [44] proposed a low-overhead, HTM-compatible persistent transactional system, called Crafty, using eager versioning. Joshi et al. [45] proposed a persistent HTM system providing hardware support for lazy versioning to reduce the performance overhead compared to software-based implementations. Recently, Castro et al. [46] studied scalability issues with experimental results on Intel Optane DC [47] persistent memory and proposed scalable persistent hardware transactions (SPHT). Baldassin et al. [48] introduced a phase-based persistent TM system, NV-PhTM, in which transaction execution switches between two modes, HTM and STM, with the better mode selected according to the application's characteristics. This is different from our approach, which switches between versioning methods within a TM system rather than between hardware and software transaction execution modes.
Other related works in persistent memory study latency, scalability, and ordering-constraint problems. Krishnan et al. [49] proposed a persistent TM system, called TimeStone, with minimal writes and a low memory footprint. Gu et al. [50] presented a read-friendly persistent TM system, called Pisces, that maintains up to two versions of the data using a dual-version concurrency control (DVCC) protocol and provides non-blocking reads. Kolli et al. [51] studied the ordering-constraint problem for transactions in persistent memories and proposed deferred commit transactions (DCT) to achieve minimal ordering constraints. Lu et al. [52] proposed a system that reduces ordering constraints among persistent writes by distributing the commit status of a transaction among the data blocks. In [53], Memaripour et al. studied the latency overhead in byte-addressable non-volatile memories and proposed Kamino-Tx, which avoids copying data in the critical path.
Several other recent papers, e.g., [17,20,54,55,56,57,58], provided techniques to improve the time to log the data (e.g., through coalescing, persistent caches, hardware support, or combined eager+lazy versioning methods) for both undo and redo logs. Our focus, however, is on a different approach: dynamically switching between eager and lazy versioning at runtime to exploit the advantages of both versioning methods, which our extensive experimental evaluation detailed in Section 6 confirms. Our approach obviates the need for a priori knowledge of the workload and contention scenario to decide whether to use eager or lazy versioning.

3. Model and Preliminaries

We assume that the execution starts at time t_0 = 0. As execution time, we measure the time for all transactions within a benchmark to finish execution and commit, except for micro-benchmarks, where we consider the time to execute and commit 10,000 transactions. We also assume that only a single version of the data is stored in each of eager, lazy, and adaptive versioning, which is essentially different from techniques, such as those given in [59], that store multiple versions.

3.1. Persistent Memory Model

We consider a computer system with unlimited persistent memory, many processing cores, and no hard disk drive (HDD). All persistent memory is cacheable, and caches are volatile and coherent. The system may include a limited amount of DRAM (but we do not assume its necessity). We assume that all the writes of a committed transaction can be accommodated in the volatile cache, i.e., once a transaction commits but before the commit is reflected in the original memory locations in persistent memory, all its newly modified data is in the volatile cache. The system restarts and resumes its computation after experiencing failures/crashes. The task after a restart is therefore to bring the data to a consistent state, removing the effects of uncommitted transactions and applying the missing effects of committed ones. In the experimental evaluation, we simulate crashes by periodically wiping out the volatile logs, and we use the data stored in undo or redo logs in persistent memory to recover consistency. We employ a function that checks and maintains consistency during execution.

3.2. Non-Persistent Memory Model

We consider a computer system with volatile shared main memory, many processing cores, and a HDD. All shared main memory is cache-able and caches are volatile and coherent. We assume that all the writes of a committed transaction can be accommodated in the cache, i.e., once a transaction commits but before the commit is reflected in original memory locations in main memory, all its newly modified data is in volatile cache. We run workloads using the TinySTM execution model [26,27].

3.3. Eager Versioning

Eager versioning is supported through so-called undo logs, which are stored in cacheable main memory. In this method, a transaction first copies the data in the original memory locations to an undo log area and then performs updates in place in the original data locations (in main memory). If the transaction aborts, any modifications to the original memory locations are rolled back using the old data stored in the undo log. Figure 1a and Figure 2a illustrate eager versioning in persistent and non-persistent TM systems, respectively.
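The undo-log discipline described above can be sketched on a toy key-value "memory"; the names and structure below are illustrative only, not TinySTM internals:

```python
# Toy model of eager versioning: new values go in place, old values on the side.
memory = {"x": 1, "y": 2}

def eager_write(undo_log, addr, value):
    undo_log.setdefault(addr, memory[addr])  # save the old value once
    memory[addr] = value                     # update in place

def eager_abort(undo_log):
    memory.update(undo_log)                  # roll back from the undo log
    undo_log.clear()

def eager_commit(undo_log):
    undo_log.clear()                         # commit just discards the log (fast)

log = {}
eager_write(log, "x", 10)  # memory["x"] is now 10, log holds the old value 1
eager_abort(log)           # memory["x"] is 1 again
```

Note how the cost asymmetry shows up even in this sketch: commit only discards the log, while abort must copy old values back, matching the "commits fast, aborts penalized" behavior.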

3.4. Lazy Versioning

Lazy versioning is supported through so-called redo logs. Its operation differs slightly between persistent and non-persistent TM systems: in non-persistent memories, redo logs are stored only in cache, whereas in persistent memories, the redo logs are also persisted in the persistent memory before the original memory locations are updated.
Using lazy versioning in non-persistent memories, a transaction copies all the data that it is going to write from the original memory locations to a redo log area, appends all its data updates to that log area, and then writes the updated data back to the original memory locations when the transaction commits. In persistent memories, the transaction additionally copies the updated data from the redo log area in cache to the redo log area in persistent memory before writing back to the original memory locations. If the transaction fails, the updates in the log area in cache are simply discarded. Thus, data in the redo log is written back to the original memory locations only when the transaction commits. Figure 1b and Figure 2b illustrate lazy versioning in persistent and non-persistent memories, respectively.
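The redo-log discipline can be sketched on the same kind of toy key-value "memory" (names are illustrative only); the write-back step at commit is what makes commits slower but aborts cheap:

```python
# Toy model of lazy versioning: updates buffered on the side, applied at commit.
memory = {"x": 1, "y": 2}

def lazy_write(redo_log, addr, value):
    redo_log[addr] = value                   # buffer the new value on the side

def lazy_read(redo_log, addr):
    return redo_log.get(addr, memory[addr])  # a transaction reads its own writes

def lazy_commit(redo_log):
    memory.update(redo_log)                  # write back in place at commit time
    redo_log.clear()

def lazy_abort(redo_log):
    redo_log.clear()                         # abort just discards the log (fast)
```

In the persistent-memory variant described above, `lazy_commit` would additionally persist the redo log before updating the original locations; that step is omitted from this volatile sketch.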

3.5. Conflict Detection and Resolution

Conflict detection and resolution come into play when transactions executing concurrently on persistent or non-persistent TM systems read/write the same memory locations and certain combinations of read/write patterns prevent multiple transactions from proceeding to commit. A conflict detection mechanism signals such an overlap between the write set (data written) of one transaction and the write set or read set (data read) of other concurrent transactions. Conflict detection is called eager if it detects offending loads or stores immediately, and lazy if it defers detection until transactions commit. The choice of conflict detection depends on whether lazy or eager versioning is used. Table 2 lists some existing non-persistent TM systems that use lazy versus eager conflict detection, together with the versioning mechanism (lazy or eager) they use. For example, TCC [11] uses lazy conflict detection with lazy versioning, and LogTM [12] uses eager conflict detection with eager versioning. A contention management technique is then used to decide which conflicting transaction(s) continue and which wait (or abort and restart). There is a large body of work in this area giving many different strategies, with and without provable properties on the guarantees they provide, e.g., [26,60,61,62,63,64,65,66,67,68].
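The overlap condition at the heart of conflict detection can be stated as a set intersection; the following sketch (function and parameter names are illustrative) checks whether one transaction's write set overlaps another's read or write set:

```python
# A conflict exists when T_a's write set intersects T_b's read set or write set.
def conflicts(wset_a: set, rset_b: set, wset_b: set) -> bool:
    return bool(wset_a & (rset_b | wset_b))

# write/write conflict on "a"; no conflict on disjoint locations
print(conflicts({"a"}, set(), {"a"}))       # True
print(conflicts({"a"}, {"b"}, {"c"}))       # False
```

Eager detection would run such a check on every load/store, while lazy detection would run it once over the accumulated sets at commit time.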

3.6. Supporting Durable Transactions in TinySTM

We implemented durable transactions using TinySTM [26], a widely used lightweight STM implementation, as follows. For eager versioning, we maintain an undo log in persistent memory. When a transaction starts, each variable accessed by the transaction is added to the log before any modification is performed to it. Any update to the variable during the execution of the transaction is written directly to the variable's original address. When the transaction commits, the respective log records are freed and the memory is made available for use by other transactions. If the transaction aborts, all the changes made by the transaction are rolled back to the previous values stored in the respective undo logs, and the log records are then flushed.
For lazy versioning, when a transaction starts, all the variables accessed by the transaction are loaded to the volatile cache and modified. The new (or updated) values written by the transaction are then added to a redo log in persistent memory and also buffered in the volatile cache before the transaction commits. When the transaction commits, the new values are written back to the original memory addresses and the log records are flushed. We attach a timestamp-based version number to each transactional log to make sure that the last committed value is used in the restart process.
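The timestamp-based recovery idea can be sketched as follows; the record layout is a hypothetical illustration, not TinySTM's actual log format. Recovery applies, for each address, the value from the committed log record with the highest version number and ignores uncommitted records:

```python
# Sketch of versioned redo-log recovery: keep the newest committed write
# per address. Record fields ("committed", "version", "writes") are assumed.
def recover(memory, log_records):
    latest = {}  # addr -> (version, value)
    for rec in log_records:
        if not rec["committed"]:
            continue  # discard effects of uncommitted transactions
        for addr, value in rec["writes"].items():
            if addr not in latest or rec["version"] > latest[addr][0]:
                latest[addr] = (rec["version"], value)
    for addr, (_, value) in latest.items():
        memory[addr] = value  # apply the last committed value
    return memory
```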

4. Basic Adaptive Versioning

We now describe our approach, Adaptive, that runs transactions using either eager or lazy versioning, switching between them dynamically at runtime. Figure 4 compares Adaptive with eager and lazy versioning. The pseudocode of Adaptive is given in Algorithm 1.
Algorithm 1: Adaptive for a transaction T at any time t ≥ 0
 1  N_Ecommit ← number of transaction commits until t executed using Eager versioning;
 2  N_Lcommit ← number of transaction commits until t executed using Lazy versioning;
 3  N_Eabort ← number of transaction aborts until t executed using Eager versioning;
 4  N_Labort ← number of transaction aborts until t executed using Lazy versioning;
 5  N_commit ← N_Ecommit + N_Lcommit; N_abort ← N_Eabort + N_Labort;
 6  V_cur ← current versioning method for transactions running in a thread;  // V_cur ∈ {Eager, Lazy}
 7  tms ← transactional memory system type;  // tms ∈ {persistent, non_persistent}
    /* Initially, at t = 0, choose the versioning method based on workload size (if available), otherwise randomly */
 8  if N_commit + N_abort = 0 then
 9      if information on read and write sets is available then
10          Wset(T) ← write set of transaction T;
11          Rset(T) ← read set of transaction T;
12          if |Wset(T)| > |Rset(T)| then
13              V_cur ← Lazy; execute T using Lazy versioning;
14          else V_cur ← Eager; execute T using Eager versioning;
15      else execute T using either eager or lazy versioning;
    /* At any time t > 0, choose the versioning method based on abort-to-commit ratio (ACR) */
16  else if tms = persistent then  // Adaptive for persistent memories
17      AAR ← N_abort / (N_commit + N_abort); AAR_Eager ← N_Eabort / (N_Ecommit + N_Eabort);
18      ACR_Eager ← N_Eabort / N_Ecommit; ACR_Lazy ← N_Labort / N_Lcommit;
19      if (AAR ≥ 2/3) ∨ ((ACR_Eager > ACR_Lazy) ∧ (AAR_Eager ≥ 2/3)) then
20          execute T using Lazy versioning;
21      else
22          execute T using Eager versioning;
23  else if tms = non_persistent then  // Adaptive for non-persistent memories
24      ACR_Eager ← N_Eabort / N_Ecommit; ACR_Lazy ← N_Labort / N_Lcommit;
25      if (V_cur = Eager) ∧ (ACR_Eager > 1/2) then
26          V_cur ← Lazy;
27      else if (V_cur = Lazy) ∧ (ACR_Lazy < 2) then
28          V_cur ← Eager;
29      execute T using the V_cur versioning method;
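To make the switching rules concrete, the decision logic of Algorithm 1 can be transcribed into a short function. The function name and counter arguments are illustrative, and guards against division by zero (not spelled out in the pseudocode) are added as an assumption:

```python
# Transcription of Algorithm 1's runtime switching rules (thresholds 2/3,
# 1/2, and 2 as in the pseudocode). Returns the versioning mode for T.
def choose_versioning(tms, v_cur, ne_commit, nl_commit, ne_abort, nl_abort):
    n_commit, n_abort = ne_commit + nl_commit, ne_abort + nl_abort
    if n_commit + n_abort == 0:
        return v_cur  # initial choice: workload-based if known, else arbitrary
    acr_eager = ne_abort / ne_commit if ne_commit else 0.0
    acr_lazy = nl_abort / nl_commit if nl_commit else 0.0
    if tms == "persistent":
        aar = n_abort / (n_commit + n_abort)
        aar_eager = ne_abort / (ne_commit + ne_abort) if ne_commit + ne_abort else 0.0
        if aar >= 2 / 3 or (acr_eager > acr_lazy and aar_eager >= 2 / 3):
            return "lazy"
        return "eager"
    # non-persistent: hysteresis around the two ACR thresholds
    if v_cur == "eager" and acr_eager > 1 / 2:
        return "lazy"
    if v_cur == "lazy" and acr_lazy < 2:
        return "eager"
    return v_cur
```

Note that the two thresholds are consistent: an abort-to-all ratio (AAR) of 2/3 corresponds exactly to an abort-to-commit ratio (ACR) of 2, the threshold used on line 27.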

4.1. High Level Overview

The high-level idea in Adaptive is to switch the versioning method depending on performance: if the versioning method currently in use is hampering performance, switch to the other method to improve it. The fundamental question is how to identify and measure an indicator that appropriately reflects the effect of the versioning method on performance. Fortunately, in TM systems, if the number of aborts is increasing relative to the number of commits, that is a valid indicator of performance degradation due to the versioning method currently in use. We therefore pick the abort-to-commit ratio (ACR) as a performance indicator. ACR has been used heavily in the TM literature as a vital indicator of performance; see, for example, [69].
Formally, ACR can be defined at any time t > 0 as follows:
ACR = N_abort / N_commit, (1)
where N_abort is the total number of aborted transactions and N_commit is the total number of committed transactions from time 0 up to t. Ideally, the goal is to have no aborts, i.e., ACR = 0. However, in practice, this may not be feasible, and the goal is to minimize ACR as much as possible.
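As a minimal illustration, ACR can be computed with a guard for the startup case in which nothing has committed yet (returning 0 there is our simplifying assumption, not part of the definition above):

```c
#include <stddef.h>

/* Abort-to-commit ratio: ACR = N_abort / N_commit.
 * Returns 0.0 when nothing has committed yet, so the very first
 * transactions are not misclassified (a simplifying assumption). */
static double acr(size_t n_abort, size_t n_commit) {
    if (n_commit == 0)
        return 0.0;
    return (double)n_abort / (double)n_commit;
}
```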
Let T be a transaction that comes to the system at time t_0; we assume that the execution starts at time t_0 = 0. Let N_Ecommit (N_Lcommit) be the number of transaction commits in Adaptive from time t_0 = 0 until the current time t > t_0 executed using eager (lazy) versioning. Similarly, let N_Eabort (N_Labort) be the total number of transaction aborts in Adaptive from time t_0 = 0 until time t > t_0 executed using eager (lazy) versioning. Furthermore, let N_commit and N_abort be the total number of commits and aborts in Adaptive from t_0 = 0 until time t > t_0. Notice that
N_commit = N_Ecommit + N_Lcommit
and
N_abort = N_Eabort + N_Labort.
The concept in Adaptive is to decide which versioning method to use for executing T based on the parameters N_Ecommit, N_Lcommit, N_Eabort, and N_Labort learned from the system at runtime. However, if T comes to the system at time t_0 = 0, then N_Ecommit, N_Lcommit, N_Eabort, and N_Labort are all zero. We treat this as a special case. In the special case of t_0 = 0, a simple approach is to execute T using either lazy or eager versioning. However, if some information regarding the workload is available, then we can decide which versioning method to use. Suppose the read and write sets of T are available. Let Wset(T) be the write set of T, i.e., the memory locations that T would modify while executing. Similarly, let Rset(T) be the read set of T, i.e., the memory locations that T would read (but not modify) while executing. Let RW(T) = Rset(T) ∪ Wset(T) denote the set of memory locations that T reads or modifies while executing. If |Wset(T)| > |Rset(T)|, then T is executed using lazy versioning; otherwise, it is executed using eager versioning.
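The special-case decision at t_0 = 0 can be sketched as a one-line rule; the function and type names here are ours, for illustration only:

```c
#include <stddef.h>

typedef enum { EAGER, LAZY } versioning_t;

/* Initial choice at t0 = 0, when no runtime statistics exist yet:
 * write-dominated transactions run lazy, otherwise eager. */
static versioning_t initial_versioning(size_t wset_size, size_t rset_size) {
    return (wset_size > rset_size) ? LAZY : EAGER;
}
```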
For any transaction T arriving at time t > t 0 , we have two different models of Adaptive described in the following sub-sections, one for persistent memories and another for non-persistent memories.

4.2. Adaptive for Persistent Memories

The idea we employ in Adaptive for persistent memories is to compute the number of data movements for eager and lazy versioning separately, and to switch between these methods when the data movement increases. Ideally, we would like to use the versioning method that gives optimal data-movement performance for any specific workload. We use the following notions. Let N be the total number of transactions in any workload. When the workload finishes execution and all transactions commit, we have N_commit = N commits and N_abort ≥ 0 aborts (if each transaction commits without aborting even a single time, then N_abort = 0; otherwise N_abort > 0). Suppose each transaction T has a read-write set RW(T) of size S.
If T comes to the system at time t > t_0 after at least one transaction has finished executing once (irrespective of whether that transaction aborts or commits), then T is executed based on the following parameters computed in Adaptive from time t_0 = 0 until time t.
  • AAR = N_abort/(N_commit + N_abort): average abort ratio of transactions in Adaptive (using total aborts in both eager and lazy versioning).
  • AAR_Eager = N_Eabort/(N_Ecommit + N_Eabort): average abort ratio of transactions in Adaptive executed using eager versioning.
  • ACR_Eager = N_Eabort/N_Ecommit: abort-to-commit ratio of transactions in Adaptive executed using eager versioning.
  • ACR_Lazy = N_Labort/N_Lcommit: abort-to-commit ratio of transactions in Adaptive executed using lazy versioning.
At any time t ≥ 0, 0 ≤ AAR ≤ 1 and 0 ≤ AAR_Eager ≤ 1.
At any time t > t_0 in Adaptive, T is executed using lazy versioning if (i) AAR ≥ 2/3 or (ii) ACR_Eager > ACR_Lazy and AAR_Eager ≥ 2/3. Otherwise, T is executed using eager versioning. We call the value 2/3 the switching threshold; the motivation behind using 2/3 as the switching threshold in Adaptive for persistent memories is that it works on all the benchmarks on which we experimented with our framework. We now discuss how this switching threshold is computed.
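The decision rule above can be sketched directly; the ratios are assumed to be precomputed, and the names are ours for illustration:

```c
typedef enum { EAGER, LAZY } versioning_t;

/* Decision rule of Adaptive for persistent memories with the 2/3
 * switching threshold: run T lazily if AAR >= 2/3, or if eager is
 * aborting more than lazy and AAR_Eager >= 2/3. */
static versioning_t choose_persistent(double aar, double aar_eager,
                                      double acr_eager, double acr_lazy) {
    const double threshold = 2.0 / 3.0;
    if (aar >= threshold || (acr_eager > acr_lazy && aar_eager >= threshold))
        return LAZY;
    return EAGER;
}
```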

Computation of Switching Threshold in Persistent Memories

The computation of the switching threshold is based on the total movement of data from one memory location to another (i.e., the total number of loads and stores). The motivation behind using this metric is that the time a transaction spends loading and storing data from and to memory addresses plays a significant role in the total execution time. Our objective in the design of Adaptive is to dynamically switch between the two versioning methods to obtain fewer data movements and, in turn, minimize execution time.
Let W E a g e r be the total number of operations of moving data in eager versioning (i) from the original persistent memory locations to the undo log area and (ii) from the undo log area back to the original persistent memory locations. The first kind of moves are shown as ➀ in Figure 1a and the second kind of moves are shown as ➁ in Figure 1a. The first kind of moves are always done in eager versioning and the second kind of moves are done only when the transaction aborts. Therefore,
W_Eager = (N_commit + 2·N_abort) · S (2)
Let W L a z y be the total number of operations of moving data in lazy versioning (i) from the original persistent memory locations to the redo log area (in volatile cache), (ii) from the redo log area (in volatile cache) to persistent memory locations to persist the redo log, and (iii) finally, writing the data back to the original persistent memory locations either from redo log area in persistent memory after restart or from redo log area in volatile cache. The first kind of moves are shown as ➀ in Figure 1b, and the second and third kind of moves are shown as ➁ and ➂ in Figure 1b, respectively. The first kind of moves are always done in lazy versioning and the second and third kind of moves are done only when the transaction commits. Therefore,
W_Lazy = (3·N_commit + N_abort) · S (3)
Notice that a transaction can run using either eager or lazy versioning when W_Eager = W_Lazy, as the selection of a versioning method then has no impact on the total number of movements. Therefore, from Equations (2) and (3), we have that
(N_commit + 2·N_abort) · S = (3·N_commit + N_abort) · S (4)
N_commit + 2·N_abort = 3·N_commit + N_abort (5)
N_abort = 2·N_commit (6)
Also, we have that N ≤ N_abort + N_commit. This implies that
N_abort/N + N_commit/N ≥ 1 (7)
2·N_commit/N + N_commit/N ≥ 1 (8)
N_commit/N ≥ 1/3 (9)
Therefore, eager versioning is preferable as long as N_abort/N < 2/3: if the value of N_abort is such that N_abort/N is greater than or equal to 2/3, then W_Eager ≥ W_Lazy. Thus, Adaptive switches execution to lazy versioning when N_abort/N ≥ 2/3 and stays with eager versioning otherwise.
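As a quick sanity check of the derivation, the two data-movement counts from Equations (2) and (3) can be coded directly; at the break-even point N_abort = 2·N_commit they are equal:

```c
#include <stddef.h>

/* Total data movements for eager and lazy versioning; S is the size
 * of the RW set of each transaction. */
static size_t w_eager(size_t n_commit, size_t n_abort, size_t s) {
    return (n_commit + 2 * n_abort) * s;
}
static size_t w_lazy(size_t n_commit, size_t n_abort, size_t s) {
    return (3 * n_commit + n_abort) * s;
}
```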

4.3. Computation of Total Number of Stores to Persistent Memories

The total number of writes (i.e., stores) to the persistent memories are different when transactions are executed using different versioning methods.
In eager versioning, transactional (undo) logs are stored in the persistent memory and in-place memory updates are performed. Each memory location accessed by a transaction is added to the log. Thus, by default, there are two stores for each memory location accessed by a transaction. Additionally, if the transaction aborts, it needs to be rolled back to the previous consistent state using the undo log, which requires two additional stores to the persistent memory. Let ST_Eager be the total number of stores to the persistent memory in eager versioning, N_commit be the total number of commits, and N_abort be the total number of aborts; then,
ST_Eager = (2·N_commit + 2·N_abort) · S (10)
where S is the size of the RW set of the transactions.
In lazy versioning, all the computations are performed in the volatile cache. If a transaction commits, then the changes are first persisted to the transactional (redo) logs and then written to the original memory locations. That means that, in lazy versioning, only committing transactions account for PM stores. Let ST_Lazy be the total number of stores to the persistent memory in lazy versioning and N_commit be the total number of commits; then,
ST_Lazy = (2·N_commit) · S (11)
From Equations (10) and (11), we can see that the total number of PM stores in lazy versioning is always less than that in eager versioning. So, from the perspective of minimizing total stores to persistent memories, lazy versioning seems better. However, this metric alone cannot guarantee better transaction performance. Thus, we also consider other performance metrics such as execution time, abort rate, and total data movements.
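The two store counts from Equations (10) and (11) can likewise be sketched directly; with no aborts the two methods store equally, and any abort makes eager store more:

```c
#include <stddef.h>

/* Total PM stores for eager and lazy versioning; S is the RW-set size. */
static size_t st_eager(size_t n_commit, size_t n_abort, size_t s) {
    return (2 * n_commit + 2 * n_abort) * s;
}
static size_t st_lazy(size_t n_commit, size_t s) {
    return 2 * n_commit * s;
}
```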

4.4. Adaptive for Non-Persistent Transactional Memories

The idea in the design of Adaptive for non-persistent memories is to compute the total time spent by transactions executing using eager and lazy versioning separately, and to switch between the versioning methods when the execution time increases. Ideally, again, we would like to use the versioning method in Adaptive that gives optimal performance in terms of execution time for any specific workload. From the working principle of a TM system, we can see that a transaction spends a significant amount of time moving data between the original memory locations and the log areas, in addition to the constant computation time. Figure 2 illustrates the working principle of a TM system. Moreover, the total execution time is likely to increase with the total number of aborts, as more transaction restarts are required. Thus, we use the abort-to-commit ratio (ACR) in the design of Adaptive for non-persistent memories.
For eager (and lazy) versioning, we can compute ACR_Eager (and ACR_Lazy) based on the number of transactions committed and aborted using eager (lazy) versioning as follows:
ACR_Eager = N_Eabort / N_Ecommit (12)
ACR_Lazy = N_Labort / N_Lcommit (13)
To decide when to switch from one method to the other, we identify a threshold on ACR for both eager and lazy versioning. We denote these thresholds by Threshold_Eager and Threshold_Lazy, respectively. Let a transaction T be running at the current time t′ using lazy versioning. If ACR_Lazy < Threshold_Lazy, then the versioning method is switched to Eager for transactions that start (or restart) execution at any time t > t′. An analogous approach is used if T is currently executing using eager versioning.
Based on N_Ecommit, N_Lcommit, N_Eabort, and N_Labort, we compute ACR_Eager and ACR_Lazy at each time step t > t_0. These ratios are then compared with the Threshold_Eager and Threshold_Lazy parameters (computed in the next section). Therefore, at any time t > t_0, a ready-to-execute transaction T will be executed as follows.
  • Suppose the versioning method currently used is V_cur = Eager. If ACR_Eager > Threshold_Eager, then V_cur is switched to Lazy (i.e., V_cur ← Lazy) and T will be executed using lazy versioning.
  • Suppose the versioning method currently used is V_cur = Lazy. If ACR_Lazy < Threshold_Lazy, then V_cur is switched to Eager (i.e., V_cur ← Eager) and T will be executed using eager versioning.
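The two rules above can be sketched as a single decision step; the threshold values 1/2 and 2 are the ones derived in the next subsection, and the names are ours:

```c
typedef enum { EAGER, LAZY } versioning_t;

/* One decision step of Adaptive for non-persistent memories:
 * Threshold_Eager = 1/2, Threshold_Lazy = 2. */
static versioning_t choose_non_persistent(versioning_t v_cur,
                                          double acr_eager, double acr_lazy) {
    if (v_cur == EAGER && acr_eager > 0.5)
        return LAZY;
    if (v_cur == LAZY && acr_lazy < 2.0)
        return EAGER;
    return v_cur;
}
```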

Computing Switching Thresholds Threshold_Eager and Threshold_Lazy

Let N be the total number of transactions in any workload. When the workload finishes execution and all transactions commit, we have N_commit = N and N_abort ≥ 0 (if each transaction commits without aborting, then N_abort = 0; otherwise N_abort > 0).
Suppose each transaction T spends α amount of time moving data from one memory location to another. Consider the case of executing T using eager versioning. Let τ_Eager be the total amount of time spent (i) versioning data from the original memory locations to the undo log area and (ii) updating data from the undo log area back to the original memory locations. The first kind of operations are shown as ➀ in Figure 2a and the second kind as ➁ in Figure 2a. The first kind of operations are always done in eager versioning, whereas the second kind are done only when the transaction aborts. That means that, for an aborted transaction, data movement is performed twice: once for versioning and once for rollback. Therefore, for eager versioning,
τ_Eager = (N_commit + 2·N_abort) · α (14)
Similarly, consider the case of executing T using lazy versioning. Let τ_Lazy be the total amount of time spent (i) versioning data from the original memory locations to the redo log area and (ii) writing the data from the redo log area back to the original memory locations. The first kind of operations are shown as ➀ in Figure 2b and the second kind as ➁ in Figure 2b. The first kind of operations are always done in lazy versioning, whereas the second kind are done only when the transaction commits. That means that, for a committed transaction, data movement is performed twice. Therefore, for lazy versioning,
τ_Lazy = (2·N_commit + N_abort) · α (15)
Comparing τ_Eager and τ_Lazy, we can see three scenarios:
  • Case 1: If N_commit = N_abort, then τ_Eager = τ_Lazy.
  • Case 2: If N_commit > N_abort, then τ_Eager < τ_Lazy.
  • Case 3: If N_commit < N_abort, then τ_Eager > τ_Lazy.
Moreover, Equation (14) suggests that, in eager versioning, the total time spent for an aborted transaction is twice the time spent for a committed transaction. It is then immediate that eager versioning performs better as long as N_commit ≥ 2·N_abort; i.e.,
N_abort / N_commit ≤ 1/2 (16)
Thus, we get Threshold_Eager = 1/2 and switch to lazy versioning when ACR_Eager > 1/2.
Similarly, Equation (15) suggests that lazy versioning performs better as long as 2·N_commit ≤ N_abort; i.e.,
N_abort / N_commit ≥ 2 (17)
Then, we get Threshold_Lazy = 2 and switch to eager versioning when ACR_Lazy < 2.
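The three cases above can be checked numerically against the two cost formulas; α is treated as an abstract unit cost (illustrative only):

```c
#include <stddef.h>

/* Total data-movement times for eager and lazy versioning,
 * with alpha as an abstract per-movement unit cost. */
static size_t tau_eager(size_t n_commit, size_t n_abort, size_t alpha) {
    return (n_commit + 2 * n_abort) * alpha;
}
static size_t tau_lazy(size_t n_commit, size_t n_abort, size_t alpha) {
    return (2 * n_commit + n_abort) * alpha;
}
```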

4.5. Contention Management

A transaction T is said to conflict with another transaction T_j in two cases: (i) Rset(T) shares at least one memory location with Wset(T_j), i.e., Rset(T) ∩ Wset(T_j) ≠ ∅, or (ii) Wset(T) shares at least one memory location either with Rset(T_j) (i.e., Wset(T) ∩ Rset(T_j) ≠ ∅) or with Wset(T_j) (i.e., Wset(T) ∩ Wset(T_j) ≠ ∅). Any contention management technique requires at least x − 1 transactions out of x ≥ 2 conflicting transactions to abort. There has been an extensive study on contention management and several techniques with different performance properties are available, e.g., [26,27,60,61,62,63,64,65,66,67,68]. We use the following strategies for resolving a conflict of a transaction T with a transaction T_j, which are discussed in detail in [26,27,65].
  • suicide: T aborts and restarts immediately.
  • kill (aka aggressive): T kills T_j and continues its execution.
  • delay: T aborts immediately but waits until the contended memory location is released before restarting. This increases the chance that the transaction commits without interruption upon retry, but may increase total execution time.
  • back-off: T uses an exponential back-off mechanism to resolve conflict.
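As an illustration of the back-off strategy, the delay schedule might look like the following sketch; the constants and names are ours, not TinySTM's:

```c
/* Exponential back-off: the delay before the retry-th restart doubles
 * each time, capped to keep the wait bounded. */
static unsigned backoff_delay_us(unsigned retry) {
    const unsigned cap = 10;              /* max exponent: 2^10 microseconds */
    unsigned e = retry < cap ? retry : cap;
    return 1u << e;                       /* 1, 2, 4, ... microseconds */
}
```

A restarting transaction would sleep for `backoff_delay_us(retry)` microseconds before its retry-th attempt.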

4.6. Time Barrier Requirement and Design

The ideal scenario in Adaptive is to let each transaction T run Algorithm 1 and decide individually which versioning method (eager or lazy) to use, based on the parameters obtained at runtime. Let S_j be the set of transactions that arrived before T. Suppose the current versioning method for executing the transactions in S_j is V_cur = Lazy and transaction T satisfies the condition for switching the versioning method to V_new = Eager. Suppose the versioning method changed from Lazy to Eager after the transactions in S_j started execution but before T. If we run T using Eager immediately and T conflicts with any transaction T_j ∈ S_j, then the conflict detection and resolution mechanisms interfere, hampering consistency. A simple approach to handle this situation is to ask T to wait until all transactions in S_j finish execution, which we call a basic time barrier (as shown in Figure 4). The pseudocode is given in Algorithm 2. The barrier reduces the total number of aborts but, due to the time delay before switching, it may increase total execution time [24]. We provide a better time barrier design appropriate for non-persistent TM systems (described in Section 5) that minimizes this overhead.
Algorithm 2: Basic Time Barrier Design
1  T ← new transaction arrived for execution;
2  S_j ← set of other transactions arrived before T;
3  V_cur ← versioning method for currently running transactions;
4  V_new ← new versioning method computed for the transaction T;
5 if (V_new ≠ V_cur) then // i.e., V_cur and V_new are complements of each other in the set {Eager, Lazy}
6  if (there is no transaction T_j ∈ S_j such that T_j is executing using V_cur when T wants to execute) then
7    V_cur ← V_new;
8   Start executing T using V_new;
9  else
10   Wait until each transaction T_j ∈ S_j finishes execution;
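A minimal sketch of the barrier logic, assuming a shared atomic counter of in-flight transactions (our simplified model, not TinySTM's internals); a transaction that needs a different versioning method may switch only when nothing is still running under the old one:

```c
#include <stdatomic.h>

typedef enum { EAGER, LAZY } versioning_t;

static _Atomic int in_flight;                /* transactions currently running */
static _Atomic versioning_t v_cur = EAGER;

/* Returns 1 and registers T if it may start under v_new;
 * 0 means T must wait (line 10 of Algorithm 2). */
static int try_start(versioning_t v_new) {
    if (v_new == atomic_load(&v_cur)) {      /* same method: no barrier */
        atomic_fetch_add(&in_flight, 1);
        return 1;
    }
    if (atomic_load(&in_flight) == 0) {      /* barrier condition of line 6 */
        atomic_store(&v_cur, v_new);
        atomic_fetch_add(&in_flight, 1);
        return 1;
    }
    return 0;
}

static void finish(void) { atomic_fetch_sub(&in_flight, 1); }
```

A caller would spin (or block) on `try_start` until it returns 1; the check-then-switch here is not atomic as a whole, which a real implementation would have to address.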

5. Optimizations on Basic Adaptive Versioning

5.1. Limitation of Basic Adaptive in Non-Persistent Memories

In basic Adaptive, no two transactions can execute simultaneously with different versioning methods; i.e., if a new transaction tries to execute with a different versioning method than the currently executing ones, the basic time barrier prevents it from executing simultaneously. The design of the basic time barrier in Adaptive requires a transaction T to wait until all previous transactions finish their executions if T wants to execute with a different versioning method than those transactions. This also prevents two non-conflicting transactions from executing concurrently with different versioning methods. To alleviate these problems, we provide two optimizations to basic Adaptive: the first on the time barrier design and the second on the switching mechanism.

5.2. Better Time Barrier Design

The pseudocode of the better time barrier design is given in Algorithm 3. The objective of the better time barrier design is to increase concurrency compared to the basic time barrier design. For this, we allow each transaction to start its execution (with a possibly new versioning method) as soon as it becomes ready, rather than waiting for other in-flight transactions to complete. Figure 5 illustrates the idea. Consider a transaction T. Let S_j be the set of transactions that arrived before T. Suppose the current versioning method for executing transactions in S_j is V_cur = Lazy and the new versioning method computed for transaction T is V_new = Eager. Suppose the versioning method changed from Lazy to Eager after the transactions in S_j started execution (but before they completed) and before T starts execution. In the basic time barrier design, T has to wait until all transactions in S_j finish execution. In the better time barrier design, we let T start execution as soon as it is ready. If T does not conflict with any transaction in S_j, then T continues its execution until it commits; otherwise, T aborts. In order to detect a possible conflict of T with the transactions in S_j, we add a 1-bit modify flag to each memory address that is going to be updated by the transactions in S_j. The modify bit associated with a memory address is set to 1 at the start of a transaction if the address is going to be updated and is reset to 0 at commit time. If T conflicts with a transaction T′ ∉ S_j, the conflict is handled as per the contention management technique adopted in the design (e.g., suicide, kill, etc.).
Algorithm 3: Better Time Barrier Design
1  T ← new transaction arrived for execution;
2  S_j ← set of other transactions arrived before T;
3  V_cur ← versioning method for currently running transactions;
4  V_new ← new versioning method computed for the transaction T;
5 if (V_cur ≠ V_new) then
6   V_cur ← V_new;
7 Execute T using V_cur;
8 if (T conflicts with T_j ∈ S_j) then
9  Abort T;
10 else if (T conflicts with T′ ∉ S_j) then
11  Handle the conflict between T and T′ using the contention management technique;
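The modify-flag check can be sketched as follows, assuming a hypothetical table of per-address flags (our illustration, not TinySTM's metadata):

```c
#include <stdatomic.h>
#include <stddef.h>

/* One atomic 1-bit modify flag per tracked memory word. A writer sets
 * the bit at transaction start and clears it at commit; a later
 * transaction that touches a flagged word has a potential conflict. */
#define N_ADDRS 1024
static _Atomic unsigned char modify_bit[N_ADDRS];

static void mark_write(size_t addr) {
    atomic_store(&modify_bit[addr % N_ADDRS], 1);
}
static void clear_write(size_t addr) {
    atomic_store(&modify_bit[addr % N_ADDRS], 0);
}
static int conflicts_with(size_t addr) {
    return atomic_load(&modify_bit[addr % N_ADDRS]) != 0;
}
```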

5.3. Better Switching Mechanism Design

Switching between eager and lazy versioning should ideally be done with no overhead. However, there is a possibility of continuous switching between the versioning methods for every new transaction, which may result in a significant overhead in the total execution time of the transactions. Thus, the idea is to minimize the total number of switches between the versioning methods without affecting the total execution time. The better switching mechanism avoids this possible continuous switching for every new transaction, thus helping to reduce the overhead. The versioning method is switched from one to the other only if the switching condition is satisfied continuously for a specified number of times, which we call a switching interval threshold SW_INT. Let the current versioning method be V_cur = Eager. Suppose at time t, Adaptive decides to switch to V_new = Lazy. With the better switching mechanism, Adaptive does not switch to V_new = Lazy at t but waits for the switching interval threshold SW_INT. We define SW_INT as the number of transactions that execute using the current versioning method V_cur after t before switching to the new versioning method V_new. When SW_INT = 0, Adaptive does not wait before switching the versioning method. We use SW_INT > 0 in the better switching mechanism design. Let λ be the time interval during which SW_INT transactions finish execution using the current versioning method V_cur = Eager. If, for every next (new or restarted) transaction arriving during the interval λ, the condition V_new ≠ V_cur still holds for V_new = Lazy, then the versioning method switches to V_new = Lazy at time t + λ. Otherwise, the versioning method remains V_cur.
We determine the switching interval threshold SW_INT empirically, i.e., by varying the value of SW_INT from 2 up to 10 and picking the one with the best performance. Note that the time interval λ denotes the time elapsed until SW_INT consecutive transactions satisfy the switching condition. So, the value of λ is not necessarily the same for all switching events. Figure 6 illustrates the design of the better switching mechanism. The pseudocode is given in Algorithm 4.
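A sketch of the deferred switch, assuming SW_INT = 5 and a simple counter of consecutive transactions that satisfy the switching condition (the counter and names are our illustration); the SW_INT transactions observed during λ still run under the old method:

```c
typedef enum { EAGER, LAZY } versioning_t;

#define SW_INT 5            /* switching interval threshold (tuned in 2..10) */

static versioning_t v_cur = EAGER;
static unsigned streak;     /* consecutive transactions wanting to switch */

/* Called once per (new or restarted) transaction; returns the method
 * this transaction should use. */
static versioning_t on_transaction(versioning_t v_new) {
    versioning_t use = v_cur;        /* this transaction runs with v_cur */
    if (v_new != v_cur) {
        if (++streak >= SW_INT) {    /* condition held through interval λ */
            v_cur = v_new;
            streak = 0;
        }
    } else {
        streak = 0;                  /* condition broken: reset the count */
    }
    return use;
}
```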

6. Experimental Evaluation

We now evaluate the performance of Adaptive using 5 micro-benchmarks and 8 complex benchmarks from the STAMP and STAMPEDE benchmark suites. The evaluation is performed in an STM implementation using TinySTM [26,27], modified appropriately to incorporate Adaptive. For persistent TM, the tests were executed on an Intel Core i7-7700K 4.20 GHz, 64-bit Haswell processor with 4 cores, each with 2 hyper-threads. Each core has private L1 and L2 caches, whose sizes are 256 KB and 1 MB, respectively. There is also an 8 MB L3 cache shared by all 4 cores and 32 GB of main memory. The results reported are the average of 10 runs, varying the number of threads from 1 to 16. For non-persistent TM, the tests were executed on an Intel Xeon E5-2620 v4 @ 4.20 GHz, 64-bit processor with 32 cores. Each core has private L1 and L2 caches, whose sizes are 64 KB and 256 KB, respectively. There is also a 20 MB L3 cache shared by all 32 cores and 32 GB of main memory. The results reported are the average of 10 experimental runs, varying the number of threads from 1 to 32. First, we present the experimental results for basic Adaptive in persistent memories. We also provide the execution time overhead of basic Adaptive. Later, we provide the experimental results for optimized Adaptive with the better time barrier using the suicide contention management technique in non-persistent memories. Finally, we extend the results in non-persistent memories using both the better time barrier and the better switching mechanism. We also compare the performance of optimized Adaptive against four different contention management techniques.

6.1. Experimental Setup

We developed an STM-based implementation using TinySTM [26,27]. TinySTM implements both lazy and eager versioning (called Lazy and Eager) through the Write_Back and Write_Through designs, respectively. With the Write_Through design, transactions directly write to the original memory locations and revert their updates in case they abort. With the Write_Back design, transactions work on a copy of the data and delay their updates to the original memory locations until commit [26,27]. Furthermore, the Write_Back design has two different implementations: Write_Back_ETL and Write_Back_CTL. Encounter-time locking (ETL) detects conflicts early, at the time of a write, and acquires the lock on the memory address before it is written. Commit-time locking (CTL) defers conflict detection until commit, i.e., the lock on the memory address is acquired at commit time. Therefore, there are two different implementations of Lazy in TinySTM: one based on ETL, called Lazy_ETL, and another based on CTL, called Lazy_CTL. We obtain the adaptive design Adaptive_ETL using Lazy_ETL and Eager versioning. Similarly, we obtain the adaptive design Adaptive_CTL using Lazy_CTL and Eager versioning. We run experiments with five different designs: Lazy_ETL, Lazy_CTL, Eager, Adaptive_ETL, and Adaptive_CTL.
Algorithm 4: Better Switching Mechanism Design
1  T ← new transaction arrived for execution;
2  V_cur ← versioning method for currently running transactions;
3  V_new ← new versioning method computed for the transaction T;
4  t ← the timestamp at which the condition to switch from V_cur to V_new is first satisfied;
5  SW_INT ← switching interval threshold;
6  λ ← time taken by the SW_INT transactions arriving on or after timestamp t to finish their executions;
7 if (V_cur ≠ V_new holds until time t + λ) then
8   V_cur ← V_new; // switch the versioning method to V_new at timestamp t + λ
9 Execute T using V_cur;

6.2. Persistent Memory Emulation

We emulate persistent memory using DRAM in our experiments, following Avni et al. [43]. We set aside a 500 MB region of DRAM for the persistent memory emulation. All the original data are kept in this region. Moreover, we use this region to keep the persistent undo log when a transaction runs using eager versioning and to persist the redo log when a transaction runs using lazy versioning. To emulate a power failure and crash in persistent memory, we leave the power on and wipe out all the volatile log records, so that the rollback (in case of an abort in eager versioning) and update (in case of a commit not yet written to memory in lazy versioning) operations are handled by the persistent log records. We use a spin loop for this purpose that runs for around 200 ms (which is sufficient to load and store the log records).

6.3. Benchmarks

We use both micro and complex benchmarks from the TM literature. A summary of some prominent properties of these benchmarks such as targeted applications, short description of the applications, the size of the read write (RW) set, and the amount of contention is given in Table 3.
  • Micro-Benchmarks: We use five micro-benchmarks, namely bank, red black tree, hash set, linked list, and skip list used in several previous studies, e.g., [21,22,26,27,28]. These benchmarks simulate the basic concurrent execution scenario for transactions with (relatively) small read/write sets.
  • STAMP: STAMP is a well-known and widely-used benchmark. It consists of eight applications: bayes, genome, intruder, kmeans, labyrinth, ssca2, vacation, and yada of varying complexity. These applications span a variety of computing domains as well as runtime transactional characteristics such as varying transaction lengths, read and write set sizes, and amounts of contention [23].
  • STAMPEDE: Recently, Nguyen et al. [29] argued that the programming model and data structures used in the STAMP benchmarks suffer from performance bottlenecks. They then modified the programming structure of these benchmarks in a way that removes the bottlenecks. They finally provided a set of rewritten STAMP benchmarks, calling them the STAMPEDE benchmarks.

6.4. Evaluation of Adaptive in Persistent Memories

For persistent memories, we ran the experiments using up to 16 threads and report the results accordingly, with an average of 10 runs for each thread count. We present 4 different types of results for each benchmark suite: (i) total data movements (including both loads from and stores to the persistent memory), (ii) total number of stores to the persistent memory, (iii) total number of aborts, and (iv) total execution time. We first discuss results for metrics (i)–(iii) separately for each benchmark suite and then we discuss results for metric (iv) together for all the benchmark suites.
Results on Micro-benchmarks. All the transactions in these benchmarks were run with an update rate of 20%. When transactions were executed with fewer threads, we found that the commit rate is higher than the abort rate and the cost of lazy versioning is higher than the cost of eager versioning. With an increasing number of threads, the abort rate also increased. Figure 7, Figure 8 and Figure 9 provide the experimental results on all five micro-benchmarks for total data movements, total number of stores to PM, and total number of aborts, respectively. We noticed that Lazy_CTL consistently performs better than Lazy_ETL on all five micro-benchmarks. This is because early conflict detection and locking of memory addresses leads to a higher abort rate than detecting conflicts and locking the memory addresses at commit time. We observed that the total number of aborts in Adaptive versioning decreased compared to both eager and lazy versioning. To be specific, Adaptive_ETL has up to 1.5× fewer aborts than Lazy_ETL and up to 1.7× fewer aborts than Eager. Similarly, Adaptive_CTL has up to 1.3× fewer aborts than Lazy_CTL and up to 3.8× fewer aborts than Eager. Figure 7 shows that the total data movement to and from the persistent memory (i.e., loads and stores to the PM) decreased in Adaptive versioning. Adaptive_ETL has up to 3.4× fewer data movements than Lazy_ETL and up to 1.1× fewer data movements than Eager. Adaptive_CTL has up to 3× and 1.3× fewer data movements compared to Lazy_CTL and Eager, respectively. Figure 8 shows the total number of stores to the PM. We can see that lazy versioning has fewer stores to the persistent memory.
This is because aborts in lazy versioning incur no stores to the persistent memory. In eager versioning, on the other hand, an abort requires the memory addresses to be rolled back to their previous consistent states, increasing the total number of stores to the PM. We observed that the total number of stores in Adaptive is always less than in Eager and, in most of the cases, greater than in Lazy. Compared to Eager, Adaptive_ETL has up to 1.4× fewer PM stores and Adaptive_CTL has up to 2.1× fewer PM stores. Compared to Lazy, Adaptive_ETL has up to 1.8× more PM stores and Adaptive_CTL has up to 1.6× more PM stores. We also noticed that Adaptive_CTL performs better than Adaptive_ETL in each micro-benchmark. This is because Lazy_CTL performs better than Lazy_ETL, and Adaptive_ETL was designed using Eager and Lazy_ETL whereas Adaptive_CTL was designed using Eager and Lazy_CTL.
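To make this asymmetry concrete, the following minimal cost-model sketch (illustrative C, not TinySTM code; the per-write store counts are our assumptions) counts PM stores for a transaction that writes n locations under undo-log (eager) and redo-log (lazy) versioning:

```c
/* Illustrative cost model: count stores to persistent memory (PM)
 * for one transaction that writes `n` locations and then either
 * commits or aborts. The constants are assumptions for exposition. */

/* Eager (undo-log) versioning: each write stores the old value to a
 * PM undo log and the new value in place (2 PM stores per write).
 * An abort must roll every location back (n additional PM stores). */
static int eager_pm_stores(int n, int aborted) {
    int stores = 2 * n;       /* undo-log entry + in-place update */
    if (aborted)
        stores += n;          /* rollback restores the old values */
    return stores;
}

/* Lazy (redo-log) versioning: writes go to a buffer; only a commit
 * flushes the redo log and final values to PM. An abort simply
 * discards the buffer, costing no PM stores at all. */
static int lazy_pm_stores(int n, int aborted) {
    return aborted ? 0 : 2 * n; /* redo-log entry + in-place update at commit */
}
```

Under this model an aborted transaction costs 3n PM stores in eager versioning and none in lazy versioning, which is consistent with the trend discussed above.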
Results on STAMP Benchmarks. Figure 10 provides the total data movement results. When transactions are executed with fewer threads, the transaction commit rate is higher and there are fewer total data movements in Eager than in Lazy. As the number of threads increases, the transaction abort rate also increases and the total number of data movements in Eager starts to grow due to the frequent need for rollbacks. The results obtained for genome and kmeans-low show that Eager starts to incur more data movements than Lazy beyond 8 threads. The same happens beyond 4 threads in intruder and yada. Irrespective of the change in abort rate, Adaptive always has fewer total data movements than the respective eager and lazy versioning. Specifically, Adaptive_ETL has up to 6× fewer data movements than Lazy_ETL and up to 2× fewer data movements than Eager. Adaptive_CTL achieved up to 3× fewer data movements than Lazy_CTL and up to 35× fewer data movements (in yada) than Eager.
Figure 11 shows the results for the total number of stores to the persistent memory. As in the micro-benchmarks, Lazy versioning has fewer PM stores than Eager versioning in the STAMP benchmarks as well. In Adaptive versioning, the total number of PM stores decreases compared to Eager (by up to 28×) and increases compared to Lazy (by up to 2×).
Figure 12 shows the results for the total number of aborts. As in the micro-benchmarks, the total number of aborts in the STAMP benchmarks also decreases when executing the transactions using Adaptive versioning. This is because Adaptive versioning always tries to minimize the total data movements by adapting to a suitable versioning method between Eager and Lazy. In Adaptive_ETL, the total number of aborts is up to 3× and 17× less than in Eager and Lazy_ETL, respectively. In Adaptive_CTL, the total number of aborts is up to 240× and 2.8× less than in Eager and Lazy_CTL, respectively.
Results on STAMPEDE Benchmarks. Figure 13 provides the experimental results for total data movements. As in the micro- and STAMP benchmarks, Adaptive has fewer total data movements than Eager and Lazy in the STAMPEDE benchmarks as well. We observed that Adaptive_ETL has up to 3.6× fewer data movements than Lazy_ETL and Adaptive_CTL has up to 6× fewer data movements than Lazy_CTL. Compared to Eager, Adaptive_ETL achieved up to 4.6× fewer data movements and Adaptive_CTL achieved up to 3.1× fewer data movements.
Figure 14 shows the experimental results for the total number of PM stores in the STAMPEDE benchmarks. They follow the results obtained for the micro- and STAMP benchmarks: Lazy versioning has fewer PM stores than Eager versioning and Adaptive versioning lies between the two. To be precise, Adaptive_ETL has up to 64× fewer PM stores than Eager and up to 2.7× more PM stores than Lazy_ETL, whereas Adaptive_CTL has up to 9× fewer PM stores than Eager and up to 18× more PM stores than Lazy_CTL.
The experimental results for the total number of aborts are shown in Figure 15. Again, as in the micro- and STAMP benchmarks, the total number of aborts in Adaptive versioning is lower than in Eager and Lazy versioning. Precisely, Adaptive_ETL has up to 14.3× and 14.7× fewer aborts than Eager and Lazy_ETL, respectively. Similarly, Adaptive_CTL has up to 2.7× and 9.2× fewer aborts than Eager and Lazy_CTL, respectively.
To summarize the above results, in all three benchmark suites, the total data movements (i.e., loads and stores) and the total number of aborts in Adaptive versioning decrease compared to Eager and Lazy versioning. This helps Adaptive achieve better execution time: as the number of aborts decreases compared to the non-adaptive baselines, there are fewer transaction restarts and fewer data movements, which ultimately reduces the total execution time of the transactions. We present the execution time results for all three benchmark suites in the next sub-section. We also observed that the total number of stores to persistent memory in Adaptive versioning decreases compared to Eager but increases compared to Lazy.
Execution Time Results for Persistent Memories. Execution time in Adaptive is impacted by the switching between eager and lazy versioning at runtime. Additionally, the time barrier design introduces a delay in some benchmarks. In most of the benchmarks, the delay due to the time barrier is compensated for, as Adaptive lowers the data movements and the number of aborts. We were interested to see the maximum increase in time in any benchmark used in our experimentation.
The results obtained for the micro-benchmarks are shown in Figure 16. Recall that, for the micro-benchmarks, we measured the execution time for 10,000 transactions, each executed with an update rate of 20%. All 5 micro-benchmarks were executed with the five different versioning designs and the total number of transactions for each design was counted.
We noticed that, in most of the applications, execution time in Adaptive decreases compared to both Eager and Lazy versioning. This is due to the decrease in the total number of data movements and the total number of aborts in Adaptive versioning. In Adaptive_ETL, execution time decreases by up to 21% compared to Lazy_ETL and up to 17% compared to Eager. In Adaptive_CTL, execution time decreases by up to 28% compared to Lazy_CTL and up to 33% compared to Eager. However, in some applications (for example, see the results for bank and linked list), the execution time in Adaptive increases compared to Eager or Lazy versioning. This is because the decrease in the number of aborts is too small to compensate for the overhead of the barrier and of switching between the versioning methods. In the bank micro-benchmark, the execution time in Adaptive_CTL increases by up to 9% compared to Lazy_CTL. In linked list, the execution time in Adaptive_CTL increases by up to 12% compared to Lazy_CTL and the execution time in Adaptive_ETL increases by up to 10% compared to Lazy_ETL.
For the STAMP and STAMPEDE benchmarks, we measured the execution time for each application. Figure 17 and Figure 18 illustrate the execution time results for the STAMP and STAMPEDE benchmarks, respectively. As Adaptive lowers the data movements and the number of aborts, most of the applications (e.g., bayes, kmeans high, labyrinth, ssca2 and vacation high in Figure 17 and Figure 18) have lower execution time in Adaptive than in the Eager or Lazy designs. However, in some applications (e.g., genome, intruder and yada), the execution time in Adaptive increases by at most 16% compared to the execution time of Eager or Lazy.
We observed that the execution time results presented above for persistent TM barely scale in throughput beyond 8 threads. This is due to the experimental environment used for conducting the tests. For persistent TM, the tests were executed on an Intel Core i7-7700K 4.20 GHz, 64-bit processor with 4 cores, each with 2 hyperthreads; that is, 4 physical cores and 8 logical cores. Up to 8 threads, each logical core handles an individual thread, but when executing with 16 threads, threads spend more time waiting for their turn to be scheduled, increasing execution time and decreasing throughput.
To summarize, in all the benchmark suites, Adaptive performs better than the individual Eager and Lazy designs in terms of the total number of data movements and the total number of aborts. Moreover, in most of the applications in each benchmark suite, the total execution time in Adaptive decreases compared to Eager and Lazy. However, in some cases, the decrease in data movements and aborts is insufficient to offset the overhead of the barrier and of switching between the versioning methods, which increases the execution time in Adaptive by at most 16% compared to the Eager and Lazy designs.

6.5. Evaluation of Adaptive in Non-Persistent Memories

We report the results from experiments performed using up to 32 threads. The evaluation considers two performance metrics: execution time and number of aborts.
Results on Micro-benchmarks. The execution time results for the five micro-benchmarks are provided in Figure 19. Figure 20 provides the results for the number of aborts. The results are for 10,000 transactions, each executed with an update rate of 20%. Figure 19 shows that, as the number of threads increases, the execution time decreases notably in Adaptive compared to the other versioning methods in all the micro-benchmarks. Specifically, Adaptive_ETL achieved up to 6.3× better execution time than Lazy_ETL and Adaptive_CTL achieved up to 3.7× better execution time than Lazy_CTL. Compared to Eager, Adaptive_ETL achieved up to 5.5× better execution time and Adaptive_CTL achieved up to 5× better execution time. The minimum execution time gain beyond 4 threads is 1.23 for Adaptive_ETL and 1.20 for Adaptive_CTL. Due to high contention for memory access when transactions are executed with more threads, the number of aborts increases with the number of threads. Figure 20 shows that Adaptive minimizes the number of aborts. Specifically, Adaptive_ETL achieved up to 2.6× fewer aborts than Lazy_ETL and up to 5.8× fewer aborts than Eager. Adaptive_CTL achieved up to 2.2× fewer aborts than Lazy_CTL and up to 8× fewer aborts than Eager.
Results on STAMP Benchmarks. Figure 21 and Figure 22, respectively, provide the execution time and number of aborts results. Regarding execution time, Adaptive_ETL is up to 1.78× better than Lazy_ETL and Adaptive_CTL is up to 1.74× better than Lazy_CTL. Compared to Eager, the execution time improvement in Adaptive_ETL and Adaptive_CTL is up to 2.36× and 2×, respectively. With more than 4 threads, the minimum execution time gain is 1.13 in Adaptive_ETL and 1.12 in Adaptive_CTL. From Figure 22, we observe that the number of aborts increases significantly in all the STAMP applications when transactions are executed with more than 8 threads. Still, Adaptive has significantly fewer aborts than Lazy and Eager. Adaptive_ETL has up to 16× fewer aborts than Lazy_ETL and up to 13× fewer aborts than Eager. Similarly, Adaptive_CTL has up to 2.5× fewer aborts than Lazy_CTL and up to 170× fewer aborts than Eager.
Results on STAMPEDE Benchmarks. As in the micro- and STAMP benchmarks, Adaptive performs better than Lazy and Eager in the STAMPEDE benchmarks, for both execution time and number of aborts (Figure 23 and Figure 24). For execution time, Adaptive_ETL performed up to 1.72× better than Lazy_ETL and Adaptive_CTL performed up to 1.54× better than Lazy_CTL. Compared to Eager, Adaptive_ETL performed up to 1.68× better and Adaptive_CTL performed up to 1.91× better. With more than 4 threads, the minimum execution time gain is 1.14 in Adaptive_ETL and 1.12 in Adaptive_CTL. For the number of aborts, Adaptive_ETL performed up to 4.1× better than Lazy_ETL and Adaptive_CTL performed up to 72× better than Lazy_CTL. Compared to Eager, Adaptive_ETL performed up to 10× better and Adaptive_CTL performed up to 124× better.
In all the benchmarks, the minimum execution time gain for Adaptive ranges between 1 and 1.16 when running with up to 4 threads. It is worth noting that the Adaptive versioning technique outperforms both eager and lazy versioning for most of the applications in all the benchmark suites. This is mainly due to the decrease in the number of aborts and the better time barrier design, in which non-conflicting transactions can execute and commit in parallel. Furthermore, the delay due to writing data to a persistent log is not a concern in non-persistent TMs, which also helps to reduce the total execution time.
Further Results. The results in Figure 19, Figure 20, Figure 21, Figure 22, Figure 23 and Figure 24 only considered Adaptive optimized with the better time barrier. We also performed experiments for Adaptive using both the better time barrier and the better switching mechanism. We varied the switching interval threshold (SW_INT) from 2 up to 10. The results indicate that, instead of switching the versioning method immediately, using the better switch mechanism increases performance. However, for SW_INT > 2, the performance gradually degrades, becoming worst at SW_INT = 10. Figure 25 and Figure 26 show the execution time and the total number of aborts, respectively, for the STAMP benchmarks when executed with both the better time barrier and the better switch mechanism (SW_INT = 2). The improvement is up to 1.09× compared to Adaptive with the better time barrier alone. Along with decreasing the total number of aborts, the better switch mechanism decreases the total number of switches between the versioning methods, which helps improve execution time. Figure 27 illustrates the reduction in the total number of switches using the better switch mechanism for the STAMP benchmarks. The experiments on the micro-benchmarks and STAMPEDE showed similar results.
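The better switch mechanism can be sketched as a simple hysteresis counter (a hypothetical interface for illustration; `struct switcher`, `better_switch`, and `run_prefs` are our names, not the TinySTM implementation): the versioning method changes only after the decision rule has preferred the other method for SW_INT consecutive intervals.

```c
#define SW_INT 2   /* switching interval threshold used in the experiments */

struct switcher {
    int mode;      /* 0 = eager, 1 = lazy */
    int streak;    /* consecutive intervals favoring the other mode */
};

/* Called at the end of each measurement interval with the mode the
 * decision rule currently prefers; returns the mode used next. */
static int better_switch(struct switcher *s, int preferred) {
    if (preferred == s->mode) {
        s->streak = 0;                 /* preference matches: reset streak */
    } else if (++s->streak >= SW_INT) {
        s->mode = preferred;           /* stable preference: switch now */
        s->streak = 0;
    }
    return s->mode;
}

/* Helper: feed a sequence of interval preferences, return final mode. */
static int run_prefs(const int *prefs, int n) {
    struct switcher s = {0, 0};
    int m = 0;
    for (int i = 0; i < n; i++)
        m = better_switch(&s, prefs[i]);
    return m;
}
```

A single interval preferring the other method does not trigger a switch; an oscillating preference never switches at all, which is how the mechanism cuts down the number of switches.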
The experiments so far use Threshold_Eager = 1/2 and Threshold_Lazy = 2 as computed in Section 4. It is natural to ask whether these are the ideal threshold values. Therefore, we also tried 1/4 and 3/4 for Threshold_Eager and 1 and 3 for Threshold_Lazy, performing experiments with two different combinations of (Threshold_Eager, Threshold_Lazy): (1/4, 1) and (3/4, 3). We observed an increase in both execution time and number of aborts in all the benchmarks for both combinations. This suggests that the threshold values computed in Section 4 are appropriate.
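One plausible reading of these two thresholds, sketched below without reproducing the derivation of Section 4 (the function `next_mode` and the ratio interpretation are our assumptions), is a hysteresis band on the abort/commit ratio observed over an interval: run eager while the ratio stays below Threshold_Eager = 1/2, run lazy while it stays above Threshold_Lazy = 2, and keep the current method in between to avoid oscillation.

```c
/* Hypothetical sketch of a threshold-based mode decision; not the
 * paper's exact rule. Eager favors commit-dominated intervals, lazy
 * favors abort-dominated ones, with a band in between. */
enum vmode { MODE_EAGER, MODE_LAZY };

static enum vmode next_mode(enum vmode current, int aborts, int commits) {
    /* Treat a commit-free interval as maximally contended. */
    double ratio = (commits == 0) ? 1e9 : (double)aborts / commits;
    if (ratio < 0.5)                   /* Threshold_Eager = 1/2 */
        return MODE_EAGER;             /* low contention: commits dominate */
    if (ratio > 2.0)                   /* Threshold_Lazy = 2 */
        return MODE_LAZY;              /* high contention: aborts dominate */
    return current;                    /* in between: avoid switching */
}
```

Narrowing the band (e.g., the (3/4, 3) combination above) would delay the move to eager under low contention, which is consistent with the degraded results we observed for the alternative threshold pairs.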
The results reported in Figure 19, Figure 20, Figure 21, Figure 22, Figure 23, Figure 24, Figure 25, Figure 26 and Figure 27 use suicide as the contention management technique. We were interested to see whether other strategies perform better than suicide. Therefore, we performed experiments comparing 4 different contention management techniques: suicide, delay, back-off, and kill. The execution time is shown in Figure 28 and the number of aborts in Figure 29 for Adaptive_ETL in the STAMP benchmarks. The results showed no significant change in performance in some of the benchmarks, while in the rest, the choice of contention management technique affected performance. For example, genome and intruder performed better with suicide, whereas kmeans performed better with back-off. Overall, suicide performed better than the rest in most of the benchmarks.
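The four techniques differ in what a transaction does after a conflict; reduced to the wait inserted before restarting after the k-th abort, they can be sketched as follows (the constants and names are illustrative assumptions; real contention managers differ in detail, and kill aborts the conflicting transaction rather than the victim):

```c
/* Illustrative reduction of four contention-management policies to
 * the wait (arbitrary time units) before the aborted transaction
 * restarts. Constants are assumptions for exposition only. */
enum cm { CM_SUICIDE, CM_DELAY, CM_BACKOFF, CM_KILL };

static unsigned restart_wait(enum cm policy, unsigned k) {
    switch (policy) {
    case CM_SUICIDE: return 0;     /* abort self, retry immediately */
    case CM_DELAY:   return 100;   /* fixed pause before retrying */
    case CM_BACKOFF:               /* exponential back-off, capped */
        return 100u << (k < 10 ? k : 10);
    case CM_KILL:    return 0;     /* enemy transaction aborted instead */
    }
    return 0;
}
```

The back-off case illustrates why it can win under heavy contention (as for kmeans above): repeated conflicts space retries progressively further apart, while suicide keeps retrying immediately.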
Finally, we performed experiments starting the execution with either eager or lazy versioning. We observed that the initial choice of versioning method does not significantly affect performance in either the micro- or the complex benchmarks, except for intruder and kmeans from STAMP, in which Adaptive performed better when starting with Eager than with Lazy for up to 4 threads. This is mainly because transactions have an almost constant abort rate and a versioning method change is not necessary.

7. Concluding Remarks

Transactional memory has been receiving much attention from both academia and industry. One of the most challenging issues is how to ensure consistency of shared data through speculative execution. Eager and lazy versioning have been used individually to support speculative execution in existing TM systems. However, whether eager or lazy versioning is better remains unclear, and previous studies offer contradictory recommendations. In this article, we have presented an adaptive framework that dynamically switches between eager and lazy versioning at runtime based on transaction abort/commit data collected at runtime, obviating the need for a priori knowledge of the workload and contention scenario to pick the appropriate versioning method for better performance. Our framework is quite simple and applicable to both persistent and non-persistent TM systems. It achieves significantly better performance in terms of execution time and number of aborts for both persistent and non-persistent memories, compared to eager and lazy versioning running individually, in 5 micro-benchmarks and 8 applications from the STAMP and STAMPEDE suites. In persistent TM systems, the adaptive framework achieved performance improvements of as much as 1.5× for execution time and as much as 240× for number of aborts, whereas in non-persistent TM systems, it achieved improvements of as much as 6.3× for execution time and as much as 170× for number of aborts. We believe that our results and techniques will be helpful in choosing the proper versioning for TM systems.
For future work, it will be interesting to see whether there is a better technique for deciding when to switch between eager and lazy versioning and how to minimize the time gap of switching from one versioning method to another. It will also be interesting to run experiments on real persistent memory, such as Optane DC persistent memory [47], and to compare against more state-of-the-art STM, Durable STM, and HTM implementations.

Author Contributions

Conceptualization, G.S.; methodology, P.P. and G.S.; software, P.P.; validation, G.S.; formal analysis, P.P. and G.S.; investigation, P.P. and G.S.; resources, G.S.; writing—original draft preparation, P.P.; writing—review and editing, G.S.; visualization, P.P.; supervision, G.S.; project administration, G.S.; funding acquisition, G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation grant number CCF-1936450.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Herlihy, M.; Moss, J.E.B. Transactional Memory: Architectural Support for Lock-free Data Structures. In Proceedings of the ISCA, San Diego, CA, USA, 16–19 May 1993; pp. 289–300. [Google Scholar]
  2. Shavit, N.; Touitou, D. Software Transactional Memory. Distrib. Comput. 1997, 10, 99–116. [Google Scholar] [CrossRef]
  3. Intel. 2012. Available online: http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell (accessed on 4 February 2021).
  4. Haring, R.; Ohmacht, M.; Fox, T.; Gschwind, M.; Satterfield, D.; Sugavanam, K.; Coteus, P.; Heidelberger, P.; Blumrich, M.; Wisniewski, R.; et al. The IBM Blue Gene/Q Compute Chip. IEEE Micro 2012, 32, 48–60. [Google Scholar] [CrossRef]
  5. Nakaike, T.; Odaira, R.; Gaudet, M.; Michael, M.M.; Tomari, H. Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In Proceedings of the ISCA, Portland, OR, USA, 13–17 June 2015; pp. 144–157. [Google Scholar] [CrossRef]
  6. Cain, H.W.; Michael, M.M.; Frey, B.; May, C.; Williams, D.; Le, H.Q. Robust architectural support for transactional memory in the power architecture. In Proceedings of the ISCA, Tel Aviv, Israel, 23–27 June 2013; pp. 225–236. [Google Scholar] [CrossRef]
  7. Bocchino, R.L.; Adve, V.S.; Chamberlain, B.L. Software transactional memory for large scale clusters. In Proceedings of the PPoPP, Salt Lake City, UT, USA, 20–23 February 2008; pp. 247–258. [Google Scholar]
  8. Fung, W.W.L.; Singh, I.; Brownsword, A.; Aamodt, T.M. Hardware transactional memory for GPU architectures. In Proceedings of the MICRO, Porto Alegre, Brazil, 3–7 December 2011; pp. 296–307. [Google Scholar]
  9. Manassiev, K.; Mihailescu, M.; Amza, C. Exploiting Distributed Version Concurrency in a Transactional Memory Cluster. In Proceedings of the PPoPP, Manhattan, NY, USA, 29–31 March 2006; pp. 198–208. [Google Scholar]
  10. Ananian, C.S.; Asanovic, K.; Kuszmaul, B.C.; Leiserson, C.E.; Lie, S. Unbounded Transactional Memory. In Proceedings of the HPCA, San Francisco, CA, USA, 12–16 February 2005; pp. 316–327. [Google Scholar]
  11. Hammond, L.; Wong, V.; Chen, M.; Carlstrom, B.D.; Davis, J.D.; Hertzberg, B.; Prabhu, M.K.; Wijaya, H.; Kozyrakis, C.; Olukotun, K. Transactional Memory Coherence and Consistency. SIGARCH Comput. Archit. News 2004, 32, 102. [Google Scholar] [CrossRef] [Green Version]
  12. Moore, K.E. LogTM: Log-Based Transactional Memory. In Proceedings of the HPCA, Austin, TX, USA, 11–15 February 2006; pp. 258–269. [Google Scholar]
  13. Rajwar, R.; Herlihy, M.; Lai, K. Virtualizing Transactional Memory. In Proceedings of the ISCA; IEEE Computer Society: Washington, DC, USA, 2005; pp. 494–505. [Google Scholar] [CrossRef] [Green Version]
  14. RSTM. Available online: http://www.cs.rochester.edu/research/synchronization/rstm/index.shtml (accessed on 14 February 2019).
  15. Dalessandro, L.; Spear, M.F.; Scott, M.L. NOrec: Streamlining STM by Abolishing Ownership Records. In Proceedings of the PPOPP, Bangalore, India, 9–14 January 2010; pp. 67–78. [Google Scholar] [CrossRef]
  16. Dragojevic, A.; Guerraoui, R.; Kapalka, M. Stretching Transactional Memory. In Proceedings of the PLDI, Dublin, Ireland, 15–20 June 2009; pp. 155–165. [Google Scholar] [CrossRef] [Green Version]
  17. Zhao, J.; Li, S.; Yoon, D.H.; Xie, Y.; Jouppi, N.P. Kiln: Closing the Performance Gap Between Systems with and without Persistence Support. In Proceedings of the MICRO, Shanghai, China, 22–26 April 2013; pp. 421–432. [Google Scholar]
  18. Coburn, J.; Caulfield, A.M.; Akel, A.; Grupp, L.M.; Gupta, R.K.; Jhala, R.; Swanson, S. NV-Heaps: Making Persistent Objects Fast and Safe with Next-generation, Non-volatile Memories. In Proceedings of the ASPLOS, Newport Beach, CA, USA, 5–11 May 2011; pp. 105–118. [Google Scholar]
  19. Volos, H.; Tack, A.J.; Swift, M.M. Mnemosyne: Lightweight Persistent Memory. In Proceedings of the ASPLOS, Newport Beach, CA, USA, 5–11 May 2011; pp. 91–104. [Google Scholar]
  20. Narayanan, D.; Hodson, O. Whole-system Persistence. In Proceedings of the ASPLOS, London, UK, 3–7 March 2012; pp. 401–410. [Google Scholar]
  21. Liu, M.; Zhang, M.; Chen, K.; Qian, X.; Wu, Y.; Zheng, W.; Ren, J. DudeTM: Building Durable Transactions with Decoupling for Persistent Memory. In Proceedings of the ASPLOS, Xi’an, China, 8–12 April 2017; pp. 329–343. [Google Scholar]
  22. Wan, H.; Lu, Y.; Xu, Y.; Shu, J. Empirical study of redo and undo logging in persistent memory. In Proceedings of the NVMSA, Daegu, Korea, 17–19 August 2016; pp. 1–6. [Google Scholar]
  23. Minh, C.C.; Chung, J.; Kozyrakis, C.; Olukotun, K. STAMP: Stanford Transactional Applications for Multi-Processing. In Proceedings of the IISWC, Seattle, WA, USA, 14–16 September 2008; pp. 35–46. [Google Scholar]
  24. Poudel, P.; Sharma, G. An Adaptive Logging Framework for Persistent Memories. In Proceedings of the SSS, Tokyo, Japan, 4–7 November 2018; pp. 32–49. [Google Scholar] [CrossRef]
  25. Poudel, P.; Sharma, G. Adaptive Versioning in Transactional Memories. In Proceedings of the SSS, Pisa, Italy, 22–25 October 2019; pp. 277–295. [Google Scholar] [CrossRef]
  26. Felber, P.; Fetzer, C.; Marlier, P.; Riegel, T. Time-Based Software Transactional Memory. IEEE Trans. Parallel Distrib. Syst. 2010, 21, 1793–1807. [Google Scholar] [CrossRef] [Green Version]
  27. Felber, P.; Fetzer, C.; Riegel, T. Dynamic performance tuning of word-based software transactional memory. In Proceedings of the PPOPP, Salt Lake City, UT, USA, 20–23 February 2008; pp. 237–246. [Google Scholar]
  28. Herlihy, M.; Luchangco, V.; Moir, M.; Scherer, W.N., III. Software Transactional Memory for Dynamic-sized Data Structures. In Proceedings of the PODC, Boston, MA, USA, 13–16 July 2003; pp. 92–101. [Google Scholar]
  29. Nguyen, D.; Pingali, K. What Scalable Programs Need from Transactional Memory. In Proceedings of the ASPLOS, Xi’an, China, 8–12 April 2017; pp. 105–118. [Google Scholar]
  30. Liu, H.; Chen, D.; Jin, H.; Liao, X.; He, B.; Hu, K.; Zhang, Y. A Survey of Non-Volatile Main Memory Technologies: State-of-the-Arts, Practices, and Future Directions. J. Comput. Sci. Technol. 2021, 36, 4–32. [Google Scholar] [CrossRef]
  31. Asadinia, M.; Sarbazi-Azad, H. Chapter One—Introduction to non-volatile memory technologies. In Durable Phase-Change Memory Architectures; Asadinia, M., Sarbazi-Azad, H., Eds.; Advances in Computers; Elsevier: Amsterdam, The Netherlands, 2020; Volume 118, pp. 1–13. [Google Scholar]
  32. Damron, P.; Fedorova, A.; Lev, Y.; Luchangco, V.; Moir, M.; Nussbaum, D. Hybrid transactional memory. In Proceedings of the ASPLOS, San Jose, CA, USA, 21–25 October 2006; pp. 336–346. [Google Scholar] [CrossRef]
  33. Kumar, S.; Chu, M.; Hughes, C.J.; Kundu, P.; Nguyen, A.D. Hybrid transactional memory. In Proceedings of the PPOPP, Manhattan, NY, USA, 29–31 March 2006; pp. 209–220. [Google Scholar]
  34. Dalessandro, L.; Carouge, F.; White, S.; Lev, Y.; Moir, M.; Scott, M.L.; Spear, M.F. Hybrid NOrec: A case study in the effectiveness of best effort hardware transactional memory. In Proceedings of the ASPLOS, Newport Beach, CA, USA, 5–11 May 2011; pp. 39–52. [Google Scholar]
  35. Alistarh, D.; Kopinsky, J.; Kuznetsov, P.; Ravi, S.; Shavit, N. Inherent limitations of hybrid transactional memory. Distrib. Comput. 2018, 31, 167–185. [Google Scholar] [CrossRef] [Green Version]
  36. Ruan, W.; Spear, M.F. Hybrid Transactional Memory Revisited. In Proceedings of the DISC, Tokyo, Japan, 5–9 October 2015; pp. 215–231. [Google Scholar]
  37. Riegel, T.; Marlier, P.; Nowack, M.; Felber, P.; Fetzer, C. Optimizing hybrid transactional memory: The importance of nonspeculative operations. In Proceedings of the SPAA, San Jose, CA, USA, 4–6 June 2011; pp. 53–64. [Google Scholar]
  38. Lev, Y.; Moir, M.; Nussbaum, D. PhTM: Phased transactional memory. In Proceedings of the Workshop on Transactional Computing (Transact), Portland, OR, USA, 16 August 2007. [Google Scholar]
  39. De Carvalho, J.P.L.; Araujo, G.; Baldassin, A. Revisiting phased transactional memory. In Proceedings of the ICS, Florence, Italy, 12–15 September 2017; pp. 25:1–25:10. [Google Scholar] [CrossRef]
  40. De Carvalho, J.P.L.; Araujo, G.; Baldassin, A. The Case for Phase-Based Transactional Memory. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 459–472. [Google Scholar] [CrossRef]
  41. The Persistent Memory Development Kit (PMDK). Available online: https://github.com/pmem/pmdk/ (accessed on 14 February 2019).
  42. Dulloor, S.R.; Kumar, S.; Keshavamurthy, A.; Lantz, P.; Reddy, D.; Sankaran, R.; Jackson, J. System Software for Persistent Memory. In Proceedings of the EuroSys, Amsterdam, The Netherlands, 14–16 April 2014; pp. 15:1–15:15. [Google Scholar]
  43. Avni, H.; Levy, E.; Mendelson, A. Hardware Transactions in Nonvolatile Memory. In Proceedings of the DISC, Tokyo, Japan, 5–9 October 2015; pp. 617–630. [Google Scholar]
  44. Genç, K.; Bond, M.D.; Xu, G.H. Crafty: Efficient, HTM-Compatible Persistent Transactions. In Proceedings of the PLDI; Association for Computing Machinery: New York, NY, USA, 2020; pp. 59–74. [Google Scholar] [CrossRef]
  45. Joshi, A.; Nagarajan, V.; Cintra, M.; Viglas, S. DHTM: Durable Hardware Transactional Memory. In Proceedings of the ISCA; IEEE Press: New York City, NY, USA, 2018; pp. 452–465. [Google Scholar] [CrossRef] [Green Version]
  46. Castro, D.; Baldassin, A.; Barreto, J.; Romano, P. SPHT: Scalable Persistent Hardware Transactions. In Proceedings of the FAST, Santa Clara, CA, USA, 22–25 February 2021. [Google Scholar]
  47. Optane DC Persistent Memory. Available online: https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html (accessed on 25 February 2021).
  48. Baldassin, A.; Murari, R.; de Carvalho, J.P.L.; Araujo, G.; Castro, D.; Barreto, J.; Romano, P. NV-PhTM: An Efficient Phase-Based Transactional System for Non-volatile Memory. In Proceedings of the Euro-Par; Malawski, M., Rzadca, K., Eds.; Springer: New York City, NY, USA, 2020; Volume 12247, pp. 477–492. [Google Scholar]
  49. Krishnan, R.M.; Kim, J.; Mathew, A.; Fu, X.; Demeri, A.; Min, C.; Kannan, S. Durable Transactional Memory Can Scale with Timestone. In Proceedings of the ASPLOS; Association for Computing Machinery: New York, NY, USA, 2020; pp. 335–349. [Google Scholar] [CrossRef] [Green Version]
  50. Gu, J.; Yu, Q.; Wang, X.; Wang, Z.; Zang, B.; Guan, H.; Chen, H. Pisces: A Scalable and Efficient Persistent Transactional Memory. In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, Renton, WA, USA, 10–12 July 2019; pp. 913–928. [Google Scholar]
  51. Kolli, A.; Pelley, S.; Saidi, A.G.; Chen, P.M.; Wenisch, T.F. High-Performance Transactions for Persistent Memories. In Proceedings of the ASPLOS, Atlanta, GA, USA, 2–3 April 2016; pp. 399–411. [Google Scholar] [CrossRef] [Green Version]
  52. Lu, Y.; Shu, J.; Sun, L.; Mutlu, O. Loose-Ordering Consistency for persistent memory. In Proceedings of the ICCD, Seoul, Korea, 19–22 October 2014; pp. 216–223. [Google Scholar] [CrossRef] [Green Version]
  53. Memaripour, A.; Badam, A.; Phanishayee, A.; Zhou, Y.; Alagappan, R.; Strauss, K.; Swanson, S. Atomic In-Place Updates for Non-Volatile Main Memories with Kamino-Tx. In Proceedings of the Twelfth European Conference on Computer Systems; Association for Computing Machinery: New York, NY, USA, 2017; pp. 499–512. [Google Scholar] [CrossRef]
  54. Izraelevitz, J.; Kelly, T.; Kolli, A. Failure-Atomic Persistent Memory Updates via JUSTDO Logging. In Proceedings of the ASPLOS, Atlanta, GA, USA, 2–3 April 2016; pp. 427–442. [Google Scholar]
  55. Coburn, J.; Bunker, T.; Schwarz, M.; Gupta, R.; Swanson, S. From ARIES to MARS: Transaction Support for Next-generation, Solid-state Drives. In Proceedings of the SOSP, Farmington, PA, USA, 3–6 November 2013; pp. 197–212. [Google Scholar]
56. Shin, S.; Tirukkovalluri, S.K.; Tuck, J.; Solihin, Y. Proteus: A Flexible and Fast Software Supported Hardware Logging Approach for NVM. In Proceedings of the MICRO, Boston, MA, USA, 14–18 October 2017; pp. 178–190. [Google Scholar]
  57. Chatzistergiou, A.; Cintra, M.; Viglas, S. REWIND: Recovery Write-Ahead System for In-Memory Non-Volatile Data-Structures. PVLDB 2015, 8, 497–508. [Google Scholar]
  58. Lu, Y.; Shu, J.; Sun, L. Blurred Persistence: Efficient Transactions in Persistent Memory. Trans. Storage 2016, 12, 3:1–3:29. [Google Scholar] [CrossRef]
  59. Keidar, I.; Perelman, D. Multi-versioning in Transactional Memory. In Transactional Memory. Foundations, Algorithms, Tools, and Applications: COST Action Euro-TM IC1001; Guerraoui, R., Romano, P., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 150–165. [Google Scholar] [CrossRef]
  60. Sharma, G.; Busch, C. A Competitive Analysis for Balanced Transactional Memory Workloads. Algorithmica 2012, 63, 296–322. [Google Scholar] [CrossRef] [Green Version]
  61. Sharma, G.; Busch, C. Window-Based Greedy Contention Management for Transactional Memory: Theory and Practice. Distrib. Comput. 2012, 25, 225–248. [Google Scholar] [CrossRef]
  62. Attiya, H.; Gramoli, V.; Milani, A. A Provably Starvation-Free Distributed Directory Protocol. In Proceedings of the SSS, New York City, NY, USA, 20–22 September 2010; pp. 405–419. [Google Scholar]
63. Guerraoui, R.; Herlihy, M.; Pochon, B. Toward a Theory of Transactional Contention Managers. In Proceedings of the PODC, Las Vegas, NV, USA, 17–20 July 2005; pp. 258–264. [Google Scholar]
64. Ansari, M.; Luján, M.; Kotselidis, C.; Jarvis, K.; Kirkham, C.C.; Watson, I. Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering. In Proceedings of the HiPEAC, Munich, Germany, 2–4 June 2009; pp. 4–18. [Google Scholar]
65. Scherer, W.N., III; Scott, M.L. Advanced contention management for dynamic software transactional memory. In Proceedings of the PODC, Las Vegas, NV, USA, 17–20 July 2005; pp. 240–248. [Google Scholar]
  66. Schneider, J.; Wattenhofer, R. Bounds on contention management algorithms. Theor. Comput. Sci. 2011, 412, 4151–4160. [Google Scholar] [CrossRef] [Green Version]
  67. Attiya, H.; Epstein, L.; Shachnai, H.; Tamir, T. Transactional Contention Management as a Non-Clairvoyant Scheduling Problem. Algorithmica 2010, 57, 44–61. [Google Scholar] [CrossRef]
  68. Attiya, H.; Milani, A. Transactional scheduling for read-dominated workloads. J. Parallel Distrib. Comput. 2012, 72, 1386–1396. [Google Scholar] [CrossRef] [Green Version]
  69. Keidar, I.; Perelman, D. On Avoiding Spare Aborts in Transactional Memory. Theory Comput. Syst. 2015, 57, 261–285. [Google Scholar] [CrossRef] [Green Version]
Figure 1. An illustration of how a transaction Tx is executed in persistent memory using (a) eager versioning and (b) lazy versioning. Figure (a) depicts the two kinds of operations in eager versioning: the first copies the data from the original memory locations to a log area (called the undo log) in persistent main memory, and the second copies the data back from the log area to the original memory locations in case Tx aborts. If Tx commits, the data in the log area is simply discarded. Figure (b) depicts the three kinds of operations in lazy versioning: the first copies the data from the original memory locations to a log area (called the redo log), on which Tx performs its updates; the second persists the redo log in persistent memory; and the third copies the updated data back from the redo log to the original memory locations. The second and third operations are required only in case Tx commits. If Tx aborts, the data in the redo log area in cache is simply discarded.
Figure 2. An illustration of how a transaction Tx is executed in non-persistent memory using (a) eager versioning and (b) lazy versioning. Figure (a) depicts the two kinds of operations in eager versioning: the first copies the data from the original memory locations to a log area (called the undo log) in cache, after which Tx performs in-place updates to the original memory locations; the second copies the data back from the log area to the original memory locations in case Tx aborts. If Tx commits, the data in the log area is simply discarded. Figure (b) depicts the two kinds of operations in lazy versioning: the first copies the data from the original memory locations to a log area (called the redo log) in cache, on which Tx performs its updates; the second copies the updated data from the log area to the original memory locations in case Tx commits. If Tx aborts, the data in the log area is simply discarded.
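The undo-log and redo-log operations described in the captions of Figures 1 and 2 can be sketched in a few lines of C. This is a minimal, single-threaded illustration, not TinySTM's actual implementation; all names are hypothetical, and the persist-ordering step of Figure 1 is omitted.

```c
/* A minimal, single-threaded sketch of the undo-log (eager) and redo-log
 * (lazy) operations from Figures 1 and 2. All names are hypothetical and
 * this is not TinySTM's API; persist ordering (Figure 1) is omitted. */

#define LOG_CAP 64

typedef struct { int *addr; int old_val; } undo_entry_t;
typedef struct { int *addr; int new_val; } redo_entry_t;

/* Eager versioning: save the old value, then update in place. */
static undo_entry_t undo_log[LOG_CAP];
static int undo_len = 0;

static void eager_write(int *addr, int val) {
    undo_log[undo_len++] = (undo_entry_t){ addr, *addr }; /* log old value */
    *addr = val;                                          /* in-place update */
}
static void eager_abort(void) {  /* roll back using the undo log */
    while (undo_len > 0) {
        undo_len--;
        *undo_log[undo_len].addr = undo_log[undo_len].old_val;
    }
}
static void eager_commit(void) { undo_len = 0; }  /* discard the log */

/* Lazy versioning: buffer the new value, write back only on commit. */
static redo_entry_t redo_log[LOG_CAP];
static int redo_len = 0;

static void lazy_write(int *addr, int val) {
    redo_log[redo_len++] = (redo_entry_t){ addr, val };   /* log new value */
}
static void lazy_commit(void) {  /* write back using the redo log */
    for (int i = 0; i < redo_len; i++)
        *redo_log[i].addr = redo_log[i].new_val;
    redo_len = 0;
}
static void lazy_abort(void) { redo_len = 0; }  /* discard the log */
```

Note that a real lazy implementation must also redirect reads of uncommitted data to the redo log; that step is omitted here for brevity.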
Figure 3. An illustration of performance discrepancies in execution time (left) and number of aborts (right) in genome and kmeans benchmarks using eager and lazy versioning in non-persistent TM systems.
Figure 4. An illustration of (a) eager, (b) lazy, and (c) basic adaptive versioning. The time gap σ* while switching from eager (lazy) to lazy (eager) lets in-flight transactions finish executing, which helps avoid potential data versioning inconsistencies.
Figure 5. An illustration of the better time barrier design. The interval δ* between Eager and Lazy represents the time taken by in-flight transactions to finish their executions after the versioning method is switched. New transactions that do not conflict with transactions using the previous versioning method can execute concurrently with the in-flight transactions.
Figure 6. An illustration of the better switching mechanism. λ* represents the time interval in which versioning is not switched; δ* corresponds to the better time barrier of Figure 5.
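The switching behavior illustrated in Figures 4–6 can be sketched as a simple decision routine. This is a simplified, hypothetical sketch: the actual Adaptive policy consults system parameters measured at runtime, so the abort-ratio rule, the threshold, and every name below are illustrative only. The in_flight counter models the time barrier: no switch occurs while transactions started under the current versioning method are still executing.

```c
typedef enum { EAGER, LAZY } versioning_t;

/* Hypothetical sketch of an adaptive switching decision (cf. Figures 4-6). */
typedef struct {
    versioning_t mode;
    int commits, aborts;  /* counters accumulated since the last switch */
    int in_flight;        /* transactions running under the current mode */
} tm_state_t;

static versioning_t maybe_switch(tm_state_t *s, double threshold) {
    int total = s->commits + s->aborts;
    if (total == 0 || s->in_flight > 0)
        return s->mode;  /* time barrier: wait for in-flight transactions */
    double abort_ratio = (double)s->aborts / total;
    /* High contention favors lazy (cheap aborts); low favors eager. */
    versioning_t want = (abort_ratio > threshold) ? LAZY : EAGER;
    if (want != s->mode) {
        s->mode = want;
        s->commits = s->aborts = 0;  /* restart counters after a switch */
    }
    return s->mode;
}
```

Resetting the counters after a switch plays the role of the interval λ* in Figure 6: a fresh window of commits and aborts must accumulate before the next switch can be considered.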
Figure 7. Data movements in micro-benchmarks and bayes from STAMP executing in persistent TM.
Figure 8. Total number of stores to the PM in micro-benchmarks and bayes from STAMP.
Figure 9. Number of aborts in micro-benchmarks and bayes from STAMP in persistent TM.
Figure 10. Data movements in STAMP benchmarks executing in persistent TM.
Figure 11. Total number of stores to the PM in STAMP benchmarks.
Figure 12. Total number of aborts in STAMP benchmarks in persistent TM.
Figure 13. Data movements in STAMPEDE benchmarks executing in persistent TM.
Figure 14. Total number of stores to the PM in STAMPEDE benchmarks.
Figure 15. Total number of aborts in STAMPEDE benchmarks in persistent TM.
Figure 16. Execution time in micro-benchmarks and bayes from STAMP executing in persistent TM.
Figure 17. Execution time in STAMP benchmarks executing in persistent TM.
Figure 18. Execution time in STAMPEDE benchmarks executing in persistent TM.
Figure 19. Execution time in micro-benchmarks and bayes from STAMP using better time barrier in non-persistent TM.
Figure 20. Number of aborts in micro-benchmarks and bayes from STAMP using better time barrier in non-persistent TM.
Figure 21. Execution time in STAMP benchmarks using better time barrier in non-persistent TM.
Figure 22. Number of aborts in STAMP benchmarks using better time barrier in non-persistent TM.
Figure 23. Execution time in STAMPEDE benchmarks using better time barrier in non-persistent TM.
Figure 24. Number of aborts in STAMPEDE benchmarks using better time barrier in non-persistent TM.
Figure 25. Execution time in STAMP benchmarks using better barrier and better switch in non-persistent TM.
Figure 26. Aborts in STAMP benchmarks using better barrier and better switch in non-persistent TM.
Figure 27. Decrease in total number of switches between versioning methods using better switch mechanism in non-persistent TM.
Figure 28. Execution time in STAMP benchmarks for Adaptive_ETL for four different contention management techniques in non-persistent TM.
Figure 29. Aborts in STAMP benchmarks for Adaptive_ETL for four different contention management techniques in non-persistent TM.
Table 1. A comparison of eager and lazy versioning in persistent and non-persistent TM systems (the row in bold text is only for persistent TM systems).

| Constraint | Eager Versioning | Lazy Versioning |
| --- | --- | --- |
| Memory Update | performs in-place memory updates | updates are written to memory at commit time |
| Reading Overhead | allows reading the most recent data directly from in-place memory | reads are intercepted and redirected to the redo log area to read recent uncommitted data |
| **Persist Ordering (only for persistent TMs)** | requires ensuring persist ordering for each memory write in a transaction [19] | requires only one persist ordering for each transaction [19,21] |
| Data Movement | transaction aborts are costly as the memory updates need to be rolled back to a consistent state using the undo log | transaction commits are costly as the updates need to be written back to original memory using the redo log |
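The persist-ordering row of Table 1 can be made concrete with a small counting sketch. Here persist_barrier() is a hypothetical stand-in for a cache-line flush plus ordering fence; the function names and counts are illustrative, not measurements from the paper.

```c
/* Counting sketch of the persist-ordering row in Table 1. persist_barrier()
 * is a hypothetical stand-in for a cache-line flush plus ordering fence. */

static int persist_barriers = 0;
static void persist_barrier(void) { persist_barriers++; }

/* Eager: each in-place write must be ordered after its undo-log entry,
 * so a transaction with n writes issues on the order of n barriers. */
static int eager_tx(int *data, int n) {
    for (int i = 0; i < n; i++) {
        data[i]++;          /* in-place update (undo logging elided) */
        persist_barrier();  /* order this write before the next */
    }
    return n;  /* barriers issued */
}

/* Lazy: writes are buffered; one barrier persists the redo log at commit. */
static int lazy_tx(int *data, int n) {
    for (int i = 0; i < n; i++)
        data[i]++;          /* buffered in the redo log in practice */
    persist_barrier();      /* persist the whole redo log once */
    return 1;  /* barriers issued */
}
```

For a transaction with n writes, the eager variant issues n barriers while the lazy variant issues one, which is the asymmetry the table's persist-ordering row describes.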
Table 2. Versioning and conflict detection mechanisms used in some non-persistent TM systems.

| Conflict Detection | Lazy Versioning | Eager Versioning |
| --- | --- | --- |
| Lazy | TCC [11], NOrec [15], RSTM [14], SwissTM [16] | None |
| Eager | UTM [10], VTM [13], RSTM [14], SwissTM [16] | UTM [10], LogTM [12], RSTM [14] |
Table 3. Summary of five micro-benchmarks and eight complex benchmarks in the STAMP (and STAMPEDE) benchmark suites, including some of their properties.

| Benchmark | Application | Description | RW Set | Contention |
| --- | --- | --- | --- | --- |
| bank | banking | implements banking transactions | small | low |
| red-black tree | BST | balances the nodes of a binary tree | small | low |
| hash set | data structure | stores information using hashing | medium | low |
| linked list | data structure | maintains a linear collection of data | medium | medium |
| skip list | data structure | maintains a linked hierarchy of subsequences | medium | low |
| bayes | machine learning | learns the structure of a Bayesian network | large | high |
| genome | bioinformatics | performs gene sequencing | medium | low |
| intruder | security | detects network intrusions | medium | high |
| kmeans | data mining | implements k-means clustering | small | low |
| labyrinth | engineering | routes paths in a maze | large | high |
| ssca2 | scientific | creates an efficient graph representation | small | low |
| vacation | OLTP | emulates a travel reservation system | medium | low/medium |
| yada | scientific | refines a Delaunay mesh | large | medium |
Poudel, P.; Sharma, G. Adaptive Versioning in Transactional Memory Systems. Algorithms 2021, 14, 171. https://doi.org/10.3390/a14060171
